Google is indexing blocked content in robots.txt
-
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over HTTPS. These URLs are blocked in the robots.txt file. I've tried to block these URLs through Google Webmaster Tools, but Google doesn't let me do it because the URLs are HTTPS. The robots.txt file is correct, so what can I do to keep this content from being indexed?
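One point that may matter for the HTTPS part of the question. This is general robots.txt behaviour, not something stated in the thread: crawlers read robots.txt separately for each protocol and host, so rules served at the http:// robots.txt do not automatically cover the https:// URLs. The sketch below only works out which robots.txt file governs a given URL. The example.com URLs are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    robots.txt is scoped to scheme + host (+ port), so the http:// and
    https:// versions of a site are covered by two different files.
    """
    parts = urlsplit(page_url)
    # Keep scheme and netloc (host and optional port), swap in the
    # robots.txt path, and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for url in ("http://www.example.com/private/page.htm",
                "https://www.example.com/private/page.htm"):
        print(url, "->", robots_txt_url(url))
```

If the disallow rules exist only in the http:// file, the https:// file may need the same rules.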
-
I think you will find that the URLs in Google's index are either:
- indexed before the robots.txt disallow was put in place (check in the Google SERP and click "Cached" to see the date),
- heavily linked to from external domains, or
- both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards, but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tag tells Google to remove the page from the index. It is preferred over using robots.txt for this.
http://moz.com/learn/seo/robotstxt
robots.txt blocks Google from crawling the page, but the page can still show up in results if other pages link to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
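The point about taking pages out of robots.txt so the noindex can be read can be modelled with Python's standard library. This is a simplified, hypothetical crawler, not how Google is actually implemented: it checks robots.txt first and only parses the HTML (and so only sees a meta robots noindex) when fetching is allowed. The function and class names and the example.com URL are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.robotparser import RobotFileParser


class NoindexDetector(HTMLParser):
    """Sets self.noindex if the page has <meta name="robots" content="...noindex...">."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot") and \
                "noindex" in attr.get("content", "").lower():
            self.noindex = True


def crawler_sees_noindex(robots_lines, url, html, agent="Googlebot") -> Optional[bool]:
    """Hypothetical polite crawler.

    Returns None if robots.txt forbids fetching url (the HTML, including any
    noindex tag, is never read); otherwise True/False for noindex present.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)
    if not rules.can_fetch(agent, url):
        return None
    detector = NoindexDetector()
    detector.feed(html)
    return detector.noindex


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    url = "https://www.example.com/private/page.htm"
    print(crawler_sees_noindex(["User-agent: *", "Disallow: /private/"], url, page))
    print(crawler_sees_noindex(["User-agent: *", "Disallow:"], url, page))
```

With the page disallowed, the result is None: the noindex tag exists but is never read. Once the disallow is removed, the same page reports the noindex.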
-
-
Thank you, but that is not the problem. The robots.txt file has been in place for a long time.
-
It seems you added or modified the robots.txt file later. Wait for some time, say 15 days.
Also check the syntax of your robots.txt.
Regards,
Related Questions
-
Sudden Indexation of "Index of /wp-content/uploads/"
Hi all, I have suddenly noticed a massive jump in indexed pages. After performing a "site:" search, I found that the jump was due to the indexing of many pages with the SERP title "Index of /wp-content/uploads/" for many uploaded pieces of content and plugins. This appeared approximately one month after switching to HTTPS. I have also noticed a decline in Bing rankings. Does anyone know what is causing this and how to fix it? To be clear, these pages are **not** normal /wp-content/uploads/ pages but rather "Index of" pages being included in Google. Thank you.
Technical SEO | Tom3_150
-
Google Search Console says 'sitemap is blocked by robots'?
Google Search Console is telling me "Sitemap contains URLs which are blocked by robots.txt." I don't understand why my sitemap is being blocked. My robots.txt looks like this:
User-Agent: *
Disallow:
Sitemap: http://www.website.com/sitemap_index.xml
It's a WordPress site with Yoast SEO installed. Is anyone else having this issue with Google Search Console? Does anyone know how I can fix it?
Technical SEO | Extima-Christian
-
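For the sitemap question above, one way to sanity-check the quoted file is Python's standard-library urllib.robotparser. It is used here as a stand-in and is not Google's parser, so the two may differ on edge cases. An empty `Disallow:` line allows everything, so under this parser the sitemap URL is crawlable. If Search Console still reports a block, one possibility worth checking is whether the robots.txt it fetched matches the one quoted. `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted in the question, one directive per line.
ROBOTS_TXT = """\
User-Agent: *
Disallow:
Sitemap: http://www.website.com/sitemap_index.xml
"""


def check(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if agent may crawl url under robots_txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(agent, url)


if __name__ == "__main__":
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    print(parser.site_maps())  # Sitemap lines found in the file
    print(check(ROBOTS_TXT, "http://www.website.com/sitemap_index.xml"))
```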
Is my website indexed correctly in Google - www.couponshop.co.uk
Our website www.couponshop.co.uk has just been relaunched after a change of direction.
A lot of the pages were redirected. When I checked the indexing of the website in Google, I searched for site:couponshop.co.uk and only two pages came up, but when I searched for site:www.couponshop.co.uk they all showed up.
Is this correct, or are we doing something wrong?
Technical SEO | LaurenGT
-
Google only indexed 19/94 images
I'm using Yoast SEO and have images (attachments) excluded from sitemaps, which is the recommended method (but could this be wrong?). Most of my images are in my posts; here's the sitemap for posts: https://edwardsturm.com/post-sitemap.xml I also appear on page 1 for some good keywords, and my site is getting organic traffic, so I'm not sure why the images aren't being indexed. Here's an example of a well-performing article: https://edwardsturm.com/best-games-youtube-2016/ Thanks!
Technical SEO | Edward_Sturm
-
Google+ Contributor to: Link to Main Domain or Content Page?
Which is the best practice for the link to claim authorship for a guest post? I have tried both the main domain URL in the "contributor to" section of my Google plus and the page URL where the post is and both show my picture when testing in the Structured Data Testing Tool. Which is best to use? Thanks in advance.
Technical SEO | WSIDW
-
Site blocked by robots.txt and 301 redirected still in SERPs
I have a vanity URL domain that 301 redirects to my main site. That domain also has a robots.txt that disallows the entire site. However, for a sufficiently branded search, the vanity domain still shows up in the SERPs with the new Google message: "A description for this result is not available because of this site's robots.txt". I understand why the message is there; my question is: shouldn't a 301 redirect keep this domain from ever showing in the SERPs? The client isn't happy about it showing at all. How can I get the vanity domain out of the SERPs? THANKS in advance!
Technical SEO | VMLYRDiscoverability
-
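For the vanity-domain question above, the same reasoning as in the main thread applies: a crawler only sees a 301 when it requests the URL, and a site-wide `Disallow: /` tells a compliant crawler not to request anything. The sketch below uses Python's urllib.robotparser as a stand-in for Google's parser. vanity.example and the helper name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical vanity-domain robots.txt that disallows the whole site,
# as described in the question.
VANITY_ROBOTS = ["User-agent: *", "Disallow: /"]


def fetchable(robots_lines, urls, agent="Googlebot"):
    """Map each URL to whether a robots.txt-compliant crawler may request it."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    return {url: rules.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    urls = ["http://vanity.example/", "http://vanity.example/any/page"]
    for url, ok in fetchable(VANITY_ROBOTS, urls).items():
        # ok == False: the request is never sent, so the 301 response
        # that would point to the main site is never received either.
        print(url, ok)
```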
Is my robots.txt file working?
Greetings from medieval York, UK 🙂 Every time you enter my name & Liz, this page is returned in Google:
http://www.davidclick.com/web_page/al_liz.htm
But I have the following robots.txt file, which has been in place for a few weeks:
User-agent: *
Disallow: /york_wedding_photographer_advice_pre_wedding_photoshoot.htm
Disallow: /york_wedding_photographer_advice.htm
Disallow: /york_wedding_photographer_advice_copyright_free_wedding_photography.htm
Disallow: /web_page/prices.htm
Disallow: /web_page/about_me.htm
Disallow: /web_page/thumbnails4.htm
Disallow: /web_page/thumbnails.html
Disallow: /web_page/al_liz.htm
Disallow: /web_page/york_wedding_photographer_advice.htm
Allow: /
So my question, please: "Why is this page appearing in the SERPs when it's blocked in the robots.txt file, e.g. Disallow: /web_page/al_liz.htm?"
Any insights welcome 🙂
Technical SEO | Nightwing
-
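For the "Is my robots.txt file working?" question above, the quoted file can be checked with Python's urllib.robotparser. It is used here as a stand-in for Google's parser. Python applies the first matching rule, while Google documents longest-match precedence, but both pick the specific `Disallow` over `Allow: /` for this file. `/web_page/index.htm` is a hypothetical path used for contrast. This shows only that crawling is blocked. As the answers in the main thread point out, that does not by itself remove a URL that was indexed earlier or is linked from elsewhere.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted in the question, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /york_wedding_photographer_advice_pre_wedding_photoshoot.htm
Disallow: /york_wedding_photographer_advice.htm
Disallow: /york_wedding_photographer_advice_copyright_free_wedding_photography.htm
Disallow: /web_page/prices.htm
Disallow: /web_page/about_me.htm
Disallow: /web_page/thumbnails4.htm
Disallow: /web_page/thumbnails.html
Disallow: /web_page/al_liz.htm
Disallow: /web_page/york_wedding_photographer_advice.htm
Allow: /
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# al_liz.htm should be blocked; the hypothetical index.htm falls through to Allow: /.
for path in ("/web_page/al_liz.htm", "/web_page/index.htm"):
    url = "http://www.davidclick.com" + path
    print(path, "crawlable:", rules.can_fetch("Googlebot", url))
```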
I am posting an article on my site and another site has asked to use the same article: is this a duplicate content issue with Google if I am the creator of the content, and will it penalize our sites, or one more than the other?
I operate an ecommerce site for outdoor gear and was invited to guest post on a popular blog (not my site) about a trip I had been on. I wrote the article for them, and I will also post the same article on my website. Is this a duplicate content problem with Google for my site and/or the other site? Any help is appreciated. Also, what if I wanted to post this same article to one or two other blogs, as long as they link back to me as the author of the article?
Technical SEO | isle_surf