Google is indexing content blocked in robots.txt
-
Hi, Google is indexing some URLs that I don't want indexed, and it is also indexing the same URLs over https. These URLs are blocked in robots.txt. I've tried to block them through Google Webmaster Tools, but Google doesn't let me because the URLs are https. The robots.txt file is correct, so what can I do to keep this content from being indexed?
-
I think you will find that the URLs in Google's index are either:
- Indexed before the robots.txt disallow was put in place. Check the Google SERP and click "Cached" to see the date.
- Heavily linked to from other external domains.
- Both of the above.
@cleverphd has a great solution. Follow that.
-
This will sound backwards but it works.
-
Add the meta noindex tag to all pages you want out of the index.
-
Take those same pages out of the robots.txt and allow them to be crawled.
The meta noindex tag tells Google to remove the page from the index. It is preferred over robots.txt for this purpose.
http://moz.com/learn/seo/robotstxt
The robots.txt file blocks Google from crawling the page, but the URL can still show up in the index if other pages link to the page you are trying to remove.
http://www.youtube.com/watch?v=KBdEwpRQRD0
You have to allow Google to crawl the pages (by taking them out of the robots.txt) so it can read the noindex meta tags that then tell Google to take them out of the index.
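To make the interaction concrete, here is a minimal Python sketch of the check described above: a noindex tag only helps if the page is not blocked from crawling. The function names, the example URL, and the sample robots.txt contents are hypothetical and only for illustration; the sketch uses Python's standard `urllib.robotparser`, which approximates but does not exactly reproduce how Googlebot interprets robots.txt.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> or <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def can_crawl(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(html):
    """True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_is_visible(robots_txt, url, html):
    """The noindex tag can only be read if the page is crawlable."""
    return can_crawl(robots_txt, url) and has_noindex(html)


# Hypothetical example data (not taken from the original poster's site).
URL = "https://www.example.com/private/page.html"
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
BLOCKED = "User-agent: *\nDisallow: /private/\n"  # page still in robots.txt
OPEN = "User-agent: *\nDisallow: /cart/\n"        # page taken out of robots.txt

if __name__ == "__main__":
    print("blocked robots.txt:", noindex_is_visible(BLOCKED, URL, PAGE))
    print("open robots.txt:   ", noindex_is_visible(OPEN, URL, PAGE))
```

With the page still disallowed, the crawler never gets to read the tag, so the first check fails; once the disallow rule for that path is removed, the second check passes.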
-
-
Thank you, but that is not the problem. The robots.txt file has been in place for a long time.
-
It seems you added or modified the robots.txt file later. Wait for some time, say 15 days.
Also check the syntax of your robots.txt.
Regards,
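For the syntax check, a rough Python sketch like the one below can flag the most common mistakes. It is not an official validator: the function name `lint_robots_txt` and the list of accepted fields are assumptions for illustration (`crawl-delay` is a non-standard field that some crawlers honor), and Google's own robots.txt testing tools remain the authoritative check.

```python
# Fields this sketch accepts; an assumption, not an exhaustive list.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots_txt(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN_FIELDS:
            problems.append((number, f"unknown field '{field}'"))
        elif field == "user-agent":
            seen_user_agent = True
            if not value:
                problems.append((number, "empty User-agent value"))
        elif field in ("allow", "disallow"):
            if not seen_user_agent:
                problems.append((number, "rule before any User-agent line"))
            elif value and not value.startswith(("/", "*")):
                problems.append((number, "path should start with '/'"))
    return problems


if __name__ == "__main__":
    sample = "Disallow /private/\nUseragent: *\nDisallow: private/\n"
    for number, message in lint_robots_txt(sample):
        print(f"line {number}: {message}")
```

A file that passes this check can still block or allow the wrong paths, so it is worth testing a few real URLs against it as well.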