Pages getting into Google Index, blocked by Robots.txt??
-
Hi all,
So yesterday we set out to remove URLs that had got into the Google index and were not supposed to be there, due to faceted navigation. We searched for the URLs by using these queries in Google Search:
site:www.sekretza.com inurl:price=
site:www.sekretza.com inurl:artists=
So it brings up a list of "duplicate" pages, and they have the usual: "A description for this result is not available because of this site's robots.txt – learn more."
So we removed them all, and Google removed them all, every single one.
This morning I did a check and found that more are creeping in. If I take one of the suspected dupes to the robots.txt tester, Google tells me it's blocked, and yet it's appearing in their index?
I'm confused as to why a path that is blocked is able to get into the index. I'm thinking of lifting the robots.txt block so that Google can see that these pages also have a meta NOINDEX,FOLLOW tag, but surely that will waste my crawl budget on unnecessary pages?
Any ideas?
thanks.
-
Oh, OK. If that's the case, please don't worry about those in the index. You can get them removed using the Remove URLs feature in your Webmaster Tools account.
-
It doesn't show any result for the "blocked page" when I do that in Google.
-
Hi,
Please try this and let us know the results:
Suppose this is one of the pages in discussion:
http://www.yourdomain.com/blocked-page.html
Go to Google and type the following, including the double quotes, replacing the example with the actual page:
"yourdomain.com/blocked-page.html" -site:yourdomain.com
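If there are many blocked pages to check, the query above can be built from each URL. The following is only a minimal sketch: the helper name `external_reference_query` is made up for illustration, and it assumes that stripping a leading "www." is enough to get the domain used in the `-site:` part.

```python
from urllib.parse import urlsplit


def external_reference_query(page_url: str) -> str:
    """Build a quoted-URL search that excludes the site's own pages.

    Hypothetical helper: it only reproduces the query format suggested
    in the post above and does not call Google.
    """
    parts = urlsplit(page_url)
    host = parts.netloc
    # Assumption: dropping a leading "www." yields the domain to exclude.
    bare_host = host[4:] if host.startswith("www.") else host
    target = bare_host + parts.path
    if parts.query:
        target += "?" + parts.query
    return f'"{target}" -site:{bare_host}'


print(external_reference_query("http://www.yourdomain.com/blocked-page.html"))
# "yourdomain.com/blocked-page.html" -site:yourdomain.com
```

Any result returned for such a query would be a page on another site that mentions the blocked URL.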
-
Hi!
From what I could tell, there weren't that many pages already in the index, so it could be worth trying to lift the block, at least for a short while, to see if it has an impact.
In addition, how about configuring how Googlebot should treat your URLs via the URL parameter tool in Google Webmaster Tools? Here's what Google has to say about this: https://support.google.com/webmasters/answer/1235687
Best regards,
Anders
-
Hi Devanur.
What I'm guessing is the problem here is that, as of now, Googlebot is restricted from accessing the pages (because of robots.txt), so it never fetches the pages and never updates its index with the "noindex, follow" declaration that seems to be in place on them.
One other thing that could be considered is to add "rel=nofollow" to all the faceted navigation links on the left.
Fully agreeing with you on the "crawl budget" part.
Anders
-
Hi guys,
Appreciate your replies, but as far as I checked last time, if a URL is blocked by the robots.txt file, Google cannot read the meta noindex, follow tag within the page.
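That interaction can be sketched in a few lines of Python. This is a simplified model and not Google's actual pipeline: the function `directives_seen_by_crawler` and the sample HTML are made up for illustration, and the only behaviour modelled is that a page blocked by robots.txt is never fetched, so its meta robots tag is never read.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def directives_seen_by_crawler(html: str, blocked_by_robots_txt: bool) -> set:
    # Assumption of this sketch: a blocked page is never downloaded,
    # so nothing inside its HTML can be taken into account.
    if blocked_by_robots_txt:
        return set()
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


# Made-up page carrying the tag discussed in this thread.
sample_html = '<html><head><meta name="robots" content="NOINDEX,FOLLOW"></head><body></body></html>'

print(sorted(directives_seen_by_crawler(sample_html, blocked_by_robots_txt=True)))
# []
print(sorted(directives_seen_by_crawler(sample_html, blocked_by_robots_txt=False)))
# ['follow', 'noindex']
```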
There are no external references to these URLs, so Google is finding them within the site itself.
In essence, what you are recommending is that I lift the robots.txt block and let Google crawl these pages (which could be infinite in number, as it is faceted navigation).
This will waste my crawl budget.
Any other ideas?
-
Anderss has pointed to the right article. With robots.txt blocking, Googlebot will not crawl these pages (link discovery) from within the website, but what if references to these blocked pages are found elsewhere, on third-party websites? This is the situation you have run into. So to fully stop Google from discovering and indexing these blocked pages, you should use the page-level meta robots tag on them. Once this is in place, the issue will fade away.
This issue has been addressed many times here on Moz.
Coming to your concern about the crawl budget: there is nothing to worry about, as Google will not crawl those pages on your website, since they are already blocked by the robots.txt file.
Hope it helps, my friend.
Best regards,
Devanur Rafi
-
Hi!
It could be that the pages had already been indexed before you added the directives to robots.txt.
I see that you have added rel=canonical to the pages and that you now have noindex,follow. Were those added recently? If so, it could be wise to actually let Googlebot access and crawl the pages again, and then they'll go away after a while. Then you could add the directive again later. See https://support.google.com/webmasters/answer/93710?hl=en&ref_topic=4598466 for more about this.
Hope this helps!
Anders
-
For example:
http://www.sekretza.com/eng/best-sellers-sekretza-products.html?price=1%2C1000
Is blocked by using:
Disallow: /*price=.... ?
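A wildcard rule like this can be checked locally with a rough sketch. Two assumptions to flag: the rule in the post is truncated, so the pattern `/*price=` below is only a stand-in for it, and the matching follows Google's documented wildcard convention (`*` matches any run of characters, a trailing `$` anchors the end of the URL). Python's built-in `urllib.robotparser` does not handle wildcards, hence the small hand-rolled matcher. It ignores Allow rules and rule precedence.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (wildcard-style matching)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path_and_query: str, disallow_patterns) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(
        robots_pattern_to_regex(p).match(path_and_query) is not None
        for p in disallow_patterns
    )


# Stand-in rule; the real rule in the post is truncated.
rules = ["/*price="]

print(is_disallowed("/eng/best-sellers-sekretza-products.html?price=1%2C1000", rules))
# True
print(is_disallowed("/eng/best-sellers-sekretza-products.html", rules))
# False
```

A match here only means the URL would not be crawled. As discussed above, it says nothing about whether the URL can still appear in the index.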