Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in the GWMT for one of the sites that I manage.
The error is as below-
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message - the site uses a faceted navigation and is genuinely generating a lot of duplicate pages. However in order to stop this from becoming an issue we do the following;
- No-index a large number of pages using the on page meta tag.
- Use a canonical tag where it is appropriate
But we still get the error and a lot of the example pages that Google suggests are affected by the issue are actually pages with the no-index tag.
So my question is how do I address this problem?
I'm thinking that as it's a crawling issue the solution might involve the no-follow meta tag.
any suggestions appreciated.
-
I feel we are missing some information here.
For example, for our site we have done a canonical on the pages where we have query parameters. We have also specified these parameters as representative URL in Google Webmaster - URL parameters. Even after this we received this message "Googlebot found an extremely high number of URLs on your site".
The surprising thing is that these parameters are existing on the site for a long time, and the total URL count is reducing. Even after this Google has started sending this message to us since Feb 2014. Seems there has been some algorithmic change because of which some additional conditions that have not been highlighted in this thread have to be taken care of.. Not sure what..
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue by using NOINDEX & CANONICAL but you are not fixing the main issue which is the URL's are still there.
NOINDEX will not stop Google from recognizing the link nor will NOFOLLOW. They actually use every link's information in one form or another regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue).....
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would be interested in is working on the architecture of the site to see if there is a way to stop the crazy amount of URL's being generated and/or consolidate them to a single point. The next step would be to see if there is any commonality between these extra URL's and if there is any possibility to use a 301 redirect to consolidate these extra urls.
I think what you're really after was a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URL's. You're going to have to be a bit more specific in such case as to what kind of site you're using (Joomla, WordPress, Oscommerce, etc) for a more specific answer.
Hope it helps.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
High Competitive Keyword
I am working on a keyword which has 72 difficulty score, I have tried getting quality links with high DA, also there is no any on page issue, landing page is loading very fast and fixed all site loading issues, the keyword is stuck on 3rd page of Google and shuffling between 20 to 30 but not ranking on 1st page, I need expert opinions on it.
Intermediate & Advanced SEO | | UmairGadit0 -
How to get a large number of urls out of Google's Index when there are no pages to noindex tag?
Hi, I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down. The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it's to the client's same domain and wind up on Amazon or somewhere else with some affiiiate code. GoogleBot would then grab the original link on the clients domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag. I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler. Any ideas? Thanks! Best... Michael P.S., All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.
Intermediate & Advanced SEO | | 945010 -
Weird Site is linking to our site and links appears to be broken
I have got a lot of weird links indexed from this page: http://kzs.uere.info/files/images/dining-table-and-2-upholstered-chairs.html When clicking the link it shows 404. Also, the spam score is huge. What do you guys suggest to do with this?
Intermediate & Advanced SEO | | Miniorek
Could it be done by somebody to get our rankings down or domain penalized? Best Regards
Mike & Alex0 -
New site, new URL, lots of custom content. Load it all or "trickle" it over time?
New site, new URL, lots of custom content. Load it all or "trickle" it over time? Would it make a difference in terms of ranking the site? Interested in your thoughts. Thanks! BBuck!
Intermediate & Advanced SEO | | BBuck0 -
Canonical Issue with urls
I saw some urls of my site showing duplicate page content, duplicate page title issues on crawl reports. So I have set canonical url for every urls , that has dupicate content / page title. But still SeoMoz crawl test is showing issue. I am giving here one url with issue. The below given urls shown duplicate content and duplicate page title with some other urls all are given below. Checked URL http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7635 dup page content http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622&category_id=270&colors=Black_Tones&click=colors&ci=1
Intermediate & Advanced SEO | | trixmediainc
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 dup page Title http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7636&category_id=270&sizes=12x15,12x18&click=sizes
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7636
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622&category_id=270&colors=Black_Tones&click=colors&ci=1
http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 But I have set canonical url for all these urls already , that is :- http://www.cyrusrugs.com/bridge-traditional-area-rug-item-7622 This should actually solve the problem right ? Search engine should identify the canonical url as original url and only should consider that. Thanks0 -
Any Suggestions For My Site?
I've recently started a website that is based on movie posters. The site has fundamentally been built for users and not SEO but I'm wondering if anyone can see any problems or just general advice that may help with our SEO efforts? The "content" on the website are the movie posters. I know Google likes text content, but I don't see what else we could add that wouldn't be purely for SEO. My site is: http://www.bit.ly/ZSPbTA
Intermediate & Advanced SEO | | whispertera0 -
My site still out of rank
Hello, I am working on a site for past 3 months, here are the problems with this site, 1. It had a forum full of spam becuase initially captcha was not included, 10000 spam backlinks 2. Affiliate page was also hit by spam about 4000 spam backlinks which were either not existing or porn etc.... 3. Too many internal links which were indexed, these additional links were generated due to tags, ids, filters etc. Existing SEO team decided to remove the forum and after 30 days they blocked it in robots. But within 30 days the site moved from 3rd page to no where. Now after few days lator internal links are also cleaned by putting following in the robots, Dissallow: / *? Dissallow: / *id Dissallow: / *tag Links are now cleaning up, all the spam and bad links are now put into disavow file and sent to google via disavow tool. On daily bases good quality links are been produced such as through content, article submission, profile linking, Bookmarks etc. The site is still not any where on top 50 results. The impressions are decreasing, traffic also do not rise as much. How do you see all this situation. What do you suggest and how long do you think it will take to return to top 10 when good linking is being done and all preventive measures are being taken. I would appreciate any feedback on it. Thank you. Site URL: http://www.creativethemes.net keywords: magento themes, magento templates
Intermediate & Advanced SEO | | MozAddict0 -
Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site
Hi, Any assistance would be greatly appreciated. Basically, our rankings and traffic etc have been dropping massively recently google sent us a message stating " Googlebot found an extremely high number of URLs on your site". This first highligted us to the problem that for some reason our eCommerce site has recently generated loads (potentially thousands) of rubbish urls hencing giving us duplication everywhere which google is obviously penalizing us with in the terms of rankings dropping etc etc. Our developer is trying to find the route cause of this but my concern is, How do we get rid of all these bogus urls ?. If we use GWT to remove urls it's going to take years. We have just amended our Robot txt file to exclude them going forward but they have already been indexed so I need to know do we put a redirect 301 on them and also a HTTP Code 404 to tell google they don't exist ? Do we also put a No Index on the pages or what . what is the best solution .? A couple of example of our problems are here : In Google type - site:bestathire.co.uk inurl:"br" You will see 107 results. This is one of many lot we need to get rid of. Also - site:bestathire.co.uk intitle:"All items from this hire company" Shows 25,300 indexed pages we need to get rid of Another thing to help tidy this mess up going forward is to improve on our pagination work. Our Site uses Rel=Next and Rel=Prev but no concanical. As a belt and braces approach, should we also put concanical tags on our category pages whereby there are more than 1 page. I was thinking of doing it on the Page 1 of our most important pages or the View all or both ?. Whats' the general consenus ? Any advice on both points greatly appreciated? thanks Sarah.
Intermediate & Advanced SEO | | SarahCollins0