Googlebot found an extremely high number of URLs on your site
-
I keep getting the "Googlebot found an extremely high number of URLs on your site" message in Google Webmaster Tools (GWMT) for one of the sites that I manage.
The message reads:
Googlebot encountered problems while crawling your site.
Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.
I understand the nature of the message: the site uses faceted navigation and genuinely generates a lot of duplicate pages. To stop this from becoming an issue, we do the following:
- Noindex a large number of pages using the on-page robots meta tag.
- Use a canonical tag where appropriate.
But we still get the message, and many of the example pages that Google lists as affected are pages that already carry the noindex tag.
So my question is how do I address this problem?
Since it's a crawling issue, I'm thinking the solution might involve the nofollow meta tag.
Any suggestions appreciated.
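One way to double-check that the example pages really carry the tags described above is to parse their HTML and read back the robots meta directives and the canonical URL. This is only a minimal sketch using Python's standard library; the class name, function name, and sample markup are made up for illustration, and fetching the page is left out.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Collect robots meta directives and the canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = set()    # e.g. {"noindex", "follow"}
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.robots.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def audit(html: str):
    """Return (robots directives, canonical URL) found in the given HTML."""
    parser = HeadTagAudit()
    parser.feed(html)
    return parser.robots, parser.canonical


# Hypothetical page markup, for illustration only.
sample = (
    '<html><head>'
    '<meta name="robots" content="noindex, follow">'
    '<link rel="canonical" href="https://www.example.com/shoes">'
    '</head><body></body></html>'
)
robots, canonical = audit(sample)
print(sorted(robots), canonical)
```

A check like this only confirms what is on the page. It says nothing about whether Googlebot still crawls the URL.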
-
I feel we are missing some information here.
For example, on our site we use a canonical tag on the pages that have query parameters. We have also specified these parameters as "representative URL" under URL Parameters in Google Webmaster Tools. Even so, we received the "Googlebot found an extremely high number of URLs on your site" message.
The surprising thing is that these parameters have existed on the site for a long time, and the total URL count is going down. Yet Google has been sending us this message since Feb 2014. It seems there has been some algorithmic change that requires handling additional conditions not covered in this thread. Not sure what.
-
Although I generally find NOINDEX works better than Google claims, I think @donford is essentially right - you still need to solve some of the architecture issues, or Google will attempt to re-crawl.
It's a complex problem, and sometimes a combination of NOINDEX, canonical, 301s, 404s, rel=prev/next, etc. all come into play. You don't usually need a "perfect" solution, but one tool rarely fits all situations these days.
Google has suggested that you try parameter handling in GWT. NOINDEX won't prevent crawling (just indexation), but GWT parameters help save crawler bandwidth. I've had mixed results on large sites, honestly, but it may be worth a try.
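To decide which parameters are worth configuring, it can help to count how many distinct URLs each query parameter appears in. Here is a rough sketch in Python, assuming you already have a list of crawled URLs (from a crawler export or server logs, say). The function name, sample URLs, and parameter names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def parameter_counts(urls):
    """Count how many distinct URLs carry each query parameter name."""
    counts = Counter()
    for url in set(urls):  # de-duplicate so repeated URLs count once
        query = urlsplit(url).query
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical crawl sample, for illustration only.
crawled = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/shoes?color=blue&size=9",
    "https://www.example.com/shoes?color=red",
]
for name, count in parameter_counts(crawled).most_common():
    print(name, count)
```

Parameters that show up in the most distinct URLs are the likeliest candidates for parameter handling or for an architecture fix.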
-
I was afraid that this might be the case.
Thanks for the help.
-
Hi Ben,
You are attempting to fix your SEO issue with NOINDEX and canonical tags, but you are not fixing the main issue, which is that the URLs are still there.
Neither NOINDEX nor NOFOLLOW will stop Google from recognizing the link. Google uses every link's information in one form or another, regardless of the tag attributes.
Here is a direct quote from Matt Cutts about NOINDEX:
"Our highest duty has to be to our users, not to an individual webmaster. When a user does a navigational query and we don’t return the right link because of a NOINDEX tag, it hurts the user experience (plus it looks like a Google issue)..."
REF: http://www.mattcutts.com/blog/google-noindex-behavior/
The first solution I would look at is working on the architecture of the site to see if there is a way to stop the crazy number of URLs being generated and/or consolidate them to a single point. The next step would be to see whether these extra URLs have anything in common and whether a 301 redirect could consolidate them.
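As a rough illustration of that consolidation step, the sketch below strips a set of facet parameters from each URL and builds a map of 301 targets. It is an assumption-heavy example, not a drop-in fix: the parameter names in `FACET_PARAMS`, the function names, and the sample URLs are all hypothetical, and the real rules depend on how the site generates its URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical facet parameters; the real names depend on the site.
FACET_PARAMS = {"color", "size", "sort"}


def consolidated_url(url: str) -> str:
    """Drop facet parameters and sort the rest so variants collapse to one URL."""
    parts = urlsplit(url)
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in FACET_PARAMS
    )
    # The fragment is dropped because it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_map(urls):
    """Map each URL to its 301 target, skipping URLs that need no redirect."""
    targets = {}
    for url in urls:
        target = consolidated_url(url)
        if target != url:
            targets[url] = target
    return targets


# Hypothetical URLs, for illustration only.
sample = [
    "https://www.example.com/shoes",
    "https://www.example.com/shoes?color=red",
    "https://www.example.com/shoes?color=red&page=2&size=9",
]
for source, target in redirect_map(sample).items():
    print(source, "->", target)
```

The resulting map could feed whatever redirect mechanism the platform offers. Whether a given facet should be redirected at all, or kept as a page in its own right, is a judgment call for each site.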
I think what you're really after is a way to fix this with a tag or patch, but I think the best way to fix this is to replace the engine that is driving these URLs. For a more specific answer, you'd need to say what kind of platform the site runs on (Joomla, WordPress, osCommerce, etc.).
Hope it helps.