Would you rate-control Googlebot? How much crawling is too much crawling?
-
One of our sites is very large - over 500M pages. Google has indexed 1/8th of the site - and they tend to crawl between 800k and 1M pages per day.
A few times a year, Google will significantly increase their crawl rate - overnight hitting 2M pages per day or more. This creates big problems for us, because at 1M pages per day Google is consuming 70% of our API capacity, and the API overall is at 90% capacity. At 2M pages per day, 20% of our page requests are 500 errors.
I've lobbied for an investment / overhaul of the API configuration to allow for more Google bandwidth without compromising user experience. My tech team counters that it's a wasted investment - as Google will crawl to our capacity whatever that capacity is.
Questions to Enterprise SEOs:
*Is there any validity to the tech team's claim? I thought Google's crawl rate was based on a combination of PageRank and the frequency of page updates. This indicates there is some upper limit - which we perhaps haven't reached - but which would stabilize once reached.
*We've asked Google to rate-limit our crawl rate in the past. Is that harmful? I've always looked at a robust crawl rate as a good problem to have.
- Is 1.5M Googlebot API calls a day desirable, or something any reasonable Enterprise SEO would seek to throttle back?
*What about setting a longer refresh rate in the sitemaps? Would that reduce the daily crawl demand? We could set increase it to a month, but at 500M pages Google could still have a ball at the 2M pages/day rate.
Thanks
-
I agree with Matt that there can probably be a reduction of pages, but that aside, how much of an issue this is comes down to what pages aren't being indexed. It's hard to advise without the site, are you able to share the domain? If the site has been around for a long time, that seems a low level of indexation. Is this a site where the age of the content matters? For example Craigslist?
Craig
-
Thanks for your response. I get where you're going with that. (Ecomm store gone bad.) It's not actually an Ecomm FWIW. And I do restrict parameters - the list is about a page and a half long. It's a legitimately large site.
You're correct - I don't want Google to crawl the full 500M. But I do want them to crawl 100M. At the current crawl rate we limit them to, it's going to take Google more than 3 months to get to each page a single time. I'd actually like to let them crawl 3M pages a day. Is that an insane amount of Googlebot bandwidth? Does anyone else have a similar situation?
-
Gosh, that's a HUGE site. Are you having Google crawl parameter pages with that? If so, that's a bigger issue.
I can't imagine the crawl issues with 500M pages. A site:amazon.com search only returns 200M. Ebay.com returns 800M so your site is somewhere in between these two? (I understand both probably have a lot more - but not returning as indexed.)
You always WANT a full site crawl - but your techs do have a point. Unless there's an absolutely necessary reason to have 500M indexed pages, I'd also seek to cut that to what you want indexed. That sounds like a nightmare ecommerce store gone bad.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How Much Domain Age Matter In Ranking?
I am very confused about domain age. I read many articles about domain age, some experts say domain age does matter in ranking and some experts say it doesn't matter in the ranking. Kindly guide me about domain age.
Intermediate & Advanced SEO | | MuhammadQasimAttari0 -
Why is Amazon crawling my website? Is this hurting us?
Hi mozzers, I discovered that Amazon is crawling our site and exploring thousands of profile pages. In a single day it crawled 75k profile pages. Is this related to AWS? Is this something we should worry about or not? If so what could be a solution to counter this? Could this affect our Google Analytics organic traffic?
Intermediate & Advanced SEO | | Ty19860 -
Google can't access/crawl my site!
Hi I'm dealing with this problem for a few days. In fact i didn't realize it was this serious until today when i saw most of my site "de-indexed" and losing most of the rankings. [URL Errors: 1st photo] 8/21/14 there were only 42 errors but in 8/22/14 this number went to 272 and it just keeps going up. The site i'm talking about is gazetaexpress.com (media news, custom cms) with lot's of pages. After i did some research i came to the conclusion that the problem is to the firewall, who might have blocked google bots from accessing the site. But the server administrator is saying that this isn't true and no google bots have been blocked. Also when i go to WMT, and try to Fetch as Google the site, this is what i get: [Fetch as Google: 2nd photo] From more than 60 tries, 2-3 times it showed Complete (and this only to homepage, never to articles). What can be the problem? Can i get Google to crawl properly my site and is there a chance that i will lose my previous rankings? Thanks a lot
Intermediate & Advanced SEO | | granitgash
Granit FvhvDVR.png dKx3m1O.png0 -
How much does dirty html/css etc impact SEO?
Good Morning! I have been trying to clean up this website and half the time I can't even edit our content without breaking the WYSIWYG Editor. Which leads me to the next question. How much, if at all, is this impacting our SEO. To my knowledge this isn't directly causing any broken pages for the viewer, but still, it certainly concerns me. I found this post on Moz from last year: http://moz.com/community/q/how-much-impact-does-bad-html-coding-really-have-on-seo We have a slightly different set of code problems but still wanted to revisit this question and see if anything has changed. I also can't imagine that all this broken/extra code is helping our page load properly. Thanks everybody!
Intermediate & Advanced SEO | | HashtagHustler0 -
Improve Response Rate
Hi fellow Mozzers, I have recently been trying to earn new links for my company websites by appealing to local public libraries that line up with our geographic footprint. We have store locations in 569 city, state combinations, so I am finding the local library for each city, state and asking them to add us to their website as a community reference for child care. Here is my pitch email that I am sending out.... Hi There, _ I hope this email finds you in good health and good spirits. _ My name is XXXXX and I represent <insert company="" name="">. I am reaching out to local libraries in the cities and counties where our schools are located in an effort to request local support for our schools. Many libraries across the country have reached out to us asking for parent resources and child care options in their local city. I would like to extend the same value to you.</insert> _ I would love to have our school in <insert city="" name="">be considered for placement on your public library site found at <insert target="" website="" url="">. With ample parent resources and child related materials, our school would be a great fit for your local community. We have many great parenting articles, blogs and local child care options in <insert city="" name="">.</insert></insert></insert>_ Our local school website can be found at : Our parent resource section can be found at: Who do you suggest I speak with regarding earning a placement on your site? Any information would be greatly appreciated. So far I have a response rate of about 1% and of those that do respond, none are willing to add a link on their site in any capacity since our company is privately held and not a public school option. Can anyone suggest either a better way to approach link earning on a city by city basis or help me improve my current initial email to get a better response? I really appreciate it! p.s.... some of my other link targets are city run websites, local chamber of commerce chapters (suspect this will be hard since most require a paid membership), local public school sites, and keyword targeted lists such as "child care in phoenix, az" to see what sites come for each term. Advice on these and anything I may be missing would be of great value.
Intermediate & Advanced SEO | | dsinger0 -
How to avoid content canibalizm? How do I control which page is the landing page?
Hi All, To clarify my question I will give an example. Let's assume that I have a laptop e-commerce site and that one of my main categories is Samsung Laptops. The category page shows lots of laptops and a small section of text. On the other hand, in my article section I have a HUGE article about Samsung Laptops. If we consider the two word phrases each page is targeting then the answer is the same - Samsung Laptops. On the article i point to the category page using anchor such as "buy samsung laptops" or "samsung laptops" and on the category page (my wishful landing page) I point to the article with "learn about samsung laptops" or "samsung laptops pros and cons". Thanks
Intermediate & Advanced SEO | | BeytzNet0 -
When doing email blogger outreach what response rate do you expect?
How many emails do you send in the average outreach campaign? How many links would you expect from that? Also - when doing email blogger outreach do you need to offer the blogger something other than (great) content, in order to get a link? (maybe cash, or a link back?) I'm doing email blogger outreach for a number of clients and types of content, but am finding it hard to get links from bloggers. Any help is appreaciated! Thanks
Intermediate & Advanced SEO | | kevinmorley0 -
We would like to know where googlebots search from - IP location?
We have a client who has two sites for different countries, 1 US, 1 UK and redirects visitors based on IP. In order to make sure that the English site is crawable, we need to know where the googlebot searches from. Is this a US IP or a UK IP for a UK site / server?
Intermediate & Advanced SEO | | AxonnMedia0