Googlebot on steroids... Why?

Olaf

We launched a new website (www.gelderlandgroep.com). The site contains 500 pages, but some pages (like https://www.gelderlandgroep.com/collectie/) contains filters (so there are a lot possible url parameters). Last week we mentioned a tremendous amount of traffic (25 GB!!) and CPU usage on the server.

2017-12-04 16:11:57 W3SVC66 IIS14 83.219.93.171 GET /collectie model=6511,6901,7780,7830,2105-illusion&ontwerper=henk-vos,foklab 443 - 66.249.76.153 HTTP/1.1 Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/41.0.2272.96+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - - www.gelderlandgroep.com 200 0 0 9445 501 312

We find out that "Googlebot" was firing many, many requests. At first we did a nslookup for the IPadres where it actually seems to be googlebot.

Second we visited Google Searchconsole and I was really surprised... Googlebot on steroids? Googlebot requested 922.565 different url's and made combinations for every filter/ parameter combination on the site. Why? The sitemap.xml contains 500 url's... The authority of the site isn't very high, no other signal that this is a special website... Why so much "Google resources"?

Of course we will exclude the parameters in SearchConsole, but I never saw a Googlebot activity for a small website like this before! Does anybody have any clue?

Regards Olaf

searchconsole.png nslookup.png

Olaf

We got an answer from JohnMu - Webmaster Trends Analyst at Google. The reason of crawling is (as we find out) the filters which have infinite variations (one of developers was sleeping), we will correct this. Disallowing in Robot.txt is adviced as the quickest fix to stop the mega-crawling. This case will be used for further research because of the disproportionate capacity usage. You're right, Google initially will crawl everything, but they don't want Googlebot crawling looks like a "mini-Ddos-like attack".

seoman10

Glad to help!

The large volume could well be to do with the way the filters are set up. There is also a possibility you could be sending some sort of authority signal somehow to Google, for instance if it is using the same Search Console as other valued brands or same WHOIS information.

My gut feeling is after the initial crawl the traffic will reduce, if it doesn't, it probably means Google is finding something new to index, may be dynamically created pages?

Olaf

Thanks for your help!

I think you're probably right. The initial crawling must be complete if Google wants to put everything into the right perspective. But we manage en host more than 300 sites, including large A-brand sites. And even at those sites I had not seen this kind of volumes before.

The server logs also show the same amount of request this night (day five). I will keep you posted if this still continues after the weekend.

seoman10

As far as I know, Google will attempt to find every single page it can possibly find regardless of authority. The frequency after the initial crawl will be affected by the site authority, volume and frequency of updates.

Virtually every page on every website that is publicly accessible will be index and rank somewhere, where you rank will be determined by Google ranking factors.

Keep in mind that search console stats will be a few days out of date (2 or 3 days) and it will normally take two or three days to crawl.

Olaf

Mmm, is that correct? I thought that the amount of resources Google will put in crawling your (new) website also depends of it's authority. 9 million url's, for four days now... It seems to bee so much for this small website...

seoman10

I would say your filters are creating pages in their own right, or at least as Google bot sees it. I have seen a similar thing happen on a site redesign. Potentially, if you can access each filter with a URL that could be listed as an individual page, assuming the content is different.

The first time Google crawls your site, it will try to find everything it possibly can to put it in the index, Google will eat data like no tomorrow

At this stage I wouldn't be too worried about it, just keep an eye out for duplicate content. I guess you'll see both graphs dipped down again to normal levels within a few days.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Googlebot on steroids... Why?

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Robots.txt - Googlebot - Allow... what's it for?

Can Googlebots read canonical tags on pages with javascript redirects?

Why would our server return a 301 status code when Googlebot visits from one IP, but a 200 from a different IP?

Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.

Googlebot Can't Access My Sites After I Repair My Robots File

Manipulate Googlebot

Lots of incorrect urls indexed - Googlebot found an extremely high number of URLs on your site

Googlebot + Meta-Refresh