Manipulate Googlebot

semer

**Problem: I have found something wierd on the server log as below. the googlebot visit the folders and files which do not exist at all. there is no photo folder on the server, but googlebot visit the files inside the photo folder and return 404 error. **

I wonder if it is SEO hacking attempts, and how can someone manage to Manipulate Googlebot.

==================================================

**66.249.71.200 - - [22/Aug/2012:02:31:53 -0400] "GET /robots.txt HTTP/1.0" 200 2255 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" **

**66.249.71.25 - - [22/Aug/2012:02:36:55 -0400] "GET /photo/pic24.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.71.26 - - [22/Aug/2012:02:37:03 -0400] "GET /photo/pic20.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.71.200 - - [22/Aug/2012:02:37:11 -0400] "GET /photo/pic22.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.71.200 - - [22/Aug/2012:02:37:28 -0400] "GET /photo/pic19.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.71.26 - - [22/Aug/2012:02:37:36 -0400] "GET /photo/pic17.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.71.200 - - [22/Aug/2012:02:37:44 -0400] "GET /photo/pic21.html HTTP/1.1" 404 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" **

Igal_Zeifman

Hi

This is a valid concert.

As Mat correctly stated, Googlebot is not easily manipulated.
Having said that, Googlebot impersonation is a sad fact.

Recently we released a Fake Googlebot study in which we've found out that 21% of all Googlebot visits are made by different impersonators - fairly "innocent" SEO tools used for competition check-ups, various spammer and even malicious scanner that will use Googlebot user-agent to try and slip in between the cracks and lay a path for a more serious attack to come (DDoS, IRA and etc).

To identify your visitor can use Botopedia's "IP check tool" - it will cross-verify the IP and help reveal most fake bots.
(I`ve already searched for 66.249.71.25 and it's legit - see attached image)

Still, IPs can be spoofed.
So, if in doubt, I would promote a "better safe than sorry" approach and advise you to look into free bad bot protection services (there are several good ones).

GL

TBHPK

matbennett

If anyone did manage to get control of googlebot they could find better uses to put it to than that.

Much more likely is that there are links somewhere to those URLs - they may well be on someone else's site. Google is following the link to see what it there, then finding nothing. However it works on a file by file basis rather than by directory so it could happen quite a bit.

If you want to stop it clogging up your error logs (and ensure that googlebot cycles are spent indexing better stuff) just block that directory in your robots.txt file.

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

Manipulate Googlebot

Got a burning SEO question?

Browse Questions

Explore more categories

Related Questions

Suggested Screaming Frog configuration to mirror default Googlebot crawl?

How does Googlebot evaluate performance/page speed on Isomorphic/Single Page Applications?

Mobile Googlebot vs Desktop Googlebot - GWT reports - Crawl errors

Would you rate-control Googlebot? How much crawling is too much crawling?

Received "Googlebot found an extremely high number of URLs on your site:" but most of the example URLs are noindexed.

Does Googlebot Read Session IDs?

Googlebot found an extremely high number of URLs on your site

Fetch as GoogleBot "Unreachable Page"