Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay of 10, which I have now increased to 20.
I looked at the Apache log and noticed Roger hitting me from 4 different addresses: 216.244.72.3, 216.244.72.11, 216.244.72.12 and 216.176.191.201. Most of the time, requests from each individual address were 10 seconds apart, but ALL 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this 'respecting the crawl-delay' entry in robots.txt, as your answers to other questions here have stated. 4 simultaneous page requests within 1 sec from Rogerbot is not what should be happening IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
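For anyone who wants to check their own logs the same way, here is a minimal Python sketch that measures the gaps shown in the two examples above. It assumes the log lines start with the client IP and a bracketed timestamp, as in the excerpts; the function names (`parse_hits`, `gaps_per_ip`, `max_hits_in_one_second`) are made up for illustration and are not part of any Moz or Apache tool.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Matches the client IP and the bracketed timestamp at the start of a line,
# e.g. 216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] ...
LOG_PREFIX = re.compile(r"^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]")

# The two excerpts from this post, copied as-is.
EXAMPLE_1 = [
    '216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813',
    '216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058',
    '216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772',
    '216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441',
]
EXAMPLE_2 = [
    '216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209',
    '216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384',
    '216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683',
    '216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431',
    '216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855',
    '216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659',
]


def parse_hits(lines):
    """Return (ip, timestamp) pairs for every line that looks like a log entry."""
    hits = []
    for line in lines:
        match = LOG_PREFIX.match(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            hits.append((match.group("ip"), ts))
    return hits


def gaps_per_ip(hits):
    """Seconds between consecutive requests, computed separately for each IP."""
    by_ip = defaultdict(list)
    for ip, ts in hits:
        by_ip[ip].append(ts)
    gaps = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        gaps[ip] = [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]
    return gaps


def max_hits_in_one_second(hits):
    """Largest number of requests sharing one timestamp, across all IPs."""
    counts = Counter(ts for _, ts in hits)
    return max(counts.values(), default=0)


print(gaps_per_ip(parse_hits(EXAMPLE_1)))             # {'216.244.72.12': [0, 1, 9]}
print(max_hits_in_one_second(parse_hits(EXAMPLE_2)))  # 4
```

Any per-IP gap below the configured crawl-delay, or more than one hit in the same second across addresses, is the pattern described above.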
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply,
Will give that a try and have blocked 2 addresses so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably as long as Rogerbot respects the 7 second delay.
IMHO ignoring the Crawl-Delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty notice from Google WMT for being down 5 hours due to Rogerbot: it happened in the middle of the night, so the server only got restarted in the morning.
Also, my site has around 600 discrete pages of which you crawl about 500, so even at the original 10 seconds crawl delay you could do my whole site in less than 1.5 hours, which is only required once a week. So in my mind that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
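As a rough check of that figure, here is a small Python sketch of the arithmetic. It assumes one request at a time with the full delay between requests and ignores response time; `crawl_hours` is a hypothetical helper name, not anything Rogerbot exposes.

```python
def crawl_hours(pages, delay_seconds):
    """Hours needed to fetch `pages` URLs one at a time, `delay_seconds` apart."""
    return pages * delay_seconds / 3600


# About 500 crawled pages at the delays mentioned in this thread.
for delay in (7, 10, 20):
    print(f"{delay:>2} s delay: {crawl_hours(500, delay):.2f} h")
#  7 s delay: 0.97 h
# 10 s delay: 1.39 h
# 20 s delay: 2.78 h
```

So roughly 500 pages at a 10 second delay comes to about 1.39 hours, and even at 20 seconds the crawl would fit in under 3 hours.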
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This actually might be happening because your crawl delay is too long, so Rogerbot just ends up ignoring it so he can complete the crawl. If you set your crawl delay to a max of 7, then it should solve your problem. If you're still running into issues, though, please send us a message at help@seomoz.org and we'll check it out ASAP!
Cheers!
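To illustrate the suggested setting, here is a minimal sketch of a robots.txt entry with a crawl delay of 7, read back with Python's standard `urllib.robotparser`. The `rogerbot` user-agent token and the exact file contents are assumptions for illustration; this only shows how a standard parser interprets the directive, not how Rogerbot itself handles it.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: a single group addressed to Rogerbot.
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 7
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.crawl_delay("rogerbot"))   # 7
print(parser.crawl_delay("otherbot"))   # None (no group applies to it)
```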