Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though my robots.txt file has a crawl-delay of 10, which I have just increased to 20.
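For reference, here is a minimal sketch of how a Crawl-delay is declared and how a crawler that honors it would read the value. It uses Python's standard urllib.robotparser (Python 3.6+). The robots.txt contents are hypothetical, since the thread only says the delay went from 10 to 20, and this is not Rogerbot's own parser. Crawl-delay is a non-standard directive, so each crawler decides for itself how to interpret it (for example, per address or per site).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a rogerbot-specific group plus a default group.
# The thread doesn't show the real file, only that crawl-delay went from 10 to 20.
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 20

User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(crawl_delay_for("rogerbot"))  # matched by the rogerbot-specific group
    print(crawl_delay_for("otherbot"))  # falls back to the * group
```

A rogerbot-specific group is used here so the delay can be tuned for one crawler without affecting others.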
I looked at the Apache log and noticed Roger hitting me from 4 different addresses (216.244.72.3, 72.11, 72.12 and 216.176.191.201). Although the requests from any single address were usually 10 seconds apart, all 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this 'respecting the crawl-delay entry in robots.txt', as you have stated in answers to other questions here. 4 simultaneous page requests within 1 second from Rogerbot is not what should be happening, IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
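To check this kind of pattern on your own server, here is a minimal Python sketch (not a Moz tool) that parses Apache access-log lines in the format shown above. It lists crawler requests that arrived closer together than the crawl delay, both per address and across all addresses combined. The address set is an assumption taken from this thread's log and Rogerbot may use others. With full "combined" format logs you would probably filter on the User-Agent field instead, but the excerpts above don't include it.

```python
import re
from datetime import datetime

# Apache common-log prefix: IP, ident, user, [timestamp], "request", status, bytes
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Assumption: the Rogerbot addresses seen in this thread; the real list may differ.
ROGERBOT_IPS = {"216.244.72.3", "216.244.72.11", "216.244.72.12", "216.176.191.201"}


def parse_line(line):
    """Return (ip, timestamp) for a log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts


def find_violations(lines, delay, ips=ROGERBOT_IPS):
    """Find consecutive requests closer together than `delay` seconds.

    Returns (per_ip, combined):
      per_ip   -> [(ip, gap)] gaps between requests from the same address
      combined -> [(prev_ip, ip, gap)] gaps between any two crawler requests
    """
    hits = [h for h in map(parse_line, lines) if h is not None and h[0] in ips]
    hits.sort(key=lambda h: h[1])  # stable sort keeps log order for equal timestamps
    per_ip, combined = [], []
    last_seen = {}  # ip -> timestamp of that address's previous request
    prev = None     # (ip, timestamp) of the previous request from any address
    for ip, ts in hits:
        if ip in last_seen:
            gap = (ts - last_seen[ip]).total_seconds()
            if gap < delay:
                per_ip.append((ip, gap))
        if prev is not None:
            gap = (ts - prev[1]).total_seconds()
            if gap < delay:
                combined.append((prev[0], ip, gap))
        last_seen[ip] = ts
        prev = (ip, ts)
    return per_ip, combined


# Example 1 from the post above
SAMPLE = [
    '216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813',
    '216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058',
    '216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772',
    '216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441',
]

if __name__ == "__main__":
    per_ip, combined = find_violations(SAMPLE, delay=10)
    for ip, gap in per_ip:
        print(f"{ip}: only {gap:.0f}s after its previous request")
```

To run it against a real log, pass the file's lines (for example `open("access.log")`) to `find_violations`. The "combined" list is the one that shows simultaneous hits from different addresses, like example 2.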
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks for your reply, Megan.
I will give that a try, and I have also blocked 2 of the addresses, so you are reduced to 2 crawler sessions. Together these two measures should reduce the load considerably, as long as Rogerbot respects the 7-second delay.
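A rough way to see why cutting the number of addresses matters: assume (as example 2 suggests, though Moz hasn't confirmed it) that each crawler session waits the crawl delay d only between its own requests. Then N parallel sessions produce, on average, N/d requests per second on the server:

$$
r = \frac{N}{d}, \qquad r_{\text{before}} = \frac{4}{10\ \text{s}} = 0.4\ \text{req/s}, \qquad r_{\text{after}} = \frac{2}{7\ \text{s}} \approx 0.29\ \text{req/s}
$$

That is roughly one request every 2.5 s before versus one every 3.5 s after. This is only an average. Sessions can still fire at the same instant, so short bursts of N simultaneous requests remain possible.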
IMHO, ignoring the Crawl-delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty message from Google WMT for being down for 5 hours because of Rogerbot: it happened in the middle of the night, so the server only got restarted in the morning.
Also, my site has around 600 discrete pages, of which you crawl about 500, so even at the original 10-second crawl delay you could crawl the whole site in less than 1.5 hours, and that is only required once a week. To my mind, that means there is no need to override my robots.txt settings 'so he (Roger) can complete the crawl'.
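The arithmetic behind the 1.5-hour figure, assuming requests are made strictly one after another with exactly one crawl delay d between them and ignoring server response time (P = pages crawled):

$$
T = P \cdot d = 500 \times 10\ \text{s} = 5000\ \text{s} \approx 83\ \text{min} \approx 1.39\ \text{h}
$$

Under the same assumption, the 20 s delay gives T = 10000 s ≈ 2.8 h and the suggested 7 s delay gives T = 3500 s ≈ 58 min. All of these are small compared with a once-a-week crawl.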
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This might actually be happening because your crawl delay is too long, so Rogerbot ends up ignoring it in order to complete the crawl. If you set your crawl delay to a maximum of 7, that should solve your problem. If you're still running into issues, please send us a message at help@seomoz.org and we'll check it out ASAP!
Cheers!