Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay of 10, which I have just increased to 20.
I looked at the Apache log and noticed Roger hitting me from 4 different addresses: 216.244.72.3, 216.244.72.11, 216.244.72.12 and 216.176.191.201. Most of the time, requests from any one address were 10 seconds apart, but all 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this 'respecting the crawl-delay' entry in robots.txt, as other questions answered here by you have stated. 4 simultaneous page requests within 1 sec from Rogerbot is not what should be happening IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
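For anyone wanting to check their own logs the same way, here is a minimal Python sketch that measures the gaps between requests, both per address and across all addresses. The function names and the regular expression are illustrative assumptions, not anything from SEOmoz; it only assumes the log line layout shown in the examples above.

```python
import re
from datetime import datetime

# Matches the client address and the bracketed timestamp of an access-log line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

def parse_line(line):
    """Return (ip, timestamp) for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
    return m.group(1), ts

def short_gaps(lines, crawl_delay, per_ip=True):
    """List (ip, seconds_since_previous) for requests arriving sooner than crawl_delay.

    per_ip=True compares each request with the previous one from the same address;
    per_ip=False compares it with the previous request from any address.
    """
    last_seen = {}
    violations = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ip, ts = parsed
        key = ip if per_ip else "all"
        if key in last_seen:
            gap = (ts - last_seen[key]).total_seconds()
            if gap < crawl_delay:
                violations.append((ip, gap))
        last_seen[key] = ts
    return violations

# The six lines from example 2 above.
EXAMPLE_2 = [
    '216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209',
    '216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384',
    '216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683',
    '216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431',
    '216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855',
    '216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659',
]

if __name__ == "__main__":
    print("per address:", short_gaps(EXAMPLE_2, 10, per_ip=True))
    print("all addresses:", short_gaps(EXAMPLE_2, 10, per_ip=False))
```

On the six lines of example 2 with a 10 second delay, this flags two requests when each address is judged on its own and four when all addresses are treated as one crawler.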
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply,
Will give that a try and have blocked 2 addresses so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably as long as Rogerbot respects the 7 second delay.
IMHO ignoring the Crawl-Delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty Google WMT notice for being down 5 hours due to Rogerbot; it happened in the middle of the night, so the server only got restarted in the morning.
Also, my site has around 600 discrete pages of which you crawl about 500, so even at the original 10 seconds crawl delay you could do my whole site in less than 1.5 hours, which is only required once a week. So in my mind that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
Regards,
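The crawl-time estimate in the post above can be checked with a few lines of Python. This is only a rough sketch: it assumes one request per delay interval from a single crawler and ignores response time, and the function name is made up for illustration.

```python
def crawl_hours(pages, crawl_delay_seconds):
    """Hours needed to fetch `pages` URLs one at a time, one request per delay interval."""
    return pages * crawl_delay_seconds / 3600

if __name__ == "__main__":
    # Page counts and delays mentioned in this thread.
    for pages in (500, 600):
        for delay in (7, 10, 20):
            print(f"{pages} pages at {delay}s: {crawl_hours(pages, delay):.2f} h")
```

Under those assumptions the roughly 500 crawled pages take about 1.39 hours at a 10 second delay, which matches the "less than 1.5 hours" figure; all 600 pages would take about 1.67 hours.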
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This might be happening because your crawl delay is too long, so Rogerbot ends up ignoring it so he can complete the crawl. If you set your crawl delay to a max of 7, then it should solve your problem. If you're still running into issues, though, please send us a message at help@seomoz.org and we'll check it out asap!
Cheers!
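As a rough illustration of the suggested setting, the sketch below parses a sample robots.txt with Python's standard urllib.robotparser and reads back the delay each crawler would see. The "rogerbot" user-agent token, the catch-all section and its value of 10 are assumptions for the example, not a copy of anyone's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a 7 second delay for Rogerbot, 10 seconds for everyone else.
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 7

User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay of the first section matching the user agent,
# falling back to the "*" section when no specific section applies.
rogerbot_delay = parser.crawl_delay("rogerbot")
other_delay = parser.crawl_delay("someotherbot")

if __name__ == "__main__":
    print("rogerbot:", rogerbot_delay)
    print("other crawlers:", other_delay)
```

This only shows what a well-behaved parser reads from the file; whether a given crawler honours the value is up to the crawler.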