Rogerbot getting cheeky?
-
Hi SeoMoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay 10, now just increased to 20.
I looked at the Apache log and noticed Roger hitting me from from 4 different addresses 216.244.72.3, 72.11, 72.12 and 216.176.191.201, and most times whilst on each separate address, it was 10 seconds apart, ALL 4 addresses would hit 4 different pages simultaneously (example 2). At other times, it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this situation 'respecting the crawl-delay' entry in robots.txt as other question answered here by you have stated. 4 simultaneous page requests within 1 sec from Rogerbot is not what should be happening IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659Please advise.
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply,
Will give that a try and have blocked 2 addresses so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably as long as Rogerbot respects the 7 second delay.
IMHO ignoring the Crawl-Delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a Google WMT nasty for being down 5 hours due to Rogerbot as it was the middle of the night so only got restarted in the morning.
Also, my site has around 600 discrete pages of which you crawl about 500, so even at the original 10 seconds crawl delay you could do my whole site in less than 1.5 hours, which is only required once a week. So in my mind that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This actually might be happening because your crawl delay is too long, so rogerbot just ends up ignoring it so he can complete the crawl. If you set your crawl delay to a max of 7, then it should solve your problem. If you're still running into issues, though, please send us a message to help@seomoz.org and we'll check it out asap!
Cheers!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Rogerbot blocked by cloudflare and not display full user agent string.
Hi, We're trying to get MOZ to crawl our site, but when we Create Your Campaign we get the error:
Moz Pro | | BB_NPG
Ooops. Our crawlers are unable to access that URL - please check to make sure it is correct. If the issue persists, check out this article for further help. robot.txt is fine and we actually see cloudflare is blocking it with block fight mode. We've added in some rules to allow rogerbot but these seem to be getting ignored. If we use a robot.txt test tool (https://technicalseo.com/tools/robots-txt/) with rogerbot as the user agent this get through fine and we can see our rule has allowed it. When viewing the cloudflare activity log (attached) it seems the Create Your Campaign is trying to crawl the site with the user agent as simply set as rogerbot 1.2 but the robot.txt testing tool uses the full user agent string rogerbot/1.0 (http://moz.com/help/pro/what-is-rogerbot-, rogerbot-crawler+shiny@moz.com) albeit it's version 1.0. So seems as if cloudflare doesn't like the simple user agent. So is it correct the when MOZ is trying to crawl the site it uses the simple string of just rogerbot 1.2 now ? Thanks
Ben Cloudflare activity log, showing differences in user agent strings
2022-07-01_13-05-59.png0 -
How can we get follow links from Pinterest and Instagram
While using Open Site Explorer from Moz I came across follow links from Instagram and Pinterest. Now my question is how can we get follow links from Pinterest and Instagram
Moz Pro | | rakesh10 -
Ajax4SEO and rogerbot crawling
Has anyone had any experience with seo4ajax.com and moz? The idea is that it points a bot to a html version of an ajax page (sounds good) without the need for ugly urls. However, I don't know how this will work with rogerbot and whether moz can crawl this. There's a section to add in specific user agents and I've added "rogerbot". Does anyone know if this will work or not? Otherwise, it's going to create some complications. I can't currently check as the site is in development and the dev version is noindexed currently. Thanks!
Moz Pro | | LeahHutcheon0 -
Wher is the API id and key? I tried generating it several times but I get nothing...?
Wher is the API id and key? I tried generating it several times but I get nothing...?
Moz Pro | | breezego0 -
Why am I getting an access error when creating my first campaign?
The exact error message is: The change you wanted was rejected. Maybe you tried to change something you didn't have access to. I checked the Word Count and I am definitely <300 keywords. I have also made sure that the Branded Keywords are entered 1 per entry form. My cookies are clear and there should be no issue with my browser.
Moz Pro | | trufflelabs0 -
Not getting numbers on Exact and Broad match
Is there something wrong in Seomoz keyword analysis tool.I'm not getting the Exact and Broad match search volume. EDIT: It's been resolved.No need to answer this question
Moz Pro | | JohnChena0 -
Why do I keep getting "more than one canonical URL tag" on-page factor when, in fact, there is always only one?
The following are pages that SEOMOZ says have "more than one canonical URL tag" but they all have only one. Can someone help me understand this?http://www.lasercenterny.com/Laser-Hair-Removal-Binghamton/tabid/1950/Default.aspxhttp://www.lasercenterny.com/Hair-Removal-Binghamton-NY/tabid/1949/Default.aspxhttp://www.lasercenterny.com/Hair-Removal-Binghamton/tabid/1948/Default.aspx
Moz Pro | | SmartWebPros0 -
Newbie - help me get started, please :)
Hi Guy's, I am super excited to be here and looking fwd to getting to know you all. As the title suggests I am a complete newbie to the world of SEO I am very keen to learn, I just need pointing in the right direction. My website is live! Now I need to make it super populated, my site is an e-commerce website selling natural / organic beauty products for men women & children. I would like to make it one of the best out there and I am not worried about putting the hours in to achieve that goal. So my question to you guys is this.... where do I start?
Moz Pro | | dan1el0