Rogerbot getting cheeky?
-
Hi SEOmoz,
From time to time my server crashes during Rogerbot's crawling escapades, even though I have a robots.txt file with a crawl-delay of 10, which I have just increased to 20.
I looked at the Apache log and noticed Roger hitting me from 4 different addresses: 216.244.72.3, 216.244.72.11, 216.244.72.12 and 216.176.191.201. Most of the time each individual address spaced its requests 10 seconds apart, yet all 4 addresses would hit 4 different pages simultaneously (example 2). At other times it wasn't respecting robots.txt at all (see example 1 below).
I wouldn't call this situation 'respecting the crawl-delay entry in robots.txt', as other questions answered here by you have stated. 4 simultaneous page requests within 1 second from Rogerbot is not what should be happening, IMHO.
example 1
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage1.html HTTP/1.1" 200 77813
216.244.72.12 - - [05/Sep/2012:15:54:27 +1000] "GET /store/product-info.php?mypage2.html HTTP/1.1" 200 74058
216.244.72.12 - - [05/Sep/2012:15:54:28 +1000] "GET /store/product-info.php?mypage3.html HTTP/1.1" 200 69772
216.244.72.12 - - [05/Sep/2012:15:54:37 +1000] "GET /store/product-info.php?mypage4.html HTTP/1.1" 200 82441
example 2
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage1.html HTTP/1.1" 200 70209
216.244.72.11 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage2.html HTTP/1.1" 200 82384
216.244.72.12 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage3.html HTTP/1.1" 200 83683
216.244.72.3 - - [05/Sep/2012:15:46:15 +1000] "GET /store/mypage4.html HTTP/1.1" 200 82431
216.244.72.3 - - [05/Sep/2012:15:46:16 +1000] "GET /store/mypage5.html HTTP/1.1" 200 82855
216.176.191.201 - - [05/Sep/2012:15:46:26 +1000] "GET /store/mypage6.html HTTP/1.1" 200 75659
Please advise.
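For what it's worth, the crawl-delay rule itself parses cleanly. Here is a quick check with Python's standard urllib.robotparser (a minimal sketch; the 'rogerbot' user-agent string is my assumption, and any polite crawler should read the value the same way):

```python
from urllib.robotparser import RobotFileParser

# Rules equivalent to my robots.txt: a crawl-delay of 10 seconds for everyone.
rules = """User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# crawl_delay() returns the delay a polite crawler should honour, or None
# if no Crawl-delay applies to that user agent.
print(parser.crawl_delay("rogerbot"))  # prints 10
```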
-
Hi BM7,
I'm going to open up a ticket on this to have our engineers take a closer look at your site. Once we have an overall response, I'll post it here for other community members to view.
Cheers!
-
Thanks Megan for your reply.
I'll give that a try, and I have also blocked 2 of the 4 addresses, so you are reduced to 2 crawler sessions. These two measures should reduce the load considerably, as long as Rogerbot respects the 7-second delay.
IMHO, ignoring the Crawl-delay set by the webmaster of the site you are crawling, which crawlers are supposed to respect, is wrong. I got a nasty notice in Google Webmaster Tools for being down 5 hours because of Rogerbot; the crash happened in the middle of the night, so the server was only restarted in the morning.
Also, my site has around 600 discrete pages, of which you crawl about 500, so even at the original 10-second crawl delay you could do my whole site in under 1.5 hours, and the crawl is only required once a week. To my mind, that suggests there is no need to overrule my settings in robots.txt 'so he (Roger) can complete the crawl'.
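For anyone else hitting this, the blocking I did is roughly the following (a sketch in Apache 2.2-style .htaccess syntax, using addresses from the logs above; which 2 of the 4 you block is up to you):

```apache
# .htaccess — deny two of Rogerbot's source addresses (Apache 2.2 syntax).
# With Order Allow,Deny the Deny directives override the Allow.
Order Allow,Deny
Allow from all
Deny from 216.244.72.3
Deny from 216.244.72.11
```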
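The arithmetic behind that estimate, spelled out (assuming the roughly 500 crawled pages and the original 10-second delay):

```python
pages = 500        # pages Rogerbot actually crawls on my site
delay = 10         # seconds, the original Crawl-delay in robots.txt

total_seconds = pages * delay
print(total_seconds / 3600)  # ~1.39 hours, comfortably under 1.5
```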
Regards,
-
Hi there,
This is Megan from the SEOmoz Help Team. I'm so sorry Rogerbot is causing you grief! This might actually be happening because your crawl delay is too long, so Rogerbot just ends up ignoring it so he can complete the crawl. If you set your crawl delay to a max of 7 seconds, it should solve your problem. If you're still running into issues, though, please send a message to help@seomoz.org and we'll check it out ASAP!
Cheers!
Related Questions
-
I have duplicate content in my Moz crawl, but Google hasn't indexed those pages: do I still need to get rid of the tags?
I received an urgent error from the Moz crawler that I have duplicate content on my site due to the tags I have. For example, http://www.1forjustice.com/graves-amendment/ duplicates the real article found here: http://www.1forjustice.com/car-accident-rental-car/ I didn't think this was a big deal, because when I looked at my GWT these pages weren't indexed. Question: should I bother fixing this from an SEO perspective? If Google isn't indexing the pages, am I losing link juice?
Moz Pro | Perenich
It's been over a month and Rogerbot hasn't crawled the entire website yet. Any ideas?
Rogerbot stopped crawling the website at 308 pages this past week and has not crawled the rest of the 1000+ pages. Any ideas on what I can do to get this fixed and crawling again?
Moz Pro | TejaswiNaidu
Getting odd results with MozBar (some pages are 0,0,0)
I'm trying to review the Domain Authority, Page Authority, and MozRank & MozTrust for some news websites, and I found it odd that many sites have excellent DA, PA, MR & MT on most of their pages, but when I view one of their blog posts the PA, MR & MT are 0. Here are two examples:
Site: http://www.washingtonpost.com/lifestyle
Individual post: http://www.washingtonpost.com/lifestyle/home/checking-in-with-thomas-pheasant/2012/10/30/a2920ed4-1df5-11e2-ba31-3083ca97c314_story.html
Site: http://www.philly.com/philly/living/
Individual post: http://www.philly.com/philly/home/Home_Style_Silver_makes_holiday_decorations_really_shine_.html
Does that mean that links from blog posts would not be very beneficial? The domain authority is still crazy high, but everything else is 0. Anyone know why? I'm new to using the MozBar. Thanks
Moz Pro | SheffieldMarketing
Do SEOmoz private questions get answered?
I submitted a question on September 30 and never received a response. I then followed up on October 6 and had an exchange with the support staff, who first told me that the question would be answered and then a couple of days later told me that they had no record of the question. By chance, I had taken a screen shot showing confirmation that the question had been sent (just something I did to remind myself to follow up if I didn't get a response!), which I sent to them along with the question again. That was four days ago, and I've heard nothing since then - not even an automated email saying that the question was received. Is it common for it to take almost a month before a response is received, and to have to follow up multiple times just to get an answer?
Moz Pro | csmm
Issue getting total links, page & domain authority
Hi guys, I am trying to get total links, page authority & domain authority using the API. I am requesting the following columns: Cols=6871947673632768328204816384343597383681653687091214
The response I get back is:
{ "fjid": 207343179, "ued": 43324279, "pib": 255645, "ptrr": 0.0056131743357352125, "fmrp": 8.246626591590841, "unid": 954915, "fjf": 4003651, "fjr": 0.00040067116628622016, "ftrp": 8.308303969566644, "ftrr": 0.0012189619975325583, "fejp": 9.265328830369816, "pnid": 45883246, "fjd": 2480265, "ujfq": 1277385, "pjip": 1230240, "fjp": 9.586342983782004, "fuid": 294877628, "uu": "www.google.com/", "pejr": 0.0004768398971363439, "ufq": "www.google.com/", "pejp": 9.647424778525615, "ujp": 300689, "utrp": 7.916901429865898, "ptrp": 9.487254666203722, "utrr": 0.001639219985667878, "fmrr": 0.000731592927123369, "pda": 100, "pjd": 5600882, "ulc": 1342758719, "fnid": 12165784, "fejr": 0.00016052965883996156, "ujb": 107264 }
I cannot see the UPA column in the returned JSON object. I'm using 34359738368 for the UPA column. I need to retrieve the three fields (page authority, domain authority and total links) in the same query. Is that possible?
Moz Pro | Srvwiz
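For what it's worth, the Cols parameter is a bitmask: you OR (or simply sum) the flag value of each column you want in the response. A minimal sketch — the 34359738368 value for upa comes from the question above; the other two flag values are placeholders I have not verified against the Mozscape documentation:

```python
# Mozscape "Cols" is a bit field: combine one flag per desired column.
UPA = 34359738368        # Page Authority flag (2**35), from the question above
PDA = 1 << 36            # assumed flag for Domain Authority (unverified)
LINKS = 2048             # assumed flag for total links (unverified)

# Equivalent to summing, since each flag is a distinct bit.
cols = UPA | PDA | LINKS
print(cols)              # 103079217152 — pass this as ?Cols=<number>
```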
Tracking External Followed Inbound Links - Are We Getting Fooled?
This question is not about how to use SEOMOZ Link tools. It's about the numbers themselves and how their fluctuations are causing confidence issues.... My client wants to track inbound followed links over time. But as I understand it, this number will vary depending on the Linkscape crawling success (or even outages or partial crawls.) In the past few reports, we've had wild fluctuations which have made them very nervous. If my client wants to track inbound followed links over time, what SEOMOZ metric should we use for the most reliable progress checks?
Moz Pro | scottclark
How do I get back my archived campaigns??
Hi there, can anyone help? I archived a campaign a month ago and now I want it back. How do I get back my archived campaign, or do I need to start again? Thanks, Gareth
Moz Pro | GAZ09
Can you help me get started using the crawl diagnostics report?
After getting the crawl diagnostics report for the first time, my boss and I looked it over and tried to fix the problems, but we are stumped. I have watched videos, read books, etc., but have found nothing that helps. I need assistance getting started on improving my website. Can you help?
Moz Pro | WVInjuryLawyer