Moz & Xenu Link Sleuth unable to crawl a website (403 error)
-
It could be that I am missing something really obvious; however, we are getting the following error when we try to use the Moz tool on a client website. (I have read through a few posts on 403 errors, but none appear to describe the same problem as this.)
Moz Result
Title: 403 : Error
Meta Description: 403 Forbidden
Meta Robots: Not present/empty
Meta Refresh: Not present/empty
Xenu Link Sleuth Result
Broken links, ordered by link:
error code: 403 (forbidden request), linked from page(s):
Thanks in advance!
-
Hey Liam,
Thanks for following up. Unfortunately, we use thousands of dynamic IPs through Amazon Web Services to run our crawler and the IP would change from crawl to crawl. We don't even have a set range for the IPs we use through AWS.
As for throttling, we don't have a set throttle. We try to space out the server hits enough to not bring down the server, but then hit the server as often as necessary in order to crawl the full site or crawl limit in a reasonable amount of time. We try to find a balance between hitting the site too hard and having extremely long crawl times. If the devs are worried about how often we hit the server, they can add a crawl delay of 10 to the robots.txt to throttle the crawler. We will respect that delay.
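As a rough illustration of the crawl-delay suggestion above, here is a minimal sketch using Python's standard `urllib.robotparser` to confirm what delay a robots.txt file declares for a given crawler. The robots.txt content and the `crawl_delay_for` helper are assumptions made up for this example; only the user agent name `rogerbot` and the delay of 10 come from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "rogerbot" group and the value 10
# follow the suggestion above; this is not the client's real file.
ROBOTS_TXT = """\
User-agent: rogerbot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "rogerbot"))
    print(crawl_delay_for(ROBOTS_TXT, "someotherbot"))
```

A check like this only shows what the file declares; whether a crawler honours the delay is up to the crawler.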
If the devs use Moz, as well, they would also be getting a 403 on their crawl because the server is blocking our user agent specifically. The server would give the same status code regardless of who has set up the campaign.
I'm sorry this information isn't more specific. Please let me know if you need any other assistance.
Chiaryn
-
Hi Chiaryn
The saga continues... this is the response my client got back from the developers. Please could you let me have the answers to the two questions?
Apparently, as part of their ‘SAF’ (?) protocols, if the IT director sees a big spike in 3rd-party products trawling the site he will block them! They did say that they use Moz too. What they’ve asked me to get from Moz is:
- Moz IP address/range
- Level of throttling they will use
I would question why they need these answers if THEY USE MOZ themselves, but if I go back with that I will be going around in circles. Any chance of letting me know the answer(s)?
Thanks in advance.
Liam
-
Awesome - thank you.
Kind Regards
Liam
-
Hey There,
The robots.txt shouldn't really affect 403s; you would actually get a "blocked by robots.txt" error if that was the cause. Your server is basically telling us that we are not authorized to access your site. I agree with Mat that we are most likely being blocked in the htaccess file. It may be that your server is flagging our crawler and Xenu's crawler as troll crawlers or something along those lines. I ran a test on your URL using a non-existent crawler, Rogerbot with a capital R, and got a 200 status code back but when I run the test with our real crawler, rogerbot with a lowercase r, I get the 403 error (http://screencast.com/t/Sv9cozvY2f01). This tells me that the server is specifically blocking our crawler, but not all crawlers in general.
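For anyone who wants to repeat this kind of comparison themselves, the following is a minimal sketch and not the tool used for the test above. It requests the same URL with different User-Agent headers and reports the status code for each. The function names and the `https://example.com/` URL are placeholders, and sending the bare token `rogerbot` is a simplification of whatever full User-Agent string the real crawler sends.

```python
import urllib.error
import urllib.request


def status_for(url, user_agent, timeout=10):
    """Request url with the given User-Agent header and return the status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; the code is what we want.
        return error.code


def compare_agents(url, user_agents, fetch=status_for):
    """Map each User-Agent string to the status code the server returns for it."""
    return {agent: fetch(url, agent) for agent in user_agents}


if __name__ == "__main__":
    # Placeholder URL; substitute the site being diagnosed.
    print(compare_agents("https://example.com/", ["rogerbot", "Rogerbot"]))
```

If the lowercase agent gets a 403 while the capitalised one gets a 200, that points to a user-agent rule rather than a blanket block.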
I hope this helps. Let me know if you have any other questions.
Chiaryn
Help Team Ninja
-
Hi Mat
Thanks for the reply - robots.txt file is as follows:
## The following are infinitely deep trees
User-agent: *
Disallow: /cgi-bin
Disallow: /cms/events
Disallow: /cms/latest
Disallow: /cms/cookieprivacy
Disallow: /cms/help
Disallow: /site/services/megamenu/
Disallow: /site/mobile/
I can't get access to the .htaccess file at present (we're not the developers). Anyone else any thoughts? Weirdly, I can get Screaming Frog info back on the site :-/
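To rule the robots.txt out, a quick sketch like the one below can check whether these rules would stop a crawler from fetching the home page. It uses Python's standard `urllib.robotparser`. The `allowed` helper and the `example.com` domain are placeholders for illustration, since the real domain is not given here.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules as posted above.
ROBOTS_TXT = """\
## The following are infinitely deep trees
User-agent: *
Disallow: /cgi-bin
Disallow: /cms/events
Disallow: /cms/latest
Disallow: /cms/cookieprivacy
Disallow: /cms/help
Disallow: /site/services/megamenu/
Disallow: /site/mobile/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a stand-in for the client's domain.
    print(allowed(ROBOTS_TXT, "rogerbot", "https://example.com/"))
    print(allowed(ROBOTS_TXT, "rogerbot", "https://example.com/cgi-bin/test"))
```

Under these rules the home page is open to every user agent, so the file on its own would not explain a site-wide block.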
-
403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".
You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.
I'd start by looking at the .htaccess. Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't really return a 403.
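Once someone does have the .htaccess to hand, a rough sketch like this can flag the lines worth reading first. It looks for a few Apache directives commonly used to block by IP or user agent (`Deny from`, `Require not`, `SetEnvIf`/`BrowserMatch`, and `RewriteCond %{HTTP_USER_AGENT}`). The pattern list is not exhaustive, and the sample file is entirely hypothetical; it is not the client's configuration.

```python
import re

# Directives often involved in IP or user-agent blocking. Illustrative only.
SUSPECT_PATTERNS = [
    re.compile(r"^\s*Deny\s+from\b", re.IGNORECASE),
    re.compile(r"^\s*Require\s+not\b", re.IGNORECASE),
    re.compile(r"^\s*(SetEnvIf|BrowserMatch)(NoCase)?\b", re.IGNORECASE),
    re.compile(r"^\s*RewriteCond\s+%\{HTTP_USER_AGENT\}", re.IGNORECASE),
]

# Hypothetical .htaccess content, made up for this example.
SAMPLE = """\
# hypothetical example, not the client's real file
SetEnvIfNoCase User-Agent "xenu" bad_bot
SetEnvIf User-Agent "rogerbot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot
"""


def find_blocking_rules(htaccess_text):
    """Return (line_number, line) pairs for lines that match a suspect pattern."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comments
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    for number, line in find_blocking_rules(SAMPLE):
        print(f"{number}: {line}")
```

A match is only a lead to follow up, not proof of a block; the rules still need reading in context by whoever maintains the server.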