Moz & Xenu Link Sleuth unable to crawl a website (403 error)
-
It could be that I am missing something really obvious; however, we are getting the following error when we try to use the Moz tool on a client website. (I have read through a few posts on 403 errors, but none appear to describe the same problem as this.)
Moz Result
Title: 403 : Error
Meta Description: 403 Forbidden
Meta Robots: Not present/empty
Meta Refresh: Not present/empty
Xenu Link Sleuth Result
Broken links, ordered by link:
error code: 403 (forbidden request), linked from page(s):
Thanks in advance!
-
Hey Liam,
Thanks for following up. Unfortunately, we use thousands of dynamic IPs through Amazon Web Services to run our crawler and the IP would change from crawl to crawl. We don't even have a set range for the IPs we use through AWS.
As for throttling, we don't have a set throttle. We try to space out the server hits enough to not bring down the server, but then hit the server as often as necessary in order to crawl the full site or crawl limit in a reasonable amount of time. We try to find a balance between hitting the site too hard and having extremely long crawl times. If the devs are worried about how often we hit the server, they can add a crawl delay of 10 to the robots.txt to throttle the crawler. We will respect that delay.
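As a rough illustration of the crawl delay mentioned above, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser` and reads back the delay a crawler named `rogerbot` would see. The file contents and the choice to put the directive in the `User-agent: *` group are assumptions for the example, not the client's actual configuration; the value is conventionally read as seconds between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only. The Crawl-delay line is the
# addition discussed above; the Disallow rule stands in for existing rules.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay that applies to user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print(crawl_delay_for(ROBOTS_TXT, "rogerbot"))
```

This only confirms what the file declares; whether a given crawler honours the directive is up to that crawler.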
If the devs use Moz, as well, they would also be getting a 403 on their crawl because the server is blocking our user agent specifically. The server would give the same status code regardless of who has set up the campaign.
I'm sorry this information isn't more specific. Please let me know if you need any other assistance.
Chiaryn
-
Hi Chiaryn
The saga continues... this is the response my client got back from the developers. Please could you let me have the answers to the two questions?
Apparently, as part of their ‘SAF’ (?) protocols, if the IT director sees a big spike in 3rd-party products trawling the site, he will block them! They did say that they use Moz too. What they’ve asked me to get from Moz is:
- Moz IP address/range
- Level of throttling they will use
I would question why they need these answers if THEY USE MOZ themselves, but if I go back with that I will be going around in circles. Any chance of letting me know the answer(s)?
Thanks in advance.
Liam
-
Awesome - thank you.
Kind Regards
Liam
-
Hey There,
The robots.txt shouldn't really affect 403s; you would actually get a "blocked by robots.txt" error if that was the cause. Your server is basically telling us that we are not authorized to access your site. I agree with Mat that we are most likely being blocked in the htaccess file. It may be that your server is flagging our crawler and Xenu's crawler as troll crawlers or something along those lines. I ran a test on your URL using a non-existent crawler, Rogerbot with a capital R, and got a 200 status code back, but when I ran the same test with our real crawler, rogerbot with a lowercase r, I got the 403 error (http://screencast.com/t/Sv9cozvY2f01). This tells me that the server is specifically blocking our crawler, but not all crawlers in general.
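The kind of user-agent comparison described above can be approximated with a few lines of Python. This is only a sketch using the standard `urllib` module: the function names are made up for the example, and sending the bare tokens `rogerbot` and `Rogerbot` is a simplification, since a real crawler sends a longer User-Agent string.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return error.code


def compare_user_agents(url, user_agents):
    """Map each User-Agent string to the status code it receives."""
    return {agent: status_for_user_agent(url, agent) for agent in user_agents}


# Example call (the URL is a placeholder for the site being diagnosed):
# compare_user_agents("https://example.com/", ["rogerbot", "Rogerbot"])
```

If the two strings come back with different status codes from the same machine, the block is keyed on the user agent rather than on the requesting IP.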
I hope this helps. Let me know if you have any other questions.
Chiaryn
Help Team Ninja
-
Hi Mat
Thanks for the reply - robots.txt file is as follows:
## The following are infinitely deep trees
User-agent: *
Disallow: /cgi-bin
Disallow: /cms/events
Disallow: /cms/latest
Disallow: /cms/cookieprivacy
Disallow: /cms/help
Disallow: /site/services/megamenu/
Disallow: /site/mobile/
I can't get access to the .htaccess file at present (we're not the developers).
Anyone else any thoughts? Weirdly, I can get Screaming Frog info back on the site :-/
-
403s are tricky to diagnose because they, by their very nature, don't tell you much. They're sort of the server equivalent of just shouting "NO!".
You say Moz & Xenu are receiving the 403. I assume that it loads properly from a browser.
I'd start by looking at the .htaccess file. Any odd deny statements in there? It could be that an IP range or user agent is blocked. Some people like to block common crawlers (not calling Roger names there). Check the robots.txt whilst you are there, although that shouldn't really return a 403.
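For anyone who does have the .htaccess file to hand, a quick way to surface the "odd deny statements" is to scan it for directives that commonly implement blocking. The sketch below is a heuristic only: the pattern list is an assumption about what is worth flagging, and the sample rules are hypothetical, not taken from the site in question.

```python
import re

# Apache directives that are commonly involved in IP or user-agent blocking.
# This list is a heuristic for the example, not an exhaustive reference.
BLOCKING_PATTERNS = [
    re.compile(r"^\s*Deny\s+from\b", re.IGNORECASE),
    re.compile(r"^\s*Require\s+not\b", re.IGNORECASE),
    re.compile(r"^\s*(SetEnvIf|BrowserMatch)(NoCase)?\b", re.IGNORECASE),
    re.compile(r"^\s*RewriteCond\s+%\{(HTTP_USER_AGENT|REMOTE_ADDR)\}", re.IGNORECASE),
    re.compile(r"^\s*RewriteRule\b.*\[[^\]]*\bF\b[^\]]*\]", re.IGNORECASE),
]

# Hypothetical .htaccess content, invented for illustration.
SAMPLE_HTACCESS = """\
# Hypothetical rules, not taken from the site in question
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (rogerbot|Xenu)
RewriteRule .* - [F,L]
Order Allow,Deny
Allow from all
Deny from 203.0.113.0/24
"""


def find_blocking_rules(htaccess_text):
    """Return (line_number, line) pairs for lines that look like block rules."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        if any(pattern.search(line) for pattern in BLOCKING_PATTERNS):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    for number, line in find_blocking_rules(SAMPLE_HTACCESS):
        print(number, line)
```

Note that a `RewriteCond` pattern without the `[NC]` flag is matched case-sensitively, so a rule like the hypothetical one above would block one capitalisation of a crawler name and let another through.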