Crawl Diagnostics 403 on home page...

martJ

In Crawl Diagnostics it says oursite.com/ returns a 403. It doesn't say what's causing it, but it mentions no robots.txt. There is a robots.txt, and I see no problems with it. How can I find out more information about this error?

ChiarynMiranda

Hi Dana,

Thanks for writing in. The robots.txt file would not cause a 403 error. That type of error comes from the way the server responds to our crawler: the server for the site is telling our crawler that it is not allowed to access the site. Here is a resource that explains the 403 HTTP status code pretty thoroughly: http://pcsupport.about.com/od/findbyerrormessage/a/403error.htm

I looked at both of the campaigns on your account, and I am not seeing a 403 error for either site, though I do see a couple of 404 (Page Not Found) errors on one of the campaigns, which is a different issue.

If you are still seeing the 403 error message on one of your crawls, you would just need to have the webmaster update the server configuration to allow rogerbot (our crawler) to access the site.
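One quick way to see what the server tells a crawler, as opposed to a browser, is to request the home page with two different User-Agent headers and compare the status codes. This is a minimal sketch using only Python's standard library. The `CRAWLER_UA` value is an assumption (check your access logs for the exact string rogerbot sends), and a User-Agent test will not reproduce a block keyed on the crawler's IP address or hostname.

```python
import urllib.error
import urllib.request

# Assumed header values for illustration; the real rogerbot string may differ.
CRAWLER_UA = "rogerbot"
BROWSER_UA = "Mozilla/5.0"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url with this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still the server's answer.
        return err.code
```

For example, if `fetch_status("http://oursite.com/", CRAWLER_UA)` returns 403 while the same call with `BROWSER_UA` returns 200, a user-agent rule on the server is the likely culprit. `urlopen` follows redirects, so the code reported is that of the final response.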

I hope this helps. Please let me know if you have any other questions.

-Chiaryn

martJ

Okay, so I couldn't find this thread and started a new one. Sorry...

... The problem persists.

RECAP

I have two blocks in my .htaccess; both are for amazonaws.com.

I have gone over our server's block logs and see only Amazon addresses and bot names.

I did a Fetch as Google in our Webmaster Tools, and it fetched fine. Success!

Why isn't this crawler able to access the site? Many other bots are crawling it right now.

Why can I use the SEOmoz On-Page feature to crawl a single page when the automatic crawler won't access the site? I just took a break from typing this to try the On-Page tool on our robots.txt, and it worked fine. I used the keyword "Disallow" and it gave me a C. =0)

... now if we could just crawl the rest of the site...

Any help on this would be greatly appreciated.
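Going over the block logs can be partly automated. This is a minimal sketch that counts 403 responses per client address and User-Agent, assuming the access log uses Apache's standard "combined" format; if your `LogFormat` differs, the regular expression has to be adjusted.

```python
import re
from collections import Counter
from typing import Iterable

# Apache "combined" format (assumed):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Quoted fields may contain backslash-escaped quotes, hence (?:[^"\\]|\\.)*
_QUOTED = r'"(?:[^"\\]|\\.)*"'
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] ' + _QUOTED +
    r' (?P<status>\d{3}) \S+ ' + _QUOTED +
    r' "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def forbidden_by_client(lines: Iterable[str]) -> Counter:
    """Count 403 responses per (client host, user agent) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and match.group("status") == "403":
            counts[(match.group("host"), match.group("agent"))] += 1
    return counts
```

Usage, with an example log path that will vary by server: `with open("/var/log/apache2/access.log") as f: print(forbidden_by_client(f).most_common(10))`. A crawler that shows up here with only 403s is being refused by the server itself, not by robots.txt.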

martJ

I think I do. A few minutes ago I worked through a 403 problem reported by another site that was trying to access an HTML file for verification. Apparently they connect from an IP that's blocked by our .htaccess. I removed the blocks, told them to try again, and it worked with no problem. I see that SEOmoz has only crawled 1 page. Off to see if I can trigger a re-crawl now...
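If the .htaccess blocks are hostname rules (an assumption; they could also be IP ranges), Apache matches them against the client's reverse-DNS name, confirmed by a forward lookup (a "double-reverse" check). Any crawler hosted on Amazon EC2 would then get a 403. This minimal sketch approximates that check for an IPv4 address seen in your logs; the function names are hypothetical helpers, not part of any Apache tool.

```python
import socket
from typing import Iterable, Optional


def reverse_dns(ip: str) -> Optional[str]:
    """Return the PTR hostname for ip, or None if there is no reverse DNS."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None
    return hostname


def forward_confirms(ip: str, hostname: str) -> bool:
    """Double-reverse check: does hostname resolve back to ip? (IPv4 only)"""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses


def matches_blocked_suffix(hostname: str, blocked: Iterable[str]) -> bool:
    """True if hostname equals, or is a subdomain of, any blocked domain."""
    host = hostname.lower().rstrip(".")
    for domain in blocked:
        d = domain.lower().strip(".")
        if host == d or host.endswith("." + d):
            return True
    return False


def would_be_blocked(ip: str, blocked: Iterable[str] = ("amazonaws.com",)) -> bool:
    """Approximate whether a hostname block list would refuse this client IP."""
    hostname = reverse_dns(ip)
    return (
        hostname is not None
        and forward_confirms(ip, hostname)
        and matches_blocked_suffix(hostname, blocked)
    )
```

Running `would_be_blocked()` on the addresses that received 403s shows whether the amazonaws.com block is the likely cause; the suffix check deliberately rejects look-alikes such as `notamazonaws.com`.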

jesse-landry

Hmmm... not sure why this is happening. Maybe add these lines to the top of your robots.txt and see if that fixes it by next week. It certainly won't hurt anything:

User-agent: *
Allow: /

martJ

No problem. Looking at my Google Webmaster Tools, the crawl stats don't show any errors.

Thanks

User-Agent: *
Disallow: /*?zenid=
Disallow: /editors/
Disallow: /email/
Disallow: /googlecheckout/
Disallow: /includes/
Disallow: /js/
Disallow: /manuals/
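To confirm that these rules don't exclude the home page, the robots.txt can be checked with Python's standard `urllib.robotparser`. This is a sketch with one caveat: `robotparser` does plain prefix matching and does not support the `*` wildcard, so it does not interpret `/*?zenid=` the way Google does (that rule doesn't cover `/` either way).

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules posted in this thread.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /*?zenid=
Disallow: /editors/
Disallow: /email/
Disallow: /googlecheckout/
Disallow: /includes/
Disallow: /js/
Disallow: /manuals/
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Parse robots_txt and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/", "/includes/config.php"):
    print(path, allowed("rogerbot", "http://oursite.com" + path))
# / True
# /includes/config.php False
```

The home page comes back as allowed, which fits the point that robots.txt only tells well-behaved crawlers what to skip; it does not make the server answer with a 403.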

jesse-landry

Oh, it's only in SEOmoz's Crawl Diagnostics that you're seeing this error. That explains why robots.txt could be affecting it. I misread this earlier and thought you were getting the 403 in your own browser.

Can you paste the robots.txt file here so we can see it? Now that I've read your post correctly, I would imagine that has everything to do with it. My apologies.

martJ

Apache.

jesse-landry

A 403 is a Forbidden code, usually pertaining to security and permissions.

Are you running your server in an Apache or IIS environment? Robots.txt shouldn't affect a site's visibility to the public; it only talks to site crawlers.
