Moz "Crawl Diagnostics" doesn't respect robots.txt

Vitalized

Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:

Duplicate content
Overly dynamic URLs
Duplicate Page Titles

The duplicate content and URLs it found are all blocked in robots.txt, so why am I seeing these errors?
Here's an example of some of the robots.txt rules that block things like dynamic URLs and directories (which the Moz bot ignored):

Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/

Many thanks for any info on this issue.

Christy-Correll

Hi Si, has this issue been resolved?

ChiarynMiranda

Hey Si,

Thanks for writing in. We don't seem to be having an overarching issue with our crawler ignoring robots.txt files, so I did some research in Google Webmaster Tools. It looks like most crawlers require an asterisk in the Disallow directive to recognize that all pages of a dynamic URL are being disallowed. The "Pattern Matching" section of this resource should give you more information about setting up robots.txt with the correct Disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

If you add the asterisk to the Disallow directive and you are still seeing these pages crawled, it would help if you sent an email with your campaign information to our support desk at help@moz.com so our engineers can look into this more directly.

I hope this helps.

Chiaryn

helgeolaussen

If you have an "index,(no)follow" meta on those pages I think they will be crawled even though you have them blocked in robots.txt. So by adding "noindex" on those pages it might work as you want it to.

MattAntonino

Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?

If you give an example full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find that stuff, but give us more info if you're able.
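
To illustrate why the position of the / matters, here is a minimal Python sketch of wildcard-style robots.txt matching: a rule is compared from the start of the path, and * stands for any run of characters. It is not Moz's crawler code; the function name rule_matches and the sample URLs are hypothetical, and individual crawlers may differ in the details.

```python
import re

def rule_matches(rule: str, path_and_query: str) -> bool:
    """Return True if a Disallow pattern matches a URL's path + query string.

    Assumed conventions (the wildcard style documented by Google):
    the rule is matched from the beginning of the path, '*' matches any
    run of characters, and a trailing '$' anchors the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, i.e. a prefix match.
    return re.match(regex, path_and_query) is not None

# Hypothetical URLs, checked against rules like the ones in the question.
checks = [
    ("/?mode=", "/?mode=list"),        # query string directly on the homepage
    ("/?mode=", "/abcd?mode=list"),    # same parameter on a deeper page
    ("/*?mode=", "/abcd?mode=list"),   # wildcard version of the rule
    ("/reviews/", "/reviews/widget"),  # plain directory prefix
]
for rule, url in checks:
    print(f"Disallow: {rule:<10} {url:<18} blocked={rule_matches(rule, url)}")
```

Under these assumptions, Disallow: /?mode= only covers URLs where the query string sits directly on the root, while Disallow: /*?mode= also covers the same parameter on pages like /abcd.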
