Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content & URLs it's found are all blocked in the robots.txt so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. It doesn't seem that we are having an overarching issue with our crawler ignoring robots.txt files, so I did some research in Google Webmaster Tools. It looks like most crawlers require an asterisk in the disallow directive to recognize that all pages of a dynamic URL are being disallowed; without one, a rule such as Disallow: /?mode= only matches URLs that begin with /?mode= directly after the domain. If you look at the "Pattern Matching" section of this resource: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449, that should give you more information about setting up the robots.txt with the correct disallow directives to block those pages.
If you add the asterisk to the disallow directive and you are still seeing these pages crawled, it would help if you sent an email with your campaign information to our support desk at help@moz.com so we can have our engineers look into this more directly.
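To make the asterisk point concrete, here is a minimal Python sketch of the pattern matching described in that Google resource: `*` matches any run of characters, a trailing `$` anchors the end of the URL, and everything else is matched as a literal prefix from the start of the path. This is only an illustration under those assumptions, not the code any real crawler runs; the function name `rule_matches` and the `/shoes` path are made up for the example.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a Disallow pattern matches a URL path plus query string.

    Assumes Google-style matching: '*' is a wildcard, a trailing '$'
    anchors the end, and the rest is a literal prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece and join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, like a prefix rule.
    return re.match(regex, path) is not None

# The rule from the question only covers the home page with that parameter.
print(rule_matches("/?mode=", "/?mode=grid"))        # home page
print(rule_matches("/?mode=", "/shoes?mode=grid"))   # hypothetical category page
# With the asterisk, the parameter is covered on any page.
print(rule_matches("/*?mode=", "/shoes?mode=grid"))
```

Under these assumptions, the second check is the one that fails to block, which would explain parameterised category pages still showing up in the crawl.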
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta tag on those pages, I think they will be crawled even though you have them blocked in robots.txt. So adding "noindex" to those pages might make it work the way you want it to.
-
Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?
If you give an example of a full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, it shouldn't find that stuff, but give us more info if you're able.
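A quick way to check this yourself is to look at the part of the URL that robots.txt rules are compared against, which is the path plus the query string, and see whether it starts with one of your rules. The Python sketch below does that for the literal rules from the question (the wildcard rule /?p=*& is left out to keep it simple). The helper names are made up for illustration, and apart from the example.com/abcd?p=147 link above, the URLs are hypothetical.

```python
from urllib.parse import urlsplit

# Literal (non-wildcard) rules copied from the robots.txt in the question.
DISALLOW_PREFIXES = ["/?mode=", "/?limit=", "/?dir=", "/?SID=", "/reviews/", "/home/"]

def robots_target(url: str) -> str:
    """Return the path plus query string that Disallow rules are matched against."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target

def blocking_rules(url: str) -> list:
    """List the literal Disallow prefixes that this URL's path starts with."""
    target = robots_target(url)
    return [rule for rule in DISALLOW_PREFIXES if target.startswith(rule)]

for url in [
    "http://www.example.com/?mode=list",      # parameter directly on the home page
    "http://www.example.com/abcd?mode=list",  # same parameter on a deeper page
    "http://www.example.com/reviews/widget",  # directory rule
]:
    print(url, "->", robots_target(url), blocking_rules(url))
```

If the second kind of URL comes back with an empty list, the rule as written is not blocking it, which would match what the crawl report shows.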