Moz "Crawl Diagnostics" doesn't respect robots.txt
-
Hello, I've just had a new website crawled by the Moz bot. It's come back with thousands of errors saying things like:
- Duplicate content
- Overly dynamic URLs
- Duplicate Page Titles
The duplicate content and URLs it found are all blocked in the robots.txt, so why am I seeing these errors?
Here's an example of some of the robots.txt that blocks things like dynamic URLs and directories (which the Moz bot ignored):
Disallow: /?mode=
Disallow: /?limit=
Disallow: /?dir=
Disallow: /?p=*&
Disallow: /?SID=
Disallow: /reviews/
Disallow: /home/
Many thanks for any info on this issue.
-
Hi Si, has this issue been resolved?
-
Hey Si,
Thanks for writing in. We don't seem to be having an overarching issue with our crawler ignoring robots.txt files, so I did some research in Google Webmaster Tools. It looks like most crawlers require an asterisk in the disallow directive to recognize that all pages of a dynamic URL are being disallowed. The "Pattern Matching" section of this resource should give you more information about setting up the robots.txt with the correct disallow directives to block those pages: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
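To illustrate why the asterisk matters, here is a minimal Python sketch of Google-style pattern matching. It assumes a rule is matched as a prefix from the start of the path, that "*" matches any run of characters, and that a trailing "$" anchors the end of the URL. The rule_matches helper and the sample paths are hypothetical, made up for illustration; this is not the Moz crawler's actual logic.

```python
import re


def rule_matches(rule: str, path: str) -> bool:
    """Return True if a Disallow rule matches a URL path plus query string.

    Assumption: Google-style matching, where the rule is a prefix match
    from the start of the path, '*' matches any run of characters, and a
    trailing '$' anchors the end of the URL.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece, then join the pieces with '.*' for each '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match only matches at the start of the string, giving prefix behavior.
    return re.match(pattern, path) is not None


# Hypothetical paths: the first rule only covers query strings on the site root.
checks = [
    ("/?mode=", "/?mode=list"),
    ("/?mode=", "/shoes?mode=list"),
    ("/*?mode=", "/shoes?mode=list"),
    ("/?p=*&", "/abcd?p=147"),
    ("/*?p=", "/abcd?p=147"),
]
for rule, path in checks:
    print(f"Disallow: {rule:<10} {path:<20} blocked={rule_matches(rule, path)}")
```

Under these assumptions, a rule such as "Disallow: /?mode=" only covers URLs that begin with /?mode= at the site root, while "Disallow: /*?mode=" also covers the same parameter on deeper pages.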
If you add the asterisk to the disallow directive and you are still seeing these pages crawled, it would help if you sent an email with your campaign information to our support desk at help@moz.com so we can have our engineers look into this more directly.
I hope this helps.
Chiaryn
-
If you have an "index,(no)follow" meta robots tag on those pages, I think they will be crawled even though you have them blocked in robots.txt. So adding "noindex" to those pages might make it work as you want it to.
-
Is the / actually in the URL at that spot? Or is your link like http://www.example.com/abcd?p=147?
If you give an example of a full URL that includes one of your blocked dynamic URLs, we can take a better look. If your robots.txt is set up correctly, the crawler shouldn't find that stuff, but give us more info if you're able.