Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this, we have added a disallow statement to the robots.txt file:
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be, and what we need to do to properly disallow crawling these pages?
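One likely factor: under the Robots Exclusion Protocol (RFC 9309), a Disallow value is matched starting at the first character of the URL path, so "Disallow: /page/" only covers paths that begin with /page/. A quick check with Python's standard-library urllib.robotparser shows this. The root-level /page/4/ URL is hypothetical, used only for contrast, and this checks Python's parser rather than rogerbot itself.

```python
from urllib.robotparser import RobotFileParser

# The rule described in the question; no other directives are assumed.
rules = [
    "User-agent: *",
    "Disallow: /page/",
]

parser = RobotFileParser()
parser.parse(rules)

paginated = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"
root_page = "https://needquest.com/page/4/"  # hypothetical URL whose path starts with /page/

# Disallow values are compared against the START of the URL path,
# so /page/ appearing in the middle of the path does not match.
print(parser.can_fetch("rogerbot", paginated))  # True
print(parser.can_fetch("rogerbot", root_page))  # False
```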
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL come up with Duplicate Title or Duplicate Content issues; when I search for it, no Content issues are listed. I do see that URL in the All Crawled Pages section, but I can't find it triggering Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be caused by the Allow command sitting before the rest of the Disallow commands. If you move that Allow command to the end of the block of Disallow commands, rogerbot may see the Disallow for /page/ and stop crawling those URLs.
If you're still running into trouble, I would suggest writing in to us at help@moz.com so we can take a closer look at the Campaign and what could be going on there.
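Whether moving the Allow line helps depends on how a crawler resolves conflicting rules. RFC 9309 says the most specific (longest) matching rule wins regardless of order, but some parsers simply apply the first rule that matches. Python's urllib.robotparser behaves the first-match way, so it can illustrate the order effect. The "Allow: /" line and the /page/4/ URL are hypothetical stand-ins, since the client's actual file isn't quoted here, and this says nothing definitive about how rogerbot resolves conflicts.

```python
from urllib.robotparser import RobotFileParser


def first_match_allowed(lines, url, agent="rogerbot"):
    """Check a URL with Python's stdlib parser, which applies the first matching rule."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


# Hypothetical rule sets: same two rules, different order.
allow_first = ["User-agent: *", "Allow: /", "Disallow: /page/"]
disallow_first = ["User-agent: *", "Disallow: /page/", "Allow: /"]

url = "https://needquest.com/page/4/"  # hypothetical URL that the Disallow rule does cover
print(first_match_allowed(allow_first, url))     # True: "Allow: /" matches first
print(first_match_allowed(disallow_first, url))  # False: "Disallow: /page/" matches first
```

A crawler that follows RFC 9309 would block this URL in both orders, because "/page/" is a longer match than "/".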
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate description and duplicate title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ path is found in many pages we DO want to crawl and index ... and here we only want to disallow the page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
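To block only the paginated tag archives while keeping /place_tag/ pages crawlable, one option is a wildcard rule such as "Disallow: /place_tag/*/page/". RFC 9309 defines "*" (any run of characters) and a trailing "$" (end of URL) as special characters, with the longest matching rule winning and Allow winning ties. Python's stdlib parser does not handle wildcards, so this is a small hand-rolled matcher written under those RFC 9309 assumptions; it is not rogerbot's implementation, and whether rogerbot honors wildcards is worth confirming with Moz.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex (RFC 9309 style).

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; Allow wins a tie. No match means allowed."""
    best_length = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value never blocks anything in this sketch
        if rule_to_regex(pattern).match(path):  # re.match anchors at the path start
            length = len(pattern)
            if length > best_length or (length == best_length and directive == "allow"):
                best_length = length
                allowed = directive == "allow"
    return allowed


# Hypothetical rule set for this site.
rules = [
    ("allow", "/"),
    ("disallow", "/place_tag/*/page/"),
]
print(is_allowed(rules, "/place_tag/autism-spectrum-disorder/page/4/"))  # False
print(is_allowed(rules, "/place_tag/autism-spectrum-disorder/"))         # True
```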
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at help@moz.com and we'll do our best to sort things out with you.
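A quick check with Python's urllib.robotparser shows the trade-off of "Disallow: /place_tag/": it blocks the paginated URLs, but it also blocks the first page of each tag, which the original poster says they want crawled and indexed. This uses Python's parser as a stand-in for prefix matching in general, not rogerbot specifically.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /place_tag/"])

base_tag_page = "https://needquest.com/place_tag/autism-spectrum-disorder/"
paginated = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"

# Both paths start with /place_tag/, so the prefix rule blocks both.
for url in (base_tag_page, paginated):
    print(url, parser.can_fetch("rogerbot", url))  # prints False for both URLs
```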