Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block Moz (or any other robot) from crawling pages with specific URLs
-
Hello!
Moz reports that my site has around 380 duplicate page content. Most of them come from dynamic generated URLs that have some specific parameters. I have sorted this out for Google in webmaster tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same amount of duplicate content pages and, to stop it, I know I must use robots.txt. The trick is that, I don't want to block every page, but just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean this list to be able to use the feature properly in the future.
I have read through Moz forums and found a few topics related to this, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0User-agent: rogerbot
Disallow: /*numberOfStars=0My questions:
1. Are the above lines correct and would block Moz (dotbot and rogerbot) from crawling only pages that have numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need to have an empty line between the two groups? (I mean between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot")? (or does it even matter?)
I think this would help many people as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there.
Thank you for your help!
-
Hello!
Thanks a lot for your feedback and clearing this out! It worked well.
The robots.txt tester is a good tip!
Thanks!
-
Hi,
What you have there will work absolutely fine with a little tweak. And no need to leave spaces between lines.
Disallow: /numberOfStars=0
However, no need to add the wildcard at the end if there is nothing more after that.
The best way to test what works, is before you go and add it to live, use the Robots.txt test tool in Search Console (Webmaster Tools), add in the lines above and then check to make sure none of your other pages are blocked. They won't be, but it's a great way to test before going live.
I hope this helps
-Andy
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Is Moz Able to Track Internal Links Per Page?
I am trying to track internal links and identify orphan pages. What is the best way to do this?
Moz Pro | | WebMarkets0 -
Source page showsI have 2 h1 tags on my page. I can only find one.
When I grade my page it says I have more than one h1 tag. I view the source page and it shows there are two h1 headings with the same wording. If I delete the one h1 heading I can find, the page source shows I have deleted both of them. I don't know how to get to the other heading to delete it. And I'm off page one of google! Can anybody help? Clay Stephens
Moz Pro | | Coot0 -
Robots.txt blocking Moz
Moz are reporting the robots.txt file is blocking them from crawling one of our websites. But as far as we can see this file is exactly the same as the robots.txt files on other websites that Moz is crawling without problems. We have never come up against this before, even with this site. Our stats show Rogerbot attempting to crawl our site, but it receives a 404 error. Can anyone enlighten us to the problem please? http://www.wychwoodflooring.com -Christina
Moz Pro | | ChristinaRadisic0 -
Pages with URL Too Long
Hello Mozzers! MOZ keeps kindly telling me the URLs are too long. However, this is largely due to the structure of E-commerce site, which has to include 'brand' 'range' and 'products' keyword. For example -
Moz Pro | | tigersohelll
https://www.choicefurnituresuperstore.co.uk/Devonshire-Rustic-Oak-Bedside-Cabinet-1-Drawer-p40668.html MOZ recommends no more than 75 characters. This means we have 25-30 characters for both the brand name and product name. Questions:
If it is an issue, how to fix it on my site?
If it's not an issue, how can we turn off this alert from MOZ?
Anyone know how big an issue URLs are as a ranking factor? I thought pretty low.0 -
Pages with Temporary Redirects on pages that don't exist!
Hi There Another obvious question to some I hope. I ran my first report using the Moz crawler and I have a bunch of pages with temporary redirects as a medium level issue showing up. Trouble is the pages don't exist so they are being redirected to my custom 404 page. So for example I have a URL in the report being called up from lord only knows where!: www.domain.com/pdf/home.aspx This doesn't exist, I have only 1 home.aspx page and it's in the root directory! but it is giving a temp redirect to my 404 page as I would expect but that then leads to a MOZ error as outlined. So basically you could randomize any url up and it would give this error so I am trying to work out how I deal with it before Google starts to notice or before a competitor starts to throw all kinds at my site generating these errors. Any steering on this would be much appreciated!
Moz Pro | | Raptor-crew0 -
Canonical URLs all show trailing slash on main site pages - using Yoast SEO for Wordpress - how to correct
We are using Yoast for a number of our sites. We use naked domain as the canonical. I have noticed in the header tags that all our sites show the canonical URLs as having a trailing slash: Example: http;//foxspizzajc.com, when I look at the source code, it shows the canonical as http;//foxspizzajc.com/ Of course, it is much more likely that all sites that link to us will not use the trailing slash - so preferably we do not want that to be the canonical - among other reasons. Does this need to be fixed so the trailing slash is removed? I cannot see how to do this in Yoast SEO or in Permalinks structure for Wordpress. Sorry for my ignorance. Thanks for any help.
Moz Pro | | Adam_RushHour_Marketing1 -
How to increase page authority
I wonder how to increase the page authority or the domain authority to begin with. It seems you are putting a lot of weight on this in your analysis.
Moz Pro | | wcsinc0 -
Use of the tilde in URLs
I just signed up for SEOMoz and sent my site through the first crawl. I use the tilde in my rewritten URLs. This threw my entire site into the Notice section 301 (permanent redirect) since each page redirects to the exact URL with the ~, not the %7e. I find conflicting information on the web - you can use the tilde in more recent coding guidelines where you couldn't in the old. It would be a huge thing to change every page in my site to use an underscore instead of a tilde int he URL. If Google is like SEOMoz and is 301 redirecting every page on the site, then I'll do it, but is it just an SEOMoz thing? I ran my site through Firebug and and all my pages show the 200 response header, not the 301 redirect. Thanks for any help you can provide.
Moz Pro | | fdb0