Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to Disallow Tag Pages With Robot.txt
-
Hi i have a site which i'm dealing with that has tag pages for instant -
http://www.domain.com/news/?tag=choice
How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt
Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed.
Any suggestions?
Cheers,
Mark
-
Hi Nakul, its Drupal
Mark
-
What CMS is it Mark ?
-
Thanks, is there a way to test it out before actually implementing it with the site.
The site is non-wordpress aswell.
Cheers,
Mark
-
I agree. I would suggest adding the noindex on the pages and letting the bots crawl them. Blocking them would prevent future crawl of these pages, but I am guessing you would also want to remove the existing pages.
Therefore add the noindex first, wait a few days and then add the disallow (Although technically if they are noindex, you don't really need the disallow).
-
Hi Mark
If your using Wordpress then I would recommend SEO Yoast to resolve the tag issue. If not then I suggest you amend the robots.txt file to resolve.
Here is an example:
Disallow: /?tag=
Disallow: /?subcats=
Disallow: /*?features_hash=NOTE:
Be very careful when blocking search engines. Test and test again!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Htaccess - Redirecting TAG or Category pages
Hello Fellow Moz's, We have an issue redirecting some /TAG and /Category pages to inner pages. As an example we use: RedirectMatch 301 /category/Sample-Category(.*) https://OurDomain.com.au/New-Page//$1 That works well. The issue is we have other categories and tags that are named similar to /Sample-Category As an example, if we try to redirect /Sample-Category-1 to /New-Page-1 - it will not work, and redirects to /New-Page I assume this is because /Sample-Category is already being redirected, so anything after /Sample-Category like -1 or -2 or -3 etc, will not be recognized. Anyone know of a workaround?
Intermediate & Advanced SEO | | Jes-Extender-Australia0 -
Redirecting homepage to internal page (2nd Tier page)
We are planning to experiment redirecting our homepage to one of the 2nd tier page. I mean....example.com to example.com/page. We need this page to rank well, but it doesn't have much internal links or external back-links, so we opt for this redirect. Advantage with this page is, it has "keyword" we want to rank for in URL. "page" in example.com/page. Will this help or hurt us in SEO? I think we are missing keyword in our root domain, so interested to highlight this page. Thanks, Satish
Intermediate & Advanced SEO | | vtmoz0 -
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt, I'm aware of how to block or wildcard certain folders but I'm wondering whether it's possible to block all URL's with a certain word in it? We have a client that was hacked a year ago and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in it. I saw this article and tried implementing it https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URL's (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URL's inside of WMT it still says that they are allowed which makes me think that it's not working fully or working at all. In this case these are the lines I've added to the robots.txt Disallow: /*&viagra Disallow: /*&Viagra I know I have the solution of individually requesting URL's to be removed from the index but I want to see if anybody has every had success with wildcarding URL's with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | | EvansHunt0 -
On 1 of our sites we have our Company name in the H1 on our other site we have the page title in our H1 - does anyone have any advise about the best information to have in the H1, H2 and Page Tile
We have 2 sites that have been set up slightly differently. On 1 site we have the Company name in the H1 and the product name in the page title and H2. On the other site we have the Product name in the H1 and no H2. Does anyone have any advise about the best information to have in the H1 and H2
Intermediate & Advanced SEO | | CostumeD0 -
H3 Tags - Should I Link to my content Articles- ? And do I have to many H3 tags/ Links as it is ?
Hello All, On my ecommerce landing pages, I currently have links to my products as H3 Tags. I also have useful guides displayed on the page with links useful articles we have written (they currently go to my news section). I am wondering if I should put those article links as additional H3 tags as well for added seo benefit or do I have to many tags as it is ?. A link to my Landing Page I am talking about is - http://goo.gl/h838RW Screenshot of my h1-h6 tags - http://imgur.com/hLtX0n7 I enclose screenshot my guides and also of my H1-H6 tags. Any advice would be greatly appreciated. thanks Peter
Intermediate & Advanced SEO | | PeteC120 -
Adding a Canonical Tag to each page referencing itself?
Hey Mozers! I've noticed that on www.Zappos.com they have a Canonical tag on each page referencing it self. I have heard that this is a popular method but I dont see the point in canon tagging a page to its self. Any thoughts?
Intermediate & Advanced SEO | | rpaiva0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Using 2 wildcards in the robots.txt file
I have a URL string which I don't want to be indexed. it includes the characters _Q1 ni the middle of the string. So in the robots.txt can I use 2 wildcards in the string to take out all of the URLs with that in it? So something like /_Q1. Will that pickup and block every URL with those characters in the string? Also, this is not directly of the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as //_Q1* as it will be in the second folder or just using /_Q1 will pickup everything no matter what folder it is on? Thanks.
Intermediate & Advanced SEO | | seo1234560