Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to Disallow Tag Pages With Robot.txt
-
Hi i have a site which i'm dealing with that has tag pages for instant -
http://www.domain.com/news/?tag=choice
How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt
Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed.
Any suggestions?
Cheers,
Mark
-
Hi Nakul, its Drupal
Mark
-
What CMS is it Mark ?
-
Thanks, is there a way to test it out before actually implementing it with the site.
The site is non-wordpress aswell.
Cheers,
Mark
-
I agree. I would suggest adding the noindex on the pages and letting the bots crawl them. Blocking them would prevent future crawl of these pages, but I am guessing you would also want to remove the existing pages.
Therefore add the noindex first, wait a few days and then add the disallow (Although technically if they are noindex, you don't really need the disallow).
-
Hi Mark
If your using Wordpress then I would recommend SEO Yoast to resolve the tag issue. If not then I suggest you amend the robots.txt file to resolve.
Here is an example:
Disallow: /?tag=
Disallow: /?subcats=
Disallow: /*?features_hash=NOTE:
Be very careful when blocking search engines. Test and test again!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt & Disallow: /*? Question!
Hi, I have a site where they have: Disallow: /*? Problem is we need the following indexed: ?utm_source=google_shopping What would the best solution be? I have read: User-agent: *
Intermediate & Advanced SEO | | vetofunk
Allow: ?utm_source=google_shopping
Disallow: /*? Any ideas?0 -
Why does Google rank a product page rather than a category page?
Hi, everybody In the Moz ranking tool for one of our client's (the client sells sport equipment) account, there is a trend where more and more of their landing pages are product pages instead of category pages. The optimal landing page for the term "sleeping bag" is of course the sleeping bag category page, but Google is sending them to a product page for a specific sleeping bag.. What could be the critical factors that makes the product page more relevant than the category page as the landing page?
Intermediate & Advanced SEO | | Inevo0 -
Disallow URLs ENDING with certain values in robots.txt?
Is there any way to disallow URLs ending in a certain value? For example, if I have the following product page URL: http://website.com/category/product1, and I want to disallow /category/product1/review, /category/product2/review, etc. without disallowing the product pages themselves, is there any shortcut to do this, or must I disallow each gallery page individually?
Intermediate & Advanced SEO | | jmorehouse0 -
Should I use meta noindex and robots.txt disallow?
Hi, we have an alternate "list view" version of every one of our search results pages The list view has its own URL, indicated by a URL parameter I'm concerned about wasting our crawl budget on all these list view pages, which effectively doubles the amount of pages that need crawling When they were first launched, I had the noindex meta tag be placed on all list view pages, but I'm concerned that they are still being crawled Should I therefore go ahead and also apply a robots.txt disallow on that parameter to ensure that no crawling occurs? Or, will Googlebot/Bingbot also stop crawling that page over time? I assume that noindex still means "crawl"... Thanks 🙂
Intermediate & Advanced SEO | | ntcma0 -
"noindex, follow" or "robots.txt" for thin content pages
Does anyone have any testing evidence what is better to use for pages with thin content, yet important pages to keep on a website? I am referring to content shared across multiple websites (such as e-commerce, real estate etc). Imagine a website with 300 high quality pages indexed and 5,000 thin product type pages, which are pages that would not generate relevant search traffic. Question goes: Does the interlinking value achieved by "noindex, follow" outweigh the negative of Google having to crawl all those "noindex" pages? With robots.txt one has Google's crawling focus on just the important pages that are indexed and that may give ranking a boost. Any experiments with insight to this would be great. I do get the story about "make the pages unique", "get customer reviews and comments" etc....but the above question is the important question here.
Intermediate & Advanced SEO | | khi50 -
Is it a problem to use a 301 redirect to a 404 error page, instead of serving directly a 404 page?
We are building URLs dynamically with apache rewrite.
Intermediate & Advanced SEO | | lcourse
When we detect that an URL is matching some valid patterns, we serve a script which then may detect that the combination of parameters in the URL does not exist. If this happens we produce a 301 redirect to another URL which serves a 404 error page, So my doubt is the following: Do I have to worry about not serving directly an 404, but redirecting (301) to a 404 page? Will this lead to the erroneous original URL staying longer in the google index than if I would serve directly a 404? Some context. It is a site with about 200.000 web pages and we have currently 90.000 404 errors reported in webmaster tools (even though only 600 detected last month).0 -
301 - should I redirect entire domain or page for page?
Hi, We recently enabled a 301 on our domain from our old website to our new website. On the advice of fellow mozzer's we copied the old site exactly to the new domain, then did the 301 so that the sites are identical. Question is, should we be doing the 301 as a whole domain redirect, i.e. www.oldsite.com is now > www.newsite.com, or individually setting each page, i.e. www.oldsite.com/page1 is now www.newsite.com/page1 etc for each page in our site? Remembering that both old and new sites (for now) are identical copies. Also we set the 301 about 5 days ago and have verified its working but haven't seen a single change in rank either from the old site or new - is this because Google hasn't likely re-indexed yet? Thanks, Anthony
Intermediate & Advanced SEO | | Grenadi0 -
Blocking Dynamic URLs with Robots.txt
Background: My e-commerce site uses a lot of layered navigation and sorting links. While this is great for users, it ends up in a lot of URL variations of the same page being crawled by Google. For example, a standard category page: www.mysite.com/widgets.html ...which uses a "Price" layered navigation sidebar to filter products based on price also produces the following URLs which link to the same page: http://www.mysite.com/widgets.html?price=1%2C250 http://www.mysite.com/widgets.html?price=2%2C250 http://www.mysite.com/widgets.html?price=3%2C250 As there are literally thousands of these URL variations being indexed, so I'd like to use Robots.txt to disallow these variations. Question: Is this a wise thing to do? Or does Google take into account layered navigation links by default, and I don't need to worry. To implement, I was going to do the following in Robots.txt: User-agent: * Disallow: /*? Disallow: /*= ....which would prevent any dynamic URL with a '?" or '=' from being indexed. Is there a better way to do this, or is this a good solution? Thank you!
Intermediate & Advanced SEO | | AndrewY1