Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to Disallow Tag Pages With Robot.txt
-
Hi i have a site which i'm dealing with that has tag pages for instant -
http://www.domain.com/news/?tag=choice
How can i exclude these tag pages (about 20+ being crawled and indexed by the search engines with robot.txt
Also sometimes they're created dynamically so i want something which automatically excludes tage pages from being crawled and indexed.
Any suggestions?
Cheers,
Mark
-
Hi Nakul, its Drupal
Mark
-
What CMS is it Mark ?
-
Thanks, is there a way to test it out before actually implementing it with the site.
The site is non-wordpress aswell.
Cheers,
Mark
-
I agree. I would suggest adding the noindex on the pages and letting the bots crawl them. Blocking them would prevent future crawl of these pages, but I am guessing you would also want to remove the existing pages.
Therefore add the noindex first, wait a few days and then add the disallow (Although technically if they are noindex, you don't really need the disallow).
-
Hi Mark
If your using Wordpress then I would recommend SEO Yoast to resolve the tag issue. If not then I suggest you amend the robots.txt file to resolve.
Here is an example:
Disallow: /?tag=
Disallow: /?subcats=
Disallow: /*?features_hash=NOTE:
Be very careful when blocking search engines. Test and test again!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why does Google rank a product page rather than a category page?
Hi, everybody In the Moz ranking tool for one of our client's (the client sells sport equipment) account, there is a trend where more and more of their landing pages are product pages instead of category pages. The optimal landing page for the term "sleeping bag" is of course the sleeping bag category page, but Google is sending them to a product page for a specific sleeping bag.. What could be the critical factors that makes the product page more relevant than the category page as the landing page?
Intermediate & Advanced SEO | | Inevo0 -
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
Wildcarding Robots.txt for Particular Word in URL
Hey All, So I know that this isn't a standard robots.txt, I'm aware of how to block or wildcard certain folders but I'm wondering whether it's possible to block all URL's with a certain word in it? We have a client that was hacked a year ago and now they want us to help remove some of the pages that were being autogenerated with the word "viagra" in it. I saw this article and tried implementing it https://builtvisible.com/wildcards-in-robots-txt/ and it seems that I've been able to remove some of the URL's (although I can't confirm yet until I do a full pull of the SERPs on the domain). However, when I test certain URL's inside of WMT it still says that they are allowed which makes me think that it's not working fully or working at all. In this case these are the lines I've added to the robots.txt Disallow: /*&viagra Disallow: /*&Viagra I know I have the solution of individually requesting URL's to be removed from the index but I want to see if anybody has every had success with wildcarding URL's with a certain word in their robots.txt? The individual URL route could be very tedious. Thanks! Jon
Intermediate & Advanced SEO | | EvansHunt0 -
Is a 301 Redirect and a Canonical Tag on Uppercase to Lowercase Pages Correct?
We have a medium size site that lost more than 50% of its traffic in July 2013 just before the Panda rollout. After working with a SEO agency, we were advised to clean up various items, one of them being that the 10k+ urls were all mixed case (i.e. www.example.com/Blue-Widget). A 301 redirect was set up thereafter forcing all these urls to go to a lowercase version (i.e. www.example.com/blue-widget). In addition, there was a canonical tag placed on all of these pages in case any parameters or other characters were incorporated into a url. I thought this was a good set up, but when running a SEO audit through a third party tool, it shows me the massive amount of 301 redirects. And, now I wonder if there should only be a canonical without the redirect or if its okay to have tens of thousands 301 redirects on the site. We have not recovered yet from the traffic loss yet and we are wondering if its really more of a technical problem than a Google penalty. Guidance and advise from those experienced in the industry is appreciated.
Intermediate & Advanced SEO | | ABK7170 -
Baidu Spider appearing on robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my companies robots.txt file (literally magically appeared/text is below) User-agent: Baiduspider
Intermediate & Advanced SEO | | IceIcebaby
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: / I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my companies website? Thanks for your help,
-Reed0 -
Should my back links go to home page or internal pages
Right now we rank on page 2 for many KWs, so should i now focus my attention on getting links to my home page to build domain authority or continue to direct links to the internal pages for specific KWs? I am about to write some articles for several good ranking sites and want to know whether to link my company name (same as domain name) or KW to the home page or use individual KWs to the internal pages - I am only allowed one link per article to my site. Thanks Ash
Intermediate & Advanced SEO | | AshShep10 -
Robots.txt: how to exclude sub-directories correctly?
Hello here, I am trying to figure out the correct way to tell SEs to crawls this: http://www.mysite.com/directory/ But not this: http://www.mysite.com/directory/sub-directory/ or this: http://www.mysite.com/directory/sub-directory2/sub-directory/... But with the fact I have thousands of sub-directories with almost infinite combinations, I can't put the following definitions in a manageable way: disallow: /directory/sub-directory/ disallow: /directory/sub-directory2/ disallow: /directory/sub-directory/sub-directory/ disallow: /directory/sub-directory2/subdirectory/ etc... I would end up having thousands of definitions to disallow all the possible sub-directory combinations. So, is the following way a correct, better and shorter way to define what I want above: allow: /directory/$ disallow: /directory/* Would the above work? Any thoughts are very welcome! Thank you in advance. Best, Fab.
Intermediate & Advanced SEO | | fablau1 -
NOINDEX listing pages: Page 2, Page 3... etc?
Would it be beneficial to NOINDEX category listing pages except for the first page. For example on this site: http://flyawaysimulation.com/downloads/101/fsx-missions/ Has lots of pages such as Page 2, Page 3, Page 4... etc: http://www.google.com/search?q=site%3Aflyawaysimulation.com+fsx+missions Would there be any SEO benefit of NOINDEX on these pages? Of course, FOLLOW is default, so links would still be followed and juice applied. Your thoughts and suggestions are much appreciated.
Intermediate & Advanced SEO | | Peter2640