Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How to Disallow Tag Pages With Robots.txt
-
Hi, I'm dealing with a site that has tag pages, for instance:
http://www.domain.com/news/?tag=choice
How can I exclude these tag pages with robots.txt? About 20+ of them are being crawled and indexed by the search engines.
They're also sometimes created dynamically, so I want something that automatically excludes tag pages from being crawled and indexed.
Any suggestions?
Cheers,
Mark
-
Hi Nakul, it's Drupal.
Mark
-
What CMS is it, Mark?
-
Thanks. Is there a way to test it out before actually implementing it on the site?
The site is also non-WordPress.
Cheers,
Mark
-
I agree. I would suggest adding noindex to the pages and letting the bots crawl them. Blocking them would prevent future crawling of these pages, but I am guessing you also want to remove the pages that are already indexed.
Therefore, add the noindex first, wait a few days, and then add the disallow (although technically, if they are noindex, you don't really need the disallow).
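The ordering matters because a crawler can only see a noindex directive on a page it is still allowed to fetch. Before adding any disallow, it may help to confirm the tag pages really serve the directive. A minimal sketch, assuming the noindex is delivered as a robots meta tag in the HTML (the class name `RobotsMetaParser`, the function `has_noindex`, and the sample markup are hypothetical, and an `X-Robots-Tag` HTTP header would not be detected by this):

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, follow"
        for directive in attr.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(html):
    """Return True if the HTML carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    # Hypothetical markup for a tag page; in practice, feed the fetched page source.
    sample = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
    print(has_noindex(sample))
```

Running this against the saved source of a few tag pages is a quick sanity check that the CMS is emitting the tag where expected.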
-
Hi Mark
If you're using WordPress, then I would recommend Yoast SEO to resolve the tag issue. If not, then I suggest you amend the robots.txt file instead.
Here is an example:
Disallow: /?tag=
Disallow: /?subcats=
Disallow: /*?features_hash=
NOTE: Be very careful when blocking search engines. Test and test again!
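On testing before going live: here is a minimal sketch for checking rules offline. It assumes Google-style matching, where a rule is matched as a prefix of the path plus query string, `*` matches any run of characters, and a trailing `$` anchors the end. The helper names `rule_to_regex` and `is_blocked` are hypothetical, and the sketch ignores Allow rules and user-agent groups. One thing it illustrates: `Disallow: /?tag=` only covers tag URLs at the site root, so for the URL in the question a rule such as `/news/?tag=` or `/*?tag=` would be needed.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule):
    """Translate a robots.txt path rule into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url, disallow_rules):
    """Return True if any Disallow rule matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which gives the prefix behaviour.
    return any(rule_to_regex(rule).match(target) for rule in disallow_rules)


if __name__ == "__main__":
    url = "http://www.domain.com/news/?tag=choice"
    for rule in ["/?tag=", "/news/?tag=", "/*?tag="]:
        print(rule, is_blocked(url, [rule]))
```

This is only a local approximation. The robots.txt testing tools that the search engines provide remain the authoritative check once a draft file exists.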
Related Questions
-
Robots.txt blocked internal resources Wordpress
Hi all, We've recently migrated a WordPress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in the robots.txt file. These blocked internal resources are all cached and minified CSS elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will URLs like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
Intermediate & Advanced SEO | Mat_C
-
URL structure - Page Path vs No Page Path
We are currently rebuilding our URL structure for ecommerce websites. We have seen a lot of sites removing the page path on product pages, e.g. https://www.theiconic.co.nz/liberty-beach-blossom-shirt-680193.html versus what would normally be https://www.theiconic.co.nz/womens-clothing-tops/liberty-beach-blossom-shirt-680193.html Should we remove the page path for a product page to keep the URL shorter, or should we keep it? I can see that we would lose the hierarchy juice to a product page, but I'm not sure what the right thing to do is.
Intermediate & Advanced SEO | Ashcastle
-
Is it okay to copy and paste on-page content into the meta description tag?
I have heard conflicting answers to this. I always figured that it was okay to selectively copy and paste on-page content into the meta description tag, especially if the on-page content is well written. How can it be duplicate content if it's pulling from the exact same page? Does anybody have any feedback from a credible source about this? Thanks.
Intermediate & Advanced SEO | VanguardCommunications
-
Should comments and feeds be disallowed in robots.txt?
Hi, My robots file is currently set up as listed below. From an SEO point of view, is it good to disallow feeds, rss and comments? I feel allowing comments would be a good thing because it's new content that may rank in the search engines, as the comments left on my blog often refer to questions or companies folks are searching for more information on. And the comments are added regularly. What's your take? I'm also concerned about the /page being blocked. Not sure how that benefits my blog from an SEO point of view either. Look forward to your feedback. Thanks. Eddy
User-agent: Googlebot
Crawl-delay: 10
Allow: /*
User-agent: *
Crawl-delay: 10
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/
Disallow: /rss/
Disallow: /comments/feed/
Disallow: /page/
Disallow: /date/
Disallow: /comments/
# Allow Everything
Allow: /*
Intermediate & Advanced SEO | workathomecareers
-
Meta Tag Force Page Refresh - Good or Bad?
I recently came across a meta tag that can cause an auto refresh in a user's browser when implemented. I have been using it for a redesign and was curious whether there could be any negative effects from using it. Here is the code: All input is appreciated. Ciao, Todd Richard
Intermediate & Advanced SEO | RichFinnSEO
-
Blocking Pages Via Robots, Can Images On Those Pages Be Included In Image Search
Hi! I have pages within my forum where visitors can upload photos. When they upload photos they provide a simple statement about the photo but no real information about the image, definitely not enough for the page to be deemed worthy of being indexed. The industry, however, is one that really leans on images, and having the images in Google Image search is important to us. The URL structure is like this: domain.com/community/photos/~username~/picture111111.aspx I wish to block the whole folder from Googlebot to prevent these low quality pages from being added to Google's main SERP results. This would be something like this:
User-agent: googlebot
Disallow: /community/photos/
Can I disallow Googlebot specifically, rather than just using User-agent: *, which would then allow googlebot-image to pick up the photos? I plan on configuring a way to add meaningful alt attributes and image names to assist in visibility, but the actual act of blocking the pages and getting the images picked up... Is this possible? Thanks! Leona
Intermediate & Advanced SEO | HD_Leona
-
Rel=canonical tag on original page?
Afternoon All,
We are using Concrete5 as our CMS. We are due to change, but for the moment we have to play with what we have got. Part of the C5 system allows us to attribute our main page to other categories via a page aliaser add-on. But what it also does is create several URL paths and duplicate pages, depending on how many times we take the original page and reference it in other categories. We have tried C5 canonical/SEO add-ons but they all seem to fall short. We have tried to address this issue in the most efficient way possible by using the rel=canonical tag. The only issue is the limitations of our CMS. We add the canonical tag to the original page header and this will automatically place this tag on all the duplicate pages and in turn fix the problem of duplicate content. The only problem is that the canonical tag is on the original page as well, referencing itself, effectively creating a tagging circle. Does anyone foresee a problem with the canonical tag being on the original page but in turn referencing itself? What we have done is try to simplify our duplicate content issues. We have over 2500 duplicate page issues because of this aliasing add-on and want to automate the canonical tag addition, rather than go to each individual page and manually add this tag, so the original reference page can remain the original. We have implemented this tag on one page at the moment, with 9 duplicate pages/URLs, and are monitoring, but was curious if people had experienced this before or had any thoughts?
Intermediate & Advanced SEO | Jellyfish-Agency
-
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated WordPress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages, and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but there isn't anything that provided any further information. Can anyone with experience of dealing with issues like this, or with any knowledge of the topic, please help me to resolve this annoying issue? Any input will be greatly appreciated. Thanks, Paul
Intermediate & Advanced SEO | PaulRogers