Block /tag/ or not?
-
I've asked this question in another area, but now I want to ask it as a bigger question: do we block /tag/ with robots.txt or not? Here's why I ask:
My WordPress site does not block /tag/, and I have had many /tag/ results in the top 10 of Google for months. The question is: does Google see /tag/ pages on WordPress as duplicate content? SEOMoz says it's duplicate content, but it's a tag. It's not really content per se.
I'm all for optimizing my site, but Google is not penalizing me for /tag/ results.
I don't want to block /tag/ if Google is not seeing it as duplicate content, for one reason only: I have many results in the top 10 on Google.
So, can someone who knows more about this weigh in on the subject? I would really like an accurate answer.
Thanks in advance...
-
Thanks for all the info. Last question: does having a list of monthly archives at the bottom of my site hurt in terms of duplicate content? I just have the month/year at the bottom, and when you click it, it shows all the posts in that month. Should I remove this, or does it matter?
-
It would be meta noindex. Yoast is my plugin of choice. I happen to have a little article right here if you need to check whether it's "safe" to remove them from a traffic standpoint.
-Dan
-
I use All in One SEO Pack and have checked noindex for the tags, the categories, and the archives. I suppose it doesn't make any difference whether I do it there or in the robots.txt file; either way they're being blocked. Do you know if there's a penalty for having blocked them both in WP and in the robots file?
-
I'd say noindex, follow them; many SEO plugins can do this for you, Yoast SEO for example. That way Googs can still crawl them, which may assist with discovery, but won't index them.
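For reference, what these plugins do under the hood is emit a robots meta tag in the `<head>` of each tag archive. A minimal sketch of that markup (not tied to any particular plugin's exact output):

```html
<!-- On each tag archive page: links stay crawlable, but the page stays out of the index -->
<meta name="robots" content="noindex, follow">
```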
-
Exactly what I was looking for. Thank you!
So, I suppose the best and proper way to block it is via robots.txt, correct?
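If you do go the robots.txt route, you can sanity-check what a rule actually blocks before deploying it. A minimal sketch using Python's standard-library robots.txt parser (example.com and the URLs are placeholders):

```python
from urllib import robotparser

# Hypothetical robots.txt rules blocking WordPress tag archives
rules = [
    "User-agent: *",
    "Disallow: /tag/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Tag archives are blocked; regular posts remain crawlable
print(rp.can_fetch("Googlebot", "https://example.com/tag/widgets/"))   # False
print(rp.can_fetch("Googlebot", "https://example.com/2011/05/post/"))  # True
```

Note that a robots.txt disallow only stops crawling; it is not the same as a noindex, which is why the plugin approach above behaves differently.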
-
You mean "more about this" than me? I run 3 businesses on 3 Wordpress blogs. I've done the research. Many of my clients are Wordpress users. But here's what others think:
- Yoast thinks it's duplicate content: http://yoast.com/articles/wordpress-seo/#advancedseo
- David Fuller ranked for tags, then didn't: http://www.seomoz.org/q/wordpress-tags-duplicate-content At the same link, Dan at Evolving thinks you should noindex tags as well.
- WPMU and Matt Cutts think it's duplicate content: http://wpmu.org/categories-tags-and-how-to-avoid-duplicate-content-on-wordpress/
- How to Tech thinks it's duplicate content: http://howtotechtips.com/remove-wordpress-duplicate-content-search-results-and-tags-from-google/
- As you said, SEOMoz thinks it's duplicate content.
- Many Warriors suggest noindexing tags for dupe content reasons: http://www.warriorforum.com/adsense-ppc-seo-discussion-forum/373744-wordpress-tags-death-me-duplicate-content-question.html
- 3 other pro SEOs say to noindex here: http://www.seomoz.org/q/solving-link-and-duplicate-content-errors-created-by-wordpress-blog-and-tags
A Google search shows:
No results found for "tags do not create duplicate content".
No results found for "tags are not duplicate content".
And 2.5 million results for tags "duplicate content".
The short-term answer is that you're ranking for them now, so leave them be.
The long-term answer is that it's duplicate content and you need to fix it.
Even if your tag pages don't show the entire post, multiple tag pages show the same excerpt. That is duplicate content by itself, before even considering the post.
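To make that concrete, here's a toy sketch (the post data is invented) of how a single excerpt fans out across every tag archive a post belongs to:

```python
# Toy model with hypothetical post data: every tag a post carries gets its own
# archive URL, and each of those archives lists the same excerpt verbatim.
posts = [
    {"excerpt": "Everything you need to know about blue widgets...",
     "tags": ["widgets", "blue-widgets", "los-angeles"]},
]

tag_pages = {}
for post in posts:
    for tag in post["tags"]:
        tag_pages.setdefault(f"/tag/{tag}/", []).append(post["excerpt"])

# One excerpt, three different indexable URLs
for url, excerpts in tag_pages.items():
    print(url, excerpts)
```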
**You said:** _SEOMoz says it's duplicate content but it's a tag. It's not really content per se._
If you want to see with your own eyes the duplicate content, please post a URL.
Related Questions
-
Canonical Tags for Legacy Duplicate Content
I've got a lot of duplicate pages, especially products; some are new, but most have been like this for a long time, up to several years. Does it make sense to use a canonical tag pointing to one master page for each product? Each page is slightly different, with a different feature, and includes maybe a sentence or two that is unique, but everything else is the same.
Technical SEO | AmberHanson
-
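A minimal sketch of the canonical approach described in the question (the URL is hypothetical): each near-duplicate variant page points at the chosen master page from its `<head>`:

```html
<!-- On each product variant page; the master-page URL is a placeholder -->
<link rel="canonical" href="https://example.com/products/master-widget/">
```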
Blocking Google from telemetry requests
At Magnet.me we track the items people are viewing in order to optimize our recommendations. As such we fire POST requests back to our backends every few seconds when enough user-initiated actions have happened (think about scrolling, for example). In order to eliminate bots from distorting statistics, we ignore their values server-side. Based on some internal logging, we see that Googlebot is also performing these POST requests in its JavaScript crawling. In a 7-day period, that amounts to around 800k POST requests. As we are ignoring that data anyhow, and it is quite a number, we considered reducing this for bots. We had several questions about this, though:
1. Do these requests count towards crawl budgets?
2. If they do, and we'd want to prevent this from happening: what would be the preferred option? Either preventing the request in the frontend code, or blocking the request using a robots.txt line? The latter question arises because an in-app block for the request could lead to different behaviour for users and bots, and maybe Google could penalize that as cloaking. The robots.txt option is slightly less convenient from a development perspective, as all logic is spread throughout the application. I'm aware one should not cloak, or make pages appear differently to search engine crawlers. However, these requests do not change anything in the pages' behaviour, and purely send some anonymous data so we can improve future recommendations.
Technical SEO | rogier_slag
-
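If the robots.txt route were chosen for a case like this, the rule would be a single disallow for the tracking endpoint; a sketch, assuming a hypothetical endpoint path:

```
User-agent: *
Disallow: /api/telemetry
```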
Blocking subdomains without blocking sites...
So let's say I am working for bloggingplatform.com, and people can create free sites through my tools and those sites show up as myblog.bloggingplatform.com. However that site can also be accessed from myblog.com. Is there a way, separate from editing the myblog.com site code or files, for me to tell google to stop indexing myblog.bloggingplatform.com while still letting them index myblog.com without inserting any code into the page load? This is a simplification of a problem I am running across. Basically, Google is associating subdomains to my domain that it shouldn't even index, and it is adversely affecting my main domain. Other than contacting the offending sub-domain holders (which we do), I am looking for a way to stop Google from indexing those domains at all (they are used for technical purposes, and not for users to find the sites). Thoughts?
Technical SEO | SL_SEM
-
Correct linking to the /index of a site and subfolders: what's the best practice? Link to domain.com/ or domain.com/index.html?
Dear all, starting with my .htaccess file:
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www.inlinear.com$ [NC]
RewriteRule ^(.*)$ http://inlinear.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^.*/index.html
RewriteRule ^(.*)index.html$ http://inlinear.com/$1 [R=301,L]
1. I redirect all URL requests with www. to the non-www version...
2. All requests with "index.html" are redirected to "domain.com/".
My questions are:
A) When linking from a page to my frontpage (home), the best practice is "http://domain.com/" and NOT "http://domain.com/index.php", right?
B) When linking to the index of a subfolder ("http://domain.com/products/index.php"), I should also link to "http://domain.com/products/" and not include the index.php, right?
C) When I define the canonical URL, should I also define it as just "http://domain.com/products/", or in this case should I link to the definite file "http://domain.com/products/index.php"?
Are A) and B) the best practice? And C)? Thanks for all replies! 🙂
Holger
Technical SEO | inlinear
-
Wiki/Knowledge bases
Hi, a client of mine is creating a knowledge base/wiki for their website. They're using their supplier's own knowledge base (basically they're a reseller). What would be the best practice with regards to duplicate content? Would it be best to make all the pages "nofollow" and block the pages via robots.txt?
Technical SEO | Cocoonfxmedia
-
Too Many noindex,follow Tags
Can you have too many pages on one site with noindex,follow tags? Just curious, because we're looking to noindex,follow lesser important pages such as bios about our team, privacy and terms, etc.
Technical SEO | Prospector-Plastics
-
Title tag same text as H1?
What is the group's opinion on whether or not the title tag should have the exact same text as the H1 tag on the same page? Obviously both should contain the phrase that page is optimized for, but is it better to have them be variants of each other, or both the same and maybe equal to the key phrase that page is optimized for? Thanks.
Example:
title: los angeles blue widgets
h1: los angeles blue widgets
Or:
title: los angeles blue widgets
h1: blue widgets in los angeles
Where the page is trying to optimize for "los angeles blue widgets".
Technical SEO | scanlin
-
Robots.txt and canonical tag
In the SEOmoz post - http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts, it's being said - If you have a robots.txt disallow in place for a page, the canonical tag will never be seen. Does it so happen that if a page is disallowed by robots.txt, spiders DO NOT read the html code ?
Technical SEO | seoug_2005