Blog tags are creating excessive duplicate content...should we use rel canonicals or 301 redirects?
-
We are having an issue with our client's blog creating excessive duplicate content via blog tags. The duplicate webpages from tags offer absolutely no value (we can't even see the tag). Should we just 301 redirect the tagged page or use a rel canonical?
-
The easiest way to resolve issues with tags is to noindex them. I wrote a post about how you can safely do this: http://www.evolvingseo.com/2012/08/10/clean-sweep-yo-tag-archives-now (you basically just double check to see if they are receiving traffic, and leave the few that receive traffic via search indexed).
But at the root level it comes down to knowing how to use tags correctly on a blogging platform to begin with - and knowing how they function, and what happens when you tag something.
First off, tagging any post creates a new page called a "tag archive". The only way someone can get to tag archives by default is if you allow some sort of navigation or links to them on the site itself. This is usually in the form of a "tag cloud" (sidebar or footer) or at the bottom of posts when it says "tagged in....." and links to the tags.
Then if they are internally linked to, they will get indexed (unless you noindex them like I have suggested above). They are typically low to no-value pages because most bloggers just tag everything, and use lots of tags per post. Then you end up with hundreds of pages (tag archives) with no value.
So noindexing them is the safest way to go, except for very extreme cases where a blogger uses them 100% perfectly (which is rare, so I always assume most people asking should just noindex, but use my post to check for traffic to any of them first).
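The traffic check described above could be sketched roughly like this in Python. It is only an illustration, assuming you have already exported organic search visits per tag archive URL from your analytics tool. The function name `split_tag_archives`, the `min_visits` cutoff, and the sample URLs and numbers are all hypothetical and do not come from the linked post.

```python
# The standard robots meta tag that keeps a page out of the index.
ROBOTS_NOINDEX = '<meta name="robots" content="noindex">'


def split_tag_archives(visits_by_url, min_visits=1):
    """Split tag archive URLs into (keep_indexed, noindex) lists.

    visits_by_url: mapping of tag archive URL -> organic search visits
    min_visits: hypothetical cutoff; archives at or above it stay indexed
    """
    keep_indexed, noindex = [], []
    for url, visits in sorted(visits_by_url.items()):
        if visits >= min_visits:
            keep_indexed.append(url)
        else:
            noindex.append(url)
    return keep_indexed, noindex


# Made-up numbers for illustration only.
sample = {
    "/tag/seo/": 42,
    "/tag/misc/": 0,
    "/tag/news/": 0,
}
keep, drop = split_tag_archives(sample)
print("leave indexed:", keep)
print("add", ROBOTS_NOINDEX, "to:", drop)
```

Where you set the cutoff is a judgment call; the point is simply that the few archives with real search traffic stay indexed and the rest get the noindex tag.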
-
Thanks for chiming in! Just to reiterate something: canonical tags are only a suggestion, not a hard directive. Google can and does ignore them. The canonical tag can also pass noindex directives to the page you point it at. So with tag archives, if they are set to noindex and you canonical them to posts, you might deindex your posts.
And finally, canonical should only be used for problems that can't be solved via indexation, crawling or architecture solutions. In the case of tags in a blogging system (probably WordPress), the easiest and 100% definite way to handle tags is just to noindex them. Then you don't need to worry about canonicals or duplicate content.
Also, tags are not harmful because of duplicate content per se; the issue is just that they add a lot of unneeded pages to the index.
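If you want to check whether a tag archive has the risky combination described above (noindex plus a canonical pointing at some other URL), a rough Python sketch using only the standard library might look like this. The class and function names (`HeadDirectives`, `has_conflict`) and the example.com URLs are made up for illustration, and it assumes you already have the page HTML as a string.

```python
from html.parser import HTMLParser


class HeadDirectives(HTMLParser):
    """Collect the robots meta content and canonical href from a page."""

    def __init__(self):
        super().__init__()
        self.robots = ""
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


def has_conflict(html, page_url):
    """True if the page is noindexed AND canonicals to a different URL."""
    parser = HeadDirectives()
    parser.feed(html)
    points_elsewhere = parser.canonical not in (None, page_url)
    return "noindex" in parser.robots and points_elsewhere


# Hypothetical tag archive markup, for illustration only.
tag_archive = """
<head>
  <meta name="robots" content="noindex, follow">
  <link rel="canonical" href="https://example.com/a-blog-post/">
</head>
"""
print(has_conflict(tag_archive, "https://example.com/tag/seo/"))
```

The URL comparison here is a plain string match, so trailing slashes or www/non-www differences would need normalizing before relying on it.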
-
You can set tags to noindex/follow. If you're using WordPress and one of the more popular SEO plugins, this could be done with a couple of clicks. But are these tags actually generating duplicate content? Usually a snippet of the tagged posts isn't considered duplicate.
Anyway, noindex should be more effective than it was in the past. And as Highland has said, setting a canonical would be a good idea as well.
If the tags aren't really helping site users (they aren't using them, etc.) and they don't have any link equity, you could just 410 them. Plus you could submit the tag URLs for removal in GWT (Google Webmaster Tools).
So check the referral traffic and backlinks for those pages and go with either removal or noindex follow and a canonical.
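For anyone unsure what "410 them" means in practice: the server answers requests for the removed tag URLs with the HTTP status 410 Gone instead of 200 or 404. On WordPress you would normally do this with a plugin or a server rule, so the Python below is only a minimal sketch of the idea as a tiny WSGI app. The `RETIRED_TAG_PATHS` set, the `app` and `status_for` names, and the paths are all hypothetical.

```python
# Hypothetical tag archives that had no referral traffic and no backlinks.
RETIRED_TAG_PATHS = {"/tag/misc/", "/tag/news/"}


def app(environ, start_response):
    """Tiny WSGI app: 410 Gone for retired tag archives, 200 otherwise."""
    path = environ.get("PATH_INFO", "")
    if path in RETIRED_TAG_PATHS:
        start_response("410 Gone", [("Content-Type", "text/plain")])
        return [b"This tag archive has been removed.\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Placeholder for the normal page.\n"]


def status_for(path):
    """Call the app directly and return the status line it would send."""
    captured = []
    app({"PATH_INFO": path}, lambda status, headers: captured.append(status))
    return captured[0]


print(status_for("/tag/misc/"))
print(status_for("/my-post/"))
```

Tag archives that do have traffic or backlinks would stay out of the retired set and get the noindex, follow treatment instead.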
-
Canonical hands down. This is what canonical was made for anyways: duplicate content you can't remove.
Canonical simply lets you tell Google which duplicate content should "win" the indexation race, and Google will take it into consideration. I can think of many reasons why you'd have overlapping tags but would not want to remove them (which is what a 301 would do).