Best way to deal with over 1000 pages of duplicate content?

benjmoz

Hi

According to the Moz tools, I have over 1,000 pages of duplicate content, which is a bit of an issue!

95% of the issues arise from our news section and news archive, as it's been going for some time now.

We upload around 5 full articles a day. Each article has a standalone page but can only be reached via a master archive. The master archive sits in a top-level section of the site and shows snippets of the articles; clicking a snippet takes the user to the full article page. When a news article is added, the existing snippets move down the page and eventually onto the next page as more articles are added.

The problem is that the standalone articles can only be reached via the snippet on the master page, and Google is stating this is duplicate content because the snippet is a duplicate of the article.

What is the best way to solve this issue?

From what I have read, using a 'Meta NoIndex' seems to be the answer (not that I know what that is). From what I have read, you can only use a canonical tag on a page-by-page basis, so that's going to take too long.

Thanks,
Ben

benjmoz

Hi Guys,

Thanks for your help.

I decided that updating the robots.txt file would be the best option.

Ben

MattAntonino

Technically, your URL:

http://www.capitalspreads.com/news

is really:

http://www.capitalspreads.com/news/index.php

So just add this line to robots.txt:

Disallow: /news/index.php

You won't be disallowing the pages underneath it, but you will be blocking the page that contains all the dupe content.
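One way to sanity-check which URLs a rule like this covers is Python's standard-library robots.txt parser. The sketch below is only an illustration: the `User-agent: *` line and the helper name `crawl_permissions` are assumptions (the thread shows only the Disallow line), and the parser applies plain path-prefix matching, which may differ from how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rule from the thread. The "User-agent: *" group header is an
# assumption; the thread shows only the Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /news/index.php
"""

def crawl_permissions(robots_txt, urls, user_agent="*"):
    """Return {url: True if crawling is allowed, False if disallowed}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}

URLS = [
    "http://www.capitalspreads.com/news/index.php",
    "http://www.capitalspreads.com/news",
    "http://www.capitalspreads.com/news/uk-economic-recovery-will-take-years",
]

permissions = crawl_permissions(ROBOTS_TXT, URLS)
for url, allowed in permissions.items():
    print("allowed" if allowed else "blocked", url)
```

Under prefix matching, only the URL whose path starts with /news/index.php is blocked. The article URL stays crawlable, as intended, but so does the extensionless /news URL, so it is worth confirming which form of the archive URL crawlers actually request.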

Also, if you prefer to do this with a meta tag on the news page, you could always use "noindex, follow" to make sure Google still follows the links; it just won't index the page.
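For anyone unsure what that tag looks like, here is a minimal sketch: a hypothetical `<head>` for the snippets page carrying the robots meta tag, plus a small standard-library parser that reads the directives back. The markup, the class name `RobotsMetaParser`, and the page title are made up for illustration; only the "noindex, follow" value comes from the thread.

```python
from html.parser import HTMLParser

# Hypothetical <head> for the snippets page; only the robots meta tag matters.
SNIPPETS_PAGE_HEAD = """<head>
<title>News</title>
<meta name="robots" content="noindex, follow">
</head>"""

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated; normalise case and whitespace.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )

parser = RobotsMetaParser()
parser.feed(SNIPPETS_PAGE_HEAD)
print(sorted(parser.directives))  # ['follow', 'noindex']
```

A check like this could be run against the live archive page after the template change to confirm the tag is actually being served.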

Chris.Menke

It may not be helpful to you in this situation. I was just saying that if your server creates multiple URLs containing the same content, as long as those URLs also contain the identical rel=canonical directive, a single canonical version of that content will be established.
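To illustrate the idea, the sketch below takes several URL variants serving the same article, each carrying the same rel=canonical link, and groups them by the canonical URL they declare. The query-string variants and the names `CanonicalParser` and `canonical_of` are hypothetical; only the article URL appears in the thread.

```python
from collections import defaultdict
from html.parser import HTMLParser

ARTICLE_URL = "http://www.capitalspreads.com/news/uk-economic-recovery-will-take-years"

# Every variant serves the same <head>, including the same canonical link.
HEAD = f'<head><link rel="canonical" href="{ARTICLE_URL}"></head>'

# Hypothetical URL variants that a server might expose for one article.
pages = {
    ARTICLE_URL: HEAD,
    ARTICLE_URL + "?ref=archive": HEAD,
    ARTICLE_URL + "?print=1": HEAD,
}

class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attr.get("href")

def canonical_of(html):
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical

# Group the requested URLs by the canonical URL each page declares.
groups = defaultdict(list)
for url, html in pages.items():
    groups[canonical_of(html)].append(url)

print(len(groups))  # 1: all three variants point at the same canonical URL
```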

benjmoz

Hi Chris,

I've read about canonicalization, but from what I could work out I'd have to tag each of the 400-plus pages individually to solve the issue, and I don't think this is the best use of anyone's time.

I don't understand how placing the tag and pointing it back at itself will help. Can you explain a little more?

Ideally I want the full article page to be indexed, as this will be more beneficial to the user. By placing the canonical tag on the snippets page and pointing it to itself, would I not be telling the spider that this is the page to index?

Here are some examples:

http://www.capitalspreads.com/news - Snippets page

http://www.capitalspreads.com/news/uk-economic-recovery-will-take-years - Full article, which would ideally be the page that gets indexed.

Regards

Ben

Chris.Menke

Ben, you put the rel=canonical directive in the <head> of the page that is the original source of the content (pointing to itself), and every reproduction of that page also contains the rel=canonical directive pointing to the original source. So it's not necessarily a page-by-page solution. Have you read through this yet? Canonicalization and the Canonical Tag - Learn SEO - Moz
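As a rough sketch of why this need not be page-by-page work: if the site renders articles from a shared template, one helper can emit a self-referencing canonical tag for every page. The function name `canonical_link_tag` and the normalisation rules (dropping a trailing /index.php, the query string, and the fragment) are assumptions for illustration, not something the thread or the site's CMS confirms.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_link_tag(requested_url):
    """Build a <link rel="canonical"> tag for the page being rendered.

    Assumed normalisation: strip a trailing /index.php and drop the
    query string and fragment, so URL variants share one canonical URL.
    """
    parts = urlsplit(requested_url)
    path = parts.path
    if path.endswith("/index.php"):
        path = path[: -len("/index.php")] or "/"
    canonical = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'

print(canonical_link_tag("http://www.capitalspreads.com/news/index.php"))
# <link rel="canonical" href="http://www.capitalspreads.com/news">
```

Called once from the shared page template, a helper along these lines would tag every article without anyone editing the pages individually.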
