We've just discovered that multiple duplicate URLs are indexed for a site that we're working on. It seems that when new versions of the site were developed over the last couple of years, new page names and URL structures were used each time. All of these are showing up as Duplicate Meta Descriptions in Google's WMT, which is not surprising, as they are essentially the same page with the same content, just sitting on different page names/URLs.
This is an example of the situation, where URL 5 is the current version. Note: all the others are still live and resolve, although they are not linked to from the current site.
- URL 1: www.example.com/blue-tshirts.html (Version 1 - January 2010)
- URL 2: www.example.com/blue-t-shirts.html (Version 2 - July 2010)
- URL 3: www.example.com/blue_t_shirts.html (Version 3 - November 2010)
- URL 4: www.example.com/buy/blue_tshirts.html (Version 4 - January 2011)
- URL 5: www.example.com/buy/bluetshirts.html (Version 5 - April 2011)
Presumably, this is a clear case of duplicate content.
QUESTION: To solve it, should we 301 all of the previous URLs to the current one, i.e. redirect URLs 1-4 to URL 5? Or should some of them be NoIndexed?
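To make the 301 option concrete, here is a rough sketch of what redirecting URLs 1-4 to URL 5 might look like. It assumes an Apache server using mod_alias `Redirect` directives and an `http://` scheme, neither of which is confirmed for this site; the names `OLD_PATHS`, `CURRENT_URL` and `redirect_rules` are made up for illustration.

```python
# Old paths (URLs 1-4 from the list above).
OLD_PATHS = [
    "/blue-tshirts.html",      # URL 1
    "/blue-t-shirts.html",     # URL 2
    "/blue_t_shirts.html",     # URL 3
    "/buy/blue_tshirts.html",  # URL 4
]

# Current page (URL 5). The http:// scheme is an assumption.
CURRENT_URL = "http://www.example.com/buy/bluetshirts.html"


def redirect_rules(old_paths, target):
    """Return one Apache mod_alias 'Redirect 301' line per old path."""
    return [f"Redirect 301 {path} {target}" for path in old_paths]


if __name__ == "__main__":
    # Print the lines that could be pasted into an .htaccess or vhost config.
    for rule in redirect_rules(OLD_PATHS, CURRENT_URL):
        print(rule)
```

On a different server (nginx, IIS, etc.) the rule syntax would differ, but the mapping of the four old paths to the one current URL would be the same.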
To complicate matters, there is pagination on most of them. For example:
- URL 1: www.example.com/blue-tshirts.html (Version 1 - January 2010)
Since URL 5 is the current site, we are going to 'NoIndex, Follow' URLs 5a, 5b and 5c, which is what we understand to be the correct thing to do for paginated pages.
QUESTION: What should we do with URLs 1a, 1b and 1c? Should we apply the same 'NoIndex, Follow', or should they be 301'd to their respective counterparts 5a, 5b and 5c?
QUESTION: In the same way, since URL 4 is the version just before the current live Version 5, does it make a difference whether its paginated pages (i.e. 4a, 4b and 4c) are NoIndexed or 301'd?
Thanks in advance for all responses and suggestions, it's greatly appreciated.