"Duplicate" Page Titles and Content

Horizon

Hi All,

This is a rather lengthy one, so please bear with me!

SEOmoz has recently crawled 10,000 webpages from my site, FrenchEntree, and has returned 8,000 errors of duplicate page content. The main reason I have so many is the directories I have on the site.

The site is broken down into two levels of hierarchy: "Weblets" and "Articles". A weblet is a landing page, and articles are created within these weblets. Weblets can hold any number of articles, from 0 to 1,000,000 (in theory), and an article must be assigned to a weblet in order for it to work. Here's roughly how it looks in URL form: http://www.mysite.com/[weblet]/[articleID]/

Now, our directory results pages are weblets with standard content in the left and right hand columns, but the information in the middle column is pulled in from our directory database following a user query. This happens by adding the query string to the end of the URL. We have 3 main directory databases, but perhaps around 100 weblets promoting various 'canned' queries that users may want to navigate straight into. However, any one of the 100 directory-promoting weblets could return any query from the parent directory database with the correct query string. The problem with this method (as pointed out by the 8,000 errors) is that each possible permutation of search is considered to be its own URL, and therefore its own page.

The example I will use is the first alphabetically. "Activity Holidays in France":

http://www.frenchentree.com/activity-holidays-france/ - This link shows you a results weblet without the query at the end, and therefore only displays the left and right hand columns as populated.

http://www.frenchentree.com/activity-holidays-france/home.asp?CategoryFilter= - This link shows you the same weblet with an 'open' query on the end, i.e. display all results from this database. Listings are displayed in the middle.

There are around 500 different URL permutations for this weblet alone when you take into account the various categories and cities a user may want to search in.

What I'd like to do is to prevent SEOmoz (and therefore search engines) from counting each individual query permutation as a unique page, without harming the visibility that the directory results receive in SERPs. We often appear in the top 5 for quite competitive keywords and we'd like it to stay that way. I also wouldn't want some sort of robot exclusion or canonical classification to leave the search engine results displaying only (and therefore directing the user through to) an empty weblet.

Does anyone have any advice on how best to remove the "duplication" problem, whilst keeping the search visibility? All advice welcome.

Thanks

Matt

Horizon

Thanks for the swift response, Gianluca. I think I understand the problem you have pointed out, but I'm rather surprised that it has been set up in such a way... Or that it would have more of an adverse effect than multiple URLs with the same standard content. I'm willing to change that to see if it fixes the problem though.

Please take all of the time you need... It is a very large site which has been pieced together, bit-by-bit, over many years!

Matt

IPINGlobal54

In addition to Gianluca's response there: on the pages that you tag with "noindex,follow" (i.e. the duplicates), add a canonical tag pointing at the original page.
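As a rough illustration of the canonical suggestion, here is a minimal Python sketch that derives one canonical URL from a query permutation. It is not how the site is actually built. The function names and the `IGNORED_PARAMS` set are hypothetical, and treating `pagenumber`, `order`, `option` and `webname` as parameters that do not change which listings are shown is an assumption you would need to check against the real directory.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters only affect paging/sorting, not which
# listings are returned. Verify against the real directory first.
IGNORED_PARAMS = {"pagenumber", "order", "option", "webname"}


def canonical_url(url: str) -> str:
    """Drop ignored and repeated parameters, then sort what is left."""
    parts = urlsplit(url)
    kept = {}
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Keep the first occurrence of each meaningful parameter only.
        if key not in IGNORED_PARAMS and key not in kept:
            kept[key] = value
    query = urlencode(sorted(kept.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the page head."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'
```

With these assumptions, the page 2 URL quoted in this thread would point back at http://www.frenchentree.com/activity-holidays-france/home.asp?CategoryFilter= rather than at the empty weblet, which is the outcome Matt asked for.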

gfiorelli1

I think your problem of duplicated content is also due to the pagination that your categories (or no-category search results) have.

Checking the second URL you gave, http://www.frenchentree.com/activity-holidays-france/home.asp?order=Sort1&option=&CategoryFilter=&webname=activity-holidays-france&webname=activity-holidays-france&pagenumber=1 and its "second" page, http://www.frenchentree.com/activity-holidays-france/home.asp?order=Sort1&option=&CategoryFilter=&webname=activity-holidays-france&pagenumber=2 I noticed that you have the meta robots in the head... therefore the bots see and index all of this paginated content, which is substantially a duplicate of page 1.

I suggest you start adding the noindex,follow meta robots to these pages.

About the other duplication issues... give me time, as your site is not so easy.
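To make the noindex,follow suggestion concrete, here is a minimal Python sketch of how a template could pick the robots value from the URL. The function name `robots_meta` is hypothetical, and it assumes that only pages after the first (`pagenumber` greater than 1) should be kept out of the index, while page 1 stays indexable.

```python
from urllib.parse import parse_qs, urlsplit


def robots_meta(url: str) -> str:
    """Return the robots meta tag for a directory results URL."""
    query = parse_qs(urlsplit(url).query)
    # A missing or blank pagenumber is treated as page 1 (assumption).
    raw = query.get("pagenumber", ["1"])[-1]
    page = int(raw) if raw.isdecimal() else 1
    # Later pages are not indexed, but their links are still followed.
    content = "noindex,follow" if page > 1 else "index,follow"
    return f'<meta name="robots" content="{content}">'
```

The "follow" part matters here: the bots can still reach the individual listings through the paginated pages, even though those pages themselves stay out of the index.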
