Finding the source of duplicate content URLs
-
We have a website that displays a number of products. Each product has variations (sizes), and unfortunately every size has its own URL (for now, anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URLs for our site as soon as possible.)
However, even though these duplicate URLs exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems there is a flaw in our system somewhere, as these links have now turned up in our campaign here on SEOmoz.
My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?
-
Using the Screaming Frog SEO Spider (the free version will crawl 500 URLs; the paid version, 99 GBP for a yearly license, will crawl as many as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, find those unexpected pages in Screaming Frog, and then view the inlinks to each of them. Visit the inlinking pages and check their source code for the links to the page you're looking for - this will quickly show you exactly where the links to the pages you're trying to hide are coming from.
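If you'd rather script the inlink check than use a crawler's export, the idea can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a full crawler: the page URLs and markup below are made-up placeholders, and in practice you'd feed in the HTML of every crawled page on your site plus the list of unexpected size-variant URLs.

```python
# Minimal inlink finder: given crawled pages, record which page links to which
# unexpected URL. All URLs and markup here are hypothetical examples.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_inlinks(pages, targets):
    """pages: {page_url: html}; targets: URLs that shouldn't be linked to.
    Returns {target_url: [pages that link to it]}."""
    inlinks = {t: [] for t in targets}
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            if href in inlinks:
                inlinks[href].append(url)
    return inlinks

# Two hypothetical crawled pages, each leaking a link to a size-variant URL:
pages = {
    "/products/widget": '<a href="/products/widget?size=s">Small</a>',
    "/sale": '<a href="/products/widget?size=m">Medium</a>',
}
print(find_inlinks(pages, {"/products/widget?size=s", "/products/widget?size=m"}))
```

Each entry in the result points you straight at the template or page that is emitting the unwanted link.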
Also, have you checked the sitemap? The CMS might be creating links to these pages there.
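The sitemap check is easy to automate as well. Here's a small sketch with Python's standard library: the sitemap content is an inline example with placeholder URLs (and a hypothetical `size=` query parameter marking the variant pages); in practice you would fetch your real sitemap.xml and pass its text in.

```python
# Check whether unwanted size-variant URLs appear in an XML sitemap.
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sitemap containing one leaked size-variant URL:
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/widget</loc></url>
  <url><loc>https://example.com/products/widget?size=m</loc></url>
</urlset>"""

# Flag any variant URLs the CMS has leaked into the sitemap.
unwanted = [u for u in sitemap_urls(sample) if "size=" in u]
print(unwanted)
```

If this turns up matches, the CMS's sitemap generator is one source of the duplicate URLs, independent of any on-page links.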
Good luck, and let me know if you need any more help with this.
Mark