Finding the source of duplicate content URLs

DocdataCommerce

We have a website that displays a number of products. Each product has variations (sizes), and unfortunately every size has its own URL (for now, anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URLs for our site as soon as possible.)

However, even though these duplicate URLs exist, you should not be able to land on them by navigating through the site. In theory, the site should always link to the smallest size. There seems to be a flaw in our system somewhere, as these URLs are now showing up in our campaign here on SEOmoz.

My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?

Mark_Ginsberg

Using the Screaming Frog SEO Spider (the free version will crawl 500 URLs; the paid version [99 GBP for a yearly license] will crawl as many as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, find those pages in Screaming Frog, and then view their inlinks. Visit the linking pages and check the code for the links to the page you're looking for. This will quickly show you where the links to the pages you're trying to hide are coming from.
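
If you would rather script the same idea yourself, here is a minimal sketch in Python. It is not how Screaming Frog works internally, just an illustration of the technique: a breadth-first crawl that remembers which page first linked to each URL (the crawl path) and every page that links to it (the inlinks). All names here (`crawl_paths`, `path_to`, `PAGES`, the `shop.example` URLs) are made up for the example, and `fetch` is an assumed callable that returns a page's HTML or `None`. The demo uses an in-memory dictionary instead of real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute link targets found in html, without #fragments."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.hrefs]


def crawl_paths(start_url, fetch):
    """Breadth-first crawl of one host.

    Returns (parent, inlinks): parent maps each URL to the page where the
    crawl first discovered it; inlinks maps each URL to all linking pages.
    """
    host = urlparse(start_url).netloc
    parent = {start_url: None}
    inlinks = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        for link in extract_links(url, html):
            if urlparse(link).netloc != host:
                continue  # stay on the site being audited
            inlinks.setdefault(link, set()).add(url)
            if link not in parent:
                parent[link] = url
                queue.append(link)
    return parent, inlinks


def path_to(url, parent):
    """Walk the parent pointers back to the start page."""
    path = []
    while url is not None:
        path.append(url)
        url = parent[url]
    return path[::-1]


# Hypothetical site: only /related links to the large size by mistake.
PAGES = {
    "https://shop.example/": '<a href="/category">Category</a>',
    "https://shop.example/category": (
        '<a href="/product-small">Product</a> <a href="/related">Related</a>'
    ),
    "https://shop.example/related": '<a href="/product-large">Stray link</a>',
    "https://shop.example/product-small": "",
    "https://shop.example/product-large": "",
}

parent, inlinks = crawl_paths("https://shop.example/", PAGES.get)
print(path_to("https://shop.example/product-large", parent))
print(sorted(inlinks["https://shop.example/product-large"]))
```

To run this against a live site you would replace `PAGES.get` with a function that downloads each URL, and you would probably want to add a page limit and a polite delay between requests.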

Also, have you checked the sitemap? The CMS might be creating links to these pages in the sitemap.
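
The sitemap check can also be scripted. The sketch below assumes a standard XML sitemap (the sitemaps.org `urlset` format) and a list of URLs you already know should not be exposed. The function names, the sample sitemap, and the `shop.example` URLs are all hypothetical, and it reads the XML from a string, so you would need to download your own sitemap first.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def unwanted_in_sitemap(xml_text, unwanted):
    """Return the unwanted URLs that the sitemap exposes, sorted."""
    return sorted(set(sitemap_urls(xml_text)) & set(unwanted))


# Hypothetical sitemap that lists one size variant it should not.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://shop.example/product-small</loc></url>
  <url><loc>https://shop.example/product-large</loc></url>
</urlset>"""

UNWANTED = [
    "https://shop.example/product-large",
    "https://shop.example/product-medium",
]

found = unwanted_in_sitemap(SAMPLE_SITEMAP, UNWANTED)
print(found)
```

Any URL this reports is being handed to crawlers directly by the sitemap, regardless of whether the site's navigation links to it.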

Good luck, and let me know if you need any more help with this.

Mark
