Finding the source of duplicate content URL's
-
We have a website that displays a number of products. The product has variations (sizes) and unfortunately every size has its own URL (for now anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URL's for our site as soon as possible)
However, even though these duplicate URL's exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems that there is a flaw in our system somewhere, as these links are now found in our campaign here on SEOmoz.
My question: is there any way to find the crawl path that lead to the URL's that shouldn't have been found, so we can locate the problem?
-
Using the Screaming Frog SEO Spider (free version to download will crawl 500 URLs, paid version [99 GBP for a yearly license] will crawl as much as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, you should find those pages with Screaming Frog, and then you can view the inlinks to those pages. Visit the inlinks, and check the code for the links to the page you're looking for - this will quickly show you where the links are to the pages you're trying to hide.
Also, have you checked the sitemap - the CMS might create links to these pages in the sitemap.
good luck and let me know if you need any more help with this.
Mark
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Source page showsI have 2 h1 tags on my page. I can only find one.
When I grade my page it says I have more than one h1 tag. I view the source page and it shows there are two h1 headings with the same wording. If I delete the one h1 heading I can find, the page source shows I have deleted both of them. I don't know how to get to the other heading to delete it. And I'm off page one of google! Can anybody help? Clay Stephens
Moz Pro | | Coot0 -
Duplicate content nightmare
Hey Moz Community I ran a crawl test, and there is a lot of duplicate content but I cannot work out why. It seems that when I publish a post secondary urls are being created depending on some tags and categories. Or at least, that is what it looks like. I don't know why this is happening, nor do I know if I need to do anything about it. Help? Please.
Moz Pro | | MobileDay0 -
How to Avoid Duplicate Page Content errors when using Wordpress Categories & Tags?
I get a lot of duplicate page errors on my crawl diagnostics reports from 'categories' and 'tags' on my wordpress sites. The post is 1x link and then the content is 'duplicated' on the 'category' or 'tag' that is added to the page. Should I exclude the tags and categories from my sitemap or are these issues not that important? Thanks for your help Stacey
Moz Pro | | skehoe1 -
How does SEOmoz pull its duplicate page title and content information?
I ask because I am getting errors based on URLs that do not even exist on our site. For example: http://www.robots.com/applications/abb/panasonic/robots this URL does not even exist for our site, but somehow it is listed in the error section of page title duplication tool. http://www.robots.com/applications/ exists, but there is no place to get to an ABB or a Panasonic robot from this page, not to mention an ABB/Panasonic (which for sure does not exist). ?? We have quite a few of these out there and just wondering how to find out where the link is coming from. When we checked our URLs through Integrity, links like the one listed above (which we had 29 of them listed) that do not show up. Thoughts? Thanks! Janelle
Moz Pro | | jwanner0 -
Crawl Diagnostics Warnings - Duplicate Content
Hi All, I am getting a lot of warnings about duplicate page content. The pages are normally 'tag' pages. I have some news stories or blog posts tagged with multiple 'tags'. Should I ask google not to index the tag pages? Does it really affect my site? Thanks
Moz Pro | | skehoe0 -
Finding page authority for a list of sites
I went to Distilled's linklove conference earlier this week. Many of the presenters showed spreadsheets where they had pulled in Page Authority for a long list of sites. What tool can you use to quickly do this? For instance, I want to be able to copy a list of 100 sites into a tool and have it spit out the PA for every site...anybody know how to do this?
Moz Pro | | znotes0 -
How Can I Find All Backlinks
How can I use site explorer to find out which sites are linking to us thousands of times. It says we have over 300,000 total links pointing to our site. I'm thinking there are some sitewide links from other site(s) making up most of that, but I can't seem to locate it/them?
Moz Pro | | poolguy0 -
What's the best & quick tool to spy or analyze backlinks of SINGLE Page?
hey, someone I want to know why this page or blog post of my competition website ranking better.. I've many tools to spy & analyze domain name, but personalty I want to spy just one page of this or that website or blog.. I hope you could help me and share some tools or the best once for this job. thnx again
Moz Pro | | akitmane0