Finding the source of duplicate content URLs
-
We have a website that displays a number of products. Each product has variations (sizes), and unfortunately every size has its own URL (for now, anyway). Needless to say, this causes duplicate content issues. (And of course, we are looking to change the URLs for our site as soon as possible.)
However, even though these duplicate URLs exist, you should not be able to land on them by navigating through the site. In theory, the site should always display the link to the smallest size. It seems there is a flaw in our system somewhere, as these links have now turned up in our campaign here on SEOmoz.
My question: is there any way to find the crawl path that led to the URLs that shouldn't have been found, so we can locate the problem?
-
Using the Screaming Frog SEO Spider (the free version will crawl 500 URLs; the paid version, 99 GBP for a yearly license, will crawl as many as you want), you can see all of the inlinks to a particular page. So run a crawl of the site, find those unexpected pages in Screaming Frog, and then view the inlinks to each of them. Visit the inlinking pages and check their source code for the links to the page you're looking for - this will quickly show you exactly where the links to the pages you're trying to hide are coming from.
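If you'd rather script the inlink check than use a crawler's export, the idea can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a full crawler: the page URLs and markup below are made-up placeholders, and in practice you'd feed in the HTML of every crawled page on your site plus the list of unexpected size-variant URLs.

```python
# Minimal inlink finder: given crawled pages, record which page links to which
# unexpected URL. All URLs and markup here are hypothetical examples.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in a page's HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_inlinks(pages, targets):
    """pages: {page_url: html}; targets: URLs that shouldn't be linked to.
    Returns {target_url: [pages that link to it]}."""
    inlinks = {t: [] for t in targets}
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            if href in inlinks:
                inlinks[href].append(url)
    return inlinks

# Two hypothetical crawled pages, each leaking a link to a size-variant URL:
pages = {
    "/products/widget": '<a href="/products/widget?size=s">Small</a>',
    "/sale": '<a href="/products/widget?size=m">Medium</a>',
}
print(find_inlinks(pages, {"/products/widget?size=s", "/products/widget?size=m"}))
```

Each entry in the result points you straight at the template or page that is emitting the unwanted link.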
Also, have you checked the sitemap? The CMS might be creating links to these pages there.
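The sitemap check is easy to automate as well. Here's a small sketch with Python's standard library: the sitemap content is an inline example with placeholder URLs (and a hypothetical `size=` query parameter marking the variant pages); in practice you would fetch your real sitemap.xml and pass its text in.

```python
# Check whether unwanted size-variant URLs appear in an XML sitemap.
import xml.etree.ElementTree as ET

# Standard sitemap namespace, per the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

# Hypothetical sitemap containing one leaked size-variant URL:
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/widget</loc></url>
  <url><loc>https://example.com/products/widget?size=m</loc></url>
</urlset>"""

# Flag any variant URLs the CMS has leaked into the sitemap.
unwanted = [u for u in sitemap_urls(sample) if "size=" in u]
print(unwanted)
```

If this turns up matches, the CMS's sitemap generator is one source of the duplicate URLs, independent of any on-page links.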
Good luck, and let me know if you need any more help with this.
Mark