Duplicate content that looks unique
-
OK, a bit of an odd one. The SEOmoz crawler has flagged the following pages as duplicate content. Does anyone have any idea what's going on?
http://www.gear-zone.co.uk/blog/november-2011/gear$9zone-guide-to-winter-insulation
http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone
http://www.gear-zone.co.uk/blog/july-2011/telephone-issues-$9-2nd-july-2011
http://www.gear-zone.co.uk/blog/september-2011/gear$9zone-guide-to-nordic-walking-poles
https://www.google.com/webmasters/tools/googlebot-fetch?hl=en&siteUrl=http://www.gear-zone.co.uk/
-
Good question, because those pages look different to a human. The SEOmoz web app uses a similarity threshold of 95% of the HTML code. This takes everything on the page into account, both hidden and visible.
In this case, it's counting all of the navigation and the sidebar as well, which is a significant amount of code. The unique content that's left, the part that matters, makes up less than 5% of the code.
Here's a tool you can use to check the similarity: http://www.duplicatecontent.net/
I ran the pages through a couple of tools, which showed 96% HTML similarity (but only 92% text similarity, which is good, but not great).
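If you want to see the HTML-versus-text difference for yourself, here's a rough sketch in Python. To be clear, I don't know how SEOmoz actually computes its score, so this uses `difflib.SequenceMatcher` from the standard library as a stand-in, and the two pages are made-up examples (a shared block of 50 navigation links plus one unique paragraph each), not the real gear-zone pages. Only the 95% threshold comes from the discussion above.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would see, with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def similarity(a, b):
    """Similarity ratio between 0 and 1 (an assumption, not SEOmoz's metric)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


# Hypothetical site-wide navigation shared by every page.
NAV = "<ul>" + "".join(
    f'<li><a href="/cat/{i}">Category {i}</a></li>' for i in range(50)
) + "</ul>"


def page(body):
    """Wrap a unique body in the shared (hypothetical) page template."""
    return f"<html><body>{NAV}<div id='post'>{body}</div></body></html>"


page_a = page("<p>Our guide to winter insulation: down versus synthetic fill.</p>")
page_b = page("<p>Choosing Nordic walking poles: length, grip and tip type.</p>")

THRESHOLD = 0.95  # the 95% figure mentioned above

html_score = similarity(page_a, page_b)
text_score = similarity(visible_text(page_a), visible_text(page_b))

print(f"HTML similarity: {html_score:.1%}")
print(f"Text similarity: {text_score:.1%}")
print("Flagged on HTML:", html_score >= THRESHOLD)
```

The point of the sketch is that the shared template dominates the HTML comparison, so two posts with completely different copy can still score above the threshold. Comparing only the visible text gives a lower score, because the markup around each navigation link no longer pads out the shared portion.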
For perspective, take a look at Google's cached version of one of these pages. This is how Googlebot sees the page: http://webcache.googleusercontent.com/search?q=cache:4fKrbNTUnegJ:www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone+http://www.gear-zone.co.uk/blog/september-2011/win-a-the-north-face-nuptse-2-jacket-with-gear-zone&hl=en&gl=us&strip=1G
Since Panda, when I see a site with this many navigation links, I usually advise the owners to restructure their site architecture into more of a pyramid shape, which reduces the overall navigation on each page.
There are 2 ways to look at this: First of all, Google is much more sophisticated than SEOmoz at detecting duplicate content, and they are also better at contextual analysis - so they can probably tell these are not true duplicates.
Hope this helps! Best of luck with your SEO.
-
SEOmoz looks at the code on the page when it looks at duplicate content scores. My hunch is that there's a lot of identical code on those pages, which is causing the warning.