How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content which may result in a bunch of these pages being de-indexed. Now although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of these, it's clear that people who find these pages are wanting to find out more information on the school (because everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages, I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with different fields being replaced with the school's name, location, and headteacher's name so that each page varies from the others, will this be enough for Google to realise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of content that is easily created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into placeholders.
Unique content is essentially that: unique content, so this approach is probably not going to cut it. You could still pull certain elements, such as the address, from the database, but you need to remove these duplicate blocks and keep the pages simpler (like a business directory), and ideally add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Where pages like this have value for users but are not search friendly, well-thought-out landing pages that link to them can also be a strategy.
So, assess how well these work as landing pages from search: are visitors arriving from search, or are they coming in from elsewhere? If they come in from elsewhere, you could noindex these pages or block them in robots.txt. Then, target the bigger search terms higher up the tree and create good search landing pages that link to these other pages for users.
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
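To make the robots.txt option concrete, here is a minimal sketch using Python's standard-library robots.txt parser to check which URLs a rule would block. The `Disallow: /Employer.aspx` rule and the landing page URL are assumptions made up for illustration, not the site's real configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: blocks every URL whose path
# starts with /Employer.aspx for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Employer.aspx
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

employer_url = "http://www.eteach.com/Employer.aspx?EmpNo=26626"
# Hypothetical search landing page that links down to the employer pages.
landing_url = "http://www.eteach.com/schools/leeds"

employer_allowed = parser.can_fetch("Googlebot", employer_url)
landing_allowed = parser.can_fetch("Googlebot", landing_url)

print("Employer page crawlable:", employer_allowed)
print("Landing page crawlable:", landing_allowed)
```

Treat the two options as alternatives rather than a pair: a noindex directive (for example a robots meta tag with content "noindex") only works if the crawler is allowed to fetch the page and read it, so a page blocked in robots.txt cannot also rely on noindex.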
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way. It's the nature of the beast with so many pages.
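As a rough illustration of why swapping a few fields leaves pages looking alike, here is a small sketch that compares two pages built from the same template using word shingles and Jaccard similarity. This is a common textbook approach to near-duplicate detection, not Google's actual method, which is not public. The school names, head teachers, and locations are invented for the example.

```python
import re

# Shortened version of the template from the question.
TEMPLATE = (
    "{school} is a busy and dynamic school led by {head} who achieve "
    "excellence every year from Ofsted. Located in {location}, {school} "
    "offers a wide range of experiences both in the classroom and through "
    "extra-curricular activities."
)


def shingles(text, k=3):
    """Return the set of overlapping k-word runs (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two pages from the same template with different (hypothetical) field values.
page_a = TEMPLATE.format(school="Oakfield School", head="Jane Smith", location="Leeds")
page_b = TEMPLATE.format(school="Riverside Academy", head="John Brown", location="Bristol")

# A genuinely different piece of copy, for contrast.
unique_copy = (
    "The science department runs a weekly robotics club, and pupils "
    "recently won a regional coding competition."
)

templated_similarity = jaccard(shingles(page_a), shingles(page_b))
unique_similarity = jaccard(shingles(page_a), shingles(unique_copy))

print(f"Template vs template: {templated_similarity:.2f}")
print(f"Template vs unique copy: {unique_similarity:.2f}")
```

The two templated pages share most of their shingles because only the field values differ, while the unique copy shares almost none. Rotating between 5 templates would only split 18,000 pages into 5 groups of pages that still resemble each other this closely.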
Hope this helps.
Robert