How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content which may result in a bunch of these pages being de-indexed. Now although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of these, it's clear that people who find these pages are wanting to find out more information on the school (because everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages, I want to add content to them.
But my question is...
If I were to make up say 5 templates of generic content with different fields being replaced with the schools name, location, headteachers name so that they vary with other pages, will this be enough for Google to realise that they are not similar pages and will no longer class them as duplicate pages?
e.g. [School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to “Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near duplicates and is the kind of content that can easily be created by pulling fields out of a database and dynamically creating the pages and dropping name, address etc into the placeholders.
Unique content is essentially that, unique content so this approach is probably not going to cut it. You could have certain elements pulled like this such as the address but you need to either remove these duplicate blocks and keep it more simple (like a business directory) and ideally add some unique elements to each page.
These kinds of pages often still rank for very specific queries and also often well thought out landing pages that link to pages like this that have value for users but are not search friendly can be a strategy.
So, assess how well these work as landing pages from search or are they coming in elsewhere? If they come in elsewhere you could no index these pages or block them in robots.txt. Then, target the bigger search terms higher up the tree and create good search landing pages that link to these other pages for users.
This is a real good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteachers name] who achieve excellence every year from ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, we encourage all of our pupils to “Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] ans [location] etc, I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If theres not a lot of competition for the school name and the page has enough content about each individual school, head teacher etc, then "templates" might work. You can try it out but I'd say unique content is the best way. It's the nature of the beast with so many pages.
Hope this helps.
Robert
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Export domain reference full on google search console
Hi all My website is https://simthanglong.vn/ and it have +7000 Referring domains from competitor use tools bad backlink. i want to disavow it but Google Search Console accept export up to 1000 domains. So, What I have to do. Help me Please.67oetuz.jpg
White Hat / Black Hat SEO | | simthanglongdotvn0 -
Technical : Duplicate content and domain name change
Hi guys, So, this is a tricky one. My server team just made quite a big mistake :We are a big We are a big magento ecommerce website, selling well, with about 6000 products. And we are about to change our domaine name for administrative reasons. Let's call the current site : current.com and the future one : future.com Right, here is the issue Connecting to the search console, I saw future.com sending 11.000 links to current.com. At the same time DA was hit by 7 points. I realized future.com was uncorrectly redirected and showed a duplicated site or current.com. We corrected this, and future.com now shows a landing page until we make the domain name change. I was wondering what is the best way to avoid the penalty now and what can be the consequences when changing domain name. Should I set an alias on search console or something ? Thanks
White Hat / Black Hat SEO | | Kepass0 -
Google Penguin penalty is automated or manual?
Hi, I have seen some of our competitors are missing from top SERP and seems to be penalised as per this penalty checker: http://pixelgroove.com/serp/sandbox_checker/. Is this right tool to check penalty? Or any other good tools available? Are these penalties because of recent Penguin update? If so, is this a automated or manual penalty from Google? I don't think all of these tried with black-hat techniques and got penalised. The new penguin update might triggered their back-links causing this penalty. Even we dropped for last 2 weeks. What's the solution for this? How effectively link-audit works? Thanks, Satish
White Hat / Black Hat SEO | | vtmoz0 -
Penguin: Is there a "safe threshold" for commercial links?
Hello everyone, Here I am with a question about Penguin. I am asking to all Penguin experts on these forums to help me understand if there is a "safe" threshold of unnatural links under which we can have peace of mind. I really have no idea about that, I am not an expert on Penguin nor an expert of unnatural back link profiles. I have a website with about 84% natural links and 16% affiliate/commercial links. Should I be concerned about possibly being penalized by an upcoming Penguin update? So far, I have never been hit by any previous Penguin released, but... just in case, you experts, do you know what's the "threshold" of unnatural links that shouldn't be exceeded? Or, in your experience, what's the classic threshold over which Google can penalize a website for unnatural back link profile? Thank you in advance to anyone helping me on this research!
White Hat / Black Hat SEO | | fablau0 -
Indexing content behind a login
Hi, I manage a website within the pharmaceutical industry where only healthcare professionals are allowed to access the content. For this reason most of the content is behind a login. My challenge is that we have a massive amount of interesting and unique content available on the site and I want the healthcare professionals to find this via Google! At the moment if a user tries to access this content they are prompted to register / login. My question is that if I look for the Google Bot user agent and allow this to access and index the content will this be classed as cloaking? I'm assuming that it will. If so, how can I get around this? We have a number of open landing pages but we're limited to what indexable content we can have on these pages! I look forward to all of your suggestions as I'm struggling for ideas now! Thanks Steve
White Hat / Black Hat SEO | | stever9990 -
Google messages & penalties
I just read the following comment in a response to someone else's question. The Responer is an SEOMoz Authority whose opinion I respect and have learned from (not sure if it's cool to mention names in a question) and it spurred my curiosity: "...Generally you will receive a warning from Google before your site is penalized, unless you are talking about just specific keywords." This is something I have been wondering about in relation to my own sudden ranking drop for 2 specific keywords as I did not receive any warnings or notices. I have been proceeding as if I had over used these keywords on my Home page due to an initial lesser drop, but identifying the cause for the huge drop still seems useful for a number of reasons. Can anyone explain this further?
White Hat / Black Hat SEO | | gfiedel0 -
Does having the same descrition for different products a bad thing the titles are all differnent but but they are the same product but with different designs on them does this count as duplicate content?
does having the same description for different products a bad thing the titles are all different but but they are the same product but with different designs on them does this count as duplicate content?
White Hat / Black Hat SEO | | Casefun1 -
Why is this ranking first in Google Places for this term....?
"Best Bar In Chicago" - http://www.google.com/search?gcx=w&sourceid=chrome&ie=UTF-8&q=best+bar+in+chicago They have only 5 Google reviews, and their local directory reviews are suspect. One of them goes to rateclubs.com and it's not even a page for their business, while one of them doesn't have user reviews, it's just an editorial review. The other one at superpages.com doesn't even link back to their site, it links to their restaurants.com profile. What is going on here? I've been trying to figure this out for a while as their first place ranking has been solidified for quite some time now. I can also tell you that a few of the bars listed below them have a MUCH higher profile and are better known. You can see that just by the reviews.
White Hat / Black Hat SEO | | MichaelWeisbaum0