How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Although they all look rubbish, some of them are ranking on search engines, and the traffic on a couple of them makes it clear that people who find these pages want more information on the school (everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with different fields replaced by the school's name, location and headteacher's name so that each page varies from the others, would this be enough for Google to realise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of page that can easily be created by pulling fields out of a database and dropping the name, address, etc. into placeholders.
Unique content is exactly that, unique, so this approach is probably not going to cut it. You could still pull certain elements this way, such as the address, but you need to either remove these duplicate blocks and keep the pages simpler (like a business directory) or, ideally, add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search friendly.
So, assess how well these work as landing pages from search, or whether visitors are coming in elsewhere. If they come in elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link to these other pages for users.
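As a rough illustration of the robots.txt option, the sketch below uses Python's standard `urllib.robotparser` to check that a rule would stop crawlers from fetching the employer detail pages. The rule set and the `is_crawlable` helper are hypothetical; the `/Employer.aspx` prefix is simply taken from the example URLs in the question, not from the site's actual robots.txt. Keep in mind that a robots.txt block stops crawling, while a noindex directive has to be crawled to be seen, so the two are alternatives rather than something to stack.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the path prefix comes from the example URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Employer.aspx
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.eteach.com/Employer.aspx?EmpNo=26626",
        "http://www.eteach.com/",
    ):
        print(url, "crawlable" if is_crawlable(url) else "blocked")
```

Running a check like this against a draft rule before deploying it is a cheap way to confirm you are blocking only the thin pages and not the landing pages you want to rank.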
This is a real good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference in those pages is the
tag and the sidebar info with school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name] who achieve excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name], [location], etc., I'm sure Google will still flag these pages as duplicate content.
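To get a feel for how much two template-filled pages still overlap, here is a minimal sketch that fills the template above for two made-up schools and compares them using Jaccard similarity over three-word shingles. This is a generic text-similarity measure, not Google's actual method (which isn't public), and the school names, headteacher names and locations are invented for illustration.

```python
import re

# The template from the question, with the placeholders as format fields.
TEMPLATE = (
    "{school} is a busy and dynamic school led by {head} who achieve "
    "excellence every year from Ofsted. Located in {location}, {school} "
    "offers a wide range of experiences both in the classroom and through "
    "extra-curricular activities, and we encourage all of our pupils to "
    '"Aim Higher". We value all our teachers and support staff and work '
    "hard to keep {school}'s reputation to the highest standards."
)


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts: shared shingles / all shingles."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Invented example data, not real schools.
page_a = TEMPLATE.format(school="Oakfield Primary", head="Mrs Smith", location="Leeds")
page_b = TEMPLATE.format(school="Riverside Academy", head="Mr Jones", location="Bristol")

if __name__ == "__main__":
    print(f"Similarity between the two pages: {jaccard(page_a, page_b):.2f}")
```

Most of the shingles survive the field swap, so the score stays high even though every placeholder changed. Two pages with text written specifically for each school would share far fewer shingles.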
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way. It's the nature of the beast with so many pages.
Hope this helps.
Robert
Related Questions
-
Duplication content management across a subdir based multisite where subsites are projects of the main site and naturally adopt some ideas and goals from it
Hi, I have the following problem and would like which would be the best solution for it: I have a site codex21.gal that is actually part of a subdirectories based multisite (galike.net). It has a domain mapping setup, but it is hosted on a folder of galike.net multisite (galike.net/codex21). My main site (galike.net) works as a frame-brand for a series of projects aimed to promote the cultural & natural heritage of a region in NW Spain through creative projects focused on the entertainment, tourism and educational areas. The projects themselves will be a concretion (put into practice) of the general views of the brand, that acts more like a company brand. CodeX21 is one of those projects, it has its own logo, etc, and is actually like a child brand, yet more focused on a particular theme. I don't want to hide that it makes part of the GALIKE brand (in fact, I am planning to add the Galike logo to it, and a link to the main site on the menu). I will be making other projects, each of them with their own brand, hosted in subsites (subfolders) of galike.net multisites. Not all of them might have their own TLD mapped, some could simply be www.galike.net/projectname. The project codex21.gal subsite might become galike.net/codex21 if it would be better for SEO. Now, the problem is that my subsite codex21.gal re-states some principles, concepts and goals that have been defined (in other words) in the main site. Thus, there are some ideas (such as my particular vision on the possibilities of sustainable exploitation of that heritage, concepts I have developed myself as "narrative tourism" "geographical map as a non lineal story" and so on) that need to be present here and there on the subsite, since it is also philosophy of the project. 
BUT it seems that Google can penalise overlapping content in subdirectories based multisites, since they can seem a collection of doorways to access the same product (*) I have considered the possibility to substitute those overlapping ideas with links to the main page of the site, thought it seems unnatural from the user point of view to be brought off the page to read a piece of info that actually makes part of the project description (every other child project of Galike might have the same problem). I have considered also taking the subsite codex21 out of the network and host it as a single site in other server, but the problem of duplicated content might persist, and anyway, I should link it to my brand Galike somewhere, because that's kind of the "production house" of it. So which would be the best (white hat) strategy, from a SEO point of view, to arrange this brand-project philosophy overlapping? (*) “All the same IP address — that’s really not a problem for us. It’s really common for sites to be on the same IP address. That’s kind of the way the internet works. A lot of CDNs (content delivery networks) use the same IP address as well for different sites, and that’s also perfectly fine. I think the bigger issue that he might be running into is that all these sites are very similar. So, from our point of view, our algorithms might look at that and say “this is kind of a collection of doorway sites” — in that essentially they’re being funnelled toward the same product. The content on the sites is probably very similar. Then, from our point of view, what might happen is we will say we’ll pick one of these pages and index that and show that in the search results. That might be one variation that we could look at. In practice that wouldn’t be so problematic because one of these sites would be showing up in the search results. 
On the other hand, our algorithm might also be looking at this and saying this is clearly someone trying to overdo things with a collection of doorway sites and we’ll demote all of them. So what I recommend doing here is really trying to take a step back and focus on fewer sites and making those really strong, and really good and unique. So that they have unique content, unique products that they’re selling. So then you don’t have this collection of a lot of different sites that are essentially doing the same thing.” (John Mueller, Senior Webmaster Trend Analyst at Google. https://www.youtube.com/watch?time_continue=1&v=kQIyk-2-wRg&feature=emb_logo)
White Hat / Black Hat SEO | | PabloCulebras0 -
Why is this site ranked #1 in Google with such a low DA (is DA not important anymore?)
Hi Guys, Would you mind helping me with the below please? I would like to get your view on it and why Google ranks a really new domain name #1 with super low domain authority? Or is Domain Authority useless now in Google? It seems like from the last update that John Mueller said that they do not use Domain Authority so is Moz Domain Authority tool not to take seriously or am I missing something? There is a new rehab in Thailand called https://thebeachrehab.com/ (Domain Authority 13). It's ranked #1 in Google.co.th for these phrases: drug rehab thailand but also for addiction rehab thailand. So when checking the backlink profile it got merely 21 backlinks from really low DA sites (and some of those are really spammy or not related). Now there are lots of sites in this industry here which have a lot higher domain authority and have been around for years. The beach rehab is maybe only like 6 months old. Here are three domains which have been around for many years and have much higher DA and also more relevant content. These are just 3 samples of many others... https://www.thecabinchiangmai.com (Domain Authority 52), https://www.hope-rehab-center-thailand.com/ (Domain Authority 40), https://www.dararehab.com (Domain Authority 32). These three sites got lots of high DA backlinks (DA 90++) from strong media links like time.com, theguardian.com, telegraph.co.uk etc. (especially thecabinchiangmai.com) but the other 2 got lots of solid backlinks from really high DA sites. So when looking at the content, thebeachrehab.com has less content as well. Can anyone have a look and let me know your thoughts why Google picks a brand new site, with DA 13 and little content in the top compared to competition? I do not see the logic in this? Cheers
White Hat / Black Hat SEO | | igniterman75
John0 -
Tool to check google index status for backlinks?
I would like to check to see which backlink urls are indexed in Google. Is there a tool that can automate this work or will I have to do it manually?
White Hat / Black Hat SEO | | Choice0 -
What is the difference between Positive Impact, No Impact, Negative Impact and Extremely Negative Impact in term of Google Update like panda or penguin etc.
What is the difference between Positive Impact, No Impact, Negative Impact and Extremely Negative Impact in term of Google Update like panda or penguin etc.
White Hat / Black Hat SEO | | dotlineseo0 -
Content ideas?
We run a printing company and we are struggling to come up with unique content people will actually want to know, is there any way of getting the ball rolling? We were thinking of ideas such as exhibition guide but this seems to have been overdone. Any help would be appreciated.
White Hat / Black Hat SEO | | BobAnderson0 -
Why does Google recommend schema for local business/ organizations?
Why does Google recommend schema for local business/ organizations? The reason I ask is I was in the Structured Data Testing Tool, and I was running some businesses and organizations through it. Yet every time, it says this "information will not appear as a rich snippet in search results, because it seems to describe an organization. Google does not currently display organization information in rich snippets". Additionally, many times when you do search the restaurant or a related query it will still show telephone number and reviews and location. Would it be better to list it as a place, since I want to have its reviews and location show up? I would be interested to hear what everyone else's opinions are on this, thanks.
White Hat / Black Hat SEO | | PeterRota0 -
Impressions in Google SERP has declined from 3500 to 1600 after 5-25-2012. Is it Penguin?
It's about the website http://www.apartments-houseboats-amsterdam.com/ The visitors had declined from 270 to 150 visitors per day. Is this caused by the Google update Penguin? If so what can I do to solve the problem? Thank you for your time and effort,
White Hat / Black Hat SEO | | letsbuilditnl0 -
Tricky Decision to make regarding duplicate content (that seems to be working!)
I have a really tricky decision to make concerning one of our clients. Their site to date was developed by someone else. They have a successful eCommerce website, and the strength of their Search Engine performance lies in their product category pages. In their case, a product category is an audience niche: their gender and age. In this hypothetical example my client sells lawnmowers: http://www.example.com/lawnmowers/men/age-34 http://www.example.com/lawnmowers/men/age-33 http://www.example.com/lawnmowers/women/age-25 http://www.example.com/lawnmowers/women/age-3 For all searches pertaining to lawnmowers, the gender of the buyer and their age (for which there are a lot for the 'real' store), these results come up number one for every combination they have a page for. The issue is the specific product pages, which take the form of the following: http://www.example.com/lawnmowers/men/age-34/fancy-blue-lawnmower This same product, with the same content (save a reference to the gender and age on the page) can also be found at a few other gender / age combinations the product is targeted at. For instance: http://www.example.com/lawnmowers/women/age-34/fancy-blue-lawnmower http://www.example.com/lawnmowers/men/age-33/fancy-blue-lawnmower http://www.example.com/lawnmowers/women/age-32/fancy-blue-lawnmower So, duplicate content. As they are currently doing so well I am agonising over this - I dislike viewing the same content on multiple URLs, and though it wasn't a malicious effort on the previous developers part, think it a little dangerous in terms of SEO. On the other hand, if I change it I'll reduce the website size, and severely reduce the number of pages that are contextually relevant to the gender/age category pages. In short, I don't want to sabotage the performance of the category pages, by cutting off all their on-site relevant content. My options as I see them are: Stick with the duplicate content model, but add some unique content to each gender/age page. 
This will differentiate the product category page content a little. Move products to single distinct URLs. Whilst this could boost individual product SEO performance, this isn't an objective, and it carries the risks I perceive above. What are your thoughts? Many thanks, Tom
White Hat / Black Hat SEO | | SoundinTheory0