How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them are ranking on search engines, and looking at the traffic on a couple of them it's clear that people who find these pages want to find out more about the school (everyone seems to click on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content, with different fields replaced by the school's name, location, and headteacher's name so that they vary from other pages, would that be enough for Google to realise that they are not similar pages and no longer class them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], achieving excellence from Ofsted every year. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities. We encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind that can easily be created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into the placeholders.
Unique content is essentially that: unique. So this approach is probably not going to cut it. You could have certain elements, such as the address, pulled in like this, but you need to either remove these duplicate blocks and keep the pages simpler (like a business directory) or, ideally, add some unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search-friendly.
So, assess how well these pages work as landing pages from search, or whether visitors are arriving elsewhere. If they arrive elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link down to these pages for users.
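For reference, the two blocking mechanisms mentioned here look roughly like this (the /Employer.aspx path is just taken from the example URLs in the question; treat the exact rules as a sketch, not a recommendation):

```
# robots.txt at the site root: stops crawlers from fetching
# the pages at all
User-agent: *
Disallow: /Employer.aspx

# ...or, instead, a meta robots tag in each page's <head>, which
# lets Google crawl the page but drop it from the index:
#   <meta name="robots" content="noindex, follow">
```

One caveat worth knowing: the two don't combine. If robots.txt blocks a page, Google can never see a noindex tag on it, so a blocked page can still linger in the index via external links. Pick one mechanism.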
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference between those pages is a single tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name], achieving excellence from Ofsted every year. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities. We encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
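To see why, it helps to know how near-duplicate detection is classically done. Google's actual method isn't public, but the textbook approach is shingling: break each page into overlapping runs of words and compare the resulting sets. A minimal sketch (the school names and the short template below are made up for illustration):

```python
# Rough illustration of classic near-duplicate detection
# (shingling + Jaccard similarity). This is only the textbook
# technique, not Google's actual algorithm.

def shingles(text, k=4):
    """Return the set of k-word shingles in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of the two texts' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

template = ("{name} is a busy and dynamic school led by {head}, "
            "located in {town}, offering a wide range of experiences "
            "both in the classroom and through extra-curricular activities")

page_a = template.format(name="Hilltop Primary", head="Mrs Jones", town="Leeds")
page_b = template.format(name="Riverside Academy", head="Mr Smith", town="York")
unrelated = "our opening hours vary by season so please phone ahead to check"

print(jaccard(page_a, page_b))     # well above zero despite the swaps
print(jaccard(page_a, unrelated))  # 0.0
```

Even with every placeholder swapped, the two templated pages still share most of their shingles, while a genuinely different sentence shares none. Real pages score even higher, because the surrounding HTML boilerplate is identical too.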
Unique content is the best way. If there's not a lot of competition for the school name and each page has enough content about the individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd say unique content is the best way. It's the nature of the beast with so many pages.
Hope this helps.
Robert
Related Questions
-
What to do with internal spam url's google indexed?
I have been in SEO for years but have never met this problem. I have a client whose web page was hacked, and hundreds of links were posted on it. These links have been indexed by Google. They are not in comments but are normal external URLs. What is the best way to remove them: use the Google disavow tool, or just redirect them to some page? The web page is new, but it ranks well on Google and has a domain authority of 24. I think these spam URLs improved rankings too 🙂 What would be the best strategy to solve this? Thanks.
White Hat / Black Hat SEO | | AndrisZigurs0 -
Google spider
If someone provides a discount of one or more cents to our customers who put up a link on their site, and we wanted to actually show the referral discount in their shopping cart for that customer, can Google see that and realize we are providing a discount for a link? Can Google see what's displayed in our web application, like the upload, shopping cart, and complete-transaction pages?
White Hat / Black Hat SEO | | K_Monestel0 -
On Page #2 of Bing But Nowhere on Google. Please Help!
Hi, community. I have a problem with the ranking of my blog and I hope anyone could help me to solve this problem. I have been trying to rank my blog post for a keyword for almost 6 months but still getting no success. My URL is: this blog post
White Hat / Black Hat SEO | | Airsionquin
Target keyword: best laptops for college. The interesting fact is that the post has been on page #2 of Bing but is nowhere on Google. It was on page #3 of Google for about one month, but it's been gone for 1-2 weeks (not ranked anymore, though still well indexed). The post has been replaced by another post on my blog (let's say post A) which doesn't have any links. Post A is ranking on page #4 right now.
The weird thing is that the post which ranks for this keyword frequently changes. One day post A was on page #4, then after a few days it changed to post B. Yesterday I searched on Google for the phrase "number one on bing but nowhere on google" and
came across an article in the Moz community where someone said it was an over-optimization issue. I think my post has been suffering an algorithmic over-optimization penalty. Just for your information, I have been building backlinks to this URL for the last 5 months (it's 1+ year old). It has only about 1.5k backlinks from 200 domains (according to Ahrefs). I have used exact-match anchors only about +/- 2% of the time. The rest are branded, naked-URL, and generic anchors.
So, in this case, I thought that I hadn't done any anchor over-optimization.
I have checked the keyword density and found it was "safe". One important thing I can remember from just before the post dropped: I added a backlink from lifehack.org (a guest post) with an exact-match anchor.
I suspect this is really the cause, because 2-3 days after doing that the post dropped and was replaced by another post on my blog (as I've mentioned). But it's very strange, because the number of keyword anchors (including long-tail) is only about 10 (from 200 domains), or only 5%, which should be safe. Sorry, it's a long story 🙂 So, what is actually happening to my post? And how do I fix this problem? Please help me; any help is appreciated. By the way, sorry for my poor English 🙂 0 -
How to add ">" category reveal in google search
When I look through Google search, I see some websites categorize their site this way. For example, Groupon: www.groupon.com › Coupons › Browse Coupons by Store. How do you do this for a website, for example WordPress? Does this help with SEO?
White Hat / Black Hat SEO | | andzon0 -
Disabling a slider with content... is it considered cloaking?
We have a slider on our site www.cannontrading.com, but the owner didn't like it, so I disabled it. Each slide contains links and content as well. Another SEO guy told me this is considered cloaking. Is this true? Please give feedback.
White Hat / Black Hat SEO | | ACann0 -
What's up with Google scrapping keyword metrics?
I've done a bit of reading on Google now "scrapping" the keyword metrics from Analytics. I am trying to understand why the hell they would do that. To force people to run multiple AdWords campaigns to set up different keyword scenarios? It just doesn't make sense to me. If I am a blogger or I run an ecommerce site and I get a lot of visits to a particular post through a keyword people clicked on organically, why would Google want to hide this from us? It's great data for us to carry on writing relevant content that appeals to people and therefore serves the needs of those same people. There is the idea of doing white-hat SEO and focusing on getting strong links and great content, etc., but how do we know we have great content if we can't see what appeals to people in terms of keywords and how they found us organically? Is Google trying to squash SEO as a profession? What do you guys think?
White Hat / Black Hat SEO | | theseolab0 -
Switching prices for google base
We would like to be able to submit lower prices to Google than we do to other sources. How I see it working: at the end of each URL we submit to Google Base there is a tracking code (source=googlebase). When a user visits the site via one of these URLs, we would knock 10% off the price of that item and store the item in a cookie to ensure that the price, for that user, stays at the low price for 24 hours. My question is whether Google would have a problem with us doing this. The second part of my question is whether they check the full URL including the query strings. If they just checked the canonical URL, they would see a price that's 10% higher than the one we submitted to Base, which, of course, would be bad.
White Hat / Black Hat SEO | | supermarketonline0