How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now, although they all look rubbish, some of them are ranking in search engines, and looking at the traffic on a couple of them, it's clear that people who find these pages want to learn more about the school (because almost everyone clicks on the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content with different fields replaced by the school's name, location, and headteacher's name so that they vary from other pages, will this be enough for Google to realise that they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence from Ofsted every year. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of content that can easily be created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into the placeholders.
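To picture why this is a problem, here's a rough sketch (the template text, field names, and school data below are invented for illustration, not taken from your site): filling one generic template from a database produces pages that differ by only a handful of words.

# Made-up illustration of how near-duplicate pages get generated
TEMPLATE = ("{school} is a busy and dynamic school led by {head}. "
            "Located in {location}, {school} offers a wide range of "
            "experiences both in the classroom and beyond.")

schools = [
    {"school": "Example Primary", "head": "Ms Smith", "location": "Leeds"},
    {"school": "Sample Academy", "head": "Mr Jones", "location": "Bristol"},
]

for row in schools:
    print(TEMPLATE.format(**row))

# Every generated page shares the exact same skeleton of boilerplate text;
# only the five or six placeholder words change, so the pages stay
# near-duplicates of each other however many records you run through.

Rotating between 5 templates rather than 1 doesn't really change the picture; it just means each block of boilerplate is repeated across roughly 3,600 pages instead of 18,000.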
Unique content is essentially that: unique. So this approach is probably not going to cut it. You can still pull certain elements such as the address from the database, but you need to either remove these duplicated text blocks and keep the pages simpler (like a business directory), or ideally add some genuinely unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that target broader terms and link to pages like these, which have value for users even though they are not search friendly in themselves.
So, assess how well these pages work as landing pages from search, or whether visitors are reaching them from elsewhere. If they come in from elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link to these pages for users.
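For example (these are illustrative snippets only; the exact path pattern depends on how your URLs are structured), noindexing would mean adding this to the head of each employer page:

<meta name="robots" content="noindex, follow">

while blocking them in robots.txt would look something like:

User-agent: *
Disallow: /Employer.aspx

One caveat: the two behave differently. A robots.txt Disallow stops Google crawling the pages at all, which also means it will never see a noindex tag on them, whereas a noindex meta tag requires the page to be crawled but keeps it out of the index. Pick one approach rather than combining both on the same pages.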
This is a really good read to get a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference between those pages is the title tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates similar to:
[School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence from Ofsted every year. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
If all you are doing is changing the [school name] and [location] etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and the page has enough content about each individual school, headteacher, etc., then "templates" might work. You can try it out, but I'd still say unique content is the way to go; it's the nature of the beast with so many pages.
Hope this helps.
Robert