Internal Duplicate Content - Classifieds (Panda)
-
I've been wondering for a while now, how Google treats internal duplicate content within classified sites.
It's quite a big issue, with customers creating their ads twice.. I'd guess to avoid the price of renewing, or perhaps to put themselves back to the top of the results. Out of 10,000 pages crawled and tested, 250 (2.5%) were duplicate adverts.
Similarly, in terms of the search results pages, where the site structure allows the same advert(s) to appear under several unique URLs. A prime example would be in this example. Notice, on this page we have already filtered down to 1 result, but the left hand side filters all return that same 1 advert.
Using tools like Siteliner and Moz Analytics just highlights these as urgent high priority issues, but I've always been sceptical.
On a large scale, would this count as Panda food in your opinion, or does Google understand the nature of classifieds is different, and treat it as such?
Appreciate thoughts.
Thanks.
-
TL;DR: You're right to be skeptical that this is an urgent issue (in my opinion), but it is something worth fixing at some point for several reasons.
I was far more concerned by search results, but I see you've added those to noindex/disallow in robots.txt, which is great. Not many people know that works!
I think it's very possible that Google understands the difference between a classified ad and an editorial content piece. They definitely treat products and content differently. That said, it's generally a good idea to avoid relying on Google's intelligence, as many have been let down by Google's failure to understand.
Duplicate content is generally something SEOs are overly-concerned with. More often than not it triggers a filter - not a "penalty." I don't see it as the most dangerous thing you could be doing by any stretch of the imagination. That said, I've seen several classified sites do the following, which I'd recommend as a "best practice" approach. At one time Craigslist did this, and may still be doing it.
- Accept non-spam ads with a pending status
- Check against listings in a given period of time for duplicates. This happens even if the ad is changed slightly, so there's some kind of semantic+image analysis going on.
- If a duplicate is found under the same user name, inform them that they've already posted the ad. From here the rules are up to you. Many sites say the ad can't be posted again for 7 days (if the old ad is deleted) or 30 days (if not). They then encourage users to buy a featured listing that shows up higher than others.
- If duplicates are found under different user names, give a warning that it's against your terms of service (make sure it is) to post duplicate ads from multiple accounts, that accounts can be banned, and have them certify the post is not the same.
You don't need to follow this exactly, but it's here to give you some ideas on having your users prevent duplicate content for you. Given the general positive architecture I've seen on the site it looks like you know what to do with the site better than I would.
Now I don't think 250 out of 10k is bad. Having consulted with a few local classified sites that's actually quite low. But I do think there's something to be gained by detecting duplicates to prevent users from gaining an unfair advantage over those playing by the rules. And if you sell featured listings this is an excellent way to help those who are most desparate to sell while increasing revenue.
I hope that helps.
Obligatory disclaimer: This is merely free advice for your consideration, and not the Moz official stance. The consequences of any changes you do or don't make are ultimately your responsibility.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Benefit of internal link in content
Hi, Is there a real benefit to having internal links in content other than at the bottom of a page for example and not surrounded by content. Would the benefit be 1 to 10 or 1 to 1.5 ? Thank you,
Intermediate & Advanced SEO | | seoanalytics0 -
Duplicate content. Competing for rank.
Scenario: An automotive dealer lists cars for sale on their website. The descriptions are very good and in depth at 1,200 words per car. However chunks of the copy are copied from car review websites and weaved into their original copy. Q1: This is flagged in copyscape - how much of an issue is this for Google? Q2: The same stock with the same copy is fed into a popular car listing website - the dealer's website and the classifieds website often rank in the top two positions (sometimes the dealer on top other times the classifieds site). Is this a good or a bad thing? Are you risking being seen as duplicating/scraping content? Thank you.
Intermediate & Advanced SEO | | Bee1590 -
Duplicate Content Errors new website. How do you know which page to put the rel canonical tag on?
I am having problems with duplicate content. This is a new website and all the pages have the same page and domain rank, the following is an example of the homepage. How do you know which page to use the canonical tag on? http://medresourcesupply.com/index.php http://medresourcesupply.com/ Would this be the correct way to use this? Here is another example where Moz says these are duplicates. I can't figure out why because they have different url's and content. http://medresourcesupply.com/clutching_at_the_throat http://medresourcesupply.com/index.php?src=gendocs&ref=detailed_specfications &category=Main
Intermediate & Advanced SEO | | artscube.biz0 -
Duplicate Content / Canonical Conundrum on E-Commerce Website
Hi all, I’m looking for some expert advice on use of canonicals to resolve duplicate content for an e-Commerce site. I’ve used a generic example to explain the problem (I do not really run a candy shop). SCENARIO I run a candy shop website that sells candy dispensers and the candy that goes in them. I sell about 5,000 different models of candy dispensers and 10,000 different types of candy. Much of the candy fits in more than one candy dispenser, and some candy dispensers fit exactly the same types of candy as others. To make things easy for customers who need to fill up their candy dispensers, I provide a “candy finder” tool on my website which takes them through three steps: 1. Pick your candy dispenser brand (e.g. Haribo) 2. Pick your candy dispenser type (e.g. soft candy or hard candy) 3. Pick your candy dispenser model (e.g. S4000-A) RESULT: The customer is then presented with a list of candy products that they can buy. on a URL like this: Candy-shop.com/haribo/soft-candy/S4000-A All of these steps are presented as HTML pages with followable/indexable links. PROBLEM: There is a duplicate content issue with the results pages. This is because a lot of the candy dispensers fit exactly the same candy (e.g. S4000-A, S4000-B and S4000-C). This means that the content on these pages are the basically same because the same candy products are listed. I’ll call these the “duplicate dispensers” E.g. Candy-shop.com/haribo/soft-candy/S4000-A Candy-shop.com/haribo/soft-candy/S4000-B Candy-shop.com/haribo/soft-candy/S4000-C The page titles/headings change based on the dispenser model, but that’s not enough for the pages to be deemed unique by Moz. I want to drive organic traffic searches for the dispenser model candy keywords, but with duplicate content like this I’m guessing this is holding me back from any of these dispenser pages ranking. SOLUTIONS 1. Write unique content for each of the duplicate dispenser pages: Manufacturers add or discontinue about 500 dispenser models each quarter and I don’t have the resources to keep on top of this content. I would also question the real value of this content to a user when it’s pretty obvious what the products on the page are. 2. Pick one duplicate dispenser to act as a rel=canonical and point all its duplicates at it. This doesn’t work as dispensers get discontinued so I run the risk of randomly losing my canonicals or them changing as models become unavailable. 3. Create a single page with all of the duplicate dispensers on, and canonical all of the individual duplicate pages to that page. e.g. Canonical: candy-shop.com/haribo/soft-candy/S4000-Series Duplicates (which all point to canonical): candy-shop.com/haribo/soft-candy/S4000-Series?model=A candy-shop.com/haribo/soft-candy/S4000-Series?model=B candy-shop.com/haribo/soft-candy/S4000-Series?model=C PROPOSED SOLUTION Option 3. Anyone agree/disagree or have any other thoughts on how to solve this problem? Thanks for reading.
Intermediate & Advanced SEO | | webmethod0 -
Duplicate Content... Really?
Hi all, My site is www.actronics.eu Moz reports virtually every product page as duplicate content, flagged as HIGH PRIORITY!. I know why. Moz classes a page as duplicate if >95% content/code similar. There's very little I can do about this as although our products are different, the content is very similar, albeit a few part numbers and vehicle make/model. Here's an example:
Intermediate & Advanced SEO | | seowoody
http://www.actronics.eu/en/shop/audi-a4-8d-b5-1994-2000-abs-ecu-en/bosch-5-3
http://www.actronics.eu/en/shop/bmw-3-series-e36-1990-1998-abs-ecu-en/ate-34-51 Now, multiply this by ~2,000 products X 7 different languages and you'll see we have a big dupe content issue (according to Moz's Crawl Diagnostics report). I say "according to Moz..." as I do not know if this is actually an issue for Google? 90% of our products pages rank, albeit some much better than others? So what is the solution? We're not trying to deceive Google in any way so it would seem unfair to be hit with a dupe content penalty, this is a legit dilemma where our product differ by as little as a part number. One ugly solution would be to remove header / sidebar / footer on our product pages as I've demonstrated here - http://woodberry.me.uk/test-page2-minimal-v2.html since this removes A LOT of page bloat (code) and would bring the page difference down to 80% duplicate.
(This is the tool I'm using for checking http://www.webconfs.com/similar-page-checker.php) Other "prettier" solutions would greatly appreciated. I look forward to hearing your thoughts. Thanks,
Woody 🙂1 -
Scraping / Duplicate Content Question
Hi All, I understanding the way to protect content such as a feature rich article is to create authorship by linking to your Google+ account. My Question
Intermediate & Advanced SEO | | Mark_Ch
You have created a webpage that is informative but not worthy to be an article, hence no need create authorship in Google+
If a competitor comes along and steals this content word for word, something similar, creates their own Google+ page, can you be penalised? Is there any way to protect yourself without authorship and Google+? Regards Mark0 -
Duplicate Content Error because of passed through variables
Hi everyone... When getting our weekly crawl of our site from SEOMoz, we are getting errors for duplicate content. We generate pages dynamically based on variables we carry through the URL's, like: http://www.example123.com/fun/life/1084.php
Intermediate & Advanced SEO | | CTSupp
http://www.example123.com/fun/life/1084.php?top=true ie, ?top=true is the variable being passed through. We are a large site (approx 7000 pages) so obviously we are getting many of these duplicate content errors in the SEOMoz report. Question: Are the search engines also penalizing for duplicate content based on variables being passed through? Thanks!0 -
Duplicate page content
Hi. I am getting error of having duplicate content on my website and pages its showing there are: www.mysitename.com www.mysitename.com/index.html As my best knowledge it only one page, I know this can be solved with some conical tag used in header, but do not know how. Can anyone please tell me about that code or any other way to get this solved. Thanks
Intermediate & Advanced SEO | | onlinetraffic0