Is all duplicate content bad?
-
We were badly hit by Panda back in January 2012. Unfortunately, it is only now that we are trying to recover.
CASE 1:
We develop software products. We send a 500-1000 word description of each product to various download sites so that they can add it to their product listings. As a result, several hundred download sites carry the same content. How does Google view this? Did Google penalize us for this reason?
CASE 2:
In the above case, the product description does not match any content on our website. However, several software download sites copy and paste content from our website to use as the product description. In this case, the duplicate content does match our website.
How does Google view this? Did Google penalize us for this reason?
Along with all the download sites, there are also software piracy & crack sites that have the duplicate content.
So, should I remove duplicate content only from the software piracy & crack sites or also from genuine download sites?
Does Google reject all kinds of duplicate content, or does it depend on who hosts the duplicate content?
I'm confused. Please help.
-
It is tricky. As Michael said, it is important to get your content indexed first, which can help identify you as the source. Google doesn't always do a great job of that. Generally, I don't worry too much about Case 1, but in your situation it can be tougher. The problem is that many download sites have very high authority and could start outranking you for these product descriptions. If that happens, it's unlikely you'd be penalized, but you could be filtered out or knocked down the rankings, which might feel like a penalty.
Here's the thing, with Case 1, though. If these download sites are simply outranking you, but you're distributing product, is it so awful? I think you have to look at the trade-off through the lens of your broader business goals.
Case 2 is tougher, since there's not a lot you can do about it, short of DMCA takedowns. You've got to hope Google sorts it out. Again, getting in front of it and getting your content in the index quickly is critical.
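For Case 2, one rough way to see how closely a download site's listing matches your own page is to compare overlapping word sequences (shingles) and compute their Jaccard similarity. The sketch below is only an illustration: the function names, the default 5-word shingle size, and the sample strings are assumptions, and it says nothing about how Google itself detects duplicates.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=5):
    """Jaccard similarity of two texts' shingle sets (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    # Hypothetical sample texts, for illustration only.
    own_page = "Fast file backup tool for Windows"
    listing = "Fast file backup tool for Mac"
    print(round(jaccard(own_page, listing, size=2), 2))
```

A score near 1.0 suggests a near-verbatim copy, which may help you decide which listings are worth a closer look. Any cut-off you choose is a judgment call.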
If you were hit by Panda, I'd take a hard look at anything on your own site that could be harming you. Are you spinning out variations of your own content? Are you creating potentially duplicate URLs? Are you indexing a ton of paginated content (internal searches, for example)? You may find that the external duplicates are only part of your Panda problem. If you can clean up what you control, you'll be much better off. I have an extensive duplicate content write-up here:
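On the question of potentially duplicate URLs, a quick first pass is to normalize the URLs from a crawl or sitemap and group the ones that collapse to the same key. This is a minimal sketch under stated assumptions: the `TRACKING_PARAMS` list is hypothetical, and treating http/https, www/non-www, and trailing-slash variants as equivalent is a simplification that may not hold for every site.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical list of parameters that do not change page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize(url):
    """Reduce a URL to a comparison key (assumed equivalence rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Ignore a trailing slash, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so order does not matter.
    params = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() not in TRACKING_PARAMS
    )
    query = urlencode(params)
    return f"{host}{path}" + (f"?{query}" if query else "")


def group_duplicates(urls):
    """Return only the groups of URLs that share a normalized key."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group returned is a candidate for a single canonical URL. Whether the pages really serve the same content still needs a manual check.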
-
For all new content, it is important to get indexed fast. If your site is crawled infrequently, another site may get its copy indexed first, and the content is then treated as theirs by default. So with any new content, I would post on social media (G+, Twitter, etc.) as quickly as possible to get it noticed and to mark it as yours. The G+ author attribute will help.
-
Hi Gautam,
Good questions. It's really hard to say what Google treats as duplicate content, so this is just my hunch on your issue. In my experience, Google won't 'penalize' you: you're the owner of the content, and you shouldn't end up the victim of other people stealing or copying it. The same goes if you provided these sites with the content yourself, mostly because you're often not in charge of content management on somebody else's site.
Hope this helps a bit!