ROI on Policing Scraped Content
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites. I've been using Copyscape to identify the offenders. It is EXTREMELY time consuming to identify the site owners, prepare an email with supporting evidence (screen shots), and following up 2, 3, 15 times until they remove the scraped content. Filing DMCA takedowns are a final option for sites hosted in the US, but quite a few of the offenders are in China, India, Nigeria, and other places not subject to DMCA. Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
My site already performs well in the SERPs - I'm not aware of a third party site's scraped content outperforming my site for any search phrase.
Given my circumstances, how much effort do you think I should continue to put into policing scraped content?
-
I watch my traffic increases and decreases. You can do that with google analytics. I do it with clicky. When I see an important page show traffic losses, I go looking.
One of my retail sites suddenly was not selling a certain product category very well. I looked into it and hundreds of "made in China" blogs had scraped my content.
Then, I have images that are often grabbed. I watch image search traffic and watch for them.
I have tens of thousands of pages on the web. Its hard to monitor all of them, but it is easy to monitor when you can download a traffic spreadsheet that has % up and % down, sort it and then investigate. So, I am being responsive instead of proactive. And, really, I don't look at it as ROI, it is loss prevention.
-
Thanks for the detailed suggestions!
As a follow up: what metric do you use to decide which offenders to go after, and which ones to ignore? I simply don't have time to go after everybody who has copied my content so I need a way to prioritize.
There are two obvious situations where action is warranted: first, when the infringement is committed by a competitor in my industry, and second, when the infringing content outperforms my own site in the SERPs. What else would you suggest?
Thanks again.
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites.
I have the same problem on multiple sites. Most of the time the scraping is not harmful. But, on several occasions it has cost me thousands of dollars and forced me to abandon product lines and donate thousands of dollars worth of inventory to Goodwill. Infringers have included websites of many law firms, a state supreme court. a presidential candidate, an Ivy League law school and many others. Infringers can be using images, video or text.
It is EXTREMELY time consuming to identify the site owners, prepare an email with supporting evidence (screen shots), and following up 2, 3, 15 times until they remove the scraped content. Filing DMCA takedowns are a final option for sites hosted in the US,....
I am not an expert in intellectual property law, so what I do or say is not advice. Filing a DMCA can get you sued even if you are in the right. If you file a DMCA all of the details including your name and why you filed will be easily available to the person or company that you complained about. They can retaliate against you, call begging you to retract the DMCA, they can do anything they want against you.
If I contact someone two or three times without results I go straight to DMCA. One thing that I can say about Google is that they generally respond promptly about removing infringing content from their web SERPs and image SERPs. They also generally respond promptly to infringing content on Blogspot and YouTube. Ebay will shut down auctions en masse in response to a DMCA if a seller or group of sellers are using your images or other property.
When infringing content is on a university, government agency, or prominent company's website they usually respond immediately to notification. I usually contact a provost, legal department, or internal manager instead of writing to "webmaster" - who probably was involved in the problem and simply does not understand intellectual property. I usually don't prepare a big document. An email pointing out the infringing work and offering a resolution of "take it down right away" will usually get fast results.
quite a few of the offenders are in China, India, Nigeria, and other places not subject to DMCA.
If you can't identify the owner of the website or if they are outside of the USA, you can still file a DMCA to have the content removed from search engines or websites like YouTube or Blogspot who have an international user community but are owned by a US company. Some of them will insist that you deal with their infringing member, having an attorney contact them might yield quick results.
A lot of the professional spam is done from outside of the USA but there are a few spammers and simply arrogant cowboys in the USA. DMCA is the route to take, but you do risk retaliation with some of them.
Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
Yep.
I spend a good amount of time protecting my content. The problem is so big that I can usually only afford to do it in situations where the scraping, infringing or whatever is costing me or my content is appearing on the website of an established business or organization who should have people in leadership positions who would not want that happening.
I watch my analytics watching for traffic drops, etc. Occasionally I go out looking for infringement. The cost of policing can be astronomical. I could have a full time employee working on this if I was going after everyone - and its not cost effective. Most of the people who are grabbing your stuff are putting it on domains that can't damage your rankings.
A greater problem than verbatim theft, in my opinion, is the people who grab your articles and simply rewrite them. You spent tons of time doing the research and preparing the presentation. They simply do a paragraph-by-paragraph rewrite into something that is not detectable or recognizable beyond structure.
Good luck.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is my content being fully read by Google?
Hi mozzers, I wanted to ask you a quick question regarding Google's crawlability of webpages. We just launched a series of content pieces but I believe there's an issue.
Intermediate & Advanced SEO | | TyEl
Based on what I am seeing when I inspect the URL it looks like Google is only able to see a few titles and internal links. For instance, when I inspect one of the URLs on GSC this is the screenshot I am seeing: image.pngWhen I perform the "cache:" I barely see any content**:** image.pngVS one of our blog post image.png Would you agree with me there's a problem here? Is this related to the heavy use of JS? If so somehow I wasn't able to detect this on any of the crawling tools? Thanks!0 -
Supplier Videos & Duplicate Content
Hi, We have some supplier videos the product management want to include on these product pages. I am wondering how detrimental this is for SEO & the best way to approach this. Do we simply embed the supplier YouTube videos, or do we upload them to our YouTube - referencing the original content & then embed our YouTube videos? Thank you!
Intermediate & Advanced SEO | | BeckyKey0 -
Duplicate content with URLs
Hi all, Do you think that is possible to have duplicate content issues because we provide a unique image with 5 different URLs ? In the HTML code pages, just one URL is provide. It's enough for that Google don't see the other URLs or not ? Example, in this article : http://www.parismatch.com/People/Kim-Kardashian-sa-securite-n-a-pas-de-prix-1092112 The same image is available on: http://cdn-parismatch.ladmedia.fr/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize1-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize2-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg http://resize3-parismatch.ladmedia.fr/img/var/news/storage/images/paris-match/people/kim-kardashian-sa-securite-n-a-pas-de-prix-1092112/15629236-1-fre-FR/Kim-Kardashian-sa-securite-n-a-pas-de-prix.jpg Thank you very much for your help. Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Two Domains, Same Products/Content
We're an e-commerce company with two domains. One is our original company name/domain, one is a newer top-level domain. The older domain doesn't receive as much traffic but is still searched and used by long-time customers who are loyal to that brand, who we don't want to alienate. The sites are both identical in products and content, which creates a duplicate content issue. I have come across two options so far: 1. a 301 redirect from the old domain to the new one. 2. Optimize the content on the newer domain (the strongest of the two) and leave the older domain content as is. Does anyone know of a solution better than the two I listed above or have experience resolving a similar problem in the past?
Intermediate & Advanced SEO | | ilewis0 -
Apps content Google indexation ?
I read some months back that Google was indexing the apps content to display it into its SERP. Does anyone got any update on this recently ? I'll be very interesting to know more on it 🙂
Intermediate & Advanced SEO | | JoomGeek0 -
Duplicate content
I have just read http://www.seomoz.org/blog/duplicate-content-in-a-post-panda-world and I would like to know which option is the best fit for my case. I have the website http://www.hotelelgreco.gr and every image in image library http://www.hotelelgreco.gr/image-library.aspx has a different url but is considered duplicate with others of the library. Please suggest me what should i do.
Intermediate & Advanced SEO | | socrateskirtsios0 -
What constitutes duplicate content?
I have a website that lists various events. There is one particular event at a local swimming pool that occurs every few months -- for example, once in December 2011 and again in March 2012. It will probably happen again sometime in the future too. Each event has its own 'event' page, which includes a description of the event and other details. In the example above the only thing that changes is the date of the event, which is in an H2 tag. I'm getting this as an error in SEO Moz Pro as duplicate content. I could combine these pages, since the vast majority of the content is duplicate, but this will be a lot of work. Any suggestions on a strategy for handling this problem?
Intermediate & Advanced SEO | | ChatterBlock0