ROI on Policing Scraped Content
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites. I've been using Copyscape to identify the offenders. It is EXTREMELY time-consuming to identify the site owners, prepare an email with supporting evidence (screenshots), and follow up 2, 3, even 15 times until they remove the scraped content. Filing a DMCA takedown notice is a last resort for sites hosted in the US, but quite a few of the offenders are in China, India, Nigeria, and other places not subject to the DMCA. Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
My site already performs well in the SERPs; I'm not aware of any third-party site's scraped content outranking my site for any search phrase.
Given my circumstances, how much effort do you think I should continue to put into policing scraped content?
-
I watch for traffic increases and decreases. You can do that with Google Analytics; I do it with Clicky. When I see an important page losing traffic, I go looking.
One of my retail sites suddenly was not selling a certain product category very well. I looked into it and found that hundreds of "made in China" blogs had scraped my content.
I also have images that are often grabbed, so I watch image search traffic for drops as well.
I have tens of thousands of pages on the web. It's hard to monitor all of them, but it is easy to spot problems when you can download a traffic spreadsheet that shows % up and % down, sort it, and then investigate. So I am being reactive rather than proactive. Really, I don't look at it as ROI; it is loss prevention.
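A minimal Python sketch of the sort-and-investigate step described above. It assumes a hypothetical CSV export with the columns page, visits_before, and visits_after; real Google Analytics or Clicky exports use their own column names and layouts, so adjust accordingly. The min_visits cutoff is also an assumption, meant to keep tiny pages with large percentage swings from crowding the top of the list.

```python
import csv
import io

def biggest_drops(csv_text, min_visits=50, limit=10):
    """Return (page, before, after, pct_change) tuples, biggest loss first.

    Assumes hypothetical columns: page, visits_before, visits_after.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        before = int(rec["visits_before"])
        after = int(rec["visits_after"])
        # Skip low-traffic pages: percentage swings on tiny numbers are noise.
        if before < min_visits:
            continue
        pct = round((after - before) * 100 / before, 1)
        rows.append((rec["page"], before, after, pct))
    # Most negative percentage change (largest loss) comes first.
    rows.sort(key=lambda r: r[3])
    return rows[:limit]

sample = """page,visits_before,visits_after
/widgets/,1200,1140
/blue-widget.html,800,300
/about.html,40,5
/red-widget.html,600,660
"""

for page, before, after, pct in biggest_drops(sample):
    print(f"{page}: {before} -> {after} ({pct}%)")
```

For this sample, /blue-widget.html (-62.5%) is listed first and would be the page to investigate; /about.html is skipped because it falls below the traffic cutoff.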
-
Thanks for the detailed suggestions!
As a follow up: what metric do you use to decide which offenders to go after, and which ones to ignore? I simply don't have time to go after everybody who has copied my content so I need a way to prioritize.
There are two obvious situations where action is warranted: first, when the infringement is committed by a competitor in my industry, and second, when the infringing content outperforms my own site in the SERPs. What else would you suggest?
Thanks again.
-
Over the years, tons of original content from my website (written by me) has been scraped by 200-300 external sites.
I have the same problem on multiple sites. Most of the time the scraping is not harmful. But on several occasions it has cost me thousands of dollars and forced me to abandon product lines and donate thousands of dollars' worth of inventory to Goodwill. Infringers have included the websites of many law firms, a state supreme court, a presidential candidate, an Ivy League law school, and many others. Infringers have taken images, video, and text.
It is EXTREMELY time-consuming to identify the site owners, prepare an email with supporting evidence (screenshots), and follow up 2, 3, even 15 times until they remove the scraped content. Filing a DMCA takedown notice is a last resort for sites hosted in the US,...
I am not an expert in intellectual property law, so nothing I do or say is legal advice. Filing a DMCA notice can get you sued even if you are in the right. If you file a DMCA notice, all of the details, including your name and why you filed, will be easily available to the person or company you complained about. They can retaliate against you, call begging you to retract the notice, or do anything else they want against you.
If I contact someone two or three times without results, I go straight to a DMCA notice. One thing I can say about Google is that it generally responds promptly to requests to remove infringing content from its web and image SERPs. It also generally responds promptly about infringing content on Blogspot and YouTube. eBay will shut down auctions en masse in response to a DMCA notice if a seller or group of sellers is using your images or other property.
When infringing content is on a university, government agency, or prominent company's website, they usually respond immediately to notification. I usually contact a provost, legal department, or internal manager instead of writing to "webmaster", who was probably involved in the problem and simply does not understand intellectual property. I usually don't prepare a big document. An email pointing out the infringing work and asking them to take it down right away will usually get fast results.
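The short, direct email described above can be drafted with Python's standard email library. This is a rough sketch: the addresses, URLs, and wording are hypothetical placeholders, and it only builds the message; sending it (for example with smtplib) is left out.

```python
from email.message import EmailMessage

def takedown_request(to_addr, from_addr, original_url, infringing_url):
    """Build a short removal request; all arguments are caller-supplied placeholders."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "Request to remove copied content"
    # Keep it short: point to the copy, point to the original, ask for removal.
    msg.set_content(
        "Hello,\n\n"
        "The page below reproduces my original work without permission:\n"
        f"  {infringing_url}\n\n"
        "My original was published here:\n"
        f"  {original_url}\n\n"
        "Please take it down right away and confirm by reply.\n\n"
        "Thank you.\n"
    )
    return msg

msg = takedown_request(
    to_addr="legal@university.example",
    from_addr="me@mysite.example",
    original_url="https://mysite.example/guide.html",
    infringing_url="https://university.example/copied-guide.html",
)
print(msg)
```

Addressing it to a legal department or manager rather than a generic webmaster address follows the advice in the post; the template itself is deliberately bare.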
quite a few of the offenders are in China, India, Nigeria, and other places not subject to DMCA.
If you can't identify the owner of the website, or if they are outside the USA, you can still file a DMCA notice to have the content removed from search engines or from websites like YouTube or Blogspot that have an international user community but are owned by a US company. Some of them will insist that you deal with the infringing member directly; in that case, having an attorney contact that member might yield quick results.
A lot of the professional spam comes from outside the USA, but there are a few spammers and simply arrogant cowboys in the USA. A DMCA notice is the route to take, but you do risk retaliation from some of them.
Sometimes, when a site owner takes down scraped content, it reappears a few months or years later. It's exasperating.
Yep.
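One way to catch reappearing copies without redoing the whole investigation is to keep a list of URLs that were previously taken down, plus a few distinctive sentences from the original page, and re-check them periodically. The Python sketch below assumes exactly that setup; the fingerprint sentences and User-Agent string are hypothetical, and the crude tag stripping will miss copies that were reworded or split by markup in the middle of a word.

```python
import html
import re
import urllib.request

def normalize(text):
    # Drop HTML tags, decode entities, lowercase, and collapse whitespace so
    # markup or line-wrapping differences don't hide a verbatim copy.
    text = re.sub(r"<[^>]+>", " ", text)
    text = html.unescape(text)
    return " ".join(text.lower().split())

def found_fingerprints(page_html, fingerprints):
    """Return the fingerprint sentences that appear verbatim on the page."""
    page = normalize(page_html)
    return [fp for fp in fingerprints if normalize(fp) in page]

def fetch(url, timeout=15):
    # Hypothetical User-Agent; some sites block the urllib default.
    req = urllib.request.Request(url, headers={"User-Agent": "content-recheck/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")

def recheck(previously_removed, fingerprints):
    """Map each previously removed URL to the fingerprints found on it again."""
    hits = {}
    for url in previously_removed:
        try:
            found = found_fingerprints(fetch(url), fingerprints)
        except OSError:
            continue  # unreachable or erroring pages are skipped, not flagged
        if found:
            hits[url] = found
    return hits
```

Running recheck from a scheduled job every month or so would surface any URL where the copied text has come back, making it a candidate for another notice.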
I spend a good amount of time protecting my content. The problem is so big that I can usually only afford to act when the scraping or infringement is costing me, or when my content appears on the website of an established business or organization whose leadership would not want that happening.
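The criteria raised in this thread (a competitor in your industry, a copy that outranks you, a copy that is costing you, or a copy on an established organization's site) can be turned into a simple triage rule. This Python sketch assumes you record those facts by hand for each offender; the field names are hypothetical, and ranking by the number of criteria met is just one reasonable way to order the work.

```python
from dataclasses import dataclass

@dataclass
class Offender:
    url: str
    is_competitor: bool = False    # competitor in my industry
    outranks_me: bool = False      # copy beats my page in the SERPs
    costing_me: bool = False       # linked to a traffic or sales loss
    established_org: bool = False  # business, university, agency, etc.

def reasons_to_act(o):
    """List which of the thread's criteria this offender meets."""
    checks = [
        (o.is_competitor, "competitor"),
        (o.outranks_me, "outranks me"),
        (o.costing_me, "costing me"),
        (o.established_org, "established organization"),
    ]
    return [label for hit, label in checks if hit]

def triage(offenders):
    """Split into (pursue, ignore); pursue is ordered by criteria met, most first."""
    pursue = [o for o in offenders if reasons_to_act(o)]
    ignore = [o for o in offenders if not reasons_to_act(o)]
    # Python's sort is stable, so ties keep their original order.
    pursue.sort(key=lambda o: len(reasons_to_act(o)), reverse=True)
    return pursue, ignore

offenders = [
    Offender("https://scraper.example/a"),
    Offender("https://university.example/c", established_org=True),
    Offender("https://rival.example/b", is_competitor=True, outranks_me=True),
]
pursue, ignore = triage(offenders)
for o in pursue:
    print(o.url, reasons_to_act(o))
```

Everything in the ignore list is the "not cost-effective" bulk: copies on domains that meet none of the criteria.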
I watch my analytics for traffic drops, etc. Occasionally I go out looking for infringement. The cost of policing can be astronomical. I could have a full-time employee working on this if I were going after everyone, and it's not cost-effective. Most of the people who are grabbing your stuff are putting it on domains that can't damage your rankings.
A greater problem than verbatim theft, in my opinion, is people who grab your articles and simply rewrite them. You spent tons of time doing the research and preparing the presentation; they just do a paragraph-by-paragraph rewrite into something that is undetectable and unrecognizable except by its structure.
Good luck.