What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we have been scraped as such though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site), and the hacked site is now ranking for our brand name. Our homepage has lost all the rankings it had and has essentially disappeared, although our category and product pages seem fine.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great, and it's what we use first. (I also have a bookmarklet for it to make it faster and easier to use.)
I also use Linda's method of searching for a bit of content in quotes. It's the easiest way to show an ecommerce client how much work they're going to require: take three product descriptions into Google, watch the magic, and explain that the same thing would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
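For anyone who wants to speed up this kind of spot check, here is a minimal sketch in Python that turns a chunk of text into an exact-phrase Google search URL you can open in a browser. The function name `exact_match_search_url` and the `max_words` cutoff are my own assumptions for illustration, not part of any tool mentioned in this thread, and the script only builds the URL; it does not query Google for you.

```python
import re
from urllib.parse import quote_plus


def exact_match_search_url(text, max_words=12):
    """Build a Google search URL for an exact-phrase (quoted) query.

    Hypothetical helper: keeps only the first `max_words` words so the
    quoted phrase stays short enough to be a practical search.
    """
    # Drop any double quotes in the chunk so they don't break the quoted phrase.
    cleaned = text.replace('"', "")
    # Collapse runs of whitespace (newlines, tabs) into single spaces.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    phrase = " ".join(cleaned.split(" ")[:max_words])
    # Wrap the phrase in quotes so Google treats it as an exact match.
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    chunk = "Paste a unique sentence from one of your posts here."
    print(exact_match_search_url(chunk))
```

Paste in one distinctive sentence per post and open the printed URL. Like the manual version, this is a spot check and not a comprehensive scan.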
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a WordPress plugin that calls the Copyscape API, and I check my main content every month or so with a single click.
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.
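To get a rough feel for what that per-page price means for a larger site, here is a small back-of-the-envelope sketch in Python. The function name and the `free_pages` parameter are assumptions for illustration; the only figure taken from this thread is the $0.05/page price, so check current pricing before relying on it.

```python
def copyscape_cost_dollars(pages, price_cents=5, free_pages=0):
    """Estimate the cost of scanning `pages` pages at a flat per-page price.

    Hypothetical helper: `price_cents=5` mirrors the $0.05/page quoted above;
    `free_pages` is an assumed allowance, defaulting to none.
    """
    billable = max(0, pages - free_pages)
    # Work in whole cents to avoid floating-point rounding surprises.
    return billable * price_cents / 100


if __name__ == "__main__":
    # The 15,000-product catalogue mentioned earlier in the thread.
    print(copyscape_cost_dollars(15000))
```

At that price, a one-off scan of a 15,000-page catalogue comes to about $750, which is worth knowing before you set up a monthly check.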