What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we have been scraped as such though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site), and the hacked site is now ranking for our brand name. Our homepage has lost all the rankings it had (our category and product pages seem fine) and has essentially disappeared.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great and is what we use first (plus I have a bookmarklet for it to make it faster and easier to use).
I also use Linda's method of searching for a bit of content in quotes. It's the easiest way to show an ecommerce client how much work they're going to require: paste three product descriptions into Google, watch the magic, and explain that the same would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
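For anyone who wants to speed up the spot check described above, here is a minimal Python sketch. It only builds the quoted search URL for you to open in a browser; it does not query Google itself. The function names `pick_unique_chunk` and `quoted_search_url`, the word limits, and the "longest sentence" heuristic are all assumptions for illustration, not anything from a particular tool.

```python
import re
from urllib.parse import quote_plus


def pick_unique_chunk(text, min_words=8, max_words=16):
    """Pick a chunk to search for: the longest sentence, trimmed to max_words.

    "Longest sentence" is only a rough stand-in for "unique chunk";
    choosing a distinctive passage by hand is usually better.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Prefer sentences long enough to be distinctive; fall back to all of them.
    candidates = [s for s in sentences if len(s.split()) >= min_words] or sentences
    longest = max(candidates, key=lambda s: len(s.split()))
    return " ".join(longest.split()[:max_words])


def quoted_search_url(chunk):
    """Build a Google search URL for the chunk as an exact (quoted) phrase."""
    phrase = chunk.replace('"', "")  # inner quotes would break the exact-match phrase
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    post = (
        "Short intro. Our hand-forged copper kettles are hammered from a "
        "single sheet by two smiths in Leeds. Buy now!"
    )
    print(quoted_search_url(pick_unique_chunk(post)))
```

Run it over a handful of posts and open the resulting URLs; any result that is not your own site is worth a closer look.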
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a WordPress plugin that uses the Copyscape API and just check my main content every month or so with a single click.
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.
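To put that price in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes the $0.05/page figure quoted above and borrows the 15,000-product catalogue size mentioned earlier in the thread. The function name and the `free_pages` parameter are made up for illustration, and actual pricing may differ.

```python
def scan_cost_cents(pages, price_cents_per_page=5, free_pages=0):
    """Estimated cost in cents of scanning `pages` pages.

    Works in whole cents to avoid floating-point rounding.
    `free_pages` is an assumed allowance; set it to whatever the free tier gives you.
    """
    billable = max(pages - free_pages, 0)
    return billable * price_cents_per_page


cost = scan_cost_cents(15_000)
print(f"${cost / 100:,.2f}")  # prints "$750.00"
```

So a full pass over a large catalogue is not free, which is one reason the quoted-search spot check is still worth doing between scans.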
Related Questions
-
Site Footer Links Used for Keyword Spam
I was on the phone with a proposed web relaunch firm for one of my clients, listening to them talk about their deep SEO knowledge. I cannot believe that this wouldn’t be considered black-hat, or at least very spammy, in which case a client could be in trouble. On this vendor’s site I noticed that they stack the footer site map with about 50 links that are basically keywords they are trying to rank for. But here’s the kicker, shown by way of example from one of the themes in the footer. 9 footer links:
Top PR Firms
Best PR Firms
Leading PR Firms
CyberSecurity PR Firms
Cyber Security PR Firms
Technology PR Firms
PR Firm
Government PR Firms
Public Sector PR Firms
Each link goes to a unique URL that is basically a knock-off of the homepage with a few words, or at most one sentence, swapped out to include the footer link's keyword phrase. Sometimes there is a different title attribute, but generally the pages are a close match to each other. The canonical for each page points back to itself. I simply can’t believe Google doesn’t consider this spammy. Interested in your view.
Rosemary
White Hat / Black Hat SEO | | RosemaryB0 -
Lots of websites copied my original content from my own website, what should I do?
1. Should I ask them to remove and replace the content with their unique and original content?
2. Should I ask them to link to the URL where the original content is located?
3. Should I use a tool to easily track these "copycat" sites and automatically add links from their site to my site?
Thanks in advance!
White Hat / Black Hat SEO | | esiow20130 -
Duplicate content for product pages
Say you have two separate pages, each featuring a different product. They have so many common features, that their content is virtually duplicated when you get to the bullets to break it all down. To avoid a penalty, is it advised to paraphrase? It seems to me it would benefit the user to see it all laid out the same, apples to apples. Thanks. I've considered combining the products on one page, but will be examining the data to see if there's a lost benefit to not having separate pages. Ditto for just not indexing the one that I suspect may not have much traction (requesting data to see).
White Hat / Black Hat SEO | | SSFCU0 -
Is it still valuable to place content in subdirectories to represent hierarchy or is it better to have every URL off the root?
Is it still valuable to place content in subdirectories to represent hierarchy on the site or is it better to have every URL off the root? I have seen websites structured both ways. It seems having everything off the root would dilute the value associated with pages closest to the homepage. Also, from a user perspective, I see the value in a visual hierarchy in the URL.
White Hat / Black Hat SEO | | belcaro19860 -
When to NOT USE the disavow link tool
I'm not here to say this is concrete or that you should never do this, and if you disagree with me, please let's discuss. One of the biggest things out there today, especially after the second wave of Penguin (2.0), is the fear-stricken webmasters who run straight to the disavow tool after they have been hit by Penguin or noticed a drop shortly after.
I had a friend whose site never felt the effects of Penguin 1.0, and he thought everything was peachy. Then Penguin 2.0 hit and his rankings dropped off the map. I got a call from him that night; he was desperately asking me to help review his site and guess what might have happened. He then told me the first thing he did was compile a list of websites backlinking to him that might be the issue, create his disavow list, and submit it.
I asked him, "How long did you research these sites before you came to the conclusion they were the problem?"
He said, "About an hour."
Then I asked him, "Did you receive a message in your Google Webmaster Tools about unnatural linking?"
He said, "No."
I said, "Then why are you disavowing anything?"
He said, "Um.......I don't understand what you are saying?"
In reading articles, forums, and even here in the Moz Q&A, I tend to think there are some misconceptions about Google's disavow tool that do not seem to be clearly explained. My findings about the tool and when to use it are based purely on logic, IMO. Let me explain.
When to NOT use the tool
You spent an hour reviewing your backlink profile and you are too eager to wait any longer to upload your list. Unless you have fewer than 20 root domains linking to you, you should spend a lot more than an hour reviewing your backlink profile.
You DID NOT receive a message from GWT informing you that you had some "unnatural" links. I'll explain later.
You spent a very short amount of time reviewing your backlink profile. If you did not look at each individual site linking to you and every link that exists, then you might be using it WAY TOO SOON.
The last thing you want to do is disavow a link that might actually be helping you. Take the time to really look at each link and ask yourself this question (straight from the Google guidelines): "A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you, or to a Google employee."
Studying your backlink profile
We all know when we have cheated. I'm sure 99.9% of us can admit to it at one point. Most of the time I can find backlinks from sites, look right at the owner, and ask him or her, "You placed this backlink, didn't you?" I can see the guilt immediately in their eyes 🙂 Remember, not ALL backlinks you generate are bad or wrong just because you own the site. You need to ask yourself, "Was this link necessary and does it apply to the topic at hand?", "Was it relevant?" and, most important, "Is this going to help other users?" These are some questions you can ask yourself before each link you place.
You DID NOT receive a message about unnatural linking
This is where I think the most confusion takes place (and please explain to me if I am wrong on this). If you did not receive a message in GWT about unnatural linking, then we can safely say that Google does not think you have any "fishy" links that it has determined to be of a spammy nature. So if you did not receive any message yet your rankings dropped, what could it be? Well, it's still most likely your backlinks that did it, but it's more likely that previous links now hold less value, or none at all. So obviously when this value drops, so does your rank.
So what do I do?
Build more quality links....and watch your rankings come back 🙂
White Hat / Black Hat SEO | | cbielich1 -
What happens when content on your website (and blog) is an exact match to multiple sites?
In general, I understand that having duplicate content on your website is a bad thing. But I see a lot of small businesses (specifically dentists in this example) who hire the same company to provide content to their site. They end up with the EXACT same content as other dentists. Here is a good example:
http://www.hodnettortho.com/blog/2013/02/valentine’s-day-and-your-teeth-2/
http://www.braces2000.com/blog/2013/02/valentine’s-day-and-your-teeth-2/
http://www.gentledentalak.com/blog/2013/02/valentine’s-day-and-your-teeth/
If you google the title of that blog article you find tons of the same article all over the place. So, overall, doesn't this make the content on these blogs irrelevant? Does this hurt the SEO on these sites at all? What is the value of having completely unique content on your site/blog vs having duplicate content like this?
White Hat / Black Hat SEO | | MorganPorter0 -
I'm worried my client is asking me to post duplicate content, am I just being paranoid?
Hi SEOMozzers, I'm building a website for a client that provides photo galleries for travel destinations. As of right now, the website is basically a collection of photo galleries. My client believes Google might like us a bit more if we had more "text" content. So my client has been sending me content that is provided free by tourism organizations (tourism organizations will often provide free "one-pagers" about their destination for media). My concern is that if this content is free, it seems likely that other people have already posted it somewhere on the web. I'm worried Google could penalize us for posting content that is already existent. I know that conventionally, there are ways around this-- you can tell crawlers that this content shouldn't be crawled-- but in my case, we are specifically trying to produce crawl-able content. Do you think I should advise my client to hire some bloggers to produce the content or am I just being paranoid? Thanks everyone. This is my first post to the Moz community 🙂
White Hat / Black Hat SEO | | steve_benjamins0 -
Competitor is using a blog network - worth reporting?
Hey guys, Today I checked the backlink profile of a competitor who is #1 in Google Australia for a highly competitive keyword. To my surprise, though, every single link (except a few directory links) seems to be from a private blog network. It's a business selling advertisement products, yet it somehow seems to have links on blogs from websites that sell PC repair services, sleepwear, Bali villa rentals, etc. In this case, would filing a spam report in Google WMT be beneficial? It's not like they advertise that they sell links (nor do the websites the links are on), but it is quite clear that something dodgy is going on. Thanks
White Hat / Black Hat SEO | | Michael-Goode0