What tools do you use to find scraped content?
-
This hasn’t been an issue for our company so far, but I like to be proactive. What tools do you use to find sites that may have scraped your content?
Looking forward to your suggestions.
Vic
-
Oh, this belongs to a different thread: http://moz.com/community/q/chinese-site-ranking-for-our-brand-name-possible-hack
-
Is this part of the original conversation, or something else? Which sites are these?
-
I'm not sure we have been scraped as such though, because the site in question has different content.
It looks as though the offending site has hacked another site (which redirects to the offending site) but the hacked site is ranking for our brand name. Our homepage has lost all rankings it had (our category and product pages seem fine) and has essentially disappeared.
Can anyone else shed any light?
-
Siteliner (Copyscape's big brother) is really great and is what we use first. (I also have a bookmarklet for it, which makes it faster and easier to use.)
We also use Linda's method of searching Google for a bit of content in quotes. It's the easiest way to show an ecommerce client how much work they're going to require: paste three product descriptions into Google, watch the magic, and explain that the same thing would happen across all 15,000 products.
-
I spot check on a regular basis by taking a unique chunk out of a post, putting it in quotes, and doing a Google search on it. It's not comprehensive, but it is free. [And the main problems we have had with scrapers have been with sites that have taken huge portions of our content, not just an article or two, and a spot check roots those out.]
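For anyone who wants to make that spot check a little less manual, here is a rough sketch in Python. It only builds the quoted-phrase search URLs; you still open them in a browser yourself. The function names (`pick_phrases`, `exact_search_url`), the 8-word phrase length, and the sample text are all my own assumptions for illustration, not part of any tool mentioned in this thread.

```python
import random
from urllib.parse import quote_plus


def pick_phrases(text, words_per_phrase=8, count=3, seed=None):
    """Pick a few runs of consecutive words from `text` to use as exact-match probes.

    The phrase length and count are arbitrary defaults (assumptions), not a rule.
    """
    words = text.split()
    if not words:
        return []
    if len(words) <= words_per_phrase:
        return [" ".join(words)]
    rng = random.Random(seed)
    max_start = len(words) - words_per_phrase
    # Sample distinct starting positions so the probes come from different spots.
    starts = sorted(rng.sample(range(max_start + 1), min(count, max_start + 1)))
    return [" ".join(words[s:s + words_per_phrase]) for s in starts]


def exact_search_url(phrase):
    """Build a Google search URL with the phrase wrapped in double quotes."""
    return "https://www.google.com/search?q=" + quote_plus(f'"{phrase}"')


if __name__ == "__main__":
    # Hypothetical sample text; paste in a paragraph from one of your own posts.
    post = (
        "Our hand-built walnut desk organizers are cut, sanded, and oiled in a "
        "small workshop, and every piece ships with a care card written by the maker."
    )
    for phrase in pick_phrases(post, seed=1):
        print(exact_search_url(phrase))
```

Like the manual version, this is not comprehensive. It just saves some copying and pasting when you check a handful of posts at a time.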
-
Thanks, Chris & Jonathan. I will look into Copyscape. Good stuff!
-
Yep, Copyscape is what I use. I use a WordPress plugin that uses the Copyscape API and just check my main content every month or so with a single click.
-
Copyscape works well for us. You can scan a couple of pages for free, and then it's $0.05/page after that.