Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Regarding internal duplicate content
Suppose two of my webpages from the same site are having 30% to 35% common content. The reason behind this common content is that I put same data and images (in the main content area) since both pages are partially related. But, title tag, meta description, h1 tag, urls are different.
On-Page Optimization | | b.me
My questions are Can Google consider it as duplicate content?
Can it hamper the ranking of my pages ?
How can I deal with it?0 -
Help finding someone to handle crashing site/site optimization.
I need someone who can handle website/WordPress issues as they come up.For example, my site has gone down 4 times tonight, and my host can't figure it out. They also keep recommending that I optimize my site, but I don't know how. I need a go-to web person for this sort of thing. Any recommendations?
On-Page Optimization | | cbrant7770 -
Many have stolen our content. Rewrite vs. DMCA content removal?
Hello, We own a medical tourism website and many other sites have stolen (copied and pasted) our content. Our content is more than 2 years old, so we thought we could rewrite the content - but Which is a more wiser decision from you guys' experience? Archive our current content at a different URL and upload a fresh content in the current URL Claim our originality to Google and ask the stolen sites to remove our content. Thank you and appreciate your time.
On-Page Optimization | | joony0 -
Googlebot indexing URL's with ? queries in them. Is this Panda duplicate content?
I feel like I'm being damaged by Panda because of duplicate content as I have seen the Googlebot on my site indexing hundreds of URL's with ?fsdgsgs strings after the .html. They were beign generated by an add-on filtering module on my store, which I have since turned off. Googlebot is still indexing them hours later. At a loss what to do. Since Panda, I have lost a couple of dozen #1 rankings that I've held for months on end and had one drop over 100 positions.
On-Page Optimization | | sparrowdog0 -
Duplicate content shown in Google webmaster tools for 301 redirected URLs.
Why does Google webmaster tools shows 5 URLs that have been 301 redirected as having duplicate meta descriptions?
On-Page Optimization | | Madlena0 -
Duplicate Title & Content in WordPress
I'm getting a lot of Crawl Errors due to duplicate content and duplicate title because of category and tag posts in WordPress. I rebuilt the sitemap and said to exclude category and tags, should that clear up the issue? I've also went through and did NO INDEX and NO FOLLOW for all categories and posts. Any thoughts on this issue?
On-Page Optimization | | seantgreen0 -
Duplicate content problem
I am having an issue with duplicate content that I can't seem to figure out. I got rid of the www.mydomain.com by modifying the htaccess file but I can't figure out how to fix theproblem of mydomain.com/ and mydomain.com
On-Page Optimization | | ayetti0 -
Duplicate Page Content Issue
For one of our campaigns, we have 164 errors for Duplicate Page Content. We have a website where much of the same content lives in two different places on their website. The information needs to be accessible from both areas. What is the best way to tackle this problem? Is there anything that can be done so these pages are not competing against one another? If the only solution is to edit the content on one of the pages, how much of the content has to be different? Is there a certain percentage to go by? Here is an example of what I am referring to: 1.) http://www.valleyorthopedicassociates.com/services/foot-center/preventing-sprains-and-strains 2.) http://www.valleyorthopedicassociates.com/patient-resources/service/foot-and-ankle-center/preventing-sprains-and-strains
On-Page Optimization | | cmaseattle1