Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Word Count - Content site vs ecommerce site
Hi there, what are your thoughts on word count for a content site vs. an ecommerce site. A lot of content sites have no problem pushing out 500+ words per page, which for me is a decent amount to help you get traction. However on ecommerce sites, a lot of the time the product description only needs to be sub-100 words and the total word count on the page comes in at under 300 words, a lot of that could be considered duplicate. So what are your views? Do ecommerce sites still need to have a high word count on the product description page to rank better?
On-Page Optimization | | Bee1590 -
Duplicate content from page links
So for the last month or so I have been going through fixing SEO content issues on our site. One of the biggest issues has been duplicate content with WHMCS. Some have been easy and other have been a nightmare trying to fix. Some of the duplicate content has been the login page when a page requires a login. For example knowledge base article that are only viewable by clients etc. Easily fixed for me as I dont really need them locked down like that. However pages like affiliate.php and pwreset.php that are only linked off of a page. I am unsure how to take care of these types. Here are some pages that are being listed as duplicate: Should this type of stuff be a 301 redirect to cart.php or would that break something. I am guessing that everything should point back to cart.php.
On-Page Optimization | | blueray
https://www.bluerayconcepts.com/brcl...art.php?a=view
https://www.bluerayconcepts.com/brcl...php?a=checkout These are the ones that are really weird to me. These are showing as duplicate content but pwreset is only a link of the KB category. It shows up as duplicate many times as does affilliate.php: https://www.bluerayconcepts.com/brcl...ebase/16/Email
https://www.bluerayconcepts.com/brcl...16/pwreset.php Any help is overly welcome.0 -
I am trying to better understand solving the duplicate content issues highlighted in your recent crawl report of our site - www.thehomesites.com.
Below are some of the urls highlighted as having duplicate content -
On-Page Optimization | | urahul
http://www.thehomesites.com/zip_details/76105
http://www.thehomesites.com/zip_details/44135
http://www.thehomesites.com/zip_details/75227
http://www.thehomesites.com/zip_details/94501 These are neighborhood reports generated for 4 different zip codes. We use a standard template to create these reports. What are some of the steps we can take to avoid these pages being categorized as duplicate content?0 -
Duplicate page content
Hi Crawl errors is showing 2 pages of duplicate content for my clients WordPress site: /news/ & /category/featured/ Yoast is installed so how best to resolve this ? i see that both pages are canonicalised to themselves so presume just need to change the canonical tag on /category/featured/ to reference /news/ ?(since news is the page with higher authority and the main page for showing this info) or is there other way in Yoast or WP to deal with this & prevent from happening again ? Cheers Dan
On-Page Optimization | | Dan-Lawrence0 -
Duplicate Content Indentification Tools
Does anyone have a recommendation for a good tool that can identify which elements on a page are duplicated content? I use Moz Analytics to determine which pages have the duplicated content on them, but it doesn't say which pieces of text or on-page elements are in fact considered to be duplicate. Thanks Moz Community in advance!
On-Page Optimization | | EmpireToday0 -
Duplicate Content on Category Pages
Hi Everyone, I have a few category pages within a category for my eCommerce store and I've recently started writing a short description for each. However a lot of these paragraphs can be replicated for the same category. For instance '1 Inch thickness' I'll show all the information, and it'll be very similar to '2 inch thickness' but obviously one is 1 inch and one is 2 inch so I would only be changing one keyword and that is the thickness. I feel that this is helping customers because it has all the information in each category e.g. how to filter your choices. But it might be duplicate content. What would you recommend?
On-Page Optimization | | EcomLkwd0 -
Summarize your question.Images being seen as duplicate content/pages
My images suddenly are appearing in my crawl reports as duplicate content, without meta tags, this happened over night and cant figure out why.
On-Page Optimization | | RBYoung0 -
Best Way to check for duplicate pages
With Google's updates we know they want to clean out duplicate content. i have been seeing the same crap spit out even word for word on different sites. Anyway how do you experienced SEO people test for dups on your own site as well as other sites. The only thing i can come up with is paying copyscape 5 cts a test. There has to be other ways. Advise/
On-Page Optimization | | joemas990