Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicated Content on Wordpress Mobile&Desktop versions- Is it bad for SEO?
Hello, I use a Wordpress theme for displaying mobile and desktop versions separately. The problem is that if you use tools like screaming frog or if you look at the code (view source), you can detect duplicated content. But if you're browsing on your mobile you will see only the content I have created for the mobile version and the same if you're looking at the desktop version, you will only see the desktop content.
On-Page Optimization | | AlphaRoadside
Is this creating an SEO problem? If it is, please let me know why and if it has a solution. Thanks in advance.0 -
Is minor duplicate content on my website okay?
I know duplicate content across multiple websites is not a good thing, however I've always wondered about minor duplicate content on your own website. I know its good practice to have unique content on each page but what about the little stuff. For example on our website certain related pages share the same content in a right sidebar. Such as links to pdf leaflets, or "you can read our blog etc" . Is there a minimum number of repeated words required before its flagged as duplicate content? Another example is a customer gave two testimonials for two of our employees - the testimonials were identical other than the employee names - if these were posted on separate pages is it a problem for the site as a whole or for both those individual pages? Thanks
On-Page Optimization | | Brabian0 -
Content in Tabs
I speed read an article recently and forgot to save it regarding Contents on a page in tabs. Is it correct that now Google is rendering the entire page it's better not to have content in tabs hidden by Javascript? As it stands at the moment, we've got the tabs set-up so that the main part of the page containing the keyword rich text is in a tab and not the first thing presented to the user
On-Page Optimization | | Ham19790 -
Duplicate content on partner site
I have a trade partner who will be using some of our content on their site. What's the best way to prevent any duplicate content issues? Their plan is to attribute the content to us using rel=author tagging. Would this be sufficient or should I request that they do something else too? Thanks
On-Page Optimization | | ShearingsGroup0 -
Issue: Duplicate Page Content
For duplicate page content, how different should pages be? For example, I have seven locations and on each location page, we offer a discount. The discounts are the same currently and open into a pop-up window. So it looks something like this: mysite.com/locationA/dicount mysite.com/locationB/discount mysite.com/locationX/discount The pages are identical. Should I change the verbiage on each page or let it be? I noticed that our organic search rankings have dropped since our site upgrade and this is one item that SEOMOZ has noted. Thanks! DHO
On-Page Optimization | | DougHoltOnline0 -
Is this duplicate content okay?
We have a client who wants to rank locally, nationally and internationally for their products. I wrote a line that goes, "We can ship our products to you whether you’re here in Illinois, nationwide, or international." I added that line after a paragraph or two of unique product description on each of their 30-odd product pages. Will this damage their ranking? I tried researching this but only found full page duplicate content topics. Any advice would be great.
On-Page Optimization | | optimalwebinc0 -
What is this site all about?
Can anyone here tell me what this site is all about?... I really need to know, i have a blog.
On-Page Optimization | | Wales33030 -
Should H1s be used in the logo? If they are and it is dynamic on each page to relate to the page content, is this detrimental to the site rather than having it in the page content?
On some sites, the H1 is contained within the logo and remains consistent throughout the site (i.e. the company name is in the of the logo). If the h1 in a logo is dynamic for each page (i.e. on the homepage it is company name - homepage) is this better or worse to have it changed out on the logo rather than having it in the page content?
On-Page Optimization | | CabbageTree0