Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople writing the descriptions are plagiarizing one another's work. Is there a content duplication checker that will let you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search routine that runs when you create a new article, before it goes live. It takes sections of your content and looks for matches/duplicates among the existing pages. You can require that a match cover a string of at least 4 to 5 words, and add other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
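As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is only one possible approach, not a finished tool: the function names, the `pages` dictionary of page ID to description text, and the `min_shared` threshold are all made up for the example, and a real version would read the descriptions from your CMS database.

```python
import re
from collections import defaultdict

def shingles(text, n=5):
    """Return the set of n-word strings in text, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def build_index(pages, n=5):
    """Map each n-word string to the set of page IDs that contain it.

    `pages` is a hypothetical dict of {page_id: description_text}.
    """
    index = defaultdict(set)
    for page_id, text in pages.items():
        for s in shingles(text, n):
            index[s].add(page_id)
    return index

def find_duplicates(draft, index, n=5, min_shared=2):
    """Return {page_id: [shared n-word strings]} for existing pages that
    share at least `min_shared` n-word strings with the unpublished draft."""
    hits = defaultdict(list)
    for s in sorted(shingles(draft, n)):
        for page_id in index.get(s, ()):
            hits[page_id].append(s)
    return {pid: found for pid, found in hits.items() if len(found) >= min_shared}
```

Build the index once from the published descriptions, then call `find_duplicates` on each draft before it goes live. Raising `n` or `min_shared` cuts down on noise, and lowering them catches shorter borrowed phrases. The returned word strings show exactly which text overlaps, not just which page.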
Good luck.
-
I have two dev servers, and on one of them it is possible to do what you're talking about, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait another week to see results.
The crawl diagnostics also limit the number of pages crawled on your site to 10,000. I stated before that I have over 17,000 pages, so even if I did use this method, the chance of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you which pages have duplicate content, not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would already be using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you probably have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low-tech ways to do it...
If you have Windows 7, the built-in search has improved greatly. Just copy all the files to a local machine and search the directory you placed them in for the content you want to check. It will list every file that contains the words (though the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor, almost all of them have a site search function that lets you search code/text and either step through the matching pages one by one or list them all at once.
Other than that, probably a custom script, or a Google search for an HTML profiler, might help?
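For the custom script route, something along these lines might do as a starting point. It is a minimal Python sketch that assumes the pages have been exported as HTML files under one local folder; the function names and the `*.html` pattern are only illustrative.

```python
import re
from pathlib import Path

def normalize(text):
    """Lowercase and collapse whitespace so line breaks and extra spaces
    do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def files_containing(root, phrase, pattern="*.html"):
    """Return the sorted paths of files under `root` whose text contains `phrase`.

    Note: this compares raw file text, so a phrase interrupted by HTML tags
    will not be found.
    """
    needle = normalize(phrase)
    matches = []
    for path in Path(root).rglob(pattern):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in normalize(text):
                matches.append(str(path))
    return sorted(matches)
```

Calling `files_containing("exported_site", "a sentence from the new description")` would list every exported file that already contains that sentence, where `exported_site` stands for whatever folder you copied the files into.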
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish, so I can make sure they're not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact. I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-