Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Is it better to create more pages of content or expand on current pages of content?
I am assuming that one way of improving the rankings of current pages will be to create more content on the keywords used... should this be an expansion of the content on current pages I am optimising for a keyword or is it better to keep creating new pages and if we are creating new pages is it best to use an extension of the keyword on the new page – for example if we are optimising one page for ‘does voltage optimisation work’ would it then be worth creating a page optimised for ‘does voltage optimisation work in hotels’ for example and so on? I am guessing maybe both might help, this is just a question I have had from one of my clients.
On-Page Optimization | | TWSI1 -
Duplicate and thin content - advanced..
Hi Guys Two issues to sort out.. So we have a website that lists products and has many pages for: a) The list pages - that lists all the products for that area.
On-Page Optimization | | nick-name123
b) The detailed pages - that when click into from the list page, will list the specific product in full. On the list page, we perhaps have half the description written down, when clicked into you see the full description.
If you search in google for a phrase on the detailed page, you will see results for that specific page including 'multiple' list pages where it is on. For example, lets say we are promoting 'trees' which are situated in Manhatten. And we are also promoting trees in Brooklyn, there is a crossover. So a tree listed in Manhatten will also be listen in brooklyn as its close by (not from America so don't laugh if I have areas muddled)
We then have quite a few pages with the same content as a result. I read a post a while back from the mighty Cutts who said not to worry about the duplicate unless its spammy, but what is good for one person, is spammy to another.. Does anyone have any ideas as to if this is a genuine problem and how you would solve? Also, we know we have alot of thin content on the site, but we dont know how to identify it. It's a large site so needs something automated (I think).. Thanks in advance Nick0 -
Content with changing URL and duplicate content
Hi everyone, I have a question regarding content (user reviews), that are changing URL all the time. We get a lot of reviews from users that have been dining at our partner restaurants, which get posted on our site under (new) “reviews”. My worry however is that the URL for these reviews is changing all the time. The reason for this is that they start on page 1, and then get pushed down to page 2, and so on when new reviews come in. http://www.r2n.dk/restaurant-anmeldelser I’m guessing that this could cause for serious indexing problems? I can see in google that some reviews are indexed multiple times with different URLs, and some are not indexed at all. We further more have the specific reviews under each restaurant profile. I’m not sure if this could be considered duplicate content? Maybe we should tell google not to index the “new reviews section” by using robots.txt. We don’t get much traffic on these URLs anyways, and all reviews are still under each restaurant-profile. Or maybe the canonical tag can be used? I look forward to your input. Cheers, Christian
On-Page Optimization | | Christian_T2 -
Empty public profiles are viewed as duplicate content. What to do?
Hi! I manage a social networking site. We have a lot of public user profiles that are viewed as duplicate content. This is because these users haven't filled out any public profile info and thus the profiles are "empty" (except for the name). Is this something I should worry about? If yes, what are my options to solve this? Thanks!
On-Page Optimization | | thomasvanderkleij0 -
Is there a way to tell Google a site has duplicated content?
Hello, We are joining 4 of our sites, into 1 big portal, and the content from each site gonna be inside this portal and sold as a package. We don't wanna kill these sites we are joining at this moment, we just wanna import their content into the new site and in a few months we will be killing them. Is there a way to tell Google to not consider the content on these small sites, so the new site don't get penalised? Thanks,
On-Page Optimization | | darkmediagroup0 -
Does this site have a duplicate content issue?
Google WMT is showing me only 2 short meta descriptions under "HTML Improvements" but I believe http://www.customgia.com may have a content duplication issue. Numerous keywords are used repeatedly across many product descriptions. To make matters worse, every product page has a "Design It!" button that sends the user to a flash-based jewelry designer in which they can edit the product's appearance. I'm not sure if these "designer pages" are adding unnecessary and potentially damaging duplicate content but it's certainly a possibility. There are many items on this site that are similar to one another but not the same. The product description tend to use the same phrases over and over again - words like crystal, Swarovski, beaded, design it, customize, change, pearl, glass beads, iridescent, pearl, drop earrings are used a lot. What I'm stuck on is whether or not I should be focusing on a content duplication issue as the primary SEO problem or if there is something bigger. Thank you for any assistance you can provide!
On-Page Optimization | | rja2140 -
Duplicate content "/"
Hi all, Ran my website through the SEOMOZ campaigns and the crawl diagnostics give me a duplicate error for these urls http://www.mysite.com/cat1/article http://www.mysite.com/cat1/article/ so the url with the "/" is a duplicate of the one without the "/" Can someone point me out to a solution to solve this ? regards, Frederik
On-Page Optimization | | frdrik1230 -
Tools for finding duplicate content offsite?
Hi is there a tool that will spider my site then find similar text on external sites?
On-Page Optimization | | adamzski0