Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions, and I'm afraid some of the salespeople who are writing the descriptions are plagiarizing one another's writing. Is there a content duplication checker that will let you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search algorithm that runs when you create a new article, before it goes live: it takes sections of your content and looks for matches/duplicates. You can require a match on a minimum 4- to 5-word string, plus other such limits, to make sure you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
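A minimal sketch of that kind of check, in Python, assuming the existing descriptions can be exported as a mapping from page URL to plain text. The function names, the sample pages, and the 5-word window are illustrative assumptions, not part of any particular CMS.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(pages, n=5):
    """Map every shingle to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for shingle in shingles(text, n):
            index[shingle].add(page_id)
    return index


def find_duplicates(draft, index, n=5, min_shared=1):
    """Return {page_id: shared shingles} for pages that overlap the draft."""
    hits = defaultdict(set)
    for shingle in shingles(draft, n):
        for page_id in index.get(shingle, ()):
            hits[page_id].add(shingle)
    return {p: s for p, s in hits.items() if len(s) >= min_shared}


# Hypothetical published descriptions keyed by URL.
pages = {
    "/widgets/red": "This durable widget is built to last through years of heavy use.",
    "/widgets/blue": "A lightweight widget in a calm blue finish.",
}
index = build_index(pages)

# Hypothetical unpublished draft to check before it goes live.
draft = "Our new gadget is built to last through years of heavy use outdoors."
matches = find_duplicates(draft, index)
for page_id, shared in matches.items():
    print(page_id, sorted(shared))
```

Raising `n` or `min_shared` is one way to tune the "too many matches vs not enough" trade-off: a longer window or a higher threshold ignores stock phrases that legitimately recur across product pages.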
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, and on one of them it is possible to do what you're talking about, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages they will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled are little better than 50/50.
Also, the crawl diagnostics only tell you which pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you probably have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low-tech ways to do it...
If you have Win 7, the search has improved greatly: just move all the files to a local machine and search the directory you placed them in for the content you want to check. It will list all files that contain the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor, almost all have a site search function where you can search/profile code/text and have it find, one by one, which pages contain the searched terms, or list them globally.
Other than that, probably a custom script, or a Google search for an HTML profiler, might help?
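As a rough illustration of the custom-script route, here is a short Python sketch that searches a local copy of the site for an exact phrase. It assumes the pages have been saved as `.html` files under one directory; the function names and the temporary demo files are hypothetical.

```python
import re
import tempfile
from pathlib import Path


def normalize(text):
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def files_containing(phrase, root, pattern="*.html"):
    """Return paths under root whose text contains phrase (case-insensitive)."""
    needle = normalize(phrase)
    found = []
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in normalize(text):
            found.append(path)
    return found


# Demo on two throwaway files standing in for a local copy of the site.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.html").write_text(
        "<p>Built to last\nthrough years of use.</p>", encoding="utf-8"
    )
    Path(root, "b.html").write_text("<p>A calm blue finish.</p>", encoding="utf-8")
    hits = [p.name for p in files_containing("built to last through", root)]

print(hits)
```

This searches the raw file contents, so a sentence interrupted by markup (for example a `<b>` tag in the middle) would not match unless the tags are stripped first.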
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish, so I can make sure they're not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.