Checking for content duplication against content on your own site.
-
We are currently rewriting our product descriptions, and I'm afraid some of the salespeople writing them are plagiarizing one another's work. Is there a content duplication checker that lets you check a piece of writing against a specific site rather than the whole web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to write a simple search routine that runs when you create a new article, before it goes live: it takes sections of your content and looks for matches/duplicates among the existing articles. You can require a match on a string of at least 4 to 5 words, plus other such limits, so that you are not matching too many items. It will take a few tests to find the sweet spot between too many matches and not enough.
With 17K pages, this is the only way you can really do this efficiently; you need some IT support/development. They may also have to create a reporting layer to help you sift through the results.
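As a rough illustration of that idea, here is a minimal Python sketch, not a finished tool. It assumes the published descriptions can be loaded as plain text keyed by a page id; the function names, the sample data, and the 5-word threshold are all placeholders to tune.

```python
import re

MIN_WORDS = 5  # assumed minimum run length, per the "4 to 5 word string" idea


def shingles(text, n=MIN_WORDS):
    """Return the set of all n-word runs in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_duplicates(draft, published, n=MIN_WORDS):
    """Map each published page id to the n-word runs it shares with the draft."""
    draft_shingles = shingles(draft, n)
    report = {}
    for page_id, body in published.items():
        shared = draft_shingles & shingles(body, n)
        if shared:
            report[page_id] = sorted(shared)
    return report


# Hypothetical sample data; in practice this would come from the CMS database.
published = {
    "widget-a": "This rugged widget is built to last a lifetime of heavy daily use.",
    "widget-b": "A lightweight gadget for travel.",
}
draft = "Our new model is built to last a lifetime of heavy use outdoors."

for page_id, runs in find_duplicates(draft, published).items():
    print(page_id, runs)
```

Raising `n` cuts down on noise from common phrases; lowering it catches lightly reworded copy at the cost of more false matches.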
Good luck.
-
I have two dev servers, and on one of them it is possible to do what you're talking about, but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week, which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't, I would have to wait an additional week to see results.
The crawl diagnostics also limit the number of pages they will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled are little better than 50/50.
Also, the crawl diagnostics only tell you which pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that neither the crawl diagnostics nor any other product that SEOmoz provides is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means you probably have a test or dev environment.
If not, create one.
If you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project, you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller-scale site. We currently have over 17,000 product pages, so I can't really use either method. It's looking like a Google Custom Search is the best bet, even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low-tech ways to do it...
If you have Win 7, the searching has improved greatly: just move all the files to a local machine and search the directory you placed them in for the content you want to check. It will list all files that contain the words (but the results can become overwhelming).
If you have Dreamweaver or another enterprise-level editor, almost all of them have a site search function that lets you search/profile code/text and have it find, one by one, the pages that contain the searched terms, or list them globally.
Other than that, probably a custom script, or a Google search for an HTML profiler might help?
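For the custom script route, something along these lines might do as a starting point. It is only a sketch in Python, assuming the pages have been copied to a local folder as `.html` files; the function names and the folder name in the usage line are made up for illustration.

```python
import re
from pathlib import Path


def normalize(text):
    """Lowercase and collapse runs of whitespace so line wraps do not hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def files_containing(phrase, root, pattern="*.html"):
    """Return paths under root (matching pattern) whose text contains phrase."""
    needle = normalize(phrase)
    hits = []
    for path in sorted(Path(root).rglob(pattern)):
        page_text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in normalize(page_text):
            hits.append(path)
    return hits
```

Calling `files_containing("a sentence from the new description", "site_copy")` would list every local page that already contains that sentence. Note that it compares raw file text, so a sentence broken up by HTML tags in the middle would be missed.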
Shane
-
That's for pages that are already published and crawled. I want to be able to search my site for entire sentences and/or paragraphs of text that I have yet to publish, so I can make sure they're not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than a reactive one.
-
The duplicate content from your website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.