Checking for content duplication against content on your own site.
-
We are currently trying to rewrite our product descriptions and I'm afraid some of the salespeople that are writing the descriptions are plagiarizing one-another's writing. Is there a content duplication checker that will allow you to check a piece of writing against a specific site rather than all of the web?
-
I assume that you have an admin section in the CMS where you are editing and entering these articles before they go live.
You need to get a developer to simply write a search algo that when you create a new article and before it goes live, it takes sections of your content and looks for matches/duplicates. You can set a requirement that it has to match on a minimum of a 4 to 5 word string and other such limitations to make sure you are not matching too many items. It will take a few tests to find a sweet spot of too many matches vs not enough.
With 17K pages, this is the only way you can really do this in an efficient way, you need some IT support/development. They may have to create a reporting layer as well to help you sift through the results.
Good luck.
-
I have two dev servers, one of which it is possible to do what you're talking about but that is the absolute least efficient tool to use for this.
The crawl diagnostics are updated about once a week which means I would have to post the new content and hope I got it online in time for the crawl. If I didn't then I would have to wait an additional week to see results.
The crawl diagnostics also limits the amount of pages it will crawl on your site to 10,000. I stated before that I have over 17,000 pages. So even if I did use this method, the chances of that page being crawled is little better than 50/50.
Also, the crawl diagnostics only tell you what pages have duplicate content - not the exact content that was duplicated. That means I'd have to manually find the page I'm targeting, then follow the supposed duplicate content suggestions proposed by the crawler and find the similarities myself.
I think it's very safe to say that the crawl diagnostics, nor any product that SEOmoz provides, is an answer to my issue. If I thought it was, I would have already been using it and would not have posted this question.
-
Hi Michael,
Having a website that big means that you might have a test or dev environment.
If not create one.
if you have something like test.yourwebsite.com and submit it to the SEOmoz tools as a new project you can see a report before your website goes live.
Cornel
-
Those are good answers and would work on a smaller scale site. We currently have over 17,000 product pages so I can't really use either method. It's looking like a google custom search is the best bet even though I can't search an entire paragraph at a time.
-
Just off the top of my head, there are a few low tech ways to do it....
If you have Win 7 the searching has improved greatly - just move all files to a local machine - and search the directory you placed in for the content you are wanting to check - it will give all files that contain the words. (but can become overloading)
If you have dreamweaver or other enterprise level editor - almost all have a site search function to where you can search/profile code/text and have it find one by one which pages contain the searched terms - or globally list them.
Other than that, probably a custom script -or a google search for an HTML profiler might help?
Shane
-
That's for pages that are already published and crawled. I want to able to search my site for entire sentences and/or paragraphs of text that I have yet to publish so I can make sure it's not being used elsewhere on the site. The crawl diagnostics tell me I have duplicate content after the fact - I'm trying to take a proactive approach rather than reactive.
-
The duplicate content from you website is shown in the SEOmoz tools.
Check the Crawl Diagnostics Summary:
Cornel
-
That site searches the entire web for copies. I'm looking for something to crawl my own site for duplicate content.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
If I host same video on my site which my manufacturer hosted on his site will be consider as duplicate?
Hello All, My manufacturer hosted video's on his site now if I host same video on my ecommerce site will it be consider as duplicate? any penalty ? Any suggestion pls? Thanks!
On-Page Optimization | | pragnesh96390 -
Duplicate Content, http vs https
Hi All! I just discovered that a client of ours a duplicate content issue. Essentially they have approximately 20 pages that have an http and an https version. Is there a better way to handle this than a simple 301? Regards, Frank
On-Page Optimization | | FrankSweeney0 -
Duplicate Issue
Hello Mozzers! We have a client going through a website revamp. The client is The Michelangelo Hotel, and they are part of Star Hotels. Star Hotels plans to create a section on their site for The Michelangelo, as opposed to maintaining a stand alone site. They will then take the michelangelohotel.com domain, and point it to the corresponding pages on the Star site. The guest will key in www.michelangelohotel.com, and will see the same content that can be found on www.starhotel.com/en/michelangelo-hotel-new-york. The problem we have is this: Essentially the same content will be indexed twice, once on starhotels.com and once on michelangelohotel.com. This would seem to cause a duplicate content issue. What are your thoughts? Edit: I apologize, because I was not nearly clear enough here. The Star Hotels site will have 5 pages dedicated to The Michelangelo Hotel. The content will sit solely on that server as those 5 pages. Those 5 pages will each be indexed as 2 URLs. www.michelangelohotel.com <-> www.starhotels.com/en/michelangelo/ www.michelangelohotel.com/accommodations <-> www.starhotels.com/en/michelangelo/accommodations And so on. Thanks!
On-Page Optimization | | FrankSweeney0 -
Index Page Content
Mozers, I am of the believe and as a person who puts the utmost emphasis on the index page of any website I am trying to rank, especially with a new domain ... insuring content is relevant, structured, optimized and we have some link juice flowing in. I find once we get the index page ranked, Google's little bots then start to index and rank accordingly the rest of the website ... and we start producing results. We also develop websites (dare I say its where we expertise in) and unexpectantly the client has asked us to carry out SEO work additionally to their web development. Problem lies here, their index page, has absolutely no written content at all, just one large image with a logo (Fashion Website) ...Which I identify as a huge issue as per my explanation is paragraphs one or two. I am sure withe the many more qualified SEO experts and gurus within the SEOmoz community, you have also come across this issue So a few questions, if you don't mind adding advice. 1 - Am I putting too much emphasize on content within the index page, in terms of indexing and actually ranking ...yes I appreciate that terms within the website will be ranked against other pages other than the index page, but will it harm us for having no content at all within the index page 2 - If so, and yes is the answer to above, how do we handle it, we have spoke with the client and he is pretty adamant that he want the index page as is, he has been through out the whole website building process. As suggested, any advice would be really appreciated, its a difficult market to rank within a it is, and i can only see this index page making the task a lot more difficult Cheers John
On-Page Optimization | | Johnny4B0 -
How Can I Fix Adobe Bridge Photo Galleries and Duplicate Content?
I have used the Adobe bridge program for a number of photo galleries on a remodeling site and it is showing a large amount of duplicate titles, etc. Is there an easy fix to this? anyone?
On-Page Optimization | | DaveBrown3330 -
How to avoid duplicates when URL and content changes during the course of a day?
I'm currently facing the following challenge: Newspaper industry: the content and title of some (featured) articles change a couple of times during a normal day. The CMS is setup so each article can be found by only using it's specific id (eg. domain.tld/123). A normal article looks like this: domain.tld/some-path/sub-path/i-am-the-topic,123 Now the article gets changed and with it the topic. It looks like this now: domain.tld/some-path/sub-path/i-am-the-new-topic,123 I can not tell the writers that they can not change the article as they wish any more. I could implement canonicals pointing to the short url (domain.tld/123). I could try to change the URL's to something like domain.tld/some-path/sub-path/123. Then we would lose keywords in URL (which afaik is not that important as a ranking factor; rather as a CTR factor). If anyone has experiences sharing them would be greatly appreciated. Thanks, Jan
On-Page Optimization | | jmueller0 -
Creating Duplicate Content on Shopping Sites
I have a client with an eCommerce site that is interested in adding their products to shopping sites. If we use the same information that is on the site currently, will we run into duplicate content issues when those same products & descriptions are published on shopping sites? Is it best practice to rewrite the product title and descriptions for shopping sites to avoid duplicate content issues?
On-Page Optimization | | mj7750 -
Duplicate Links
Hello, I am entering sitewide navigation that will go to primary seo pages. This is really for usability, not for link juice. I'm wondering if I should still link to these very important pages in my index page's content. Or if I should consider those navigation links strong enough. If I did link in the content, then I would have more than one link to the same page on my home page. Thanks Tyler
On-Page Optimization | | tylerfraser0