Duplicate Content - Bulk analysis tool?
-
Hi
I wondered if there's a tool to analyse duplicate content - within your own site or on external sites, but that you can upload the URL's you want to check in bulk?
I used Copyscape a while ago, but don't remember this having a bulk feature?
Thank you!
-
Great thank you!
I'll give both a go!
-
Great thanks
Yes I use screaming frog for this, but it was to look at actual page content. So yes to see if sites copy our content, but also to see whether we need to update our product content as some products are very similar.
I'll check the batch process on copyscape thanks!
-
I have not used this tool in this way, but have used it for other crawler projects related to content clean up and it is rock solid. They have been very responsive to me on questions related to use of the software. http://urlprofiler.com/
Duplicate content search is the project next on my list, here is how they do it.
http://urlprofiler.com/blog/duplicate-content-checker/
You let URL profiler crawl the section of your site that is most likely to be copied (say your blog) and you tell URL profiler what section of your HTML to compare against (i.e. the content section vs the header or footer). URL profiler then uses proxies (you have to buy the proxies) to perform Google searches on sentences from your content. It crawls those results to see if there is a site in the Google SERPs that has sentences from your content word for word (or pretty close).
I have played with Copyscape, but my markets are too niche for it to work for me. The logic here from URL profilers is that you are searching the database that most matters, Google.
Good luck!
-
I believe you might be able to use List Mode in ScreamingFrog to accomplish this, however it depends on ultimately what your goal is to check for duplicate content. Do you simply want to find duplicate titles or duplicate descriptions? Or do you want to find pages with sufficiently similar text as to warrant concern?
== Ooops! ==
It didn't occur to me that you were more interested in duplicate content caused by other sites copying your content rather than duplicate content among your list of URLs.
Copyscape does have a "Batch Process" tool but it is only available to paid subscribers. It does work quite nicely though.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content
Hello Moz Quick question. Can I copy and paste a paragraph of text (100 words) from my main category page into my products without hurting SEO of the category page? The content on my category page is so good I don't want to take chances as this is what I will be ranking for. Thanks
On-Page Optimization | | crocman0 -
Duplicate content affects on overall rankings
Hi guys, I have a website that has 23 pages with duplicate content. These pages serve the same function, which enables customers to upload their images. There is not much content on each one but we require a different page for each of our products, here is an example page: http://www.point101.com/giclee_printing/upload#/upload I don't think it makes sense to use a canonical tag as each page is for a different product and I think its going to be difficult to differentiate each page. I was wondering: 1. If this has a negative effect on the ranking of our homepage and other main product pages or if its an issue we do not need to worry too much about. 2. If anyone has any other ideas as to how we can resolve this issue. Thanks,
On-Page Optimization | | KerryK
Kerry0 -
Duplicate Page Content
Hey Moz Community, Newbie here. On my second week of Moz and I love it but have a couple questions regarding crawl errors. I have two questions: 1. I have a few pages with duplicate content but it say 0 duplicate URL's. How do I know what is duplicated in this instance? 2. I'm not sure if anyone here is familiar with an IDX for a real estate website. But I have this setup on my site and it seems as though all the links it generates for different homes for sale show up as duplicate pages. For instance, http://www.handyrealtysa.com/idx/mls...tonio_tx_78258 is listed as having duplicate page content compared with 7 duplicate URLS: http://www.handyrealtysa.com/idx/mls...tonio_tx_78247
On-Page Optimization | | HandyRealtySA
http://www.handyrealtysa.com/idx/mls...tonio_tx_78253
http://www.handyrealtysa.com/idx/mls...tonio_tx_78245
http://www.handyrealtysa.com/idx/mls...tonio_tx_78261
http://www.handyrealtysa.com/idx/mls...tonio_tx_78258
http://www.handyrealtysa.com/idx/mls...tonio_tx_78260
http://www.handyrealtysa.com/idx/mls...tonio_tx_78260 I've attached a screenshot that shows 2 of the pages that state duplicate page content but have 0 duplicate URLs. Also you can see somewhat about the idx duplicate pages. rel="canonical" is functioning on these pages, or so it seems when I view the source code from the page. Any help is greatly appreciated. skitch.png0 -
Empty public profiles are viewed as duplicate content. What to do?
Hi! I manage a social networking site. We have a lot of public user profiles that are viewed as duplicate content. This is because these users haven't filled out any public profile info and thus the profiles are "empty" (except for the name). Is this something I should worry about? If yes, what are my options to solve this? Thanks!
On-Page Optimization | | thomasvanderkleij0 -
New Client Wants to Keep Duplicate Content Targeting Different Cities
We've got a new client who has about 300 pages on their website that are the same except the cities that are being targeted. Thus far the website has not been affected by penguin or panda updates, and the client wants to keep the pages because they are bringing in a lot of traffic for those cities. We are concerned about duplicate content penalties; do you think we should get rid of these pages or keep them?
On-Page Optimization | | waqid0 -
Footer Content
We currently have footer content contained in a single php include file and is included in every page and contains the following: Most recent 3 tweets from our twitter feed Snippets of our 3 most recent blogs posts navigation links to our main pages (essentially the same as our main navigation in the header) Is this good/bad?
On-Page Optimization | | NeilD0 -
Duplicated Content with joomla multi language website
Dear Seomoz Community I am running a multi language joomla website (www.siam2nite.com) with 2 active languages. The first and primary language is english. the second language is thai. Most of the content (articles, event descriptions ...) is in english only. What we did is a thai translation for the navigation bars, headers, titles etc (translation of all joomla language files) those texts are static and only help the user navigate / understand our site in their thai language. Now I facing a problem with duplicated content. Lets take our Q&A component as example. the url structure looks like this: english - www.siam2nite.com/en/questions/ thai - www.siam2nite.com/th/questions/ Every question asked will create two URL, one for each language. The content itself (user questions & answers) is identical on both URL's. Only the GUI language is different. If you take a look at this question you will understand what i mean: ENGLISH VERSION: http://www.siam2nite.com/en/questions/where-to-celebrate-halloween-in-bangkok THAI VERSION: http://www.siam2nite.com/th/questions/where-to-celebrate-halloween-in-bangkok As you can see each page has a unique title (H1) and introduction text in the correct language (same for menu, buttons, etc.) but the questions and answers are only available in one language. Now my question 😉 I guess Google will see this pages as duplicated content. How should I proceed with this problem: put all thai links /th/questions/ in the robots.txt and block them or make a canonical tag for the english versions? Not sure if I set a canonical tag google will still index the thai title and introduction texts (they have important thai keywords in them) Would really appreciate your help on this 😉 Regards, Menelik
On-Page Optimization | | menelik0 -
Page speed tools
Working on reducing page load time, since that is one of the ranking factors that Google uses. I've been using Page Speed FireFox plugin (requires FireBug), which is free. Pretty happy with it but wondering if others have pointers to good tools for this task. Thanks...
On-Page Optimization | | scanlin0