Identifying Duplicate Content
-
Hi looking for tools (beside Copyscape or Grammarly) which can scan a list of URLs (e.g. 100 pages) and find duplicate content quite quickly.
Specifically, small batches of duplicate content, see attached image as an example.
Does anyone have any suggestions?
Cheers.
-
I'm going to recommend Screaming Frog here. Run a scan of your site and then filter it by duplicate title tags, duplicate meta descriptions, and (my favorite) word count. Usually I don't need to go any further than duplicate title tags.
There's also www.siteliner.com. I've used that regularly and it has been tremendously helpful for pages that have duplicate content in the body but not in the META.
Finally, Google Search Console. Go to Search Appearance and click on HTML Improvements. You can also find all your duplicate title tags there, which should help you identify duplicate content easily.
-
Exactly what i was looking for!
Thankyou.
-
Hi Jay! Great question here.
First of all, kudos to you for looking to kill duplicate content with fire. As a marketer but foremost a writer, I am all about great writing and not doing this duplicated/spun stuff to try to rank. It won't convert anyways.
I put out a call to my followers on Twitter and one of them recommended https://www.killduplicate.com/en. I haven't personally used it, but give it a shot! It comes highly recommended.
Hope that's helpful!
John
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to avoid duplicate content
Hi there, Our client has an ecommerce website, their products are also showing on an aggregator website (aka on a comparison website where multiple vendors are showing their products). On the aggregator website the same photos, titles and product descriptions are showing. Now with building their new website, how can we avoid such duplicate content? Or does Google even care in this case? I have read that we could show more product information on their ecommerce website and less details on the aggregator's website. But is there another or better solution? Many thanks in advance for any input!
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
Galleries and duplicate content
Hi! I am now studing a website, and I have detected that they are maybe generating duplicate content because of image galleries. When they want to show details of some of their products, they link to a gallery url
Intermediate & Advanced SEO | | teconsite
something like this www.domain.com/en/gallery/slide/101 where you can find the logotype, a full image and a small description. There is a next and a prev button over the slider. The next goes to the next picture www.domain.com/en/gallery/slide/102 and so on. But the next picture is in a different URL!!!! The problem is that they are generating lots of urls with very thin content inside.
The pictures have very good resolution, and they are perfect for google images searchers, so we don't want to use the noindex tag. I thought that maybe it would be best to work with a single url with the whole gallery inside it (for example, the 6 pictures working with a slideshow in the same url ), but as the pictures are very big, the page weight would be greater than 7 Mb. If we keep the pictures working that way (different urls per picture), we will be generating duplicate content each time they want to create a gallery. What is your recommendation? Thank you!0 -
Pagination causing duplicate content problems
Hi The pagination on our website www.offonhols.com is causing duplicate content problems. Is the best solution adding add rel=”prev” / “next# to the hrefs As now the pagination links at the bottom of the page are just http://offonhols.com/default.aspx?dp=1
Intermediate & Advanced SEO | | offonhols
http://offonhols.com/default.aspx?dp=2
http://offonhols.com/default.aspx?dp=3
etc0 -
Duplicate peices of content on multiple pages - is this a problem
I have a couple of WordPress clients with the same issue but caused in different ways: 1. The Slash WP theme which is a portfolio theme, involves setting up multiple excerpts of content that can then be added to multiple pages. So although the pages themselves are not identical, there are the same snippets of content appearing on multiple pages 2. A WP blog which has multiple categories and/or tags for each post, effectively ends up with many pages showing duplicate excerpts of content. My view has always been to noindex these pages (via Yoast), but was advised recently not to. In both these cases, even though the pages are not identical, do you think this duplicate content across multiple pages could cause an issue? All thoughts appreciated
Intermediate & Advanced SEO | | Chammy0 -
Duplicate Content Question
My understanding of duplicate content is that if two pages are identical, Google selects one for it's results... I have a client that is literally sharing content real-time with a partner...the page content is identical for both sites, and if you update one page, teh otehr is updated automatically. Obviously this is a clear cut case for canonical link tags, but I'm cuious about something: Both sites seem to show up in search results but for different keywords...I would think one domain would simply win out over the other, but Google seems to show both sites in results. Any idea why? Also, could this duplicate content issue be hurting visibility for both sites? In other words, can I expect a boost in rankings with the canonical tags in place? Or will rankings remain the same?
Intermediate & Advanced SEO | | AmyLB0 -
How to prevent duplicate content within this complex website?
I have a complex SEO issue I've been wrestling with and I'd appreciate your views on this very much. I have a sports website and most visitors are looking for the games that are played in the current week (I've studied this - it's true). We're creating a new website from scratch and I want to do this is as best as possible. We want to use the most elegant and best way to do this. We do not want to use work-arounds such as iframes, hiding text using AJAX etc. We need a solid solution for both users and search engines. Therefor I have written down three options: Using a canonical URL; Using 301-redirects; Using 302-redirects. Introduction The page 'website.com/competition/season/week-8' shows the soccer games that are played in game week 8 of the season. The next week users are interested in the games that are played in that week (game week 9). So the content a visitor is interested in, is constantly shifting because of the way competitions and tournaments are organized. After a season the same goes for the season of course. The website we're building has the following structure: Competition (e.g. 'premier league') Season (e.g. '2011-2012') Playweek (e.g. 'week 8') Game (e.g. 'Manchester United - Arsenal') This is the most logical structure one can think of. This is what users expect. Now we're facing the following challenge: when a user goes to http://website.com/premier-league he expects to see a) the games that are played in the current week and b) the current standings. When someone goes to http://website.com/premier-league/2011-2012/ he expects to see the same: the games that are played in the current week and the current standings. When someone goes to http://website.com/premier-league/2011-2012/week-8/ he expects to the same: the games that are played in the current week and the current standings. So essentially there's three places, within every active season within a competition, within the website where logically the same information has to be shown. To deal with this from a UX and SEO perspective, we have the following options: Option A - Use a canonical URL Using a canonical URL could solve this problem. You could use a canonical URL from the current week page and the Season page to the competition page: So: the page on 'website.com/$competition/$season/playweek-8' would have a canonical tag that points to 'website.com/$competition/' the page on 'website.com/$competition/$season/' would have a canonical tag that points to 'website.com/$competition/' The next week however, you want to have the canonical tag on 'website.com/$competition/$season/playweek-9' and the canonical tag from 'website.com/$competition/$season/playweek-8' should be removed. So then you have: the page on 'website.com/$competition/$season/playweek-9' would have a canonical tag that points to 'website.com/$competition/' the page on 'website.com/$competition/$season/' would still have a canonical tag that points to 'website.com/$competition/' In essence the canonical tag is constantly traveling through the pages. Advantages: UX: for a user this is a very neat solution. Wherever a user goes, he sees the information he expects. So that's all good. SEO: the search engines get very clear guidelines as to how the website functions and we prevent duplicate content. Disavantages: I have some concerns regarding the weekly changing canonical tag from a SEO perspective. Every week, within every competition the canonical tags are updated. How often do Search Engines update their index for canonical tags? I mean, say it takes a Search Engine a week to visit a page, crawl a page and process a canonical tag correctly, then the Search Engines will be a week behind on figuring out the actual structure of the hierarchy. On top of that: what do the changing canonical URLs to the 'quality' of the website? In theory this should be working all but I have some reservations on this. If there is a canonical tag from 'website.com/$competition/$season/week-8', what does this do to the indexation and ranking of it's subpages (the actual match pages) Option B - Using 301-redirects Using 301-redirects essentially the user and the Search Engine are treated the same. When the Season page or competition page are requested both are redirected to game week page. The same applies here as applies for the canonical URL: every week there are changes in the redirects. So in game week 8: the page on 'website.com/$competition/' would have a 301-redirect that points to 'website.com/$competition/$season/week-8' the page on 'website.com/$competition/$season' would have a 301-redirect that points to 'website.com/$competition/$season/week-8' A week goes by, so then you have: the page on 'website.com/$competition/' would have a 301-redirect that points to 'website.com/$competition/$season/week-9' the page on 'website.com/$competition/$season' would have a 301-redirect that points to 'website.com/$competition/$season/week-9' Advantages There is no loss of link authority. Disadvantages Before a playweek starts the playweek in question can be indexed. However, in the current playweek the playweek page 301-redirects to the competition page. After that week the page's 301-redirect is removed again and it's indexable. What do all the (changing) 301-redirects do to the overall quality of the website for Search Engines (and users)? Option C - Using 302-redirects Most SEO's will refrain from using 302-redirects. However, 302-redirect can be put to good use: for serving a temporary redirect. Within my website there's the content that's most important to the users (and therefor search engines) is constantly moving. In most cases after a week a different piece of the website is most interesting for a user. So let's take our example above. We're in playweek 8. If you want 'website.com/$competition/' to be redirecting to 'website.com/$competition/$season/week-8/' you can use a 302-redirect. Because the redirect is temporary The next week the 302-redirect on 'website.com/$competition/' will be adjusted. It'll be pointing to 'website.com/$competition/$season/week-9'. Advantages We're putting the 302-redirect to its actual use. The pages that 302-redirect (for instance 'website.com/$competition' and 'website.com/$competition/$season') will remain indexed. Disadvantages Not quite sure how Google will handle this, they're not very clear on how they exactly handle a 302-redirect and in which cases a 302-redirect might be useful. In most cases they advise webmasters not to use it. I'd very much like your opinion on this. Thanks in advance guys and galls!
Intermediate & Advanced SEO | | StevenvanVessum0 -
Duplicate content - canonical vs link to original and Flash duplication
Here's the situation for the website in question: The company produces printed publications which go online as a page turning Flash version, and as a separate HTML version. To complicate matters, some of the articles from the publications get added to a separate news section of the website. We want to promote the news section of the site over the publications section. If we were to forget the Flash version completely, would you: a) add a canonical in the publication version pointing to the version in the news section? b) add a link in the footer of the publication version pointing to the version in the news section? c) both of the above? d) something else? What if we add the Flash version into the mix? As Flash still isn't as crawlable as HTML should we noindex them? Is HTML content duplicated in Flash as big an issue as HTML to HTML duplication?
Intermediate & Advanced SEO | | Alex-Harford0 -
How are they avoiding duplicate content?
One of the largest stores in USA for soccer runs a number of whitelabel sites for major partners such as Fox and ESPN. However, the effect of this is that they are creating duplicate content for their products (and even the overall site structure is very similar). Take a look at: http://www.worldsoccershop.com/23147.html http://www.foxsoccershop.com/23147.html http://www.soccernetstore.com/23147.html You can see that practically everything is the same including: product URL product title product description My question is, why is Google not classing this as duplicate content? Have they coded for it in a certain way or is there something I'm missing which is helping them achieve rankings for all sites?
Intermediate & Advanced SEO | | ukss19840