Removing duplicate content
-
Due to URL changes and parameters on our ecommerce sites, we have a massive number of duplicate pages indexed by Google, sometimes up to 5 copies of the same page under different URLs.
1. We've instituted canonical tags site wide.
2. We are using the parameters function in Webmaster Tools.
3. We are using 301 redirects on all of the obsolete URLs.
4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals.
5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed.
None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of having 301 redirects. Google also appears to be ignoring many of our canonical tags, despite the pages being identical.
Any ideas on how to clean up the mess?
-
This shows up most with cross-domain canonicals. We have duplicate content across 2 websites, and we've canonicaled some pages from Site A to Site B, and some from Site B to Site A. In theory, pages that were canonicaled to the other domain should be deindexed. But when I run a rankings report a month later, I still see pages from the wrong domain ranking. They are pages with parameters, or old URLs that we've changed. It's like a game of whack-a-mole. Every time we get a page deindexed, a duplicate with a different parameter takes its place. And this is in spite of calling out these parameters in GWT.
What I imagine is happening is that we have several URLs for the same page indexed. When Google crawls our site, it is correctly canonicaling the page it crawls. In the rankings, however, Google is probably pulling a duplicate page out of its index and ranking it without crawling it. If it were crawling it, Google would see the canonical tag and not rank it. So we have an ongoing battle to get Google to crawl the page it just pulled out of its index so that it sees the canonical tag.
The reason this matters is that when a cross-domain canonical is picked up correctly, the rankings for the duplicate page on the other site go up dramatically. As long as Google keeps ranking the wrong pages, we don't get the rankings bump on the other site.
-
Are you basing this on a site: search? It's fairly common for URLs to appear in a site: search that otherwise will not appear for any actual searches. Are the undesirable versions of the URLs getting any search traffic?
-
Yes, as Patrick said, surprisingly often something like this is a result of a simple oversight because we have been looking at the same code over and over...
Do you have access to Screaming Frog? You could crawl your site and see whether redirects/canonicals are behaving as you expected.
Have you taken a look at the HTML of one of the incorrectly indexed pages when it is loaded in your browser? Can you see the canonical? If you try going to a redirected page, does it redirect? [I know, way too obvious, but sometimes it is good to start at the beginning again when we can't root out an issue.]
Another culprit in these cases can be internal links. Do you link internally using any of the undesirable URLs? That can send a message to Google that those URLs are still in play. Again, you can use Screaming Frog to find those strings.
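If you don't have Screaming Frog handy, the two checks above (is the canonical really in the HTML, and are internal links still pointing at parameterized URLs) can be roughed out with a short script. This is only a sketch using Python's standard library: the example.com URLs, the sample HTML, and the `audit` helper are made up for illustration, and `clicksource` is taken from the question as the parameter to flag. A real run would feed in the HTML fetched from your own pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qs


class PageAudit(HTMLParser):
    """Collects the rel="canonical" target and every <a href> on one page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.canonical = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Resolve relative hrefs against the URL the page was loaded from.
        absolute = urljoin(self.page_url, href)
        rel = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.canonical = absolute
        elif tag == "a":
            self.links.append(absolute)


def audit(page_url, html, bad_params=("clicksource",)):
    """Return (canonical URL or None, internal links carrying an unwanted parameter)."""
    parser = PageAudit(page_url)
    parser.feed(html)
    site = urlsplit(page_url).netloc
    flagged = []
    for link in parser.links:
        parts = urlsplit(link)
        if parts.netloc != site:
            continue  # only internal links are of interest here
        query = parse_qs(parts.query, keep_blank_values=True)
        if any(param in query for param in bad_params):
            flagged.append(link)
    return parser.canonical, flagged


# Hypothetical page used only to show the output shape.
sample = """
<html><head>
<link rel="canonical" href="https://www.example.com/widgets/blue">
</head><body>
<a href="/widgets/red?clicksource=nav">Red</a>
<a href="/widgets/green">Green</a>
<a href="https://other.example.org/?clicksource=x">External</a>
</body></html>
"""
canonical, flagged = audit("https://www.example.com/widgets/blue?clicksource=email", sample)
print("canonical:", canonical)
print("internal links to fix:", flagged)
```

If `canonical` comes back as `None`, or the flagged list is not empty, that is a place to start before blaming Google.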
-
It sounds like part of the problem may be the sitemaps you're sending. By including duplicates in a sitemap, you're basically telling Google that each version of the page is valid. I would remove them and resubmit a sitemap with only the canonical versions you want indexed and see if that helps.
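As a rough illustration of what "only the canonical versions" could look like in practice, here is a minimal Python sketch that strips an ignorable parameter, drops the resulting duplicates, and writes a sitemap from what is left. The URL list and the helper names (`canonical_form`, `build_sitemap`) are hypothetical, and treating `clicksource` as the only parameter to ignore is an assumption based on the question; adjust the set to match your own site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: clicksource is the only parameter that never changes page content.
IGNORED_PARAMS = {"clicksource"}


def canonical_form(url):
    """Drop ignorable query parameters and any fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def build_sitemap(urls):
    """Return (deduplicated canonical URLs, sitemap XML as a string)."""
    # dict.fromkeys keeps first-seen order while removing duplicates.
    locs = list(dict.fromkeys(canonical_form(url) for url in urls))
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for loc in locs:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = loc
    return locs, ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl export containing parameterized duplicates.
crawled = [
    "https://www.example.com/widgets/blue?clicksource=email",
    "https://www.example.com/widgets/blue",
    "https://www.example.com/widgets/blue?clicksource=nav&color=dark",
    "https://www.example.com/widgets/red",
]
locs, sitemap_xml = build_sitemap(crawled)
print(locs)
print(sitemap_xml)
```

Parameters that are not in the ignore set (like `color` in the sample) are kept, since they may point to genuinely different pages.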
-
Hi there
Are you sure you are using all of the tools above properly? Not saying you're not, but people make mistakes and it's something to look into.
When did you implement all of the changes? Was it recently or was it a long time ago?
How are your organic traffic and rankings? Did you check whether you have a manual action?
Let me know - thanks!