Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing duplicate content
-
Due to URL changes and parameters on our ecommerce sites, we have a massive amount of duplicate pages indexed by google, sometimes up to 5 duplicate pages with different URLs.
1. We've instituted canonical tags site wide.
2. We are using the parameters function in Webmaster Tools.
3. We are using 301 redirects on all of the obsolete URLs
4. I have had many of the pages fetched so that Google can see and index the 301s and canonicals.
5. I created HTML sitemaps with the duplicate URLs, and had Google fetch and index the sitemap so that the dupes would get crawled and deindexed.
None of these seems to be terribly effective. Google is indexing pages with parameters in spite of the parameter (clicksource) being called out in GWT. Pages with obsolete URLs are indexed in spite of them having 301 redirects. Google also appears to be ignoring many of our canonical tags as well, despite the pages being identical.
Any ideas on how to clean up the mess?
-
Where this is appearing the most is on cross domain canonicals. We have duplicate content across 2 websites, and we've canonicaled some pages from Site A to Site B, and some from Site B to Site A. In theory, pages that were canonicaled to the other domain should be deindexed. When I run a rankings report, I see pages for the wrong domain ranking, a month later. They are pages with parameters, or old URLs that we've changed. It's like a game of whack a mole. Every time we get a page deindexed, a duplicate with a different parameter takes its place. And this is in spite of calling out these parameters in GWT.
What I imagine is happening is that we have several URLs for the same page indexed. When Google crawls our site, it is correctly canonicaling the page it crawls. In the rankings, however, Google is probably pulling a duplicate page out of its index, and ranking it without crawling it. If it was crawling it, Google would see the canonical tag, and not rank it. So we have an ongoing battle to get Google to crawl the page it just pulled out of its index to see the the canonical tag.
The reason for all this is that when a page cross domain canonicals correctly, the rankings for the duplicate page on the other site goes up dramatically. As long as Google keeps ranking the wrong pages, we don't get the rankings bump on the other site.
-
Are you basing this on a site: search? It's fairly common for URLs to appear in a site: search that otherwise will not appear for any actual searches. Are the undesirable versions of the URLs getting any search traffic?
-
Yes, as Patrick said, surprisingly often something like this is a result of a simple oversight because we have been looking at the same code over and over...
Do you have access to Screaming Frog? You could crawl your site and see whether redirects/canonicals are behaving as you expected.
Have you taken a look at the html of one of the incorrectly indexed pages when it is loaded in your browser? Can you see the canonical? If you try going to a redirected page, does it redirect? [I know--way to obvious, but sometimes it is good to start at the beginning again when we can't root out an issue.]
Another culprit in these cases can be internal links. Do you link internally using any of the undesirable URLs? That can send a message to Google that those URLs are still in play. Again, you can use Screaming Frog to find those strings.
-
It sounds like part of the problem may be the sitemaps you're sending. By including duplicates in a sitemap, you're basically telling Google that each version of the page is valid. I would remove them and resubmit a sitemap with only the canonical versions you want indexed and see if that helps.
-
Hi there
Are you sure you are using all of the tools above properly? Not saying you're not but people make mistakes and it's just something to look into.
When did you implement all of the changes? Was it recently or was it a long time ago?
How is your organic traffic and rankings? Did you check if you have a manual action at all?
Let me know - thanks!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
[E-commerce] Duplicate content due to color variations (canonical/indexing)
Hello, We currently have a lot of color variations on multiple products with almost the same content. Even with our canonicals being set, Moz's crawling tool seems to flag them as duplicate content. What we have done so far: Choosing the best-selling color variation (our "master product") Adding a rel="canonical" to every variation (with our "master product" as the canonical URL) In my opinion, it should be enough to address this issue. However, being given the fact that it's flagged as duplicate by Moz, I was wondering if there is something else we should do? Should we add a "noindex,follow" to our child products and "index,follow" to our master product? (sounds to me like such a heavy change) Thank you in advance
Intermediate & Advanced SEO | Apr 15, 2014, 4:53 AM | EasyLounge0 -
How to resolve duplicate content issues when using Geo-targeted Subfolders to seperate US and CAN
A client of mine is about to launch into the USA market (currently only operating in Canada) and they are trying to find the best way to geo-target. We recommended they go with the geo-targeted subfolder approach (___.com and ___.com/ca). I'm looking for any ways to assist in not getting these pages flagged for duplicate content. Your help is greatly appreciated. Thanks!
Intermediate & Advanced SEO | Jan 9, 2014, 3:01 PM | jyoung2220 -
Duplicate Content www vs. non-www and best practices
I have a customer who had prior help on his website and I noticed a 301 redirect in his .htaccess Rule for duplicate content removal : www.domain.com vs domain.com RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
Intermediate & Advanced SEO | Oct 10, 2013, 8:14 PM | EnvoyWeb
RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC] The result of this rule is that i type MY-CUSTOMER-SITE.com in the browser and it redirects to www.MY-CUSTOMER-SITE.com I wonder if this is causing issues in SERPS. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary as it would seem that Googlebot is smart enough to know that these aren't two sites. -----Can you comment on whether this is a best practice for all domains?
-----I've run a report for backlinks. If my thought is true that there are some pointing to www.www.MY-CUSTOMER-SITE.com and some to the www.MY-CUSTOMER-SITE.com, is there any value in addressing this?0 -
Best practice for duplicate website content: same root domain name but different extension
Hi there I have a new client who has two websites: http://www.bayofislandsteambuilding.co.nz
Intermediate & Advanced SEO | Jul 23, 2013, 11:28 PM | turnbullholdingsltd
http://www.bayofislandsteambuilding.org.nz They are the same in every regard apart from the domain extension (.co.nz & .org.nz) which is likely to be causing them issues with Google ranking given the huge amount of duplicate content. What is the best practice approach to fixing this? Normally, if I was starting from scratch, I would set one of the extensions as an alias which redirects to the main domain. Thanks in advance. Laurie0 -
Is an RSS feed considered duplicate content?
I have a large client with satellite sites. The large site produces many news articles and they want to put an RSS feed on the satellite sites that will display the articles from the large site. My question is, will the rss feeds on the satellite sites be considered duplicate content? If yes, do you have a suggestion to utilize the data from the large site without being penalized? If no, do you have suggestions on what tags should be used on the satellite pages? EX: wrapped in tags? THANKS for the help. Darlene
Intermediate & Advanced SEO | Feb 12, 2013, 8:14 AM | gXeSEO0 -
Duplicate Content on Wordpress b/c of Pagination
On my recent crawl, there were a great many duplicate content penalties. The site is http://dailyfantasybaseball.org. The issue is: There's only one post per page. Therefore, because of wordpress's (or genesis's) pagination, a page gets created for every post, thereby leaving basically every piece of content i write as a duplicate. I feel like the engines should be smart enough to figure out what's going on, but if not, I will get hammered. What should I do moving forward? Thanks!
Intermediate & Advanced SEO | Jun 28, 2012, 1:40 PM | Byron_W0 -
Can PDF be seen as duplicate content? If so, how to prevent it?
I see no reason why PDF couldn't be considered duplicate content but I haven't seen any threads about it. We publish loads of product documentation provided by manufacturers as well as White Papers and Case Studies. These give our customers and prospects a better idea off our solutions and help them along their buying process. However, I'm not sure if it would be better to make them non-indexable to prevent duplicate content issues. Clearly we would prefer a solutions where we benefit from to keywords in the documents. Any one has insight on how to deal with PDF provided by third parties? Thanks in advance.
Intermediate & Advanced SEO | Apr 10, 2015, 2:38 AM | Gestisoft-Qc1 -
Duplicate Content | eBay
My client is generating templates for his eBay template based on content he has on his eCommerce platform. I'm 100% sure this will cause duplicate content issues. My question is this.. and I'm not sure where eBay policy stands with this but adding the canonical tag to the template.. will this work if it's coming from a different page i.e. eBay? Update: I'm not finding any information regarding this on the eBay policy's: http://ocs.ebay.com/ws/eBayISAPI.dll?CustomerSupport&action=0&searchstring=canonical So it does look like I can have rel="canonical" tag in custom eBay templates but I'm concern this can be considered: "cheating" since rel="canonical is actually a 301 but as this says: http://googlewebmastercentral.blogspot.com/2009/12/handling-legitimate-cross-domain.html it's legitimately duplicate content. The question is now: should I add it or not? UPDATE seems eBay templates are embedded in a iframe but the snap shot on google actually shows the template. This makes me wonder how they are handling iframes now. looking at http://www.webmaster-toolkit.com/search-engine-simulator.shtml does shows the content inside the iframe. Interesting. Anyone else have feedback?
Intermediate & Advanced SEO | Dec 9, 2014, 11:02 AM | joseph.chambers1