Why is Google reporting a big increase in duplicate content after a canonicalization update?
-
Our web hosting company recently applied an update to our site that was supposed to fix URL canonicalization. Webmaster Tools had been reporting duplicate content on pages that had a query string on the end.
After the update there has been a massive jump: Webmaster Tools now reports over 800 pages of duplicate content, up from about 100 before the update, and it is reporting some very odd pages (see attached image).
They claim they have implemented canonicalization in line with Google Panda & Penguin, but surely something is not right here, and it's going to cause us a big problem with traffic.
Can anyone shed any light on the situation?
-
Hi All,
I finally got to the bottom of the problem: they have not applied canonicalization across the whole site, only to certain pages, which is not what I understood when they implemented the update a few weeks back.
So they are preparing a hotfix as part of a service pack for our site, which will rectify this issue and apply canonicalization to all pages that contain query strings. This should clear the problem up once and for all.
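For anyone following along, here is a minimal sketch of what applying canonicalization to pages with query strings could look like. It assumes the query string never changes the page content, so the canonical URL is simply the URL with its query string removed. The function names `canonical_url` and `canonical_link_tag` are made up for illustration and are not the hosting company's actual code; the example URL is the one quoted elsewhere in this thread.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Assumption: the parameters (e.g. dir, size) do not change the content.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element to place in the page <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


variant = "http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm?dir=1&size=100"
print(canonical_link_tag(variant))
# <link rel="canonical" href="http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm" />
```

The same tag would be emitted on the clean URL and on every query-string variant of it, so all versions point at one preferred page.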
Thank you both for your input, a great help.
-
Hi Deb... I have a nice post from the SEOmoz blog for you, written by Lindsey, in which she explains this very clearly.
http://www.seomoz.org/blog/serious-robotstxt-misuse-high-impact-solutions
In this post, check the digg.com example. Digg.com has blocked "submit" in robots.txt, but Google has still indexed those URLs. Check the screenshot in the blog post. Hope this helps.
-
_Those URLs will be crawled by Google, but will not be indexed. That being said, there will be no more duplicate content issue. I hope I have made myself clear here._
-
Deb, even if you block those URLs in robots.txt, Google will still index them because they are linked internally from the website. The best way is to add a canonical tag, so that you get the internal linking benefits as well.
-
Fraser,
So far they have not implemented canonicalization on your website. Even after canonicalization is implemented you will still see duplication errors in your Webmaster Tools account, but they will not harm your ranking, because canonicalization helps Google select which of several versions of a similar page should be displayed in the SERP. In the above example, the first URL is the original and the second URL has some parameters added, so your preferred version should be the first one. After canonicalization is properly implemented you will only see the URLs that you have submitted in your sitemap via Google Webmaster Tools.
As for the two webmaster codes, I don't think you need two separate accounts; you can give them view or admin access from your own Webmaster Tools account.
-
Either you will have to block these pages in Google Webmaster Tools using the URL parameters setting, or you need to block them via the robots.txt file, like this:
To block this URL: http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm?dir=1&size=100
You need to use this rule in the robots.txt file: Disallow: /*.htm?dir=
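To see which Disallow patterns would actually catch that URL, here is a rough sketch of wildcard matching in the style Google documents for robots.txt, where a rule is matched from the start of the path, `*` matches any run of characters, and a trailing `$` anchors the end. The helpers `robots_pattern_to_regex` and `is_disallowed` are hand-written for illustration and are not part of any library, and the sketch ignores Allow rules, user-agent groups, and rule precedence.

```python
import re
from urllib.parse import urlsplit


def robots_pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_disallowed(url: str, pattern: str) -> bool:
    """Check whether the URL's path plus query string matches the Disallow pattern."""
    parts = urlsplit(url)
    target = parts.path + ("?" + parts.query if parts.query else "")
    return robots_pattern_to_regex(pattern).match(target) is not None


variant = "http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm?dir=1&size=100"
clean = "http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm"

print(is_disallowed(variant, "/*.htm?dir="))  # True: the wildcard covers the directories
print(is_disallowed(variant, "/.htm?dir="))   # False: the path does not start with "/.htm"
print(is_disallowed(clean, "/*.htm?dir="))    # False: the clean URL stays crawlable
```

Note that a pattern without the `*`, such as `/.htm?dir=`, only matches paths that literally begin with `/.htm?dir=`, so it would not block the product listing URL above.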
-
Hi,
Here are a couple of examples for you.
The duplication issue is showing up because of URLs like these:
http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm
http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm?dir=1&size=100
-
The canonical URL updates were supposed to have been implemented some weeks back.
I have asked why there are two Webmaster Tools codes; I expect one is for my account and the other lets them monitor things at their end.
Query string parameters have been set up, but I am unsure whether they are configured correctly, as this is all a bit new to me and I am really in their hands to deal with this.
The URLs without query strings are submitted to Webmaster Tools via sitemaps, and they are the URLs we want indexed.
-
Can you please share the URL and some example pages where the problem of duplicate content is appearing?
-
Hi Fraser,
Are you talking about towelsrus.co.uk? I didn't find a canonical tag in the source of any page on your website. Are they sure about the implementation, or will they implement it in the future? One more interesting point: why are there two webmaster verification codes in your website's page source? Here are those two codes:
<meta name="google-site-verification" content="BJ6cDrRRB2iS4fMx2zkZTouKTPTpECs2tw-3OAvIgh4" />
<meta name="google-site-verification" content="SjaHRLJh00aeQY9xJ81lorL_07UXcCDFgDFgG8lBqCk" />
Have you blocked the query string parameters under "URL parameters" in Google Webmaster Tools?
The duplication issue is showing up because of URLs like these:
http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm
http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm?dir=1&size=100
No canonical tag was found on the above URLs either.
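If you want to repeat this check yourself rather than reading the page source by eye, a small sketch like the one below looks for a `<link rel="canonical">` element in a page's HTML. It uses only Python's standard `html.parser` module. The class `CanonicalFinder`, the helper `find_canonical`, and the two sample HTML strings are made up for illustration; in practice you would feed in the downloaded source of each URL.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated values, so split before comparing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html_text: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical page sources, for illustration only.
with_tag = (
    '<html><head><link rel="canonical" '
    'href="http://www.towelsrus.co.uk/towels/baby-towels/prodlist_ct493.htm" />'
    "</head><body></body></html>"
)
without_tag = "<html><head><title>Baby Towels</title></head><body></body></html>"

print(find_canonical(with_tag))
print(find_canonical(without_tag))  # None
```

A result of `None` for both the clean URL and its query-string variant would correspond to the situation described above, where no canonical tag is present on either page.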