Roger keeps telling me my canonical pages are duplicates
-
I've got a site that's brand spanking new that I'm trying to get the error count down to zero on, and I'm basically there except for this odd problem. Roger got into the site like a naughty puppy a bit too early, before I'd put the canonical tags in, so there were a couple thousand 'duplicate content' errors. I put canonicals in (programmatically, so they appear on every page) and waited a week and sure enough 99% of them went away.
However, there's about 50 that are still lingering, and I'm not sure why they're being detected as such. It's an ecommerce site, and the duplicates are being detected on the product page, but why these 50? (there's hundreds of other products that aren't being detected). The URLs that are 'duplicates' look like this according to the crawl report:
http://www.site.com/Product-1.aspx
http://www.site.com/product-1.aspx
And so on. Canonicals are in place, and have been for weeks, and as I said there's hundreds of other pages just like this not having this problem, so I'm finding it odd that these ones won't go away.
All I can think of is that Roger is somehow caching stuff from previous crawls? According to the crawl report these duplicates were discovered '1 day ago' but that simply doesn't make sense. It's not a matter of messing up one or two pages on my part either; we made this site to be dynamically generated, and all of the SEO stuff (canonical, etc.) is applied to every single page regardless of what's on it.
If anyone can give some insight I'd appreciate it!
-
ThompsonPaul -
Thanks for that info, it pretty much nails exactly what I had discovered independently. This is an IIS7/Win2k8R2 install, so luckily the rewriting is a bit easier than in previous iterations. The whole platform is hand-coded by us (after the 10th ecommerce site or so you can generally do them in your sleep), so I don't have to worry about CMS implementation and the like, and luckily we already knew about the spaces, so they simply aren't allowed in the filenames. I'm in the middle of writing a regex right now that will down-case anything in an href="" or src="" attribute, which will hopefully handle everything on the site side, user-created or not. I'll consider what to do about external links a bit down the road, I think.
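For anyone following along, here is a rough sketch of the kind of down-casing pass described above. It is written in Python purely for illustration (the actual site is ASP.NET), and the names `ATTR_RE`, `lower_path` and `lowercase_links` are made up for this example. It assumes double-quoted attributes only and leaves query strings and fragments alone, since those can be case-sensitive.

```python
import re

# Matches href="..." or src="..." (double-quoted values only) and captures
# the attribute name and its value. Hypothetical helper, not the site's code.
ATTR_RE = re.compile(r'\b(href|src)="([^"]*)"', re.IGNORECASE)


def lower_path(url):
    """Lower-case everything before the first '?' or '#'; keep the rest as-is."""
    cut = re.search(r"[?#]", url)
    if cut is None:
        return url.lower()
    return url[:cut.start()].lower() + url[cut.start():]


def lowercase_links(html):
    """Return html with every double-quoted href/src value down-cased."""
    def replace(match):
        return '{}="{}"'.format(match.group(1), lower_path(match.group(2)))
    return ATTR_RE.sub(replace, html)
```

Note that this also down-cases external links, which is only safe if the target server ignores case, so a real version would probably want to skip absolute URLs on other hosts.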
-
Valery, you're definitely going to want to normalize your URLs to lowercase. IIS itself doesn't care about case and will serve the same page for Product-1.aspx and product-1.aspx, but crawlers and search engines treat URLs that differ only in case as different pages.
In addition to the search engine problems it creates, it's also a major problem for usability - yours and your users'. For example, on a server that does treat case as significant, a user who types in a direct URL can get a 404 error depending on what case they use.
More importantly, your Google Analytics will report on each of those versions as separate pages, unless you write a normalizing filter into your GA profiles. Better to do that normalization for the actual site, not just your analytics.
While rel=canonical can resolve a number of issues, I've always found it vastly better to correct the actual problem at its root, rather than rely on canonicalization as a catch-all. Anecdotally, I've found that correcting issues like this with rewrites seems to allow affected pages to rank better than when they're just corrected with canonicalization. Wish I could find time to do an actual case study on that.
Managing rewrites on older IIS servers will require a plugin like ISAPI_Rewrite, as IIS doesn't handle it natively. On IIS 7 and later you can use Microsoft's URL Rewrite module instead.
P.S. IIS will also allow and respect spaces in URLs. Users in Internet Explorer will see them as normal spaces, but browsers like Firefox will insert the URL encoding for a space (%20) into each necessary spot in the URL. This is again a mess for usability, so it's much better to force a rewrite of all URLs to replace spaces with dashes when creating new pages. Many CMSs have plugins for this, or you can use sitewide rewrites to do it after the fact.
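To make the normalization concrete, here is a minimal sketch in Python of the rule described above: lower-case the host and path, and turn spaces (literal or %20) into dashes. The function name `normalize_url` is hypothetical, and on a real IIS box this logic would live in a rewrite rule rather than in application code. The query string is left untouched on the assumption that parameter values may be case-sensitive.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Return url with a lower-case host and path and dashes instead of spaces."""
    parts = urlsplit(url)
    # Treat encoded (%20) and literal spaces the same way, then down-case.
    path = parts.path.replace("%20", " ").replace(" ", "-").lower()
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, parts.query, parts.fragment)
    )
```

A rewrite layer could then issue a 301 to `normalize_url(requested)` whenever it differs from the requested URL, so that only one version of each page is ever reachable.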
-
I think I get your point; the canonical is pointing to where the juice should go, but the URLs are still functionally different things. I'm guessing some sort of URL rewrite is in order, and I'll need to standardize how I do in-text links on the site (with user-editable content this part could be a pain).
-
Hey Valery,
I see those on closer inspection. I know it looks weird, but that's accurate. Your server must be UNIX or Linux, so it will actually treat the same URL in a different case as a different page.
For example: banana.com/pancakes.html would be treated differently than banana.com/PanCakes.html.
So if you have any pages generated dynamically or otherwise that differ only in case, then they will be tagged as duplicate.
In your CSV file you can see the duplicates being caused by case. I'd also be happy to help provide a few specific examples but would want to generate a ticket for you so we don't divulge any private information.
Cheers,
Joel.
-
Joel -
Thanks a lot for looking into that. The pages are very similar, so I'm not surprised they're being flagged as duplicates; but what does surprise me is that they are apparently being considered duplicates of a canonical version of themselves. When I click on the duplicate list I'm expecting to see:
Product1.aspx
Product1-Blue.aspx
Product1-Red.aspx
But instead I'm seeing:
Product1.aspx
product1.aspx
product1.ASPX
And so on. The first scenario to me implies that the 3 pages are duplicate to each other, whereas the second is saying that there's either a canonical problem or I literally have different-case versions of those files.
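One way to check which scenario you are in is to group the URLs from the crawl report by their lower-cased form and look at any group with more than one member. The sketch below does that in Python; `case_variant_groups` is a made-up name, and it assumes you have already pulled the URL column out of the crawl CSV into a plain list.

```python
from collections import defaultdict


def case_variant_groups(urls):
    """Map each lower-cased URL to the distinct spellings that were crawled.

    Only URLs that appear in more than one casing are returned.
    """
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

If every flagged page shows up in one of these groups, the duplicates come from casing in links rather than from similar product variants.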
-
Hi Valery,
I took a peek at your campaign and it looks like those few remaining duplicate pages are in fact different, but with very minor differences. Basically there are pages for different sizes of things.
While being different, they vary in such minute ways that Roger sees them as duplicates.
I hope that answers the question.
Thanks,
Joel.