Why do I get duplicate pages, website referencing the capital version of the url vs the lowercase www.agi-automation.com/Pneumatic-grippers.htm
-
Can I the rel=canonical tag this?
-
I'm not a pro when it comes to technical server set ups, so maybe Keri can jump in with some better knowledge.
It seems to me like you have everything set up on your server correctly. And it looks like Google currently has only one version indexed of the original page in question.
You site navigation menu points to the capitalized version of the URL, but somewhere on your site there must be a link that points to the lowercase version which would explain how SEOmoz found the duplication when crawling your site, and if SEOmoz can find, so can Google.
I still think you should use the rel=canonical attribute just to be safe. Again, I'm not that great at technical stuff. Sorry I couldn't be of more help here.
Tim
-
Hi Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Lin...). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Hi Keri and Tim,
Thanks for your responses. This is what the IT team has found. Let me know your thoughts:
On the physical computer that hosts the website the page exists as one file. The casing of the file is irrelevant to the host machine, it wouldn't allow 2 files of the same name in the same directory.
To reenforce this point, you can access said file by camel-casing the URI in any fashion (eg; http://www.agi-automation.com/Linear-EscapeMents.htm). This does not bring up a different file each time, the server merely processes the URI as case-less and pulls the file by it's name.
What is happening in the example given is that some sort of indexer is being used to create a "dummy" reference of all the site files. Since the indexer doesn't have file access to the server, it does this by link crawling instead of reading files. It is the crawler that is making an assumption that the different casings of the pages are in fact different files. Perhaps there is a setting in the indexer to ignore casing.
So the indexer is thinking that these are 2 different pages when they really aren't. This makes all of the other points moot, though they would certainly be relevant in the case of an actual duplicated page."
-
Excellent points, Keri. I hadn't thought about either of those issues. Using a redirect is definitely the best way to go.
-
I'd vote for doing the rewrite to the lowercase version. This gives you a couple of added benefits:
-
If people copy and paste the URL from their browser then link to it, you're getting all the links going to the same place.
-
Your analytics based on your URLs will be more accurate. Instead of seeing:
urla.htm 70 visits
urlb.htm 60 visits
urlB.htm 30 visitsYou'll see
urlb.htm 90 visits
urla.htm 70 visits -
-
The problem is that search engines view these URLs as two separate pages, so both pages get indexed and you run into duplication issues.
Yes, using rel=canonical is a good way to handle this. I would suggest using the lowercase version as your canonical page, so you would place this bit of HTML on both pages:
The other option is to create a 301 redirect from the caps version to the lowercase version. This would ensure that anyone arriving at the page (including search engine bots) would end up being directed to the lowercase version.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Some of my website urls are not getting indexed while checking (site: domain) in google
Some of my website urls are not getting indexed while checking (site: domain) in google
Technical SEO | | nlogix0 -
Redesigned and Migrated Website - Lost Almost All Organic Traffic - Mobile Pages Indexing over Normal Pages
We recently redesigned and migrated our site from www.jmacsupply.com to https://www.jmac.com It has been over 2 weeks since implementing 301 redirects, and we have lost over 90% of our organic traffic. Google seems to be indexing the mobile versions of our pages over our website pages. We hired a designer to redesign the site, and we are confident the code is doing something that is harmful for ranking our website. F or Example: If you google "KEEDEX-K-DS-FLX38" You should see our mobile page ranking: http://www.jmac.com/mobile/Product.aspx?ProductCode=KEEDEX-K-DS-FLX38 but the page that we want ranked (and we think should be, is https://www.jmac.com/Keedex_K_DS_FLX38_p/keedex-k-ds-flx38.htm) That second page isn't even indexed. (When you search for: "site:jmac.com Keedex K-DS-FLX38") We have implemented rel canonical, and rel alternate both ways. What are we doing wrong??? Thank you in advance for any help - it is much appreciated.
Technical SEO | | jmaccom0 -
Duplicate content or Duplicate page issue?
Hey Moz Community! I have a strange case in front of me. I have published a press release on my client's website and it ranked right away in Google. A week after the page completely dropped and it completely disappeared. The page is being indexed in Google, but when I search "title of the PR", the only results I get for that search query are the media and news outlets that have reported the news. No presence of my client's page. I also have to mention that I found two URLs of the same page: one with lower case letters and one with capital letters. Is this a duplicate page or a duplicate content issue coming from the news websites? How can I solve it? Thanks!
Technical SEO | | Workaholic0 -
Duplicate Page Title for a Large Listing Website
My company has a popular website that has over 4,000 crawl errors showing in Moz, most of them coming up as Duplicate Page Title. These duplicate page titles are coming from pages with the title being the keyword, then location, such as: "main keyword" North Carolina
Technical SEO | | StorageUnitAuctionList
"main keyword" Texas ... and so forth. These pages are ranked and get a lot of traffic. I was wondering what the best solution is for resolving these types of crawl errors without it effecting our rankings. Thanks!0 -
How to remove the duplicate page title
Hi everyone, I saw many posts related to this query.But i couldnt find a solution for my error.. Here is my question I got 575 Duplicate page title & 600 duplicate page content errors. My site is related to realestate. I created a page title like same sentence differs with locality name Eg: Land for sale - kandy property Land for sale - Galle property Likewise Locality name only differs..I have created meta title & Content like this. Can anyone let me know how to solve this error ASAP ?
Technical SEO | | Rajesh.Chandran0 -
Versions of same site with www, no www, ww, and w
It's just come to light that a couple of our clients have both www. and no www versions of their site, with no 301 in place. That's all fine, we're (trying) to get them to sort it. But, strangely, one of our clients not only has the www. and no www, but they also have ww., and w. - all showing their site as normal - the only difference being that the www. and no www are both PR 3 while all other versions are N/A. Does anyone have any idea what's going on/if it's a problem? Thanks very much 🙂
Technical SEO | | Chuck-Boom0 -
Redirect non www. domain to WWW. domain for established website?
Hey guys, The website in question has been online for more than 5 years but there are still 2 versions of the website. Both versions are indexed by Google and of course, this will result in duplicate content. Is it necessary to redirect the non-www domain to the www. domain. What are the cons and advantages? Will the www. links replace the non-www links when it comes to keyword rankings? Thanks.
Technical SEO | | BruLee0 -
How do I redirect non www pages to www on a windows server?
As the .htaccess file cannot be worked on, I added this php code 301 redirect if the URL does not contain a www on all the pages (small website - 10 pages) : header( "HTTP/1.1 301 Moved Permanently" ); header( "Location: $location" ); I want to know if this is ok for SEO? Has anyone done this on a windows server? Or if you have any better methods, it would be great if you can share. Please help. Thanks.
Technical SEO | | ArjunRajkumar0