URL Capitalization Inconsistencies Registering Duplicate Content Crawl Errors
-
Hello,
I have a very large website that has a good amount of "Duplicate Content" issues according to MOZ. In reality though, it is not a problem with duplicate content, but rather a problem with URLs. For example: http://acme.com/product/features and http://acme.com/Product/Features both land on the same page, but MOZ is seeing them as separate pages, therefor assuming they are duplicates.
We have recently implemented a solution to automatically de-captialize all characters in the URL, so when you type acme.com/Products, the URL will automatically change to acme.com/products – but MOZ continues to flag multiple "Duplicate Content" issues. I noticed that many of the links on the website still have the uppercase letters in the URL even though when clicked, the URL changes to all lower case. Could this be causing the issue?
What is the best way to remove the "Duplicate Content" issues that are not actually duplicate content?
-
http://moz.com/learn/seo/canonicalization
"Another option for dealing with duplicate content is to utilize the rel=canonical tag. The rel=canonical tag passes the same amount of link juice (ranking power) as a 301 redirect, and often takes much less development time to implement."
-
If you check Google Analytics, GA is probably seeing it too. We had a similar problem. Canonicalization will help with duplicate content, but it won't help with rankings. Internally, you are sending link juice to multiple versions of the same page. In addition, you could have backlinks pointing at multiple duplicate pages, and splitting the link love.
Canonicalization does not transfer link juice the way a 301 Redirect does. All the canonical tag does is tell Google "Rank This Page". If you don't care about rankings the canonical is fine. If you do care, you need to 301 all of your pages to the lower case version.
If you decide to 301, first, build an HTML sitemap with all of the uppercase URLs. After you do the 301, have Google fetch the sitemap and submit it, This will help Googlebot wind all of the pages that were 301ed.
-
Hey man. If your store is a Magento store, there are settings for adding canonicalization tags to categories and products under:
System => Configuration => Catalog => Search Engine Optimization
h/t to Yoast for reminding me of the string to get there.
I am optimizing a Magento store that had a similar issue after a relaunch. Found that to be a very easy way to fix it.
Hope this helps.
-
I have a complex CMS for my main website; I just asked my developer to do it. On my Wordpress sites, I use an SEO plugin for this (Yoast).
-
Thank you so much Linda.
Do you know of a fast way to add a rel=canonical tag to all the pages, the website is quite large, and it would likely take months over months to do it manually.
-
Hi,
There is the same question here on moz: Duplicate Content and URL Capitalization
-
The best way to fix this is with a rel=canonical URL. Tag each page with the lower-case version. (I had this same problem.)
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Possible duplicate content issues on same page with urls to multiple tabs?
Hello everyone! I'm first time here, and glad to be part of Moz community! Jumping right into the question I have. For a type of pages we have on our website, there are multiple tabs on each page. To give an example, let's say a page is for the information about a place called "Ladakh". Now the various urls that the page is accessible from, can take the form of: mywanderlust.in/place/ladakh/ mywanderlust.in/place/ladakh/photos/ mywanderlust.in/place/ladakh/places-to-visit/ and so on. To keep the UX smooth when the user switches from one tab to another, we load everything in advance with AJAX but it remains hidden till the user switches to the required tab. Now since the content is actually there in the html, does Google count it as duplicate content? I'm afraid this might be the case as when I Google for a text that's visible only on one of the tabs, I still see all tabs in Google results. I also see internal links on GSC to say a page mywanderlust.in/questions which is only supposed to be linked from one tab, but GSC telling internal links to this page (mywanderlust.in/questions) from all those 3 tabs. Also, Moz Pro crawl reports informed me about duplicate content issues, although surprisingly it says the issue exists only on a small fraction of our indexable pages. Is it hurting our SEO? Any suggestions on how we could handle the url structure better to make it optimal for indexing. FWIW, we're using a fully responsive design with the displayed content being exactly same for both desktop and mobile web. Thanks a ton in advance!
Intermediate & Advanced SEO | | atulgoyal0 -
HELP! How does one prevent regional pages as being counted as "duplicate content," "duplicate meta descriptions," et cetera...?
The organization I am working with has multiple versions of its website geared towards the different regions. US - http://www.orionhealth.com/ CA - http://www.orionhealth.com/ca/ DE - http://www.orionhealth.com/de/ UK - http://www.orionhealth.com/uk/ AU - http://www.orionhealth.com/au/ NZ - http://www.orionhealth.com/nz/ Some of these sites have very similar pages which are registering as duplicate content, meta descriptions and titles. Two examples are: http://www.orionhealth.com/terms-and-conditions http://www.orionhealth.com/uk/terms-and-conditions Now even though the content is the same, the navigation is different since each region has different product options / services, so a redirect won't work since the navigation on the main US site is different from the navigation for the UK site. A rel=canonical seems like a viable option, but (correct me if I'm wrong) it tells search engines to only index the main page, in this case, it would be the US version, but I still want the UK site to appear to search engines. So what is the proper way of treating similar pages accross different regional directories? Any insight would be GREATLY appreciated! Thank you!
Intermediate & Advanced SEO | | Scratch_MM0 -
Duplicate Content www vs. non-www and best practices
I have a customer who had prior help on his website and I noticed a 301 redirect in his .htaccess Rule for duplicate content removal : www.domain.com vs domain.com RewriteCond %{HTTP_HOST} ^MY-CUSTOMER-SITE.com [NC]
Intermediate & Advanced SEO | | EnvoyWeb
RewriteRule (.*) http://www.MY-CUSTOMER-SITE.com/$1 [R=301,L,NC] The result of this rule is that i type MY-CUSTOMER-SITE.com in the browser and it redirects to www.MY-CUSTOMER-SITE.com I wonder if this is causing issues in SERPS. If I have some inbound links pointing to www.MY-CUSTOMER-SITE.com and some pointing to MY-CUSTOMER-SITE.com, I would think that this rewrite isn't necessary as it would seem that Googlebot is smart enough to know that these aren't two sites. -----Can you comment on whether this is a best practice for all domains?
-----I've run a report for backlinks. If my thought is true that there are some pointing to www.www.MY-CUSTOMER-SITE.com and some to the www.MY-CUSTOMER-SITE.com, is there any value in addressing this?0 -
Finding Duplicate Content Spanning more than one Site?
Hi forum, SEOMoz's crawler identifies duplicate content within your own site, which is great. How can I compare my site to another site to see if they share "duplicate content?" Thanks!
Intermediate & Advanced SEO | | Travis-W0 -
Duplicate Content Question
My client's website is for an organization that is part of a larger organization - which has it's own website. We were given permission to use content from the larger organization's site on my client's redesigned site. The SEs will deem this as duplicate content, right? I can "re-write" the content for the new site, but it will still be closely based on the original content from the larger organization's site, due to the scientific/medical nature of the subject material. Is there a way around this dilemma so I do not get penalized? Thanks!
Intermediate & Advanced SEO | | Mills1 -
How to compete with duplicate content in post panda world?
I want to fix duplicate content issues over my eCommerce website. I have read very valuable blog post on SEOmoz regarding duplicate content in post panda world and applied all strategy to my website. I want to give one example to know more about it. http://www.vistastores.com/outdoor-umbrellas Non WWW version: http://vistastores.com/outdoor-umbrellas redirect to home page. For HTTPS pages: https://www.vistastores.com/outdoor-umbrellas I have created Robots.txt file for all HTTPS pages as follow. https://www.vistastores.com/robots.txt And, set Rel=canonical to HTTP page as follow. http://www.vistastores.com/outdoor-umbrellas Narrow by search: My website have narrow by search and contain pages with same Meta info as follow. http://www.vistastores.com/outdoor-umbrellas?cat=7 http://www.vistastores.com/outdoor-umbrellas?manufacturer=Bond+MFG http://www.vistastores.com/outdoor-umbrellas?finish_search=Aluminum I have restricted all dynamic pages by Robots.txt which are generated by narrow by search. http://www.vistastores.com/robots.txt And, I have set Rel=Canonical to base URL on each dynamic pages. Order by pages: http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name I have restrict all pages with robots.txt and set Rel=Canonical to base URL. For pagination pages: http://www.vistastores.com/outdoor-umbrellas?dir=asc&order=name&p=2 I have restrict all pages with robots.txt and set Rel=Next & Rel=Prev to all paginated pages. I have also set Rel=Canonical to base URL. I have done & apply all SEO suggestions to my website but, Google is crawling and indexing 21K+ pages. My website have only 9K product pages. Google search result: https://www.google.com/search?num=100&hl=en&safe=off&pws=0&gl=US&q=site:www.vistastores.com&biw=1366&bih=520 Since last 7 days, my website have affected with 75% down of impression & CTR. I want to recover it and perform better as previous one. I have explained my question in long manner because, want to recover my traffic as soon as possible.
Intermediate & Advanced SEO | | CommercePundit0 -
How to fix duplicated urls
I have an issue with duplicated pages. Should I use cannonical tag and if so, how? Or should change the page titles? This is causing my pages to compete with each other in the SERPs. 'Paradisus All Inclusive Luxury Resorts - Book your stay at Paradisus Resorts' is also used on http://www.paradisus.com/booking-template.php | http://www.paradisus.com/booking-template.php?codigoHotel=5889 line 9 | | http://www.paradisus.com/booking-template.php?codigoHotel=5891 line 9 | | http://www.paradisus.com/booking-template.php?codigoHotel=5910 line 9 | | http://www.paradisus.com/booking-template.php?codigoHotel=5911 line 9 |
Intermediate & Advanced SEO | | Melia0 -
400 errors and URL parameters in Google Webmaster Tools
On our website we do a lot of dynamic resizing of images by using a script which automatically re-sizes an image dependant on paramaters in the URL like: www.mysite.com/images/1234.jpg?width=100&height=200&cut=false In webmaster tools I have noticed there are a lot of 400 errors on these image Also when I click the URL's listed as causing the errors the URL's are URL Encoded and go to pages like this (this give a bad request): www.mysite.com/images/1234.jpg?%3Fwidth%3D100%26height%3D200%26cut%3Dfalse What are your thoughts on what I should do to stop this? I notice in my webmaster tools "URL Parameters" there are parameters for:
Intermediate & Advanced SEO | | James77
height
width
cut which must be from the Image URLs. These are currently set to "Let Google Decide", but should I change them manually to "Doesn't effect page content"? Thanks in advance0