Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
Hey Jom, you only rewrite the URL if it is not all lowercase. You can distinguish between lower and upper case in your rewrite conditions, so URLs that are already lowercase are served normally and never return a 301.
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs to be lower case in RoR as well if you don't want to do it in htaccess (I would do both).
You are simply saying:
- there are multiple versions of this page on different URLs
- this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
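For the Rails side, here is a minimal sketch of what forcing lowercase could look like as a Rack-style middleware. The class name `LowercasePathRedirect` and the stand-in app are made up for illustration, and it assumes every path on the site is safe to lowercase. Note that it only redirects when the path contains an uppercase letter, so URLs that are already lowercase keep returning 200.

```ruby
# Hypothetical Rack-style middleware: 301 any path that contains an uppercase
# letter to its lowercase form and pass everything else through untouched.
class LowercasePathRedirect
  def initialize(app)
    @app = app
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    # Already lowercase: no redirect, the wrapped app answers as usual.
    return @app.call(env) unless path =~ /[A-Z]/

    location = path.downcase
    query = env["QUERY_STRING"].to_s
    # Keep the query string exactly as it was requested.
    location += "?#{query}" unless query.empty?
    [301, { "Location" => location, "Content-Type" => "text/html" }, []]
  end
end

# Tiny stand-in app so the sketch runs without Rails or the rack gem.
inner_app = ->(_env) { [200, { "Content-Type" => "text/html" }, ["ok"]] }
app = LowercasePathRedirect.new(inner_app)

status, headers, _body = app.call("PATH_INFO" => "/Pennsylvania/Houses-for-sale", "QUERY_STRING" => "")
puts "#{status} #{headers['Location']}"
```

In a Rails app something like this would normally be registered in the middleware stack; where exactly depends on the app, so treat that part as an exercise.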
Marcus
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. On creating the rel="canonical" statements: that means I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US), of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure on the level of urgency to sort this problem out but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out but the most straightforward would probably be to add a rel canonical tag to your web pages.
Here is a good post discussing the faceted issues you can get on e-commerce sites, here is SEOmoz's canonicalization guide and here is another SEOmoz blog post about e-commerce sites and the use of the rel canonical tag.
Hope this helps
-
Hey Jom
Problem is, from a search engine perspective, those are four duplicate pages & from a linking perspective, they are four different pages that you could see your link popularity shared between. Neither of which is ideal.
I would certainly deal with this but it needn't be an arduous task.
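1. Set up a rewrite rule to change all URLs to lowercase and 301 any non-lowercase ones. Something like this should do the job, assuming you are using a LAMP environment. One catch: Apache only allows RewriteMap in the main server or virtual host config, not in htaccess, so the block below belongs in the virtual host.
# Declare a map that lowercases whatever is passed to it
RewriteEngine On
RewriteMap lc int:tolower
# Only act on URLs that contain at least one uppercase letter
RewriteCond %{REQUEST_URI} [A-Z]
RewriteRule (.*) ${lc:$1} [R=301,L]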
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy but if not, you could still do a string replacement across multiple files. If it is a huge job, you could write a little script to automate this from the sitemap (of lowercase URLs, of course).
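As a rough illustration of step 2, an automated canonical can be a one-line helper that lowercases the path of whatever URL was requested. This is only a sketch: `canonical_link_tag` is a hypothetical name, the `http://` scheme is an assumption (the URLs in the question are shown without one), and it leaves the query string alone.

```ruby
require "uri"
require "erb"

# Hypothetical helper: build a rel="canonical" tag that points at the
# lowercase-path version of the requested URL.
def canonical_link_tag(requested_url)
  uri = URI.parse(requested_url)
  uri.path = uri.path.downcase
  # Escape the URL so it is safe inside an HTML attribute.
  %(<link rel="canonical" href="#{ERB::Util.html_escape(uri.to_s)}" />)
end

puts canonical_link_tag("http://www.househitz.com/Pennsylvania/Houses-for-sale")
puts canonical_link_tag("http://www.househitz.com/Pennsylvania/houses-for-sale")
```

Both calls print the same tag, which is the point: every capitalization variant declares the same main version, and the tag is generated per request rather than maintained by hand.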
Certainly worth doing and should not be too difficult with a bit of smarts applied.
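For the link clean-up in step 3, a string replacement could look roughly like the sketch below. `lowercase_internal_links` and `SITE_HOST` are hypothetical names, and it assumes hrefs are double-quoted and that only links to the site's own host should be touched. It is a regex pass, not a real HTML parser, so check the output before overwriting files.

```ruby
# Assumed host of the site whose links should be lowercased.
SITE_HOST = "www.househitz.com"

# Hypothetical helper: lowercase the path part of internal links in a chunk
# of HTML. External links and query strings are left as they are.
def lowercase_internal_links(html, host = SITE_HOST)
  html.gsub(/href="([^"]*)"/) do
    url = Regexp.last_match(1)
    site_relative = url.start_with?("/") && !url.start_with?("//")
    same_host = url.match?(%r{\Ahttps?://#{Regexp.escape(host)}(/|\z)}i)
    if site_relative || same_host
      # Split off any ?query or #fragment so only the path is lowercased.
      path, rest = url.split(/(?=[?#])/, 2)
      %(href="#{path.downcase}#{rest}")
    else
      %(href="#{url}")
    end
  end
end

html = '<a href="/Pennsylvania/Houses-for-sale">Sale</a> <a href="http://example.com/About">About</a>'
puts lowercase_internal_links(html)
```

To run it over static files you would read each file, pass the contents through the helper and write the result back, ideally on a copy of the site first.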
Hope this helps!
Marcus