Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Duplicate Content and URL Capitalization
-
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input.
A couple examples are:
www.househitz.com/Pennsylvania/Houses-for-sale
www.househitz.com/Pennsylvania/houses-for-sale
www.househitz.com/Pennsylvania/Houses-for-rent
www.househitz.com/Pennsylvania/houses-for-rent
There are currently thousands of instances of this on the site.
Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
-
Hey Jom, you only rewrite the URL if it is not all lowercase, you can distinguish between lower and upper-case in your rewrites.
-
Mark,
In the canonicalization guide link you sent me, there is a link to Matt Cutts' blog www.mattcutts.com/blog/seo-advice-url-canonicalization/ where he talks about it. In that blog he posts:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).This makes me think that doing a 301 redirect and a rel="canonical" for lower case is not needed.
I'm conflicted again.
-
When you rewrite a URL that is already lower case to lower case with a 301 response code, does it now return a 301? Does that mean all pages on the site now return 301? Wouldn't that be bad?
Sorry if I'm being dense. I understand enough about rewrite rules to be dangerous (sometimes, very dangerous).
Jom
-
Yeah, it is absolutely the right thing to do. You can force the URLs t be lower case in RoR as well if you don't want to do it in htaccess (i would do both).
You are simply saying:
-
there are multiple versions of this page on different urls
-
this is the main version of the page
301 them to lower case and canonicalise them and you are good to go!
Marcus
-
-
Thanks, much! I will read through these.
-
Hi Marcus and Mark,
Thanks for the response. On creating the rel="canonical" statements.
That means that I will have thousands, perhaps hundreds of thousands (there are a lot of cities and zips in the US) of rel="canonical" statements on my site.
I thought I read on one of the blogs that too many canonical statements are bad practice. The site is dynamic (Ruby on Rails), I can certainly make the change. I would just like to be sure it's the wise thing to do.
-
Hey Jom,
I must admit I am not sure on the level of urgency to sort this problem out but personally I like to keep the duplication of content to a minimum.
There are multiple ways to sort this out but the most straight forward would probably be to add a rel canonical tag to your web pages.
Here is a good post discussing the faceted issues you can get from e-commerce site, here is SEOMoz's canonicalization guide and here is another seomoz blog post about e-commerce sites and the use of the rel canonical tag.
Hope this helps
-
Hey Jom
Problem is, from a search engine perspective, those are four duplicate pages & from a linking perspective, they are four different pages that you could see your link popularity shared between. Neither of which is ideal.
I would certainly deal with this but it needn't be an arduous task.
1. Set up a rewrite rule to change all URLs to lowercase and 301 any non lowercase ones, something like this in your htaccess should do the job assuming you are using a LAMP environment.
RewriteEngine On RewriteMap lc int:tolower RewriteCond %{REQUEST_URI} [A-Z] RewriteRule (.*) ${lc:$1} [R=301,L]
2. Add an automated lowercase canonical to all of these pages so they canonicalise to the lowercase version.
3. Try to replace the links so they all use lowercase. If this is a dynamic site it should be easy but if not, you could still do a string replacement across multiple files. You could write a little script to automate this if it is a huge job from the sitemap (of lowercase URLs of course.
Certainly worth doing and should not be too difficult with a bit of smarts applied.
Hope this helps!
Marcus
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content and Subdirectories
Hi there and thank you in advance for your help! I'm seeking guidance on how to structure a resources directory (white papers, webinars, etc.) while avoiding duplicate content penalties. If you go to /resources on our site, there is filter function. If you filter for webinars, the URL becomes /resources/?type=webinar We didn't want that dynamic URL to be the primary URL for webinars, so we created a new page with the URL /resources/webinar that lists all of our webinars and includes a featured webinar up top. However, the same webinar titles now appear on the /resources page and the /resources/webinar page. Will that cause duplicate content issues? P.S. Not sure if it matters, but we also changed the URLs for the individual resource pages to include the resource type. For example, one of our webinar URLs is /resources/webinar/forecasting-your-revenue Thank you!
Technical SEO | | SAIM_Marketing0 -
Duplicate Content Issues with Pagination
Hi Moz Community, We're an eCommerce site so we have a lot of pagination issues but we were able to fix them using the rel=next and rel=prev tags. However, our pages have an option to view 60 items or 180 items at a time. This is now causing duplicate content problems when for example page 2 of the 180 item view is the same as page 4 of the 60 item view. (URL examples below) Wondering if we should just add a canonical tag going to the the main view all page to every page in the paginated series to get ride of this issue. https://www.example.com/gifts/for-the-couple?view=all&n=180&p=2 https://www.example.com/gifts/for-the-couple?view=all&n=60&p=4 Thoughts, ideas or suggestions are welcome. Thanks
Technical SEO | | znotes0 -
Duplicate content issue: staging urls has been indexed and need to know how to remove it from the serps
duplicate content issue: staging url has been indexed by google ( many pages) and need to know how to remove them from the serps. Bing sees the staging url as moved permanently Google sees the staging urls (240 results) and redirects to the correct url Should I be concerned about duplicate content and request Google to remove the staging url removed Thanks Guys
Technical SEO | | Taiger0 -
How does Google view duplicate photo content?
Now that we can search by image on Google and see every site that is using the same photo, I assume that Google is going to use this as a signal for ranking as well. Is that already happening? I ask because I have sold many photos over the years with first-use only rights, where I retain the copyright. So I have photos on my site that I own the copyright for that are on other sites (and were there first). I am not sure if I should make an effort to remove these photos from my site or if I can wait another couple years.
Technical SEO | | Lina5000 -
Duplicate Page Content and Titles from Weebly Blog
Anyone familiar with Weebly that can offer some suggestions? I ran a crawl diagnostics on my site and have some high priority issues that appear to stem from Weebly Blog posts. There are several of them and it appears that the post is being counted as "page content" on the main blog feed and then again when it is tagged to a category. I hope this makes sense, I am new to SEO and this is really confusing. Thanks!
Technical SEO | | CRMI0 -
Headers & Footers Count As Duplicate Content
I've read a lot of information about duplicate content across web pages and was interested in finding out about how that affected the header and footer of a website. A lot of my pages have a good amount of content, but there are some shorter articles on my website. Since my website has a header, footer, and sidebar that are static, could that hurt my ranking? My only concern is that sometimes there's more content in the header/footer/sidebar than the article itself since I have an extensive amount of navigation. Is there a way to define to Google what the header and footer is so that they don't consider it to be duplicate content?
Technical SEO | | CyberAlien0 -
How much to change to avoid duplicate content?
Working on a site for a dentist. They have a long list of services that they want us to flesh out with text. They provided a bullet list of services, we're trying to get 1 to 2 paragraphs of text for each. Obviously, we're not going to write this off the top of our heads. We're pulling text from other sources and trying to rework. The question is, how much rephrasing do we have to do to avoid a duplicate content penalty? Do we make sure there are changes per paragraph, sentence, or phrase? Thanks! Eric
Technical SEO | | ericmccarty0 -
Drupal URL Aliases vs 301 Redirects + Do URL Aliases create duplicates?
Hi all! I have just begun work on a Drupal site which heavily uses the URL Aliases feature. I fear that it is creating duplicate links. For example:: we have http://www.URL.com/index.php and http://www.URL.com/ In addition we are about to switch a lot of links and want to keep the search engine benefit. Am I right in thinking URL aliases change the URL, while leaving the old URL live and without creating search engine friendly redirects such as 301s? Thanks for any help! Christian
Technical SEO | | ChristianMKTG0