Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.
i.e domain.com/file instead of domain.com/file.php
The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?
IF it helps here is my code.
Options +FollowSymLinks
RewriteEngine OnRewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA] -
Hi Erik,
No problem, glad I could help
To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
-
Just tried this on my development server and it worked fine:
RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]
remove index RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]
The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar
-
Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.
-
Good answer Ben.
My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.
- you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]remove index
RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Recurring events and duplicate content
Does anyone have tips on how to work in an event system to avoid duplicate content in regards to recurring events? How do I best utilize on-page optimization?
Technical SEO | | megan.helmer0 -
Rel=canonical overkill on duplicate content?
Our site has many different health centers - many of which contain duplicate content since there is topic crossover between health centers. I am using rel canonical to deal with this. My question is this: Is there a tipping point for duplicate content where Google might begin to penalize a site even if it has the rel canonical tags in place on cloned content? As an extreme example, a site could have 10 pieces of original content, but could then clone and organize this content in 5 different directories across the site each with a new url. This would ultimately result in the site having more "cloned" content than original content. Is this at all problematic even if the rel canonical is in place on all cloned content? Thanks in advance for any replies. Eric
Technical SEO | | Eric_Lifescript0 -
How do I Address Low Quality/Duplicate Content Issue for a Job portal?
Hi, I want to optimize my job portal for maximum search traffic. Problems Duplicate content- The portal takes jobs from other portals/blogs and posts on our site. Sometimes employers provide the same job posting to multiple portals and we are not allowed to change it resulting in duplicate content Empty Content Pages- We have a lot of pages which can be reached via filtering for multiple options. Like IT jobs in New York. If there are no IT jobs posted in New York, then it's a blank page with little or no content Repeated Content- When we have job postings, we have about the company information on each job listing page. If a company has 1000 jobs listed with us, that means 1000 pages have the exact same about the company wording Solutions Implemented Rel=prev and next. We have implemented this for pagination. We also have self referencing canonical tags on each page. Even if they are filtered with additional parameters, our system strips of the parameters and shows the correct URL all the time for both rel=prev and next as well as self canonical tags For duplicate content- Due to the volume of the job listings that come each day, it's impossible to create unique content for each. We try to make the initial paragraph (at least 130 characters) unique. However, we use a template system for each jobs. So a similar pattern can be detected after even 10 or 15 jobs. Sometimes we also take the wordy job descriptions and convert them into bullet points. If bullet points already available, we take only a few bullet points and try to re-shuffle them at times Can anyone provide me additional pointers to improve my site in terms of on-page SEO/technical SEO? Any help would be much appreciated. We are also thinking of no-indexing or deleting old jobs once they cross X number of days. Do you think this would be a smart strategy? Should I No-index empty listing pages as well? Thank you.
Technical SEO | | jombay3 -
Dealing with duplicate content
Manufacturer product website (product.com) has an associated direct online store (buyproduct.com). the online store has much duplicate content such as product detail pages and key article pages such as technical/scientific data is duplicated on both sites. What are some ways to lessen the duplicate content here? product.com ranks #1 for several key keywords so penalties can't be too bad and buyproduct.com is moving its way up the SERPS for similar terms. Ideally I'd like to combine the sites into one, but not in the budget right away. Any thoughts?
Technical SEO | | Timmmmy0 -
Duplicate content with same URL?
SEOmoz is saying that I have duplicate content on: http://www.XXXX.com/content.asp?ID=ID http://www.XXXX.com/CONTENT.ASP?ID=ID The only difference I see in the URL is that the "content.asp" is capitalized in the second URL. Should I be worried about this or is this an issue with the SEOmoz crawl? Thanks for any help. Mike
Technical SEO | | Mike.Goracke0 -
Duplicate Content based on www.www
In trying to knock down the most common errors on our site, we've noticed we do have an issue with dupicate content; however, most of the duplicate content errors are due to our site being indexed with www.www and not just www. I am perplexed as to how this is happening. Searching through IIS, I see nothing that would be causing this, and we have no hostname records setup that are www.www. Does anyone know of any other things that may cause this and how we can go about remedying it?
Technical SEO | | CredA0 -
Duplicate content error - same URL
Hi, One of my sites is reporting a duplicate content and page title error. But it is the same page? And the home page at that. The only difference in the error report is a trailing slash. www.{mysite}.co.uk www.{mysite}.co.uk/ Is this an easy htaccess fix? Many thanks TT
Technical SEO | | TheTub1 -
How do i deal with duplicate content on the same domain?
I'm trying to find out if there's a way we can combat similar content on different pages on the same site, without having to re write the whole lot? Any ideas?
Technical SEO | | indurain0