Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.
i.e domain.com/file instead of domain.com/file.php
The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?
IF it helps here is my code.
Options +FollowSymLinks
RewriteEngine OnRewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA] -
Hi Erik,
No problem, glad I could help
To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
-
Just tried this on my development server and it worked fine:
RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]
remove index RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]
The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar
-
Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.
-
Good answer Ben.
My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.
- you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]remove index
RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content and rel canonicals?
Hi. I have a question relating to 2 sites that I manage with regards to duplicate content. These are 2 separate companies but the content is off a data base from the one(in other words the same). In terms of the rel canonical, how would we do this so that google does not penalise either site but can also have the content to crawl for both or is this just a dream?
Technical SEO | | ProsperoDigital0 -
Do I use /es/, /mx/ or /es-mx/ for my Spanish site for Mexico only
I currently have the Spanish version of my site under myurl.com/es/ When I was at Pubcon in Vegas last year a panel reviewed my site and said the Spanish version should be in /mx/ rather than /es/ since es is for Spain only and my site is for Mexico only. Today while trying to find information on the web I found /es-mx/ as a possibility. I am changing my site and was planning to change to /mx/ but want confirmation on the correct way to do this. Does anyone have a link to Google documentation that will tell me for sure what to use here? The documentation I read led me to the /es/ but I cannot find that now.
Technical SEO | | RoxBrock0 -
Duplicate content error from url generated
We are getting a duplicate content error, with "online form/" being returned numerous times. Upon inspecting the code, we are calling an input form via jQuery which is initially called by something like this: Opens Form Why would this be causing it the amend the URL and to be crawled?
Technical SEO | | pauledwards0 -
Duplicate content, Original source?
Hi there, say i have two websites with identicle content. website a had content on before website b - so will be seen as the original source? If the content was intended for website b, would taking it off a then make the orinal source to google then go to website b? I want website b to get the value of the content but it was put on website a first - would taking it off website a then give website b the full power of the content? Any help of advice much appreciated. Kind Regards,
Technical SEO | | pauledwards0 -
Duplicate homepage content
Hi, I recently did a site crawl using seomoz crawl test My homepage seems to have 3 cases of duplicate content.. These are the urls www.example.ie/ www.example..ie/%5B%7E19%7E%5D www.example..ie/index.htm Does anyone have any advise on this? What impact does this have on my seo?
Technical SEO | | Socialdude0 -
Root domain not resolving to www. Duplicate content?
Hi, I'm working with a domain that stays on the root domain if the www is not included. But if the www is included, it stays with the www. LIke this: example.com
Technical SEO | | HardyIntl
or
www.example.com Of course, they are identical and both go to the same IP. Do search engines consider that to be duplicate content? thanks,
michael0 -
Duplicate content on my home
Hello, I have duplication with my home page. It comes in two versions of the languages: French and English. http://www.numeridanse.tv/fr/ http://www.numeridanse.tv/en/ You should know that the home page are not directories : http://www.numeridanse.tv/ Google indexes the three versions: http://bit.ly/oqKT0H To avoid duplicating what is the best solution?
Technical SEO | | android_lyon
Have a version of the default language? Thanks a lot for your answers. Take care. A.0 -
Mod Rewrite Help
I need some help with doing the Mod Rewrite on our htaccess file. Basically I've a URL like this : www.example.co.uk/new-toy-search?keys=xbox and I want to rewrite the URL into something else (such as new-toy-search-xbox.html) Could someone please take me through the steps of how to do this? I've had a look at the Apache site but it's really confusing the way it's explained!
Technical SEO | | DanHill0