Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display with out the file exntension.
i.e domain.com/file instead of domain.com/file.php
The problem is both versions of the URL above work, therefore a duplicate content issue. How can I force the URL with the file extension not to resolve and give a 404 error? Or just redirect to the non extension URL?
IF it helps here is my code.
Options +FollowSymLinks
RewriteEngine OnRewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA] -
Hi Erik,
No problem, glad I could help
To answer your question, No it doesn't matter which you use because the end result will be re-written to remove the file extension and add a forward slash at the end.
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didnt say thanks before, so thank you. One question I did not think of. Should the internal linking of the site be to the file name with extension or no extension?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it did worked. The first account was a VPS account from Godaddy. The second was a shared account from the same hosting company. Im not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
-
Just tried this on my development server and it worked fine:
RewriteBase / RewriteEngine on RewriteCond %{HTTP_HOST} ^test.local RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP RewriteRule (.).php$ $1 [R=301]
remove index RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_URI} /$ RewriteRule (.)/ $1 [R=301] # add .php to access file, but don't redirect RewriteCond %{REQUEST_FILENAME}.php -f RewriteCond %{REQUEST_URI} !/$RewriteRule (.) $1.php [L]
The dev URL is test.local so you would want to change this to www.yourdomain.co.ukI had a page called about.php if I entered http://test.local/about.php or http://test.local/about it would show http://test.local/about in the address bar
-
Hi Ben. Thanks for your help but this does not work for some reason. Im testing it on an old site I have that is html and I just replaced php for html but both URL's still resolves.
-
Good answer Ben.
My main site is my own CMS, that I built 10 years ago, so after I added a lot of things to the .htaccess file and it became too large, I just moved the handling inside the control program, that only looks up filed URLs when they are broken. This processing is fast, but if there was any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes and it seems the webserver handles those, so you have no chance to handle it in .htaccess. So the wat to handle that is with a 400 handler - that on cpanel sites just needs a 400.shtml file, that you can customize.
- you get a 400 response if you request a URL with a % symbol on the end, and some other site did that, thanks very much, and then google decided it would be a great thing to index.
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.).php\ HTTP
RewriteRule (.).php$ $1 [R=301]remove index
RewriteRule (.*)index$ $1 [R=301]
remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content on user queries
Our website supports a unique business industry where our users will come to us to look for something very specific (a very specific product name) to find out where they can get it. The problem that we're facing is that the products are constantly changing due to the industry. So, for example, one month, one product might be found on our website, and the next, it might be removed completely... and then might come back again a couple months later. All things that are completely out of our control - and we have no way of receiving any sort of warning when these things might happen. Because of this, we're seeing a lot of duplicate content issues arise... For Example... Product A is not active today... so www.mysite.com/search/productA will return no results... Product B is also not active today... so www.mysite.com/search/productB will also return no results. As per Moz Analytics, these are showing up as duplicate content because both pages indicate "No results were found for {your searched term}." Unfortunately, it's a bit difficult to return a 204 in these situations (which I don't know if a 204 would help anyway) or a 404, because, for a faster user experience, we simultaneously render different sections of the page... so in the very beginning of the page load - we start rendering the faster content (template type of content) that says "returning 200 code, we got the query successfully & we're loading the page".. the unique content results finish loading last since they take the longest. I'm still very new to the SEO world, so would greatly appreciate any ideas or suggestions that might help with this... I'm stuck. 😛 Thanks in advance!
Technical SEO | | SFMoz0 -
Duplicate Content Issue
My issue with duplicate content is this. There are two versions of my website showing up http://www.example.com/ http://example.com/ What are the best practices for fixing this? Thanks!
Technical SEO | | OOMDODigital0 -
Duplicate Content Vs No Content
Hello! A question that has been throw around a lot at our company has been "Is duplicate content better than no content?". We operate a range of online flash game sites, most of which pull their games from a feed, which includes the game description. We have unique content written on the home page of the website, but aside from that, the game descriptions are the only text content on the website. We have been hit by both Panda and Penguin, and are in the process of trying to recover from both. In this effort we are trying to decide whether to remove or keep the game descriptions. I figured the best way to settle the issue would be to ask here. I understand the best solution would be to replace the descriptions with unique content, however, that is a massive task when you've got thousands of games. So if you have to choose between duplicate or no content, which is better for SEO? Thanks!
Technical SEO | | Ryan_Phillips0 -
Duplicate Content
The crawl shows a lot of duplicate content on my site. Most of the urls its showing are categories and tags (wordpress). so what does this mean exactly? categories is too much like other categories? And how do i go about fixing this the best way. thanks
Technical SEO | | vansy0 -
How to prevent duplicate content in archives?
My news site has a number of excerpts in the form of archives based on categories that is causing duplicate content problems. Here's an example with the nutrition archive. The articles here are already posts, so it creates the duplicate content. Should I nofollow/noindex this category page along with the rest and 2011,2012 archives etc (see archives here)? Thanks so much for any input!
Technical SEO | | naturalsociety0 -
Duplicate Content
Many of the pages on my site are similar in structure/content but not exactly the same. What amount of content should be unique for Google to not consider it duplicate? If it is something like 50% unique would it be preferable to choose one page as the canonical instead of keeping them both as separate pages?
Technical SEO | | theLotter0 -
Strange duplicate content issue
Hi there, SEOmoz crawler has identified a set of duplicate content that we are struggling to resolve. For example, the crawler picked up that this page www. creative - choices.co.uk/industry-insight/article/Advice-for-a-freelance-career is a duplicate of this page www. creative - choices.co.uk/develop-your-career/article/Advice-for-a-freelance-career. The latter page's content is the original and can be found in the CMS admin area whilst the former page is the duplicate and has no entry in the CMS. So we don't know where to begin if the "duplicate" page doesn't exist in the CMS. The crawler states that this page www. creative-choices.co.uk/industry-insight/inside/creative-writing is the referrer page. Looking at it, only the original page's link is showing on the referrer page, so how did the crawler get to the duplicate page?
Technical SEO | | CreativeChoices0 -
Duplicate Content and Canonical use
We have a pagination issue, which the developers seem reluctant (or incapable) to fix whereby we have 3 of the same page (slightly differing URLs) coming up in different pages in the archived article index. The indexing convention was very poorly thought up by the developers and has left us with the same article on, for example, page 1, 2 and 3 of the article index, hence the duplications. Is this a clear cut case of using a canonical tag? Quite concerned this is going to have a negative impact on ranking, of course. Cheers Martin
Technical SEO | | Martin_S0