Mod Rewrite / .htaccess avoid duplicate content
-
I have been searching and testing for hours but cannot find a solution. I am able to get a URL to display without the file extension,
i.e. domain.com/file instead of domain.com/file.php
The problem is that both versions of the URL above work, which creates a duplicate content issue. How can I force the URL with the file extension not to resolve and return a 404 error? Or just redirect it to the extensionless URL?
If it helps, here is my code.
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+)$ $1.php [L,QSA]
-
Hi Erik,
No problem, glad I could help.
To answer your question: no, it doesn't matter which you use, because the end result will be rewritten to remove the file extension (and, with the rules I posted, any trailing slash on non-directory URLs).
For consistency I would suggest having it without the .php inside your content though. If nothing else it would save you the pain of having to remove .php from your content if you moved to a content management system in the future.
If you've got any other questions let me know, and I'll be happy to help.
Ben
-
Didn't say thanks before, so thank you. One question I had not thought of: should the site's internal links point to the file name with the extension or without it?
I think it should be without extension but just want to double check.
-
Hi Ben. I tried this code on another hosting account and it worked. The first account was a VPS account from GoDaddy; the second was a shared account from the same hosting company. I'm not sure why it works on one and not on the other. I did see the mod_rewrite option enabled.
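One possible explanation, offered as a guess rather than a diagnosis: on a self-managed VPS the main Apache configuration may not allow .htaccess files to override settings, in which case the rewrite rules are silently ignored even though mod_rewrite is loaded. A minimal sketch of the server-side setting, assuming a hypothetical document root of /var/www/example:

```apache
# Goes in the main server or virtual-host configuration, NOT in .htaccess.
# "/var/www/example" is a placeholder; substitute the site's real document root.
<Directory "/var/www/example">
    # FileInfo permits mod_rewrite directives in .htaccess;
    # Options permits the "Options +FollowSymLinks" line.
    AllowOverride FileInfo Options
</Directory>
```

Apache has to be reloaded after a change like this. If AllowOverride is already set this way, the cause lies elsewhere.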
-
Just tried this on my development server and it worked fine:
RewriteBase /
RewriteEngine on

# remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^test.local
RewriteCond %{THE_REQUEST} ^GET\ (.*).php\ HTTP
RewriteRule (.*).php$ $1 [R=301]

# remove index
RewriteRule (.*)index$ $1 [R=301]

# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]

# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]

The dev URL is test.local, so you would want to change this to www.yourdomain.co.uk
I had a page called about.php. If I entered http://test.local/about.php or http://test.local/about, it would show http://test.local/about in the address bar.
-
Hi Ben. Thanks for your help, but this does not work for some reason. I'm testing it on an old site I have that is HTML, so I just replaced php with html, but both URLs still resolve.
-
Good answer Ben.
My main site runs on my own CMS, which I built 10 years ago. After I had added a lot of rules to the .htaccess file and it became too large, I moved the handling inside the control program, which only looks up filed URLs when they are broken. This processing is fast, and if there is any degradation, it only affects the broken URLs.
Speaking of broken URLs, I was getting a few 400 return codes, and it seems the web server handles those itself, so you have no chance to handle them in .htaccess. The way to deal with that is a 400 handler, which on cPanel sites just needs a 400.shtml file that you can customize.
You get a 400 response if you request a URL with a % symbol on the end. Some other site did that, thanks very much, and then Google decided it would be a great thing to index.
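For reference, a custom 400 page is normally wired up with Apache's ErrorDocument directive. The line below is only a sketch of what such a mapping looks like; the /400.shtml path follows the cPanel file name mentioned above, and its placement in the server or virtual-host configuration is an assumption.

```apache
# Assumed to live in the main server or virtual-host configuration.
# Maps HTTP 400 (Bad Request) responses to a customizable local page.
ErrorDocument 400 /400.shtml
```

Note that for some malformed requests Apache may stop processing early and return its built-in message regardless of this setting, so it is worth testing with the exact URL that triggers the error.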
-
Try using this instead:
<code>RewriteBase /</code>
<code># remove .php; use THE_REQUEST to prevent infinite loops
RewriteCond %{HTTP_HOST} ^www.domain.com
RewriteCond %{THE_REQUEST} ^GET\ (.*).php\ HTTP
RewriteRule (.*).php$ $1 [R=301]

# remove index
RewriteRule (.*)index$ $1 [R=301]

# remove slash if not directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} /$
RewriteRule (.*)/ $1 [R=301]

# add .php to access file, but don't redirect
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{REQUEST_URI} !/$
RewriteRule (.*) $1.php [L]</code>
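The original question also asked about returning a 404 for the .php URLs instead of redirecting them. A minimal sketch of that alternative follows; it is not part of the tested rules above, and it assumes it is placed before the rule that internally adds .php and used in place of the 301 rule that strips the extension.

```apache
RewriteEngine On

# THE_REQUEST holds the original request line, e.g. "GET /file.php HTTP/1.1",
# so the internal rewrite to file.php does not trigger this rule.
RewriteCond %{THE_REQUEST} \s/[^?\s]*\.php[?\s] [NC]
# "-" leaves the URL untouched; R=404 answers with Not Found and stops rewriting.
RewriteRule ^ - [R=404,L]
```

A 301 redirect is usually the friendlier choice when the .php URLs have already been linked or indexed, since it passes visitors on to the extensionless page rather than turning them away.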