How to properly abandon mod_rewrite?
-
Hi,
Several years ago, for SEO purposes, we set up mod_rewrite so that our .php files are served as .htm URLs.
My question is this: the setup has become a hassle when adding new pages, and I'd like to make a clean break from the .htm URLs and move to the real file names and/or directories (e.g. company.htm --> /company/ ).
What kind of ranking penalty am I looking at if we switch? We're a small company with billion-dollar competitors, so a rank loss would be fairly devastating.
I assume I'd need to do 301 redirects for all of the old file names (obviously yes for the pages that change to directories), but do I need one for each individual page?
Thanks,
Matt
-
Maybe I am missing something, but wouldn't a rewrite that removes all the .php instances solve this problem site-wide? Or are you doing it file by file and leaving some pages as-is?
Something like this in your .htaccess should do it.

To remove .php site-wide:

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.*)$ $1.php [L]

Or to change to .htm site-wide:
RewriteEngine on
RewriteBase /
RewriteRule ^([^.]+)\.htm$ $1.php [L]

Another way is to name the files with .htm and use this in .htaccess to send .htm through your PHP handler:

AddType application/x-httpd-php .htm .html .php
AddHandler application/x-httpd-php .htm .html

If you use rewrites like those, you won't be able to also use 301s for the affected URIs, as it would probably create a redirect loop.
In a perfect world, you would 301 redirect every page whose URL changes when you stop using the .php-to-.htm rewrites. If there are simply too many for that to be practical, you could redirect just the most important pages and leave out any that don't have many inbound links pointing to them. What I will often do in cases like this is set up the redirects for the important pages, then keep an eye on Google Webmaster Tools. Webmaster Tools will show you the 404 errors and where it found the links. Then you can pick the ones that have a lot of links and 301 those a few at a time. Tedious, but if you do that in your spare time, eventually you will get them all fixed.
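As a minimal sketch, per-page 301s in .htaccess could look like this (company.htm is from your example; the other paths are hypothetical placeholders for your real pages):

# mod_alias permanent redirects, one per migrated page
Redirect 301 /company.htm /company/
Redirect 301 /services.htm /services/
Redirect 301 /contact.htm /contact/

Redirect matches on URL-path prefixes, so if any old paths overlap, list the more specific ones first.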
If you can implement a "set it and forget it" rewrite so you don't have to add a new rewrite for each file, you won't have to worry about 301 redirecting all those old pages.
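Here is a hedged sketch of one such pattern-based approach, assuming every foo.htm should become /foo/ and that a matching foo.php exists on disk (adjust the patterns to your actual layout before relying on it):

RewriteEngine on
# 301 external requests for foo.htm to /foo/. %{THE_REQUEST} only matches
# the original client request line, never an internal rewrite, which is
# what prevents the redirect loop mentioned above.
RewriteCond %{THE_REQUEST} \s/([^.\s]+)\.htm[\s?]
RewriteRule ^ /%1/ [R=301,L]
# Quietly serve /foo/ from foo.php
RewriteRule ^([^/.]+)/$ $1.php [L]

With a rule like that in place, old company.htm links would 301 to /company/ automatically, with no separate redirect line needed per page.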
Otherwise, there really shouldn't be any major loss of rank from dropping the file extensions.
All that said, there isn't much of a reason to remove the extensions, other than to shorten addresses by a few characters and make them look a little cleaner.