Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing index.php
-
I have question for the community and whether or not this is a good or bad idea.
I currently have a Joomla site that displays www.domain.com/index.php in all the URLs with the exception of the home page. I have read that it's better to not have index.php showing in the URL at all. Does it really matter if I have index.php in my URL? I've read that it is a bad practice.
I am thinking about installing the sh404SEF component on my site and removing the index.php. However, I rank pretty high for the keywords I want in Google, Bing and Yahoo. All of the URLs that show up in the searches have index.php as part of the URL.
Has anyone ever used sh404SEF to remove the index.php and how did you overcome not loosing your search engine links? I don't want an existing search showing www.domain.com/index.php/sales and it not linking to the correct page which would now be www.domain.com/sales. I guess I could insert the proper redirects in the htaccess file. But I was hoping to avoid having every page of my site in the htaccess file for redirecting.
Any help or advice appreciated.
-
Add this to your htaccess file (remove the .txt extension from the file in order to use it)
Remove index.php or index.htm/html from URL requests
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index.(php|html?)\ HTTP/
RewriteCond %{REQUEST_URI} !^/administrator
RewriteRule ^([^/]+/)*index.(html?|php)$ http://your_site_URL/$1 [R=301,L]Obviously change the your_site_url to the your domain in http://your_site_URL/$1
Also remove the # before RewriteEngine On to make these changes work.
-
Devanur/Jane,
Thank you for the info.
Dan
-
As Devanur says, this will achieve your goal. It's worth reiterating that there is nothing inherently wrong with /index.php URLs as long as you cannot access the same content without /index.php. For instance, if www.site.com/page1/index.php exists as well as www.site.com/page1/, then this is duplicate content and should be fixed. I imagine this is your current situation because this is most common when /index.php is being added to URLs.
However, if only one version of every page loads and that version has the /index.php extension, this is not automatically bad. It's preferable for the extension not to be there for the sake of URL tidiness and because this does move the content one folder-level away from the root (not a huge issue, but probably best avoided) however.
If you go through 301 redirects to shift the old URLs to the new ones without /index.php, your rankings should not suffer. There might be a little bit of ranking fluctuation as Google indexes the new URLs and acknowledges the redirects, but nothing permanent. It's worth noting that this is not an absolute rule, however, and that there is always a risk of lowered rankings or rankings not returning to what they were before after a 301 redirect though.
Cheers,
Jane
-
Hi, any plugin like sh404SEF will work and accomplish your goal without hurting your rankings as long as it redirects, the index.php URLs to their corresponding without index.php URLs via 301. By the way, you don't need to list all your URLs in .htaccess file to implement this. You can go with pattern match redirection.
Here you go for more:
http://www.askapache.com/htaccess/301-redirect-with-mod_rewrite-or-redirectmatch.html
and
http://www.searchenginepeople.com/blog/htaccess-redirect-rewrite-rules.html
By the way, having index.php in URLs does not affect your SEO efforts directly but by stripping index.php from all the URLs will make them look pretty, clean and a bit user friendly.
Hope it helps. Good Luck to you.
Best regards,
Devanur Rafi
<colgroup><col width="182"></colgroup>
| |
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to check if the page is indexable for SEs?
Hi, I'm building the extension for Chrome, which should show me the status of the indexability of the page I'm on. So, I need to know all the methods to check if the page has the potential to be crawled and indexed by a Search Engines. I've come up with a few methods: Check the URL in robots.txt file (if it's not disallowed) Check page metas (if there are not noindex meta) Check if page is the same for unregistered users (for those pages only available for registered users of the site) Are there any more methods to check if a particular page is indexable (or not closed for indexation) by Search Engines? Thanks in advance!
Intermediate & Advanced SEO | | boostaman0 -
How important is admin-ajax.php?
Hi there! It's been a long time since I last did a technical audit of a site. I've currently playing with the 'fetch as google' tool to find out if we're blocking anything vital. The site is based on Wordpress, and after a recent hacking incident, a previous SEO moved the login portal from domain.com/wp-admin/ to domain.com/pr3ss/wp-admin/ - to stop people finding it. Fair enough. But they then updated the robots.txt file to look like this: User-agent: *
Intermediate & Advanced SEO | | Muhammad-Isap
Disallow: /pr3ss/wp-admin/ Now, some pages are trying to draw on theme elements like: http://www.domain.com/pr3ss/wp-admin/admin-ajax.php
http://www.domain.com/pr3ss/wp-content/themes/bestpracticegroup/images/column_wrapper_bg.png And are naturally being blocked (not that this seems to affect the way pages are rendering in Google's eyes) A good SEO friend of mine has suggested allowing the theme folder, and any sub folders where this becomes an issue. What are your thoughts? Is it even worth disallowing the /pr3ss/wp-admin/ path? Cheers guys and gals! All the best, John. I've found a couple of the theme's0 -
How to do Country specific indexing ?
We are a business that operate in South East Asian countries and have medical professionals listed in Thailand, Philippines and Indonesia. When I go to Google Philippines and check I can see indexing of pages from all countries and no Philippines pages. Philippines is where we launched recently. How can I tell Google Philippines to give more priority to pages from Philippines and not from other countries Can someone help?
Intermediate & Advanced SEO | | ozil0 -
Remove URLs that 301 Redirect from Google's Index
I'm working with a client who has 301 redirected thousands of URLs from their primary subdomain to a new subdomain (these are unimportant pages with regards to link equity). These URLs are still appearing in Google's results under the primary domain, rather than the new subdomain. This is problematic because it's creating an artificial index bloat issue. These URLs make up over 90% of the URLs indexed. My experience has been that URLs that have been 301 redirected are removed from the index over time and replaced by the new destination URL. But it has been several months, close to a year even, and they're still in the index. Any recommendations on how to speed up the process of removing the 301 redirected URLs from Google's index? Will Google, or any search engine for that matter, process a noindex meta tag if the URL's been redirected?
Intermediate & Advanced SEO | | trung.ngo0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
Whats the best way to remove search indexed pages on magento?
A new client ( aqmp.com.br/ )call me yestarday and she told me since they moved on magento they droped down more than US$ 20.000 in sales revenue ( monthly)... I´ve just checked the webmaster tool and I´ve just discovered the number of crawled pages went from 3.260 to 75.000 since magento started... magento is creating lots of pages with queries like search and filters. Example: http://aqmp.com.br/acessorios/lencos.html http://aqmp.com.br/acessorios/lencos.html?mode=grid http://aqmp.com.br/acessorios/lencos.html?dir=desc&order=name Add a instruction on robots.txt is the best way to remove unnecessary pages of the search engine?
Intermediate & Advanced SEO | | SeoMartin10 -
Removing Content 301 vs 410 question
Hello, I was hoping to get the SEOmoz community’s advice on how to remove content most effectively from a large website. I just read a very thought-provoking thread in which Dr. Pete and Kerry22 answered a question about how to cut content in order to recover from Panda. (http://www.seomoz.org/q/panda-recovery-what-is-the-best-way-to-shrink-your-index-and-make-google-aware). Kerry22 mentioned a process in which 410s would be totally visible to googlebot so that it would easily recognize the removal of content. The conversation implied that it is not just important to remove the content, but also to give google the ability to recrawl that content to indeed confirm the content was removed (as opposed to just recrawling the site and not finding the content anywhere). This really made lots of sense to me and also struck a personal chord… Our website was hit by a later Panda refresh back in March 2012, and ever since then we have been aggressive about cutting content and doing what we can to improve user experience. When we cut pages, though, we used a different approach, doing all of the below steps:
Intermediate & Advanced SEO | | Eric_R
1. We cut the pages
2. We set up permanent 301 redirects for all of them immediately.
3. And at the same time, we would always remove from our site all links pointing to these pages (to make sure users didn’t stumble upon the removed pages. When we cut the content pages, we would either delete them or unpublish them, causing them to 404 or 401, but this is probably a moot point since we gave them 301 redirects every time anyway. We thought we could signal to Google that we removed the content while avoiding generating lots of errors that way… I see that this is basically the exact opposite of Dr. Pete's advice and opposite what Kerry22 used in order to get a recovery, and meanwhile here we are still trying to help our site recover. We've been feeling that our site should no longer be under the shadow of Panda. So here is what I'm wondering, and I'd be very appreciative of advice or answers for the following questions: 1. Is it possible that Google still thinks we have this content on our site, and we continue to suffer from Panda because of this?
Could there be a residual taint caused by the way we removed it, or is it all water under the bridge at this point because Google would have figured out we removed it (albeit not in a preferred way)? 2. If there’s a possibility our former cutting process has caused lasting issues and affected how Google sees us, what can we do now (if anything) to correct the damage we did? Thank you in advance for your help,
Eric1 -
Should pages of old news articles be indexed?
My website published about 3 news articles a day and is set up so that old news articles can be accessed through a "back" button with articles going to page 2 then page 3 then page 4, etc... as new articles push them down. The pages include a link to the article and a short snippet. I was thinking I would want Google to index the first 3 pages of articles, but after that the pages are not worthwhile. Could these pages harm me and should they be noindexed and/or added as a canonical URL to the main news page - or is leaving them as is fine because they are so deep into the site that Google won't see them, but I also won't be penalized for having week content? Thanks for the help!
Intermediate & Advanced SEO | | theLotter0