What is the correct htaccess code for Canonicalization?
-
I've been working on a client's site and put up the following, but when I check back on SEOmoz I have over 3000 errors and notices, and it's been crawling a silly amount of pages that don't exist!!
ErrorDocument 404 /404.html
Options +FollowSymLinks
DirectoryIndex index.html
RewriteEngine On
RewriteBase /
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index.html\ HTTP/
RewriteRule ^index.html$ http://hiperformanceautocentres.co.uk/ [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.html [L]
-
It would be a good starting place for sites that are created in a similar way.
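For anyone wanting a reusable version, here is a rough sketch of what such a starting point might look like. It is not the exact file from this thread: it swaps the hard-coded domain for %{HTTP_HOST} so the same file can be dropped onto another site. It assumes Apache with mod_rewrite enabled, a static site whose directory index is index.html, an existing /404.html page, and www as the preferred hostname. Test it on a staging copy before relying on it.

```apache
# Hypothetical generic starting point; not the exact file from this thread.
# Assumes: Apache with mod_rewrite, a static site served from index.html,
# a /404.html page, and www as the canonical hostname.

ErrorDocument 404 /404.html
Options +FollowSymLinks
DirectoryIndex index.html

RewriteEngine On

# Redirect explicit requests for .../index.html or .../index.htm to .../
RewriteCond %{THE_REQUEST} ^.*/index\.html?\ HTTP/
RewriteRule ^(.*)index\.html?$ /$1 [R=301,L]

# Add "www." to any hostname that lacks it, whatever the domain is.
# Caveat: a subdomain such as mail.example.com would also get "www." prefixed.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Sites that run on a CMS or serve over https would need different rules, so treat this as a template to adapt and not something to paste in blindly.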
-
Should this basically be the htaccess starting point for every website that I create going forward?
-
That's great, thanks for that, Chris.
-
This basically says: change anything ending in index.html to end in / instead, using a 301 redirect.
<code>RewriteCond %{THE_REQUEST} ^.*\/index\.html?\ HTTP/</code>
<code>RewriteRule ^(.*)index\.html?$ "/$1" [R=301,L]</code>
This says: redirect anything that starts http://domain...... to http://www.domain......
<code>RewriteCond %{HTTP_HOST} ^hiperformanceautocentres.co.uk [NC]</code>
<code>RewriteRule ^(.*)$ http://www.hiperformanceautocentres.co.uk/$1 [L,R=301]</code>
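To spell out the individual pieces, here are the same two rule pairs again with a comment on each part. This is an annotated sketch and not a drop-in file: it assumes mod_rewrite is enabled and that the rules live in the .htaccess at the site root. The only change from the lines above is that the dots in the hostname pattern are escaped.

```apache
# Turn the rewrite engine on; without this, the rules below are ignored.
RewriteEngine On

# --- index.html -> / ---------------------------------------------------
# THE_REQUEST is the raw request line sent by the browser, for example
#   GET /services/index.html HTTP/1.1
# Matching against it means only URLs the visitor actually typed or
# followed are redirected, not the internal lookup DirectoryIndex performs.
#   \.      a literal dot
#   l?      the "l" is optional, so index.htm matches too
#   "\ "    an escaped space (a bare space would end the pattern)
RewriteCond %{THE_REQUEST} ^.*/index\.html?\ HTTP/
# In .htaccess the pattern is tested against the path without its leading
# slash. (.*) captures everything before "index.html" and $1 reuses it.
#   R=301   send a permanent redirect to the browser
#   L       stop processing further rules for this request
RewriteRule ^(.*)index\.html?$ "/$1" [R=301,L]

# --- non-www -> www ----------------------------------------------------
# HTTP_HOST is the hostname from the request's Host header.
#   ^       the hostname must start here, so "www." does not match
#   NC      compare without regard to upper or lower case
RewriteCond %{HTTP_HOST} ^hiperformanceautocentres\.co\.uk [NC]
# Capture the whole path and rebuild the URL on the www hostname.
RewriteRule ^(.*)$ http://www.hiperformanceautocentres.co.uk/$1 [L,R=301]
```

A RewriteCond only applies to the single RewriteRule that follows it, which is why each condition sits directly above its rule.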
-
Okay then you want
ErrorDocument 404 /404.html
Options +FollowSymLinks
DirectoryIndex index.html
<code>RewriteEngine on</code>
<code>RewriteCond %{THE_REQUEST} ^.*/index.html?\ HTTP/</code>
<code>RewriteRule ^(.*)index\.html?$ "/$1" [R=301,L]</code>
<code>RewriteCond %{HTTP_HOST} ^hiperformanceautocentres.co.uk [NC]</code>
<code>RewriteRule ^(.*)$ http://www.hiperformanceautocentres.co.uk/$1 [L,R=301]</code>
-
Oops - guess I've knackered this page with that code!!
Could you explain what all the code means in detail? I just copied and pasted the original!!
-
-
You haven't redirected www and non www so you need to add:
RewriteCond %{HTTP_HOST} ^hiperformanceautocentres.co.uk [NC]
RewriteRule ^(.*)$ http://www.hiperformanceautocentres.co.uk/$1 [L,R=301]
What other errors are you getting? 3000 seems a lot!
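On the pages that don't exist: one possible contributor, judging only from the rules posted in the question, is the final catch-all. It rewrites every URL that is not a real file or directory to /index.html, so a made-up address comes back as the home page with a 200 status and never reaches the 404 page. If that page uses relative links, a crawler can keep following them into more made-up paths. A minimal sketch of the alternative, assuming a static site where every real page is a file on disk:

```apache
# Sketch only: assumes a static site where every real page exists on disk.
ErrorDocument 404 /404.html
DirectoryIndex index.html

# The catch-all from the question, shown commented out. While active, any
# URL that is not an existing file or directory is answered with
# index.html and status 200, so a crawler never receives a 404.
# RewriteCond %{REQUEST_FILENAME} !-f
# RewriteCond %{REQUEST_FILENAME} !-d
# RewriteRule . /index.html [L]
```

With the catch-all removed, a request for a missing page falls through to the ErrorDocument line and returns a real 404. Whether this accounts for all 3000 notices is a guess without seeing the crawl report.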