Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
How do you disallow HTTPS?
-
I currently have a site (startuploans.org) that runs everything over HTTP. Recently we decided to start an online application to process loan apps, so for one section we configured SSL (https://www.startuploans.org/secure/).
If I go to the HTTPS URL for any of my other pages, they show up... I was going to just 301 everything from HTTPS, but because the secure section is in a subdirectory I can't...
Also, canonical URLs won't work either because it's a totally different system and the pages are generated in an odd manner.
It's really just 1 page that needs to be disallowed.
Is there any way to disallow all HTTPS requests from robots.txt while keeping all the HTTP requests working as normal?
-
Hi Rick,
Your first thought was correct. If you apply the noindex meta tag to every page in the secure part of the site, then all of those pages will be de-indexed and you will have no duplicate content problem.
For WordPress, you just need to install a plugin that allows you to edit and apply page elements and meta tags. My preference is Yoast SEO. If you do a plugin search from your dashboard you will find it.
Hope that helps,
Sha
-
Perfect. This is the answer I was looking for...I will just use the meta tag globally in HTTPS....BUT...what about the fact that my entire site is duplicated in HTTPS?
It's all good for the /secure/ part, but what about my WordPress install...how do I handle that? Maybe my best option is to just load 2 different robots.txt files...
-
Hi Rick,
If you wish to use the robots.txt method to disallow all or part of your site's https protocol, you simply need to load two separate robots.txt files.
The http and https protocols are basically viewed by bots as if they were two completely separate root domains (which I guess you already know as you have mentioned the fact that port 443 is used for the secure protocol).
Google's advice is that to use this method, you should have a separate robots.txt file for each protocol with code as follows:
For your http protocol (http://www.startuploans.org/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://www.startuploans.org/robots.txt):
User-agent: *
Disallow: /
However, blocking crawlers with robots.txt is not the most reliable method for excluding pages from search engines. The reason for this is that the page will continue to be indexed if it happens to be found via a link from another page. Basically, the robots.txt is the sign on the front door that says "Please stay out of our house", but it is never seen by the people who enter via the rear exit or climb in a window!
The most reliable method of excluding pages is to add the noindex meta tag as suggested by MagentoWebDeveloper and Alan. When a bot encounters the noindex meta tag, it signals the search engine to de-index the page and there is no further problem.
I would generally use noindex, follow rather than noindex, nofollow as the nofollow tag will stop the flow of link value through your site. In most cases, as long as the noindex is in place, there is no reason to be worried about the links on the pages being followed.
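You should NEVER use both methods at the same time. If robots.txt blocks a page from being crawled, the bot never fetches the page and so never sees its noindex tag.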
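Where editing every template to add the meta tag is impractical, the same noindex, follow signal can be sent as an X-Robots-Tag response header instead. The following is only a minimal sketch for the site's .htaccess, assuming Apache with mod_headers and mod_ssl available (neither is confirmed in this thread):

```apache
# Assumption: mod_ssl sets the HTTPS environment variable on secure requests,
# so the header is added only to responses served over HTTPS.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, follow" env=HTTPS
</IfModule>
```

Because the header is only seen when a page is actually fetched, the HTTPS robots.txt would need to keep allowing crawling for this to take effect. If SSL is terminated at a proxy or load balancer, the HTTPS variable may not be set and the condition would need adjusting.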
Hope that helps,
Sha
-
I agree. Best practices dictate that the proper answer is to block the entire folder from indexing.
-
Why not just NO INDEX / NO FOLLOW the page? What is the reason behind this? Do you want Google not to index your https page? Duplicate content? All checkouts have https.
-
I should have added that the code I posted goes in the .htaccess...that code would deliver two different robots.txt files: the SSL version if the request comes in on port 443 (secure), or the normal robots.txt file if it's any other port (normal).
Is there any easier way? I feel like one misstep on this and I could block bots from my site.
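One way to reduce the risk of that kind of misstep is to match on the protocol rather than the port number, and to rewrite only when the alternate file really exists. This is a hedged sketch rather than a tested configuration; it assumes Apache with mod_rewrite and a file named robots_ssl.txt in the document root, as in the rule posted in this thread:

```apache
RewriteEngine On
# %{HTTPS} is "on" for SSL/TLS requests, whatever port they arrive on
RewriteCond %{HTTPS} on
# Only rewrite if the alternate file exists; otherwise the normal robots.txt is served
RewriteCond %{DOCUMENT_ROOT}/robots_ssl.txt -f
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
```

After deploying a rule like this, fetching both http://www.startuploans.org/robots.txt and https://www.startuploans.org/robots.txt by hand is a quick way to confirm that only the HTTPS copy contains Disallow: /.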
-
Nope...thanks though. Code is no problem for us...it's just a technical question. Here is what I want:
I want to restrict robots from the HTTPS version (secure) of my site while leaving the HTTP version (unsecure) perfectly normal and accessible by bots.
Basically what I am asking is: is this the best way (below)? Is there a simpler way? To my knowledge robots.txt doesn't support protocols, so doing something like disallow:https://......yada yada won't work.
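RewriteEngine on
# Apply the rule only to requests arriving on port 443 (HTTPS)
RewriteCond %{SERVER_PORT} ^443$
# Serve robots_ssl.txt in place of robots.txt for those requests
RewriteRule ^robots\.txt$ robots_ssl.txt [L]
-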
Hello Rick,
First caveat: I am not sure what you want to accomplish. Do you want it so that once the app is done, the person is no longer in https://? If that is it, then while I am not sure I will be able to help, I want to clarify the issue.
Currently, you have one page that is HTTPS, and that is your loan app page with the URL https://startuploans.org/secure/site/step1 (I did not get a step two on my test, but the next page was https://startuploans.org/secure/step3). You want a person to finish the app and then not be in HTTPS when they return to the site?
I am not a coder per se, but I am wondering: if you change the target on the menu link to the secure pages so that it opens in a new window, there would be no option to go back. Once finished, page 3 could have a "close to secure my information" option. Then they are left at the page they were on before going to the application.
Now, if none of this was what you wanted, I owe you a beer.