Can I Block https URLs using Host directive in robots.txt?
-
Hello Moz Community,
Recently, I have found that Google bots has started crawling HTTPs urls of my website which is increasing the number of duplicate pages at our website.
Instead of creating a separate robots.txt file for https version of my website, can I use Host directive in the robots.txt to suggest Google bots which is the original version of the website.
Host: http://www.example.com
I was wondering if this method will work and suggest Google bots that HTTPs URLs are the mirror of this website.
Thanks for all of the great responses!
Regards,
Ramendra -
Hi Ramendra,
To my knowledge, you can only provide directives in the robots.txt file for the domain on which it lives. This goes for both http/https and www/non-www versions of domains. This is why it's important to handle all preferred domain formatting with redirects, that point to your canonicalized version. So if you want http://www to index, all other versions redirect to that.
There might be a work around of some sort, but honestly, what I described above with redirection towards preferred versions is the direction you should take. Then you can manage one robots.txt file and your indexing will align with what you want better.
-
Thanks Logan,
I have read somewhere that using Host directive in the robots.txt file we can suggest Google bots which is the original version of the website if there are number of mirror sites. So, I was wondering if we can prevent indexing/crawling of HTTPS URLs by using Host directive in robots.txt of HTTP site.
We are using an ecommerce SAAS platform for our website where we have only one robots.txt file that we can use for HTTP site.
Is there any other way to prevent indexing/crawling of HTTPS URLs?
Regards,
Ramendra -
Hi Ramendra,
Based on what you said, it sounds like both versions of your site exist and are indexed, and you want to mitigate your duplicate content risk. If that's accurate, here are my recommendations on this:
- Robots.txt cannot be used on a HTTP site to prevent indexing/crawling of HTTPS URLs
- Google crawls HTTPS by default, so if your site is fully secure, then you need to redirect (this can be done with a redirect rule in HTACCESS, you don't need to do one-to-one redirects) HTTP URLs over to their HTTPS twin
- In addition to your HTTP>HTTPS redirects, you should also use canonical tags to push your preferred version to search engines
- Your HTTPS site should have its own robots.txt file
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Utilizing one robots.txt for two sites
I have two sites that are facilitated hosting in similar CMS. Maybe than having two separate robots.txt records (one for every space), my web office has made one which records the sitemaps for the two sites, similar to this:
Technical SEO | | eulabrant0 -
In writing the url, it is better to use the language used by the people of my country or English?
We speak Persian and all people search in Persian on Google. But I read in some sources that the url should be in English. Please tell me which language to use for url writing?
Technical SEO | | ghesta
For example, I brought down two models: 1fb0e134-10dc-4737-904f-bfdf07143a98-image.png https://ghesta.ir/blog/how-to-become-rich/
2)https://ghesta.ir/blog/چگونه-پولدار-شویم/0 -
High DA url rewrite to your url...would it increase the Ranking of a website?
Hi, my client use a recruiting management tool called njoyn.com. The url of his site look like: www.example.njoyn.com. Would it increase his ranking if I use this Url above that point to njoyn domain wich has a high DA, and rewrite it to his site www.example.com? If yes how? Thanks
Technical SEO | | bigrat950 -
One URL To All Sites, How Can I Avoid ?
I am using EMD and have an only 1 page which is the main url. Now my question is how can i avoid the penalty of submitting the same URL to the different platform like Web2.0, Article Directory etc. Please help.
Technical SEO | | seodadoo5670 -
Will an XML sitemap override a robots.txt
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed. I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why. Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
Technical SEO | | KCBackofen0 -
Robots.txt and joomla
Hello, I use joomla for my website and automatically all those files are blocked is that good or bad, so I remove anything and if so why ? User-agent: *
Technical SEO | | seoanalytics
Disallow: /administrator/
Disallow: /cache/
Disallow: /components/
Disallow: /images/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /libraries/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /xmlrpc/ I also added to my robots.txt files my email address ( is that useful, I am afraid google passes PR to the email address )
and a javascript: void (0) because I have tabs on my webpage ( is that useful )
as well as a .pdf ( is it also useful ) any comments ? does anything need to be changed or is it ok ? Thank you,0 -
Is there a way I can track Arabic keywords on the Arabic version of Google Qatar using SEOMOZ Rank checker?
I have a Qatari website in Arabic and I would like to know if it is possible to track the Arabic keywords using google.com.qa in Arabic using SEOMoz rank checker. When selecting the three search engines, I have no choice over the language. Only the country can be modified. Any solution?
Technical SEO | | mrlee1 -
When moving my ecommerce website from one host to another should I also 301 all my image urls?
I'm going to be 301'ing a lot of pages, but should i also 301 my image URLS? Any other helpful hints would be awesome too, as this will be my first move online ever. We've been with our host 3 years. Thanks! Paul Serra STbands.com, Owner
Technical SEO | | Hyrule0