Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.
Robots.txt on http vs. https
-
We recently changed our domain from http to https. When a user enters any URL on http, there is a global 301 redirect to the same page on https.
I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http version with robots.txt?
Strangely, I cannot find a single resource about this...
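For anyone who wants to confirm that a redirect like the one described above really is a 301 to the matching https page, here is a minimal Python sketch. The function names and the www.example.com URLs are placeholders for illustration, not anything from the original post, and the comparison is deliberately strict (it expects the Location header to be exactly the https version of the requested URL).

```python
import http.client
from urllib.parse import urlsplit, urlunsplit


def https_equivalent(url: str) -> str:
    """Return the same URL with the scheme switched to https (hypothetical helper)."""
    parts = urlsplit(url)
    # An empty path is normalised to "/" so that bare hostnames compare cleanly.
    path = parts.path or "/"
    return urlunsplit(("https", parts.netloc, path, parts.query, parts.fragment))


def redirect_is_permanent(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request to an http URL without following redirects.

    Returns True only if the server answers 301 and the Location header
    is exactly the https version of the requested URL.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return (
            response.status == 301
            and response.getheader("Location") == https_equivalent(url)
        )
    finally:
        conn.close()
```

Calling redirect_is_permanent("http://www.example.com/some-page") against your own domain (the URL here is a placeholder) makes a live request; a False result means the redirect is either not a 301 or points somewhere other than the same page on https.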
-
Glad to be of help. Check out this Google link to confirm you picked up the 180-day crawl:
https://support.google.com/webmasters/answer/83106?hl=en
This second URL is helpful as well:
http://blog.raventools.com/moving-site-from-http-to-ssl/
All the best,
Tom
-
Good point about the backlinks! Currently, both robots.txt files are open and Google does not seem to have canonicalization problems so far. So it makes sense to leave it this way anyway... Thanks Thomas!
-
"Now that https is the canonical version, should I block the http version with robots.txt?"
Absolutely not. GWT (Google Webmaster Tools) will handle all of it. Think about backlinks: they point to both https:// and http:// URLs, and you do not want to cut off the flow of link juice. If the http URLs are blocked in robots.txt, crawlers cannot follow the 301 redirects to the https pages.
Remake robots.txt with
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
But use https:// for the XML sitemap.
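As a rough illustration of the advice above, the following Python sketch parses an open robots.txt that lists an https sitemap and checks that neither http nor https URLs are blocked. The file contents, the www.example.com host, and the sitemap path are assumptions made up for the example; they are not from the generator linked above. It uses the standard library's urllib.robotparser (site_maps() needs Python 3.8 or later).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, served unchanged on both http and https.
# An empty "Disallow:" means nothing is blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://www.example.com/sitemap.xml
"""


def parse_robots(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = parse_robots(ROBOTS_TXT)

# Both protocol versions stay crawlable, so the 301 redirects can be followed.
http_ok = parser.can_fetch("*", "http://www.example.com/some-page")
https_ok = parser.can_fetch("*", "https://www.example.com/some-page")

# The sitemap entry should point at the https version only.
sitemaps = parser.site_maps()

print(http_ok, https_ok, sitemaps)
```

Keeping the file open on both protocols is the point of the sketch: only the Sitemap line changes to https, while the crawl rules stay the same.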