Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will remain viewable - we have locked both new posts and new replies.
Robots.txt on http vs. https
-
We recently changed our domain from http to https. When a user enters any URL on http, there is a global 301 redirect to the same page on https.
I cannot find instructions about what to do with robots.txt. Now that https is the canonical version, should I block the http version with robots.txt?
Strangely, I cannot find a single resource about this...
-
Glad to be of help. Check out this Google link to confirm you picked up the 180 day crawl:
https://support.google.com/webmasters/answer/83106?hl=en
This second URL is helpful as well:
http://blog.raventools.com/moving-site-from-http-to-ssl/
all the best,
tom
-
Good point about the backlinks! Currently, both robots.txt files are open, and Google does not seem to have canonicalization problems so far. So it makes sense to leave it this way anyway... Thanks Thomas!
-
"Now that https is the canonical version, should I block the http-Version with robots.txt?"
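Absolutely not. Google Webmaster Tools (GWT) will handle all of it. Think about your backlinks: they point to both https:// and http:// URLs, and if you block the http version, crawlers can no longer follow the 301 redirects, so you would cut off the flow of link juice that you do not want to lose.
Remake robots.txt with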
http://www.internetmarketingninjas.com/seo-tools/robots-txt-generator/
But use https:// for the xml sitemap.
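As a minimal sketch of the advice above (not the output of the generator linked here), the following Python builds an allow-all robots.txt that points to an https sitemap, then uses the standard-library `urllib.robotparser` to confirm that neither the http nor the https URLs are blocked. The domain `www.example.com`, the sitemap path, and the helper names `build_robots_txt` and `is_crawlable` are assumptions made for illustration only.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_url: str) -> str:
    """Return an allow-all robots.txt that advertises the given sitemap."""
    # Hypothetical helper: the sitemap should live on the canonical https host.
    if not sitemap_url.startswith("https://"):
        raise ValueError("sitemap URL should use https://")
    lines = [
        "User-agent: *",
        "Disallow:",  # an empty Disallow blocks nothing
        f"Sitemap: {sitemap_url}",
    ]
    return "\n".join(lines) + "\n"


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed example domain; substitute your own host and sitemap path.
robots = build_robots_txt("https://www.example.com/sitemap.xml")

# The same open file can be served on both protocols, so crawlers can still
# request the old http URLs and follow the 301 redirects to https.
http_ok = is_crawlable(robots, "http://www.example.com/some-page")
https_ok = is_crawlable(robots, "https://www.example.com/some-page")

# For contrast: this is the blocking variant the answer advises against.
blocked = is_crawlable("User-agent: *\nDisallow: /\n",
                       "http://www.example.com/some-page")

print(robots)
print(http_ok, https_ok, blocked)
```

The check runs entirely on the robots.txt text, so it does not confirm that the live server actually returns a 301 for http requests; that still needs to be verified against the site itself.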