What is the best method to block a sub-domain, e.g. staging.domain.com/ from getting indexed?
-
Now that Google considers subdomains part of the root domain, I'm a little leery of testing robots.txt on staging.domain.com with something like:

User-agent: *
Disallow: /

for fear it might get www.domain.com blocked as well. Has anyone had any success using robots.txt to block subdomains? I know I could add a meta robots tag to the staging.domain.com pages, but that would require a lot more work.
-
Just make sure that when/if you copy the staging site over to the live domain, you don't also copy the robots.txt, .htaccess, or whatever other means you used to block that site from being indexed, and thus have your shiny new site be blocked.
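One way to make that mistake impossible, assuming Apache with mod_rewrite and a codebase shared between staging and live, is to keep a second robots file alongside the normal one and serve it only on the staging hostname. A rough sketch; the robots-staging.txt filename here is made up for illustration:

# .htaccess in the shared document root (requires mod_rewrite)
RewriteEngine On
# On the staging hostname, answer robots.txt requests with the
# disallow-everything file instead of the live one
RewriteCond %{HTTP_HOST} ^staging\.domain\.com$ [NC]
RewriteRule ^robots\.txt$ robots-staging.txt [L]

Here robots-staging.txt would contain the User-agent: * / Disallow: / rules, while the real robots.txt stays safe for the live domain.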
-
I agree. The name of your subdomain being "staging" didn't register at all with me until Matt brought it up. I was offering a generic response to the subdomain question whereas I believe Matt focused on how to handle a staging site. Interesting viewpoint.
-
Matt/Ryan-
Great discussion, thanks for the input. staging.domain.com is just one of the subdomains we don't want indexed. Some of them still need to be accessible to the public; others, like staging, could be restricted to specific IPs.
I realize after your discussion that I probably should have used a different example of a subdomain. On the other hand, it might not have sparked the discussion otherwise, so maybe it was a good example.
-
.htaccess files can be placed at any directory level of a site, so you can apply the restriction to just the subdomain, or even to a single directory of a domain.
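As an illustration of the directory-level approach, here is a minimal sketch assuming Apache with mod_headers enabled. Dropped into the subdomain's document root (or any single directory), it sends a noindex header for everything beneath it, with no need to edit individual pages:

# .htaccess in the subdomain's document root (requires mod_headers)
# Adds the header to every response served from this directory down
Header set X-Robots-Tag "noindex, nofollow"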
-
Staging URLs are typically used only for testing, so rather than doing a deny, I would recommend a specific ALLOW for only the IP addresses that should be granted access.
I would imagine you don't want it indexed because you don't want the rest of the world knowing about it.
You can also use .htaccess to require a username/password. It is simple, and you can give the credentials to clients if that is a concern/need.
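A sketch of both ideas combined, in Apache 2.2 syntax; the IP addresses and the .htpasswd path below are placeholders. Visitors get in if they come from a whitelisted address or supply a valid username/password:

# .htaccess for the staging subdomain (Apache 2.2 syntax)
Order deny,allow
Deny from all
# Placeholder office/client addresses; replace with your own
Allow from 203.0.113.10
Allow from 198.51.100.0/24

# Optional username/password fallback for everyone else
AuthType Basic
AuthName "Staging"
AuthUserFile /path/to/.htpasswd
Require valid-user
# Grant access if EITHER the IP matches OR the login succeeds
Satisfy Any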
-
Correct.
-
Toren, I would not recommend that solution. There is nothing to prevent Googlebot from crawling your site from almost any IP address. If you found 100 IPs used by the crawler and blocked them all, there is nothing to stop the crawler from using IP #101 next month. Once the subdomain's content is located and indexed, it will be a headache to fix.
The best solution is always going to be a noindex meta tag on the pages you do not wish to have indexed. If that method is too much work or otherwise undesirable, you can use the robots.txt solution. There is no circumstance I can imagine where you would modify your .htaccess file to block Googlebot.
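For reference, this is the standard tag being described here; it goes in the <head> of each page you want kept out of the index:

<!-- In the <head> of every page that should not be indexed -->
<meta name="robots" content="noindex">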
-
Hi Matt.
Perhaps I misunderstood the question, but I believe Toren only wishes to prevent the subdomain from being indexed. If you restrict subdomain access by IP, it would also prevent visitors from accessing the content, which I don't believe is the goal.
-
Interesting, I hadn't thought of using .htaccess to block Googlebot. Thanks for the suggestion.
-
Thanks Ryan. So you don't see any risk of de-indexing the main site if I create a second robots.txt file at http://staging.domain.com/robots.txt, e.g.:

User-agent: *
Disallow: /

That was my initial thought, but when Google announced they consider subdomains part of the root domain, I was afraid it might affect the http://www.domain.com versions of the pages. So you're saying the subdomain is basically treated like a folder you block on the primary domain?
-
Use an .htaccess file to allow access only from certain IP addresses or ranges.
Here is an article describing how: http://www.kirupa.com/html5/htaccess_tricks.htm
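If the server runs Apache 2.4 or later, note that the older Order/Allow/Deny directives have been replaced by Require. A minimal sketch with placeholder addresses:

# .htaccess, Apache 2.4 syntax (placeholder IP and range)
# Access is denied unless the client matches one of these
Require ip 203.0.113.10 198.51.100.0/24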
-
Place a robots.txt file in the root of the subdomain:

User-agent: *
Disallow: /

This method will block the subdomain while leaving your primary domain unaffected.