Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed

-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).
Best to request removal then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (IE not accidently a redirect) and the content of the two files are robots compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robot.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt file as (I presume) PHP.
Or, you could conditionally rewrite the /robot.txt URL to a new file according to sub-domain
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain.website.com$
RewriteRule ^robotx.txt$ robots-subdomain.txtThen add:
User-agent: *
Disallow: /to the robots-subdomain.txt file
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice-versa. However, you should ask why the server is configured to have a duplicate subdomain? You might just edit your apache settings to get rid of that subdomain (usually done through a cpanel interface).
Here is what your htaccess might look like:
<ifmodule mod_rewrite.c="">RewriteEngine on
# Redirect non-www to wwww
RewriteCond %{HTTP_HOST} !^www.mydomain.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]</ifmodule> -
Not to me LOL
I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozer's will quickly come to the rescue on this one.Andy

-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle
Yes, you can block an entire subdomain via robots.txt, however you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content.User-agent: *
Disallow: /hope this helps

Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Ecommerce store on subdomain - danger of keyword cannibalization?
Hi all, Scenario: Ecommerce website selling a food product has their store on a subdomain (store.website.com). A GOOD chunk of the URLs - primarily parameters - are blocked in Robots.txt. When I search for the products, the main domain ranks almost exclusively, while the store only ranks on deeper SERPs (several pages deep). In the end, only one variation of the product is listed on the main domain (ex: Original Flavor 1oz 24 count), while the store itself obviously has all of them (most of which are blocked by Robots.txt). Can anyone shed a little bit of insight into best practices here? The platform for the store is Shopify if that helps. My suggestion at this point is to recommend they all crawling in the subdomain Robots.txt and canonicalize the parameter pages. As for keywords, my main concern is cannibalization, or rather forcing visitors to take extra steps to get to the store on the subdomain because hardly any of the subdomain pages rank. In a perfect world, they'd have everything on their main domain and no silly subdomain. Thanks!
Intermediate & Advanced SEO | | Alces0 -
Subdomain cannibalization
Hi, I am doing the SEO for a webshop, which has a lot of linking and related websites on the same root domain. So the structure is for example: Root domain: example.com
Intermediate & Advanced SEO | | Mat_C
Shop: shop.example.com
Linking websites to shop: courses.example.com, software.example.com,... Do I have to check which keywords these linking websites are already ranking for and choose other keywords for my category and product pages on the webshop? The problem with this could be that the main keywords for the category pages on the webshop are mainly the same as for the other subdomains. The intention is that some people immediately come to the webshop instead of going first to the linking websites and then to the webshop. Thanks.0 -
What does Disallow: /french-wines/?* actually do - robots.txt
Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?* Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark? Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL? I think this has been done to block URLs containing query strings. Thanks, Luke
Intermediate & Advanced SEO | | McTaggart0 -
If Robots.txt have blocked an Image (Image URL) but the other page which can be indexed has this image, how is the image treated?
Hi MOZers, This probably is a dumb question but I have a case where the robots.tags has an image url blocked but this image is used on a page (lets call it Page A) which can be indexed. If the image on Page A has an Alt tags, then how is this information digested by crawlers? A) would Google totally ignore the image and the ALT tags information? OR B) Google would consider the ALT tags information? I am asking this because all the images on the website are blocked by robots.txt at the moment but I would really like website crawlers to crawl the alt tags information. Chances are that I will ask the webmaster to allow indexing of images too but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | | Malika11 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Blog On Subdomain - Do backlinks to the blog posts on Subdomain count as links for main site?
I want to put blog on my site. The IT department is asking that I use a subdomain (myblog.mysite.com) instead of a subfolder (mysite.com/myblog). I am worried b/c it was my understanding that any links I get to my blog posts (if on subdomain) will not count toward the main site (search engines would view almost as other website). The main purpose of this blog is to attract backlinks. That is why I prefer the subfolder location for the Blog. Can anyone tell me if I am thinking about this right? Another solution I am being offered is to use a reverse proxy. Thoughts? Thank you for your time.
Intermediate & Advanced SEO | | ecerbone0 -
Getting a Sitemap for a Subdomain into Webmaster Tools
We have a subdomain that is a Wordpress blog, and it takes days, sometimes weeks for most posts to be indexed. We are using the Yoast plugin for SEO, which creates the sitemap.xml file. The problem is that the sitemap.xml file is located at blog.gallerydirect.com/sitemap.xml, and Webmaster Tools will only allow the insertion of the sitemap as a directory under the gallerydirect.com account. Right now, we have the sitemap listed in the robots.txt file, but I really don't know if Google is finding and parsing the sitemap. As far as I can tell, I have three options, and I'd like to get thoughts on which of the three options is the best choice (that is, unless there's an option I haven't thought of): 1. Create a separate Webmaster Tools account for the blog 2. Copy the blog's sitemap.xml file from blog.gallerydirect.com/sitemap.xml to the main web server and list it as something like gallerydirect.com/blogsitemap.xml, then notify Webmaster Tools of the new sitemap on the galllerydirect.com account 3. Do an .htaccess redirect on the blog server, such as RewriteRule ^sitemap.xml http://gallerydirect.com/blogsitemap_index.xml Then notify Webmaster Tools of the new blog sitemap in the gallerydirect.com account. Suggestions on what would be the best approach to be sure that Google is finding and indexing the blog ASAP?
Intermediate & Advanced SEO | | sbaylor0 -
Robots.txt & url removal vs. noindex, follow?
When de-indexing pages from google, what are the pros & cons of each of the below two options: robots.txt & requesting url removal from google webmasters Use the noindex, follow meta tag on all doctor profile pages Keep the URLs in the Sitemap file so that Google will recrawl them and find the noindex meta tag make sure that they're not disallowed by the robots.txt file
Intermediate & Advanced SEO | | nicole.healthline0