Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Block an entire subdomain with robots.txt?
-
Is it possible to block an entire subdomain with robots.txt?
I write for a blog that has their root domain as well as a subdomain pointing to the exact same IP. Getting rid of the option is not an option so I'd like to explore other options to avoid duplicate content. Any ideas?
-
Awesome! That did the trick -- thanks for your help. The site is no longer listed

-
Fact is, the robots file alone will never work (the link has a good explanation why - short form: all it does is stop the bots from indexing again).
Best to request removal then wait a few days.
-
Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in htaccess and are getting different robots.txt files for the domain and subdomain -- so that works. But I've never done this before so I don't know how long it's supposed to take?
I'll try to verify via Webmaster Tools to speed up the process. Thanks
-
You should do a remove request in Google Webmaster Tools. You have to first verify the sub-domain then request the removal.
See this post on why the robots file alone won't work...
http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
-
Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea.
Will report back to confirm that the subdomain has been de-indexed.
-
Option 1 could come with a small performance hit if you have a lot of txt files being used on the server.
There shouldn't be any negative side effects to option 2 if the rewrite is clean (IE not accidently a redirect) and the content of the two files are robots compliant.
Good luck
-
Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation
-
We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings
-
Sounds like (from other discussions) you may be stuck requiring a dynamic robot.txt file which detects what domain the bot is on and changes the content accordingly. This means the server has to run all .txt file as (I presume) PHP.
Or, you could conditionally rewrite the /robot.txt URL to a new file according to sub-domain
RewriteEngine on
RewriteCond %{HTTP_HOST} ^subdomain.website.com$
RewriteRule ^robotx.txt$ robots-subdomain.txtThen add:
User-agent: *
Disallow: /to the robots-subdomain.txt file
(untested)
-
Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain?
Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's ok to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's ok...
-
Hey Ryan,
I wasn't directly involved with the decision to create the subdomain, but I'm told that it is necessary to create in order to bypass certain elements that were affecting the root domain.
Nevertheless, it is a blog and the users now need to login to the subdomain in order to access the Wordpress backend to bypass those elements. Traffic for the site still goes to the root domain.
-
They both point to the same location on the server? So there's not a different folder for the subdomain?
If that's the case then I suggest adding a rule to your htaccess file to 301 the subdomain back to the main domain in exactly the same way people redirect from non-www to www or vice-versa. However, you should ask why the server is configured to have a duplicate subdomain? You might just edit your apache settings to get rid of that subdomain (usually done through a cpanel interface).
Here is what your htaccess might look like:
<ifmodule mod_rewrite.c="">RewriteEngine on
# Redirect non-www to wwww
RewriteCond %{HTTP_HOST} !^www.mydomain.org [NC]
RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]</ifmodule> -
Not to me LOL
I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozer's will quickly come to the rescue on this one.Andy

-
Hey Andy,
Herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file.
Does that make sense?
-
Hi Kyle
Yes, you can block an entire subdomain via robots.txt, however you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content.User-agent: *
Disallow: /hope this helps

Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Ecommerce store on subdomain - danger of keyword cannibalization?
Hi all, Scenario: Ecommerce website selling a food product has their store on a subdomain (store.website.com). A GOOD chunk of the URLs - primarily parameters - are blocked in Robots.txt. When I search for the products, the main domain ranks almost exclusively, while the store only ranks on deeper SERPs (several pages deep). In the end, only one variation of the product is listed on the main domain (ex: Original Flavor 1oz 24 count), while the store itself obviously has all of them (most of which are blocked by Robots.txt). Can anyone shed a little bit of insight into best practices here? The platform for the store is Shopify if that helps. My suggestion at this point is to recommend they all crawling in the subdomain Robots.txt and canonicalize the parameter pages. As for keywords, my main concern is cannibalization, or rather forcing visitors to take extra steps to get to the store on the subdomain because hardly any of the subdomain pages rank. In a perfect world, they'd have everything on their main domain and no silly subdomain. Thanks!
Intermediate & Advanced SEO | | Alces0 -
Robots.txt & Disallow: /*? Question!
Hi, I have a site where they have: Disallow: /*? Problem is we need the following indexed: ?utm_source=google_shopping What would the best solution be? I have read: User-agent: *
Intermediate & Advanced SEO | | vetofunk
Allow: ?utm_source=google_shopping
Disallow: /*? Any ideas?0 -
Subdomains vs directories on existing website with good search traffic
Hello everyone, I operate a website called Icy Veins (www.icy-veins.com), which gives gaming advice for World of Warcraft and Hearthstone, two titles from Blizzard Entertainment. Up until recently, we had articles for both games on the main subdomain (www.icy-veins.com), without a directory structure. The articles for World of Warcraft ended in -wow and those for Hearthstone ended in -hearthstone and that was it. We are planning to cover more games from Blizzard entertainment soon, so we hired a SEO consultant to figure out whether we should use directories (www.icy-veins.com/wow/, www.icy-veins.com/hearthstone/, etc.) or subdomains (www.icy-veins.com, wow.icy-veins.com, hearthstone.icy-veins.com). For a number of reason, the consultant was adamant that subdomains was the way to go. So, I implemented subdomains and I have 301-redirects from all the old URLs to the new ones, and after 2 weeks, the amount of search traffic we get has been slowly decreasing, as the new URLs were getting index. Now, we are getting about 20%-25% less search traffic. For example, the week before the subdomains went live we received 900,000 visits from search engines (11-17 May). This week, we only received 700,000 visits. All our new URLs are indexed, but they rank slightly lower than the old URLs used to, so I was wondering if this was something that was to be expected and that will improve in time or if I should just go for subdomains. Thank you in advance.
Intermediate & Advanced SEO | | damienthivolle0 -
301 Redirect of subdomain?
Fellow Mozzers, I'm having a hard time wrapping my brain around a redirect issue and thought it was worth posing the question to the Moz community. I did a search first but couldn't find the exact answer I was looking for. How does a 301 redirect work when you redirect a sub domain example.homepage.com to www.homepage.com but you keep the sub directories of example.homepage.com/page-1 active and are trying to rank them? I'm dealing with a current project where this is happening and this doesn't make sense to me, to redirect the subdomain if you're also trying to rank/create search traffic for pages, sub directories on example.homepage.com. This also get's into the debate of if a sub domain site is viewed as it's own website and therefore has to rank itself. If this is true, it seems like we're kind of killing the authority of the site by redirecting it. Additionally, www.homepage.com has a much stronger link profile than example.homepage.com I hope this makes sense. Any thoughts are appreciated. Thanks for your time.
Intermediate & Advanced SEO | | SMG-Texas0 -
Meta NoIndex tag and Robots Disallow
Hi all, I hope you can spend some time to answer my first of a few questions 🙂 We are running a Magento site - layered/faceted navigation nightmare has created thousands of duplicate URLS! Anyway, during my process to tackle the issue, I disallowed in Robots.txt anything in the querystring that was not a p (allowed this for pagination). After checking some pages in Google, I did a site:www.mydomain.com/specificpage.html and a few duplicates came up along with the original with
Intermediate & Advanced SEO | | bjs2010
"There is no information about this page because it is blocked by robots.txt" So I had added in Meta Noindex, follow on all these duplicates also but I guess it wasnt being read because of Robots.txt. So coming to my question. Did robots.txt block access to these pages? If so, were these already in the index and after disallowing it with robots, Googlebot could not read Meta No index? Does Meta Noindex Follow on pages actually help Googlebot decide to remove these pages from index? I thought Robots would stop and prevent indexation? But I've read this:
"Noindex is a funny thing, it actually doesn’t mean “You can’t index this”, it means “You can’t show this in search results”. Robots.txt disallow means “You can’t index this” but it doesn’t mean “You can’t show it in the search results”. I'm a bit confused about how to use these in both preventing duplicate content in the first place and then helping to address dupe content once it's already in the index. Thanks! B0 -
Robots.txt: Can you put a /* wildcard in the middle of a URL?
We have noticed that Google is indexing the language/country directory versions of directories we have disallowed in our robots.txt. For example: Disallow: /images/ is blocked just fine However, once you add our /en/uk/ directory in front of it, there are dozens of pages indexed. The question is: Can I put a wildcard in the middle of the string, ex. /en/*/images/, or do I need to list out every single country for every language in the robots file. Anyone know of any workarounds?
Intermediate & Advanced SEO | | IHSwebsite0 -
Robots.txt is blocking Wordpress Pages from Googlebot?
I have a robots.txt file on my server, which I did not develop, it was done by the web designer at the company before me. Then there is a word press plugin that generates a robots.txt file. How Do I unblock all the wordpress pages from googlebot?
Intermediate & Advanced SEO | | ENSO0 -
Move blog from subdomain to main domain on ecom site?
I am wondering what my fellow mozers think. Pretty set about my direction but want to get any other input to aid in my decision. Have an ecom site with a www.blog.maindomain.com. The blog is fairly new and no major rankings. There are only about 30 posts. This isn't a super competitive market and the blogging won't be a huge part of our content strategy but I would like to use it for passing juice etc. Would you go through the trouble to move the blog to www.site.com/blog and redirecting all the old content to new?
Intermediate & Advanced SEO | | PEnterprises0