Moz Q&A is closed.
After more than 13 years and tens of thousands of questions, Moz Q&A closed on 12th December 2024. We're not completely removing the content (many posts will still be viewable), but we have locked both new posts and new replies.
Block an entire subdomain with robots.txt?
- Is it possible to block an entire subdomain with robots.txt? I write for a blog whose root domain and subdomain both point to the exact same IP. Getting rid of the subdomain is not an option, so I'd like to explore other ways to avoid duplicate content. Any ideas?
- Awesome! That did the trick -- thanks for your help. The site is no longer listed.
- Fact is, the robots file alone will never work (the link has a good explanation why; short form: all it does is stop the bots from crawling the pages again, it doesn't remove what is already in the index). Best to request removal, then wait a few days.
- Yeah. As of yet, the site has not been de-indexed. We placed the conditional rule in .htaccess and are getting different robots.txt files for the domain and subdomain, so that works. But I've never done this before, so I don't know how long it's supposed to take. I'll try to verify via Webmaster Tools to speed up the process. Thanks.
- You should do a removal request in Google Webmaster Tools. You have to first verify the sub-domain, then request the removal. See this post on why the robots file alone won't work: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
- Awesome. We used your second idea and so far it looks like it is working exactly how we want. Thanks for the idea. Will report back to confirm that the subdomain has been de-indexed.
- Option 1 could come with a small performance hit if you have a lot of .txt files being used on the server. There shouldn't be any negative side effects to option 2 if the rewrite is clean (i.e. not accidentally a redirect) and the contents of the two files are robots.txt compliant. Good luck.
- Thanks for the suggestion. I'll definitely have to do a bit more research into this one to make sure that it doesn't have any negative side effects before implementation.
- We have a plugin right now that places canonical tags, but unfortunately, the canonical for the subdomain points to the subdomain. I'll look around to see if I can tweak the settings.
- Sounds like (from other discussions) you may be stuck requiring a dynamic robots.txt file that detects which domain the bot is on and changes the content accordingly. This means the server has to run all .txt files as (I presume) PHP. Or, you could conditionally rewrite the /robots.txt URL to a new file according to sub-domain:

      RewriteEngine on
      RewriteCond %{HTTP_HOST} ^subdomain\.website\.com$ [NC]
      RewriteRule ^robots\.txt$ robots-subdomain.txt [L]

  Then add:

      User-agent: *
      Disallow: /

  to the robots-subdomain.txt file (untested).
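The "dynamic robots.txt" option mentioned in the reply above could look roughly like the following sketch. It is written in Python rather than PHP purely for illustration; `BLOCKED_HOSTS` and `robots_for_host` are hypothetical names, and the host name is the placeholder from the reply, not a real configuration.

```python
# Body served to crawlers on hosts that should stay crawlable.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
# Body served to crawlers on hosts that should be blocked entirely.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical set of hosts to block; the name is the thread's placeholder.
BLOCKED_HOSTS = {"subdomain.website.com"}


def robots_for_host(host_header: str) -> str:
    """Pick the robots.txt body based on the request's Host header."""
    # Host names are case-insensitive and may carry a port suffix.
    # (IPv6 literals are not handled in this sketch.)
    host = host_header.strip().lower().split(":", 1)[0]
    return BLOCK_ALL if host in BLOCKED_HOSTS else ALLOW_ALL


if __name__ == "__main__":
    print(robots_for_host("subdomain.website.com"))
    print(robots_for_host("www.website.com"))
```

Whatever serves /robots.txt would call this with the incoming Host header, so the root domain and the subdomain get different files even though they share one document root.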
- Placing canonical tags isn't an option? Detect that the page is being viewed through the subdomain, and if so, write the canonical tag on the page back to the root domain? Or, just place a canonical tag on every page pointing back to the root domain (so the subdomain and root domain pages would both have them). Apparently, it's OK to have a canonical tag on a page pointing to itself. I haven't tried this, but if Matt Cutts says it's OK...
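As a rough illustration of the second idea in the reply above (every page carries a canonical tag pointing at the root domain, whichever host served it), here is a minimal Python sketch. `ROOT_HOST` and `canonical_tag` are hypothetical names, and the host is a placeholder; a real WordPress setup would do this in the theme or a plugin.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical root domain that every canonical URL should point to.
ROOT_HOST = "www.website.com"


def canonical_tag(request_url: str, root_host: str = ROOT_HOST) -> str:
    """Build a canonical <link> tag that always points at the root domain."""
    parts = urlsplit(request_url)
    # Swap only the host; keep scheme, path and query, and drop any fragment.
    canonical = urlunsplit((parts.scheme, root_host, parts.path, parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


if __name__ == "__main__":
    print(canonical_tag("http://subdomain.website.com/blog/post-1"))
    print(canonical_tag("http://www.website.com/blog/post-1"))
```

Both calls produce the same tag, so the root-domain page ends up with a self-referencing canonical and the subdomain copy points back to it.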
- Hey Ryan, I wasn't directly involved with the decision to create the subdomain, but I'm told that it was necessary in order to bypass certain elements that were affecting the root domain. Nevertheless, it is a blog, and the users now need to log in to the subdomain in order to access the WordPress backend and bypass those elements. Traffic for the site still goes to the root domain.
- They both point to the same location on the server? So there's not a different folder for the subdomain? If that's the case, then I suggest adding a rule to your .htaccess file to 301 the subdomain back to the main domain, in exactly the same way people redirect from non-www to www or vice versa. However, you should ask why the server is configured to have a duplicate subdomain. You might just edit your Apache settings to get rid of that subdomain (usually done through a cPanel interface). Here is what your .htaccess might look like:

      <IfModule mod_rewrite.c>
      RewriteEngine on
      # Redirect non-www to www
      RewriteCond %{HTTP_HOST} !^www\.mydomain\.org [NC]
      RewriteRule ^(.*)$ http://www.mydomain.org/$1 [R=301,L]
      </IfModule>
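To make the effect of that rewrite rule easier to reason about, here is a small Python sketch of the decision it encodes: any request whose host does not start with the canonical host name (compared case-insensitively) gets a 301 to the same path on the canonical host. `CANONICAL_HOST` and `redirect_target` are hypothetical names, and query strings are ignored for brevity.

```python
from typing import Optional

# Host name taken from the .htaccess example in the reply above.
CANONICAL_HOST = "www.mydomain.org"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the Location for a 301 redirect, or None if no redirect applies."""
    # Requests already on the canonical host are served as-is.
    if host.lower().startswith(CANONICAL_HOST):
        return None
    # Everything else is sent to the same path on the canonical host.
    return f"http://{CANONICAL_HOST}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(redirect_target("blog.mydomain.org", "/post-1"))
    print(redirect_target("www.mydomain.org", "/post-1"))
```

Note that this sends every non-canonical host to the main domain, which is why it only fits cases where nobody needs to reach the subdomain directly.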
- Not to me LOL. I think you'll need someone with a bit more expertise in this area than I to assist in this case. Kyle, I'm sorry I couldn't offer more assistance... but I don't want to tell you something if I'm not 100% sure. I suspect one of the many bright SEOmozers will quickly come to the rescue on this one. Andy
- Hey Andy, herein lies the problem. Since the domain and subdomain point to the exact same place, they both utilize the same robots.txt file. Does that make sense?
- Hi Kyle, yes, you can block an entire subdomain via robots.txt; however, you'll need to create a robots.txt file and place it in the root of the subdomain, then add the code to direct the bots to stay away from the entire subdomain's content:

      User-agent: *
      Disallow: /

  Hope this helps.
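One way to sanity-check those two lines before deploying them is Python's standard `urllib.robotparser`. This is only a sketch: `is_blocked` is a hypothetical helper, the URL is a placeholder, and the parser reflects the standard library's reading of the rules rather than any particular search engine's.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the answer above.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked(BLOCK_ALL, "http://subdomain.website.com/any/page.html"))
```

The same helper can be pointed at the text of the root domain's robots.txt to confirm that it still allows crawling.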