Robots.txt disallow subdomain
-
Hi all,
I have a development subdomain, which gets copied to the live domain. Because I don't want this dev domain to get crawled, I'd like to implement a robots.txt for this domain only. The problem is that I don't want this robots.txt to disallow the live domain. Is there a way to create a robots.txt for this development subdomain only?
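To be concrete, the file I'd want served on the dev subdomain is the standard block-everything robots.txt:

```
User-agent: *
Disallow: /
```

while the live domain keeps its normal, permissive file.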
Thanks in advance!
-
I would suggest you talk to your developers, as Theo suggests, about excluding visitors from your test site.
-
The copying is a manual process, and I don't want any risk to the live environment. An HttpHandler for robots.txt could be a solution, and I'm going to discuss it with one of our developers. Other suggestions are still welcome, of course!
-
Do you FTP-copy one domain to the other? If this is a manual process, then keeping the live site clean is as simple as excluding the test domain's robots.txt from the copy.
If you automate the copy and want the code to behave differently per base URL, you could create an HttpHandler for robots.txt that delivers a different version based on the host in the HTTP request header.
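As a sketch of that host-based idea (shown here in Python rather than as an actual ASP.NET HttpHandler; the "dev." prefix is an assumed naming convention, adjust to your real subdomain):

```python
# Serve a blocking robots.txt on the dev subdomain, a permissive one elsewhere.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

def robots_for_host(host: str) -> str:
    """Return the robots.txt body appropriate for the requesting host."""
    if host.lower().startswith("dev."):
        return BLOCK_ALL
    return ALLOW_ALL
```

The same file can then be deployed to both environments, because the response depends only on the incoming Host header.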
-
You could use environment variables (for example in your env.ini or config.ini file) that are set to DEVELOPMENT, STAGING, or LIVE depending on which environment the code finds itself in.
With the exact same code, your website would then either limit IP addresses (in the development environment) or allow all IP addresses (in the live environment). With this setup you can also vary other settings per environment, such as the level of detail shown in your error reporting, connecting to a testing database rather than the live one, etc.
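A minimal sketch of the idea, assuming a hypothetical APP_ENV variable:

```python
import os

# APP_ENV is a hypothetical variable name; set it per environment,
# e.g. in env.ini/config.ini or the web server's configuration.
ENV = os.environ.get("APP_ENV", "DEVELOPMENT")

def robots_txt(env: str = ENV) -> str:
    """Block crawlers everywhere except the live environment."""
    if env == "LIVE":
        return "User-agent: *\nDisallow:\n"   # allow everything
    return "User-agent: *\nDisallow: /\n"     # block everything
```

The same branch point can switch error-reporting detail, database connections, and so on.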
[This was supposed to be a reply, but I accidentally clicked the wrong button. Hitting 'Delete reply' results in an error.]
-
Thanks for your quick reply, Theo. Unfortunately, this .htpasswd file will also get copied to the live environment, so our live websites would end up password protected. Could there be any other solution for this?
-
I'm sure there is, but I'm guessing you don't want any human visitors to go to your development subdomain and view what is being done there as well? I'd suggest you either limit the visitors that have access by IP address (thereby effectively blocking out Google in one move) and/or implement a .htpasswd solution where developers can log in with their credentials to your development area (which blocks out Google as well).
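The IP-limiting approach boils down to a membership test; a sketch, with a hypothetical allowed network:

```python
import ipaddress

# A hypothetical office network that may reach the dev subdomain.
ALLOWED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

def is_allowed(client_ip: str) -> bool:
    """True if the client's IP falls inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

Anyone outside the listed networks, Googlebot included, gets turned away in one move.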
Related Questions
-
New Subdomain SEO questions
I have a main site, mysite.com, and I just created a subdomain, leadform.mysite.com, which I plan to use as a one-page lead form only. I will link to leadform.mysite.com from mysite.com and also from other websites I own (myothersite.com etc.), filtering all traffic to this form to capture leads. (Note: leadform.mysite.com has a CNAME to another server that hosts the backend of the form.) My questions are:
1. How should I link from mysite.com to leadform.mysite.com: with dofollow or nofollow? (mysite.com has thousands of pages and would link from every page with a "get a quote" type button.)
2. How should I link from myothersite.com to leadform.mysite.com: with dofollow or nofollow? Is there any SEO risk in linking to leadform.mysite.com from an outside domain? (myothersite.com has thousands of pages and would link from every page with a "get a quote" type button.)
3. Does it make sense to build links from outside sites directly to leadform.mysite.com, to try to get that lead capture page to rank on its own?
4. Does it make sense to link back from leadform.mysite.com to mysite.com for SEO value: with dofollow or nofollow?
Thanks in advance for any help.
Intermediate & Advanced SEO | | leadforms0 -
Effect on SEO with growing number of subdomains
For a few days now I've had some concerns about our website structure regarding SEO. Since I can't find similar cases, I'm curious whether the Moz community has any thoughts on the issue I'm facing. The situation is as follows: for every new client our (hosting) company signs up through www.example.com, a new subdomain is created. This subdomain is a backup of the client's original website and is largely irrelevant to our business. Google can also crawl these subdomains and index them:
Product variant 1: clientxxx1.productX.example.com
Product variant 2: clientxxx1.productY.example.com
Product variant 3: cleintxx10.productZ.example.com
I think the situation above is far from ideal and can cause problems. The problems we could be facing that I'm thinking of are: no control over content (spam, low-quality, badly optimised pages); duplicate sites (the backup on our subdomain and the client's original); it being impossible to make/manage a Search Console property for each subdomain; and a huge number of subdomains that could influence how Google crawls and indexes the site. Maybe there are more issues I haven't thought of? The most common fix would be to use another domain for the backups, like client1.host-example.com, and prevent Google from crawling it. That way www.example.com wouldn't be affected. So my questions basically are:
1. How much will this influence rankings for www.example.com?
2. Are there any similar cases, and what effect did it have on rankings/crawling/indexation when it did or didn't get fixed?
Intermediate & Advanced SEO | | Steven87 0 -
Process to move blog from subdomain on Wordpress, to subfolder on BigCommerce store
Hi. Having weighed up all the angles, it's time to bite the bullet and move our blog from a subdomain to a subfolder on our ecommerce store. But as someone new to SEO I am struggling to find the correct process for doing this properly in our situation. Can anyone help? I have outlined what I have learned so far in 10 steps below, to hopefully help you understand my situation, where I am at and what I am struggling with. Advice, tips and suggested further reading on all or any of the 10 points would be great.
Some quick background: the blog is on Wordpress, on a subdomain of our store (blog.store.com). It is four years old, with 80 original posts we want to move to a subfolder of the store (store.com/blog). The store has been built using BigCommerce and has also been active for four years. Both the blog and the store exist as properties within our Google Search Console.
The 10 steps required for the move, based on research so far, and the associated questions:
1. Prepare the new site: which I am guessing means reproducing all of the content at the new subfolder location (store.com/blog)?
2. Set up errors for any pages not being transferred: I have no idea how to do this!
3. Make sure analytics is working for the new pages: it should be, as the site the pages are moving to a subfolder of is already running with analytics and has been for years. Is this a safe assumption?
4. Map all URLs being moved to their new counterparts: is this just record keeping, in a spreadsheet, or is it a process I don't yet understand?
5. Add rel="canonical" tags: while I understand the concept of these, I have no idea how to implement them properly here!
6. Create and save new sitemaps: as both blog.store.com and store.com already exist in Google Search Console, can I just refresh the sitemap for store.com/blog once the subfolder is created to achieve this?
7. Set up and test 301 redirects: these can be created in BigCommerce for the new pages in the store.com/blog subfolder, and will refer back to the blog.store.com URLs the pages came from. Is this the right way to do it? I am still learning here and know enough to know how much this can matter, but not enough to fully grasp the intricacies of the process.
8. Move URLs simultaneously: I have no idea what this means or how to achieve it! Is this just for big site moves? Does it still apply to 80 blog posts shifting from a subdomain to a subfolder on the same root? If so, how?
9. Submit a change of address in Google Search Console: this looks simple enough, although Google ominously warns: "Don't use this tool unless you are moving your primary website presence to a new address", which makes me wonder how simple it really is. My primary website in this case is the store, which is not moving. But does "primary" here simply mean the individual property within Search Console? I am going in circles on this one!
10. Configure the old blog on the subdomain to redirect people and engines to the new pages: I thought the 301 redirects and rel="canonical" stuff did that already? What did I miss?
For anyone still here, thanks for making it this far, and if you still have the energy left, any advice would be great! Thanks
Intermediate & Advanced SEO | | Warren_331 -
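For steps 4 and 7 above, the URL map is just an old-to-new table that can also drive the redirects; a sketch using the hypothetical hostnames from the question:

```python
# Hypothetical old -> new map for the subdomain-to-subfolder move (step 4).
URL_MAP = {
    "https://blog.store.com/my-first-post": "https://store.com/blog/my-first-post",
    "https://blog.store.com/about": "https://store.com/blog/about",
}

def redirect_target(old_url: str):
    """Return (status, location) for a request to an old blog URL (step 7)."""
    new_url = URL_MAP.get(old_url)
    if new_url is None:
        return (404, None)   # anything unmapped gets an error page (step 2)
    return (301, new_url)    # permanent redirect to the new location
```

The same spreadsheet-style mapping answers step 4's record-keeping question and feeds whatever mechanism (BigCommerce rules, server config) implements step 7.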
Robots.txt vs noindex
I recently started working on a site that has thousands of member pages that are currently blocked via robots.txt. Most pages of the site have 1 to 6 links to these member pages, accumulating into what I regard as something of a link-juice cul-de-sac. The pages themselves have little to no unique content or other relevant search play, and for other reasons we still want them kept out of search. Wouldn't it be better to "noindex, follow" these pages and remove the robots.txt block from this URL type? At least that way Google could crawl these pages and pass the link juice on to still other pages, versus flushing it into a black hole. BTW, the site is currently dealing with a hit from Panda 4.0 last month. Thanks! Best... Darcy
Intermediate & Advanced SEO | | 945010 -
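For reference, the "crawl but don't index" state described here is set per page, via a meta tag or an HTTP header, and engines only see it once the robots.txt block is removed. A minimal sketch (the helper name is mine):

```python
def noindex_follow_headers() -> dict:
    """HTTP headers asking engines to follow the page's links but keep
    the page itself out of the index. The HTML equivalent is
    <meta name="robots" content="noindex, follow">."""
    return {"X-Robots-Tag": "noindex, follow"}
```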
Robots Disallow Backslash - Is it right command
Bit skeptical here: due to dynamic URLs and some other linkage issues, Google has crawled URLs containing backslash and double-quote characters, e.g. www.xyz.com/\/index.php?option=com_product and www.xyz.com/\"/index.php?option=com_product. Now, %5C is the encoded version of \ (backslash) and %22 is the encoded version of " (double quote). I need to check this command:
User-agent: *
Disallow: \
As I am disallowing all backslash URLs through this, will it only remove the backslash URLs, which are duplicates, or the entire site?
Intermediate & Advanced SEO | | Modi0 -
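One way to sanity-check the encodings mentioned above with Python's standard library:

```python
from urllib.parse import quote

# Percent-encode the two characters from the question's URLs.
print(quote("\\"))  # backslash    -> %5C
print(quote('"'))   # double quote -> %22
```

Since robots.txt rules are matched against the URL path as the crawler sees it, it is worth testing the rule against the encoded form as well as the raw one.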
Using 2 wildcards in the robots.txt file
I have a URL string which I don't want to be indexed. It includes the characters _Q1 in the middle of the string. So in the robots.txt, can I use 2 wildcards in the string to take out all of the URLs with that in them? So something like /*_Q1*. Will that pick up and block every URL with those characters in the string? Also, this is not directly off the root, but in a secondary directory, so .com/.../_Q1. So do I have to format the robots.txt as /*/*_Q1* as it will be in the second folder, or will just using /*_Q1* pick up everything no matter what folder it is in? Thanks.
Intermediate & Advanced SEO | | seo1234560 -
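Per Google's documented matching rules, * in a robots.txt rule matches any run of characters and rules are anchored to the start of the path, so /*_Q1* already matches in any folder depth. A rough approximation of that matching logic (a sketch, not an official implementation):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Approximate Google's robots.txt pattern matching:
    '*' matches any character run, a trailing '$' anchors the end,
    and the rule is anchored to the start of the path."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None
```

Under these rules, /*_Q1* catches the string at any directory depth, so the /*/*_Q1* variant shouldn't be necessary.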
New server update + wrong robots.txt = lost SERP rankings
Over the weekend, we updated our store to a new server. Before the switch, we had a robots.txt file on the new server that disallowed its contents from being indexed (we didn't want duplicate pages from both old and new servers). When we finally made the switch, we somehow forgot to remove that robots.txt file, so the new pages weren't indexed. We quickly put our good robots.txt in place, and we submitted a request for a re-crawl of the site. The problem is that many of our search rankings have changed. We were ranking #2 for some keywords, and now we're not showing up at all. Is there anything we can do? Google Webmaster Tools says that the next crawl could take up to weeks! Any suggestions will be much appreciated.
Intermediate & Advanced SEO | | 9Studios0 -
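Mishaps like this one can be caught with an automated pre-launch check; Python's urllib.robotparser can parse a robots.txt body and report whether key pages are still fetchable:

```python
from urllib.robotparser import RobotFileParser

def is_page_crawlable(robots_txt: str, path: str) -> bool:
    """Parse a robots.txt body and check whether a path may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)

# The staging file that should never have reached production:
STAGING_ROBOTS = "User-agent: *\nDisallow: /"
print(is_page_crawlable(STAGING_ROBOTS, "/index.html"))  # False
```

Running a check like this against the live robots.txt right after a server switch would have flagged the problem immediately.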
Reciprocal Links and nofollow/noindex/robots.txt
Hypothetical Situations: You get a guest post on another blog and it offers a great link back to your website. You want to tell your readers about it, but linking the post will turn that link into a reciprocal link instead of a one way link, which presumably has more value. Should you nofollow your link to the guest post? My intuition here, and the answer that I expect, is that if it's good for users, the link belongs there, and as such there is no trouble with linking to the post. Is this the right way to think about it? Would grey hats agree? You're working for a small local business and you want to explore some reciprocal link opportunities with other companies in your niche using a "links" page you created on your domain. You decide to get sneaky and either noindex your links page, block the links page with robots.txt, or nofollow the links on the page. What is the best practice? My intuition here, and the answer that I expect, is that this would be a sneaky practice, and could lead to bad blood with the people you're exchanging links with. Would these tactics even be effective in turning a reciprocal link into a one-way link if you could overlook the potential immorality of the practice? Would grey hats agree?
Intermediate & Advanced SEO | | AnthonyMangia0