Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Subdomain Removal in Robots.txt with Conditional Logic??
-
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live.
My specific situation is this:
I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content.
Should I:
a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense)
b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning...
Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
-
Here's how I dealt with a similar situation in the past.
Robots.txt on each of the dev subdomains and on the live domain. Dev subdomains robots.txt excluded the entire subdomain, and subdomains were verified in GWT and removed as needed.
Made live subdomain robots.txt read-only so it didn't get overwritten. Should have made dev subdomains robots.txt read-only as well, since they sometimes got refreshed with the live content (there was a UGC database that would occasionally get copied to a dev subdomain, and we'd have robots.txt get copied over too and dev subdomain indexed).
Set up a code monitor that checks the contents of all of the robots.txt daily and sends me an email if anything is changed.
Not perfect, but I was at least able to catch changes soon after they happened, and prevented a few changes.
-
you can't put logic in robots.txt and subdomains are seen as different sites, so you need to create separate robots.txt files for each subdomain and block them in their respective robots.txt files.
You'll need to also add the Google verification code and verify them, then in GWMT you can request to have the subdomain removed from Googles index, that's the fastest way.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robot.txt : How to block a specific file type in several subdirectories ?
Hello everyone ! I need help setting up a robot.txt. I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site. Block files of a specific file type (for example, .gif) | Disallow: /*.gif$ 2 questions : Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ? Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$ Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files. Let's say I want to block pdf files in all these 3 directories /fileadmin/directory1 /fileadmin/directory1/sub1 /fileadmin/directory1/sub1/pdf Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple : Disallow: /fileadmin/directory1*/ Many thanks in advance for any insight you may have.
Technical SEO | | LabeliumUSA0 -
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. Ok, but... may I just send my sitemap to search console and forget about adding ti to my robots.txt? That's my situation: 1 multilang platform which means... ... 2 set of pages. One for each lang, of course But my CMS (magento) only allows me to have 1 robots.txt file So, again: may I have a robots.txt file woth no sitemap AND not suffering any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad
Technical SEO | | Webicultors0 -
Blocked jquery in Robots.txt, Any SEO impact?
I've heard that Google is now indexing links and stuff available in javascript and jquery. My webmastertools is showing that some links are blocked in robots.txt of jquery. Sorry I'm not a developer or designer. I want to know is there any impact of this on my SEO? and also how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | | hammadrafique0 -
Does an subdomain hosted offsite provide SEO value
We have a job board hosted through an applicant processing system which we've setup as a subdomain (jobs.ourcompany.com), most of the assets are hosted on our primary domain (ourcompany.com). My question is does having it hosted offsite provide any value? Do we get credit for that content being shared and distributed on the web or does the applicant processing system? As I see it the options are (correct me if I'm wrong): Host the job listings on our primary domain (ourcompany.com/jobs) and have it point to the application on the subdomain. Advertise the job listings pointing to the primary domain on the paid sites. The free job listing sites will automatically point to the sub-domain because the applicant processing system automatically submits them. Host the job listings entirely on the sub-domain applicant tracking system and link to it from our primary site navigation. Advertise the job listings to the sub-domain so that both free and paid point to the same place. Obviously the second one would be much easier just not sure on the technical side of our website getting credit by search engines as the one who has produced the content.
Technical SEO | | r1200gsa0 -
Is there any value in having a blank robots.txt file?
I've read an audit where the writer recommended creating and uploading a blank robots.txt file, there was no current file in place. Is there any merit in having a blank robots.txt file? What is the minimum you would include in a basic robots.txt file?
Technical SEO | | NicDale0 -
Will an XML sitemap override a robots.txt
I have a client that has a robots.txt file that is blocking an entire subdomain, entirely by accident. Their original solution, not realizing the robots.txt error, was to submit an xml sitemap to get their pages indexed. I did not think this tactic would work, as the robots.txt would take precedent over the xmls sitemap. But it worked... I have no explanation as to how or why. Does anyone have an answer to this? or any experience with a website that has had a clear Disallow: / for months , that somehow has pages in the index?
Technical SEO | | KCBackofen0 -
Can I remove 301 redirects after some time?
Hello, We have an very large number of 301 redirects on our site and would like to find a way to remove some of them. Is there a time frame after which Google does not need a 301 any more? For example if A is 301 redirected to B, does Google know after a while not to serve A any more, and replaces any requests for A with B? How about any links that go to A? Or: Is the only option to have all links that pointed to A point to B and then the 301 can be removed after some time? Thank you for you you help!
Technical SEO | | Veva0 -
Should I set up a disallow in the robots.txt for catalog search results?
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
Technical SEO | | JordanJudson0