Help with Robots.txt On a Shared Root
-
Hi,
I posted a similar question last week asking about subdomains, but a couple of complications have arisen.
Two different websites I am looking after share the same root domain, which means they will have to share the same robots.txt. Does anybody have suggestions for separating the two within the same file without complications? It's a tricky one.
Thank you in advance.
-
Okay, so if you have one root domain, you can only have one robots.txt file.
The reason I asked for an example is in case there was something you could put in the robots.txt to differentiate the two.
For example, say you have
thisdomain.com and thatdomain.com
If "thatdomain.com" uses a folder called shop ("thatdomain.com/shop"), then you could prefix the relevant robots.txt entries with /shop. Provided that "thisdomain.com" doesn't use a shop folder, all the /shop entries would only apply to "thatdomain.com". Does this make sense?
Don
-
It's not so much that one is a subdomain; it's that they are as different as Google and Yahoo, yet they share the same root. I wish I could show you, but I can't because of confidentiality.
The 303 wasn't put in place by me; I would have strongly suggested another method. I think it was set up so that both websites could be controlled from the same login, but it's opened a can of worms for SEO.
It's not that I don't want two separate robots.txt files; the developer insists it has to be this way.
-
Can you provide an example of the way the domains look, specifically where the root pages are?
Additionally, if you are 303-redirecting one of the domains to the other, why do you want two different robots.txt files? The one being 303'd will always redirect to the other...?
Depending on the structures, you can create one robots.txt file that deals with two different domains, provided there is something unique about the root folders.
-
Thanks for your help so far.
The two websites are on different domain names but share the same root, as that's how it was built in TYPO3. I don't know the developer's justification for the 303; it's something I wish we could change.
I'm not sure if there are specific directives you can put in the sole robots.txt to differentiate the two; I've read a few conflicting arguments about how to do it.
-
Okay, so if you're using a 303, then you're saying the content you want for site X is actually located at site Y, which means you do not have two different subdomains. So there is no need for two robots.txt files, and your developer is correct that you can't use two robots.txt files. Since one site would be pointing to the other, you only have one subdomain.
However, a 303 is in general a poor choice of redirect and should likely be a 301, but I would have to understand why the 303 is being used to say that with 100% certainty. See a quick article about 303s here.
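If a 301 does turn out to be appropriate, here is a minimal sketch of what it could look like in an .htaccess file. This assumes an Apache server (common on shared hosting, but check with your host) and reuses the hypothetical domain names from earlier, not your real sites:

# Example only: permanently redirect all thatdomain.com requests to thisdomain.com
RewriteEngine On
RewriteCond %{HTTP_HOST} ^(www\.)?thatdomain\.com$ [NC]
RewriteRule ^(.*)$ http://thisdomain.com/$1 [R=301,L]

The R=301 flag is what makes the redirect permanent rather than the default 302 (or the 303 you have now).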
Hope this answers the question,
Don
-
It's Fasthosts. The developer is certain that we can't use two separate robots.txt files. The second website has been set up on a 303.
-
What host are you using?
-
The developer of the website insists that they have to share the same robots.txt; I am really not sure how he's set it up this way. I am beyond befuddled with this!
-
The subdomain has to be separated from the root in some fashion. Depending on your host, I would assume there is a separate folder for the subdomain's files; otherwise it would be chaos. Say you installed forums on your forum subdomain and an e-commerce store on your shop subdomain... which index.php page would be served?
There has to be some separation. Review your file manager and look for the subdomain folders; once you've found them, simply put a robots.txt into each of those folders.
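As a purely hypothetical layout (the folder names are made up; how your host maps subdomains to directories will vary):

public_html/
    robots.txt        <- served for thisdomain.com/robots.txt
    index.php
    forum/            <- document root for the forum subdomain
        robots.txt    <- served for forum.thisdomain.com/robots.txt
        index.php
    shop/             <- document root for the shop subdomain
        robots.txt
        index.php

Each subdomain's document root gets its own robots.txt, so crawlers see a different file for each site.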
Hope this helps,
Don