Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Multiple robots.txt files on server
-
Hi!
I previously hired a developer to build my site and noticed afterwards that he did not know much about SEO. This led me to start learning it myself and applying changes step by step.
One of the things I am currently doing is adding a sitemap reference to the robots.txt file (which was not there before). But just now, when I wanted to upload the file via FTP to my server, I found multiple versions, in different sizes, and I don't know what to do with them. Can I remove them? I have downloaded and opened them, and they turn out to be 2 distinct text files plus a duplicate of each. Names:
robots.txt (duplicate of the original)
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (duplicate of the other content)
I would really appreciate help and expert suggestions. Thanks!
-
So what's the best policy if a site uses an e-commerce platform like Magento, which has a robots file, but also has a WordPress blog installed in another folder (e.g. /blog) with a plugin like Yoast that generates a robots file for the WordPress installation?
Then you have 2 robots files. Is this detrimental or no big deal?
-
Thanks very much for the help!
-
Keep a backup and remove them.
Search engines only look for the file named exactly robots.txt; variations of the file name will be ignored.
Do make sure the entries in the main one are correct, though: you don't want Google crawling admin pages or other confidential areas of the site.
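To check the entries before uploading, one option is to run the file through Python's standard-library robots.txt parser. This is only a minimal sketch: the file content, the /admin/ path and the www.example.com domain are placeholders I made up for illustration, not the poster's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the domain and paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask whether a given crawler may fetch a given URL under these rules.
print(rules.can_fetch("Googlebot", "https://www.example.com/admin/login"))
print(rules.can_fetch("Googlebot", "https://www.example.com/products/"))

# List the sitemap references found in the file (requires Python 3.8+).
print(rules.site_maps())
```

If the first check prints True for an area you meant to block, or the sitemap list is empty, the file probably needs another look before it replaces the live robots.txt.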
-
Hi, thanks for the answer and help!
Well, I only have one domain with one website and no active subdomains (no blog subdomain or similar), so how does that apply to my situation? Can I just remove all of them and upload the one I want?
-
That's a good question, EMS. The robots.txt protocol can get kind of confusing when you think about it too long, and it sounds like you've thought about this a bit. However, in this case, it might help to look at robots.txt from the perspective of the spider.
When a spider finds a URL, it takes the whole domain name (everything between 'http://' and the next '/'), then sticks a '/robots.txt' on the end of it and looks for that file. If that file exists, then the spider should read it to see where it is allowed to crawl.
In your case, Googlebot, or any other spider, should try to access three URLs: domainA.com/robots.txt, domainB.domainA.com/robots.txt, and domainB.com/robots.txt. The rules in each are treated as separate, so disallowing robots from domainA.com/ should result in domainA.com/ being removed from search results while domainB.domainA.com/ remains unaffected, which does not sound like something you want.
The problem you might have with the setup you have described is this: in order to keep domainB.domainA.com out of the results, you would need to have domainB.domainA.com/robots.txt exclude robots, while domainB.com/robots.txt welcomes them. This means that you would need a way to make domainB.domainA.com/ and domainB.com/ serve different information, and judging from what you've described, you have not set up your server to do so yet.
Of course, it is always possible that I have assumed too much about your situation, so it is a good idea to use Google's robots.txt analysis tool (see http://www.google.com/support/webmasters/bin/topic.py?topic=8475) to see if your robots.txt files already produce the results you want.
If using robots.txt files doesn't solve the problem, and assuming that you want to continue hosting all of your content on domainA.com, one strategy you really should look into would be setting up a 301 redirect from the pages on domainB.domainA.com/ to domainB.com/. If you need more advice on how to do this with your server software, your hosting company's tech support would definitely be the best place to start, but this group is here to help if more issues arise.
Hope that helps!
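The lookup a spider performs, as described in the reply above, can be sketched in a few lines of Python. This is an illustration only: the function name robots_txt_url is hypothetical, the example paths are invented, and real crawlers handle more edge cases (redirects, ports, error responses) than this does.

```python
from urllib.parse import urlsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url.

    Only the scheme and host are kept; the path, query string and
    fragment of the page URL play no part in locating robots.txt.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Each host gets its own robots.txt, no matter how deep the page sits.
for url in [
    "http://domainA.com/some/page.html",
    "http://domainB.domainA.com/",
    "http://domainB.com/blog/post?id=1",
]:
    print(url, "->", robots_txt_url(url))
```

Because the path is discarded, a robots.txt placed in a subfolder such as /blog/ is never the file a spider looks for on that host; only the one at the root counts.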