Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Multiple robots.txt files on server
-
Hi!
I have previously hired a developer to put up my site and noticed afterwards that he did not know much about SEO. This lead me to starting to learn myself and applying some changes step by step.
One of the things I am currently doing is inserting sitemap reference in robots.txt file (which was not there before). But just now when I wanted to upload the file via FTP to my server I found multiple ones - in different sizes - and I dont know what to do with them? Can I remove them? I have downloaded and opened them and they seem to be 2 textfiles and 2 dupplicates. Names:
robots.txt (original dupplicate)
robots.txt-Original (original)
robots.txt-NEW (other content)
robots.txt-Working (other content dupplicate)Would really appreciate help and expertise suggestions. Thanks!
-
So what's the best policy if a site uses an e-commerce platform like Magento, which has a robots file, but also has a Wordpress blog installed to another folder. eg: /blog and uses a plugin like YOAST which generated a robots file of the Wordpress installation.
Then you have 2 robots files, is this detrimental or no big deal?
-
Thanks very much for the help!
-
Thanks very much for the help!
-
Keep a backup and remove them.
Search engines are only going to look at the file which is exactly called robots.txt variations of file name will be ignored.
Do make sure the entries are correct in the main one though, you don't want Google crawling admin pages or other confidential areas of the site.
-
Hi, thanks for the answer and help!
Well, I only have one domain that has a webpage and no subdomains active (no blog-subdomain or similar) - so how can I configure that to the situation? Can I just remove all and upload the one I want, maybe?
-
That's a good question, EMS. The robots.txt protocol can get kind of
confusing when you think about it too long, and it sounds like you've
thought about this a bit. However, in this case, it might help to
look at robots.txt from the perspective of the spider.When a spider finds a URL, it takes the whole domain name (everything
between 'http://' and the next '/'), then sticks a '/robots.txt' on
the end of it and looks for that file. If that file exists, then the
spider should read it to see where it is allowed to crawl.In your case, Googlebot, or any other spider, should try to access
three URLs: domainA.com/robots.txt, domainB.domainA.com/robots.txt,
and domainB.com/robots.txt. The rules in each are treated as
separate, so disallowing robots from domainA.com/ should result in
domainA.com/ being removed from search results while
domainB.domainA.com/ remains unaffected, which does not sound like not
something you want.The problem you might have with the setup you have described is this--
in order to keep domainB.domainA.com out of the results, you would
need to have domainB.domainA.com/robots.txt exclude robots, while
domainB.com/robots.txt welcomes them. This means that you would need
to have a way to make domainB.domainA.com/ and domainB.com/ serve
different information, and judging from what you've described, you
have not set up your server to do so yet.Of course, it is always possible that I have assumed to much about
your situation, so it is a good idea to use Google's robots.txt
analysis tool (see http://www.google.com/support/webmasters/bin/topic.py?topic=8475
) to see if your robots.txt files already produce the results you
want.If using robots.txt files doesn't solve the problem, and assuming that
you want to continue hosting all of your content on domainA.com, one
strategy you really should look into would be setting up a 301
redirect from the pages on domainB.domainA.com/ to domainB.com/ . If
you need more advice on how to do this with your server software, your
hosting company's tech support would definitely be the best place to
start, but this group is here to help if more isues arise.Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots txt. in page with 301 redirect
We currently have a a series of help pages that we would like to disallow from our robots txt. The thing is that these help pages are located in our old website, which now has a 301 redirect to current site. Which is the proper way to go around? 1- Add the pages we want to disallow to the robots.txt of the new website? 2- Break the redirect momentarily and add the pages to the robots.txt of the old one? Thanks
Technical SEO | | Kilgray0 -
Good robots txt for magento
Dear Communtiy, I am trying to improve the SEO ratings for my website www.rijwielcashencarry.nl (magento). My next step will be implementing robots txt to exclude some crawling pages.
Technical SEO | | rijwielcashencarry040
Does anybody have a good magento robots txt for me? And what need i copy exactly? Thanks everybody! Greetings, Bob0 -
Multiple H1 Tags on Page
Can having multiple H1 tags on a webpage be detrimental to its rankings?
Technical SEO | | AubbiefromAubenRealty0 -
Google insists robots.txt is blocking... but it isn't.
I recently launched a new website. During development, I'd enabled the option in WordPress to prevent search engines from indexing the site. When the site went public (over 24 hours ago), I cleared that option. At that point, I added a specific robots.txt file that only disallowed a couple directories of files. You can view the robots.txt at http://photogeardeals.com/robots.txt Google (via Webmaster tools) is insisting that my robots.txt file contains a "Disallow: /" on line 2 and that it's preventing Google from indexing the site and preventing me from submitting a sitemap. These errors are showing both in the sitemap section of Webmaster tools as well as the Blocked URLs section. Bing's webmaster tools are able to read the site and sitemap just fine. Any idea why Google insists I'm disallowing everything even after telling it to re-fetch?
Technical SEO | | ahockley0 -
Robots.txt to disallow /index.php/ path
Hi SEOmoz, I have a problem with my Joomla site (yeah - me too!). I get a large amount of /index.php/ urls despite using a program to handle these issues. The URLs cause indexation errors with google (404). Now, I fixed this issue once before, but the problem persist. So I thought, instead of wasting more time, couldnt I just disallow all paths containing /index.php/ ?. I don't use that extension, but would it cause me any problems from an SEO perspective? How do I disallow all index.php's? Is it a simple: Disallow: /index.php/
Technical SEO | | Mikkehl0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
On a dedicated server with multiple IP addresses, how can one address group be slow/time out and all other IP addresses OK?
We utilize a dedicated server to host roughly 60 sites on. The server is with a company that utilizes a lady who drives race cars.... About 4 months ago we realized we had a group of sites down thanks to monitoring alerts and checked it out. All were on the same IP address and the sites on the other IP address were still up and functioning well. When we contacted the support at first we were stonewalled, but eventually they said there was a problem and it was resolved within about 2 hours. Up until recently we had no problems. As a part of our ongoing SEO we check page load speed for our clients. A few days ago a client who has their site hosted by the same company was running very slow (about 8 seconds to load without cache). We ran every check we could and could not find a reason on our end. The client called the host and were told they needed to be on some other type of server (with the host) at a fee increase of roughly $10 per month. Yesterday, we noticed one group of sites on our server was down and, again, it was one IP address with about 8 sites on it. On chat with support, they kept saying it was our ISP. (We speed tested on multiple computers and were 22MB down and 9MB up +/-2MB). We ran a trace on the IP address and it went through without a problem on three occassions over about ten minutes. After about 30 minutes the sites were back up. Here's the twist: we had a couple of people in the building who were on other ISP's try and the sites came up and loaded on their machines. Does anyone have any idea as to what the issue is?
Technical SEO | | RobertFisher0 -
Should I set up a disallow in the robots.txt for catalog search results?
When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?
Technical SEO | | JordanJudson0