Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt File Redirects to Home Page
-
I've been doing some site analysis for a new SEO client and it has been brought to my attention that their robots.txt file redirects to their homepage. I was wondering:
Is there a benfit to setup your robots.txt file to do this?
Will this effect how their site will get indexed?
Thanks for your response!
- Kyle
Site URL:
-
Yep, if you add a robots.txt it won't redirect. But I would look to remove the 404 redirect as well. It also looks to me like a meta refresh as well which has potential SEO problems. I would much prefer a 301 if they are really keen to redirect 404s.
The main reason for not redirecting 404s is that it stops you from seeing broken links on your website. Imagine you have a discreet link to a services page that is broken - you wouldn't be able to pick it up with link checkers like Xenu and it could go unnoticed for months if not years. Might be worth suggesting to them that they remove it.
-
This is not a normal behavior, you should respond to robots.txt, put the sitemap link in there or simply :
User-agent: *
Disallow:The actual robots.txt gives :
GET robots.txt 302 Found, which redirects to :
GET 404error.html 200 Ok, which redirect to the home with browser behavior :
<meta http-equiv="refresh" content="0;url=/">
You better change this to a normal response

-
Thanks for the input! I haven't had a chance to view their .htaccess file. I am still in the early stages of reviewing their site. I just wasn't sure if their would be a technical reason for them to do this or if it just happened by accident. It sounds like adding a basic robots.txt file would be the appropriate solution.
-
1. I wouldnt advise redirecting the robots.txt to redirect to home page. It seems that they hve a dynamic 404 redirect system - which when a URL doesnt exist the site redirects it to home. There are god and bad points about this strategy, hoever I would prefer NOT to do it.
2. Re getting site indexed - no it wouldnt hurt them, but would give you much less control over the robots directive, in case you want to add custom instructions. If Google crawlers cant get to it (as in its not user agent cloaked to allow the google bot) you will not be able to do so (eg excluding pages from being indexed via robots wont be ossible).
-
I would be surprised if they purposefully redirected it. Have you been able to take a look at what's in the .htaccess file? If you copy and paste what's in there I might be able to see what's going on with it.
Also, if it is being redirected then it won't get crawled and so it won't have any effect. That could be good or bad depending on what you had written in the .txt file.
EDIT:
Just had a quick look at the site. It seems to 404 straight away and then redirect. Therefore I imagine the robots.txt file doesn't exist and they have it set up to redirect 404ing pages to the homepage. Something that I would advise against (it's useful to know what's 404ing).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Keywords are indexed on the home page
Hello everyone, For one of our websites, we have optimized for many keywords. However, it seems that every keyword is indexed on the home page, and thus not ranked properly. This occurs only on one of our many websites. I am wondering if anyone knows the cause of this issue, and how to solve it. Thank you.
Technical SEO | | Ginovdw1 -
1000 Pages on old website. What to do with the 301 redirects for this domain?
Hi Moz Community, I have a 301 redirect question... I just acquired an old domain: Totally in my niche Domain is 14 years old Website exists of 1000 pages Great amount of backlinks Website is offline since about 2 weeks Will place a new website online asap with new url structure For the 50 best scoring pages I wrote a new, but fully comparable/related article. I will put a 301 redirect from those old to the new pages. My question: What to do with the 950 other url's? Should I put a 301 redirect to the homepage? Should I forward those pages to the 404 page? Should I divide the 950 url's with a 301 redirect to the 50 new ones? Another solution maybe? Any idea what would be the best solution so we can save as much Google juice as possible? Thanks in advance!
Technical SEO | | snorkel0 -
2 sitemaps on my robots.txt?
Hi, I thought that I just could link one sitemap from my site's robots.txt but... I may be wrong. So, I need to confirm if this kind of implementation is right or wrong: robots.txt for Magento Community and Enterprise ...
Technical SEO | | Webicultors
Sitemap: http://www.mysite.es/media/sitemap/es.xml
Sitemap: http://www.mysite.pt/media/sitemap/pt.xml Thanks in advance,0 -
Robots.txt and Multiple Sitemaps
Hello, I have a hopefully simple question but I wanted to ask to get a "second opinion" on what to do in this situation. I am working on a clients robots.txt and we have multiple sitemaps. Using yoast I have my sitemap_index.xml and I also have a sitemap-image.xml I do put them in google and bing by hand but wanted to have it added into the robots.txt for insurance. So my question is, when having multiple sitemaps called out on a robots.txt file does it matter if one is before the other? From my reading it looks like you can have multiple sitemaps called out, but I wasn't sure the best practice when writing it up in the file. Example: User-agent: * Disallow: Disallow: /cgi-bin/ Disallow: /wp-admin/ Disallow: /wp-content/plugins/ Sitemap: http://sitename.com/sitemap_index.xml Sitemap: http://sitename.com/sitemap-image.xml Thanks a ton for the feedback, I really appreciate it! :) J
Technical SEO | | allstatetransmission0 -
Remove html file extension and 301 redirects
Hi Recently I ask for some work done on my website from a company, but I am not sure what they've done is right.
Technical SEO | | ulefos
What I wanted was html file extensions to be removed like
/ash-logs.html to /ash-logs
also the index.html to www.timports.co.uk
I have done a crawl diagnostics and have duplicate page content and 32 page title duplicates. This is so doing my head in please help This is what is in the .htaccess file <ifmodule pagespeed_module="">ModPagespeed on
ModPagespeedEnableFilters extend_cache,combine_css, collapse_whitespace,move_css_to_head, remove_comments</ifmodule> <ifmodule mod_headers.c="">Header set Connection keep-alive</ifmodule> <ifmodule mod_rewrite.c="">Options +FollowSymLinks -MultiViews</ifmodule> DirectoryIndex index.html RewriteEngine On
# Rewrite valid requests on .html files RewriteCond %{REQUEST_FILENAME}.html -f RewriteRule ^ %{REQUEST_URI}.html?rw=1 [L,QSA]
# Return 404 on direct requests against .html files RewriteCond %{REQUEST_URI} .html$
RewriteCond %{QUERY_STRING} !rw=1 [NC]
RewriteRule ^ - [R=404] AddCharset UTF-8 .html # <filesmatch “.(js|css|html|htm|php|xml|swf|flv|ashx)$”="">#SetOutputFilter DEFLATE #</filesmatch> <ifmodule mod_expires.c="">ExpiresActive On
ExpiresByType image/gif "access plus 1 years"
ExpiresByType image/jpeg "access plus 1 years"
ExpiresByType image/png "access plus 1 years"
ExpiresByType image/x-icon "access plus 1 years"
ExpiresByType image/jpg "access plus 1 years"
ExpiresByType text/css "access 1 years"
ExpiresByType text/x-javascript "access 1 years"
ExpiresByType application/javascript "access 1 years"
ExpiresByType image/x-icon "access 1 years"</ifmodule> <files 403.shtml="">order allow,deny allow from all</files> redirect 301 /PRODUCTS http://www.timports.co.uk/kiln-dried-logs
redirect 301 /kindling_firewood.html http://www.timports.co.uk/kindling-firewood.html
redirect 301 /about_us.html http://www.timports.co.uk/about-us.html
redirect 301 /log_delivery.html http://www.timports.co.uk/log-delivery.html redirect 301 /oak_boards_delivery.html http://www.timports.co.uk/oak-boards-delivery.html
redirect 301 /un_edged_oak_boards.html http://www.timports.co.uk/un-edged-oak-boards.html
redirect 301 /wholesale_logs.html http://www.timports.co.uk/wholesale-logs.html redirect 301 /privacy_policy.html http://www.timports.co.uk/privacy-policy.html redirect 301 /payment_failed.html http://www.timports.co.uk/payment-failed.html redirect 301 /payment_info.html http://www.timports.co.uk/payment-info.html1 -
OK to block /js/ folder using robots.txt?
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU) But what if you have lots and lots of JS and you dont want to waste precious crawl resources? Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc. And the legacy versions show up in Google Webmaster Tools as 404s. For example: http://www.discoverafrica.com/js/global_functions.js?v=1.1
Technical SEO | | AndreVanKets
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1 Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether? Isn't that what robots.txt was made for? Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks. We're just trying to power our content and UX elegantly with javascript. What do you guys say: Obey Matt? Or run the javascript gauntlet?0 -
Invisible robots.txt?
So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!
Technical SEO | | joshcanhelp0 -
Robots.txt file getting a 500 error - is this a problem?
Hello all! While doing some routine health checks on a few of our client sites, I spotted that a new client of ours - who's website was not designed built by us - is returning a 500 internal server error when I try to look at the robots.txt file. As we don't host / maintain their site, I would have to go through their head office to get this changed, which isn't a problem but I just wanted to check whether this error will actually be having a negative effect on their site / whether there's a benefit to getting this changed? Thanks in advance!
Technical SEO | | themegroup0