Invisible robots.txt?
-
So here's a weird one...
A client comes to me for some simple changes, and it turns out there are some major issues with the site. One of them is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. It looks like a real problem, because even the main homepage isn't showing up for a "site:domain.com" search.
So, I add the site to Webmaster Tools and, after an hour or so, I get the red bar of doom: "robots.txt is blocking important pages." I check it out in Webmaster Tools and, sure enough, it's a "User-agent: * Disallow: /". ACK!
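For anyone who wants to confirm what a given robots.txt actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_blocked` and the two sample rule sets are made up for illustration; domain.com is the placeholder from this post.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical sample contents: block everything vs. allow everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(is_blocked(BLOCK_ALL, "http://domain.com/"))
print(is_blocked(ALLOW_ALL, "http://domain.com/"))
```

The only difference between the two rule sets is the single slash after "Disallow:", which is why this setting is so easy to miss and so damaging.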
But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it, but there's nothing there via FTP. I upload a new one and, thankfully, that is what shows now, but I've never seen this before.
Question is: can a robots.txt file be stored in a way that can't be seen?
Thanks!
-
Hi Josh
Did you ever find out how this was happening?
I've got the same issue with a WordPress site: no robots.txt visible in FTP, but it is accessible in a browser.
I'm seeing the meta tag that's added for the first option:
<meta name="robots" content="index, follow" />
... but I could actually access a file at domain.com/robots.txt that had the content mentioned above. When I logged in via FTP, it wasn't there. I added an actual file there with the correct information and reloaded it to make sure it was showing the correct information.
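To check which robots meta directives a page really carries, a small sketch with Python's standard `html.parser` may help. The class and function names (`RobotsMetaFinder`, `robots_directives`) are hypothetical, and this only inspects HTML you pass in; fetching the page is left out.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.extend(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> list:
    """Return the robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


print(robots_directives('<meta name="robots" content="index, follow" />'))
```

Note that the meta tag and the robots.txt file are separate signals, so a page can carry "index, follow" and still be blocked by a "Disallow: /" rule.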
I tested it on my local install and I'm not seeing a robots file being generated.
Very odd!
-
Yes, you probably answered your own question. In WordPress, there are two different settings under Settings > Privacy:
1. I would like my site visible to everyone, including search engines and archivers.
2. I would like to block search engines, but allow normal visitors
If option #2 is selected, WordPress doesn't write a physical robots.txt file to the server. It answers requests for /robots.txt dynamically when no physical file exists, which is why nothing shows up over FTP, and it also adds a robots meta tag to every page.
I hope that helps!
-
Just make sure you don't set that Privacy setting in a live directory. It takes weeks/months to fully recover.
-
This is interesting. I am currently working on robots.txt and testing it for different purposes. I was also planning to run some tests on WordPress sites, so thanks for the update. I'll keep this in mind before I start testing.
Thanks!
-
I should mention that this is a WordPress site and, with that, I may have answered my own question. Perhaps WordPress generates a robots.txt dynamically when the setting is active at Settings > Privacy?
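To picture how a robots.txt can exist in the browser but not on disk, here is a rough sketch of the general idea as a tiny Python WSGI app. This is not WordPress's code (WordPress is PHP); `SITE_PUBLIC`, `render_robots`, and `app` are hypothetical names standing in for the privacy setting and the request handler.

```python
# Hypothetical stand-in for the "block search engines" privacy setting.
SITE_PUBLIC = False


def render_robots(site_public: bool) -> str:
    """Build robots.txt text from a setting instead of reading a file."""
    lines = ["User-agent: *"]
    lines.append("Disallow:" if site_public else "Disallow: /")
    return "\n".join(lines) + "\n"


def app(environ, start_response):
    """Minimal WSGI app: /robots.txt is generated per request, nothing on disk."""
    headers = [("Content-Type", "text/plain; charset=utf-8")]
    if environ.get("PATH_INFO") == "/robots.txt":
        start_response("200 OK", headers)
        return [render_robots(SITE_PUBLIC).encode("utf-8")]
    start_response("404 Not Found", headers)
    return [b"not found\n"]
```

It can be tried locally with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. On a real web server, a physical robots.txt in the document root would normally be served before a handler like this ever runs, which matches what I saw after uploading a real file.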