Invisible robots.txt?
-
So here's a weird one...
Client comes to me for some simple changes, and it turns out there are some major issues with the site. One is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. It looks like an indexing issue, because even the main homepage isn't showing up for a "site:domain.com" search.
So, I add the site to Webmaster Tools and, after an hour or so, I get the red bar of doom: "robots.txt is blocking important pages." I check it out in Webmaster Tools and, sure enough, it's a "User-agent: * Disallow: /". ACK!
But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before.
Question is: can a robots.txt file be stored in a way that can't be seen?
Thanks!
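For anyone who wants to confirm what a robots.txt like the one quoted above actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The rule text is the one described in the post, written in its usual two-line form; `domain.com`, the page paths, and the helper name `is_allowed` are placeholders for illustration, not the client's real setup.

```python
from urllib.robotparser import RobotFileParser

# The rules described in the post, in standard robots.txt syntax.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

def is_allowed(robots_body: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for path in ("/", "/about/"):
        url = "http://domain.com" + path
        print(path, "allowed" if is_allowed(BLOCK_ALL, url) else "blocked")
```

To test a live site instead of a pasted string, `RobotFileParser` also has `set_url()` and `read()`, which fetch the file over HTTP. That is the copy crawlers see, regardless of what FTP shows.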
-
Hi Josh
Did you ever find out how this was happening?
I've got the same issue with a WordPress site: no robots.txt visible in FTP, but it is accessible to view in a browser.
I'm seeing the meta tag that's added for the first option:
<meta name="robots" content="index, follow" />
... but I could actually access a file at domain.com/robots.txt that had the content mentioned above. When I logged in via FTP, it wasn't there. I added an actual file there with the correct information and reloaded it to make sure it was showing the correct information.
I tested it on my local install and I'm not seeing a robots file being generated.
Very odd!
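Since the meta tag and the robots.txt file are two separate signals, it can help to check each on its own. Below is a rough sketch, using only Python's standard `html.parser`, that pulls the robots meta directives out of a page's HTML. The class and function names are made up for this example, and the sample markup is the tag quoted above.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the content of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.directives.append((attr.get("content") or "").lower())

def robots_meta(html: str) -> list:
    """Return the list of robots meta directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives

if __name__ == "__main__":
    sample = '<head><meta name="robots" content="index, follow" /></head>'
    print(robots_meta(sample))
```

A page can say "index, follow" in its meta tag and still be blocked by a "Disallow: /" in robots.txt, which matches what is described in this post.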
-
Yes, you probably answered your own question. In WordPress, there are two different settings under Settings > Privacy:
1. I would like my site visible to everyone, including search engines and archivers.
2. I would like to block search engines, but allow normal visitors.
If option #2 was selected, WordPress doesn't create a physical robots.txt file for you; instead, it automatically adds a robots meta tag to every single page.
I hope that helps!
-
Just make sure you don't enable that Privacy setting on a live site. It can take weeks or months to fully recover.
-
This is interesting. I am currently working on a robots.txt and testing it for different purposes. I was also planning to run some tests with WordPress websites, so thanks for the update; I'll keep that in mind before I start testing.
Thanks!
-
I should mention that this is a WordPress site and, with that, I may have answered my own question. Perhaps WordPress generates a robots.txt dynamically when the setting is active at Settings > Privacy?
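To illustrate the "generated dynamically" idea, here is a small sketch of how an application could answer requests for /robots.txt without any file on disk, and why uploading a real file takes over. This is not WordPress's actual code; `robots_response` and `site_is_public` are hypothetical names, and the assumption is simply that a physical file in the document root wins over the generated one, as observed in this thread.

```python
import tempfile
from pathlib import Path

def robots_response(docroot: Path, site_is_public: bool) -> str:
    """Return the text served for /robots.txt (hypothetical logic)."""
    physical = docroot / "robots.txt"
    if physical.is_file():
        # A real file on disk is served as-is; nothing is generated.
        return physical.read_text()
    # No file on disk: build a "virtual" robots.txt from the privacy setting.
    rule = "Disallow:" if site_is_public else "Disallow: /"
    return "User-agent: *\n" + rule + "\n"

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Nothing visible via FTP, yet a blocking robots.txt is served.
        print(robots_response(root, site_is_public=False))
        # Uploading a real file replaces the generated one.
        (root / "robots.txt").write_text("User-agent: *\nDisallow: /wp-admin/\n")
        print(robots_response(root, site_is_public=False))
```

Under that assumption, the fix described above (uploading a correct robots.txt) works, and so would switching the privacy setting back to public.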