Robots.txt gone wild

wearehappymedia

Hi guys, a site we manage, http://hhhhappy.com, received an alert through Webmaster Tools yesterday saying that it can't be crawled. No changes were made to the site.

I don't know a huge amount about robots.txt configuration, except that with Yoast the default is to block crawling of the wp-admin folder and nothing else. I checked this against all our other sites and the settings are the same. And yet, 12 hours after the issue appeared, Happy is still not being crawled and meta data is not showing in search results. Any ideas what may have triggered this?

MattRoney

Hi Radi!

Have Matt and/or Martijn answered your question? If so, please mark one or both of their responses "Good Answer."

Otherwise, what's still tripping you up?

Martijn_Scheijbeler

Have you checked whether the site had any downtime recently? If Google can't reach your robots.txt file, it will sometimes stop crawling your site temporarily.
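To make that concrete, here is a rough Python sketch of how a crawler might react to the HTTP status it gets when requesting robots.txt. The mapping is an assumption based on how Google's public crawler documentation is generally summarized (server errors pause crawling, a missing file means no restrictions); the function name and the returned labels are made up for illustration and are not part of any Google tool.

```python
def robots_fetch_outcome(status_code: int) -> str:
    """Map the HTTP status of a robots.txt request to a likely crawler reaction.

    Assumed mapping, for illustration only; check the current crawler
    documentation before relying on it.
    """
    if 200 <= status_code < 300:
        # File was served: the crawler reads and applies the rules.
        return "parse rules"
    if status_code == 429 or 500 <= status_code < 600:
        # Server trouble: the crawler backs off instead of guessing.
        return "pause crawling"
    if 400 <= status_code < 500:
        # No usable file (e.g. 404): treated as if nothing is disallowed.
        return "crawl without restrictions"
    # Anything else (redirects, informational codes).
    return "follow redirect or retry"


for code in (200, 404, 503):
    print(code, "->", robots_fetch_outcome(code))
```

So a short outage that returns a 5xx for robots.txt could plausibly explain a "can't be crawled" alert even when the file itself is fine.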

MattAntonino

Are you getting the message in Search Console that there were errors crawling your page?

This typically means that your host was temporarily down when Google landed on your page. These types of things happen all the time and are no big deal.

Your homepage cache shows a crawl date of today so I'm assuming things are working properly ... if you really want to find out, try doing a "Fetch" of your site in Search Console.

Crawl > Fetch as Google > Fetch (big red button)

You should get a status of "Complete." If you get anything else there should be an error message with it. If so, paste that here.

I have checked the site headers, cache, crawlability with Screaming Frog, and everything is fine. This seems like one of those temporary messages but if the problem persists definitely let us know!

wearehappymedia

Our host has just offered this response which does not get me any closer:

Hi Radi,

It looks like your site has its own robots.txt file, which is not blocking any user agents. The only thing it's doing is blocking bots from crawling your admin area:

<code>User-agent: *
Disallow: /wp-admin/</code>
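For anyone who wants to double-check what those two lines permit, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are copied from the file quoted above; the helper name `is_allowed` and the sample paths are assumptions for illustration. Note that this parser is a generic implementation and may not match Googlebot's own matching logic in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
"""


def is_allowed(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given user agent may fetch the path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed("/"))                      # homepage
print(is_allowed("/wp-admin/options.php"))  # hypothetical admin path
```

Under these rules only paths beginning with /wp-admin/ should come back as blocked.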

This is a standard robots.txt file, and you shouldn't be having any issues with Google indexing your site from a hosting standpoint. To test this, I curled the site as Googlebot and received a 200 OK response:

<code>curl -A "Googlebot/2.1" -IL hhhhappy.com
HTTP/1.1 200 OK
Date: Sat, 05 Mar 2016 22:17:26 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Set-Cookie: __cfduid=d3177a1baa04623fb2573870f1d4b4bac1457216246; expires=Sun, 05-Mar-17 22:17:26 GMT; path=/; domain=.hhhhappy.com; HttpOnly
X-Cacheable: bot
Cache-Control: max-age=10800, must-revalidate
X-Cache: HIT: 17
X-Cache-Group: bot
X-Pingback: http://hhhhappy.com/xmlrpc.php
Link: <http://hhhhappy.com/>; rel=shortlink
Expires: Thu, 19 Nov 1981 08:52:00 GMT
X-Type: default
X-Pass-Why:
Set-Cookie: X-Mapping-fjhppofk=2C42B261F74DA203D392B5EC5BF07833; path=/
Server: cloudflare-nginx
CF-RAY: 27f0f02445920f09-IAD</code>

I didn't see any plugins on your site that looked like they would overwrite robots.txt, but I urge you to take another look at them, and then dive into your site's settings for the meta value that Googlebot would pick up. Everything on our end seems to be giving the green light.
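One way to check the meta value mentioned above is to look for a robots meta tag in the page's HTML. The sketch below uses only Python's standard `html.parser`; the class name, the helper `blocks_indexing`, and the sample markup are assumptions for illustration, not output from the actual site. It only inspects meta tags, so an `X-Robots-Tag` response header would need a separate check.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def blocks_indexing(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives or "none" in finder.directives


# Hypothetical page source; in practice, paste in the HTML of the affected page.
sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(blocks_indexing(sample))
```

If this reports a noindex directive on a live page, a likely place to look in WordPress is the "Discourage search engines from indexing this site" option under Settings > Reading, or the equivalent setting in an SEO plugin.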

Please let us know if you have any other questions or issues in the meantime.

Cheers,
