Robots.txt - Googlebot - Allow... what's it for?

McTaggart

Hello - I just came across this in a robots.txt file for the first time and was wondering why it is used. Why would you have to proactively tell Googlebot to crawl JS/CSS, and why would you want to? Any help would be much appreciated - thanks, Luke

User-Agent: Googlebot

Allow: /.js

Allow: /.css

McTaggart

Thanks Tom - that's very useful - appreciated - and thanks also CleverPhD re: the robots.txt tester info - Luke

CleverPhD

Just as a follow-up to Tom's great post: if you want to test a robots.txt setup, especially one that uses a wildcard or combines an allow with a disallow, Google Search Console has a robots.txt Tester under the Crawl section. It shows the most recent copy of your robots.txt file that Google has. You can modify that version and then enter a URL at the bottom to see whether that URL is blocked or allowed. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your live robots.txt file; it is just for testing. Once you find the configuration that works, you still need to update the robots.txt on your server.

TomRayner

Hi Luke

As you have correctly assumed, that particular robots command would be pointless.

Googlebot does follow allow commands (some other crawlers may not), but an allow rule is only useful as an exception to a disallow rule.

So, for example, if you had a rule that blocked pages within a sub-directory, with:

Disallow: /example/*

You could create an allow rule that lets a specific page within that directory be crawled, like:

Allow: /example/page.html

A couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the allow rule is the more specific (longer) one, it will prevail. It is also best practice to put your "allow" rules at the top of the robots.txt file, since crawlers that do not use longest-match may simply apply the first rule that matches.
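To make the "longest path wins" idea concrete, here is a rough Python sketch of how that precedence could be evaluated. It is not Google's actual parser: the function names (pattern_to_regex, is_allowed) are made up for illustration, it ignores user-agent groups entirely, and it assumes ties between an allow and a disallow of equal length go to the allow rule.

```python
import re

def pattern_to_regex(path_pattern):
    """Turn a robots.txt path pattern into a regex (hypothetical helper).

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = path_pattern.endswith("$")
    if anchored:
        path_pattern = path_pattern[:-1]
    body = ".*".join(re.escape(part) for part in path_pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(rules, url_path):
    """Decide whether url_path may be crawled under a list of (directive, pattern) rules.

    Assumptions: the longest matching pattern wins, a tie goes to 'allow',
    and a path that matches no rule at all is allowed by default.
    """
    best_len = -1
    allowed = True  # default when nothing matches
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty path is treated as matching nothing
        # re.match anchors at the start of the path, like a robots.txt prefix match
        if pattern_to_regex(pattern).match(url_path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed

# The example rules from this post
rules = [("disallow", "/example/*"), ("allow", "/example/page.html")]
print(is_allowed(rules, "/example/page.html"))   # True: the longer allow rule wins
print(is_allowed(rules, "/example/other.html"))  # False: only the disallow matches
print(is_allowed(rules, "/about.html"))          # True: no rule matches
```

With the two rules from the example, the order of the list makes no difference to the result in this sketch, because only the pattern length is compared.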

But in your example, if they have allow rules for JS and CSS files without having disallow rules for those directories/paths etc - it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.

TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.

Hope this helps.
