Robots.txt - Googlebot - Allow... what's it for?
-
Hello - I just came across this in a robots.txt file for the first time and was wondering why it is used. Why would you have to proactively tell Googlebot to crawl JS/CSS, and why would you want it to? Any help would be much appreciated - thanks, Luke
User-Agent: Googlebot
Allow: /*.js
Allow: /*.css
-
Thanks Tom - that's very useful - appreciated - and thanks also Clever PhD re: the robots.txt tester info - Luke
-
Just as a follow-up to Tom's great post: if you want to test a robots.txt setup, especially one that uses a wildcard or combines an Allow with a Disallow, Google Search Console has a robots.txt Tester under the Crawl section. It shows the most recent copy of your robots.txt file that Google has fetched. You can modify that version and then enter a URL at the bottom to see whether it would be allowed or blocked. It is pretty handy, especially if you have a big robots.txt file. Note that this tool does not change how Google crawls your site or your live robots.txt file; it is just for testing. Once you find a configuration that works, you still need to update the robots.txt on your server.
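If you also want a quick sanity check outside of Search Console, here is a minimal sketch using Python's standard-library robots.txt parser (the robots.txt content and URLs below are made up for illustration). Keep in mind that this parser is much simpler than Googlebot - it does plain prefix matching, doesn't support wildcards, and applies rules in file order - so treat it as a rough check and leave the final word to the Search Console tester.

from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content to test (no wildcards, because the
# standard-library parser only does simple prefix matching).
robots_txt = """\
User-agent: Googlebot
Allow: /private/public-page.html
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check individual URLs, much like entering them in the Search Console tester.
for url in ("https://www.example.com/private/public-page.html",
            "https://www.example.com/private/secret.html",
            "https://www.example.com/page.html"):
    print(url, "->", "allowed" if rp.can_fetch("Googlebot", url) else "blocked")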
-
Hi Luke
As you have correctly assumed, that particular robots.txt rule would be pointless on its own.
Googlebot does follow Allow directives (while some other crawlers do not), but an Allow should only be used as an exception to a Disallow rule.
So, for example, if you had a rule that blocked pages within a sub-directory, with:
Disallow: /example/*
You could then create an Allow rule that lets a specific page within that directory be crawled, like:
Allow: /example/page.html
A couple of things to point out here. "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule." (Google Source). In this example, because the more specific (longer) rule is the Allow rule, it will prevail. It is also best practice to put your Allow rules at the top of the robots.txt file.
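To make that precedence rule concrete, here is a rough Python sketch of the longest-match logic (my own simplified illustration, not Google's actual parser, which handles many more edge cases):

import re

def rule_matches(path, rule_path):
    # Translate a robots.txt path pattern ('*' wildcard, optional trailing '$'
    # anchor) into a regex anchored at the start of the URL path.
    pattern = re.escape(rule_path).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None

def is_allowed(path, rules):
    # rules is a list of (directive, rule_path) tuples.
    # The matching rule with the longest path wins; on a tie, the less
    # restrictive (allow) rule wins.
    matches = [(len(rule_path), directive) for directive, rule_path in rules
               if rule_matches(path, rule_path)]
    if not matches:
        return True  # nothing matches, so crawling is allowed by default
    matches.sort(key=lambda m: (m[0], m[1] == "allow"), reverse=True)
    return matches[0][1] == "allow"

rules = [("disallow", "/example/*"),
         ("allow", "/example/page.html")]

print(is_allowed("/example/page.html", rules))   # True: the longer Allow rule wins
print(is_allowed("/example/other.html", rules))  # False: only the Disallow matches

Running it shows /example/page.html stays crawlable while everything else under /example/ is blocked, which is exactly what the longest-match rule predicts.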
But in your example, if they have Allow rules for JS and CSS files without having Disallow rules for those directories/paths etc. - it's a waste of space. Google will attempt to crawl anything it can by default - unless you disallow access.
TL;DR - You don't need to proactively tell Google to crawl CSS and JS - it will by default.
Hope this helps.
-
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change and have no ratings or product reviews, so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | AspenFasteners
-
What's the best way to noindex pages but still keep backlink equity?
Hello everyone, Maybe it is a stupid question, but I'll ask the experts... What's the best way to noindex pages but still keep backlink equity from those noindexed pages? For example, let's say I have many pages that look similar to a "main" page which I solely want to appear on Google, so I want to noindex all pages with the exception of that "main" page... but what if I also want to transfer any possible link equity present on the noindexed pages to the main page? The only solution I have thought of is to add a canonical tag pointing to the main page on those noindexed pages... but will that work, or will it wreak havoc in some way?
Intermediate & Advanced SEO | fablau
-
Add noindex,nofollow prior to removing pages resulting in 404's
We're working with another site that, unfortunately, due to how their website has been programmed, creates a bit of a mess. Whenever an employee removes a page from their site through their homegrown 'content management system', rather than 301'ing to another location on their site, the page is deleted and results in a 404. The interim question, until they implement a better solution for managing their website, is: should they first add noindex,nofollow to the pages that are scheduled to be removed, and then let them become 404s once they are removed? Of note, it is possible that some of these pages will be used again in the future, and I would imagine they could submit them to Google through Webmaster Tools and by adding the pages to their sitemap.
Intermediate & Advanced SEO | Prospector-Plastics
-
Do links from twitter count in SEOMoz's Toolbar link count?
I am using the Chrome extension and looking at a SERP. When a page is said to have 2000 incoming links, does that include tweets with a link back to this page? What about retweets - are those counted separately or as one? And what about independent tweets that have exactly the same content (tweet text + link)?
Intermediate & Advanced SEO | davhad
-
Robots.txt file - How to block thousands of pages when you don't have a folder path
Hello. Just wondering if anyone has come across this and can tell me if it worked or not.
Goal: To block review pages.
Challenge: The URLs aren't constructed using folders; they look like this:
www.website.com/default.aspx?z=review&PG1234
www.website.com/default.aspx?z=review&PG1235
www.website.com/default.aspx?z=review&PG1236
So the first part of the URL is the same (i.e. /default.aspx?z=review) and the unique part comes immediately after - so not as a folder. Looking at Google's recommendations, they only show examples for blocking folder directories and individual pages.
Question: If I add the following to the robots.txt file, will it block all review pages?
User-agent: *
Disallow: /default.aspx?z=review
Much thanks, Davinia
Intermediate & Advanced SEO | Unity
-
Is it possible to lose rank because my site's IP changed?
I manage a site on the 3dCart e-commerce platform. I recently updated the SSL certificate. Today, when I tried to log in via FTP, I couldn't connect. The reason I couldn't connect was because my IP had changed. Last week the site experienced almost across-the-board ranking drops on almost every important keyword. Not gigantic drops - a lot just lost 2-4 positions - but that's a lot when you were #2 and you drop to #4 or #6. Initially I thought it was because I was attempting to mark up my product pages using structured data following guidelines from schema.org. I am not a coder, so it was a real struggle, especially trying to navigate 3dCart's listing templates. I thought the rankings drops were Google slapping me for bad code, but now I wonder... could I really have dropped down because of that IP address change? Does anyone have a take on this? Thanks!
Intermediate & Advanced SEO | danatanseo
-
Best way to view Global Navigation bar from GoogleBot's perspective
Hi, Links in the global navigation bar of our website do not show up when we look at the Google cache --> text-only version of the page. These links use style="display:none;" when we look at the HTML source. But if I use the "user agent switcher" add-on in Firefox and set it to Googlebot, the links in the global nav are displayed. I am wondering what is the best way to find out whether Google can or cannot see the links. Thanks for the help! Supriya.
Intermediate & Advanced SEO | SShiyekar
-
My homepage doesn't rank anymore. It's been replaced by irrelevant subpages which rank around 100-200 instead of top 5.
Hey guys, I think I got some kind of penalty for my homepage. I was in the top 5 for my keywords. Then a few days ago, my homepage stopped ranking for anything except searches for my domain name in Google. sitename.com/widget-reviews/ previously ranked #3 for "widget reviews", but now sitename.com/widget-training-for-pet-cats/ is ranking #84 for "widget reviews" instead. Similarly, across all my other keywords, irrelevant, wrong pages are ranking. Did I get some kind of penalty?
Intermediate & Advanced SEO | wearetribe