Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
OK to block /js/ folder using robots.txt?
-
I know Matt Cutts suggestions we allow bots to crawl css and javascript folders (http://www.youtube.com/watch?v=PNEipHjsEPU)
But what if you have lots and lots of JS and you dont want to waste precious crawl resources?
Also, as we update and improve the javascript on our site, we iterate the version number ?v=1.1... 1.2... 1.3... etc.
And the legacy versions show up in Google Webmaster Tools as 404s. For example:
http://www.discoverafrica.com/js/global_functions.js?v=1.1
http://www.discoverafrica.com/js/jquery.cookie.js?v=1.1
http://www.discoverafrica.com/js/global.js?v=1.2
http://www.discoverafrica.com/js/jquery.validate.min.js?v=1.1
http://www.discoverafrica.com/js/json2.js?v=1.1Wouldn't it just be easier to prevent Googlebot from crawling the js folder altogether?
Isn't that what robots.txt was made for?
Just to be clear - we are NOT doing any sneaky redirects or other dodgy javascript hacks.
We're just trying to power our content and UX elegantly with javascript.
What do you guys say:
Obey Matt? Or run the javascript gauntlet?
-
Hey!
So, I listened to Matt's video. I see his point about wanting to crawl the JS files just in case something tricky is going on. Do understand that this is a risk you take. I don't see an issue blocking crawling of those files from a logical perspective, but if you or someone that takes over for you in the future does do something sneaky with JS and you are caught ... plus you have blacked access to the offending files ... it is going to take a lot more work to get back in good graces with them.
It's like a cop searching your car. You have every right to ban them from doing so, but if you have nothing to hide, why make trouble? Matt is right, banning crawling of these files is not going to save you much but if you think it's an issue, feel free. Just know that they might take it as a possible flag in the future.
Kate
-
Harald, it looks like the response you've quoted is from http://groups.google.com/a/googleproductforums.com/forum/#!category-topic/webmasters/crawling-indexing--ranking/9MGYEoROdkg, which is a question about a menu that has javascript. I think this poster has a slightly different question. I'll ask another associate to come on in and take a look.
-
Hi Discover,I think that whenever we access the web pages , we have seen number of times that there is run time error & they asking for debug. This error message is helpful for the developers only but not for the users.
I think that you should please refer to the following link:
The truth about non javascript
I hope that above content help to solve your query.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Folders in url structure?
Hello, Revamping an out-of-date website and am wondering if I need to include the folders (categories) in the url structure? The proposed structure has 8 main folders. I've been reading that Google is ok if the folder is not included in the url, but is it really? The hesitation I have is that the urls are getting long and the main folder only has only a sub folder beneath it. So, /folder-name/facility-name/treatment-overview. This looks too long, doesn't it? Thanks!
Technical SEO | | lfrazer1230 -
Robot.txt : How to block a specific file type in several subdirectories ?
Hello everyone ! I need help setting up a robot.txt. I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site. Block files of a specific file type (for example, .gif) | Disallow: /*.gif$ 2 questions : Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ? Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$ Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files. Let's say I want to block pdf files in all these 3 directories /fileadmin/directory1 /fileadmin/directory1/sub1 /fileadmin/directory1/sub1/pdf Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple : Disallow: /fileadmin/directory1*/ Many thanks in advance for any insight you may have.
Technical SEO | | LabeliumUSA0 -
Do I use /es/, /mx/ or /es-mx/ for my Spanish site for Mexico only
I currently have the Spanish version of my site under myurl.com/es/ When I was at Pubcon in Vegas last year a panel reviewed my site and said the Spanish version should be in /mx/ rather than /es/ since es is for Spain only and my site is for Mexico only. Today while trying to find information on the web I found /es-mx/ as a possibility. I am changing my site and was planning to change to /mx/ but want confirmation on the correct way to do this. Does anyone have a link to Google documentation that will tell me for sure what to use here? The documentation I read led me to the /es/ but I cannot find that now.
Technical SEO | | RoxBrock0 -
No indexing url including query string with Robots txt
Dear all, how can I block url/pages with query strings like page.html?dir=asc&order=name with robots txt? Thanks!
Technical SEO | | HMK-NL0 -
Internal search : rel=canonical vs noindex vs robots.txt
Hi everyone, I have a website with a lot of internal search results pages indexed. I'm not asking if they should be indexed or not, I know they should not according to Google's guidelines. And they make a bunch of duplicated pages so I want to solve this problem. The thing is, if I noindex them, the site is gonna lose a non-negligible chunk of traffic : nearly 13% according to google analytics !!! I thought of blocking them in robots.txt. This solution would not keep them out of the index. But the pages appearing in GG SERPS would then look empty (no title, no description), thus their CTR would plummet and I would lose a bit of traffic too... The last idea I had was to use a rel=canonical tag pointing to the original search page (that is empty, without results), but it would probably have the same effect as noindexing them, wouldn't it ? (never tried so I'm not sure of this) Of course I did some research on the subject, but each of my finding recommanded one of the 3 methods only ! One even recommanded noindex+robots.txt block which is stupid because the noindex would then be useless... Is there somebody who can tell me which option is the best to keep this traffic ? Thanks a million
Technical SEO | | JohannCR0 -
Subdomain Removal in Robots.txt with Conditional Logic??
I would like to see if there is a way to add conditional logic to the robots.txt file so that when we push from DEV to PRODUCTION and the robots.txt file is pushed, we don't have to remember to NOT push the robots.txt file OR edit it when it goes live. My specific situation is this: I have www.website.com, dev.website.com and new.website.com and somehow google has indexed the DEV.website.com and NEW.website.com and I'd like these to be removed from google's index as they are causing duplicate content. Should I: a) add 2 new GWT entries for DEV.website.com and NEW.website.com and VERIFY ownership - if I do this, then when the files are pushed to LIVE won't the files contain the VERIFY META CODE for the DEV version even though it's now LIVE? (hope that makes sense) b) write a robots.txt file that specifies "DISALLOW: DEV.website.com/" is that possible? I have only seen examples of DISALLOW with a "/" in the beginning... Hope this makes sense, can really use the help! I'm on a Windows Server 2008 box running ColdFusion websites.
Technical SEO | | ErnieB0 -
Is using a Href in Div OK?
Hi, I was just wondering what your thoughts are on using a Href in a Div, which contains anchor text. We currently use the Href on the div, as opposed to just the anchor text as I want the whole div to be clickable as opposed to just the anchor text. So currently I have: Keword 1
Technical SEO | | James77
Keyword 2 Is this perfectly fine to do it like this as opposed to using <a tags="" ???<br="">I suppose there are various alternatives - if you must use the</a><a tag="" like:<="" p=""></a> <a tag="" like:<="" p=""></a> Keword 1
Keyword 2 However I would assume a search engine is smart enought to know its the same thing??? Thanks0 -
Google Off/On Tags
I came across this article about telling google not to crawl a portion of a webpage, but I never hear anyone in the SEO community talk about them. http://perishablepress.com/press/2009/08/23/tell-google-to-not-index-certain-parts-of-your-page/ Does anyone use these and find them to be effective? If not, how do you suggest noindexing/canonicalizing a portion of a page to avoid duplicate content that shows up on multiple pages?
Technical SEO | | Hakkasan1