Removing CSS & JS Files from Index
-
Hi,
Google has indexed a few .CSS and .JS files that belong to our WordPress plugins and themes. I had them blocked via robots, but realized this doesn't prevent indexation (and can likely hurt us since Google wants to access these files).
I've since removed the robots instructions, submitted a removal request via Search Console, but want to make sure they don't come back.
Is there a way to put a noindex tag within .CSS and .JS files? Or should I do something with .htaccess instead?
-
I figured .htaccess would be the best route. Thank you for researching and confirming. I appreciate it.
-
Hi Tim,
Assigning a noindex tag to these files will not block them, only prevent them from showing in SERPs. This is the intended goal and the reason I deleted my robots.txt file which prevented crawling.
-
There's quite a big difference between crawling directives, which block and indexing directives. This article by (former?) Moz user S_ebastian_ is a good foundation read.
This article at developers.google.com is a good second read. If I'm understanding it right, Google thinks in terms of crawling directives vs indexing / serving directives.
My attempt at <tl rl="">:</tl>
crawling = looking, using in any way :: controlled via robots.txt
indexing / serving = indexing, archiving, displaying snippets in results, etc :: controlled via html meta tags or web server htaccess (or similar for other web servers).
I'm not convinced yet, that asking for noindex via htaccess causes the same sort of grief that deny in robots.txt causes.
-
I would seriously think again when it comes to blocking/no-indexing your CSS and JS files - Google has in the past stated that if they cannot fully render your site properly then this could lead to poorer rankings.
You will also likely get notifications in your Search Console as errors for this too.
Check out this great article from July this year which goes into more details.
-
I haven't encountered undesirable .css or .js indexing myself (yet), but as you surmised, maybe this htaccess directive might be worth trying?
<filesmatch ".(txt|log|xml|css|js)$"="">Header set X-Robots-Tag "noindex"</filesmatch>
Google seems to support it
-
Unless I'm severely misreading the links provided, which I've read before, it seems Google is stating that they read, render, and sometimes index .CSS and .JS files. Here's an article written a week after the second article you posted.
The aforementioned WordPress plugin and theme files hosted on my server are indeed showing up in Google SERPs.
I do not want to prevent Googlebot from reaching these files as they're needed for optimal site performance, but I do want them to be no-indexed. Thus, I don't want robots.txt to prevent crawling, only indexing.
Let me know if I'm misunderstanding.
-
TL;DR - You're hesitated about problem that doesn't exist.
Googlebot doesn't index CSS or JS files. They index text files, HTML, PDF, DOC, XLS, etc. But doesn't index style sheets or javascript files.
All you need in WordPress is to create blank robots.txt file where WP is installed with this content:
User-agent: *
Disallow:
Sitemap: http://site/sitemap-file-name.xmlAnd that's all. This is explain many times:
http://googlewebmastercentral.blogspot.bg/2014/05/understanding-web-pages-better.html
http://googlewebmastercentral.blogspot.bg/2014/10/updating-our-technical-webmaster.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Historic issue with incomplete indexing
Hi there We run quite a big site in the UK in the commercial real-estate space. Historically we have always had a challenge getting our "primary" landing pages indexed, which are location based property result pages. e.g. https://realla.co/to-rent/commercial-property/oxford For example, for the "towns" category we have 8,549 submitted in our xml sitemap, with only 3,171 indexed. This is a general issue across all our sitemaps. 120k submitted, 80k indexed. Our pages are linked through breadcrumbs, and nearby links. In the new search console these pages are reported as "crawled - currently not indexed" These all sit under the folder: site:https://realla.co/to-rent/commercial-property/* site:https://realla.co/to-rent/office/* We have done extensive work to optimise performance, including AMP pages. Each location page has many details pages for individual properties e.g. https://realla.co/to-rent/details/0ffbbd0a1a1147edb8847c5ce6179509 One action we have remaining is to nest the details under the locations pages, which may help. These details pages are indexed fully. Any feedback much appreciated
Technical SEO | | ianparryuk0 -
Dropped out of Bing index!! (and Yahoo too)
When I search for my site via site:domain.com or url:domain.com the are ZERO results except for this Some results have been removed which goes to http://help.bing.microsoft.com/#apex/18/en-US/10016/0 We are a total white hat website. What should I do? Is there someone at Bing I can contact? I dont see anyway via Bing Webmaster tools.
Technical SEO | | corlin0 -
Having Problems to Index all URLs on Sitemap
Hi all again ! Thanks in advance ! My client's site is having problems to index all its pages. I even bought the full extension of XML Sitemaps and the number of urls increased, but we still have problems to index all of them. What are the reasons? The robots.txt is open for all robots, we only prohibit users and spiders to enter our Intranet. I've read that duplicate content and 404's can be the reason. Anything else?
Technical SEO | | Tintanus0 -
Do Sitespect links get indexed?
I put a link on one of my websites using sitespect because the next release is not for a few weeks. The reason for the link is to pass domain authority (SEO Juice) to the linked site. In my next release I will add the link in the actual code, but am hoping that from now till then google will crawl and index this link. So the question is, will google crawl and index links adding to webpages via sitespect? Here is the code: | * [http://www.](<a class=)yourdomain.com" class="" >YourDomain |
Technical SEO | | AlyssaN
| | | Link to Sitespect: http://www.sitespect.com/0 -
No index on subdomains
Hi, We have a subdomain that is appearing in the search results - I want to hide this as it looks really bad. If I were to add the no index tag to the sub domain would URL would this affect the whole domain or just that sub domain? The main domain is vitally important - it is just that sub domain I need to hide. Many thanks
Technical SEO | | Creditsafe0 -
Round 3 & still no indexing for varicose veins :-(
Greetings from 11 degrees C partly suuny Wetherby 🙂 Every so oftem you hit an SEO mission that just consistently hits a brick wall. For the third time i'm investigating why this page:
Technical SEO | | Nightwing
http://www.collegeofphlebology.com/varicose-veins/what-are-they/ fails to even reach the bottom of page 3. Ive gone back to basic and ran an SEO audit of sorts in an attempt to see if I'd missed anything. Here is the audit: http://i216.photobucket.com/albums/cc53/zymurgy_bucket/audit-for-moz.jpg So my question is please: From a technical SEO perspective is there anything wrong with this page http://www.collegeofphlebology.com/varicose-veins/what-are-they/ to explain why it does not rank for target term "Varicose Veins" Thanks in advance,
David0 -
How to properly remove 404 errors
Hi, According to seomoz report I have two 404 errors on my site. (http://screencast.com/t/2FG8fA1dvGB) I removed them from google webmasters central about 2 weeks ago (http://screencast.com/t/MQ8XBvrFm ) , but they're still showing as an error in the next report (weekly update). Is there anything else you do about 404 or just remove urls through gwc? Or maybe seomoz data is delayed? Thanks in advance, JJ
Technical SEO | | jjtech0 -
How should I properly setup my .htaccess file?
I have searched google for 'how to setup .htaccess file' and it seems that every website has some variation. For example: RewriteCond %{HTTP_HOST} ^yoursite.com RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=permanent,L] On SEOMOZ someone posted this: RewriteCond %{HTTP_HOST} !^www.yoursite.com [NC] RewriteRule (.*) http://www.yoursite.com/$1 [L,R=301] On yet another website, I found this: RewriteEngine On RewriteCond %{HTTP_HOST} !^your-site.com$ [NC] RewriteRule ^(.*)$ http://your-site.com/$1 [L,R=301] As you can see there are slight differences. Which one do I use? I'm on Apache CentOS and I have HTML5 websites and several Joomla! wesites. Would the HTACCESS File be different for both?
Technical SEO | | maxduveen0