Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Removing CSS & JS Files from Index
-
Hi,
Google has indexed a few .CSS and .JS files that belong to our WordPress plugins and themes. I had them blocked via robots, but realized this doesn't prevent indexation (and can likely hurt us since Google wants to access these files).
I've since removed the robots instructions, submitted a removal request via Search Console, but want to make sure they don't come back.
Is there a way to put a noindex tag within .CSS and .JS files? Or should I do something with .htaccess instead?
-
I figured .htaccess would be the best route. Thank you for researching and confirming. I appreciate it.
-
Hi Tim,
Assigning a noindex tag to these files will not block them, only prevent them from showing in SERPs. This is the intended goal and the reason I deleted my robots.txt file which prevented crawling.
-
There's quite a big difference between crawling directives, which block and indexing directives. This article by (former?) Moz user S_ebastian_ is a good foundation read.
This article at developers.google.com is a good second read. If I'm understanding it right, Google thinks in terms of crawling directives vs indexing / serving directives.
My attempt at <tl rl="">:</tl>
crawling = looking, using in any way :: controlled via robots.txt
indexing / serving = indexing, archiving, displaying snippets in results, etc :: controlled via html meta tags or web server htaccess (or similar for other web servers).
I'm not convinced yet, that asking for noindex via htaccess causes the same sort of grief that deny in robots.txt causes.
-
I would seriously think again when it comes to blocking/no-indexing your CSS and JS files - Google has in the past stated that if they cannot fully render your site properly then this could lead to poorer rankings.
You will also likely get notifications in your Search Console as errors for this too.
Check out this great article from July this year which goes into more details.
-
I haven't encountered undesirable .css or .js indexing myself (yet), but as you surmised, maybe this htaccess directive might be worth trying?
<filesmatch ".(txt|log|xml|css|js)$"="">Header set X-Robots-Tag "noindex"</filesmatch>
Google seems to support it
-
Unless I'm severely misreading the links provided, which I've read before, it seems Google is stating that they read, render, and sometimes index .CSS and .JS files. Here's an article written a week after the second article you posted.
The aforementioned WordPress plugin and theme files hosted on my server are indeed showing up in Google SERPs.
I do not want to prevent Googlebot from reaching these files as they're needed for optimal site performance, but I do want them to be no-indexed. Thus, I don't want robots.txt to prevent crawling, only indexing.
Let me know if I'm misunderstanding.
-
TL;DR - You're hesitated about problem that doesn't exist.
Googlebot doesn't index CSS or JS files. They index text files, HTML, PDF, DOC, XLS, etc. But doesn't index style sheets or javascript files.
All you need in WordPress is to create blank robots.txt file where WP is installed with this content:
User-agent: *
Disallow:
Sitemap: http://site/sitemap-file-name.xmlAnd that's all. This is explain many times:
http://googlewebmastercentral.blogspot.bg/2014/05/understanding-web-pages-better.html
http://googlewebmastercentral.blogspot.bg/2014/10/updating-our-technical-webmaster.html
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate content issue with ?utm_source=rss&utm_medium=rss&utm_campaign=
Hello,
Technical SEO | | Dinsh007
Recently, I was checking how my site content is getting indexed in Google and from today I noticed 2 links indexed on google for the same article: This is the proper link - https://techplusgame.com/hideo-kojima-not-interested-in-new-silent-hills-revival-insider-claims/ But why this URL was indexed, I don't know - https://techplusgame.com/hideo-kojima-not-interested-in-new-silent-hills-revival-insider-claims/?utm_source=rss&utm_medium=rss&utm_campaign=hideo-kojima-not-interested-in-new-silent-hills-revival-insider-claims Could you please tell me how to solve this issue? Thank you1 -
My video sitemap is not being index by Google
Dear friends, I have a videos portal. I created a video sitemap.xml and submit in to GWT but after 20 days it has not been indexed. I have verified in bing webmaster as well. All videos are dynamically being fetched from server. My all static pages have been indexed but not videos. Please help me where am I doing the mistake. There are no separate pages for single videos. All the content is dynamically coming from server. Please help me. your answers will be more appreciated................. Thanks
Technical SEO | | docbeans0 -
No Index PDFs
Our products have about 4 PDFs a piece, which really inflates our indexed pages. I was wondering if I could add REL=No Index to the PDF's URL? All of the files are on a file server, so they are embedded with links on our product pages. I know I could add a No Follow attribute, but I was wondering if any one knew if the No Index would work the same or if that is even possible. Thanks!
Technical SEO | | MonicaOConnor0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
301 Redirect with index.asp
I am very new to all of this so forgive the newbie questions I will get better. Ok so after starting a campaign I see that I have many issues including where some pages are being deemed as duplicate content. 1. The report says the http://lucid8.com has duplicate content on 2 other pages 2. When I look at them it shows that http://lucid8.com/index.asp and http://www.lucid8.com are duplicates. 3. Really these are the exactly the same page because the default page that is opened for www.lucid8.com http://www.lucid8.com etc always opens the index.asp page. 4. Now I read that I should do permanent redirects and how to do this via IIS and I tried to do a redirect from index.asp to www.lucid8.com but that does not work because www.lucid8.com is pointing to index.asp and so we end up in a circle. So the question is how do I get rid of these duplicate page references without causing problems. Thanks
Technical SEO | | TroyW0 -
How to determine which pages are not indexed
Is there a way to determine which pages of a website are not being indexed by the search engines? I know Google Webmasters has a sitemap area where it tells you how many urls have been submitted and how many are indexed out of those submitted. However, it doesn't necessarily show which urls aren't being indexed.
Technical SEO | | priceseo1 -
Duplicate content problem from an index.php file
Hi One of my sites is flagging a duplicate content problem which is affecting the search rankings. The duplicate problem is caused by http://www.mydomain.com/index.php which has a page rank of 26 How can I sort the duplicate content problem, as the main page should just be http://www.mydomain.com which has a page rank of 42 and is the stronger page with stronger links etc Many Thanks
Technical SEO | | ocelot0 -
Dynamically-generated .PDF files, instead of normal pages, indexed by and ranking in Google
Hi, I come across a tough problem. I am working on an online-store website which contains the functionlaity of viewing products details in .PDF format (by the way, the website is built on Joomla CMS), now when I search my site's name in Google, the SERP simply displays my .PDF files in the first couple positions (shown in normal .PDF files format: [PDF]...)and I cannot find the normal pages there on SERP #1 unless I search the full site domain in Google. I really don't want this! Would you please tell me how to figure the problem out and solve it. I can actually remove the corresponding component (Virtuemart) that are in charge of generating the .PDF files. Now I am trying to redirect all the .PDF pages ranking in Google to a 404 page and remove the functionality, I plan to regenerate a sitemap of my site and submit it to Google, will it be working for me? I really appreciate that if you could help solve this problem. Thanks very much. Sincerely SEOmoz Pro Member
Technical SEO | | fugu0