Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt blocked internal resources Wordpress
-
Hi all,
We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpHowever, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no html at all when I check the source code, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpThis should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling content the no-index tag would be better suited for preventing content from being indexed.
However, previous best practice would dictate blocking access to /wp-includes/ and /wp-content/ directories, etc but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah this might have some impact on your SEO.
Also, if you're using a plugin to cache content you should want Google to crawl your cache content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So, for your example page, https://example.com/wp-content/cache/minify/df983.js it shouldn't end up in their index.
Hope this helps some.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
What happens to crawled URLs subsequently blocked by robots.txt?
We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of less than 200 product categories my feeling is that Google would be better off making sure our category pages are indexed. I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change, no ratings or product reviews so there is little reason for a search engine to revisit a product page. The sales team is afraid blocking a previously indexed product page will result in in it being removed from the Google index and would prefer to submit the categories by hand, 10 per day via requested crawling. Which is the better practice?
Intermediate & Advanced SEO | | AspenFasteners1 -
Does removal of internal redirects(301) help in SEO
I am planning to completely remove 301 redirects manually by replacing such links with actual live pages/links. So there will be no redirects internally in the website. Will this boost our SEO efforts? Auto redirects will be there for incoming links to non-existing pages. Thanks, Satish
Intermediate & Advanced SEO | | vtmoz0 -
Should I disallow all URL query strings/parameters in Robots.txt?
Webmaster Tools correctly identifies the query strings/parameters used in my URLs, but still reports duplicate title tags and meta descriptions for the original URL and the versions with parameters. For example, Webmaster Tools would report duplicates for the following URLs, despite it correctly identifying the "cat_id" and "kw" parameters: /Mulligan-Practitioner-CD-ROM
Intermediate & Advanced SEO | | jmorehouse
/Mulligan-Practitioner-CD-ROM?cat_id=87
/Mulligan-Practitioner-CD-ROM?kw=CROM Additionally, theses pages have self-referential canonical tags, so I would think I'd be covered, but I recently read that another Mozzer saw a great improvement after disallowing all query/parameter URLs, despite Webmaster Tools not reporting any errors. As I see it, I have two options: Manually tell Google that these parameters have no effect on page content via the URL Parameters section in Webmaster Tools (in case Google is unable to automatically detect this, and I am being penalized as a result). Add "Disallow: *?" to hide all query/parameter URLs from Google. My concern here is that most backlinks include the parameters, and in some cases these parameter URLs outrank the original. Any thoughts?0 -
Internal Links - Dofollow or Nofollow and why?
Hey there Mozzers, I am a question about internal links. If I am writing a article about something and want to link to another one of my articles inside my blog, do i have to make that link nofollow or dofollow? If possible tell me why also. Thanks in advance
Intermediate & Advanced SEO | | Angelos_Savvaidis0 -
Google is indexing wordpress attachment pages
Hey, I have a bit of a problem/issue what is freaking me out a bit. I hope you can help me. If i do site:www.somesitename.com search in Google i see that Google is indexing my attachment pages. I want to redirect attachment URL's to parent post and stop google from indexing them. I have used different redirect plugins in hope that i can fix it myself but plugins don't work. I get a error:"too many redirects occurred trying to open www.somesitename.com/?attachment_id=1982 ". Do i need to change something in my attachment.php fail? Any idea what is causing this problem? get_header(); ?> /* Run the loop to output the attachment. * If you want to overload this in a child theme then include a file * called loop-attachment.php and that will be used instead. */ get_template_part( 'loop', 'attachment' ); ?>
Intermediate & Advanced SEO | | TauriU0 -
Blocking HTTP 1.0?
One of my clients believes someone is trying to hack their site. We are seeing the requests with a server protocol or HTTP 1.0 so they want to block 1.0 entirely. Will this cause any problems with search engines or regular, non-spamming visitors?
Intermediate & Advanced SEO | | BryanPhelps-BigLeapWeb0 -
External 404 vs Internal 404
Which one is bad? External - when someone adds an incorrect link to your site, maybe does a typo when linking to an inner page. This page never existed on your site, google shows this as a 404 in Webmaster tools. Internal - a page existed, google indexed it, and you deleted it and didnt add a 301. Internal ones are in the webmaster's control, and i can understand if google gets upset if it sees a 404 for a URL that existed before, however surely "externally created" 404 shoudnt cause any harm cause that page never existed. And someone has inserted an incorrect link to your site.
Intermediate & Advanced SEO | | SamBuck0