Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Robots.txt blocked internal resources Wordpress
-
Hi all,
We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpHowever, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts.
Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO?
Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index?
Thanks for your thoughts!
-
Thanks for the answer!
Last question: is /wp-admin/admin-ajax.php an important part that has to be crawled? I found this explanation: https://wordpress.stackexchange.com/questions/190993/why-use-admin-ajax-php-and-how-does-it-work/191073#191073
However, on this specific website there is no html at all when I check the source code, only one line with 0 on it.
-
I would leave all the disallows out except for the /wp-admin/ section. For example, I'd rewrite the robots.txt file to read:
User-agent: *
Disallow: /wp-admin/Also, you kind of want Google to index your cached content. In the event your servers go down it will still be able to make your content available.
I hope that helps. Let me know how that works out for you!
-
Thanks for the clear answer.
I've changed the robots.txt to:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.phpThis should avoid problems with not indexing (parts of) cached content.
Or should I leave all the Disallows out?
-
Hey there --
Blocking resources with the robots.txt file prevents search engines from crawling content the no-index tag would be better suited for preventing content from being indexed.
However, previous best practice would dictate blocking access to /wp-includes/ and /wp-content/ directories, etc but that's no longer necessary.
Today, Google will fetch all your styling and JavaScript files so they can render your pages completely. Search engines now try to understand your page's layout and presentation as a key part of how they evaluate quality.
So, yeah this might have some impact on your SEO.
Also, if you're using a plugin to cache content you should want Google to crawl your cache content. And in my experience, Googlebot does a good job of not indexing /wp-content/ sections.
So, for your example page, https://example.com/wp-content/cache/minify/df983.js it shouldn't end up in their index.
Hope this helps some.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Internal links to landing pages
Hi, we are in the process of building a new website and we have 12 different locations and for theses 12 locations we have landing pages with unique copy on the following: 1. Marketing...2 SEO....3. PPC....4. Web Design Therefor there are 48 landing pages. The marketing pages are the most important ones to us in terms of traffic and priority. My question is: 1. Should we put a dropdown of the are pages in the main header under locations that link to the area marketing pages? 2. What is the best way to link all the sub pages such as London Web Design? Should these links just be coming off the London marketing page? or should we have a sitemap in the footer that lists every page? Thanks
Intermediate & Advanced SEO | | Caffeine_Marketing0 -
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
If I block a URL via the robots.txt - how long will it take for Google to stop indexing that URL?
Intermediate & Advanced SEO | | Gabriele_Layoutweb0 -
Wordpress Blog in 2 languages. How to SEO or structure it?
Hi Moz community, I have got a wordpress blog currently in the spanish language. I want to create the same blog content but in english version. (manually translate it to english instead of using translation service such as Google Translate). How should i structure the blog for SEO? How will it work? Any structure markups i should know about? Any examples? Thanks
Intermediate & Advanced SEO | | WayneRooney0 -
Dilemma about "images" folder in robots.txt
Hi, Hope you're doing well. I am sure, you guys must be aware that Google has updated their webmaster technical guidelines saying that users should allow access to their css files and java-scripts file if it's possible. Used to be that Google would render the web pages only text based. Now it claims that it can read the css and java-scripts. According to their own terms, not allowing access to the css files can result in sub-optimal rankings. "Disallowing crawling of Javascript or CSS files in your site’s robots.txt directly harms how well our algorithms render and index your content and can result in suboptimal rankings."http://googlewebmastercentral.blogspot.com/2014/10/updating-our-technical-webmaster.htmlWe have allowed access to our CSS files. and Google bot, is seeing our webapges more like a normal user would do. (tested it in GWT)Anyhow, this is my dilemma. I am sure lot of other users might be facing the same situation. Like any other e commerce companies/websites.. we have lot of images. Used to be that our css files were inside our images folder, so I have allowed access to that. Here's the robots.txt --> http://www.modbargains.com/robots.txtRight now we are blocking images folder, as it is very huge, very heavy, and some of the images are very high res. The reason we are blocking that is because we feel that Google bot might spend almost all of its time trying to crawl that "images" folder only, that it might not have enough time to crawl other important pages. Not to mention, a very heavy server load on Google's and ours. we do have good high quality original pictures. We feel that we are losing potential rankings since we are blocking images. I was thinking to allow ONLY google-image bot, access to it. But I still feel that google might spend lot of time doing that. **I was wondering if Google makes a decision saying, hey let me spend 10 minutes for google image bot, and let me spend 20 minutes for google-mobile bot etc.. or something like that.. , or does it have separate "time spending" allocations for all of it's bot types. I want to unblock the images folder, for now only the google image bot, but at the same time, I fear that it might drastically hamper indexing of our important pages, as I mentioned before, because of having tons & tons of images, and Google spending enough time already just to crawl that folder.**Any advice? recommendations? suggestions? technical guidance? Plan of action? Pretty sure I answered my own question, but I need a confirmation from an Expert, if I am right, saying that allow only Google image access to my images folder. Sincerely,Shaleen Shah
Intermediate & Advanced SEO | | Modbargains1 -
Baidu Spider appearing on robots.txt
Hi, I'm not too sure what to do about this or what to think of it. This magically appeared in my companies robots.txt file (literally magically appeared/text is below) User-agent: Baiduspider
Intermediate & Advanced SEO | | IceIcebaby
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: / I know that Baidu is the Google of China, but I'm not sure why this would appear in our robots.txt all of a sudden. Should I be worried about a hack? Also, would I want to disallow Baidu from crawling my companies website? Thanks for your help,
-Reed0 -
Wordpress blog in a subdirectory not being indexed by Google
HI MozzersIn my websites sitemap.xml, pages are listed, such as /blog/ and /blog/textile-fact-or-fiction-egyptian-cotton-explained/These pages are visible when you visit them in a browser and when you use the Google Webmaster tool - Fetch as Google to view them (see attachment), however they aren't being indexed in Google, not even the root directory for the blog (/blog/) is being indexed, and when we query:site: www.hilden.co.uk/blog/ It returns 0 results in Google.Also note that:The Wordpress installation is located at /blog/ which is a subdirectory of the main root directory which is managed by Magento. I'm wondering if this causing the problem.Any help on this would be greatly appreciated!AnthonyToTOHuj.png?1
Intermediate & Advanced SEO | | Tone_Agency0 -
Duplicate internal links on page, any benefit to nofollow
Link spam is naturally a hot topic amongst SEO's, particularly post Penguin. While digging around forums etc, I watched a video blog from Matt Cutts posted a while ago that suggests that Google only pays attention to the first instance of a link on the page As most websites will have multiple instances of a links (header, footer and body text), is it beneficial to nofollow the additional instances of the link? Also as the first instance of a link will in most cases be within the header nav, does that then make the content link text critical or can good on page optimisation be pulled from the title attribute? I would appreciate the experiences and thoughts Mozzers thoughts on this thanks in advance!
Intermediate & Advanced SEO | | JustinTaylor880 -
Does Blocking ICMP Requests Affect SEO?
All in the title really. One of our clients came up with errors with a server header check, so I pinged them and it times out. The hosting company have told them that it's because they're blocking ICMP requests and this doesn't affect SEO at all... but I know that sometimes pinging posts, etc... can be beneficial so is this correct? Thanks, Steve.
Intermediate & Advanced SEO | | SteveOllington0