Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt allows wp-admin/admin-ajax.php
Hello, Mozzers!
Technical SEO | | AndyKubrin
I noticed something peculiar in the robots.txt used by one of my clients: Allow: /wp-admin/admin-ajax.php What would be the purpose of allowing a search engine to crawl this file?
Is it OK? Should I do something about it?
Everything else on /wp-admin/ is disallowed.
Thanks in advance for your help.
-AK:2 -
Collapsible sections - content
**Hi,****I am looking to improve the aesthetics of some pages on my website by adding written content into collapsible tabs. I was wondering whether the content that is ‘hidden’ by tabs is given less weight by Google from the perspective of SEO? **Some articles I have read suggest that tabbed content is weighted equally with the content which is already immediately visible to the user, but others suggest that this may not be the case. **Please, can I request opinions on the matter? Any advice would be greatly appreciated, many thanks.**Katarina
Technical SEO | | Katarina-Borovska0 -
Sudden Indexation of "Index of /wp-content/uploads/"
Hi all, I have suddenly noticed a massive jump in indexed pages. After performing a "site:" search, it was revealed that the sudden jump was due to the indexation of many pages beginning with the serp title "Index of /wp-content/uploads/" for many uploaded pieces of content & plugins. This has appeared approximately one month after switching to https. I have also noticed a decline in Bing rankings. Does anyone know what is causing/how to fix this? To be clear, these pages are **not **normal /wp-content/uploads/ but rather "index of" pages, being included in Google. Thank you.
Technical SEO | | Tom3_150 -
Duplicate Content Issues with Pagination
Hi Moz Community, We're an eCommerce site so we have a lot of pagination issues but we were able to fix them using the rel=next and rel=prev tags. However, our pages have an option to view 60 items or 180 items at a time. This is now causing duplicate content problems when for example page 2 of the 180 item view is the same as page 4 of the 60 item view. (URL examples below) Wondering if we should just add a canonical tag going to the the main view all page to every page in the paginated series to get ride of this issue. https://www.example.com/gifts/for-the-couple?view=all&n=180&p=2 https://www.example.com/gifts/for-the-couple?view=all&n=60&p=4 Thoughts, ideas or suggestions are welcome. Thanks
Technical SEO | | znotes0 -
Z-indexed content
I have some content on a page that I am not using any type of css hiding techniques, but I am using an image with a higher z-index in order to prevent the text from being seen until a user clicks a link to have the content scroll down. Are there any negative repercussions for doing this in regards to SEO?
Technical SEO | | cokergroup0 -
SEO for User Authenticated Content
Hi Everyone - I have a potential client who is seeking SEO for a site that contains about 95% of content only accessible through user authentication . Does anyone have tips for getting this indexed without having to open it up to the public? I was considering adding "snippets" into the robots.txt or creating an additional page with snippets linking to the login page. I'd appreciate any thoughts! Thanks!
Technical SEO | | manutx0 -
Duplicate Content and URL Capitalization
I have multiple URLs that SEOMoz is reporting as duplicate content. The reason is that there are characters in the URL that may, or may not, be capitalized depending on user input. A couple examples are: www.househitz.com/Pennsylvania/Houses-for-sale www.househitz.com/Pennsylvania/houses-for-sale www.househitz.com/Pennsylvania/Houses-for-rent www.househitz.com/Pennsylvania/houses-for-rent There are currently thousands of instances of this on the site. Is this something I should spend effort to try and resolve (may not be minor effort), or should I just ignore it and move on?
Technical SEO | | Jom0 -
Cloaking? Best Practices Crawling Content Behind Login Box
Hi- I'm helping out a client, who publishes sale information (fashion sales etc.) In order for the client to view the sale details (date, percentage off etc.) they need to register for the site. If I allow google bot to crawl the content, (identify the user agent) but serve up a registration light box to anyone who isn't google would this be considered cloaking? Does anyone know what the best practice for this is? Any help would be greatly appreciated. Thank you, Nopadon
Technical SEO | | nopadon0