Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Content and Subdirectories
Hi there and thank you in advance for your help! I'm seeking guidance on how to structure a resources directory (white papers, webinars, etc.) while avoiding duplicate content penalties. If you go to /resources on our site, there is filter function. If you filter for webinars, the URL becomes /resources/?type=webinar We didn't want that dynamic URL to be the primary URL for webinars, so we created a new page with the URL /resources/webinar that lists all of our webinars and includes a featured webinar up top. However, the same webinar titles now appear on the /resources page and the /resources/webinar page. Will that cause duplicate content issues? P.S. Not sure if it matters, but we also changed the URLs for the individual resource pages to include the resource type. For example, one of our webinar URLs is /resources/webinar/forecasting-your-revenue Thank you!
Technical SEO | | SAIM_Marketing0 -
Set Canonical for Paginated Content
Hi Guys, This is a follow up on this thread: http://moz.com/community/q/dynamic-url-parameters-woocommerce-create-404-errors# I would like to know how I can set a canonical link in Wordpress/Woocommerce which points to "View All" on category pages on our webshop.
Technical SEO | | jeeyer
The categories on my website can be viewed as 24/48 or All products but because the quanity constantly changes viewing 24 or 48 products isn't always possible. To point Google in the right direction I want to let them know that "View All" is the best way to go.
I've read that Google's crawler tries to do this automatically but not sure if this is the case on on my website. Here is some more info on the issue: https://support.google.com/webmasters/answer/1663744?hl=en
Thanks for the help! Joost0 -
Duplicate content on job sites
Hi, I have a question regarding job boards. Many job advertisers will upload the same job description to multiple websites e.g. monster, gumtree, etc. This would therefore be viewed as duplicate content. What is the best way to handle this if we want to ensure our particular site ranks well? Thanks in advance for the help. H
Technical SEO | | HiteshP0 -
Solving duplicate content with WP authors, tags, categories
I've been kind of neglecting wordpress installations on my websites and noticed many showing duplicate content for pages showing under author and tags, tags and single post, categories and single post. Should this be a concern? Whats the best way of fixing this? Thanks
Technical SEO | | cgman0 -
Cloaking? Best Practices Crawling Content Behind Login Box
Hi- I'm helping out a client, who publishes sale information (fashion sales etc.) In order for the client to view the sale details (date, percentage off etc.) they need to register for the site. If I allow google bot to crawl the content, (identify the user agent) but serve up a registration light box to anyone who isn't google would this be considered cloaking? Does anyone know what the best practice for this is? Any help would be greatly appreciated. Thank you, Nopadon
Technical SEO | | nopadon0 -
How to resolve this Duplicate content?
Hi , There is page i get when i do proper menu navigation Caratlane.com>jewellery>rings>casualsrings> http://www.caratlane.com/jewellery/rings/casual-rings/leaves-dew-diamond-0-03-ct-peridot-1-ct-ring-18k-yellow-gold.html When i do a site search in my search box by my product code number "JR00219" The same page is appears with different url http://www.caratlane.com/leaves-dew-diamond-0-03-ct-peridot-1-ct-ring-18k-yellow-gold.html So there is a duplicate content. How can we resolve it. Regards, kathir caratlane.com
Technical SEO | | kathiravan0 -
Does Google pass link juice a page receives if the URL parameter specifies content and has the Crawl setting in Webmaster Tools set to NO?
The page in question receives a lot of quality traffic but is only relevant to a small percent of my users. I want to keep the link juice received from this page but I do not want it to appear in the SERPs.
Technical SEO | | surveygizmo0 -
Are recipes excluded from duplicate content?
Does anyone know how recipes are treated by search engines? For example, I know press releases are expected to have lots of duplicates out there so they aren't penalized. Does anyone know if recipes are treated the same way. For example, if you Google "three cheese beef pasta shells" you get the first two results with identical content.
Technical SEO | | RiseSEO0