Block parent folder in robots.txt, but not children
-
Example:
I want to block this URL (which shows up in Webmaster Tools as an error):
http://www.siteurl.com/news/events-calendar/usa
But not this:
-
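The idea from Andrew is nice, but my guess would be that you're targeting multiple events, so that might run into issues. robots.txt doesn't support full regular expressions (there is no ^ anchor, and every path is matched from its start anyway), but Google and Bing do honor * as a wildcard and $ as an end-of-URL anchor. So what you could do is anchor the rule like this:
Disallow: /news/events-calendar/usa$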
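To see why an end anchor blocks only the parent URL, here is a minimal Python sketch of how a Google-style robots.txt path pattern (prefix match, * as wildcard, trailing $ as end-of-URL anchor) could be matched. The function names are hypothetical and this is only an illustration of the documented matching rules, not how any crawler is actually implemented.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: the pattern matches from the start of the path,
    '*' matches any run of characters, and a trailing '$' anchors the
    match to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Without '$' the rule is a plain prefix match, so no end anchor is added.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if the robots.txt pattern applies to the URL path."""
    return robots_pattern_to_regex(pattern).match(path) is not None


# The anchored rule hits only the parent URL.
print(rule_matches("/news/events-calendar/usa$", "/news/events-calendar/usa"))
print(rule_matches("/news/events-calendar/usa$", "/news/events-calendar/usa/event-name"))
# Without the anchor, the same rule is a prefix match and also hits children.
print(rule_matches("/news/events-calendar/usa", "/news/events-calendar/usa/event-name"))
```

Keep in mind that not every crawler supports * and $, so a rule like this is only as reliable as the bots that read it.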
-
You could use "allow" in your robots.txt file for just this problem.
Allow: /news/events-calendar/usa/event-name
Disallow: /news/events-calendar/usa
See the allow directive section of this page: https://en.wikipedia.org/wiki/Robots_exclusion_standard
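If you want to sanity-check an Allow/Disallow pair before deploying it, a rough test with Python's standard-library urllib.robotparser could look like the sketch below. Two assumptions here are mine, not from the thread: the "User-agent: *" line, and an Allow rule with a trailing slash so that it covers every child URL rather than a single event. Note that Python's parser applies the first matching rule while Google applies the most specific (longest) one; with Allow listed first, as here, both give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: allow everything under the folder,
# but block the folder URL itself.
ROBOTS_TXT = """\
User-agent: *
Allow: /news/events-calendar/usa/
Disallow: /news/events-calendar/usa
"""


def is_allowed(url: str, robots_txt: str = ROBOTS_TXT, agent: str = "*") -> bool:
    """Parse the given robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


parent = "http://www.siteurl.com/news/events-calendar/usa"
child = "http://www.siteurl.com/news/events-calendar/usa/event-name"

print("parent allowed:", is_allowed(parent))
print("child allowed:", is_allowed(child))
```

This only tells you how Python's parser reads the file, so it is worth confirming the result in Google's own robots.txt testing tools as well.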