    Moz Q&A is closed.

    After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

    Robots.txt: how to exclude sub-directories correctly?

    Intermediate & Advanced SEO
    • fablau
      fablau last edited by

      Hello here,

      I am trying to figure out the correct way to tell search engines to crawl this:

      http://www.mysite.com/directory/

      But not this:

      http://www.mysite.com/directory/sub-directory/

      or this:

      http://www.mysite.com/directory/sub-directory2/sub-directory/...

      But because I have thousands of sub-directories with almost infinite combinations, I can't list the following rules in any manageable way:

      disallow: /directory/sub-directory/

      disallow: /directory/sub-directory2/

      disallow: /directory/sub-directory/sub-directory/

      disallow: /directory/sub-directory2/subdirectory/

      etc...

      I would end up having thousands of definitions to disallow all the possible sub-directory combinations.

      So, is the following a correct and shorter way to express what I want above?

      allow: /directory/$

      disallow: /directory/*

      Would the above work?

      Any thoughts are very welcome! Thank you in advance.

      Best,

      Fab.

      • MickEdwards
        MickEdwards @sjunaidali last edited by

        I mentioned both: add a meta robots noindex tag to the page and remove it from the sitemap.

        • sjunaidali
          sjunaidali @MickEdwards last edited by

          But Google is still free to index a link/page even if it is not included in the XML sitemap.

          • MickEdwards
            MickEdwards @sjunaidali last edited by

            Install the Yoast WordPress SEO plugin and use it to restrict what is indexed and what is included in the sitemap.

            • sjunaidali
              sjunaidali @MickEdwards last edited by

              I am using WordPress with the Enfold theme (ThemeForest).

              I want some files to be accessible to Google, but they should not be indexed.

              Here is an example: http://prntscr.com/h8918o

              I have currently blocked some JS directories/files using robots.txt (see screenshot).

              But because of this I cannot pass Google's Mobile-Friendly Test: http://prntscr.com/h8925z (see screenshot)

              Is it possible to allow access but use something like noindex in the robots.txt file? Or is there another way to do this?

              • fablau
                fablau last edited by

                Yes, everything looks good. Webmaster Tools gave me the expected results with the following directives:

                allow: /directory/$

                disallow: /directory/*

                Which allows this URL:

                http://www.mysite.com/directory/

                But doesn't allow the following one:

                http://www.mysite.com/directory/sub-directory2/...

                This page also gives an update similar to mine:

                https://support.google.com/webmasters/answer/156449?hl=en

                I think I am good! Thanks 🙂
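The result reported above can be illustrated with a small Python sketch. This is not Google's parser. It only assumes the precedence rule Google documents for its own crawler: the longest matching pattern wins, and when an Allow and a Disallow pattern tie in length, the less restrictive Allow wins. The names pattern_to_regex, is_allowed and RULES are hypothetical, introduced purely for illustration.

```python
import re

# Hypothetical rule set copied from the directives discussed in this thread.
RULES = [
    ("allow", "/directory/$"),
    ("disallow", "/directory/*"),
]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end
    of the URL path. Everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if `path` may be crawled under `rules`.

    Assumed precedence: the longest matching pattern wins, a tie goes
    to 'allow', and a path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like a prefix match.
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed
```

Under these assumptions, is_allowed("/directory/", RULES) is True, while any deeper path such as "/directory/sub-directory/" is False. With only the single rule ("disallow", "/directory/*"), the call for "/directory/" also returns False, which is why the "allow: /directory/$" line is needed to keep the directory index crawlable.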

                • fablau
                  fablau last edited by

                  Thank you Michael, it is my understanding then that my idea of doing this:

                  allow: /directory/$

                  disallow: /directory/*

                  Should work just fine. I will test it within Google Webmaster Tools, and let you know if any problems arise.

                  In the meantime, if anyone else has more ideas about this and can confirm it, that would be great!

                  Thank you again.

                  • MickEdwards
                    MickEdwards @fablau last edited by

                    I've always stuck to Disallow and followed this advice:

                    "This is currently a bit awkward, as there is no "Allow" field. The easy way is to put all files to be disallowed into a separate directory, say "stuff", and leave the one file in the level above this directory:"

                    http://www.robotstxt.org/robotstxt.html

                    This row from the table at https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt seems contradictory:

                    /* : Equivalent to "/" -- the trailing wildcard is ignored.

                    I think this post will be very useful for you: http://moz.com/community/q/allow-or-disallow-first-in-robots-txt

                    • fablau
                      fablau @MickEdwards last edited by

                      Thank you Michael,

                      Google and other search engines actually do recognize the "allow:" directive:

                      https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

                      The fact is: if I don't specify that, how can I be sure that the following single command:

                      disallow: /directory/*

                      Doesn't prevent search engines from crawling the /directory/ index, which I do want crawled?

                      • MickEdwards
                        MickEdwards last edited by

                        As long as you don't have directories somewhere under /* that you want indexed, I think that will work. The original robots.txt standard has no "Allow" field, so you don't need the first line, just:

                        disallow: /directory/*

                        You can test it here: https://support.google.com/webmasters/answer/156449?rd=1
