Skip to content
    Moz logo Menu open Menu close
    • Products
      • Moz Pro
      • Moz Pro Home
      • Moz Local
      • Moz Local Home
      • STAT
      • Moz API
      • Moz API Home
      • Compare SEO Products
      • Moz Data
    • Free SEO Tools
      • Domain Analysis
      • Keyword Explorer
      • Link Explorer
      • Competitive Research
      • MozBar
      • More Free SEO Tools
    • Learn SEO
      • Beginner's Guide to SEO
      • SEO Learning Center
      • Moz Academy
      • SEO Q&A
      • Webinars, Whitepapers, & Guides
    • Blog
    • Why Moz
      • Agency Solutions
      • Enterprise Solutions
      • Small Business Solutions
      • Case Studies
      • The Moz Story
      • New Releases
    • Log in
    • Log out
    • Products
      • Moz Pro

        Your all-in-one suite of SEO essentials.

      • Moz Local

        Raise your local SEO visibility with complete local SEO management.

      • STAT

        SERP tracking and analytics for enterprise SEO experts.

      • Moz API

        Power your SEO with our index of over 44 trillion links.

      • Compare SEO Products

        See which Moz SEO solution best meets your business needs.

      • Moz Data

        Power your SEO strategy & AI models with custom data solutions.

      NEW Keyword Suggestions by Topic
      Moz Pro

      NEW Keyword Suggestions by Topic

      Learn more
    • Free SEO Tools
      • Domain Analysis

        Get top competitive SEO metrics like DA, top pages and more.

      • Keyword Explorer

        Find traffic-driving keywords with our 1.25 billion+ keyword index.

      • Link Explorer

        Explore over 40 trillion links for powerful backlink data.

      • Competitive Research

        Uncover valuable insights on your organic search competitors.

      • MozBar

        See top SEO metrics for free as you browse the web.

      • More Free SEO Tools

        Explore all the free SEO tools Moz has to offer.

      NEW Keyword Suggestions by Topic
      Moz Pro

      NEW Keyword Suggestions by Topic

      Learn more
    • Learn SEO
      • Beginner's Guide to SEO

        The #1 most popular introduction to SEO, trusted by millions.

      • SEO Learning Center

        Broaden your knowledge with SEO resources for all skill levels.

      • On-Demand Webinars

        Learn modern SEO best practices from industry experts.

      • How-To Guides

        Step-by-step guides to search success from the authority on SEO.

      • Moz Academy

        Upskill and get certified with on-demand courses & certifications.

      • SEO Q&A

        Insights & discussions from an SEO community of 500,000+.

      Unlock flexible pricing & new endpoints
      Moz API

      Unlock flexible pricing & new endpoints

      Find your plan
    • Blog
    • Why Moz
      • Small Business Solutions

        Uncover insights to make smarter marketing decisions in less time.

      • Agency Solutions

        Earn & keep valuable clients with unparalleled data & insights.

      • Enterprise Solutions

        Gain a competitive edge in the ever-changing world of search.

      • The Moz Story

        Moz was the first & remains the most trusted SEO company.

      • Case Studies

        Explore how Moz drives ROI with a proven track record of success.

      • New Releases

        Get the scoop on the latest and greatest from Moz.

      Surface actionable competitive intel
      New Feature

      Surface actionable competitive intel

      Learn More
    • Log in
      • Moz Pro
      • Moz Local
      • Moz Local Dashboard
      • Moz API
      • Moz API Dashboard
      • Moz Academy
    • Avatar
      • Moz Home
      • Notifications
      • Account & Billing
      • Manage Users
      • Community Profile
      • My Q&A
      • My Videos
      • Log Out

    The Moz Q&A Forum

    • Forum
    • Questions
    • Users
    • Ask the Community

    Welcome to the Q&A Forum

    Browse the forum for helpful insights and fresh discussions about all things SEO.

    1. Home
    2. SEO Tactics
    3. Technical SEO
    4. Allow or Disallow First in Robots.txt

    Moz Q&A is closed.

    After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.

    Allow or Disallow First in Robots.txt

    Technical SEO
    7
    12
    30756
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as question
    Log in to reply
    This topic has been deleted. Only users with question management privileges can see it.
    • irvingw
      irvingw last edited by

      If I want to override a Disallow directive in robots.txt with an Allow command, do I have the Allow command before or after the Disallow command?

      example:

      Allow: /models/ford///page*

      Disallow: /models////page

      1 Reply Last reply Reply Quote 0
      • Net66SEO
        Net66SEO last edited by

        Just caught this a bit late and probably to late to add something but my two pence is test it in Webmaster Tools, via Crawl -> Robot.txt tester - if you've not used this before simply add the url you want to test and Google highlights the directive that allows or disallows it.

        1 Reply Last reply Reply Quote 0
        • fablau
          fablau @Cyrus-Shepard last edited by

          Thank you Cyrus, yes, I have tried your suggested robots.txt checker and despite it validates the file, it shows me a couple of warnings about the "unusual" use of wildcard. It is my understanding that I would probably need to discuss all this with Google folks directly.

          Thank you for you answer... and, yes Keri, I know this is a old thread, but still useful today!

          Thanks 🙂

          1 Reply Last reply Reply Quote 0
          • Cyrus-Shepard
            Cyrus-Shepard @fablau last edited by

            Can't say with 100% confidence, but sounds like it might work. You could always upload it to a server and use a robots.txt checker to validate, although sometimes the validator tools may incorporate slight differences in edge cases like this that make them moot.

            fablau 1 Reply Last reply Reply Quote 1
            • KeriMorgret
              KeriMorgret @fablau last edited by

              Just a quick note, this question is actually from spring of 2012.

              1 Reply Last reply Reply Quote 0
              • fablau
                fablau last edited by

                What about something like:

                allow: /directory/$

                disallow: /directory/*

                Where I want this to be indexed:

                http://www.mysite.com/directory/

                But not this:

                http://www.mysite.com/directory/sub-directory/

                Ideas?

                KeriMorgret Cyrus-Shepard 2 Replies Last reply Reply Quote 0
                • irvingw
                  irvingw @Cyrus-Shepard last edited by

                  I really appreciate all that effort you put in to ensure your method was correct. many thanks.

                  1 Reply Last reply Reply Quote 0
                  • Cyrus-Shepard
                    Cyrus-Shepard last edited by

                    Interesting question - I've had this discussion a couple of times with different SEOs. Here's my best understanding: There are actually 2 different answers - one if you are talking about Google, and one for every other search engine.

                    For most search engines, the "Allow" should come first. This is because the first matching pattern always wins, for the reasons Geoff stated.

                    But Google is different. They state:

                    "At a group-member level, in particular for allow and disallow directives, the most specific rule based on the length of the [path] entry will trump the less specific (shorter) rule. The order of precedence for rules with wildcards is undefined."

                    Robots.txt Specifications - Webmasters — Google Developers

                    So for Google, order is not important, only the specificity of the rule based on the length of the entry. But the order of precedence for rules with wildcards is undefined.

                    This last part is important, because your directives contain wildcards. If I'm reading this right, your particular directives:

                    Allow: /models/ford///page*

                    Disallow: /models////pageSo if it's "undefined" which directive will Google follow, if order isn't important? Fortunately, there's a simple way to find out.Google Webmaster allows you to test any robots.txt file. I created a dummy file based on your rules, In this case, your directives worked perfectly no matter what order I put them in.

                    | http://cyrusshepard.com/models/ford/test/test/pages | Allowed by line 2: Allow: /models/ford///page* | Allowed by line 2: Allow: /models/ford///page* |
                    | http://cyrusshepard.com/models/chevy/test/test/pages | Blocked by line 3: Disallow: /models////page | Blocked by line 3: Disallow: /models////page |

                    So, to summarize:1. Always put Allow directives first, as most search engines follow the "first rule counts" rule.2. Google doesn't care about order, but rather the specificity based on the length of the entry.3. The order of precedence for rules with wildcards is undefined.4. When in doubt, check your robots.txt file in Google Webmaster tools.Hope this helps.(sorry for the very long answer which basically says you were right all along 🙂

                    irvingw 1 Reply Last reply Reply Quote 3
                    • NakulGoyal
                      NakulGoyal @irvingw last edited by

                      I understand your concern. I am basing my answer based on the fact that if you don't have a robots.txt at all, Google will still crawl you, which means its an allow by default. So all that matters in my opinion is the disallow, but because you need an allow from the wildcard disallow, you could allow that and disallow next.

                      Honestly, I don't think it matters. If you think the way a bot would work, it's not like robots.txt 1 line is read, then the bot goes crawling and then comes back reads the next line and so on. Does that make sense ? It reads all the lines in the robots.txt and then follows the directives. But to be sure, you can do either of the scenarios and see for yourself. I am sure the results would be same either way.

                      1 Reply Last reply Reply Quote 1
                      • zigojacko
                        zigojacko last edited by

                        The allow directives need to come before the disallow directives for the same directory/file paths. (I have never personally tested this although it makes logical sense to instruct a robot to access one particular path within a directory structure before it sees that it is blocked from crawling that directory).

                        For example:-

                        Allow: /profiles

                        Disallow: /s2/profiles/me

                        Allow: /s2/profiles

                        Allow: /s2/photos

                        Allow: /s2/static

                        Disallow: /s2

                        As per how Google have formatted their robots.txt.

                        1 Reply Last reply Reply Quote 2
                        • irvingw
                          irvingw @NakulGoyal last edited by

                          Thanks. I want to make sure I get this right in a syntax universally understood by all engines. I have seen webmasters all over the place on this one with some saying that crawlers use a first matching rule and others that say that crawlers use a last matching rule. I am almost thinking to have the allow command twice - before and after, to cover all bases.

                          NakulGoyal 1 Reply Last reply Reply Quote 0
                          • NakulGoyal
                            NakulGoyal last edited by

                            I don't think it matters, but I think I would disallow first, because by default everything is an Allow.

                            irvingw 1 Reply Last reply Reply Quote 0
                            • 1 / 1
                            • First post
                              Last post

                            Got a burning SEO question?

                            Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.


                            Start my free trial


                            Browse Questions

                            Explore more categories

                            • Moz Tools

                              Chat with the community about the Moz tools.

                            • SEO Tactics

                              Discuss the SEO process with fellow marketers

                            • Community

                              Discuss industry events, jobs, and news!

                            • Digital Marketing

                              Chat about tactics outside of SEO

                            • Research & Trends

                              Dive into research and trends in the search industry.

                            • Support

                              Connect on product support and feature requests.

                            • See all categories

                            Related Questions

                            • LabeliumUSA

                              Robot.txt : How to block a specific file type in several subdirectories ?

                              Hello everyone ! I need help setting up a robot.txt. I'm trying to block all pdf files in particular directories so I'm using this command. In the example below the line is blocking all .gif in the entire site. Block files of a specific file type (for example, .gif) | Disallow: /*.gif$ 2 questions : Can I use this command to specify one particular directory in which I want to block pdf files ? Will this line be recognized by googlebots ? Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$ Then I realized that I would have to write as many lines as many directories there are in which I want to block pdf files. Let's say I want to block pdf files in all these 3 directories /fileadmin/directory1 /fileadmin/directory1/sub1 /fileadmin/directory1/sub1/pdf Is there a pattern-matching rule I could use to blocks access to pdf files in all subdirectories instead of writing 3x the above line for each subdirectory ? For exemple : Disallow: /fileadmin/directory1*/ Many thanks in advance for any insight you may have.

                              Technical SEO | | LabeliumUSA
                              0
                            • kcb8178

                              Is there a limit to how many URLs you can put in a robots.txt file?

                              We have a site that has way too many urls caused by our crawlable faceted navigation.  We are trying to purge 90% of our urls from the indexes.  We put no index tags on the url combinations that we do no want indexed anymore, but it is taking google way too long to find the no index tags.  Meanwhile we are getting hit with excessive url warnings and have been it by Panda. Would it help speed the process of purging urls if we added the urls to the robots.txt file?  Could this cause any issues for us?  Could it have the opposite effect and block the crawler from finding the urls, but not purge them from the index? The list could be in excess of 100MM urls.

                              Technical SEO | | kcb8178
                              0
                            • Mark_Ginsberg

                              Blocking Affiliate Links via robots.txt

                              Hi, I work with a client who has a large affiliate network pointing to their domain which is a large part of their inbound marketing strategy. All of these links point to a subdomain of affiliates.example.com, which then redirects the links through a 301 redirect to the relevant target page for the link. These links have been showing up in Webmaster Tools as top linking domains and also in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have added a block on the robots.txt of the affiliates.example.com subdomain, blocking search engines from crawling the full subddomain. The robots.txt file is the following code: User-agent: * Disallow: / We have authenticated the subdomain with Google Webmaster Tools and made certain that Google can reach and read the robots.txt file. We know they are being blocked from reading the affiliates subdomain. However, we added this affiliates subdomain block a few weeks ago to the robots.txt, but links are still showing up in the latest downloads report as first being discovered after we added the block. It's been a few weeks already, and we want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site. Any suggestions or clarification would be helpful - if the subdomain is being blocked for the search engines, why are the search engines following the links and reporting them in the www.example.com subdomain GWMT account as latest links. And if the block is implemented properly, will the total number of links pointing to our site  as reported in the links to your site section be reduced, or does this not have an impact on that figure?From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate linking connection from a 301 to a 302, which is why we decided to go with this option.Any help you can offer will be greatly appreciated.Thanks,Mark

                              Technical SEO | | Mark_Ginsberg
                              0
                            • TalkInThePark

                              Googlebot does not obey robots.txt disallow

                              Hi Mozzers! We are trying to get Googlebot to steer away from our internal search results pages by adding a parameter "nocrawl=1" to facet/filter links and then robots.txt disallow all URLs containing that parameter. We implemented this late august and since that, the GWMT message "Googlebot found an extremely high number of URLs on your site", stopped coming. But today we received yet another. The weird thing is that Google gives many of our nowadays robots.txt disallowed URLs as examples of URLs that may cause us problems. What could be the reason? Best regards, Martin

                              Technical SEO | | TalkInThePark
                              0
                            • MirandaP

                              How to allow googlebot past paywall

                              Does anyone know of any ways or ideas to allow Google/Bing etc. to index your content, but have it behind a paywall for users?

                              Technical SEO | | MirandaP
                              0
                            • joshcanhelp

                              Invisible robots.txt?

                              So here's a weird one... Client comes to me for some simple changes, turns out there are some major issues with the site, one of which is that none of the correct content pages are showing up in Google, just ancillary (outdated) ones. Looks like an issue because even the main homepage isn't showing up with a "site:domain.com" So, I add to Webmaster Tools and, after an hour or so, I get the red bar of doom, "robots.txt is blocking important pages." I check it out in Webmasters and, sure enough, it's a "User agent: * Disallow /" ACK! But wait... there's no robots.txt to be found on the server. I can go to domain.com/robots.txt and see it but nothing via FTP. I upload a new one and, thankfully, that is now showing but I've never seen that before. Question is: can a robots.txt file be stored in a way that can't be seen? Thanks!

                              Technical SEO | | joshcanhelp
                              0
                            • nicole.healthline

                              Is blocking RSS Feeds with robots.txt necessary?

                              Is it necessary to block an rss feed with robots.txt? It seems they are automatically not indexed (http://googlewebmastercentral.blogspot.com/2007/12/taking-feeds-out-of-our-web-search.html) And, google says here that it's important not to block RSS feeds (http://googlewebmastercentral.blogspot.com/2009/10/using-rssatom-feeds-to-discover-new.html) I'm just checking!

                              Technical SEO | | nicole.healthline
                              0
                            • JordanJudson

                              Should I set up a disallow in the robots.txt for catalog search results?

                              When the crawl diagnostics came back for my site its showing around 3,000 pages of duplicate content. Almost all of them are of the catalog search results page. I also did a site search on Google and they have most of the results pages in their index too. I think I should just disallow the bots in the /catalogsearch/ sub folder, but I'm not sure if this will have any negative effect?

                              Technical SEO | | JordanJudson
                              0

                            Get started with Moz Pro!

                            Unlock the power of advanced SEO tools and data-driven insights.

                            Start my free trial
                            Products
                            • Moz Pro
                            • Moz Local
                            • Moz API
                            • Moz Data
                            • STAT
                            • Product Updates
                            Moz Solutions
                            • SMB Solutions
                            • Agency Solutions
                            • Enterprise Solutions
                            Free SEO Tools
                            • Domain Authority Checker
                            • Link Explorer
                            • Keyword Explorer
                            • Competitive Research
                            • Brand Authority Checker
                            • Local Citation Checker
                            • MozBar Extension
                            • MozCast
                            Resources
                            • Blog
                            • SEO Learning Center
                            • Help Hub
                            • Beginner's Guide to SEO
                            • How-to Guides
                            • Moz Academy
                            • API Docs
                            About Moz
                            • About
                            • Team
                            • Careers
                            • Contact
                            Why Moz
                            • Case Studies
                            • Testimonials
                            Get Involved
                            • Become an Affiliate
                            • MozCon
                            • Webinars
                            • Practical Marketer Series
                            • MozPod
                            Connect with us

                            Contact the Help team

                            Join our newsletter
                            Moz logo
                            © 2021 - 2025 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.
                            • Accessibility
                            • Terms of Use
                            • Privacy

                            Looks like your connection to Moz was lost, please wait while we try to reconnect.