The Moz Q&A Forum

Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.

What happens to crawled URLs subsequently blocked by robots.txt?

Intermediate & Advanced SEO
  • AspenFasteners
    AspenFasteners Subscriber last edited by Jun 29, 2021, 4:36 PM

    We have a very large store with 278,146 individual product pages. Since these are all various sizes and packaging quantities of fewer than 200 product categories, my feeling is that Google would be better off making sure our category pages are indexed.

    I would like to block all product pages via robots.txt until we are sure all category pages are indexed, then unblock them. Our product pages rarely change (there are no ratings or product reviews), so there is little reason for a search engine to revisit a product page.

    The sales team is afraid that blocking a previously indexed product page will result in it being removed from the Google index, and would prefer to submit the categories by hand, 10 per day, via requested crawling.

    Which is the better practice?

    • seoelevated
      seoelevated Subscriber @AspenFasteners last edited by Jul 27, 2021, 9:02 PM

      @aspenfasteners To my understanding, disallowing a page or folder in robots.txt does not remove pages from Google's index. It merely gives a directive to not crawl those pages/folders. In fact, when pages are accidentally indexed and one wants to remove them from the index, it is important to actually NOT disallow them in robots.txt, so that Google can crawl those pages and discover the meta NOINDEX tags on the pages. The meta NOINDEX tags are the directive to remove a page from the index, or to not index it in the first place. This is different from a robots.txt directive, which is intended to allow or disallow crawling. Crawling does not equal indexing.

      So, you could keep the pages indexable, and simply block them in your robots.txt file, if you want. If they've already been indexed, they should not disappear quickly (they might, over time though). BUT if they haven't been indexed yet, this would prevent them from being discovered.
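
To illustrate the crawl-versus-index distinction, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for illustration. It only shows what a rule-following crawler is allowed to fetch; it says nothing about what stays in the index.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block product pages, leave categories crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt rules let this agent fetch the URL."""
    return parser.can_fetch(agent, url)


# Assumed URL patterns, not taken from the original site.
product_ok = may_crawl("https://www.example.com/products/hex-bolt-m8-box-100")
category_ok = may_crawl("https://www.example.com/categories/hex-bolts")

# A blocked page is never fetched, so any meta noindex tag on it is never seen.
print(product_ok, category_ok)  # False True
```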

      All of that said, from reading your notes, I don't think any of this is warranted. The speed at which Google discovers pages on a website is very fast. And existing indexed pages shouldn't really get in the way of new discovery. In fact, they might help the category pages be discovered, if they contain links to the categories.

      I would create a category sitemap XML file, link to it in your robots.txt, and let that do the work of prioritizing the categories for crawling/discovery and indexation.
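
A rough sketch of generating such a sitemap with the standard library, assuming you can export your category URLs as a list. The URLs and the `sitemap-categories.xml` file name are placeholders, not anything from the original site.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder category URLs; in practice this would be the ~200 categories.
category_urls = [
    "https://www.example.com/categories/hex-bolts",
    "https://www.example.com/categories/lock-washers",
]

sitemap_xml = build_sitemap(category_urls)
print(sitemap_xml)
```

The sitemap is then referenced from robots.txt with a line such as `Sitemap: https://www.example.com/sitemap-categories.xml` (again a placeholder URL).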

      • terentyev
        terentyev @AspenFasteners last edited by Jul 6, 2021, 7:55 AM

        @aspenfasteners to answer your question: "do we KNOW that Google will immediately de-index URL's blocked by robots.txt?"

        Google will not immediately de-index URLs that are blocked by robots.txt, based on my experience. I've dealt with a very similar situation at a much greater scale: around 8M automatically generated pages that got into the Google index. It may take a year or more to de-index such pages completely. Of course, every case is different, but based on my understanding, if you block these low-quality product pages, Google will slowly start re-evaluating them, starting with the ones that get some traffic.

        Here is what happens when Google re-evaluates your individual product pages:

        When deciding whether to keep a page in its index or not, Google takes into account multiple factors, and one of the most important is how many links (both internal and external) lead to the page. Other factors include content quality, whether the page is similar to or a duplicate of another page, Core Web Vitals score, and the amount of crawl budget you have. External backlinks are irrelevant in your case.

        If you are afraid of losing some of the traffic that comes to these product pages, or you have other concerns, just do a smaller experiment: take a sample of 1000-2000 pages, block them in robots.txt or by adding a meta robots "noindex, follow" directive, and observe Google's reaction over 1-6 weeks, depending on your crawl budget.
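
A sketch of how the sample for such an experiment could be drawn and turned into robots.txt rules. The `/products/sku-N` paths are invented for the example. Since Disallow rules match by prefix, each rule ends with `$`, the end-of-URL anchor that Google's robots.txt parser supports, so that blocking `sku-1` does not also block `sku-10`.

```python
import random


def pick_sample(urls, size=1000, seed=42):
    """Pick a reproducible random sample of URL paths for the experiment."""
    rng = random.Random(seed)
    return sorted(rng.sample(urls, min(size, len(urls))))


def robots_rules(paths):
    """Build robots.txt rules that block exactly the given paths."""
    lines = ["User-agent: *"]
    # '$' anchors the match at the end of the URL (prefix matching otherwise).
    lines += [f"Disallow: {path}$" for path in paths]
    return "\n".join(lines) + "\n"


# Hypothetical product paths standing in for the real catalogue export.
all_products = [f"/products/sku-{i}" for i in range(5000)]

sample = pick_sample(all_products, size=1000)
print(robots_rules(sample[:3]))
```

Keeping the seed fixed means the same pages can be re-identified later when comparing their indexing status and traffic against the unblocked pages.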

        Another thing to check:

        If you use Screaming Frog, it has a nice feature to show internal pagerank and the number of internal incoming links that lead to every page. As a rule of thumb, if an individual product page has at least 10 internal incoming links from canonicalized pages, there is a high probability it will get indexed.
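
If you export the internal link graph as (source, target) pairs, the same rule of thumb can be checked in a few lines. This is only a sketch: the edge-list format is an assumption, not Screaming Frog's actual export layout, and the threshold of 10 is the rule of thumb above rather than a documented Google limit.

```python
from collections import Counter


def inlink_counts(edges):
    """Count distinct internal pages linking to each target URL."""
    unique = {(src, dst) for src, dst in edges if src != dst}
    return Counter(dst for _, dst in unique)


def under_linked(pages, edges, threshold=10):
    """Return pages with fewer than `threshold` internal incoming links."""
    counts = inlink_counts(edges)
    return sorted(page for page in pages if counts[page] < threshold)


# Made-up link graph: ten categories link to product A, one links to product B.
edges = [(f"/categories/c{i}", "/products/a") for i in range(10)]
edges.append(("/categories/c0", "/products/b"))

weak_pages = under_linked(["/products/a", "/products/b"], edges)
print(weak_pages)  # ['/products/b']
```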

        • AspenFasteners
          AspenFasteners Subscriber @terentyev last edited by Jul 5, 2021, 2:07 PM

          @terentyev - sorry, I can't edit my questions once they are submitted and awaiting approval (why?). The statement should read "my question SHOULD be very specific", whereas my original question was much more general, and you answered that question very nicely. Sorry for any misunderstanding.

          • AspenFasteners
            AspenFasteners Subscriber @terentyev last edited by Jul 5, 2021, 1:53 PM

            @terentyev thanks for the reply. We have no reason to believe these URLs have backlinks. These aren't consumer products that individuals are interested in; our site is a wholesale B2B store selling very narrow categories in bulk quantities, typically for manufacturing. Therefore, there is almost zero chance of backlinks anywhere to something as specific as a particular size/material/package quantity of a product.

            We have already started a canonicalization project, but we are stuck between two concerns from sales: 1) we can't wait for canonicalization (which is complex) because we need sales now, and 2) don't touch robots.txt because MAYBE the individual products are indexed.

            So that is why my question is very specific - do we KNOW that Google will immediately de-index URL's blocked by robots.txt?

            • terentyev
              terentyev @AspenFasteners last edited by Jul 4, 2021, 10:37 AM

              @aspenfasteners thanks for the interesting question.
              To summarize my understanding:

              1. you have ~300K individual product pages, many of them duplicates; e.g. a single product can have multiple variants (such as size or quantity) but the pages are essentially the same.
              2. your goal is to index 200 product categories that contain a collection of these products, and to remove the low-quality duplicate individual pages from the Google index in the long run.
              3. my assumption is that these 300K product pages have historically been accumulating some backlinks, which is one of the reasons why they are indexed.

              If I am right about points 1 and 2, then you should not block these individual product pages, but rather add canonical URLs to them, pointing to the respective category page that you want to get indexed.

              Once you have these canonicals implemented, you should wait a few months or more for Google to pass the link equity to your 200 product category pages. Once that is done, you are free to block the product pages with robots.txt plus a meta robots tag on the page itself, and maybe even an X-Robots-Tag header. How exactly to block them is a different discussion. Let me know if you want to learn more about the best approach.
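
For reference, a small sketch of the two mechanisms mentioned above: the canonical link element placed in a product page's head, and the X-Robots-Tag response header. The helper names and URLs are hypothetical, and how you wire them into your templates or server depends on your platform. Keep in mind that Google treats rel=canonical as a hint rather than a command.

```python
from html import escape


def canonical_tag(category_url: str) -> str:
    """Render the <link rel="canonical"> element for a product page's <head>."""
    href = escape(category_url, quote=True)
    return f'<link rel="canonical" href="{href}">'


def noindex_headers() -> dict:
    """HTTP response header equivalent of a meta robots noindex tag."""
    return {"X-Robots-Tag": "noindex"}


# Hypothetical category URL that a product page should point to.
tag = canonical_tag("https://www.example.com/categories/hex-bolts")
print(tag)
print(noindex_headers())
```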

              So, here is my checklist for this URL migration:

              1. add canonicals pointing from product pages to category pages.
              2. make sure that all category pages are well interlinked with each other, and that the individual product pages link to several category pages (e.g. product A should link to category A, and also to similar categories B & C). As a rule of thumb, make sure that each category page has at least 10 incoming links from other category pages.
              3. Make sure that all these category pages are linked from your homepage
              4. Make sure that sitemap contains only self-canonicalized pages.
              5. Make sure that these category pages have good core web vitals metrics, compared to your competitors on SERP.
              6. In 2-3 months, when you see that Google indexes the category pages, crawling of product pages has been reduced significantly, and the ranks of the category pages have gone up, it is ok to block these 300K pages from crawling.
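
Item 4 of the checklist can be sketched as a simple filter, assuming you already have a mapping from each page URL to the canonical URL it declares (for example from a crawl export). The mapping format and URLs here are assumptions for illustration.

```python
def normalize(url: str) -> str:
    """Very light normalization: ignore a trailing slash when comparing."""
    return url.rstrip("/")


def sitemap_candidates(canonicals):
    """Keep only pages whose declared canonical URL is the page itself."""
    return sorted(
        page for page, canonical in canonicals.items()
        if normalize(page) == normalize(canonical)
    )


# Hypothetical crawl result: page URL -> canonical URL declared on that page.
canonicals = {
    "https://www.example.com/categories/hex-bolts":
        "https://www.example.com/categories/hex-bolts/",
    "https://www.example.com/products/hex-bolt-m8-box-100":
        "https://www.example.com/categories/hex-bolts",
}

# Only the self-canonical category page should go into the sitemap.
print(sitemap_candidates(canonicals))
```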

              As for submitting the categories by hand, I doubt it will help, especially if the product pages have a lot of backlinks. I've seen many cases where Google disregards the robots.txt directives if a page has good backlinks and traffic.

              © 2021 - 2025 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.