Moz Q&A is closed.

After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies.

Crawled page count in Search console

Intermediate & Advanced SEO
  • Bob_van_Biezen
    Bob_van_Biezen last edited by Mar 7, 2016, 10:49 AM

    Hi Guys,

    I'm working on a project (premium-hookahs.nl) where I stumble upon a situation I can’t address. Attached is a screenshot of the crawled pages in Search Console.

    History:

    Due to technical difficulties, this webshop didn't always noindex its filter pages, which resulted in thousands of duplicate pages. In reality, this webshop has fewer than 1,000 individual pages. We took the following steps to resolve this:

    1. Noindex the filter pages.
    2. Exclude those filter pages in Search Console and robots.txt.
    3. Canonicalize the filter pages to the relevant category pages.

    However, this didn't result in Google crawling fewer pages. Although the implementation wasn't always sound (technical problems during updates), I'm sure this setup has been in place for the last two weeks. I expected the number of crawled pages to drop, but it is still sky high. I can't imagine Google needs to crawl this whole site 40 times a day.

    To complicate the situation:

    We're running an experiment to gain positions on around 250 long-tail searches. A few filters (size, color, number of hoses, and flavor) will be indexed, and three of them can be combined. This results in around 250 extra pages. The meta titles, descriptions, H1s, and body text of these pages are unique as well.
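To see how combinable filters multiply into pages, here is a minimal sketch that enumerates a whitelist of indexable filter URLs. The filter values, the `/filter/...` URL scheme, and the reading of "three of them can be combined" as "up to three filters per URL" are all assumptions for illustration; with these hypothetical values it yields 83 pages, not the shop's ~250.

```python
from itertools import combinations, product

# Hypothetical filter values; the real shop's options are not given in the post.
FILTERS = {
    "size": ["small", "medium", "large"],
    "color": ["black", "gold"],
    "hoses": ["1", "2"],
    "flavor": ["mint", "apple"],
}

MAX_COMBINED = 3  # assumption: at most three filters may be combined on one URL


def indexable_filter_pages(filters, max_combined):
    """Enumerate filter URLs: one value per filter type, up to max_combined types."""
    pages = []
    names = sorted(filters)  # fixed order, so each filter set maps to exactly one URL
    for k in range(1, max_combined + 1):
        for chosen in combinations(names, k):
            for values in product(*(filters[name] for name in chosen)):
                # Hypothetical URL scheme: /filter/<name>-<value>/<name>-<value>/
                path = "/".join(f"{n}-{v}" for n, v in zip(chosen, values))
                pages.append(f"/filter/{path}/")
    return pages


pages = indexable_filter_pages(FILTERS, MAX_COMBINED)
print(len(pages))  # 9 single + 30 pairs + 44 triples = 83
```

Sorting the filter names gives every filter set exactly one URL; without a fixed order, `/size-large/color-black/` and `/color-black/size-large/` would be two URLs with the same content, which is the duplicate-page problem described above.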

    Questions:

    1. Excluding pages in robots.txt should result in Google not crawling them, right?
    2. Is this number of crawled pages normal for a website with around 1,000 unique pages?
    3. What am I missing?


    • donford
      donford @Bob_van_Biezen last edited by Mar 8, 2016, 11:23 AM

      Bob,

      I doubt crawlers fetch the robots.txt file for every request, but they still have to validate every URL they find against the list of blocked ones.
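Don's point can be illustrated with Python's standard `urllib.robotparser`: the rules are parsed once, but every discovered URL is still checked against them before it is fetched. The robots.txt content, the `/filter/` path, and the example URLs are hypothetical, and this is a simplified model, not how Googlebot is actually implemented. Note that this parser does plain prefix matching and does not understand the `*` and `$` wildcards Google supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the shop's real rules and URL scheme are not shown in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # rules are parsed once, not re-fetched per URL

# URLs a crawler might discover while following internal links.
discovered = [
    "https://example.com/hookahs/",
    "https://example.com/filter/color-black/",
    "https://example.com/filter/size-large/color-gold/",
    "https://example.com/contact/",
]

to_fetch, skipped = [], []
for url in discovered:
    # Every discovered URL is still validated against the Disallow rules.
    if parser.can_fetch("Googlebot", url):
        to_fetch.append(url)
    else:
        skipped.append(url)

print("fetch:", to_fetch)
print("skip:", skipped)
```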

      Glad to help,

      Don

      • Bob_van_Biezen
        Bob_van_Biezen @donford last edited by Mar 8, 2016, 11:18 AM

        Hi Don,

        Thanks for the clear explanation. I always thought a Disallow in robots.txt would give Google a sort of map (at the start of a site crawl) of the pages on the site that shouldn't be crawled, so it wouldn't have to “check the locked cars”.

        If I understand you correctly, Google checks robots.txt on every single page load?

        That could definitely explain the high number of crawled pages per day.

        Thanks a lot!

        • donford
          donford @Bob_van_Biezen last edited by Mar 8, 2016, 11:35 AM (posted Mar 8, 2016, 10:52 AM)

          Hi Bob,

          About nofollow vs. blocking: in the end I suppose you get the same result, but in practice they work a little differently. When you nofollow a link, it tells the crawler, as soon as it encounters the link, not to request or follow that link path. When you block it via robots.txt, the crawler still attempts to access the URL, only to find it inaccessible.
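A simplified model of the distinction Don draws: a crawler that honors `rel="nofollow"` can drop those links at discovery time, while the remaining links are queued and only then checked against robots.txt. This uses the standard-library `html.parser`; the markup and paths are hypothetical, and real search engines' handling of nofollow is not public in this detail.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects hrefs from <a> tags, separating rel="nofollow" links (simplified crawler model)."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "nofollow noopener".
        rel = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel:
            self.nofollow.append(href)  # dropped at discovery: never requested
        else:
            self.follow.append(href)  # queued; robots.txt is checked before fetching


page_html = """
<a href="/hookahs/">Hookahs</a>
<a href="/filter/color-black/" rel="nofollow">Black</a>
<a href="/filter/size-large/" rel="nofollow noopener">Large</a>
"""

collector = LinkCollector()
collector.feed(page_html)
print(collector.follow)
print(collector.nofollow)
```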

          Imagine I told you to go to the parking lot and collect the loose change from all the unlocked cars. Now imagine how much easier that task would be if all the locked cars had a sign in the window saying "Locked": you could ignore the locked cars and go straight to the unlocked ones. Without the sign, you would have to physically check each car to see if it opens.

          About link juice: if you have a link, juice will be passed regardless of the type of link. (You used to be able to use nofollow to preserve link juice, but no longer.) This is a bit unfortunate for sites that use search filters, because filters are such a valuable tool for users.

          Don

          • Bob_van_Biezen
            Bob_van_Biezen @donford last edited by Mar 8, 2016, 10:30 AM

            Hi Don,

            You're right about the sitemap, noted it on the to do list!

            Your point about nofollow is interesting. Doesn't excluding pages in robots.txt give the same result?

            Before we switched to robots.txt, we didn't implement nofollow because we didn't want to lose any link juice. Now that we have robots.txt, I assume this doesn't matter anymore, since Google won't crawl those pages anyway.

            Best regards,

            Bob

            • donford
              donford last edited by Mar 8, 2016, 9:56 AM

              Hi Bob,

              You can "suggest" a crawl rate to Google by logging into Google Webmaster Tools and adjusting it there.

              As for indexing pages: I looked at your robots.txt and site. It really looks like you need to apply nofollow to some of your internal links, specifically the product page filters. That alone could reduce the total number of URLs the crawlers even attempt to look at.

              Additionally, your sitemap http://premium-hookahs.nl/sitemap.xml shows a change frequency of daily, and it should probably be split into pages and images, so you end up with two sitemaps: one for images and one for pages. You may also want to review what is in there. Using Screaming Frog (free), the sitemap I generated only shows about 100 URLs.
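A minimal sketch of Don's suggestion to split the sitemap into one for pages and one for images, using the standard-library `xml.etree.ElementTree`. The URLs and the `weekly` change frequency are placeholder assumptions; the namespaces are the standard sitemap namespace and Google's image sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)  # serialize sitemap tags without a prefix
ET.register_namespace("image", IMAGE_NS)  # serialize image tags as image:...


def page_sitemap(urls, changefreq="weekly"):
    """Build a pages-only sitemap; 'weekly' is an assumed default, not a recommendation."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc in urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


def image_sitemap(images_by_page):
    """Build an image sitemap: each page URL lists the image URLs it contains."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page, images in images_by_page.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page
        for src in images:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = src
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs for illustration.
pages_xml = page_sitemap(["https://example.com/", "https://example.com/hookahs/"])
images_xml = image_sitemap(
    {"https://example.com/hookahs/": ["https://example.com/img/hookah-1.jpg"]}
)
print(pages_xml)
print(images_xml)
```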

              Hope it helps,

              Don

              • Bob_van_Biezen
                Bob_van_Biezen @donford last edited by Mar 8, 2016, 9:05 AM

                Hi Don,

                Just wanted to add a quick note: your input made me go through the indexation state of the website again, and it was worse than I thought. I will take some steps to get this resolved, thanks!

                Would love to hear your input about the number of crawled pages.

                Best regards,

                Bob

                • Bob_van_Biezen
                  Bob_van_Biezen @donford last edited by Mar 8, 2016, 7:25 AM

                  Hello Don,

                  Thanks for your advice. What would you recommend if the main goal were to reduce the number of pages crawled per day? I think we have the right pages in the index, and the old duplicates are mostly deindexed. At this point I'm mostly concerned with Google spending its crawl budget on the right pages. Somehow it still crawls 40,000 pages per day, while we only have around 1,000 pages that should be crawled. Given the current setup (with almost everything excluded through robots.txt), I can't think which pages it could be crawling to reach 40k. Crawling the site 40 times a day sounds like far too much for a normal webshop.
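One way to find out which URLs make up those 40,000 daily requests is to count Googlebot hits in the server's access log. A minimal sketch, assuming the common Apache/Nginx "combined" log format and grouping hits by first path segment; the sample log lines are made up. The user-agent string can be spoofed, so in practice you would also verify that the requests really come from Google (e.g. via a reverse DNS lookup).

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined Log Format: ip ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits_by_section(lines):
    """Count Googlebot requests per first path segment (query strings are ignored)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue  # unparseable line or not (claiming to be) Googlebot
        path = urlsplit(m.group("path")).path
        section = "/" + path.strip("/").split("/")[0]
        counts[section] += 1
    return counts


GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'66.249.66.1 - - [08/Mar/2016:07:25:00 +0100] "GET /filter/color-black/?page=2 HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [08/Mar/2016:07:25:01 +0100] "GET /filter/size-large/ HTTP/1.1" 200 4800 "-" "{GOOGLEBOT_UA}"',
    f'66.249.66.1 - - [08/Mar/2016:07:25:02 +0100] "GET /hookahs/ HTTP/1.1" 200 9000 "-" "{GOOGLEBOT_UA}"',
    '203.0.113.5 - - [08/Mar/2016:07:25:03 +0100] "GET /hookahs/ HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

counts = googlebot_hits_by_section(SAMPLE)
print(counts.most_common())
```

A compliant crawler should not request URLs it is disallowed from, so sections that show up here with high counts are where the crawl is actually going.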

                  Hope to hear from you!

                  • donford
                    donford last edited by Mar 7, 2016, 4:02 PM

                    Hello Bob,

                    Here is some food for thought. If you disallow a page in robots.txt, Google, for example, will not crawl that page. However, that does not mean it will remove the page from the index if it was previously crawled. It simply treats the page as inaccessible and moves on. It can take months before Google finally decides, "we have no fresh crawls of page X; it's time to remove it from the index."

                    On the other hand, if you explicitly allow Google to crawl those pages and serve a noindex tag on them, Google now has a new directive it can act on immediately.
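To make the contrast concrete: a crawler that obeys a Disallow never downloads the page, so a noindex tag on it is never read. This minimal sketch combines `urllib.robotparser` with a small `html.parser` reader for `<meta name="robots">`; the robots.txt rules, URL, and page markup are hypothetical, and the function name `what_the_crawler_learns` is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaReader(HTMLParser):
    """Collects the comma-separated directives of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def what_the_crawler_learns(robots_txt, url, page_html):
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch("Googlebot", url):
        # The page is never fetched, so any noindex in its HTML is never seen.
        return "blocked: page not fetched, noindex unseen"
    reader = RobotsMetaReader()
    reader.feed(page_html)
    return "noindex seen" if "noindex" in reader.directives else "indexable"


PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
URL = "https://example.com/filter/color-black/"

print(what_the_crawler_learns("User-agent: *\nDisallow: /filter/\n", URL, PAGE))
print(what_the_crawler_learns("User-agent: *\nDisallow:\n", URL, PAGE))
```

The second call uses an empty `Disallow:`, which allows everything, so the noindex directive is actually read.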

                    So my recommendation would be to do one of two things:

                    1. Remove the disallow from robots.txt and allow Google to crawl the pages again. This time, however, use noindex, nofollow tags.

                    2. Remove the disallow from robots.txt and allow Google to crawl the pages again, but use canonical tags pointing to the main "filter" page to prevent the specific filter pages from being indexed.
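The two options differ only in what the filter page's `<head>` carries. A minimal sketch with a hypothetical helper `filter_page_head`; the URLs are placeholders, and either tag only has an effect once the robots.txt disallow is removed so the page can be crawled.

```python
from html import escape


def filter_page_head(option, canonical_url=None):
    """Return the <head> directive for a filter page under option 1 or 2 (hypothetical helper)."""
    if option == 1:
        # Option 1: crawlable, but ask search engines not to index it or follow its links.
        return '<meta name="robots" content="noindex, nofollow">'
    if option == 2:
        if not canonical_url:
            raise ValueError("option 2 needs the main filter/category page URL")
        # Option 2: crawlable, but consolidate onto the main page via rel=canonical.
        return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'
    raise ValueError("option must be 1 or 2")


print(filter_page_head(1))
print(filter_page_head(2, "https://example.com/hookahs/"))
```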

                    Which option is best depends on the number of URLs being indexed: for a few thousand, canonicals would be my choice; for a few hundred thousand, noindex would make more sense.

                    Whichever option you choose, you will have to ensure Google re-crawls the pages, and then allow it time to re-index them appropriately. Not a quick fix, but a fix nonetheless.

                    My thoughts and I hope it makes sense,

                    Don
