Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Substantial difference between Number of Indexed Pages and Sitemap Pages
-
Hey there,
I am doing a website audit at the moment.
I've notices substantial differences in the number of pages indexed (search console), the number of pages in the sitemap and the number I am getting when I crawl the page with screamingfrog (see below). Would those discrepancies concern you? The website and its rankings seems fine otherwise.
Total indexed: 2,360 (Search Consule)
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screemingfrog Spider: 1,352 URLsCheers,
Jochen -
Those discrepancies would not concern me, but there are some differences between all the things you list:
Total indexed: 2,360 Search Console - this is likely a reasonably accurate list of the number of pages you have indexed in Google. You could use a tool like URL Profiler to check index status of specific URLs.
About 2,920 results Google search "site:example.com" - site: search is less accurate and will likely return a different number each time you do it, even if it's just moments apart.
Sitemap: 1,229 URLs: these are URLs you added to a sitemap because they are priority pages you want to make sure Google has indexed and hopefully ranked. You control this number.
Screaming Frog Spider: 1,352 URLs - Screaming Frog is going to start on your homepage and crawl the site attempting to discover as many URLs as possible. If you are not linking to a page, SF won't be able to crawl it. Google on the other hand may have old pages, old URL structures or pages that were linked from an external website in their index and they won't forget them.
A really important question is: how many pages do you have that you want to be indexed? Is Google's index bloated with pages that you want to keep out? Figure these things out, and then try to adjust your sitemaps, noindex, robots.txt as needed.
-
Thanks for your reply Dmitrii,
we have excluded all query parameters in search console so this shouldn't be an issue. What is also strange is that when I try to scrape the SERPS via a site:example.com search Google is only showing a fraction (about 700) of the 2,920 results.
Cheers,
Jochen
- ★
- ★
- ☆
- ☆
- ☆
MozPoints: 810
Good Answers: 47
Endorsed Answers: 20">- ★
- ★
- ☆
- ☆
- ☆
-
Hi there.
I think that as long as rankings are good (especially historically), there is no reason to worry, because google includes in index pages, which wouldn't be in sitemap - for example pages, generated with query parameters (domain.com?x=value). Sometimes these pages do not really exist by themselves (like filters in online stores), they only exist "on the fly".
Hope this makes sense and helps
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Would You Redirect a Page if the Parent Page was Redirected?
Hi everyone! Let's use this as an example URL: https://www.example.com/marvel/avengers/hulk/ We have done a 301 redirect for the "Avengers" page to another page on the site. Sibling pages of the "Hulk" page live off "marvel" now (ex: /marvel/thor/ and /marvel/iron-man/). Is there any benefit in doing a 301 for the "Hulk" page to live at /marvel/hulk/ like it's sibling pages? Is there any harm long-term in leaving the "Hulk" page under a permanently redirected page? Thank you! Matt
Intermediate & Advanced SEO | | amag0 -
My product category pages are not being indexed on google can someone help?
My website has been indexed on google and all of its pages can be found on google except for the product category pages - which are where we want our traffic heading to, so this is a big problem for us. Our website is www.skirtinguk.com And an example of a page that isn't being indexed is https://www.skirtinguk.com/product-category/mdf-skirting-board/
Intermediate & Advanced SEO | | chelseaskirtinguk0 -
E-Commerce Site Collection Pages Not Being Indexed
Hello Everyone, So this is not really my strong suit but I’m going to do my best to explain the full scope of the issue and really hope someone has any insight. We have an e-commerce client (can't really share the domain) that uses Shopify; they have a large number of products categorized by Collections. The issue is when we do a site:search of our Collection Pages (site:Domain.com/Collections/) they don’t seem to be indexed. Also, not sure if it’s relevant but we also recently did an over-hall of our design. Because we haven’t been able to identify the issue here’s everything we know/have done so far: Moz Crawl Check and the Collection Pages came up. Checked Organic Landing Page Analytics (source/medium: Google) and the pages are getting traffic. Submitted the pages to Google Search Console. The URLs are listed on the sitemap.xml but when we tried to submit the Collections sitemap.xml to Google Search Console 99 were submitted but nothing came back as being indexed (like our other pages and products). We tested the URL in GSC’s robots.txt tester and it came up as being “allowed” but just in case below is the language used in our robots:
Intermediate & Advanced SEO | | Ben-R
User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /9545580/checkouts
Disallow: /carts
Disallow: /account
Disallow: /collections/+
Disallow: /collections/%2B
Disallow: /collections/%2b
Disallow: /blogs/+
Disallow: /blogs/%2B
Disallow: /blogs/%2b
Disallow: /design_theme_id
Disallow: /preview_theme_id
Disallow: /preview_script_id
Disallow: /apple-app-site-association
Sitemap: https://domain.com/sitemap.xml A Google Cache:Search currently shows a collections/all page we have up that lists all of our products. Please let us know if there’s any other details we could provide that might help. Any insight or suggestions would be very much appreciated. Looking forward to hearing all of your thoughts! Thank you in advance. Best,0 -
My blog is indexing only the archive and category pages
Hi there MOZ community. I am new to the QandA and have a question. I have a blog Its been live for months - but I can not get the posts to rank in the serps. Oddly only the categories rank. The posts are crawled it seems - but seen as less important for a reason I don't understand. Can anyone here help with this? See here for what i mean. I have had several wp sites rank well in the serps - and the posts do much better. Than the categories or archives - super odd. Thanks to all for help!
Intermediate & Advanced SEO | | walletapp0 -
Submitting URLs multiple times in different sitemaps
We have a very dynamic site, with a large number of pages. We use a sitemap index file, that points to several smaller sitemap files. The question is: Would there be any issue if we include the same URL in multiple sitemap files? Scenario: URL1 appears on sitemap1. 2 weeks later, the page at URL1 changes and we'd like to update it on a sitemap. Would it be acceptable to add URL1 as an entry in sitemap2? Would there be any issues with the same URL appearing multiple times? Thanks.
Intermediate & Advanced SEO | | msquare0 -
301 - should I redirect entire domain or page for page?
Hi, We recently enabled a 301 on our domain from our old website to our new website. On the advice of fellow mozzer's we copied the old site exactly to the new domain, then did the 301 so that the sites are identical. Question is, should we be doing the 301 as a whole domain redirect, i.e. www.oldsite.com is now > www.newsite.com, or individually setting each page, i.e. www.oldsite.com/page1 is now www.newsite.com/page1 etc for each page in our site? Remembering that both old and new sites (for now) are identical copies. Also we set the 301 about 5 days ago and have verified its working but haven't seen a single change in rank either from the old site or new - is this because Google hasn't likely re-indexed yet? Thanks, Anthony
Intermediate & Advanced SEO | | Grenadi0 -
Should I Allow Blog Tag Pages to be Indexed?
I have a wordpress blog with settings currently set so that Google does not index tag pages. Is this a best practice that avoids duplicate content or am I hurting the site by taking eligible pages out of the index?
Intermediate & Advanced SEO | | JSOC0 -
How to resolve Duplicate Page Content issue for root domain & index.html?
SEOMoz returns a Duplicate Page Content error for a website's index page, with both domain.com and domain.com/index.html isted seperately. We had a rewrite in the htacess file, but for some reason this has not had an impact and we have since removed it. What's the best way (in an HTML website) to ensure all index.html links are automatically redirected to the root domain and these aren't seen as two separate pages?
Intermediate & Advanced SEO | | ContentWriterMicky0