Discrepancy between # of pages and # of pages indexed
-
Here is some background:
1) The site in question has approximately 10,000 pages, and Google Webmaster Tools shows that all 10,000 URLs (pages) were submitted.
2) Only 5,500 pages appear in the Google index.
3) Webmaster Tools shows that approximately 200 pages could not be crawled, for various reasons.
4) SEOmoz shows about 1,000 pages that have long URLs or page titles (which we are correcting).
5) No other errors are being reported in either Webmaster Tools or SEOmoz.
6) This is a new site, launched six weeks ago. Within two weeks of launching, Google had crawled all 10,000 pages and showed 9,800 in the index, but over the last few weeks the number of pages in the index kept dropping until it reached 5,500, where it has been stable for two weeks.
Any ideas what the issue might be? Also, is there a way to download a list of all the pages that are included in the index? That might help with troubleshooting.
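For what it's worth, since there doesn't seem to be a direct export of indexed URLs, one indirect check I'm considering is diffing the sitemap against Googlebot hits in our server logs: if Google never even requests a page, it can't index it. A rough sketch, where the file names and the combined log format are assumptions about our setup:

```python
# Sketch: diff the sitemap's URLs against Googlebot hits in a server access
# log to see which pages Google is (and isn't) requesting. File names and the
# combined-log format are assumptions; adjust for your own server setup.
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def sitemap_paths(sitemap_file):
    """Return the set of URL paths listed in a standard XML sitemap."""
    tree = ET.parse(sitemap_file)
    return {urlparse(loc.text.strip()).path
            for loc in tree.findall(".//sm:loc", NS)}

def googlebot_paths(log_file):
    """Return paths requested by a user agent claiming to be Googlebot."""
    # Combined log format: ... "GET /path HTTP/1.1" status size "referrer" "UA"
    line_re = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" \d+ \S+ "[^"]*" "([^"]*)"')
    paths = set()
    with open(log_file, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            m = line_re.search(line)
            if m and "Googlebot" in m.group(2):
                paths.add(m.group(1).split("?")[0])  # drop query strings
    return paths

submitted = sitemap_paths("sitemap.xml")   # placeholder path
crawled = googlebot_paths("access.log")    # placeholder path
never_crawled = submitted - crawled
print(f"{len(submitted)} submitted, {len(crawled & submitted)} crawled, "
      f"{len(never_crawled)} never requested by Googlebot")
for path in sorted(never_crawled)[:20]:
    print(" ", path)
```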
-
It's not exactly three clicks... if you're a PR 10 website, it will take quite a few clicks in before the crawler gets "tired". Deep links are always a great idea.
-
I have also heard three clicks from a page with link juice. So if you have deep links pointing at a page, they can help carry the crawler deeper in. Do you agree?
-
Thank you to all for your advice. Good suggestions.
-
We do have different types of pages, and Google is indexing all of the category pages but not all of the individual content pages. Based on the replies I have received, I suspect the issue can be helped by flattening the site architecture and links.
As an FYI, the site is a health care content site, so no products are sold on it; revenue comes from ads.
-
Great tip. I have seen this happen too (e.g. the forum, blog, archive, and content sections of a website not being indexed equally).
-
Do you have areas of your site that are distinctly different in type, such as category pages and individual item pages, or individual item pages and user-submitted content?
What I'm getting at is trying to find out whether there's a certain type of page that Google isn't indexing. If you have distinct types of pages, you can create separate sitemaps (one for each type of content) and see if one type of content is being indexed better than another. It's more of a diagnostic tool than a solution, but I've found it helpful for sites of that size and larger in the past.
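If it's useful, here's a minimal sketch of that setup: split one URL list into per-type sitemaps plus a sitemap index file, so Webmaster Tools can report indexation for each type separately. The classify() rule, file names, and domain are placeholders for whatever distinguishes your own page types:

```python
# Sketch: split one URL list into per-content-type sitemaps plus a sitemap
# index. The classify() rule, input file, and domain are placeholders.
from xml.sax.saxutils import escape

def classify(url):
    """Hypothetical rule: bucket URLs by path pattern."""
    if "/category/" in url:
        return "categories"
    return "content"

def write_sitemap(filename, urls):
    with open(filename, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        fh.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
        for url in urls:
            fh.write(f"  <url><loc>{escape(url)}</loc></url>\n")
        fh.write("</urlset>\n")

with open("all_urls.txt") as fh:                      # placeholder URL list
    urls = [line.strip() for line in fh if line.strip()]

buckets = {}
for url in urls:
    buckets.setdefault(classify(url), []).append(url)

for name, bucket in buckets.items():
    write_sitemap(f"sitemap-{name}.xml", bucket)

# Sitemap index pointing at each per-type file.
with open("sitemap-index.xml", "w", encoding="utf-8") as fh:
    fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    fh.write('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for name in buckets:
        fh.write(f"  <sitemap><loc>https://www.example.com/sitemap-{name}.xml</loc></sitemap>\n")
    fh.write("</sitemapindex>\n")
```

You would then submit just the index file; Webmaster Tools reports submitted vs. indexed counts per child sitemap, which is what makes the per-type comparison possible.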
As other people have said, it's also a new site, so the lack of links could be hindering things as well.
-
Agreed!
-
Oh yes, Google is very big on balancing and allocating resources. I don't think 10,000 pages will present a problem, though, as counts like that are common on ecommerce and content websites.
-
Very good advice in the replies. Everyone seems to have forgotten PageRank, though. In Google's random surfer model, it is assumed the user will at some point abandon the website (after its PageRank has been exhausted). This means that if your site lacks raw link juice, there may not be enough to flow through the whole site structure, and some pages are left dry and unindexed. What can help: the flatter site architecture and unique content already mentioned, but also direct links to the pages that aren't in the index (including via social media), and more and stronger links to the home page, which should ideally cascade down to the rest of the site.
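To make the random-surfer point concrete, here's a toy power-iteration sketch of PageRank over a hypothetical seven-page site. The graph is made up, but it shows how pages reachable only through a category level end up with a fraction of the home page's score:

```python
# Sketch: the random-surfer model as power iteration on a toy site graph.
# The graph is hypothetical: a home page, two category pages, and four deep
# content pages reachable only through the categories.
links = {
    "home":  ["cat1", "cat2"],
    "cat1":  ["home", "deep1", "deep2"],
    "cat2":  ["home", "deep3", "deep4"],
    "deep1": ["home"], "deep2": ["home"],
    "deep3": ["home"], "deep4": ["home"],
}
pages = list(links)
N, d = len(pages), 0.85          # d = damping: chance the surfer keeps clicking
pr = {p: 1.0 / N for p in pages}

for _ in range(50):              # iterate to convergence
    nxt = {p: (1 - d) / N for p in pages}
    for p, outs in links.items():
        share = d * pr[p] / len(outs)   # each page splits its score over outlinks
        for q in outs:
            nxt[q] += share
    pr = nxt

for p in sorted(pr, key=pr.get, reverse=True):
    print(f"{p:6s} {pr[p]:.4f}")
# The deep pages end up with a small fraction of the home page's score; with
# weak external links overall, that fraction can fall below the indexing bar.
```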
-
If you don't have many links to your site yet, I think that could reduce the number of pages that Google keeps in its main index. Google may allocate fewer resources to crawling your site if you have very little link juice, especially if the deep pages on your site have no link juice coming into them.
Another possibility is that some of the 10,000 pages are duplicate or otherwise non-unique content. Google could send a lot of your pages to its supplemental index if that is the case.
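If you have a local dump of the pages, one rough way to check for this is to fingerprint the normalized text of each page. The sketch below only catches exact (whitespace-insensitive) copies; real near-duplicate detection needs shingling or similar, and the directory layout here is a placeholder:

```python
# Sketch: flag exact-duplicate pages by hashing normalized body text. Real
# near-duplicate detection (shingling, SimHash) is fuzzier; this only catches
# identical or whitespace-identical copies. The file layout is a placeholder.
import hashlib
import re
from collections import defaultdict
from pathlib import Path

def fingerprint(html):
    """Crude normalization: strip tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.md5(text.encode("utf-8")).hexdigest()

groups = defaultdict(list)
for page in Path("crawl_dump").glob("**/*.html"):   # placeholder directory
    groups[fingerprint(page.read_text(errors="replace"))].append(page)

for digest, dupes in groups.items():
    if len(dupes) > 1:
        print(f"{len(dupes)} pages share fingerprint {digest[:8]}:")
        for p in dupes:
            print("  ", p)
```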
-
If you flatten out your site architecture a bit, so that all pages are no more than 3 clicks deep, and provide a better HTML sitemap, you will definitely see more pages indexed. It won't be all 10k, but it will be an improvement.
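Before restructuring, it's worth measuring where you stand: a breadth-first walk over the internal link graph from the home page gives each page's click depth. A minimal sketch, with placeholder link data you'd replace with a crawl of your own site:

```python
# Sketch: measure click depth with a breadth-first walk over the internal
# link graph, then flag anything more than 3 clicks from the home page.
# The adjacency map would come from a crawl; this one is placeholder data.
from collections import deque

links = {  # page -> pages it links to (placeholder data)
    "/": ["/topics/", "/about/"],
    "/topics/": ["/topics/heart/", "/topics/diet/"],
    "/topics/heart/": ["/topics/heart/page-1/"],
    "/topics/heart/page-1/": ["/topics/heart/page-1/detail/"],
}

depth = {"/": 0}
queue = deque(["/"])
while queue:
    page = queue.popleft()
    for target in links.get(page, []):
        if target not in depth:              # first visit = shortest click path
            depth[target] = depth[page] + 1
            queue.append(target)

for page, d in sorted(depth.items(), key=lambda kv: kv[1]):
    flag = "  <-- deeper than 3 clicks" if d > 3 else ""
    print(f"{d}  {page}{flag}")
```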
-
I appreciate the reply. The HTML sitemap does not show all 10,000 pages, and some pages are likely more than 3 clicks deep. I will try this and see what happens.
-
Google will not index your entire 10k page site just because you submitted the links in a site map. They will crawl your site and index many pages, but most likely you will never have your entire site indexed.
Cleaning up your crawl errors will help in getting your content indexed. A few other things you can do are:
- Provide an HTML sitemap on your website.
- Ensure your site navigation is solid (i.e. all pages are reachable, there are no island pages, the navigation can be seen in plain HTML, etc.); a sketch for catching island pages follows this list.
- Ensure you do not have deep content. Google will often only go about 3 clicks deep, so buried content won't be indexed unless it is well linked.
- If there are any particular pages you want to get indexed, link to them from your home page, or ask others to link to those pages from external sites.
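On the island-pages point above, here's a sketch of one way to catch them: collect every href that appears in the raw HTML of a page dump (no JavaScript executed, which doubles as a check that the navigation is visible to a crawler) and diff that set against the sitemap. The domain and paths are placeholders:

```python
# Sketch: find "island" pages, i.e. URLs in the sitemap that no crawled page
# links to. Links are pulled from raw HTML only (no JavaScript), so this also
# checks whether the navigation is crawlable. Domain and paths are placeholders.
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
import xml.etree.ElementTree as ET

class LinkCollector(HTMLParser):
    """Accumulate the href of every <a> tag seen in the raw HTML."""
    def __init__(self):
        super().__init__()
        self.hrefs = set()
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)

BASE = "https://www.example.com/"                   # placeholder domain

linked = set()
for page in Path("crawl_dump").glob("**/*.html"):   # placeholder crawl dump
    parser = LinkCollector()
    parser.feed(page.read_text(errors="replace"))
    for href in parser.hrefs:
        linked.add(urlparse(urljoin(BASE, href)).path)

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
submitted = {urlparse(loc.text.strip()).path
             for loc in ET.parse("sitemap.xml").findall(".//sm:loc", NS)}

islands = submitted - linked
print(f"{len(islands)} sitemap URLs have no internal links pointing at them")
for path in sorted(islands)[:20]:
    print(" ", path)
```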