Discrepency between # of pages and # of pages indexed
-
Here is some background:
- The site in question has approximately 10,000 pages and Google Webmaster shows that 10,000 urls(pages were submitted)
2) Only 5,500 pages appear in the Google index
3) Webmaster shows that approximately 200 pages could not be crawled for various reasons
4) SEOMOZ shows about 1,000 pages that have long URL's or Page Titles (which we are correcting)
5) No other errors are being reported in either Webmaster or SEO MOZ
6) This is a new site launched six weeks ago. Within two weeks of launching, Google had indexed all 10,000 pages and showed 9,800 in the index but over the last few weeks, the number of pages in the index kept dropping until it reached 5,500 where it has been stable for two weeks.
Any ideas of what the issue might be? Also, is there a way to download all of the pages that are being included in that index as this might help troubleshoot?
-
It's not exactly 3 clicks... if you're a PR 10 website it will take you quite a few clicks in before it gets "tired". Deep links are always a great idea.
-
I have also heard 3 clicks from a page with link juice. So if you have deep links to a page it can help carry pages deeper in. Do you agree?
-
Thank you to all for your advice. Good suggestions.
-
We do have different types of pages but Google is indexing all category pages but not all individual content pages. Based on the replies I have received, I suspect the issue can be helped by flattening the site architecture and links.
As an FYI, the site is a health care content site so no products are sold on the site. Revenue is from ads.
-
Great tip. I have seen this happen too (e.g. forum, blog, archive and content part of the website not indexed equally).
-
Do you have areas of your site that are distinctively different in type, such as category pages and individual item pages, or individual item pages and user submitted content?
What I'm getting at is trying to find if there's a certain type of page that Google isn't indexing. If you have distinct types of pages, you can create separate site maps (one for each type of content) and see if one type of content is being indexed better than another. It's more of a diagnostics tool that a solution, but I've found it helpful for sites of that size and larger in the past.
As other people have said, it's also a new site, so the lack of links could be hindering things as well.
-
Agreed!
-
Oh yes, Google is very big on balancing and allocation of resources. I don't think 10,000 will present a problem though as this number may be too common on ecommerce and content websites.
-
Very good advice in the replies. Everyone seems to have forgotten PageRank though. In Google's random surfer model it is assumed user will at some point abandon the website (after PageRank has been exhausted). This means if your site lacks raw link juice it may not have enough to go around through the whole site structure and it leaves some pages dry and unindexed. What can help is: Already mentioned flatter site architecture and unique content, but also direct links to pages not in index (including via social media) and more and stronger links towards home page which should ideally cascade down to the rest.
-
If you don't have many links to your site yet, I think that could reduce the number of pages that Google keeps in its main index. Google may allocate less resources to crawling your site if you have very little link juice, especially if deep pages on your site have no link juice coming in to them.
Another possibility is if some of the 10,000 pages are not unique content or duplicate content. Google could send a lot of your pages to its supplemental index if this is the case.
-
If you flatten out your site architecture a bit to where all pages are no more then 3 clicks deep, and provide a better HTML sitemap you will definitely see more pages indexed. It wont be all 10k, but it will be an improvement.
-
I appreciate the reply. The HTML site map does not show all 10,000 pages and some pages are likely more than 3 deep. I will try this and see what happens.
-
Google will not index your entire 10k page site just because you submitted the links in a site map. They will crawl your site and index many pages, but most likely you will never have your entire site indexed.
Cleaning up your crawl errors will help in getting your content indexed. A few other things you can do are:
-
provide a HTML sitemap on your website
-
ensure your site navigation is solid ( i.e. all pages are reachable, no island pages, the navigation can be seen in HMTL, etc)
-
ensure you do not have deep content. Google will often only go about 3 clicks deep. If you have buried content, it won't be indexed unless it is well linked.
-
if there are any particular pages you want to get indexed, you can link to them from your home page, or ask others to link to those pages from external sites.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Unsolved Why My site pages getting video index viewport issue?
Hello, I have been publishing a good number of blogs on my site Flooring Flow. Though, there's been an error of the video viewport on some of my articles. I have tried fixing it but the error is still showing in Google Search Console. Can anyone help me fix it out?
Technical SEO | | mitty270 -
Pages not indexable?
Hello, I've been trying to find out why Google Search Console finds these pages non-indexable: https://www.visitflorida.com/en-us/eat-drink.html https://www.visitflorida.com/en-us/florida-beaches/beach-finder.html Moz and SEMrush both crawl the pages and show no errors but GSC comes back with, "blocked by robots.txt" but I've confirmed it is not. Anyone have any thoughts? 6AYn1TL
Technical SEO | | KenSchaefer0 -
My pages are being crawled, but not indexed according to Search Console
According to Google Search Console, my pages are being crawled by not indexed. We use Shopify and about two weeks ago I selected that Traffic from all our domains redirects to our primary domain. So everything from www.url.com and https://url.com and so on, would all redirect to one url. Have added an attached image from Search Console. 6fzEQg8
Technical SEO | | HariOmHemp0 -
Why are my images not being indexed?
I have submitted an image sitemap with over 2,000 images yet only about 35 have been indexed. Could you please help me understand why Google is not indexing my images? www.creative-calendars.com
Technical SEO | | nicole20140 -
"One Page With Two Links To Same Page; We Counted The First Link" Is this true?
I read this to day http://searchengineland.com/googles-matt-cutts-one-page-two-links-page-counted-first-link-192718 I thought to myself, yep, thats what I been reading in Moz for years ( pitty Matt could not confirm that still the case for 2014) But reading though the comments Michael Martinez of http://www.seo-theory.com/ pointed out that Mat says "...the last time I checked, was 2009, and back then -- uh, we might, for example, only have selected one of the links from a given page."
Technical SEO | | PaddyDisplays
Which would imply that is does not not mean it always the first link. Michael goes on to say "Back in 2008 when Rand WRONGLY claimed that Google was only counting the first link (I shared results of a test where it passed anchor text from TWO links on the same page)" then goes on to say " In practice the search engine sometimes skipped over links and took anchor text from a second or third link down the page." For me this is significant. I know people that have had "SEO experts" recommend that they should have a blog attached to there e-commence site and post blog posts (with no real interest for readers) with anchor text links to you landing pages. I thought that posting blog post just for anchor text link was a waste of time if you are already linking to the landing page with in a main navigation as google would see that link first. But if Michael is correct then these type of blog posts anchor text link blog posts would have value But who is' right Rand or Michael?0 -
Pages to be indexed in Google
Hi, We have 70K posts in our site but Google has scanned 500K pages and these extra pages are category pages or User profile pages. Each category has a page and each user has a page. When we have 90K users so Google has indexed 90K pages of users alone. My question is. Should we leave it as they are or should we block them from being indexed? As we get unwanted landings to the pages and huge bounce rate. If we need to remove what needs to be done? Robots block or Noindex/Nofollow Regards
Technical SEO | | mtthompsons0 -
41.000 pages indexed two years after it was redirected to a new domain
Hi!Two years ago, we changed the domain elmundodportivo.es to mundodeportivo.com. Apparently, everything was OK, but more than two years later, there are still 41.000 pages indexed in Google (https://www.google.com/search?q=site%3Aelmundodeportivo.es) even though all the domains have been redirected with a 301 redirect. I detected some problems with redirections that were 303 instead of 301, but we fixed that one month ago.A secondary problem is that the pagerank for elmundodportivo.es is 7 yet and mundodeportivo.com is 3.What I'm doing wrong?Thank you all,Oriol
Technical SEO | | MundoDeportivo0 -
Home page indexed but not ranking...interior pages with thin content outrank home page??
I have a Joomla site with a home page that I can't get to rank for anything beyond the company name @ Google - the site works fine @ Bing and Yahoo. The interior pages will rank all day long but the home page never shows up in the results. I have checked the page code out in every tool that I know about and have had no luck....by all account it should be good to go...any thoughts/comments/help would be greatly appreciated. The site is http://www.selectivedesigns.com Thanks! Greg
Technical SEO | | DougHosmer0