Discrepency between # of pages and # of pages indexed
-
Here is some background:
- The site in question has approximately 10,000 pages and Google Webmaster shows that 10,000 urls(pages were submitted)
2) Only 5,500 pages appear in the Google index
3) Webmaster shows that approximately 200 pages could not be crawled for various reasons
4) SEOMOZ shows about 1,000 pages that have long URL's or Page Titles (which we are correcting)
5) No other errors are being reported in either Webmaster or SEO MOZ
6) This is a new site launched six weeks ago. Within two weeks of launching, Google had indexed all 10,000 pages and showed 9,800 in the index but over the last few weeks, the number of pages in the index kept dropping until it reached 5,500 where it has been stable for two weeks.
Any ideas of what the issue might be? Also, is there a way to download all of the pages that are being included in that index as this might help troubleshoot?
-
It's not exactly 3 clicks... if you're a PR 10 website it will take you quite a few clicks in before it gets "tired". Deep links are always a great idea.
-
I have also heard 3 clicks from a page with link juice. So if you have deep links to a page it can help carry pages deeper in. Do you agree?
-
Thank you to all for your advice. Good suggestions.
-
We do have different types of pages but Google is indexing all category pages but not all individual content pages. Based on the replies I have received, I suspect the issue can be helped by flattening the site architecture and links.
As an FYI, the site is a health care content site so no products are sold on the site. Revenue is from ads.
-
Great tip. I have seen this happen too (e.g. forum, blog, archive and content part of the website not indexed equally).
-
Do you have areas of your site that are distinctively different in type, such as category pages and individual item pages, or individual item pages and user submitted content?
What I'm getting at is trying to find if there's a certain type of page that Google isn't indexing. If you have distinct types of pages, you can create separate site maps (one for each type of content) and see if one type of content is being indexed better than another. It's more of a diagnostics tool that a solution, but I've found it helpful for sites of that size and larger in the past.
As other people have said, it's also a new site, so the lack of links could be hindering things as well.
-
Agreed!
-
Oh yes, Google is very big on balancing and allocation of resources. I don't think 10,000 will present a problem though as this number may be too common on ecommerce and content websites.
-
Very good advice in the replies. Everyone seems to have forgotten PageRank though. In Google's random surfer model it is assumed user will at some point abandon the website (after PageRank has been exhausted). This means if your site lacks raw link juice it may not have enough to go around through the whole site structure and it leaves some pages dry and unindexed. What can help is: Already mentioned flatter site architecture and unique content, but also direct links to pages not in index (including via social media) and more and stronger links towards home page which should ideally cascade down to the rest.
-
If you don't have many links to your site yet, I think that could reduce the number of pages that Google keeps in its main index. Google may allocate less resources to crawling your site if you have very little link juice, especially if deep pages on your site have no link juice coming in to them.
Another possibility is if some of the 10,000 pages are not unique content or duplicate content. Google could send a lot of your pages to its supplemental index if this is the case.
-
If you flatten out your site architecture a bit to where all pages are no more then 3 clicks deep, and provide a better HTML sitemap you will definitely see more pages indexed. It wont be all 10k, but it will be an improvement.
-
I appreciate the reply. The HTML site map does not show all 10,000 pages and some pages are likely more than 3 deep. I will try this and see what happens.
-
Google will not index your entire 10k page site just because you submitted the links in a site map. They will crawl your site and index many pages, but most likely you will never have your entire site indexed.
Cleaning up your crawl errors will help in getting your content indexed. A few other things you can do are:
-
provide a HTML sitemap on your website
-
ensure your site navigation is solid ( i.e. all pages are reachable, no island pages, the navigation can be seen in HMTL, etc)
-
ensure you do not have deep content. Google will often only go about 3 clicks deep. If you have buried content, it won't be indexed unless it is well linked.
-
if there are any particular pages you want to get indexed, you can link to them from your home page, or ask others to link to those pages from external sites.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How to get google to forget my old but still working page and list my new fully optimized page for a keyword?
Hi There! (i am beginner in seo) I have dynamic and static pages on our site. I created a static page for a specific keyword. Fully optimized it, (h1, alt, metas, etc.....maybe too optimized). My problem is that this page is alive for weeks, checked it in GWT and it is in robots.txt, google sees it, and indexed it. BUT whenewer i do a search for that keyword, we still appear with the dynamically created link in the google listings. How could i "redirect" google, if sy make a search for that keyword than shows our optimized page? Is there a tool for that? I cant delete the dynamic page... Any ideas? Thx Andrew
Technical SEO | | Neckermann0 -
Website SEO Product Pages - Condense Product Pages
We are managing a website that has seen consistently dropping rankings over the last 2 years (http://www.independence-bunting.com/). Our long term strategy has been purely content-based and is of high quality, but isn’t seeing the desired results. It is an ecommerce site that has a lot of pages, most of which are category or product pages. Many of the product pages have duplicate or thin content, which we currently see as one of the primary reasons for the ranking drops.The website has many individual products which have the same fabric and size options, but have different designs. So it is difficult to write valuable content that differs between several products that have similar designs. Right now each of the different designs has its own product page. We have a dilemma, because our options are:A.Combine similar designs of the product into one product page where the customer must choose a design, a fabric, and a size before checking out. This way we can have valuable content and don’t have to duplicate that content on other pages or try to find more to say about something that there really isn’t anything else to say about. However, this process will remove between 50% and 70% of the pages on the website. We know number of indexed pages is important to search engines and if they suddenly see that half of our pages are gone, we may cause more negative effects despite the fact that we are in fact aiming to provide more value to the user, rather than less.B.Leave the product pages alone and try to write more valuable content for each product page, which will be difficult because there really isn’t that much more to say, or more valuable ways to say it. This is the “safe” option as it means that our negative potential impact is reduced but we won’t necessarily see much positive trending either. C.Test solution A on a small percentage of the product categories to see any impact over the next several months before making sitewide updates to the product pages if we see positive impact, or revert to the old way if we see negative impact.Any sound advice would be of incredible value at this point, as the work we are doing isn’t having the desired effects and we are seeing consistent dropping rankings at this point.Any information would be greatly appreciated. Thank you,
Technical SEO | | Ed-iOVA0 -
Why google indexed pages are decreasing?
Hi, my website had around 400 pages indexed but from February, i noticed a huge decrease in indexed numbers and it is continually decreasing. can anyone help me to find out the reason. where i can get solution for that? will it effect my web page ranking ?
Technical SEO | | SierraPCB0 -
Should I put meta descriptions on pages that are not indexed?
I have multiple pages that I do not want to be indexed (and they are currently not indexed, so that's great). They don't have meta descriptions on them and I'm wondering if it's worth my time to go in and insert them, since they should hypothetically never be shown. Does anyone have any experience with this? Thanks! The reason this is a question is because one member of our team was linking to this page through Facebook to send people to it and noticed random text on the page being pulled in as the description.
Technical SEO | | Viewpoints0 -
.co.uk/index.html or just .co.uk - my on-page reports are different for both - why?
It looks like the same thing, yet it has a different on-page report for each version - why is this. Please share your ideas with me on this. The original url is http://bath.waspkilluk.co.uk/index.html. Many Thanks - Simon.
Technical SEO | | simonberenyi0 -
Index page 404 error
Crawl Results show there is 404 error page which is index.htmk **it is under my root, ** http://mydomain.com/index.htmk I have checked my index page on the server and my index page is index.HTML instead of index.HTMK. Please help me to fix it
Technical SEO | | semer0 -
Where to put Schema On Page
What part of my page should I put Schema data? Header? Footer? Also All pages? or just home page?
Technical SEO | | bozzie3114 -
I have 15,000 pages. How do I have the Google bot crawl all the pages?
I have 15,000 pages. How do I have the Google bot crawl all the pages? My site is 7 years old. But there are only about 3,500 pages being crawled.
Technical SEO | | Ishimoto0