XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexed Site A's Content On Site B, Site C etc
Hi All, I have an issue where the content (pages and images) of Site A (www.ericreynolds.photography) are showing up in Google under different domains Site B (www.fastphonerepair.com), Site C (www.quarryhillvet.com), Site D (www.spacasey.com). I believe this happened because I installed an SSL cert on Site A but didn't have the default SSL domain set on the server. You were able to access Site B and any page from Site A and it would pull up properly. I have since fixed that SSL issue and am now doing a 301 redirect from Sites B, C and D to Site A for anything https since Sites B, C, D are not using an SSL cert. My question is, how can I trigger google to re-index all of the sites to remove the wrong listings in the index. I have a screen shot attached so you can see the issue clearer. I have resubmitted my site map but I'm not seeing much of a change in the index for my site. Any help on what I could do would be great. Thanks
Intermediate & Advanced SEO | | cwscontent
Eric TeVM49b.png qPtXvME.png1 -
For a sitemap.html page, does the URL slug have to be /sitemap?
Also, do you have to have anchors in your sitemap.html? or are naked URLs that link okay?
Intermediate & Advanced SEO | | imjonny1230 -
Hreflang in vs. sitemap?
Hi all, I decided to identify alternate language pages of my site via sitemap to save our development team some time. I also like the idea of having leaner markup. However, my site has many alternate language and country page variations, so after creating a sitemap that includes mostly tier 1 and tier 2 level URLs, i now have a sitemap file that's 17mb. I did a couple google searches to see is sitemap file size can ever be an issue and found a discussion or two that suggested keeping the size small and a really old article that recommended keeping it < 10mb. Does the sitemap file size matter? GWT has verified the sitemap and appears to be indexing the URLs fine. Are there any particular benefits to specifying alternate versions of a URL in vs. sitemap? Thanks, -Eugene
Intermediate & Advanced SEO | | eugene_bgb0 -
Why is page still indexing?
Hi all, I have a few pages that - despite having a robots meta tag and no follow, no index, they are showing up in Google SERPs. In troubleshooting this with my team, it was brought up that another page could be linking to these pages and causing this. Is that plausible? How could I confirm that? Thanks,
Intermediate & Advanced SEO | | SSFCU
Sarah0 -
Strange indexing of multi language site
I've been looking at a site which has a strange ranking/indexing issue. The website has several translated versions of the site for different languages and these translated pages seem to be outranking the UK pages in the UK search results. All of the translated pages are in sub folders, eg domain.com domain.com/fr domain.com/sv domain.com/it domain.com/es I cant work out how or why Google would see these pages with non english content as more relevant than the UK pages? One thing I did notice is that there are no meta language tags on there. Could this be the issue?
Intermediate & Advanced SEO | | edwardlewis0 -
Sitemap.xml Question
I am pretty new to SEO and I have been creating new pages for our website for niche terms. Should I include ALL pages on our website in the sitemap.xml or should I only have our "main" pages listed on the sitemap.xml file? Thanks
Intermediate & Advanced SEO | | threebiz0 -
What on-page/site optimization techniques can I utilize to improve this site (http://www.paradisus.com/)?
I use a Search Engine Spider Simulator to analyze the homepage and I think my client is using black hat tactics such as cloaking. Am I right? Any recommendations on to improve the top navigation under Resorts pull down. Each of the 6 resorts listed are all part of the Paradisus brand, but each resort has their own sub domain.
Intermediate & Advanced SEO | | Melia0 -
Sitemap in SERPS
What's up guys, Having some troubles with SERP rankings. My sitemap (navigation) is appearing instead of my actual keywords. I have tried a few methods to fix this; setting a preferred domain, using a 301 redirects, deleting out of date pages via Google webmaster tools. Nothing seems to work. My next step was to refresh the cache for my entire site - does anyone know how to do this? Can't see any tools... Any help would be great. Cheers, Jon.
Intermediate & Advanced SEO | | jamesjk240