Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My url disappeared from Google but Search Console shows indexed. This url has been indexed for more than a year. Please help!
Super weird problem that I can't solve for last 5 hours. One of my urls: https://www.dcacar.com/lax-car-service.html Has been indexed for more than a year and also has an AMP version, few hours ago I realized that it had disappeared from serps. We were ranking on page 1 for several key terms. When I perform a search "site:dcacar.com " the url is no where to be found on all 5 pages. But when I check my Google Console it shows as indexed I requested to index again but nothing changed. All other 50 or so urls are not effected at all, this is the only url that has gone missing can someone solve this mystery for me please. Thanks a lot in advance.
Intermediate & Advanced SEO | | Davit19850 -
Sitemaps: Best Practice
What should and what shouldn't go in the sitemap? In particular, pages like subscribe to our newsletter/ unsubscribe to our newsletter? Is there really any benefit in highlighting those pages to the SEs? Thanks for any advice/ anecdotes 🙂
Intermediate & Advanced SEO | | Fubra0 -
Breaking up a site into multiple sites
Hi, I am working on plan to divide up mid-number DA website into multiple sites. So the current site's content will be divided up among these new sites. We can't share anything going forward because each site will be independent. The current homepage will change to just link out to the new sites and have minimal content. I am thinking the websites will take a hit in rankings but I don't know how much and how long the drop will last. I know if you redirect an entire domain to a new domain the impact is negligible but in this case I'm only redirecting parts of a site to a new domain. Say we rank #1 for "blue widget" on the current site. That page is going to be redirected to new site and new domain. How much of a drop can we expect? How hard will it be to rank for other new keywords say "purple widget" that we don't have now? How much link juice can i expect to pass from current website to new websites? Thank you in advance.
Intermediate & Advanced SEO | | timdavis0 -
Moving html site to wordpress and 301 redirect from index.htm to index.php or just www.example.com
I found page duplicate content when using Moz crawl tool, see below. http://www.example.com
Intermediate & Advanced SEO | | gozmoz
Page Authority 40
Linking Root Domains 31
External Link Count 138
Internal Link Count 18
Status Code 200
1 duplicate http://www.example.com/index.htm
Page Authority 19
Linking Root Domains 1
External Link Count 0
Internal Link Count 15
Status Code 200
1 duplicate I have recently transfered my old html site to wordpress.
To keep the urls the same I am using a plugin which appends .htm at the end of each page. My old site home page was index.htm. I have created index.htm in wordpress as well but now there is a conflict of duplicate content. I am using latest post as my home page which is index.php Question 1.
Should I also use redirect 301 im htaccess file to transfer index.htm page authority (19) to www.example.com If yes, do I use
Redirect 301 /index.htm http://www.example.com/index.php
or
Redirect 301 /index.htm http://www.example.com Question 2
Should I change my "Home" menu link to http://www.example.com instead of http://www.example.com/index.htm that would fix the duplicate content, as indx.htm does not exist anymore. Is there a better option? Thanks0 -
Is possible to submit a XML sitemap to Google without using Google Search Console?
We have a client that will not grant us access to their Google Search Console (don't ask us why). Is there anyway possible to submit a XML sitemap to Google without using GSC? Thanks
Intermediate & Advanced SEO | | RosemaryB0 -
XML Sitemap index within a XML sitemaps index
We have a similar problem to http://www.seomoz.org/q/can-a-xml-sitemap-index-point-to-other-sitemaps-indexes Can a XML sitemap index point to other sitemaps indexes? According to the "Unique Doll Clothing" example on this link, it seems possible http://www.seomoz.org/blog/multiple-xml-sitemaps-increased-indexation-and-traffic Can someone share an XML Sitemap index within a XML sitemaps index example? We are looking for the format to implement the same on our website.
Intermediate & Advanced SEO | | Lakshdeep0 -
Should the sitemap include just menu pages or all pages site wide?
I have a Drupal site that utilizes Solr, with 10 menu pages and about 4,000 pages of content. Redoing a few things and we'll need to revamp the sitemap. Typically I'd jam all pages into a single sitemap and that's it, but post-Panda, should I do anything different?
Intermediate & Advanced SEO | | EricPacifico0 -
Sitemap in SERPS
What's up guys, Having some troubles with SERP rankings. My sitemap (navigation) is appearing instead of my actual keywords. I have tried a few methods to fix this; setting a preferred domain, using a 301 redirects, deleting out of date pages via Google webmaster tools. Nothing seems to work. My next step was to refresh the cache for my entire site - does anyone know how to do this? Can't see any tools... Any help would be great. Cheers, Jon.
Intermediate & Advanced SEO | | jamesjk240