XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Breaking up a site into multiple sites
Hi, I am working on plan to divide up mid-number DA website into multiple sites. So the current site's content will be divided up among these new sites. We can't share anything going forward because each site will be independent. The current homepage will change to just link out to the new sites and have minimal content. I am thinking the websites will take a hit in rankings but I don't know how much and how long the drop will last. I know if you redirect an entire domain to a new domain the impact is negligible but in this case I'm only redirecting parts of a site to a new domain. Say we rank #1 for "blue widget" on the current site. That page is going to be redirected to new site and new domain. How much of a drop can we expect? How hard will it be to rank for other new keywords say "purple widget" that we don't have now? How much link juice can i expect to pass from current website to new websites? Thank you in advance.
Intermediate & Advanced SEO | | timdavis0 -
Large sites linking to us in their menu
Hello, I am digging in to our in-links in WMT and notice that we have a number of sites that link to us in their menu or every page on their site, making for hundreds of thousands of links to our site. Here is an example:
Intermediate & Advanced SEO | | evansluke
http://www.askthebookie.com/ (in the topmost right menu there is a link to our forums) http://www.covers.com/postingforum/ There are about a dozen or so sites that link to our homepage tens-of-thousands of times. Should I disavow them, or is this be viewed as a legitimate link? Thanks in advance for any help!0 -
Google Indexed my Site then De-indexed a Week After
Hi there, I'm working on getting a large e-commerce website indexed and I am having a lot of trouble.
Intermediate & Advanced SEO | | Travis-W
The site is www.consumerbase.com. We have about 130,000 pages and only 25,000 are getting indexed. I use multiple sitemaps so I can tell which product pages are indexed, and we need our "Mailing List" pages the most - http://www.consumerbase.com/mailing-lists/cigar-smoking-enthusiasts-mailing-list.html I submitted a sitemap a few weeks ago of a particular type of product page and about 40k/43k of the pages were indexed - GREAT! A week ago Google de-indexed almost all of those new pages. Check out this image, it kind of boggles my mind and makes me sad. http://screencast.com/t/GivYGYRrOV While these pages were indexed, we immediately received a ton of traffic to them - making me think Google liked them. I think our breadcrumbs, site structure, and "customers who viewed this product also viewed" links would make the site extremely crawl-able. What gives?
Does it come down to our site not having enough Domain Authority?
My client really needs an answer about how we are going to get these pages indexed.0 -
Does Google index more than three levels down if the XML sitemap is submitted via Google webmaster Tools?
We are building a very big ecommerce site. The site has 1000 products and has many categories/levels. The site is still in construccion so you cannot see it online. My objective is to get Google to rank the products (level 5) Here is an example level 1 - Homepage - http://vulcano.moldear.com.ar/ Level 2 - http://vulcano.moldear.com.ar/piscinas/ Level 3 - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/ Level 4 - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/autocebantes.html/ Level 5 - Product is on this level - http://vulcano.moldear.com.ar/piscinas/electrobombas-para-piscinas/autocebantes/autocebante-recomendada-para-filtros-vc-10.html Thanks
Intermediate & Advanced SEO | | Carla_Dawson0 -
Sitemaps
I am working with a site that has sitemaps broken down very specifically. By page type: article, page etc and also broken down by Category. Unfortunately, this is not done hierarchically. Category and page type are separate maps, they are not nested. My question here is: Is is detrimental to have two separate sitemaps that point to the same pages? Should we eliminate one of these taxonomies, or maybe just try to make them hierarchical? IE item type -> category -> pagetitle Is there an issue with having a sitemap index that points to a nested sitemap index? (I dont think so, but might as well be sure. Thanks Moz Community! Can't delete my question, but turns out that isn't how they are structured. Food for thought anyway I suppose.
Intermediate & Advanced SEO | | MarloSchneider0 -
Large site rel=can or no-index?
Hi, A large site with tens of thousands of pages, but lots of the pages are very similar. The site is about training courses, and the url structure is something like: training-course/date/time I only really want the search engines to index the actual training course pages, which is the better option for me and why?: a) rel=canonical b) noindex, nofollow Thanks, Gary.
Intermediate & Advanced SEO | | cottamg0 -
Same day indexing... some tips for a non blog site
Hi There's a question today about paying for Yahoo directory - I wondered if these kinds of links would help with getting indexed. We're adding about 10-20 pages of new unique content each day, but it takes google about 4 days to list it and then it only lists a few pages of that 10-20 pages, we have other older sites and blogs that are within the hour. What tips can you give me for getting at least a same day index, and a thorough one at that, so if we publish 20 pages on Thursday by the end of play Friday they're indexed? Would the Yahoo Directory help?
Intermediate & Advanced SEO | | xoffie0 -
One platform, multiple niche sites: Worth $60/mo so each site has different class C?
Howdy all, The short of it is that I currently run a very niche business directory/review website and am in the process of expanding the system to support running multiple sites out of the same database/codebase. In a normal setup I'd just run all the sites off of the same server with all of them sharing a single IP address, but thanks to the wonders of the cloud, it would be fairly simple for me to run each site on it's own server at a cost of about $60/mo/site giving each site a unique IP on a unique c-block (in many cases a unique a-block even.) The ultimate goal here is to leverage the authority I've built up for the one site I currently run to help grow the next site I launch, and repeat the process. The question is: Is the SEO-value that the sites can pass to each other worth the extra cost and management overhead? I've gotten conflicting answers on this topic from multiple people I consider pretty smart so I'd love to know what other people say.
Intermediate & Advanced SEO | | qurve0