Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
XML Sitemap Index Percentage (Large Sites)
-
Hi all
I'm wanting to find out from those who have experience dealing with large sites (10s/100s of millions of pages).
What's a typical (or highest) percentage of indexed pages vs. submitted pages you've seen? This information can be found in webmaster tools where Google shows you the pages submitted & indexed for each of your sitemap.
I'm trying to figure out whether,
- The average index % out there
- There is a ceiling (i.e. will never reach 100%)
- It's possible to improve the indexing percentage further
Just to give you some background, sitemap index files (according to schema.org) have been implemented to improve crawl efficiency and I'm wanting to find out other ways to improve this further.
I've been thinking about looking at the URL parameters to exclude as there are hundreds (e-commerce site) to help Google improve crawl efficiency and utilise the daily crawl quote more effectively to discover pages that have not been discovered yet.
However, I'm not sure yet whether this is the best path to take or I'm just flogging a dead horse if there is such a ceiling or if I'm already at the average ballpark for large sites.
Any suggestions/insights would be appreciated. Thanks.
-
I've worked on a site that was ~100 million pages, and I've seen indexation percentages ranging from 8% to 95%. When dealing with sites this size, there are so, so many issues at play, and there are so few sites of this size that finding an average probably won't do you much good.
Rather than focusing on whether or not you have enough pages indexed based on averages, you should focus on two key questions: "do my sitemaps only include pages that would make great search engine entry pages" and "have I done everything possible to eliminate junk pages that are wasting crawl bandwidth."
Of course, making sure you don't have any duplicate content, thin content, or poor on-site optimization issues should also be a focus.
I guess what I'm trying to say is, I believe any site can have 100% of it's search entry worthy pages indexed, but sites of that size rarely have ALL of their pages indexed since sites that large often have a ton of pages that don't make great search results.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google cache is showing my UK homepage site instead of the US homepage and ranking the UK site in US
Hi There, When I check the cache of the US website (www.us.allsaints.com) Google returns the UK website. This is also reflected in the US Google Search Results when the UK site ranks for our brand name instead of the US site. The homepage has hreflang tags only on the homepage and the domains have been pointed correctly to the right territories via Google Webmaster Console.This has happened before in 26th July 2015 and was wondering if any had any idea why this is happening or if any one has experienced the same issueFDGjldR
Intermediate & Advanced SEO | | adzhass0 -
Multilingual Sitemaps
Hey there, I have a site with many languages. So here are my questions concerning the sitemaps. The correct way of creating a sitemap for a multilingual site is as followed ( by the official blog of Google ) <urlset xmlns="</span>http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"> http://www.example.com/loc> <xhtml:link rel="alternate" hreflang="en" href="</span>http://www.example.com/"/> <xhtml:link rel="alternate" hreflang="de" href="</span>http://www.example.com/de"/> <xhtml:link rel="alternate" hreflang="fr" href="</span>http://www.example.com/fr"/><a href=" http:="" www.example.com="" fr"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" de"="" target="_blank"></xhtml:link><a href=" http:="" www.example.com="" "="" target="_blank"></xhtml:link><a href=" http:="" www.sitemaps.org="" schemas="" sitemap="" 0.9"="" rel="nofollow" target="_blank"></urlset> **So here is my first question. My site has over 200.000 pages that all of them support around 5-6 languages. Am I suppose to do this example 200.000 times?****My second question is. My root domain is www.example.com but this one redirects with 301 to www.example.com/en should the sitemap be at ****www.example.com/sitemap.xmlorwww.example.com/en/sitemap.xml ???****My third question is as followed. On WMT do I submit my sitemap in all versions of my site? I have all my languages there.**Thanks in advance for taking the time to respond to this thread and by creating it I hope many people will solve their own questions.
Intermediate & Advanced SEO | | Angelos_Savvaidis0 -
Moving to a new site while keeping old site live
For reasons I won't get into here, I need to move most of my site to a new domain (DOMAIN B) while keeping every single current detail on the old domain (DOMAIN A) as it is. Meaning, there will be 2 live websites that have mostly the same content, but I want the content to appear to search engines as though it now belongs to DOMAIN B. Weird situation. I know. I've run around in circles trying to figure out the best course of action. What do you think is the best way of going about this? Do I simply point DOMAIN A's canonical tags to the copied content on DOMAIN B and call it good? Should I ask sites that link to DOMAIN A to change their links to DOMAIN B, or start fresh and cut my losses? Should I still file a change of address with GWT, even though I'm not going to 301 redirect anything?
Intermediate & Advanced SEO | | kdaniels0 -
Removing index.php
I have question for the community and whether or not this is a good or bad idea. I currently have a Joomla site that displays www.domain.com/index.php in all the URLs with the exception of the home page. I have read that it's better to not have index.php showing in the URL at all. Does it really matter if I have index.php in my URL? I've read that it is a bad practice. I am thinking about installing the sh404SEF component on my site and removing the index.php. However, I rank pretty high for the keywords I want in Google, Bing and Yahoo. All of the URLs that show up in the searches have index.php as part of the URL. Has anyone ever used sh404SEF to remove the index.php and how did you overcome not loosing your search engine links? I don't want an existing search showing www.domain.com/index.php/sales and it not linking to the correct page which would now be www.domain.com/sales. I guess I could insert the proper redirects in the htaccess file. But I was hoping to avoid having every page of my site in the htaccess file for redirecting. Any help or advice appreciated.
Intermediate & Advanced SEO | | MedGroupMedia0 -
Malicious site pointed A-Record to my IP, Google Indexed
Hello All, I launched my site on May 1 and as it turns out, another domain was pointing it's A-Record to my IP. This site is coming up as malicious, but worst of all, it's ranking on keywords for my business objectives with my content and metadata, therefore I'm losing traffic. I've had the domain host remove the incorrect A-Record and I've submitted numerous malware reports to Google, and attempted to request removal of this site from the index. I've resubmitted my sitemap, but it seems as though this offending domain is still being indexed more thoroughly than my legitimate domain. Can anyone offer any advice? Anything would be greatly appreciated! Best regards, Doug
Intermediate & Advanced SEO | | FranGen0 -
How Do I Generate a Sitemap for a Large Wordpress Site?
Hello Everyone! I am working with a Wordpress site that is in Google news (i.e. everyday we have about 30 new URLs to add to our sitemap) The site has years of articles, resulting in about 200,000 pages on the site. Our strategy so far has been use a sitemap plugin that only generates the last few months of posts, however we want to improve our SEO and submit all the URLs in our site to search engines. The issue is the plugins we've looked at generate the sitemap on-the-fly. i.e. when you request the sitemap, the plugin then dynamically generates the sitemap. Our site is so large that even a single request for our sitemap.xml ties up tons of server resources and takes an extremely long time to generate the sitemap (if the page doesn't time out in the process). Does anyone have a solution? Thanks, Aaron
Intermediate & Advanced SEO | | alloydigital0 -
Franchise sites on subdomains
I've been asked by a client to optimise a a webpage for a location i.e. London. Turns out that the location is actually a franchise of the main company. When the company launch a new franchise, so far they have simply added a new page to the main site, for example: mysite.co.uk/sub-folder/london They have so far done this for 10 or so franchises and task someone with optimising that page for their main keyword + location. I think I know the answer to this, but would like to get a back up / additional info on it in terms of ranking / seo benefits. I am going to suggest the idea of using a subdomain for each location, example: london.mysite.co.uk Would this be the correct approach. If you think yes, why? Many thanks,
Intermediate & Advanced SEO | | Webrevolve0 -
Can a XML sitemap index point to other sitemaps indexes?
We have a massive site that is having some issue being fully crawled due to some of our site architecture and linking. Is it possible to have a XML sitemap index point to other sitemap indexes rather than standalone XML sitemaps? Has anyone done this successfully? Based upon the description here: http://sitemaps.org/protocol.php#index it seems like it should be possible. Thanks in advance for your help!
Intermediate & Advanced SEO | | CareerBliss0