Google Sitemap only indexing 50% Is that a problem?
-
We have about 18,000 pages submitted on our Google Sitemap and only about 9000 of them are indexed. Is this a problem?
We have a script that creates a sitemap on a daily basis and it is submitted on a daily basis. Am I better off only doing it once a week? Is this why I never get to the full 18,000 indexed?
-
My robots, tags and redirects are all good now. Any other things to look at?
-
Have you done some troubleshooting? If there's that much of a % change, did you check your robots, tags, redirects, etc. to see if any of the technical side may be hindering indexing?
-
It is a large e-commerce site with pretty much the exact situation described. We re did the site about 6 weeks ago and the site before was always close to 100% indexed. It was about 17900 out of 18000.
-
Great answer Donford. We have a large site, with many items that are basically the same but usually have one different attribute value. So Google will typical index a parent page and list the rest as:
Results 1 - 15 of 15 – Medium Duty - Swivel Top Plate - Capacity to 400 lbs ...
So even though the page may not be in the primary index, it will still help the visitor get to what they are looking for. So I would advise grabbing a snippet of text on a page not indexed and using it as a query to see if this is the case.
-
Google will index more as they find value in more links. The last ecommerce site I worked on had 12,000 pages as of the end of the year they were 85% indexed.
It is quite common from my experience for larger sites to take awhile to be fully indexed if ever at all. Here is what Goolge says about ensuring proper setup, but other then what they say, its all about content and uniqueness. A particular challenge for some e-commerce sites whom sell items that are similar in nature. Like 1/2"x1" screw vs 5/8" x 1" screw. Its very hard to develop unique content for items that similar.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Google Indexing Of Pages As HTTPS vs HTTP
We recently updated our site to be mobile optimized. As part of the update, we had also planned on adding SSL security to the site. However, we use an iframe on a lot of our site pages from a third party vendor for real estate listings and that iframe was not SSL friendly and the vendor does not have that solution yet. So, those iframes weren't displaying the content. As a result, we had to shift gears and go back to just being http and not the new https that we were hoping for. However, google seems to have indexed a lot of our pages as https and gives a security error to any visitors. The new site was launched about a week ago and there was code in the htaccess file that was pushing to www and https. I have fixed the htaccess file to no longer have https. My questions is will google "reindex" the site once it recognizes the new htaccess commands in the next couple weeks?
Intermediate & Advanced SEO | | vikasnwu1 -
Google Index Status Falling Fast - What should I be considering?
Hi Folks, Working on an ecommerce site. I have found a month on month fall in the Index Status continuing since late 2015. This has resulted in around 80% of pages indexed according to Webmaster. I do not seem to have any bad links or server issues. I am in the early stages of working through, updating content and tags but am yet to see a slowing of the fall. If anybody has tips on where to look for to issues or insight to resolve this I would really appreciate it. Thanks everybody! Tim
Intermediate & Advanced SEO | | Toby-Symec0 -
Weird Indexation Issue
On this webpage, we have an interactive graphic that allows users to click a navigational element and learn more about an anatomical part of the knee or a knee malady. For example, a user could click "Articular Cartilage" and they will land on this page: http://www.neocartimplant.com/knee-anatomy-maladies/anatomy/articular-cartilage The weird thing is whether you perform a Google Search for the above URL or for a string of text on that URL (i.e. "Articular cartilage is hyaline cartilage (as opposed to menisci, which consists of fibrocartilage) on the articular surfaces, or the ends, of bones. This thin, smooth tissue lines both joint surfaces where the bones come together to form the knee. ") the following page ranks: http://www.neocartimplant.com/anatmal/knee-anatomy-maladies/anatomy/articular-cartilage.php I have two questions: 1 - Any idea on how the Googlebot is getting to that page?
Intermediate & Advanced SEO | | davidangotti
2 - How should I get the Googlebot to index the correct page (http://www.neocartimplant.com/knee-anatomy-maladies/anatomy/articular-cartilage)? Thanks in advance for your help!0 -
Getting into Google News, URL's & Sitemaps
Hello, I know that one of the 'technical requirements' to get into google news is that the URL's have unique numbers at the end, BUT, that requirement can be circumvented if you have a Google News Sitemap. I've purchased the Yoast Google News Sitemap (https://yoast.com/wordpress/plugins/news-seo/) BUT just found out that you cannot submit a google news Sitemap until you are accepted into google news. Thus, my question is that do you need to add the digits to the URL's temporarily until you get in and can submit a google news sitemap, OR, is it ok to apply without them and take care of the sitemap after you get in. If anyone has any other tips about getting into Google News that would be great! Thanks!
Intermediate & Advanced SEO | | stacksnew0 -
Why is a site no longer being indexed by Google after HTTPS switch?
A client of ours recently had a new site built and made the switch to HTTPS. We made sure to redirect all of the HTTP pages to HTTPS and submitted a new sitemap to Google. GWT says the sitemap was submitted successfully but only 4 pages have been indexed where there should be over 2000. This has led to a plummet of organic traffic and we can't find the issue. Has anyone else had issues/success with doing a HTTPS switch that knows how to fix this problem?
Intermediate & Advanced SEO | | ATMOSMarketing560 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
To index search results or to not index search results?
What are your feelings about indexing search results? I know big brands can get away with it (yelp, ebay, etc). Apart from UGC, it seems like one of the best ways to capture long tail traffic at scale. If the search results offer valuable / engaging content, would you give it a go?
Intermediate & Advanced SEO | | nicole.healthline0 -
Google Indexed the HTTPS version of an e-commerce site
Hi, I am working with a new e-commerce site. The way they are setup is that once you add an item to the cart, you'll be put onto secure HTTPS versions of the page as you continue to browse. Well, somehow this translated to Google indexing the whole site as HTTPS, even the home page. Couple questions: 1. I assume that is bad or could hurt rankings, or at a minimum is not the best practice for SEO, right? 2. Assuming it is something we don't want, how would we go about getting the http versions of pages indexed instead of https? Do we need rel-canonical on each page to be to the http version? Anything else that would help? Thanks!
Intermediate & Advanced SEO | | brianspatterson0