New SEO manager needs help! Currently only about 15% of our live sitemap (~4 million url e-commerce site) is actually indexed in Google. What are best practices sitemaps for big sites with a lot of changing content?
-
In Google Search console
4,218,017 URLs submitted
402,035 URLs indexed
what is the best way to troubleshoot?
What is best guidance for sitemap indexation of large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website but I am only guessing that it is not 4m of even 400,000. I think the question you should be asking yourself is 'do I really need so many URLs?'
If you have 50,000 products in your site then frankly you only need maybe 51000 pages in total (including support pages, brands (maybe), categories and sub-categories. I am only guessing but I would suggest that the other pages are being created by tags or other attributes and that these elements are creating acres of duplicate and very skinny content.
My usual question is - 'so you have 400,000 (never mind 4m) pages in Google? - did you write or generate 400,000 pages of useful, interesting, non-duplicate and shareable content? The answer of course is usually no.
Try switching off sets of tags and canonicalizing very similar content and you'll be amazed how it helps rankings!
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The 6th point the post author raises is one you should pay attention to given you're asking about a large e-commerce site. Point 6 says you might not have enough Pagerank, that "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google has said they're not maintaining Pagerank anymore, but the essence of the issue raised is a solid one. Google does set a crawl budget for every website and large e-commerce sites often run into situations where they run out before the entire site is indexed. You should look at your site structure, robots tagging, and as Jason McMahon says, internal linking to make sure you are directing Google to the most important pages on your site first, and that all redundant content is canonicalized or noindexed.
I'd start with that.
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL but here are some things to consider:
- Indexing Lag - How long ago did you submit the sitemaps? We usually find there can be at least a few weeks lag between when the sitemaps are submitted and when all the URL's are indexed.
- Internal Linking - What does your sites internal linking structure look like? Good internal linking like having breadcrumbs, in-text links, sidebar links and siloed URL structuring can help the indexation process.
- **Sitemap Errors - **Are there currently any sitemap errors listed in Google Search Console? Either on the dashboard or in the sitemaps section? Any issues here could be adding to your problem.
Hopefully, this is of some help and let me know how you go.
Regards,
Jason.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Main Site and eCommerce Site URLs for SEO
My client currently has a main website on a url and an eCommerce site on a subdomain. The eCommerce site is currently not mobile friendly, has images that are too small and are problematic - and I believe it negates some of the SEO work we do for them. I had to turn off Google Shopping ads because the quality score was so low. That being said, they are rebuilding a shopping cart on a new platform that will be mobile friendly BUT the images are going to be tiny until they slowly replace images over several months. Would you keep the shopping cart on a subdomain, or make it part of the main website URL? Can it negatively impact the progress we have made on the main site SEO.
Technical SEO | | jerrico10 -
Client wants to repackage in-depth content as PowerPoint files and embed on site. SEO implications?
Hi, I've a client who is planning to build out "courses" for their site. Their ultimate goal is to have videos (which will have transcriptions) but since the videos are not yet ready they want to launch with the content in PowerPoint format instead. Thing is, the pages they have now are really good content/in-depth. In short it seems videos are Phase 2, so their Phase 1 preference is to take all their courses content and put them in PowerPoint slides and add them to their web site. While I understand standalone files like PDFs and PPTs can be indexable, my recollection is that embedded slides are not (like SlideShare). Is that correct? My worry is that by taking this content and reformatting it into PowerPoints will hurt their site instead of helping. Any insight is appreciated!
Technical SEO | | CR-SEO0 -
Use existing page with bad URL or brand new URL?
Hello, We will be updating an existing page with more helpful information with the goal of reaching more potential customers through SEO and also attaching a SEM campaign to the specific landing page. The current URL of the page scores 25 on Page Authority, and has 2 links to it from blog articles (PA 35, 31). The current content needs to be rewritten to be more helpful and also needs some additional information. The downsides are that it has an "bad" URL- no target keyword and uses underscores. Which of the following choices would you make? 1. Update this old "bad" URL with new content. Benefit from the existing PA. -or- 2. Start with a new optimized URL, reusing some of the old content and utilizing a 301 redirect from the previous page? Thank you!
Technical SEO | | XLMarketing0 -
Creating a help hub, not sure the best name to use, " keyword help " or " help hub "?
I've been creating new content for our site, lots of help related content, so I created a help hub section. Now the more I go through it, and look at url structure and breadcrumbs, I can't help but think I should be using a keyword in there, but also don't want to over do it, since the keyword we are shooting for is also a subsection of our site, complete with url keyword and breadcrumb. So I just don't want to have too many over redundant titles like keyword this and keyword that, so I came here to get some advice from the awesome community of folks. Keep help hub so it's: Url: site.com/help-hub/helppage1 Breadcrumb: Home > Help-Hub > Help Page 1 or Url: site.com/keyword/help/helppage1 Breadcrumb: Home > Keyword > Help > Help Page 1
Technical SEO | | Deacyde0 -
Vanity URLs are being indexed in Google
We are currently using vanity URLs to track offline marketing, the vanity URL is structured as www.clientdomain.com/publication, this URL then is 302 redirected to the actual URL on the website not a custom landing page. The resulting redirected URL looks like: www.clientdomain.com/xyzpage?utm_source=print&utm_medium=print&utm_campaign=printcampaign. We have started to notice that some of the vanity URLs are being indexed in Google search. To prevent this from happening should we be using a 301 redirect instead of a 302 and will the Google index ignore the utm parameters in the URL that is being 301 redirect to? If not, any suggestions on how to handle? Thanks,
Technical SEO | | seogirl221 -
Post Site Migration - thousands of indexed pages, 4 months after
Hi all, Believe me. I think I've already tried and googled for every possible question that I have. This one is very frustrating – I have the following old domain – fancydiamonds dot net. We built a new site – Leibish dot com and done everything by the book: Individual 301 redirects for all the pages. Change of address via the GWT. Trying to maintain and improve the old optimization and hierarchy. 4 months after the site migration – we still have to gain back more than 50% of our original organic traffic (17,000 vs. 35,500-50,000 The thing that strikes me the most that you can still find 2400 indexed pages on Google (they all have 301 redirects). And more than this – if you'll search for the old domain name on Google – fancydiamonds dot net you'll find the old domain! Something is not right here, but I have no explanation why these pages still exist. Any help will be highly appreciated. Thanks!
Technical SEO | | skifr0 -
E-commerce site multi language product urls
Hi, On my E-commerce site I use optimized Url's (product names) with my native language (Hebrew) I am going to translate some of the product pages to English with differnat descriptions, correct layout, etc... But I was wandering about the product Url's. Should I stay with the original product URL and add a language parameter to the url? Should I add a completely new URL using the English product name? Of course what I'm worrying the most of is duplicate content. Thanks, Asaf
Technical SEO | | AsafY0 -
We changed the URL structure 10 weeks ago and Google hasn't indexed it yet...
We recently modified the whole URL structure on our website, which resulted in huge amount of 404 pages changing them to nice human readable urls. We did this in the middle of March - about 10 weeks ago... We used to have around 5000 404 pages in the beginning, but this number is decreasing slowly. (We have around 3000 now). On some parts of the website we have also set up a 301 redirect from the old URLs to the new ones, to avoid showing a 404 page thus making the “indexing transmission”, but it doesn’t seem to have made any difference. We've lost a significant amount of traffic, because of the URL changes, as Google removed the old URLs, but hasn’t indexed our new URLs yet. Is there anything else we can do to get our website indexed with the new URL structure quicker? It might also be useful to know that we are a page rank 4 and have over 30,000 unique users a month so I am sure Google often comes to the site quite often and pages we have made since then that only have the new url structure are indexed within hours sometimes they appear in search the next day!
Technical SEO | | jack860