New SEO manager needs help! Currently only about 15% of our live sitemap (~4 million URL e-commerce site) is actually indexed in Google. What are the best practices for sitemaps on big sites with a lot of changing content?
-
In Google Search Console:
4,218,017 URLs submitted
402,035 URLs indexed
What is the best way to troubleshoot?
What is the best guidance for sitemap indexation of large sites with a lot of changing content?
-
Hi Hamish
I'm not sure how many products you have listed on your website, but I am guessing that it is not 4m or even 400,000. I think the question you should be asking yourself is 'do I really need so many URLs?'
If you have 50,000 products on your site then frankly you only need maybe 51,000 pages in total (including support pages, brands (maybe), categories and sub-categories). I am only guessing, but I would suggest that the other pages are being created by tags or other attributes and that these elements are creating acres of duplicate and very skinny content.
My usual question is: 'So you have 400,000 (never mind 4m) pages in Google? Did you write or generate 400,000 pages of useful, interesting, non-duplicate and shareable content?' The answer of course is usually no.
Try switching off sets of tags and canonicalizing very similar content and you'll be amazed how it helps rankings!
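As a rough way to see how much of a URL list collapses once tag and attribute variations are ignored, here is a minimal Python sketch. The parameter names in `NOISE_PARAMS` are purely hypothetical; you would swap in whatever your platform actually appends to URLs.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names, assumed for illustration only.
NOISE_PARAMS = {"tag", "sort", "color", "size", "sessionid"}


def canonical_form(url: str) -> str:
    """Drop assumed noise parameters and fragments, then sort what is left."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
    )
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Return only the canonical forms that more than one URL maps to."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_form(url)].append(url)
    return {canon: members for canon, members in groups.items() if len(members) > 1}
```

Each group with more than one member is a candidate for a single canonical URL, with the variants either canonicalized to it or left out of the sitemap.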
Just a thought
Regards Nigel
Carousel Projects.
-
This post from Search Engine Journal (https://www.searchenginejournal.com/definitive-list-reasons-google-isnt-indexing-site/118245/) is helpful for troubleshooting.
This Moz post (https://moz.com/blog/8-reasons-why-your-site-might-not-get-indexed) has some additional considerations. The 6th point the post author raises is one you should pay attention to, given you're asking about a large e-commerce site. Point 6 says you might not have enough PageRank, that "the number of pages Google crawls is roughly proportional to your pagerank".
As you probably know, Google no longer publishes PageRank scores, but the essence of the issue raised is a solid one. Google does set a crawl budget for every website, and large e-commerce sites often run into situations where they run out before the entire site is indexed. You should look at your site structure, robots tagging, and, as Jason McMahon says, internal linking to make sure you are directing Google to the most important pages on your site first, and that all redundant content is canonicalized or noindexed.
I'd start with that.
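One common way to make this easier to troubleshoot is to split the sitemap by site section, so Search Console reports submitted versus indexed counts per section. Below is a minimal sketch using only the Python standard library. The 50,000 URL cap per file comes from the sitemaps.org protocol; the section names, file naming scheme and `base_url` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_urlset(urls):
    """Serialize one sitemap file containing the given URLs."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return ET.tostring(root, encoding="unicode")


def build_sitemaps(sections, base_url, size=MAX_URLS_PER_FILE):
    """sections maps a section name to its URLs; returns (files, index_xml)."""
    files = {}
    for name, urls in sections.items():
        for number, part in enumerate(chunk(urls, size), start=1):
            # Hypothetical naming scheme: sitemap-<section>-<n>.xml
            files[f"sitemap-{name}-{number}.xml"] = build_urlset(part)

    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for filename in files:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{filename}"
    return files, ET.tostring(index, encoding="unicode")
```

Listing only canonical, indexable URLs in these files, and putting the most important sections first, keeps the sitemap in line with the prioritization described above.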
-
Hi Hamish_TM,
It is hard to say without knowing the exact URL but here are some things to consider:
- Indexing Lag - How long ago did you submit the sitemaps? We usually find there can be at least a few weeks' lag between when the sitemaps are submitted and when all the URLs are indexed.
- Internal Linking - What does your site's internal linking structure look like? Good internal linking, such as breadcrumbs, in-text links, sidebar links and siloed URL structuring, can help the indexation process.
- Sitemap Errors - Are there currently any sitemap errors listed in Google Search Console, either on the dashboard or in the sitemaps section? Any issues here could be adding to your problem.
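Before relying on the Search Console report alone, you can run a quick local sanity check on a sitemap file. The sketch below is only an illustration of the kind of checks that are easy to automate (file size limit, duplicate entries, URLs on the wrong host); `audit_sitemap` and its report keys are made-up names, and it does not replicate Google's own validation.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def audit_sitemap(xml_text, expected_host):
    """Return a small report of common problems in one sitemap file."""
    root = ET.fromstring(xml_text)
    locs = [(el.text or "").strip() for el in root.findall("sm:url/sm:loc", NS)]
    counts = Counter(locs)
    return {
        "total": len(locs),
        "over_limit": len(locs) > MAX_URLS_PER_FILE,
        # The same URL listed more than once in the file.
        "duplicates": sorted(url for url, n in counts.items() if n > 1),
        # Relative URLs or URLs pointing at a different host.
        "wrong_host": sorted(
            {url for url in locs if urlsplit(url).netloc != expected_host}
        ),
    }
```

Anything that shows up under `duplicates` or `wrong_host` is worth cleaning up before resubmitting the sitemap.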
Hopefully, this is of some help and let me know how you go.
Regards,
Jason.