Some URLs in the sitemap not indexed
-
Our company site has hundreds of thousands of pages. Yet no matter how big or small the total page count, I have found that the "URLs Indexed" figure in Google Webmaster Tools (GWMT) has never matched "URLs in Sitemap". Both when we were small and now that we have a LOT more pages, roughly 10% of the submitted URLs have always been missing from the index.
It's difficult to know which pages are not indexed, but I have found some that I can verify are in the Sitemap.xml file but not in the index at all. When I go to GWMT I can "Fetch and Render" the missing pages fine, so it's not as though they're blocked or inaccessible.
Any ideas on why this is? Is this type of discrepancy typical?
-
Thanks. Very helpful!
-
It's great to know that a 10% discrepancy is considered good. Hard to know otherwise.
That article about Screaming Frog is super helpful, thanks!
-
I have never had a site with 100% of its pages indexed. Sometimes Google will drop a page for being too similar to another page, for not being informative enough, or because of canonical links or redirects.
As Ryan says, don't just rely on Moz. Use Screaming Frog to get a good view of your site too, and see if there are any errors. You can also run the Frog whenever you like; it's just a little more technical to understand.
Xenu, oooh, never heard of that one. Thanks, Ryan!
Just looked into Xenu. Screaming Frog does it all and then some.
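For anyone who wants to spot-check a handful of missing pages by hand, here is a minimal Python sketch of the canonical check mentioned above. It assumes you already have the page's HTML as a string (fetching is left out), and the names `IndexSignals` and `index_signals` are made up for this example. The `noindex` check is an extra I've added; it isn't one of the reasons listed in the post. This only reads on-page hints and says nothing definitive about what Google will actually do.

```python
from html.parser import HTMLParser


class IndexSignals(HTMLParser):
    """Collects the rel=canonical target and any robots noindex directive."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # <link rel="canonical" href="...">
        if tag == "link" and "canonical" in (a.get("rel") or "").lower().split():
            self.canonical = a.get("href")
        # <meta name="robots" content="noindex, ...">
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True


def index_signals(url, html):
    """Return on-page hints that may explain why `url` is not indexed."""
    parser = IndexSignals()
    parser.feed(html)
    return {
        "noindex": parser.noindex,
        "canonical": parser.canonical,
        # A canonical pointing at a different URL asks Google to index that one instead.
        "canonical_points_elsewhere": (
            parser.canonical is not None and parser.canonical != url
        ),
    }


# Hypothetical page used only for illustration.
sample_html = (
    '<html><head><link rel="canonical" href="https://example.com/a">'
    '</head><body>Hello</body></html>'
)
print(index_signals("https://example.com/b", sample_html))
```

The comparison is a plain string match, so trailing slashes or http/https differences would show up as "points elsewhere"; normalize the URLs first if that matters for your site.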
-
Hi Mase,
I've managed sites with hundreds of thousands of pages too, and in my experience a discrepancy between what's offered up via the sitemaps and what gets indexed is typical (dare I say it, a 10% discrepancy seems pretty good!). Pages deeper in the site seem to suffer this fate more frequently than those with fewer subfolders, as do those with thin content.
I agree completely with Ryan's comment about Screaming Frog: it is an invaluable tool for site audits, and it provides lots of other useful site insights as well. You might find this article interesting to get a sense of the many ways you can use SF: http://www.seerinteractive.com/blog/screaming-frog-guide/
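If you want to test the "deeper pages get dropped more often" observation against your own data, a rough Python sketch along these lines might help. It assumes you have two lists of URLs on hand, the ones you submitted and the ones you have confirmed as indexed; the function names and sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def path_depth(url):
    """Count non-empty path segments, e.g. /a/b/c has depth 3."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def missing_rate_by_depth(submitted, indexed):
    """Map each path depth to the share of submitted URLs absent from the index."""
    indexed = set(indexed)
    totals, missing = Counter(), Counter()
    for url in submitted:
        depth = path_depth(url)
        totals[depth] += 1
        if url not in indexed:
            missing[depth] += 1
    return {depth: missing[depth] / totals[depth] for depth in sorted(totals)}


# Made-up URLs, just to show the shape of the input and output.
submitted = [
    "https://example.com/",
    "https://example.com/a",
    "https://example.com/a/b/c",
    "https://example.com/a/b/d",
]
indexed = [
    "https://example.com/",
    "https://example.com/a",
    "https://example.com/a/b/c",
]
print(missing_rate_by_depth(submitted, indexed))
```

A higher missing rate at greater depths would be consistent with the pattern described above, though it would not prove that depth is the cause.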
-
You're welcome. Definitely take a look at a crawler that gives you more insight, especially with a site as large as yours. Just note that, no matter what, you might never achieve an exact match between the number of pages you've submitted and the number indexed, as Google can decide not to index a page for reasons unrelated to its presence in a sitemap. It would also be useful to look at how many of your pages receive visits in analytics. That will give you an idea of the percentages of pages in the sitemap vs. the index vs. active.
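As a rough illustration of that sitemap vs. index vs. active comparison, here is a small Python sketch. It assumes you can get the sitemap XML as text and can export two URL lists yourself (indexed pages and pages with visits); how you obtain those lists is up to you, and the function names and sample data below are invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def coverage(submitted, indexed, visited):
    """Compare submitted URLs against the indexed and visited URL sets."""
    total = len(submitted)

    def pct(part):
        return round(100 * len(part) / total, 1) if total else 0.0

    return {
        "submitted": total,
        "indexed_pct": pct(submitted & indexed),
        "active_pct": pct(submitted & visited),
        "not_indexed": sorted(submitted - indexed),
    }


# Made-up sitemap and URL lists for illustration only.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/a</loc></url>
  <url><loc>https://example.com/b</loc></url>
  <url><loc>https://example.com/c</loc></url>
</urlset>"""

submitted = sitemap_urls(sample_sitemap)
indexed = {"https://example.com/", "https://example.com/a", "https://example.com/b"}
visited = {"https://example.com/", "https://example.com/a"}
print(coverage(submitted, indexed, visited))
```

The `not_indexed` list is the part that answers "which pages are missing", which is otherwise hard to see from the totals alone.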
-
I have not run the site through the tools you mentioned; I'm unfamiliar with them.
I am not, however, receiving any errors on those pages. And when I "Fetch and Render" in GWMT, they look and render fine without errors. I'm able to submit them to the index one-by-one.
Thanks for your response, Ryan.
-
Hi Mase. Are you getting errors on the URLs you've submitted? Or have you run other crawlers, like Xenu or Screaming Frog, on your site to surface any possible errors? It's also good to know which pages might not have enough content to be indexed: filters, sorting views, etc.