Some URLs in the sitemap not indexed
-
Our company site has hundreds of thousands of pages. Yet no matter how big or small the total page count, I have found that the "URLs Indexed" figure in GWMT has never matched "URLs in Sitemap". Both when we were small and now that we have a LOT more pages, there has always been a discrepancy of roughly 10% missing from the index.
It's difficult to know which pages are not indexed, but I have found some that I can verify are in the Sitemap.xml file but not in the index at all. When I go to GWMT I can "Fetch and Render" the missing pages fine, so it's not as though they're blocked or inaccessible.
Any ideas on why this is? Is this type of discrepancy typical?
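For anyone wanting to confirm that a given page really is listed in the sitemap, here is a minimal sketch using only the Python standard library. It assumes a standard sitemaps.org `urlset` file; the sample XML and the example.com URLs are made up for illustration, and a real sitemap would be read from disk or fetched first.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> set[str]:
    """Return the set of <loc> URLs listed in a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


# Hypothetical sample data, not taken from any real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/products/widget</loc></url>
</urlset>"""

urls = sitemap_urls(SAMPLE_SITEMAP)
print("https://www.example.com/products/widget" in urls)
```

This only confirms presence in the sitemap; it says nothing about whether Google has indexed the page.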
-
Thanks. Very helpful!
-
It's great to know that a 10% discrepancy is good. Hard to know otherwise.
That article about Screaming Frog is super helpful, thanks!
-
I have never had a site with 100% of its pages indexed. Sometimes Google will drop a page for being too similar to another or not informative enough, or because it has a canonical link set or it redirects.
As Ryan says, don't just rely on Moz; use Screaming Frog to get a good view of your site too and see if there are any errors. You can also run the Frog whenever you like, it's just a little more technical to understand.
Xenu, oooh, never heard of that one, Ryan, thanks!
Just looked into Xenu. Screaming Frog does it all and then some.
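Two of the reasons mentioned above (a canonical link pointing at another URL, or a page that asks not to be indexed) can be spotted in a page's HTML. The following is a rough sketch, assuming you already have the HTML as a string; the class and function names are invented for this example, and it only looks at the `<link rel="canonical">` and `<meta name="robots">` tags, not HTTP headers or redirects.

```python
from html.parser import HTMLParser


class IndexHintParser(HTMLParser):
    """Collects the canonical URL and any robots noindex directive."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and (attr.get("rel") or "").lower() == "canonical":
            self.canonical = attr.get("href")
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def index_hints(html: str, page_url: str) -> dict:
    """Summarise on-page signals that may keep page_url out of the index."""
    parser = IndexHintParser()
    parser.feed(html)
    return {
        "noindex": parser.noindex,
        "canonical": parser.canonical,
        "canonical_points_elsewhere": (
            parser.canonical is not None and parser.canonical != page_url
        ),
    }


# Hypothetical page: a sorted view that canonicalises to the unsorted URL.
SAMPLE_HTML = (
    '<html><head>'
    '<link rel="canonical" href="https://www.example.com/shoes">'
    '<meta name="robots" content="noindex, follow">'
    '</head><body></body></html>'
)

hints = index_hints(SAMPLE_HTML, "https://www.example.com/shoes?sort=price")
print(hints)
```

A crawler such as Screaming Frog reports the same signals across a whole site; the sketch is just to show what is being checked.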
-
Hi Mase,
I've managed sites with hundreds of thousands of pages too, and in my experience a discrepancy between what's offered up via the sitemaps and what gets indexed is typical (dare I say it, a 10% discrepancy seems pretty good!). Pages deeper in the site seem to suffer this fate more frequently than those with fewer subfolders, as do those with thin content.
I agree completely with Ryan's comment about Screaming Frog: it is an invaluable tool for site audits, in addition to lots of other useful site insights. You might find this article interesting to get a sense of the many ways you can use SF: http://www.seerinteractive.com/blog/screaming-frog-guide/
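To see whether deeper pages really are the ones going missing on a particular site, one option is to group sitemap URLs by the number of subfolders and compare the missing rate per depth. This is a sketch only: it assumes you have somehow built a set of URLs known to be indexed (Google does not hand this list over directly), and the function names and example.com URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments, so /a/b/c has depth 3 and / has depth 0."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def missing_rate_by_depth(sitemap_urls, indexed_urls) -> dict:
    """Map each path depth to the share of sitemap URLs at that depth not indexed."""
    indexed = set(indexed_urls)
    total = Counter(path_depth(u) for u in sitemap_urls)
    missing = Counter(path_depth(u) for u in sitemap_urls if u not in indexed)
    return {depth: missing[depth] / total[depth] for depth in sorted(total)}


# Hypothetical data: one of the two depth-3 pages is not indexed.
SITEMAP = [
    "https://www.example.com/",
    "https://www.example.com/a",
    "https://www.example.com/a/b",
    "https://www.example.com/a/b/c",
    "https://www.example.com/x/y/z",
]
INDEXED = {
    "https://www.example.com/",
    "https://www.example.com/a",
    "https://www.example.com/a/b",
    "https://www.example.com/a/b/c",
}

rates = missing_rate_by_depth(SITEMAP, INDEXED)
print(rates)
```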
-
You're welcome. Definitely take a look at a crawler that gives you more insight, especially with a site as large as yours. Just note that, no matter what, you might never achieve an exact match between the pages you've submitted and the number indexed, as Google can decide not to index a page for reasons unrelated to its presence in a sitemap. It would also be useful to look at how many of your pages receive visits in analytics. That will give you an idea of the percentages of pages in the sitemap vs. the index vs. active.
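The sitemap vs. index vs. active comparison comes down to set arithmetic once you have the three URL lists. Here is a small sketch, assuming the lists come from your sitemap, from whatever record of indexed URLs you can assemble, and from an analytics export of pages with visits; the function name and the numbers are made up for illustration.

```python
def coverage_report(sitemap: set, indexed: set, visited: set) -> dict:
    """Percentages of sitemap URLs that are indexed and that received visits."""
    total = len(sitemap)

    def pct(count: int) -> float:
        return round(100 * count / total, 1) if total else 0.0

    return {
        "sitemap_total": total,
        "indexed_pct": pct(len(sitemap & indexed)),
        "active_pct": pct(len(sitemap & visited)),
        "not_indexed": sorted(sitemap - indexed),
    }


# Hypothetical data: 10 sitemap URLs, 9 indexed, 6 with visits.
SITEMAP_URLS = {f"https://www.example.com/page-{i}" for i in range(10)}
INDEXED_URLS = {f"https://www.example.com/page-{i}" for i in range(9)}
VISITED_URLS = {f"https://www.example.com/page-{i}" for i in range(6)}

report = coverage_report(SITEMAP_URLS, INDEXED_URLS, VISITED_URLS)
print(report)
```

The `not_indexed` list is the part worth reviewing by hand for thin or duplicate content.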
-
I have not run the site through the tools you mentioned; I'm unfamiliar with them.
I am not, however, receiving any errors on those pages. And when I "Fetch and Render" in GWMT, they look and render fine without errors. I'm able to submit them to the index one-by-one.
Thanks for your response, Ryan.
-
Hi Mase. Are you getting errors on URLs you've submitted? Or have you run other crawlers on your site, like Xenu or Screaming Frog, to surface any possible errors? It's also good to know which pages might not have enough content to be indexed: filters, sorting views, etc.
Related Questions
-
Can you help by advising how to stop a URL from referring to another URL on my website with a 404 error, please?
How do I stop a URL from referring to another URL on my site? I'm getting a 404 error on a referred URL, which is (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/[null id=43484]), referred from the URL (https://webwritinglab.com/know-exactly-what-your-ideal-clients-want-in-8-easy-steps/). The referred URL is the URL page that I want, and I do not need it redirecting to the other URL, as that's presenting a 404 error. I have tried saving the permalink in WordPress and recreated the .htaccess file, and the problem is still there. Can you advise how to fix this, please? Is it a case of removing the redirect? Is this advisable, and how do I do that, please? Thanks
Technical SEO | | Nichole.wynter20200 -
No index and Crawl Budget
Hello, If we noindex pages, will it improve crawl budget? For example, pages like these:
https://x-z.com/2012/10/
https://x-y.com/2012/06/
https://x-y.com/2013/03/
https://x-y.com/2019/10/
https://x-y.com/2019/08/
Should we delete/redirect such pages? Thanks
Technical SEO | | Johnroger0 -
Use existing page with bad URL or brand new URL?
Hello, We will be updating an existing page with more helpful information with the goal of reaching more potential customers through SEO and also attaching a SEM campaign to the specific landing page. The current URL of the page scores 25 on Page Authority, and has 2 links to it from blog articles (PA 35, 31). The current content needs to be rewritten to be more helpful and also needs some additional information. The downside is that it has a "bad" URL: no target keyword, and it uses underscores. Which of the following choices would you make? 1. Update this old "bad" URL with new content. Benefit from the existing PA. -or- 2. Start with a new optimized URL, reusing some of the old content and utilizing a 301 redirect from the previous page? Thank you!
Technical SEO | | XLMarketing0 -
URL Structure
Hi, Hope you are all well. On our website we have a 'blog' and a 'news' section. The blog is located on "/blog", but when you click on a post the URL structure changes to /name-of-article and the blog subdomain isn't included. Would it be better to have "blog/name-of-article", as this would then make the blog perform better in search results? Also, our news page is under /news, but when you click on an article it changes to /news-article/name-of-article. Wouldn't it be better to have /news/name-of-article? Thanks a lot!! 🙂
Technical SEO | | National-Homebuyers0 -
Changing all urls
A client of mine has a WordPress website that is installed in a directory called "site". So when you go to www.domain.com you are redirected to www.domain.com/site. We all know how bad it is to have a redirect from your subdomain to another page. In this case I measured a loss of 5 points of page authority. The question is: what is the best practice to remove the "site" from the address and change all the URLs? Should I use the webmaster tool to tell Google that the site is moving? It's not 100% true, because the site is just moving one level up. Should I install a copy of the website under www.domain.com and just 301 redirect every old page to its new URL? This way I think the site would be deindexed for 2/3 months. Any suggestions or tips welcome! Thanks DoMiSol
Technical SEO | | DoMiSoL0 -
How can I best find out which URLs from large sitemaps aren't indexed?
I have about a dozen sitemaps with a total of just over 300,000 urls in them. These have been carefully created to only select the content that I feel is above a certain threshold. However, Google says they have only indexed 230,000 of these urls. Now I'm wondering, how can I best go about working out which URLs they haven't indexed? No errors are showing in WMT related to these pages. I can obviously manually start hitting it, but surely there's a better way?
Technical SEO | | rango0 -
Exclude Child URLs from XML Sitemap Generator (Wordpress)
Hi all, I was recommended the XML Sitemap Generator for Wordpress by the very helpful Keith Bloemendaal and John Pring - however I can't seem to exclude child URLs. There is a section Exclude items and a subsection Exclude posts. I have tried inputting the URLs for the pages I don't want in the sitemap, however that didn't work. So I read that you have to include a list of "IDs" - not sure where on earth to find that info, tried the page name and the post= number from the URL, however neither worked. I hope somebody can point me in the right direction - and apologies, I am a Wordpress novice, and I got no answers from the Wordpress forums so turned right back to SEOmoz! Cheers.
Technical SEO | | markadoi840 -
URLs: To Change or Not to Change
Hello, We recently launched a redesigned site in Drupal in December of last year. We are an eco-travel company. My current URL's look like this: /africa-and-middle-east/kenya-tanzania /central-south-america/galapagos-islands My pages have good term targeting grades, and the rankings for the terms we are targeting - "kenya and tanzania safaris" and "galapagos islands cruises" are decent, but not great - most are on page 2 or 3. The one URL where I targeted our most important term, "amazon river cruises," I am still on page 2. /central-south-america/amazon-river-cruises My questions are: Did I miss an opportunity with the rest of the URL's, and should I consider changing the rest to more targeted terms with 301s? Since the new site launched in January, perhaps I have not given enough time for my new URL's to index and mature. Would it be easier to set up landing pages with unique article content that targets terms such as "galapagos islands cruises" and "kenya and tanzania safaris"? If so, how can I do it in such a way as to not "compete" with the pages I want to drive them to? This also raises the question of redirecting the same URL twice i.e. I would have 2 redirects in place for the same url e.g. from the former site to the new site, and yet another redirect to the most-recent URL. Is that a problem? Sorry if I've asked too many questions in one post. 😉 Any advice appreciated.
Technical SEO | | csmithal0