Google indexing less url's then containded in my sitemap.xml
-
My sitemap.xml contains 3821 urls but Google (webmaster tools) indexes only 1544 urls. What may be the cause? There is no technical problem. Why does Google index less URLs then contained in my sitemap.xml?
-
Thank you for helping
-
Unless you have a SEO actively reviewing your site, it is quite normal for Google to index less pages then are offered in your sitemap.
How exactly was your sitemap created? Did you go by hand through your site's 3281 pages and add them to a sitemap? Or more likely, did you use a tool to create the sitemap? If you used a tool, how much knowledge do you have regarding how this tool works or its settings?
Just a few examples of URLs which may be included in your sitemap that Google would likely not index:
-
Your home page and other pages may have multiple URLs which lead to the same page. For example: www.mysite.com and www.mysite.com/index.html may be two URLs for the same page. Google will likely only index one of them.
-
You may have links to various URLs which contain parameters which Google will reduce to a single URL. For example: www.mysite.com/product_id=308&sort=asc&color=black, and another URL www.mysite.com/product_id=308&sort=desc&color=black. Both URLs lead to the same content sorted differently.
-
You may have duplicate content on your site. For example, you can sell chairs and list the same chair under multiple paths such as /furniture/wood/chair123 and /furniture/dining-room/chair123. Google will recognize these two pages are the same content presented under multiple URLs.
-
You may have submitted pages to your sitemap which are blocked via robots.txt or the "noindex" tag or are canonicalized to another page.
In order to better understand the root issue you need to examine a list of all URLs in your sitemap and compare that to a list of all indexed URLs. Determine which URLs Google has not indexed and research the reason for each one independently.
-
-
Are they index worthy?
Having them on your sitemap does not mean google wants them in its index
-
He just said it. Is this a new domain? Im in the same boat as you for some of my domains.
-
Yes, I understand this. But In this situation Google first indexes all the URL's within my sitemap.xml uploaded in Google Webmaster tools. Now Google indexes less URL's, only 50%. What can be the cause if there are no technical problems?
-
Hi!
Google will only spend 'so much time' on any new domain. The more traffic and links and page authority you get, the more time Google will dedicate to crawling your website. You should also make sure that the site is not slow, as this will reduce the crawling speed even more! See Google page speed for tips on speeding up the load time of your site
Good Luck,
Sven Witteveen
Expand Online
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Search Console rejecting XML sitemap files as HTML files, despite them being XML
Hi Moz folks, We have launched an international site that uses subdirectories for regions and have had trouble getting pages outside of USA and Canada indexed. Google Search Console accounts have finally been verified, so we can submit the correct regional sitemap to the relevant search console account. However, when submitting non-USA and CA sitemap files (e.g. AU, NZ, UK), we are receiving a submission error that states, "Your Sitemap appears to be an HTML page," despite them being .xml files, e.g. http://www.t2tea.com/en/au/sitemap1_en_AU.xml. Queries on this suggest it's a W3 Cache plugin problem, but we aren't using Wordpress; the site is running on Demandware. Can anyone guide us on why Google Search Console is rejecting these sitemap files? Page indexation is a real issue. Many thanks in advance!
Technical SEO | | SearchDeploy0 -
IT's Hurt My Rank?HELP!!!
hi,guys,john here, i just began use the MOZ service several days ago, recently i noticed one thing that one keyword on the first google search result page, but when i done some external links,the rank down from 1 to 8, i think may be the bad quality external links caused the rank down. so my question,should i delete the bad quality links or build more better quality links? which is better for me. easy to delete the bad links and hard to build high quality links. so what's your better opinion,guys? thanks John
Technical SEO | | smokstore0 -
What's wrong with this robots.txt
Hi. really struggling with the robots.txt file
Technical SEO | | Leonie-Kramer
this is it: User-agent: *
Disallow: /product/ #old sitemap
Disallow: /media/name.xml When testing in w3c.org everything looks good, testing is okay, but when uploading it to the server, Google webmaster tools gives 3 errors. Checked it with my collegue we both don't know what's wrong. Can someone take a look at this and give me the solution.
Thanx in advance! Leonie1 -
Why wont google Index this page?
A week ago i accidentally changed this page settings in my CMS to "disable & dont index" as i was going to replace this page with another, but this didnt happen, but i forgot to switch the settings back! http://www.over50choices.co.uk/funeral-planning/funeral-plans Anyhow in an effort to get it back up quickly i submitted in GWTs but its still not indexed. When i use several SEO on page checking tools it has the Meta Title data as "Form" and not the correct title. Any ideas please? Yours frustrated Ash
Technical SEO | | AshShep10 -
Homepage no longer indexed in Google
Have been working on a site and the hompage has recently vanished from Google. I submit the site to Google webmaster tools a couple of days ago and checked today and the homepage has vanished. There are no no follow tags, and no robots.txt stopping the page from being crawled. It's a bit of a worry, the site is http://www.beyondthedeal.com
Technical SEO | | tonysandwich
Any insights would be massively appreciated! Thanks.0 -
301ing 404's
Hey guys, I am currently in the process of redirecting some of my 404 pages to pages like my home page. Before I do that, I am assessing the link value of the 404 pages. My question is what do you do with the 404 pages which appear to have low quality links, do you really want to redirect them to an important page on your site? What should I do with these 404 pages? CheersAdam
Technical SEO | | Adamshowbiz0 -
Single URL not indexed
Hi everyone! Some days ago, I noticed that one of our URLs (http://www.access.de/karriereplanung/webinare) is no longer in the Google index. We never had any form of penalty, link warning etc. Our traffic by Google is constantly growing every month. This single page does not have an external link pointing to it - only internal links. The page has been indexed all the time. The HTTP status code is 200, there is no noindex or something in the code. I submitted the URL on GWMT to let Google send it to the index. It was crawled successfully by Google, sent to the index 5 days ago - nothing happened, still not indexed. Do you have any suggestions why this page is no longer indexed? It is well linked internally and one click away from the home page. There is still the PR of 5 showing, I always thought that pages with PR are indexed.......
Technical SEO | | accessKellyOCG0 -
Web page is showing up on Google but doesn't show when it was cached, so is it indexed?
Hey everyone So I created a new page on a WordPress website, it was live for a few hours till I changed my mind & switched it back to a draft. Just out of curiosity I did the Site:www.example.com/Example search on Google to see if it had been indexed & apparently it had but when I click on cached to see what time it got indexed at exactly it's showing me an error. So does this mean it is indexed or not?
Technical SEO | | conversiontactics0