Google indexing fewer URLs than contained in my sitemap.xml
-
My sitemap.xml contains 3,821 URLs, but Google (Webmaster Tools) indexes only 1,544 of them. What may be the cause? There is no technical problem. Why does Google index fewer URLs than are contained in my sitemap.xml?
-
Thank you for helping
-
Unless you have an SEO actively reviewing your site, it is quite normal for Google to index fewer pages than are offered in your sitemap.
How exactly was your sitemap created? Did you go by hand through your site's 3,821 pages and add them to a sitemap? Or, more likely, did you use a tool to create the sitemap? If you used a tool, how much knowledge do you have regarding how this tool works or its settings?
Just a few examples of URLs that may be included in your sitemap but that Google would likely not index:
-
Your home page and other pages may have multiple URLs which lead to the same page. For example: www.mysite.com and www.mysite.com/index.html may be two URLs for the same page. Google will likely only index one of them.
-
You may have links to various URLs containing parameters which Google will reduce to a single URL. For example: www.mysite.com/?product_id=308&sort=asc&color=black and www.mysite.com/?product_id=308&sort=desc&color=black. Both URLs lead to the same content sorted differently.
-
You may have duplicate content on your site. For example, you can sell chairs and list the same chair under multiple paths such as /furniture/wood/chair123 and /furniture/dining-room/chair123. Google will recognize these two pages are the same content presented under multiple URLs.
-
You may have submitted pages to your sitemap which are blocked via robots.txt or a "noindex" tag, or which are canonicalized to another page.
In order to better understand the root issue you need to examine a list of all URLs in your sitemap and compare that to a list of all indexed URLs. Determine which URLs Google has not indexed and research the reason for each one independently.
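To make that comparison concrete, here is a minimal sketch in Python (the file contents and indexed-URL list are hypothetical placeholders) that parses a sitemap.xml and lists the sitemap URLs missing from a set of known-indexed URLs, such as one exported from Webmaster Tools:

```python
# Sketch: diff the URLs in a sitemap against a list of indexed URLs.
# The example sitemap below is a made-up stand-in for your real file.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    """Extract all <loc> values from a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")}

def not_indexed(sitemap_xml, indexed):
    """Return sitemap URLs missing from the indexed set, for manual review."""
    return sorted(sitemap_urls(sitemap_xml) - set(indexed))

example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.mysite.com/</loc></url>
  <url><loc>http://www.mysite.com/index.html</loc></url>
  <url><loc>http://www.mysite.com/about</loc></url>
</urlset>"""

print(not_indexed(example, ["http://www.mysite.com/"]))
# → ['http://www.mysite.com/about', 'http://www.mysite.com/index.html']
```

Each URL the diff surfaces can then be researched individually: is it a duplicate, a parameter variant, blocked by robots.txt, noindexed, or canonicalized elsewhere?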
-
-
Are they index-worthy?
Having them in your sitemap does not mean Google wants them in its index.
-
He just said it. Is this a new domain? I'm in the same boat as you for some of my domains.
-
Yes, I understand this. But in this situation, Google first indexed all the URLs within my sitemap.xml uploaded in Google Webmaster Tools. Now Google indexes fewer URLs, only 50%. What can be the cause if there are no technical problems?
-
Hi!
Google will only spend so much time on any new domain. The more traffic, links, and page authority you get, the more time Google will dedicate to crawling your website. You should also make sure that the site is not slow, as this will reduce the crawl rate even more! See Google PageSpeed for tips on speeding up the load time of your site.
Good Luck,
Sven Witteveen
Expand Online
Related Questions
-
What is the SEO-friendly best practice for URLs filtered by 'Tagged'?
EX: https://www.STORENAME.com/collections/all-deals/alcatel, Tagged "Alcatel". When I run audits, I come across these URLs, which give me duplicate content and a missing H1. This is the canonical: https://www.STORENAME.com/collections/all-deals/alcatel. Any advice on how to tackle these? I have about 4k in my store! Thank you
Technical SEO | | Sscha0030 -
Duplicate pages in Google index despite canonical tag and URL Parameter in GWMT
Good morning Moz... This is a weird one. It seems to be a "bug" with Google, honest...

We migrated our site www.three-clearance.co.uk to a Drupal platform over the new year. The old site used URL-based tracking for heat-map purposes, so for instance www.three-clearance.co.uk/apple-phones.html could be reached via www.three-clearance.co.uk/apple-phones.html?ref=menu or www.three-clearance.co.uk/apple-phones.html?ref=sidebar and so on. GWMT was told of the ref parameter and the canonical meta tag was used to indicate our preference. As expected, we encountered no duplicate-content issues and everything was good.

This is the chain of events:

- Site migrated to the new platform following best practice, as far as I can attest to. The only known issue was that the verification for both Google Analytics (meta tag) and GWMT (HTML file) didn't transfer as expected, so between relaunch on the 22nd Dec and the fix on 2nd Jan we have no GA data, and presumably there was a period where GWMT became unverified.
- URL structure and URIs were maintained 100% (which may be a problem, now).
- Yesterday I discovered 200-ish 'duplicate meta titles' and 'duplicate meta descriptions' in GWMT. Uh oh, thought I. Expand the report out and the duplicates are in fact ?ref= versions of the same root URL. Double uh oh, thought I.
- Run, not walk, to Google and do some Fu: http://is.gd/yJ3U24 (9 versions of the same page in the index, the only variation being the ?ref= URI).
- Checked Bing and it has indexed each root URL once, as it should.

Situation now:

- The site no longer uses the ?ref= parameter, although of course there still exist some external backlinks that use it. This was intentional and happened when we migrated.
- I 'reset' the URL parameter in GWMT yesterday, given that there's no "delete" option. The "URLs monitored" count went from 900 to 0, but today it is at over 1,000 (another wtf moment).
- I also resubmitted the XML sitemap and fetched 5 'hub' pages as Google, including the homepage and the HTML site-map page.

The ?ref= URLs in the index have the disadvantage of actually working, given that we transferred the URL structure, and of course the webserver just ignores the nonsense arguments and serves the page. So I assume Google assumes the pages still exist, and won't drop them from the index but will instead apply a duplicate-content penalty. Or maybe call us a spam farm. Who knows.

Options that occurred to me (other than maybe making our canonical tags bold or locating a Google bug submission form 😄) include:

A) robots.txt-ing ?ref= URLs, but to me this says "you can't see these pages", not "these pages don't exist", so it isn't correct.
B) Hand-removing the URLs from the index through a page removal request per indexed URL.
C) Applying a 301 to each indexed URL (hello Bing dirty-sitemap penalty).
D) Posting on SEOmoz because I genuinely can't understand this.

Even if the gap in verification caused GWMT to forget that we had set ?ref= as a URL parameter, the parameter was no longer in use, because the verification only went missing when we relaunched the site without this tracking. Google is seemingly 100% ignoring our canonical tags as well as the GWMT URL setting. I have no idea why and can't think of the best way to correct the situation. Do you? 🙂

Edited to add: As of this morning, the "edit/reset" buttons have disappeared from the GWMT URL Parameters page, along with the option to add a new one. There's no message explaining why, and of course the Google help page doesn't mention disappearing buttons (it doesn't even explain what 'reset' does, or why there's no 'remove' option).
Technical SEO | | Tinhat0 -
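For option C above, the 301 target for any indexed ?ref= URL is simply the same URL with the tracking parameter stripped. A minimal, framework-agnostic sketch of that computation (the redirect itself would be issued by the web server or Drupal; the function and parameter names here are my own, not anything from GWMT):

```python
# Sketch: compute the 301 target for a request URL by stripping a legacy
# tracking parameter such as ?ref=. Returns None when no redirect is needed.
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def canonical_target(url, drop=("ref",)):
    """Remove the listed query parameters; None means the URL is already canonical."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in drop]
    new_query = urlencode(kept)
    if new_query == parts.query:
        return None  # nothing stripped; serve the page normally
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       new_query, parts.fragment))

print(canonical_target("http://www.three-clearance.co.uk/apple-phones.html?ref=menu"))
# → http://www.three-clearance.co.uk/apple-phones.html
```

A server-side rule with the same logic would fold every ?ref= variant back onto its root URL while leaving genuine query parameters untouched.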
What's the best URL Structure if my company is in multiple locations or cities?
I have read numerous intelligent, well-informed responses to this question but have yet to hear a definitive answer from an authority. Here's the situation. Let's say I have a company whose URL is www.awesomecompany.com and which provides one service called 'Awesome Service'. This company has 20 franchises in the 20 largest US cities. They want a uniform online presence, meaning they want their design to remain consistent across all 20 domains. My question is this: what's the best domain or URL structure for these 20 sites?

- Subdomain: dallas.awesomecompany.com
- Unique URL: www.dallasawesomecompany.com
- Directory: www.awesomecompany.com/dallas/

Here's my thoughts on this question, but I'm really hoping someone b*tch slaps me and tells me I'm wrong. Of these three potential solutions, this is how I would rank them and why:

Subdomains
Pros:
- Allows me to build an entire site, so if my local site grows to 50+ pages it's still easy to navigate.
- Allows me to brand the root domain and leverage the brand trust of the root domain (let's say the franchise is starbucks.com, for instance).
Cons:
- This subdomain is basically a brand-new URL in Google's eyes, and any link building will not benefit the root domain.

Directory
Pros:
- Fully leverages the root domain branding and fully allows for further branding.
- If the domain is an authority site, ranking for sub-pages will be achieved much quicker.
Cons:
- While this is a great solution if you just want a simple map listing and contact-info page for each of your 20 locations, what if each location wants its own "about us" page and its own "Awesome Service" page optimized for its respective city (i.e. Awesome Service in Dallas)? The navigation, and potentially the URL, is going to start to get really confusing and cumbersome for the end user.

Think about it, which is preferable?

- dallas.awesomecompany.com/awesome-service/
- www.awesomecompany.com/dallas/awesome-service (especially when www.awesomecompany.com/awesome-service/ already exists)

Unique URL
Pros:
- Potentially quicker rankings achieved than with a subdomain if it's an exact-match domain name (i.e. dallasawesomeservice.com).
Cons:
- Does not leverage the www.awesomecompany.com brand.
- Could look like an imposter.
- It is literally a brand-new domain in Google's eyes, so all SEO efforts would start from scratch.

Obviously, what goes without saying is that all of these domains would need unique content on them to avoid duplicate-content penalties. I'm very curious to hear what you all have to say.
Technical SEO | | BrianJGomez0 -
Does using Google Loader's ClientLocation API to serve different content based on region hurt SEO?
Does using Google Loader's ClientLocation API to serve different content based on region hurt SEO? Is there a better way to do what I'm trying to do?
Technical SEO | | Ocularis0 -
Why is Google not displaying the right URL on SERP?
Google is not displaying the URL correctly for this page. (See image.) Here is the search that I performed: http://goo.gl/xk7L8 If you click on the URL, it doesn't take you to the page that the URL references. Any ideas? It should show this URL: http://www.theskincentermd.com/breast-enhancement tsc-serp.png
Technical SEO | | theBREWROOM0 -
Any idea why our sitemap images aren't indexed?
Here's our sitemap: http://www.driftworks.com/shop/sitemap/dw_sitemap.xml

In Google Webmaster Tools, I can see the sitemap report and it says:

- Web: submitted 2,798; indexed 2,910
- Images: submitted 3,178; indexed 0

Do you have any idea why our images are not being indexed according to Webmaster Tools? I checked a few of the image URLs and they worked nicely. Thanks in advance, J
Technical SEO | | DWJames0 -
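One first step in a case like the one above is to confirm that the image entries in the sitemap are actually well-formed. A small sketch that pulls the <image:loc> entries out of an image sitemap for manual spot-checking (the example XML is a made-up stand-in using the standard Google image-sitemap namespace):

```python
# Sketch: extract the image URLs declared in an image sitemap so each one
# can be spot-checked by hand (does it load? is it blocked by robots.txt?).
import xml.etree.ElementTree as ET

IMG_NS = "{http://www.google.com/schemas/sitemap-image/1.1}"

def image_urls(xml_text):
    """Return all <image:loc> values from an image-sitemap XML string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(IMG_NS + "loc")]

example = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.driftworks.com/shop/product1</loc>
    <image:image><image:loc>http://www.driftworks.com/img/p1.jpg</image:loc></image:image>
  </url>
</urlset>"""

print(image_urls(example))
# → ['http://www.driftworks.com/img/p1.jpg']
```

If the extraction comes back empty on the real file, the image entries are in the wrong namespace or structure, which would explain an "Indexed: 0" report.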
How to display the exact url of our subsite in Google
Hi, I'm new to SEO and we just recently relaunched our site. Our site consists of 6 hotels that act as sub-sites. We noticed that when searching for one of the hotels, what comes up in Google is the main website. Example: we search for Flora Grand. We expect the first link in Google to be www.florahospitality.com/dubai-flora-grand-hotel.aspx, but it shows the main site, which is www.florahospitality.com. What am I missing here?
Technical SEO | | shebinhassan0 -
What tool do you use to check for URLs not indexed?
What is your favorite tool for getting a report of URLs that are not cached/indexed in Google and Bing for an entire site? Basically, I want a list of URLs not cached in Google and a separate list for Bing. Thanks, Mark
Technical SEO | | elephantseo3