Indexing/Sitemap - I must be wrong
-
Hi All,
I would guess that a great number of us new to SEO (or not) share some simple beliefs about Google indexing and sitemaps, and as a result get confused by what Webmaster Tools shows us.
It would be great if someone with experience/knowledge could clear this up once and for all.
Common beliefs:
-
Google will crawl your site from the top down, following each link and recursively repeating the process until it bottoms out/becomes cyclic.
-
A Sitemap can be provided that outlines the definitive structure of the site, and is especially useful for links that may not be easily discovered via crawling.
-
In the sitemap section of Google’s Webmaster Tools, the number of pages indexed shows the number of pages in your sitemap that Google considers worth indexing.
-
If you place a rel="canonical" tag on every page pointing to the definitive version, you will avoid duplicate content and aid Google in its indexing endeavour.
These preconceptions seem fair, but must be flawed.
Our site has 1,417 pages as listed in our sitemap. Google’s tools tell us there are no issues with this sitemap, but a mere 44 are indexed! We submit 2,716 images (because we create all our own images for products) and a disappointing zero are indexed.
Under Health->Index status in Webmaster Tools, we apparently have 4,169 pages indexed. I tend to assume these are old pages that now return a 404 if they are visited.
It could be that Google’s indexed figure of 44 means “Pages indexed by virtue of your sitemap, i.e. we didn’t find them by crawling, so thanks for that”, but despite trawling through Google’s help, I don’t really get that feeling.
This is basic stuff, but I suspect a great number of us struggle to understand the disparity between our expectations and what Webmaster Tools reports, and we go on to either ignore an important problem or waste time on non-issues.
Can anyone shine a light on this once and for all?
If you are interested, our sitemap looks like this:
http://www.1010direct.com/Sitemap.xml
Many thanks
Paul
-
-
The 44 is the number of indexed pages whose URLs match the ones in your sitemap; it is not everything that is indexed. Your old site is still indexed and being found. As Google visits those old pages and gets redirected to the new ones, that number will likely increase (from 44) and the number of old pages indexed will decrease.
Google doesn't index a site in a single pass. If it did, it might take, say, 4 months to come back and index again, and if you had a new important page that got lots of links but wasn't indexed and ranked because you hadn't been visited, you wouldn't be happy. Also, crawling every site in full each time would take forever and need far more resources than even Google has. It is annoying, but you've just got to grin and bear it; at least your old site is still ranking and being found.
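To make the "44" concrete, here is a rough Python sketch of the idea, assuming the sitemap figure is simply the overlap between the URLs you submitted and the URLs Google currently has indexed. The URLs and the function name `sitemap_index_report` are made up for illustration; Webmaster Tools does not expose its index as a list like this.

```python
# Hypothetical URLs, purely for illustration; not taken from the real site.
sitemap_urls = {
    "http://www.example.com/widgets/blue-widget",
    "http://www.example.com/widgets/red-widget",
    "http://www.example.com/widgets/green-widget",
}
indexed_urls = {
    "http://www.example.com/widgets/blue-widget",  # new URL, already indexed
    "http://www.example.com/product.aspx?id=1",    # old URL, still indexed
    "http://www.example.com/product.aspx?id=2",    # old URL, still indexed
}


def sitemap_index_report(sitemap_urls, indexed_urls):
    """Compare submitted URLs with indexed URLs (assumed model, see above)."""
    matched = sitemap_urls & indexed_urls  # indexed AND listed in the sitemap
    return {
        "submitted": len(sitemap_urls),
        "indexed_from_sitemap": len(matched),
        "indexed_total": len(indexed_urls),
        "indexed_not_in_sitemap": len(indexed_urls - sitemap_urls),
    }


report = sitemap_index_report(sitemap_urls, indexed_urls)
print(report)
```

Under that assumption, a site can have a large total index (old URLs) and a small sitemap figure (new URLs) at the same time, which is the situation described in the question.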
-
Thanks Andy,
What I don't get is why Google would index in this way. I can understand why they would weight the importance of a page based on the number/strength of incoming links, but not why they would decide whether to index it at all when led in by a sitemap.
I just get a little frustrated when Google offers you seemingly definitive stats, only to find they are so vague and mysterious that they have little to no value. We should have 1,400+ pages indexed, and we clearly have more than 44 indexed... what on earth does the number 44 relate to?
-
I think that, as your sitemap reflects your new URLs and this is what the indexed figure is based on, you are likely to have more pages indexed than it shows, from what you say. I would suggest going to "Index status" under Health in GWT and clicking on total indexed and ever crawled; this may help clear this up.
-
I experienced this issue with sandboxed websites.
Market your products and in a few months every page should be in Google's index.
Cheers.
-
Thanks for the quick responses.
We had a bit of a URL reshuffle recently to make them a little more informative and to prevent each page URL terminating with "product.aspx". But that was around a month ago. Prior to that, we were around 40% indexed for pages (from the sitemap section of WM tools), and always zero for images.
So given that we clearly have more than 44 pages indexed by Google, what do you think that figure actually means?
-
Dealing with your indexing issue first: how soon those pages may be indexed depends on when you submitted them. I say "may" because a sitemap (yes, answering another question) is just an indicator of "I have these pages"; it does not mean they will be indexed. Indeed, unless you've a small website, you will never have 100% indexation in my experience.
Spiders (search robots) visit and index a page by arriving via a link. They follow links to a page from around the web or from the site itself. The more links from around the web, the quicker you will get indexed. (This explains why, if you've 10,000 pages, you won't ever get links from other websites to all of them, and so they won't all get indexed.) This means a web page that gets a ton of links will be indexed sooner than one with just 1 link, assuming all links are equal (which they aren't).
Spiders are not cyclic in their searching; it's very ad hoc, based on links in your site and other sites linking to you. A spider won't be sent to crawl every page on your site. It will do a small amount at a time, which is likely why 44 pages are indexed and not more at this point.
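As a toy illustration of "a small amount at a time", here is a minimal Python sketch of a link-following crawl with a per-visit page budget. The link graph `SITE_LINKS`, the function `crawl_visit` and the budget of 2 are all assumptions made up for the example; real search engine scheduling is far more involved and is not public.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
SITE_LINKS = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2", "/"],
    "/category-b": ["/product-3", "/"],
    "/product-1": [],
    "/product-2": ["/product-3"],
    "/product-3": [],
}


def crawl_visit(links, frontier, seen, budget):
    """Fetch at most `budget` pages from the frontier, queueing newly found links."""
    fetched = []
    while frontier and len(fetched) < budget:
        page = frontier.popleft()
        fetched.append(page)
        for target in links.get(page, []):
            if target not in seen:  # never queue the same page twice
                seen.add(target)
                frontier.append(target)
    return fetched


frontier = deque(["/"])  # the spider arrives at the home page via some link
seen = {"/"}
visits = []
while frontier:
    visits.append(crawl_visit(SITE_LINKS, frontier, seen, budget=2))

for number, pages in enumerate(visits, start=1):
    print(f"visit {number}: {pages}")
```

With six pages and a budget of two per visit, the whole site is only covered after three separate visits, which is the same effect as a partly indexed site shortly after a URL change.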
A sitemap is (as I say) an indicator of the pages in your site, their importance and when they were updated/created. It's not really a definitive structure; it's more of a reference guide. Think of yourself as the guide on a bus tour of a city, with the search engine as your passenger: you are pointing out places of interest, and every so often it will see something it wants to see and get off to look, but it may take many trips before it gets off at every stop.
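For anyone who has not looked inside one, here is a small Python sketch that reads those three hints (location, last modified, priority) out of a sitemap using the standard library. The sample XML, its URLs and its dates are invented for the example, and `read_sitemap` is just an illustrative helper name; only the element names and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>http://www.example.com/widgets/blue-widget</loc>
    <lastmod>2013-01-10</lastmod>
    <priority>0.5</priority>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return one dict per <url> entry with its loc, lastmod and priority."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        entries.append({
            "loc": url.findtext("sm:loc", namespaces=SITEMAP_NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "priority": url.findtext("sm:priority", namespaces=SITEMAP_NS),
        })
    return entries


entries = read_sitemap(SAMPLE_SITEMAP)
for entry in entries:
    print(entry["loc"], entry["lastmod"], entry["priority"])
```

Nothing in the file tells the search engine it must index a page; each entry is only a pointer plus a couple of hints, which fits the bus tour picture.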
Finally, canonicals are a great way to clear up duplicate content issues. They aren't 100% successful but they do help, especially if you are using dynamic URLs (such as paginated category pages).
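If you want to check what canonical a page actually declares, a minimal Python sketch using the standard library HTML parser could look like this. The page markup and URL are hypothetical, and `CanonicalFinder` / `find_canonical` are names invented for the example; it only reads the tag and says nothing about whether a search engine will honour it.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and self.canonical is None:
            self.canonical = attr_map.get("href")


def find_canonical(html_text):
    """Return the declared canonical URL, or None if the page has none."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonical


# Hypothetical paginated category page pointing at its definitive version.
PAGE = """<html><head>
<title>Widgets, page 2</title>
<link rel="canonical" href="http://www.example.com/widgets">
</head><body></body></html>"""

canonical = find_canonical(PAGE)
print(canonical)
```

Running something like this over a handful of dynamic URLs is a quick way to confirm they all point at the same definitive version.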
Hope that helps.
-
I see your frustration. How long ago did you submit these sitemaps? Are we talking a couple of weeks, or a couple of days / a day? As I've seen myself, Google is not that fast at calculating the number of pages indexed (definitely not within GWT). In most cases, Google substantially increased the number of pages indexed within a couple of days to a week.