Clarification on indexation of XML sitemaps within Webmaster Tools
-
Hi Mozzers,
I have a large service based website, which seems to be losing pages within Google's index. Whilst working on the site, I noticed that there are a number of xml sitemaps for each of the services. So I submitted them to webmaster tools last Friday (14th) and when I left they were "pending".
On returning to the office today, they all appear to have been successfully processed on either the 15th or 17th and I can see the following data:
13/08 - Submitted=0 Indexed=0
14/08 - Submitted=606,733 Indexed=122,243
15/08 - Submitted=606,733 Indexed=494,651
16/08 - Submitted=606,733 Indexed=517,527
17/08 - Submitted=606,733 Indexed=517,498Question 1: The indexed pages on 14th of 122,243 - Is this how many pages were previously indexed? Before Google processed the sitemaps? As they were not marked processed until 15th and 17th?
Question 2: The indexed pages are already slipping, I'm working on fixing the site by reducing pages and improving internal structure and content, which I'm hoping will fix the crawling issue. But how often will Google crawl these XML sitemaps?
Thanks in advance for any help.
-
Hi again
This means that because you have multiple sitemaps, Google is going to crawl those at different times possibly and at different rates, hence some of your sitemaps taking a day longer. I really wouldn't look into it too much, and just be assured that Google is crawling your sitemaps fine and indexing.
If you notice major discrepancies in what you submitted and what's being indexed, then I would refer to this Google resource on how to fix issues or errors you find in your sitemap crawl.
Hope this helps! Good luck!
-
Hi there
You submitted on the 13th, there were 0 pages indexed. The next day there were 122,243, so in that time period, Google indexed 122,243 of your site's pages.
This is a day by day process. So whatever new number appears on each day, subtract the previous day's number from your present day number, and that's how many pages were freshly indexed.
Hope this helps! Good luck!
-
Just checked webmaster tools again, and now they (sitemaps) all say processed on 17th and some say 18th (today) does this mean the sitemaps are being processed by Google every couple of days?
-
Hi Patrick,
Thanks for elaborating on question 2.
Question 1, I asked if the number (122,243) was how many pages were in the index** before** google processed the sitemaps, as they don't appear to have been processed until the following day.
You answered, yes but then said its how many pages were processed that day?
Thanks again for your time and clarification.
-
Hi there
Question 1 - Yes, this is how many pages Google indexed from your sitemap on that day.
Question 2 - XML sitemaps allow you to tell Google the change frequency of your URLs - you can learn more about that here. Also, according to Google:
Google's spiders regularly crawl the web to rebuild our index. Crawls are based on many factors such as PageRank, links to a page, and crawling constraints such as the number of parameters in a URL. Any number of factors can affect the crawl frequency of individual sites.
Our crawl process is algorithmic; computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. We don't accept payment to crawl a site more frequently. For tips on maintaining a crawler-friendly website, please visit our Webmaster Guidelines.
Please let me know if you have any further questions or comments.
Hope this helps! Good luck!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Image sitemap
I work on a big eCommerce site with thousands of pages. We are talking about crating a separate image sitemap. Any idea example of an eCommerce site who has a separate image sitemap? I looked several and cant find one. Also, what are the best practices for creating a good image sitemap? thanks!
Technical SEO | | bizuH0 -
Sitemap
I have a question for the links in a sitemap. Wordpress works with a sitemap that first link to the different kind of pages: pagesitemap.xml categorysitemap.xml productsitemap.xml etc. etc. These links on the first page are clickable. We have a website that also links to the different pages but it's not clickable, just a flat link. Is this an issue?
Technical SEO | | Happy-SEO0 -
Why are my webpages not getting indexed?
I want to figure out why a lot of my pages for my website are not getting indexed by google. I have installed the SEO plugin by Yoast to my wordpress website. Under the titles and meta section of the plugin options I have set categories and tags to noindex. In WMT, google is saying that all my category pages and most of my tag pages are not being indexed. I want to make sure that the reason these pages are not being indexed are because of the SEO plugin. I want to prevent duplicate content so that is the reason I have set my categories and tags to noindex. Please respond if you know the absolute answer, its very important that I have my website indexed the proper way I want it to.
Technical SEO | | Dino640 -
Homepage indexation issue
Hello all, I've been scratching my head about this one for a while now... Let me explain the situation. I'm working on a multi-lingual website. Visitors are redirected (301) when they visit the homepage to the correct domain.com/en/default.html, domain.com/nl/default.html, domain.com/fr/default.html or domain.com/de/default.html based on browser language. I have doubts about the impact on the ability for Google to index the website because of that, but that's a problem for another day. The problem I'm having right now, is that domain.com/nl/default.html, domain.com/de/default.html and domain.com/fr/default.html are all indexed. When I search for the URL in Google I get the correct page on number one so I'm pretty sure those are indexed correctly. When I search for domain/en/default.html though, the homepage appears without /en/default.html extension. Does this mean Google assumes the domain.com page is the same as domain.com/en/default.html even though the redirect that's in place? Would be great if someone could shed some light on this. Thanks in advance!
Technical SEO | | buiserik0 -
Removing a staging area/dev area thats been indexed via GWT (since wasnt hidden) from the index
Hi, If you set up a brand new GWT account for a subdomain, where the dev area is located (separate from the main GWT account for the main live site) and remove all pages via the remove tool (by leaving the page field blank) will this definately not risk hurting/removing the main site (since the new subdomain specific gwt account doesn't apply to the main site in any way) ?? I have a new client who's dev area has been indexed, dev team has now prevented crawling of this subdomain but the 'the stable door was shut after the horse had already bolted' and the subdomains pages are on G's index so we need to remove the entire subdomain development area asap. So we are going to do this via the remove tool in a subdomain specific new gwt account, but I just want to triple check this wont accidentally get main site removed too ?? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Staging & Development areas should be not indexable (i.e. no followed/no index in meta robots etc)
Hi I take it if theres a staging or development area on a subdomain for a site, who's content is hence usually duplicate then this should not be indexable i.e. (no-indexed & nofollowed in metarobots) ? In order to prevent dupe content probs as well as non project related people seeing work in progress or finding accidentally in search engine listings ? Also if theres no such info in meta robots is there any other way it may have been made non-indexable, or at least dupe content prob removed by canonicalising the page to the equivalent page on the live site ? In the case in question i am finding it listed in serps when i search for the staging/dev area url, so i presume this needs urgent attention ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Webmaster Tools Server Error
We recently did a build to our site and after the build the build one of the softwares that we are using changed. This caused our server errors to go into the thousands. right now google webmaster tools gave us a list of top 1,000 pages with errors and we fixed them all is there a way to see the rest of the errors?
Technical SEO | | DoRM0 -
Disappeared from Google with in 2 hours of webmaster tools error
Hey Guys I'm trying not to panic but....we had a problem with google indexing some of our secure pages then hit those pages and browsers firing up security warning, so I asked our web dev to have at look at it he made the below changes and within 2 hours the site has drop off the face of google “in web master tools I asked it to remove any https://freestylextreme.com URLs” “I cancelled that before it was processed” “I then setup the robots.txt to respond with a disallow all if the request was for an https URL” “I've now removed robots.txt completely” “and resubmitted the main site from web master tools” I've read a couple of blog posts and all say to remain clam , test the fetch bot on webmasters tools which is all good and just wait for google to reindex do you guys have any further advice ? Ben
Technical SEO | | elbeno1