Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How important are sitemap errors?
-
If there aren't any crawling / indexing issues with your site, how important do thing sitemap errors are? Do you work to always fix all errors?
I know here: http://www.seomoz.org/blog/bings-duane-forrester-on-webmaster-tools-metrics-and-sitemap-quality-thresholds
Duane Forrester mentions that sites with many 302's 301's will be punished--does any one know Googe's take on this?
-
Very important. Particularly if you have a large site. We operate a large site with 100,000's of pages and as Dan said it can be difficult to maintain. We use something called Unlimited XML Sitemap Generator which builds XML sitemaps for us automatically. I'd highly recommend it although it takes a bit of fiddling with to get it up and running as it's software which sits on site. We couldn't manage without it as we'd be forever on sitemaps.
We found that getting sitemaps right on a large site made a huge difference to the crawl rate that we encountered in GWT and a huge indexation to follow.
In particular check for 302's. I made the mistake of leaving those for a while and am sure that we suffered from some loss of link equity along the way.
Hope it helps
Dawn
-
Your sitemap should only list pages that actually exist.
If you delete some pages, then you need to rebuild the sitemap.
Ditto if you delete them and redirect.
Google is always lagging, so if you delete 10 pages and then update the sitemap, even if google downloads the sitemap immediately, they will still be running crawls on the old map, and they may be crawling the now-missing pages, but haven't shown the failures in your WMT yet.
If you update your sitemap quickly, it is possible they will never crawl the missing pages and get a 404 or 301.
(but of course, there could be other sites pointing to the now-missing pages, and the 404s will show up elsewhere as missing)
I am always checking, adding, deleting and redirecting pages, and I update the current sitemap every hour and all the others are rebuilt at midnight every night. I usually do deletions just before midnight if I can, to minimize the time the sitemap is out of sync.
-
As far as I know Google is more lenient with sitemap errors, but I would still recommend looking into it. The first step would be to be sure your sitemap is up to date to begin with - and has all the URLs you want (and not any you don't want). The main thing is none of them should 404 and then beyond that, yes, they should return 200's.
Unless you're dealing with a gigantic site which might be hard to maintain, in theory there shouldn't be errors in sitemaps if you have the correct URLs in there.
Even better, if you're running WordPress the Yoast SEO plugin will generate an XML sitemap for you and it update automatically.
Hope that helps!
-Dan
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Which search engines should we submit our sitemap to?
Other than Google and Bing, which search engines should we submit our sitemap to?
Intermediate & Advanced SEO | | NicheSocial0 -
Href lang in image or video XML sitemaps
Does anyone know if it is possible/recommended/not recommended to use href lang in image or video XML sitemaps? This had not crossed my mind until recently, but a client asked me this question and I couldn't find any information on this topic.
Intermediate & Advanced SEO | | ChrisKing0 -
Substantial difference between Number of Indexed Pages and Sitemap Pages
Hey there, I am doing a website audit at the moment. I've notices substantial differences in the number of pages indexed (search console), the number of pages in the sitemap and the number I am getting when I crawl the page with screamingfrog (see below). Would those discrepancies concern you? The website and its rankings seems fine otherwise. Total indexed: 2,360 (Search Consule)
Intermediate & Advanced SEO | | Online-Marketing-Guy
About 2,920 results (Google search "site:example.com")
Sitemap: 1,229 URLs
Screemingfrog Spider: 1,352 URLs Cheers,
Jochen0 -
Not found errors (404) due to being hacked
Hi Moz Guru's Our website was hacked a few months ago, since then we have taken various measures, last one being redesigning the website all together and removing it from a WordPress platform. So far all is going well, except that the 404 not found errors keeps coming up in Google Webmaster tools. The URLs are spam pages that were created by the virus. And these spam pages have been indexed by Google, and now we are struggling to get rid of them. Is there any way we can deal with these 404 spam pages links? Is marking all of them as fixed in the webmaster tools - search console- crawl errors helpful in any way? Can this have a negative impact on the SEO ? Looking forward to your answers. Many thanks.
Intermediate & Advanced SEO | | monicapopa0 -
Hreflang in vs. sitemap?
Hi all, I decided to identify alternate language pages of my site via sitemap to save our development team some time. I also like the idea of having leaner markup. However, my site has many alternate language and country page variations, so after creating a sitemap that includes mostly tier 1 and tier 2 level URLs, i now have a sitemap file that's 17mb. I did a couple google searches to see is sitemap file size can ever be an issue and found a discussion or two that suggested keeping the size small and a really old article that recommended keeping it < 10mb. Does the sitemap file size matter? GWT has verified the sitemap and appears to be indexing the URLs fine. Are there any particular benefits to specifying alternate versions of a URL in vs. sitemap? Thanks, -Eugene
Intermediate & Advanced SEO | | eugene_bgb0 -
Google Not Indexing XML Sitemap Images
Hi Mozzers, We are having an issue with our XML sitemap images not being indexed. The site has over 39,000 pages and 17,500 images submitted in GWT. If you take a look at the attached screenshot, 'GWT Images - Not Indexed', you can see that the majority of the pages are being indexed - but none of the images are. The first thing you should know about the images is that they are hosted on a content delivery network (CDN), rather than on the site itself. However, Google advice suggests hosting on a CDN is fine - see second screenshot, 'Google CDN Advice'. That advice says to either (i) ensure the hosting site is verified in GWT or (ii) submit in robots.txt. As we can't verify the hosting site in GWT, we had opted to submit via robots.txt. There are 3 sitemap indexes: 1) http://www.greenplantswap.co.uk/sitemap_index.xml, 2) http://www.greenplantswap.co.uk/sitemap/plant_genera/listings.xml and 3) http://www.greenplantswap.co.uk/sitemap/plant_genera/plants.xml. Each sitemap index is split up into often hundreds or thousands of smaller XML sitemaps. This is necessary due to the size of the site and how we have decided to pull URLs in. Essentially, if we did it another way, it may have involved some of the sitemaps being massive and thus taking upwards of a minute to load. To give you an idea of what is being submitted to Google in one of the sitemaps, please see view-source:http://www.greenplantswap.co.uk/sitemap/plant_genera/4/listings.xml?page=1. Originally, the images were SSL, so we decided to reverted to non-SSL URLs as that was an easy change. But over a week later, that seems to have had no impact. The image URLs are ugly... but should this prevent them from being indexed? The strange thing is that a very small number of images have been indexed - see http://goo.gl/P8GMn. I don't know if this is an anomaly or whether it suggests no issue with how the images have been set up - thus, there may be another issue. Sorry for the long message but I would be extremely grateful for any insight into this. I have tried to offer as much information as I can, however please do let me know if this is not enough. Thank you for taking the time to read and help. Regards, Mark Oz6HzKO rYD3ICZ
Intermediate & Advanced SEO | | edlondon0 -
Recovering from robots.txt error
Hello, A client of mine is going through a bit of a crisis. A developer (at their end) added Disallow: / to the robots.txt file. Luckily the SEOMoz crawl ran a couple of days after this happened and alerted me to the error. The robots.txt file was quickly updated but the client has found the vast majority of their rankings have gone. It took a further 5 days for GWMT to file that the robots.txt file had been updated and since then we have "Fetched as Google" and "Submitted URL and linked pages" in GWMT. In GWMT it is still showing that that vast majority of pages are blocked in the "Blocked URLs" section, although the robots.txt file below it is now ok. I guess what I want to ask is: What else is there that we can do to recover these rankings quickly? What time scales can we expect for recovery? More importantly has anyone had any experience with this sort of situation and is full recovery normal? Thanks in advance!
Intermediate & Advanced SEO | | RikkiD220 -
Include Cross Domain Canonical URL's in Sitemap - Yes or No?
I have several sites that have cross domain canonical tags setup on similar pages. I am unsure if these pages that are canonicalized to a different domain should be included in the sitemap. My first thought is no, because I should only include pages in the sitemap that I want indexed. On the other hand, if I include ALL pages on my site in the sitemap, once Google gets to a page that has a cross domain canonical tag, I'm assuming it will just note that and determine if the canonicalized page is the better version. I have yet to see any errors in GWT about this. I have seen errors where I included a 301 redirect in my sitemap file. I suspect its ok, but to me, it seems that Google would rather not find these URL's in a sitemap, have to crawl them time and time again to determine if they are the best page, even though I'm indicating that this page has a similar page that I'd rather have indexed.
Intermediate & Advanced SEO | | WEB-IRS0