Only half of the sitemap is indexed
-
I have a website with high domain authority and high quality content and blog. I've resubmitted the sitemap half a dozen times. Search console getr half way through and then stops. Does anyone know any reason for this?
I've seen the usual responses of 'google is not obligated to crawl you' but this site has been fully crawled in the past. It's very odd
Does anyone have any ideas why it might stop half way - or does anyone know a testing tool that might illuminate the situation?
-
Hi Andrew
Here a few things to check or rule out:
-
Are those pages accessible to be crawled (not blocked with robots.txt etc)
-
Are they also internally linked? (ie;s crawl with Screaming Frog, starting at the homepage and see if they turn up)
-
Is the page actually indexed (search the URL in Google) but just not showing up in Search Console?
-
How long are you waiting before resubmitting - also does it literally get half way down the list, or do you mean 50% are not indexed?
Overall, I would just submit the sitemap and you don't need to keep resubmitting. I would rather do some crosschecks to make sure the URL is accessible (crawlable) and even maybe indexed already, just not showing in the report. Usually, there's some other issue with the URL besides a sitemap issue - and like you mentioned, I'm not sure how long you're waiting, but it can indeed take weeks for them to show up.
-
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Should we include URLs with parameters in the sitemap?
Hi, I wanted to know whether we can include URLs with search parameters in the sitemap. Currently, we are trying to append structured data for our job listing page. There happens to be a large number of job listings around 1000 pages with unique job-id and location. Should we add these pages in the sitemap or is there any other solution to this? Regards, Tejas
Algorithm Updates | | tejasbansode0 -
Non-indexed or indexed top hierarchy pages get high PageRank at Google?
Hi, We are creating some pages just to capture leads from blog-posts. We created few pages at top hierarchy like website.com/new-page/. I'm just wondering if these pages will take away more PageRank. Do we need to create these pages at low hierarchy like website.com/folder/new-page to avoid passing more PageRank? Is this is how PR distributed even now and it's same for indexed or non-indexed pages? Thanks
Algorithm Updates | | vtmoz0 -
Indexed, though blocked by robots.txt: Need to bother?
Hi, We have intentionally blocked some of the website files which were indexed for years. Now we receive a message "Indexed, though blocked by robots.txt" in GSC. We can ignore as per my knowledge? Are any actions required about this? We thought of blocking them with meta tags but these are PDF files. Thanks
Algorithm Updates | | vtmoz1 -
If we have all products on-site for indexing, do we get dinged by Google for not transacting on-site?
I am trying to do research on the SEO impact of having an off-site transactional website. For example, Pepsi.com lists all product information on their site but guides visitors to transact on Amazon or Walmart. What impact, if any, does guiding the customer to a separate transactional site have on SEO? In short, if we have all products on-site for indexing, do we get dinged by Google for not transacting on-site?
Algorithm Updates | | KaylaV0 -
Does Bing Support same sitemap for full site, mobile, and images?
We have 1 sitemap for our desktop site, mobile site, and images. This works for Google, but I'm not sure if it's supported by Bing or if they require separate sitemaps. Anyone know?
Algorithm Updates | | YairSpolter0 -
Has Google problems in indexing pages that use <base href=""> the last days?
Since a couple of days I have the problem, that Google Webmaster tools are showing a lot more 404 Errors than normal. If I go thru the list I find very strange URLs that look like two paths put together. For example: http://www.domain.de/languages/languageschools/havanna/languages/languageschools/london/london.htm If I check on which page Google found that path it is showing me the following URL: http://www.domain.de/languages/languageschools/havanna/spanishcourse.htm If I check the source code of the Page for the Link leading to the London Page it looks like the following: [...](languages/languageschools/london/london.htm) So to me it looks like Google is ignoring the <base href="..."> and putting the path together as following: Part 1) http://www.domain.de/laguages/languageschools/havanna/ instead of base href Part 2) languages/languageschools/london/london.htm Result is the wrong path! http://www.domain.de/languages/languageschools/havanna/languages/languageschools/london/london.htm I know finding a solution is not difficult, I can use absolute paths instead of relative ones. But: - Does anyone make the same experience? - Do you know other reasons which could cause such a problem? P.s.: I am quite sure that the CMS (Typo3) is not generating these paths randomly. I would like to be sure before we change the CMS's Settings to absolute paths!
Algorithm Updates | | SimCaffe0 -
Big site SEO: To maintain html sitemaps, or scrap them in the era of xml?
We have dynamically updated xml sitemaps which we feed to Google et al. Our xml sitemap is updated constantly, and takes minimal hands on management to maintain. However we still have an html version (which we link to from our homepage), a legacy from back in the pre-xml days. As this html version is static we're finding it contains a lot of broken links and is not of much use to anyone. So my question is this - does Google (or any other search engine) still need both, or are xml sitemaps enough?
Algorithm Updates | | linklater0