Only half of the sitemap is indexed
-
I have a website with high domain authority and high-quality content and blog. I've resubmitted the sitemap half a dozen times. Search Console gets halfway through and then stops. Does anyone know any reason for this?
I've seen the usual responses of 'Google is not obligated to crawl you', but this site has been fully crawled in the past. It's very odd.
Does anyone have any ideas why it might stop halfway - or does anyone know of a testing tool that might illuminate the situation?
-
Hi Andrew
Here are a few things to check or rule out:
-
Are those pages accessible to crawlers (i.e. not blocked with robots.txt, etc.)?
-
Are they also internally linked? (i.e. crawl with Screaming Frog, starting at the homepage, and see if they turn up.)
-
Is the page actually indexed (search for the URL in Google) but just not showing up in Search Console?
-
How long are you waiting before resubmitting? Also, does it literally get halfway down the list, or do you mean 50% of the URLs are not indexed?
Overall, I would just submit the sitemap once; you don't need to keep resubmitting. I would rather do some cross-checks to make sure the URLs are accessible (crawlable), and maybe even already indexed, just not showing in the report. Usually there's some other issue with the URL besides a sitemap issue - and like you mentioned, I'm not sure how long you're waiting, but it can indeed take weeks for pages to show up.
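If you want to script the first of those cross-checks, here's a rough sketch (Python standard library only; the sitemap and robots.txt contents below are made-up examples, not from the site in question) that pulls the `<loc>` entries out of a sitemap and flags any that robots.txt would block:

```python
# Sketch: cross-check sitemap URLs against robots.txt rules.
# The sitemap and robots.txt below are hypothetical examples - in practice
# you would fetch your real files and paste/load their contents here.
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Extract the <loc> entries from a urlset sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]

def blocked_urls(urls, robots_txt, agent="Googlebot"):
    """Return the URLs that robots.txt disallows for the given user agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

if __name__ == "__main__":
    sitemap = """<?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/</loc></url>
      <url><loc>https://example.com/private/page</loc></url>
    </urlset>"""
    robots = "User-agent: *\nDisallow: /private/"
    # Any URL printed here is in the sitemap but blocked from crawling.
    print(blocked_urls(sitemap_urls(sitemap), robots))
```

If this flags nothing, the next step would be checking the HTTP status of each URL (a 404 or redirect chain can also stall indexing).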
-
Related Questions
-
Website has Caching/Indexing/Ranking Issue
Hi, my website (https://www.v3cars.com) has not been cached or indexed on a regular basis for the last 15 days. Before this, it was cached and indexed regularly. We upload fresh content on a daily basis. Currently my new content is not ranking anywhere in Google, even after being cached and indexed. Please help and suggest. Sandeep - Love to Cars
Algorithm Updates | onlinesandeep -
Does using parent pages in WordPress help with SEO and/or indexing for SERPs?
I have a law office and we handle four different practice areas. I used to have multiple websites (one for each practice area) with keywords in the actual domain name, but based on the recommendation of SEO "experts" a few years ago, I consolidated them all into one single website (based on the rumors at the time that Google was going to focus on authorship and branding in the future, rather than keywords in URLs or titles). Needless to say, Google Authorship was dropped a year or two later and "branding" never took off.

Overall, having one website is convenient and generally makes SEO easier, but there's been a huge drawback: when my site comes up in SERPs after searching for "attorney" or "lawyer" combined with a specific practice area, the practice area landing pages don't typically come up in the SERPs; only the front page comes up. It's as if Google recognizes that I have some decent content, and Google knows that I specialize in multiple practice areas, but it directs everyone to the front page only. Prospective clients don't like this, and it causes my bounce rate to be high. They like to land on a page focusing on the practice area they searched for.

Two questions:

(1) Would using parent pages (e.g. http://lawfirm.com/divorce/anytown-usa-attorney-lawyer/ vs. http://lawfirm.com/anytown-usa-divorce-attorney-lawyer/) be better for SEO? The research I've done up to this point appears to indicate "no" - it doesn't make much difference as long as the keywords are in the domain name and/or URL. But I'd be interested to hear contrary opinions.

(2) Would using the same parent page structure be better for indexing in Google SERPs? For example, would it make it more likely that someone searching for "anytown usa divorce attorney" would actually end up in the divorce section of the website rather than on the front page?
Algorithm Updates | micromano -
Is it still a rule that Google will only index pages up to three tiers deep? Or has this changed?
I haven't looked into this in a while; it used to be that you didn't want to bury pages beyond three clicks from the main page. What is the rule now in order to have deep pages indexed?
Algorithm Updates | seoessentials -
Does a large number of indexed thin content pages affect overall site performance?
Hello Community,

A question on the negative impact of many virtually identical calendar pages being indexed. We have a site for a B2B software product. There are about 150 product-related pages, and another 1,200 or so short articles on industry-related topics. In addition, we recently (~4 months ago) had Google index a large number of calendar pages used for webinar schedules. This boosted the indexed pages number shown in Webmaster Tools to about 54,000.

Since then, we "nofollowed" the links on the calendar pages that allow you to view future months, and added "noindex" meta tags to all future month pages (beyond 6 months out). Our number of pages indexed seems to be dropping, and is now down to 26,000. When you look at Google's report showing pages appearing in response to search queries, a more normal 890 pages appear. Very few calendar pages show up in this report.

So, the question that has been raised is: does a large number of pages with very thin content (basically blank calendar months) in a search index hurt the overall site? One person at the company said that because Panda/Penguin targeted thin-content sites, these pages would cause the performance of this site to drop as well.

Thanks for your feedback. Chris
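For reference, a minimal sketch of the two fixes described above (the calendar path is a made-up example):

```html
<!-- In the <head> of each far-future month page: keep it out of the index -->
<meta name="robots" content="noindex">

<!-- On the pager link that leads to future months: don't pass it along -->
<a href="/webinars/calendar/2013-12/" rel="nofollow">Next month</a>
```

Note that Google has to recrawl a page before the noindex tag takes effect, which is consistent with the slow drop in the indexed-pages count described here.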
Algorithm Updates | cogbox -
Why does Google say they have more URLs indexed for my site than they really do?
When I do a site search with Google (i.e. site:www.mysite.com), Google reports "About 7,500 results" - but when I click through to the end of the results and choose to include omitted results, Google really has only 210 results for my site.

I had an issue months back with a large number of URLs being indexed because of query strings and some other non-optimized technicalities. At that time I could see that Google really had indexed all of those URLs, but I've since implemented canonical URLs and fixed most (if not all) of my technical issues in order to get our index count down.

At first I thought it would just be a matter of time for them to reconcile this - perhaps they were looking at cached data or something - but it's been months and the "About 7,500 results" just won't change, even though the actual number of pages indexed keeps dropping! Does anyone know why Google would still be reporting a high index count which doesn't reflect what is actually indexed? Thanks!
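If you want to spot-check that the canonical fix actually made it onto the pages, here's a rough sketch (Python standard library only; the sample page markup is made up) that pulls the rel="canonical" href out of a page's HTML so you can compare it against the URL you want indexed:

```python
# Sketch: extract the rel="canonical" URL from a page's HTML.
# In practice you would fetch the page body first (e.g. with urllib.request);
# the HTML used below is a hypothetical example.
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag == "link" and self.canonical is None:
            d = dict(attrs)
            if (d.get("rel") or "").lower() == "canonical":
                self.canonical = d.get("href")

def find_canonical(html: str):
    parser = CanonicalFinder()
    parser.feed(html)
    return parser.canonical

if __name__ == "__main__":
    page = '<html><head><link rel="canonical" href="https://www.mysite.com/page"></head></html>'
    print(find_canonical(page))
```

If the canonical on a query-string variant doesn't point at the clean URL, that variant can keep inflating the reported count.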
Algorithm Updates | CassisGroup -
Is it OK to 301 redirect the index page to a search-engine-friendly URL?
Is it OK to 301 redirect the index page to a search-engine-friendly URL?
Algorithm Updates | WinningInch -
Are XML sitemaps a thing of the past?
We had an internal debate about the importance of having a sitemap.xml on your website. Basically, there is Google documentation that indicates a sitemap.xml is due diligence: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156184

And there are other authoritative forums, blog posts, etc. which indicate that sitemap creation and maintenance is a waste of your time, e.g. http://webmasters.stackexchange.com/questions/4803/the-sitemap-paradox/

A bigger question is: are there cases in which not having a sitemap.xml actually became detrimental or risky? Thanks in advance!
Algorithm Updates | HZseo -
Stop Google indexing CDN pages
Just when I thought I'd seen it all, Google hits me with another nasty surprise! I have a CDN to deliver images, JS and CSS to visitors around the world. I have no links to static HTML pages on the site, as far as I can tell, but someone else may have - perhaps a scraper site? Google has decided the static pages they were able to access through the CDN have more value than my real pages, and they seem to be slowly replacing my pages in the index with the static pages.

Anyone got an idea on how to stop that? Obviously, I have no access to the static area, because it is in the CDN, so there is no way I know of that I can have a robots file there. It could be that I have to trash the CDN and change it to only allow the image directory, and maybe set up a separate CDN subdomain for content that only contains the JS and CSS? Have you seen this problem and beaten it? (Of course, the next thing is Roger might look at Google results and start crawling them too, LOL)

P.S. The reason I am not asking this question in the Google forums is that others have asked this question many times and nobody at Google has bothered to answer over the past 5 months, and nobody who did try gave an answer that was remotely useful. So I'm not really hopeful of anyone here having a solution either, but I expect this is my best bet, because you guys are always willing to try.
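One possible workaround, sketched here under the assumption that the CDN pulls from an origin server you control and that the origin runs Apache with mod_headers enabled: send an X-Robots-Tag: noindex header with the HTML files only. The CDN caches and forwards response headers along with the files, so the duplicate pages get dropped from the index while images, CSS and JS are unaffected:

```apache
# .htaccess on the CDN's origin server (assumes Apache + mod_headers).
# Pages served with this header are dropped from Google's index,
# but assets (images, CSS, JS) keep no such header and stay cacheable.
<FilesMatch "\.html?$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```

This avoids needing a robots.txt on the CDN hostname itself, though you would still want to purge the CDN cache so the new header propagates.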
Algorithm Updates | loopyal