How can I best find out which URLs from large sitemaps aren't indexed?
-
I have about a dozen sitemaps containing a total of just over 300,000 URLs. They were carefully built to include only the content I consider above a certain threshold.
However, Google says it has indexed only 230,000 of these URLs. How can I best work out which URLs haven't been indexed? No errors related to these pages are showing in WMT.
I could obviously start checking them manually, but surely there's a better way?
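Whichever way the indexed list ends up being obtained, the sitemap side of the comparison can be pulled out with a short script. A minimal Python sketch, assuming standard sitemaps.org 0.9 `<urlset>` files that have already been downloaded; sitemap index files and gzipped sitemaps are not handled, and the example.com URLs are placeholders.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_data: str | bytes) -> list[str]:
    """Return every <loc> value inside the <url> entries of one <urlset> document."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def urls_from_sitemaps(documents: list[str | bytes]) -> set[str]:
    """Union of the URLs found in several sitemap documents (duplicates collapse)."""
    all_urls: set[str] = set()
    for doc in documents:
        all_urls.update(urls_from_sitemap(doc))
    return all_urls
```

Reading each file in binary mode (`open(path, "rb").read()`) and passing the list of contents to `urls_from_sitemaps` gives one deduplicated set covering all of the sitemaps.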
-
There's no obvious function for this in WM Tools, but having looked around, there's this option:
http://www.aspfree.com/c/a/BrainDump/Extracting-Google-Indexed-Web-Site-Pages-Using-MS-Excel/
However, Google will only display the first 1,000 URLs for a site: query, so you would need to adapt the method and run it many times. From the looks of it, there's no easy way.
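One way to run it "many times" is to split the sitemap URLs by directory so that each `site:` query should cover no more than 1,000 pages. A rough Python sketch, assuming the site's URLs are organized into path directories and that a `site:` query restricted to a directory prefix returns roughly the pages under it (that query behaviour is an assumption here, not something the thread confirms). `prefix_of` and `plan_site_queries` are hypothetical helper names.

```python
from __future__ import annotations

from collections import Counter
from urllib.parse import urlsplit

RESULT_CAP = 1000  # per-query display limit described in the answer above


def prefix_of(url: str, depth: int) -> str:
    """Host plus the first `depth` path segments, e.g. 'www.example.com/widgets/'."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s][:depth]
    return parts.netloc + "/" + "".join(s + "/" for s in segments)


def plan_site_queries(urls: list[str], cap: int = RESULT_CAP, max_depth: int = 5):
    """Find the shallowest directory depth at which every prefix group fits under `cap`.

    Returns (depth, Counter mapping prefix -> number of sitemap URLs). If no depth up
    to `max_depth` works, the deepest grouping is returned and any oversized groups
    have to be split some other way.
    """
    counts: Counter[str] = Counter()
    for depth in range(max_depth + 1):
        counts = Counter(prefix_of(u, depth) for u in urls)
        if max(counts.values(), default=0) <= cap:
            return depth, counts
    return max_depth, counts
```

Looping over `counts.most_common()` and printing `f"site:{prefix}"` for each entry lists the queries to run. Pages shallower than the chosen depth get their own full path as a prefix, so some groups can overlap; the counts are a planning aid rather than exact query results.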
There may be a tool similar to Xenu that also checks each URL's index status in Google. I've never needed one, so I'm not aware of any, but chances are something like that exists.
Good luck!
-
Any ideas on how to go about exporting the indexed URLs?
-
Hi Peter,
I'd try exporting both the indexed URLs and the actual URLs into an Excel file and then removing the duplicates, so that only the URLs missing from the index remain.
You would need to look into it, but I'm sure there's a way of matching them up and removing the duplicates.
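The matching step Ben describes is a set difference: keep every sitemap URL that has no counterpart in the indexed list. A minimal Python sketch, assuming both lists are already available as plain URL strings (for example, one URL per line in two exported files). The rules in the hypothetical `normalize` helper (case-insensitive scheme and host, trailing slash and `#fragment` ignored, no http/https or www/non-www merging) are assumptions to adjust for the site.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Light normalization so trivially different spellings of a URL compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"  # treat /page and /page/ as the same page
    # Drop the fragment; keep the query string, since it may select different content
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def not_indexed(sitemap_urls: list[str], indexed_urls: list[str]) -> list[str]:
    """Sitemap URLs with no match in the indexed list, in their original order."""
    indexed = {normalize(u) for u in indexed_urls}
    return [u for u in sitemap_urls if normalize(u) not in indexed]
```

Reading the two exported lists with `f.read().split()` and passing them to `not_indexed` gives the candidates to check by hand. Because the indexed list is held in a set, each lookup is a hash check, which stays fast even for 300,000 URLs.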
Other than that I wouldn't know.
Ben
Related Questions
-
Why don't sites using Drupal have keywords
Why don't the vast majority of sites using Drupal list keywords in the head section? Is there another convention used in Drupal that serves the same purpose for SEO? I noticed that most of the Drupal info pages about keywords seem to drop off around 2010.
Technical SEO | fxarechiga
-
Getting a high-priority issue for xxx.com and xxx.com/home as duplicate pages and duplicate page titles. I can't find anything that needs to be corrected; what might I be missing?
I'm getting a high-priority issue for xxx.com and xxx.com/home, with the crawl results reporting them as both duplicate pages and duplicate page titles. I can't find anything that needs to be corrected; what am I missing? Has anyone else had a similar issue, and how was it corrected?
Technical SEO | tgwebmaster
-
Wrong canonical URL was specified. How do we refresh the index now?
The wrong canonical URL was applied to thousands of pages on a client website, pointing them all to a single non-existent URL. Google has now de-indexed most of those pages. We have fixed the problem, but how do we get search engines to crawl those pages again and start showing them in search results? I understand that a slow recovery is possible if we don't do anything, but is there a way to fast-track the recovery? Any pointers? Thanks
Technical SEO | Krupesh
-
The use of tabs on product pages: do or don't?
Does Google have any trouble reading content in tabs? The content is not loaded by AJAX and is already in the page source code.
Looking at some big e-commerce websites (amazon.com, for example), they get rid of the tabs and put the different content sections below each other. Is this better for SEO purposes? And what about user experience? I think it is easier for users to navigate by tabs than to scroll a long page. What do you all think about this issue?
Technical SEO | wilcoXXL
-
How do I find which pages are being deindexed on a large site?
Is there an easy way, or any way at all, to get a list of all deindexed pages? Thanks for reading!
Technical SEO | DA2013
-
Google Sitemap - How Long Does it Take Google To Index?
We changed our sitemap about a month ago and Google has yet to index it. We have run a site: search and still have many pages indexed, but we are wondering how long it takes Google to index our sitemap. The last sitemap we put up had thousands of pages indexed within a fortnight, but for some reason this version is taking much longer. We are also confident that there are no errors in this version. Help!
Technical SEO | JamesDFA
-
My Webmaster Tools doesn't show backlink data. Why?
Hi everyone, I have two websites with the same domain name targeting different countries, for example www.domain.com.au and www.domain.co.nz. In Webmaster Tools, one of the domains doesn't show any backlink data or search query data. Why is this happening?
Technical SEO | SanketPatel
-
Can someone break down 'page level link metrics' for me?
Sorry for the, again, basic question - can someone define page level link metrics for me?
Technical SEO | Benj25