How can I best find out which URLs from large sitemaps aren't indexed?
-
I have about a dozen sitemaps containing just over 300,000 URLs in total. They were carefully built to include only the content that I feel is above a certain threshold.
However, Google says it has indexed only 230,000 of these URLs. How can I best work out which URLs haven't been indexed? Webmaster Tools (WMT) isn't showing any errors for these pages.
I could obviously start checking them by hand, but surely there's a better way?
-
There's no obvious function for this in Webmaster Tools, but having looked around, there's this option:
http://www.aspfree.com/c/a/BrainDump/Extracting-Google-Indexed-Web-Site-Pages-Using-MS-Excel/
But Google will only display the first 1,000 URLs for a site: query, so you would need to adapt it and run it many times over narrower parts of the site. From the looks of it, there isn't an easy way.
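To get a feel for how many separate site: queries that would take, here is a rough Python sketch. It assumes you already have the sitemap URLs in a list and simply groups them by their first path segment; the function names and the idea of splitting by section are illustrative assumptions, not something the linked article or Google prescribes.

```python
from collections import Counter
from urllib.parse import urlsplit

# Cap mentioned in the answer: only the first 1,000 results are displayed.
SITE_QUERY_LIMIT = 1000


def section_of(url):
    """Return host plus first directory, e.g. 'example.com/blog'.

    URLs that sit directly under the root map to the bare host.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) > 1:
        return f"{parts.netloc}/{segments[0]}"
    return parts.netloc


def site_queries(urls, limit=SITE_QUERY_LIMIT):
    """List (query, url_count, fits_under_limit) for each section."""
    counts = Counter(section_of(u) for u in urls)
    return [
        (f"site:{section}", count, count <= limit)
        for section, count in sorted(counts.items())
    ]
```

Any section flagged as not fitting under the limit would need to be split further (for example by a second path segment) before a site: query could list all of it.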
There may be a tool out there that is similar to Xenu but also checks index status in Google. I've never needed one, so I'm not aware of any, but chances are something exists.
Good luck!
-
Any ideas on how to go about exporting the indexed URLs?
-
Hi Peter,
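I'd try exporting both the indexed URLs and the actual sitemap URLs into an Excel file and then matching them up. Any URL that appears in both lists is indexed; the ones that appear only in the sitemap list are the ones Google hasn't indexed.
You'd need to look into the details, but I'm sure there's a way of matching the two lists and removing the duplicates.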
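If Excel struggles with 300,000 rows, the same matching could be sketched in Python. This is only a minimal sketch: it assumes the sitemaps follow the standard sitemaps.org XML format and that you have somehow obtained the indexed URLs as plain text, one per line. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text):
    """Collect every <loc> value from one sitemap's XML text."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    }


def not_indexed(sitemap, indexed):
    """Return sitemap URLs that are missing from the indexed list."""
    indexed_set = {u.strip() for u in indexed if u.strip()}
    return sorted(set(sitemap) - indexed_set)
```

To use it, you would read each sitemap file, union the sets from `sitemap_urls`, and pass that along with the lines of the indexed-URL export to `not_indexed`. The comparison is an exact string match, so differences such as a trailing slash or http versus https would count as different URLs.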
Other than that I wouldn't know.
Ben