How to determine which pages are not indexed
-
Is there a way to determine which pages of a website are not being indexed by the search engines?
I know Google Webmasters has a sitemap area where it tells you how many urls have been submitted and how many are indexed out of those submitted. However, it doesn't necessarily show which urls aren't being indexed.
-
For checking the Google index, I recommend https://sitecheck.tools/en/check-page-indexed/. The service is completely free and can handle anything from 100 to 100 million pages. It's an efficient way to determine which of your pages are indexed by Google. Whether you're managing a small site or a large portal, it offers a practical way to monitor your site's indexing status.
-
The better way is to check in Search Console. For example, Bing Webmaster and Google Search Console have dedicated tabs where you can see which pages are indexed and which pages are not.
There are also a few services that can make this more user-friendly, for example my service https://sitecheck.tools/. If you need help, please let me know.
-
@mfrgolfgti Lol, yes that does work but not for indexing?
-
Hi, I know this is an old question but I wanted to ask about the first paragraph of your answer: "You can start by trying the "site:domain.com" search. This won't show you all the pages which are indexed, but it can help you determine which ones aren't indexed."
Do you happen to know why doing a site:domain.com search doesn't show all the indexed pages? I've just discovered this for our website. Doing the site: command shows 73 pages, but checking through the list, there are lots of pages not included. However, if I do the site:domain.com/page.html command for those individual pages, they do come up in the search results page. I don't understand why, though.
-
I'm running into this same issue where I have about a quarter of a client's site not indexing. Using the site:domain.com trick shows me 336 results - which I somehow need to add to a csv file, compare against the URLs crawled by screaming frog, and then use VLOOKUP to find the unique values.
So how can I get those 300+ results exported to a csv file for analysis?
-
DeepCrawl will provide the information with one tool. It's not inexpensive, but it's definitely the best tool out there. You have to connect it to Google Analytics in order for it to give you this information, but it will show you how many of your URLs are indexed and how many are not but should be.
It works if connected to Google Webmaster Tools and Google Analytics, combined with any of the many ways of scraping or crawling the site.
Technically that is more than one tool, but it is a good way.
All the best,
tom
-
Crawl the domain using Screaming Frog (SF) and then use URL Profiler to check the indexation status of the URLs.
You'll need proxies.
It can be done with ScrapeBox too.
Otherwise you can probably use Google Sheets with some IMPORTXML wizardry to create a query on Google.
-
Hi Paul,
I too have not had any luck getting Screaming Frog to actually check every link that it claims it will. You're exactly right: it will check the homepage or the single link that you choose. However, in my experience it will not check everything. I have a friend who has the paid version; I will ask him.
I'll be sure to let you know. I do agree with you. I just found this out myself; in fact it is misleading to say "check all" and really check just one.
Excellent tutorial, by the way, on how to do this seemingly easy task, which when attempted is truly not easy at all.
Sincerely,
Thomas
PS: I get this result with site:www.example.com
It gives me the opportunity to see all the indexed pages Google has processed. I would, however, have to compare them to a CSV file in order to actually know what is missing.
I really like your example and will definitely use it in the future.
-
Thanks for the reminder that Screaming Frog has that "Check Index" functionality, Thomas.
Unfortunately, I've never been able to get that method to check more than one link at a time, as all it does is send the request to a browser to check. Even highlighting multiple URLs and checking for indexation only checks the first one. Great for spot checks, but not what Seth is looking for, I don't think. My other post details an automatic way to check a site's hundreds (or thousands) of pages at a time.
I only have the free version of Screaming Frog on this machine at the moment so would be very interested to know if the paid version changes this.
Paul
-
Dear Paul,
Thank you for taking the time to address this.
I was extremely hasty when I wrote my first answer; I copied and pasted it from dictation software that I use. I then went on to wrongly say that this was the correct way to do something. However, Screaming Frog SEO Spider is a tool that I referenced early on. It allows you to see 100% of the links you are hosting at the time you run the scan, and it includes the ability to check whether a page is indexed in Google, Bing and Yahoo. When I referenced this software nobody took notice, as I probably looked like I did not know what I was talking about.
In hindsight I should have kept bringing up Screaming Frog; instead I simply brought up other ways to check lost links. In my opinion, going into Google and clicking one by one on what you do or do not know is indexed is a very long and arduous task.
Screaming Frog allows you to click Internal links, then right-click and choose to check whether a page is indexed; a table will come down on the right side. You can select from the three big search engines. You can do many more things with this fantastic tool, but I did not illustrate as well as I am right now exactly how it should be used or what its capabilities are. I truly thought that once I had referenced it, somebody would look into it and see what I was speaking about. However, hindsight is 20/20. I appreciate your comment very much and hope you can see that, yes, I was mistaken in the beginning, but I did come up with an automated tool to answer the question asked.
Screaming Frog can be used on PC, Mac or Linux. It is free to download and comes in a paid version with even more abilities than those showcased in the free edition. It is only 2 MB in size and uses almost no RAM on a Mac; I don't know how big it is on the PC.
Here's the link to the software:
http://www.screamingfrog.co.uk/seo-spider/
I hope that you will accept my apologies for not paying as much attention as I should have to what I pasted, and I hope this tool will be of use to you.
Respectfully,
Thomas
-
There is no individual tool capable of providing the info you're looking for, Seth. At least as far as I've ever come across.
HOWEVER! It is possible to do it if you are willing to do some of the work on your own to collect and manipulate data using several tools. Essentially this method automates the approach Takeshi has mentioned.
The short answer
First you'll create a list of all the pages on your website. Then you'll create a list of all the URLs that Google says are indexed. From there, you will use Excel to subtract the indexed URLs from the known URLs, leaving a list of non-indexed URLs, which is what you asked for. Ready? Here's how.
Collect a list of all your site's pages
You can do this in several ways. If you have a reliable and complete sitemap, you can get this data there. If your CMS is capable of outputting such a list, great. If neither of these is an option, you can use the Screaming Frog spider to get the data (remember the free version will only collect up to 500 pages). Xenu Linksleuth is also an alternative. Put all these URLs into a spreadsheet.
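If you go the sitemap route, a short script can pull the URLs out for pasting into the spreadsheet. This is only a minimal sketch, assuming a standard sitemap.xml that uses the sitemaps.org namespace; the function name `urls_from_sitemap` and the sample XML are made up for illustration and are not part of any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap, in order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample; in practice you would read your own sitemap.xml.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>"""

if __name__ == "__main__":
    # One URL per line, ready to paste into a spreadsheet column.
    for url in urls_from_sitemap(SAMPLE):
        print(url)
```

Note that a sitemap index file (one that lists other sitemaps) would need an extra step that this sketch does not cover.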
Collect a list of all pages Google has indexed.
You'll do this using a scraper tool that will "scrape" all the URLs off a Google SERP page. There are many tools to do this; which one is best will depend largely on how big your site is. Assuming your site is only 700 or 800 pages, I recommend the brilliantly simple SERPS Redux bookmarklet from Liam Delahunty. Clicking on the bookmarklet while on a SERP page will automatically scrape all the URLs into an easily copyable format. The trick is, you want the SERP page to display as many results as possible, otherwise you'll have to iterate through many, many pages to catch everything.
So - pro tip - if you go to the setting icon while on any Google search page, and select Search Settings you will see the option to have your searches return up to 100 results instead of the usual 10. You have to select Never Show Instant Results in order for the Results per Page slider to become active.
Now, in Google's search box, you'll enter site:mysite.com as Takeshi explained. (NOTE: use the canonical version of your domain, so include the www if that's the primary version of your site) You should now have a page listing 100 URLs of your site that are indexed.
- Click the SERPRedux bookmarklet to collect them all, then copy and paste the URLs into a spreadsheet.
- Go back to the site:mydomain results page, click for page 2, and repeat, adding the additional URLs to the same spreadsheet.
- Repeat this process until you have collected all the URLs Google lists
Remove duplicates to leave just un-indexed URLs
Now you have a spreadsheet with all known URLs and all indexed URLs. Use Excel to remove every URL that appears in both lists (the duplicates), and what you will be left with is all the URLs that Google doesn't list as being indexed. Voila!
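The subtraction step can also be done outside Excel. The following is a rough sketch rather than a definitive method: it assumes the two lists have been exported as plain lists of URLs, and the helper names `normalize` and `not_indexed` plus the sample data are hypothetical. It also assumes that a trailing slash and the letter case of the host name do not distinguish pages on your site, which may not hold everywhere.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Lower-case the scheme and host and drop a trailing slash so that
    trivially different spellings of the same URL compare as equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def not_indexed(known_urls, indexed_urls):
    """Return the known URLs that do not appear in the indexed list,
    keeping the order of the known list."""
    indexed = {normalize(u) for u in indexed_urls}
    return [u for u in known_urls if normalize(u) not in indexed]


# Hypothetical data standing in for the two spreadsheet columns.
known = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
    "http://www.example.com/contact.html",
]
indexed = [
    "http://www.example.com",
    "http://WWW.example.com/about.html",
]

if __name__ == "__main__":
    for url in not_indexed(known, indexed):
        print(url)
```

The www and non-www versions, and http versus https, are deliberately kept distinct here, in line with the advice to work with the canonical version of the domain.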
A few notes:
- The site: search operator doesn't guarantee that you'll actually get all indexed URLs, but it's the closest you'll be able to get. For an interesting experiment, re-run this process with the non-canonical version of your site address as well, to see where you might be indexed for duplicates.
- If your site is bigger, or you will need to do this multiple times, there are tools that will scrape all the SERPS pages at once so you don't have to iterate through them. The scraper components of SEER's SEO Toolbox or Neil Bosma's SEO Tools for Excel are good starting points. There is also a paid tool called ScrapeBox designed specifically for this kind of scraping. It's a blackhat tool, but in the right hands, is also powerful for whitehat purposes
- Use Takeshi's suggestion of running some of the resulting non-indexed list through manual site: searches to confirm the quality of your list
Whew! I know that's a lot to throw at you as an answer to what probably seemed like a simple question, but I wanted to work through the steps for you, rather than just hint at how it could be done.
Be sure to ask about any of the areas where my explanation isn't clear enough.
Paul
-
Thomas, as Takeshi has tried to point out, you have misread the original question. The original poster is asking for a way to find the actual URLs of pages from his site that are NOT indexed in the search engines.
He is not looking for the number of URLs that are indexed.
None of the tools you have repeatedly mentioned are capable of providing this information, which is likely why your response was downvoted.
Best to carefully read the original question to ensure you are answering what is actually being asked, rather than what you assume is being asked. Otherwise you add significant confusion to the attempt to provide an answer to the original poster.
Paul
-
http://www.screamingfrog.co.uk/
Google Analytics should be able to tell you the answer to this as well. I'm sorry I did not think of that earlier; however, I stand by my Google Webmaster Tools answer, especially after consulting with a few more people.
you can use
Then, when done, go to SEO and scroll to the bottom; you will see exactly how many pages have been indexed successfully by Google.
Mr. Young,
I would like to know: if this person does not have a 301 redirect, would your site scan work successfully? Under your directions it would not, and I'm not giving you a thumbs down for it, you know.
-
I hope the two links below will give you the information that you are looking for. The first link is a free resource for finding exactly how many pages have been indexed. As for how many have not, you can only find that using the second link, and I believe you will find quite a bit there.
http://www.northcutt.com/tools/free-seo-tools/google-indexed-pages-checker/
along with
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=2642366
Go to advanced and it will offer you a show all
-
He's looking for a way to find which pages aren't indexed, not how many pages are indexed.
-
Go to Google Webmaster Tools, go to Health, and underneath that go to Index Status; you will find the answer that you've been looking for. Please remove the thumbs down from my answer, because it is technically correct.
Index Status
Showing data from the last year
Basic / Advanced
Total indexed (this is your #), Ever crawled, Blocked by robots, Removed
-
Connect Google Analytics to Deepcrawl.com and it will give you the exact number when it is done indexing (in Universal Index).
Or take a tool like Screaming Frog SEO Spider and run your site right through the tool.
With one of the two tools above, use the internal links to get your page count. You want to make sure they are HTML pages, not just URIs. Then take that number and subtract the amount Google shows when you enter site:www.example.com in the Google search bar, with no quotes or parentheses in your search. The number you see there is your indexed URLs.
A very fast way would be to go to marketinggrader.com, add your site and let it run, then click "SEO".
You will then see the number of pages in Google's index.
Log in to Google Webmaster Tools and select indexed content. It will show you exactly how many pages in your sitemap have been indexed and exactly how many pages in total have been indexed. You will not miss a thing inside Google Webmaster Tools. Using the other techniques you could miss things: for instance, using site: on Google without the www., if you did not have a 301 redirect, will not give you the correct answer.
Use GWT.
-
You can start by trying the "site:domain.com" search. This won't show you all the pages which are indexed, but it can help you determine which ones aren't indexed.
Another thing you can do is go into Google Analytics and see which of your pages have not received any organic visits. If a page has not received any clicks at all, there's a good chance it hasn't been indexed yet (or just isn't ranking well).
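As a rough illustration of the analytics check, the sketch below lists pages with no organic visits from an exported report. The CSV layout is an assumption: the column names `landing_page` and `sessions`, the sample rows and the function name are all hypothetical, not the actual Google Analytics export format. A page on this list is only a candidate for "not indexed", since it may simply not be ranking.

```python
import csv
import io


def pages_without_organic_visits(all_paths, analytics_csv):
    """Return the paths from all_paths that have no organic sessions
    recorded in the exported report."""
    visited = set()
    for row in csv.DictReader(io.StringIO(analytics_csv)):
        if int(row["sessions"]) > 0:
            visited.add(row["landing_page"])
    return [p for p in all_paths if p not in visited]


# Hypothetical export of organic landing pages (made-up column names).
EXPORT = """landing_page,sessions
/,120
/about.html,4
/old-page.html,0
"""

# Hypothetical full list of paths on the site.
ALL_PATHS = ["/", "/about.html", "/old-page.html", "/new-page.html"]

if __name__ == "__main__":
    for path in pages_without_organic_visits(ALL_PATHS, EXPORT):
        print(path)
```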
Finally, you can use the "site:domain.com/page.html" command to figure out whether a specific page is not being indexed. You can also do "site:domain.com/directory" to see whether any pages within a specific directory are being indexed.
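If you have more than a handful of pages to spot-check, a small script can build the site: queries for you. This is just a sketch under my own assumptions: the function names are invented, and the links are meant to be opened by hand in a browser, not fetched automatically.

```python
from urllib.parse import quote_plus, urlsplit


def site_query(url):
    """Build the site: query for one page, e.g. site:domain.com/page.html."""
    parts = urlsplit(url)
    return f"site:{parts.netloc}{parts.path}"


def search_link(url):
    """Return a Google search link that runs the site: query for the page."""
    return "https://www.google.com/search?q=" + quote_plus(site_query(url))


# Hypothetical pages to spot-check.
PAGES = [
    "http://domain.com/page.html",
    "http://domain.com/directory/",
]

if __name__ == "__main__":
    for page in PAGES:
        print(site_query(page), search_link(page))
```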
-
You could use Linksleuth to crawl your site. It will tell you how many pages it found; then match that against the total number of pages Google has indexed.