Tool to calculate the number of pages in Google's index?
-
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular url. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
-
A good way to know how many pages are indexed in Google is to check the Top Landing Pages in Google Analytics,
It gives you a more interesting information than, IMHO, Google WBT itself, because those pages are the pages that actually are indexed and because of that receive traffic.
-
Google Webmaster Tools allows you to see your sitemap, along with the count of URLs in the sitemap and Google's index.
You can also download your sitemap from Google WMT. I am not aware of any data Google offers to help identify which URLs are not indexed. It sounds like this is the information Michelle is seeking.
If you just needed to know the # of URLs for each subdirectory, you could submit a sitemap for each one and then Google will show the URLs in Index for each given sitemap.
-
Is this your site? If so you should be able to find this information in Google Webmaster tools. Upload your sitemap and Google will tell you how many of the pages in your sitemap are in there index.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
I'm noticing that URL that were once indexed by Google are suddenly getting dropped without any error messages in Webmasters Tools, has anyone seen issues like this before?
I'm noticing that URLs that were once indexed by Google are suddenly getting dropped without any error messages in Webmasters Tools, has anyone seen issues like this before? Here's an example:
Intermediate & Advanced SEO | | nystromandy
http://www.thefader.com/2017/01/11/the-carter-documentary-lil-wayne-black-lives-matter0 -
Google and PDF indexing
It was recently brought to my attention that one of the PDFs on our site wasn't showing up when looking for a particular phrase within the document. The user was trying to search only within our site. Once I removed the site restriction - I noticed that there was another site using the exact same PDF. It appears Google is indexing that PDF but not ours. The name, title, and content are the same. Is there any way to get around this? I find it interesting as we use GSA and within GSA it shows up for the phrase. I have to imagine Google is saying that it already has the PDF and therefore is ignoring our PDF. Any tricks to get around this? BTW - both sites rightfully should have the PDF. One is a client site and they are allowed to host the PDFs created for them. However, I'd like Mathematica to also be listed. Query: no site restriction (notice: Teach for america comes up #1 and Mathematica is not listed). https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#q=HSAC_final_rpt_9_2013.pdf+"Teach+charlotte"+filetype:pdf&as_qdr=all&filter=0 Query: site restriction (notice that it doesn't find the phrase and redirects to any of the words) https://www.google.com/search?as_q=&as_epq=HSAC_final_rpt_9_2013.pdf&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=&as_filetype=pdf&as_rights=&gws_rd=ssl#as_qdr=all&q="Teach+charlotte"+site:www.mathematica-mpr.com+filetype:pdf
Intermediate & Advanced SEO | | jpfleiderer0 -
Difference in Number of URLS in "Crawl, Sitemaps" & "Index Status" in Webmaster Tools, NORMAL?
Greetings MOZ Community: Webmaster Tools under "Index Status" shows 850 URLs indexed for our website (www.nyc-officespace-leader.com). The number of URLs indexed jumped by around 175 around June 10th, shortly after we launched a new version of our website. No new URLs were added to the site upgrade. Under Webmaster Tools under "Crawl, Site maps", it shows 637 pages submitted and 599 indexed. Prior to June 6th there was not a significant difference in the number of pages shown between the "Index Status" and "Crawl. Site Maps". Now there is a differential of 175. The 850 URLs in "Index Status" is equal to the number of URLs in the MOZ domain crawl report I ran yesterday. Since this differential developed, ranking has declined sharply. Perhaps I am hit by the new version of Panda, but Google indexing junk pages (if that is in fact happening) could have something to do with it. Is this differential between the number of URLs shown in "Index Status" and "Crawl, Sitemaps" normal? I am attaching Images of the two screens from Webmaster Tools as well as the MOZ crawl to illustrate what has occurred. My developer seems stumped by this. He has submitted a removal request for the 175 URLs to Google, but they remain in the index. Any suggestions? Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan0 -
How do I get rel='canonical' to eliminate the trailing slash on my home page??
I have been searching high and low. Please help if you can, and thank you if you spend the time reading this. I think this issue may be affecting most pages. SUMMARY: I want to eliminate the trailing slash that is appended to my website. SPECIFIC ISSUE: I want www.threewaystoharems.com to showing up to users and search engines without the trailing slash but try as I might it shows up like www.threewaystoharems.com/ which is the canonical link. WHY? and I'm concerned my back-links to the link without the trailing slash will not be recognized but most people are going to backlink me without a trailing slash. I don't want to loose linkjuice from the people and the search engines not being in consensus about what my page address is. THINGS I"VE TRIED: (1) I've gone in my wordpress settings under permalinks and tried to specify no trailing slash. I can do this here but not for the home page. (2) I've tried using the SEO by yoast to set the canonical page. This would work if I had a static front page, but my front page is of blog posts and so there is no advanced page settings to set the canonical tag. (3) I'd like to just find the source code of the home page, but because it is CSS, I don't know where to find the reference. I have gone into the css files of my wordpress theme looking in header and index and everywhere else looking for a specification of what the canonical page is. I am not able to find it. I'm thinking it is actually specified in the .htaccess file. (4) Went into cpanel file manager looking for files that contain Canonical. I only found a file called canonical.php . the only thing that seemed like it was worth changing was changing line 139 from $redirect_url = home_url('/'); to $redirect_url = home_url(''); nothing happened. I'm thinking it is actually specified in the .htaccess file. (5) I have gone through the .htaccess file and put thes 4 lines at the top (didn't redirect or create the proper canonical link) and then at the bottom of the file (also didn't redirect or create the proper canonical link) : RewriteEngine on
Intermediate & Advanced SEO | | Dillman
RewriteCond %{HTTP_HOST} ^([a-z.]+)?threewaystoharems.com$ [NC]
RewriteCond %{HTTP_HOST} !^www. [NC]
RewriteRule .? http://www.%1threewaystoharems.com%{REQUEST_URI} [R=301,L] Please help friends.0 -
Google indexing "noindex" pages
1 weeks ago my website expanded with a lot more pages. I included "noindex, follow" on a lot of these new pages, but then 4 days ago I saw the nr of pages Google indexed increased. Should I expect in 2-3 weeks these pages will be properly noindexed and it may just be a delay? It is odd to me that a few days after including "noindex" on pages, that webmaster tools shows an increase in indexing - that the pages were indexed in other words. My website is relatively new and these new pages are not pages Google frequently indexes.
Intermediate & Advanced SEO | | khi50 -
Google local pointing to Google plus page not homepage
Today my clients homepage dropped off the search results page (was #1 for months, in the top for years). I noticed in the places account everything is suddenly pointing at the Google plus page? The interior pages are still ranking. Any insight would be very helpful! Thanks.
Intermediate & Advanced SEO | | stevenob0 -
More Indexed Pages than URLs on site.
According to webmaster tools, the number of pages indexed by Google on my site doubled yesterday (gone from 150K to 450K). Usually I would be jumping for joy but now I have more indexed pages than actual pages on my site. I have checked for duplicate URLs pointing to the same product page but can't see any, pagination in category pages doesn't seem to be indexed nor does parameterisation in URLs from advanced filtration. Using the site: operator we get a different result on google.com (450K) to google.co.uk (150K). Anyone got any ideas?
Intermediate & Advanced SEO | | DavidLenehan0 -
Why my own page is not indexed for that keyword?
hi, I recently recreated the page www.zenucchi.it /ITA/poltrona-frau-brescia.html on the third level domain poltronafraubrescia.zenucchi.it by putting it on the home page. The first page is still indexed for the keyword poltrona frau brescia . But the new page is no indexed for that keyword and i don't know why ( even if the page is indexed in google ) .. I state that the new domain has the same autorithy and that i put a 301 redirect to pass his authority to the new one that has many more incoming links that did not have previous .. i hope you'll help me thanks a lot
Intermediate & Advanced SEO | | guidoboem0