Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Is there a way to get a list of Total Indexed pages from Google Webmaster Tools?
-
I'm doing a detailed analysis of how Google sees and indexes our website and we have found that there are 240,256 pages in the index which is way too many. It's an e-commerce site that needs some tidying up.
I'm working with an SEO specialist to set up URL parameters and put information in to the robots.txt file so the excess pages aren't indexed (we shouldn't have any more than around 3,00 - 4,000 pages) but we're struggling to find a way to get a list of these 240,256 pages as it would be helpful information in deciding what to put in the robots.txt file and which URL's we should ask Google to remove.
Is there a way to get a list of the URL's indexed? We can't find it in the Google Webmaster Tools.
-
Looks like I can only do the first thousand. It's a start though. Thank you for the information.
Many of the URL's on my list, when put in to Google search, are giving me 80-100 other variants I can remove by hand.
http://www.mathewporter.co.uk/list-a-domains-indexed-pages-in-google-docs/ for anyone else following.
-
Finally getting around to doing this and noticed that when I change the start number to anything above 900, it doesn't work - ie: it's only letting me look at the first 1,000 results for some reason.
The list of 1,000 has given me some good URL's to search off for the filtering thingy that was generating all the garbage URL's but I'd love to get past 1,000 if I can.
Does anyone know how?
-
Correct. I have gone in to URL Parameters already and set them to Crawl 'No URLs' for those we don't want crawled.
We haven't added those parameters listed in there in to the robots.txt file yet, but I will do that now. I had an initial consult today and we ran way over time when we discovered all this stuff so I have another appointment in a couple of weeks.
We have a sitemap of all the category pages and relevant static pages on the site already and Google has those indexed nicely. We just need to get rid of the 240,000 pages it has indexed that we don't want in there (frightening I know - it's a really high number).
I greatly appreciate you taking the time to respond. Thank you.
-
Thanks. There's a lot of auto-generated content, duplicate pages and we've set the robots.txt file up to exclude a large number of them. Now we wait.
Very helpful and greatly appreciated. Thank you.
-
Hi,
I'm going to assume that as you have said it's an e-commerce site that the URL parameters are created by product variations, filters, sorts etc. If so then you must already be seeing those parameters on the URL of your site as you navigate and in your analytics or search results.
Your SEO specialist should easily be able to add those parameters to the robots file. Then personally I would resubmit a site map for completeness and wait for results to take effect.
-
Joanne,
I'm afraid there's no way to know which pages are actually indexed from your Webmaster Tools. You can use a simple search in Google: site:domain.com and it will list "all" your indexed pages, however, there's no way to export that as a report.
You can create a report using some "hack". Login to your Google Drive, create a new spreadsheet and use the following command to populate rows:
=importXml("https://www.google.com/search?q=site:www.yourdomainnamehere.com&num=100&start=1"; "//cite")
This will load the first 100 results. You will need to repeat the process for every 1000 results you have, changing the last variable: "start=1" to "start=100" and then "start=200", etc (you see where I'm going). This could really be a pain in the butt for your site's size.
My recommendation is you navigate your own site, decide which pages should be removed and then create the robots.txt regardless what google has indexed. Once you complete your robots.txt, it will take a few weeks (or even a month) to have the blocked pages removed.
Hope that helps!
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Password Protected Page(s) Indexed
Hi, I am wondering if my website can get a penalty if some password protected pages are showing up when I search on google: site:www.example.com/sub-group/pass-word-protected-page That shows that my password protected page was indexed either before or after adding the password protection. I've seen people suggest no indexing the page. Is that the best method to take care of this? What if we are planning on pushing the page live later on? All of these pages have no title tag, meta description, image alt text, etc. Should I add them for each page? I am wondering what is the best step, especially if we are planning on pushing the page(s) live. Thanks for any help!
Intermediate & Advanced SEO | | aua0 -
Redirected Old Pages Still Indexed
Hello, we migrated a domain onto a new Wordpress site over a year ago. We redirected (with plugin: simple 301 redirects) all the old urls (.asp) to the corresponding new wordpress urls (non-.asp). The old pages are still indexed by Google, even though when you click on them you are redirected to the new page. Can someone tell me reasons they would still be indexed? Do you think it is hurting my rankings?
Intermediate & Advanced SEO | | phogan0 -
How to setup multiple pages in Google Search?
How to setup multiple pages in Google Search? I have seen sites that are arranged in google like : Website in Google
Intermediate & Advanced SEO | | Hall.Michael
About us. Contact us
Services. Etc.. Kindly review screenshot. Is this can achieved by Yoast Plugin? X9vMMTw.png0 -
How to add subdomains to webmaster tools?
Can anyone help with how I add a sub domain to webmaster tools? Also do I need to create a seperate sitemap for each sub domain? Any help appreciated!
Intermediate & Advanced SEO | | SamCUK1 -
Site Indexed by Google but not Bing or Yahoo
Hi, I have a site that is indexed (and ranking very well) in Google, but when I do a "site:www.domain.com" search in Bing and Yahoo it is not showing up. The team that purchased the domain a while back has no idea if it was indexed by Bing or Yahoo at the time of purchase. Just wondering if there is anything that might be preventing it from being indexed? Also, Im going to submit an index request, are there any other things I can do to get it picked up?
Intermediate & Advanced SEO | | dbfrench0 -
Should pages of old news articles be indexed?
My website published about 3 news articles a day and is set up so that old news articles can be accessed through a "back" button with articles going to page 2 then page 3 then page 4, etc... as new articles push them down. The pages include a link to the article and a short snippet. I was thinking I would want Google to index the first 3 pages of articles, but after that the pages are not worthwhile. Could these pages harm me and should they be noindexed and/or added as a canonical URL to the main news page - or is leaving them as is fine because they are so deep into the site that Google won't see them, but I also won't be penalized for having week content? Thanks for the help!
Intermediate & Advanced SEO | | theLotter0 -
Tool to calculate the number of pages in Google's index?
When working with a very large site, are there any tools that will help you calculate the number of links in the Google index? I know you can use site:www.domain.com to see all the links indexed for a particular url. But what if you want to see the number of pages indexed for 100 different subdirectories (i.e. www.domain.com/a, www.domain.com/b)? is there a tool to help automate the process of finding the number of pages from each subdirectory in Google's index?
Intermediate & Advanced SEO | | nicole.healthline0 -
Should I prevent Google from indexing blog tag and category pages?
I am working on a website that has a regularly updated Wordpress blog and am unsure whether or not the category and tag pages should be indexable. The blog posts are often outranked by the tag and category pages and they are ultimately leaving me with a duplicate content issue. With this in mind, I assumed that the best thing to do would be to remove the tag and category pages from the index, but after speaking to someone else about the issue, I am no longer sure. I have tried researching online, but there isn't anything that provided any further information. Please can anyone with any experience of dealing with issues like this or with any knowledge of the topic help me to resolve this annoying issue. Any input will be greatly appreciated. Thanks Paul
Intermediate & Advanced SEO | | PaulRogers0