Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
How to find all indexed pages in Google?
-
Hi,
We have an ecommerce site with around 4000 real pages. But our index count is at 47,000 pages in Google Webmaster Tools.
How can I get a list of all pages indexed of our domain? trying to locate the duplicate content.
Doing a "site:www.mydomain.com" only returns up to 676 results...
Any ideas?
Thanks,
Ben
-
You are absolutely right. But if you think that you have duplicate content issues, then Screaming Frog can help you tease that out.
That is also why I suggested the SEOmoz tool, since it is supposed to mimick a SE spider, it can give you a pretty good idea of any issues that you might have.
Using the advanced operator of site:domain makes sense, but if there are issues there like eyepaq said, it is going to be tough sledding.
My suggestion would be to download take a closer look at what GWT is telling you. Are there duplicates there? Is your CMS auto-generating URL's? That is probably going to be your best bet IMO.
Best of luck!
-
@BJS, I would export a file from GWT and filter the results. If your URLs are in GWT, then most likely it's indexed in Google.
-
Thank you to everyone that contributed.
@Zeph and @Francisco - I do use Screaming Frog, but actually, correct me if I am wrong, but it does not show a list of pages indexed, but rather pages that exist in the site - not what Google has already indexed. Thanks anyway
What I wanted was a way of creating a list of all indexed pages in Google - not a count.
But thank you all the same!
-
Hey Zeph! Hope your company is doing great.
@Ben, screaming frog is good for this. You will need to get the paid version of it. There is a video on the site http://www.screamingfrog.co.uk/seo-spider/. Use filters to get to your real URLs.
-
Hi,
There are tools that you can use - though for close 50k pages is harder to crawl. Best bet is the Web master tools count - although is not 100% exact either.
The site:domain is a good indicator but it's generated "on the fly" but it will show you a better result if you go "deeper" and click on page 10-20 and so on.
However right now it looks like there is an issue with site:domain. for more info see: http://www.seroundtable.com/google-site-command-cluster-16829.html
Cheers.
-
Use the tool Screaming Frog to see all your pages, that should help. Also, the SEOmoz toolset has a function that will show you all duplicate content (if you are a pro subscriber).
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Password Protected Page(s) Indexed
Hi, I am wondering if my website can get a penalty if some password protected pages are showing up when I search on google: site:www.example.com/sub-group/pass-word-protected-page That shows that my password protected page was indexed either before or after adding the password protection. I've seen people suggest no indexing the page. Is that the best method to take care of this? What if we are planning on pushing the page live later on? All of these pages have no title tag, meta description, image alt text, etc. Should I add them for each page? I am wondering what is the best step, especially if we are planning on pushing the page(s) live. Thanks for any help!
Intermediate & Advanced SEO | | aua0 -
Google doesn't index image slideshow
Hi, My articles are indexed and images (full size) via a meta in the body also. But, the images in the slideshow are not indexed, have you any idea? A problem with the JS Example : http://www.parismatch.com/People/Television/Sport-a-la-tele-les-femmes-a-l-abordage-962989 Thank you in advance Julien
Intermediate & Advanced SEO | | Julien.Ferras0 -
Google not Indexing images on CDN.
My URL is: http://bit.ly/1H2TArH We have set up a CDN on our own domain: http://bit.ly/292GkZC We have an image sitemap: http://bit.ly/29ca5s3 The image sitemap uses the CDN URLs. We verified the CDN subdomain in GWT. The robots.txt does not restrict any of the photos: http://bit.ly/29eNSXv. We used to have a disallow to /thumb/ which had a 301 redirect to our CDN but we removed both the disallow in the robots.txt as well as the 301. Yet, GWT still reports none of our images on the CDN are indexed.
Intermediate & Advanced SEO | | alphonsehaThe above screenshot is from the GWT of our main domain.The GWT from the CDN subdomain just shows 0. We did not submit a sitemap to the verified subdomain property because we already have a sitemap submitted to the property on the main domain name. While making a search of images indexed from our CDN, nothing comes up: http://bit.ly/293ZbC1While checking the GWT of the CDN subdomain, I have been getting crawling errors, mainly 500 level errors. Not that many in comparison to the number of images and traffic that we get on our website. Google is crawling, but it seems like it just doesn't index the pictures!?
Can anyone help? I have followed all the information that I was able to find on the web but yet, our images on the CDN still can't seem to get indexed.
0 -
How to setup multiple pages in Google Search?
How to setup multiple pages in Google Search? I have seen sites that are arranged in google like : Website in Google
Intermediate & Advanced SEO | | Hall.Michael
About us. Contact us
Services. Etc.. Kindly review screenshot. Is this can achieved by Yoast Plugin? X9vMMTw.png0 -
Why Google isn't indexing my images?
Hello, on my fairly new website Worthminer.com I am noticing that Google is not indexing images from my sitemap. Already 560 images submitted and Google indexed only 3 of them. Altough there is more images indexed they are not indexing any new images, and I have no idea why. Posts, categories and other urls are indexing just fine, but images not. I am using Wordpress and for sitemaps Wordpress SEO by yoast. Am I missing something here? Why Google won't index my images? Thanks, I appreciate any help, David xv1GtwK.jpg
Intermediate & Advanced SEO | | Worthminer1 -
Pages are Indexed but not Cached by Google. Why?
Here's an example: I get a 404 error for this: http://webcache.googleusercontent.com/search?q=cache:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all But a search for qjamba restaurant coupons gives a clear result as does this: site:http://www.qjamba.com/restaurants-coupons/ferguson/mo/all What is going on? How can this page be indexed but not in the Google cache? I should make clear that the page is not showing up with any kind of error in webmaster tools, and Google has been crawling pages just fine. This particular page was fetched by Google yesterday with no problems, and even crawled again twice today by Google Yet, no cache.
Intermediate & Advanced SEO | | friendoffood2 -
Wordpress blog in a subdirectory not being indexed by Google
HI MozzersIn my websites sitemap.xml, pages are listed, such as /blog/ and /blog/textile-fact-or-fiction-egyptian-cotton-explained/These pages are visible when you visit them in a browser and when you use the Google Webmaster tool - Fetch as Google to view them (see attachment), however they aren't being indexed in Google, not even the root directory for the blog (/blog/) is being indexed, and when we query:site: www.hilden.co.uk/blog/ It returns 0 results in Google.Also note that:The Wordpress installation is located at /blog/ which is a subdirectory of the main root directory which is managed by Magento. I'm wondering if this causing the problem.Any help on this would be greatly appreciated!AnthonyToTOHuj.png?1
Intermediate & Advanced SEO | | Tone_Agency0 -
Disallowed Pages Still Showing Up in Google Index. What do we do?
We recently disallowed a wide variety of pages for www.udemy.com which we do not want google indexing (e.g., /tags or /lectures). Basically we don't want to spread our link juice around to all these pages that are never going to rank. We want to keep it focused on our core pages which are for our courses. We've added them as disallows in robots.txt, but after 2-3 weeks google is still showing them in it's index. When we lookup "site: udemy.com", for example, Google currently shows ~650,000 pages indexed... when really it should only be showing ~5,000 pages indexed. As another example, if you search for "site:udemy.com/tag", google shows 129,000 results. We've definitely added "/tag" into our robots.txt properly, so this should not be happening... Google showed be showing 0 results. Any ideas re: how we get Google to pay attention and re-index our site properly?
Intermediate & Advanced SEO | | udemy0