Getting More Pages Indexed
-
We have a large E-commerce site (magento based) and have submitted sitemap files for several million pages within Webmaster tools. The number of indexed pages seems to fluctuate, but currently there is less than 300,000 pages indexed out of 4 million submitted. How can we get the number of indexed pages to be higher? Changing the settings on the crawl rate and resubmitting site maps doesn't seem to have an effect on the number of pages indexed.
Am I correct in assuming that most individual product pages just don't carry enough link juice to be considered important enough yet by Google to be indexed? Let me know if there are any suggestions or tips for getting more pages indexed.
-
I think that is what did it! lol
-
Yes, you will need internal links to establish your site navigation. Then, external links if you don't have enough PR flow from within your site.
Some powerful sites can support these millions of pages with internal links. If you have a site like that congratulations!
-
Thanks this is helpful. I will work on funneling spiders. I assume I'll need a healthy dose of both internal and external links pointing deep into the site in order to get the spider to start chewing in there? Thanks.
-
Thanks! You enjoyed the chewing spiders?
-
Wow...that was a powerful answer EGOL. Thanks for adding the long answer. The way you worded it created a visualization for me that was very helpful to my understanding as a novice. Thanks.
-
The short answer...
Link deep into the site at multiple points with heavy PR.
The long answer...
If you have a really big site you need a lot of linkjuice to get the entire site indexed and keep it in the index. You also need a good site structure so that spiders can crawl through the site and find every page.
If you have several million pages, my guess is that you will need hundreds of links of at least PR5 or PR6 linking into the site. I would direct each of those links to a deep category page. That will funnel the spiders deep into your site and force them to chew their way out while indexing your pages.
All of those links must be held permanently in place. Because if you pull the links the flow of spiders will stop and google will slowly forget about pages that are not visited by spiders on a regular basis.
If you have weak links or not enough links your site will not be thoroughly crawled and google will forget about your pages as fast as they are discovered.
Big sites require a PR resource.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Best practices for types of pages not to index
Trying to better understand best practices for when and when not use a content="noindex". Are there certain types of pages that we shouldn't want Google to index? Contact form pages, privacy policy pages, internal search pages, archive pages (using wordpress). Any thoughts would be appreciated.
Technical SEO | | RichHamilton_qcs0 -
Google showing https:// page in search results but directing to http:// page
We're a bit confused as to why Google shows a secure page https:// URL in the results for some of our pages. This includes our homepage. But when you click through it isn't taking you to the https:// page, just the normal unsecured page. This isn't happening for all of our results, most of our deeper content results are not showing as https://. I thought this might have something to do with Google conducting searches behind secure pages now, but this problem doesn't seem to affect other sites and our competitors. Any ideas as to why this is happening and how we get around it?
Technical SEO | | amiraicaew0 -
Why is this page not ranking but is indexed?
I have a page http://jobs.hays.co.uk/jobs-in-norfolk and it is indexed by Google but will not show up for any keywords I try. Any ideas?
Technical SEO | | S_Curtis0 -
Index page
To the SEO experts, this may well seem a silly question, so I apologies in advance as I try not to ask questions that I probably know the answer for already, but clarity is my goal I have numerous sites ,as standard practice, through the .htaccess I will always set up non www to www, and redirect the index page to www.mysite.com. All straight forward, have never questioned this practice, always been advised its the ebst practice to avoid duplicate content. Now, today, I was looking at a CMS service for a customer for their website, the website is already built and its a static website, so the CMS integration was going to mean a full rewrite of the website. Speaking to a friend on another forum, he told me about a service called simple CMS, had a look, looks perfect for the customer ... Went to set it up on the clients site and here is the problem. For the CMS software to work, it MUST access the index page, because my index page is redirected to www.mysite.com , it wont work as it cant find the index page (obviously) I questioned this with the software company, they inform me that it must access the index page, I have explained that it wont be able to and why (cause I have my index page redirected to avoid duplicate content) To my astonishment, the person there told me that duplicate content is a huge no no with Google (that's not the astonishing part) but its not relevant to the index and non index page of a website. This goes against everything I thought I knew ... The person also reassured me that they have worked within the SEO area for 10 years. As I am a subscriber to SEO MOZ and no one here has anything to gain but offering advice, is this true ? Will it not be an issue for duplicate content to show both a index page and non index page ?, will search engines not view this as duplicate content ? Or is this SEO expert talking bull, which I suspect, but cannot be sure. Any advice would be greatly appreciated, it would make my life a lot easier for the customer to use this CMS software, but I would do it at the risk of tarnishing the work they and I have done on their ranking status Many thanks in advance John
Technical SEO | | Johnny4B0 -
132 pages reported as having Duplicate Page Content but I'm not sure where to go to fix the problems?
I am seeing “Duplicate Page Content” coming up in our
Technical SEO | | danatanseo
reports on SEOMOZ.org Here’s an example: http://www.ccisolutions.com/StoreFront/product/williams-sound-ppa-r35-e http://www.ccisolutions.com/StoreFront/product/aphex-230-master-voice-channel-processor http://www.ccisolutions.com/StoreFront/product/AT-AE4100.prod These three pages are for completely unrelated products.
They are returning “200” status codes, but are being identified as having
duplicate page content. It appears these are all going to the home page, but it’s
an odd version of the home page because there’s no title. I would understand if these pages 301-redirected to the home page if they were obsolete products, but it's not a 301-redirect. The referring page is
listed as: http://www.ccisolutions.com/StoreFront/category/cd-duplicators None of the 3 links in question appear anywhere on that page. It's puzzling. We have 132 of these. Can anyone help me figure out
why this is happening and how best to fix it? Thanks!0 -
How do we ensure our new dynamic site gets indexed?
Just wondering if you can point me in the right direction. We're building a 'dynamically generated' website, so basically, pages don’t technically exist until the visitor types in the URL (or clicks an on page link), the pages are then created on the fly for the visitor. The major concern I’ve got is that Google won’t be able to index the site, as the pages don't exist until they're 'visited', and to top it off, they're rendered in JSPX, which makes things tricky to ensure the bots can view the content We’re going to build/submit a sitemap.xml to signpost the site for Googlebot but are there any other options/resources/best practices Mozzers could recommend for ensuring our new dynamic website gets indexed?
Technical SEO | | Hutch_e0 -
301 redirecting some pages directly, and the rest to a single page
I've read through the Redirect guide here already but can't get this down in my .htaccess I want to redirect some pages specifically (/contactinfo.html to the new /contact.php) And I want all other pages (not all have equivalent pages on the new site) to redirect to my new (index.php) homepage. How can I set it up so that some specific pages redirect directly, and all others go to one page? I already have the specific oldpage.html -> newpage.php redirects in place, just need to figure out the broad one for everything else.
Technical SEO | | RyanWhitney150 -
Importance of an optimized home page (index)
I'm helping a client redesign their website and they want to have a home page that's primarily graphics and/or flash (or jquery). If they are able to optimize all of their key sub-pages, what is the harm in terms of SEO?
Technical SEO | | EricVallee340