Crawl efficiency - Page indexed after one minute!
-
Hey guys,
I have a site with 5+ million pages indexed and about 300 new pages a day.
I hear a lot that for sites at this level it's all about efficient crawlability.
The pages of this site get indexed one minute after they go online.
1) Does this mean the site is already being crawled efficiently and there is not much else to do about it?
2) By increasing crawl efficiency, should I expect Google to crawl my site less (taking less bandwidth from my site for the same amount of crawling) or to crawl my site more often?
Thanks
-
This is a complicated question that I can't give a simple answer to, as every site is set up differently and has its own challenges. You will likely use a variety of the techniques mentioned in my last paragraph above. Good luck.
-
Thanks Anthony,
Your explanation was very helpful.
Assume that 3 million of my 5 million pages are not so important for Google to crawl or index.
What would be the best way to optimize my crawl efficiency in relation to the number of pages?
Just noindex 3 million pages on the site? I believe this could be a risky move.
Perhaps robots.txt, but that would not de-index the existing pages.
-
Crawl efficiency isn't exactly the same as indexation speed. It is normal for a new page to be indexed quickly, because it is often linked to from the blog home page, shared on social networks, etc.
Crawl efficiency has a lot to do with making sure your most important pages are crawled as frequently as possible. Let's use the example of your site with 5,000,000 pages indexed. Perhaps there are 100,000 of those pages that are extremely important for your website. Your top categories, all of your products, your content, etc.
Then you are left with 4,900,000 pages that are not that important, but needed for the functionality of your website (pagination, filtering, sorting, etc). You have to determine, is it a good thing that Google has 5 million pages of your site indexed? Do you want Google regularly crawling those 4,900,000 pages, potentially at the expense of your more important pages?
Next, you check your Google Webmaster Tools and see that Google is crawling about 130,000 pages/day on your site. At that rate, it would take Google about 38 days (over an entire month) to crawl your entire site. Of course, it doesn't actually work that way. Google will crawl your site in a logical manner, crawling the pages with high authority (well linked to internally/externally) much more often. The point is, you can see that not all of your pages are being crawled every day. You want your best content crawled as frequently as possible.
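The arithmetic behind that estimate can be sketched in a few lines of Python. The figures are the ones from the example above; the even spread of crawls across all pages is a simplifying assumption, since Google does not crawl uniformly, and the function name is just illustrative.

```python
# Figures taken from the example in this thread.
TOTAL_INDEXED_PAGES = 5_000_000
IMPORTANT_PAGES = 100_000
PAGES_CRAWLED_PER_DAY = 130_000


def days_for_full_pass(pages: int, crawl_rate_per_day: int) -> float:
    """Days needed to crawl every page once, assuming crawls are spread evenly."""
    return pages / crawl_rate_per_day


full_pass = days_for_full_pass(TOTAL_INDEXED_PAGES, PAGES_CRAWLED_PER_DAY)
important_pass = days_for_full_pass(IMPORTANT_PAGES, PAGES_CRAWLED_PER_DAY)

print(f"Whole site: {full_pass:.1f} days per pass")
print(f"Important pages only: {important_pass:.1f} days per pass")
```

If the crawl were focused only on the 100,000 important pages, the same daily crawl rate would cover all of them in under a day, which is the gap that crawl optimization tries to close.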
"To be more blunt, if a page hasn't been crawled recently, it won't rank well." This quote is taken from one of my favorite resources on this topic, this post by AJ Kohn: http://www.blindfiveyearold.com/crawl-optimization
Crawl efficiency is about guiding the search spiders to your best content and helping them learn what types of pages they can ignore. You do this primarily through: Site Structure, Internal Linking, robots.txt, NoFollow attribute and Parameter Handling in Google Webmaster Tools.
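As a rough illustration of the robots.txt part, here is a minimal Python sketch that checks which URLs a set of rules would keep spiders away from. The rules, the `/filter/` and `/sort/` paths and the example.com URLs are all hypothetical, and the helper `crawl_allowed` is made up for this example. It uses the standard library's `urllib.robotparser`, which does plain path-prefix matching and may not behave exactly like Google's own parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep spiders out of filtering and sorting URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
Disallow: /sort/
"""


def crawl_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for url in (
    "https://www.example.com/products/yoga-mat",
    "https://www.example.com/filter/color-red",
    "https://www.example.com/sort/price-asc",
):
    print(url, "->", "crawl" if crawl_allowed(ROBOTS_TXT, url) else "blocked")
```

Keep in mind the point raised earlier in the thread: blocking a URL in robots.txt stops it from being crawled, but it does not by itself remove pages that are already indexed.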
-
You can actually let Google know about a new mass of pages through the sitemap. The sitemap is a single file that can be parsed to produce a large list of links.
Google can discover new pages by comparing the list of links with what they know about.
Here's an intro link that covers the sitemap: http://blog.kissmetrics.com/get-google-to-index/
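To make the idea concrete, here is a small Python sketch that builds a sitemap for a batch of new pages and then mimics the "compare the list with what is already known" step as a set difference. The URLs, dates and function names are hypothetical, and the comparison is only a simplified model of what a search engine might do. Note that the sitemap protocol caps a single file at 50,000 URLs, so a site with millions of pages would need several sitemap files plus a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Build sitemap XML from (url, last-modified date) pairs.

    The returned string has no XML declaration; add one when writing a file.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


def undiscovered(sitemap_urls: list[str], known_urls: list[str]) -> list[str]:
    """URLs listed in the sitemap that are not yet known (simplified model)."""
    return sorted(set(sitemap_urls) - set(known_urls))


# Hypothetical new pages published today.
new_pages = [
    ("https://www.example.com/new-page-1", "2024-01-15"),
    ("https://www.example.com/new-page-2", "2024-01-15"),
]
sitemap_xml = build_sitemap(new_pages)
print(sitemap_xml)

already_known = ["https://www.example.com/new-page-1"]
print(undiscovered([loc for loc, _ in new_pages], already_known))
```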