Crawl efficiency - Page indexed after one minute!
-
Hey guys,
Consider a site that has 5+ million pages indexed and gets 300 new pages a day. I hear a lot that for sites at this level, it's all about efficient crawlability. Pages on this site get indexed one minute after they go online.
1) Does this mean the site is already being crawled efficiently and there is not much else to do about it?
2) By increasing crawl efficiency, should I expect Google to crawl my site less (taking less bandwidth from my site for the same amount of crawling), or to crawl my site more often?
Thanks
-
This is a complicated question that I can't give a simple answer to, as every site is set up differently and has its own challenges. You will likely use a combination of the techniques mentioned in the last paragraph of my earlier answer. Good luck.
-
Thanks Anthony,
Your explanation was very helpful.
Let's assume that 3 million of my 5 million pages are not important for Google to crawl or index.
What would be the best way to optimize my crawl efficiency, given that number of pages?
Should I just noindex 3 million pages on the site? I believe that could be a risky move.
Perhaps robots.txt, but that would not de-index the existing pages.
-
Crawl efficiency isn't exactly the same as indexation speed. It is normal for a new page to be indexed quickly; often it is linked to from the blog home page, shared on social networks, etc.
Crawl efficiency has a lot to do with making sure your most important pages are crawled as frequently as possible. Let's use the example of your site with 5,000,000 pages indexed. Perhaps 100,000 of those pages are extremely important for your website: your top categories, all of your products, your content, etc.
Then you are left with 4,900,000 pages that are not that important but are needed for the functionality of your website (pagination, filtering, sorting, etc.). You have to ask yourself: is it a good thing that Google has 5 million pages of your site indexed? Do you want Google regularly crawling those 4,900,000 pages, potentially at the expense of your more important pages?
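One way to see how much crawling actually goes to those functional pages is to classify the URLs Googlebot requested (for example, pulled from your server access logs). Below is a minimal Python sketch; the query parameter names `page`, `sort` and `filter`, the `example.com` URLs and the helper names are all hypothetical, so substitute whatever your own URLs use.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical parameter names for functional pages (pagination, filtering,
# sorting). Replace these with the parameters your own site actually uses.
FUNCTIONAL_PARAMS = {"page", "sort", "filter"}


def is_functional(url: str) -> bool:
    """True if the URL looks like a paginated, filtered, or sorted view."""
    query = parse_qs(urlparse(url).query)
    return any(name in query for name in FUNCTIONAL_PARAMS)


def functional_share(crawled_urls: list[str]) -> float:
    """Fraction of crawled URLs that hit functional pages instead of core pages."""
    if not crawled_urls:
        return 0.0
    hits = sum(1 for url in crawled_urls if is_functional(url))
    return hits / len(crawled_urls)


if __name__ == "__main__":
    # Made-up sample of URLs Googlebot fetched.
    sample = [
        "https://www.example.com/widgets/",
        "https://www.example.com/widgets/?page=7",
        "https://www.example.com/widgets/?sort=price",
        "https://www.example.com/widgets/blue-widget",
    ]
    print(f"{functional_share(sample):.0%} of crawled URLs were functional pages")
```

If a large share of the crawl lands on these URLs, that is a sign the spiders are spending time on pages you consider unimportant.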
Next, you check Google Webmaster Tools and see that Google is crawling about 130,000 pages/day on your site. At that rate, it would take Google 38 days (over an entire month) to crawl your entire site. Of course, it doesn't actually work that way: Google will crawl your site in a logical manner, crawling the pages with high authority (well linked to internally/externally) much more often. The point is that not all of your pages are being crawled every day, and you want your best content crawled as frequently as possible.
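The 38-day figure is simply total pages divided by the daily crawl rate. Here is that arithmetic as a tiny Python sketch, under the same flat-rate simplification the answer warns about (real crawling is weighted toward well-linked pages), comparing the whole site with only the ~100,000 important pages. The function name is just for illustration.

```python
def days_to_crawl(pages: int, pages_per_day: int) -> float:
    """Days needed to fetch `pages` once at a constant `pages_per_day` rate."""
    return pages / pages_per_day


CRAWL_RATE = 130_000  # pages/day, the Webmaster Tools figure from the example

# Whole site: 5,000,000 pages.
print(f"All pages:       {days_to_crawl(5_000_000, CRAWL_RATE):.1f} days")
# Only the ~100,000 important pages.
print(f"Important pages: {days_to_crawl(100_000, CRAWL_RATE):.1f} days")
```

It prints about 38.5 days for all 5,000,000 pages versus about 0.8 days for the 100,000 important ones, which illustrates why steering the crawl toward your best pages matters.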
"To be more blunt, if a page hasn't been crawled recently, it won't rank well." This quote is taken from one of my favorite resources on this topic, a post by AJ Kohn: http://www.blindfiveyearold.com/crawl-optimization
Crawl efficiency means guiding the search spiders to your best content and helping them learn which types of pages they can ignore. You do this primarily through site structure, internal linking, robots.txt, the nofollow attribute, and parameter handling in Google Webmaster Tools.
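To sanity-check robots.txt rules before deploying them, Python's standard-library `urllib.robotparser` can report whether a given URL would be blocked. The rules and URLs below are hypothetical. Two caveats: this parser does plain prefix matching and the first matching rule wins, while Google also supports `*`/`$` wildcards and applies the most specific (longest) matching rule, so wildcard rules are better checked with Google's own tools. And, as noted earlier in the thread, blocking a URL in robots.txt stops it being crawled; it does not remove URLs that are already indexed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /catalog/filter/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching

for url in [
    "https://www.example.com/catalog/widgets",
    "https://www.example.com/catalog/filter/color-blue",
    "https://www.example.com/search?q=widgets",
]:
    verdict = "crawl" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{verdict:8} {url}")
```

With these rules, the category page stays crawlable while the filter and internal search URLs are reported as blocked.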
-
You can actually let Google know about a large batch of new pages through a sitemap. A sitemap is a file that can be parsed to produce a large list of links (each file holds at most 50,000 URLs, so a large site uses several files listed in a sitemap index).
Google can discover new pages by comparing that list of links with what it already knows about.
Here's an intro link that covers the sitemap: http://blog.kissmetrics.com/get-google-to-index/
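Because the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), a 5-million-page site needs at least 100 sitemap files. The Python sketch below uses only `xml.etree.ElementTree` to split a URL list into sitemap documents, parse one back into a set of links, and diff it against a set of already-known URLs, which is roughly the comparison described above. The function names and example URLs are hypothetical, and how Google actually processes sitemaps is not modeled here.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(urls: list[str]) -> list[str]:
    """Split URLs into sitemap XML documents of at most 50,000 <loc> entries each."""
    docs = []
    for start in range(0, len(urls), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element("urlset", xmlns=NS)
        for url in urls[start:start + MAX_URLS_PER_SITEMAP]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        docs.append(ET.tostring(urlset, encoding="unicode"))
    return docs


def parse_sitemap(xml_text: str) -> set[str]:
    """Return the set of <loc> URLs listed in one sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.iter(f"{{{NS}}}loc")}


def undiscovered(sitemap_xml: str, known_urls) -> set[str]:
    """URLs listed in the sitemap that are not in the set of known URLs."""
    return parse_sitemap(sitemap_xml) - set(known_urls)


if __name__ == "__main__":
    new_pages = [f"https://www.example.com/page/{i}" for i in range(3)]
    sitemap = build_sitemaps(new_pages)[0]
    known = {"https://www.example.com/page/0"}
    print(sorted(undiscovered(sitemap, known)))
    # ['https://www.example.com/page/1', 'https://www.example.com/page/2']
```

In practice the resulting files would be listed in a sitemap index and submitted through Webmaster Tools.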