Improving Crawl Efficiency

BeckyKey

Hi

I'm reading about crawl efficiency and have looked at the current crawl rate in WMT (Webmaster Tools), where I'm letting Google optimise it as recommended.

It's set to 0.5 requests every 2 seconds, which works out to 15 URLs per minute.

To me this doesn't sound very good, especially for a site with at least 20,000 pages?

I'm reading up on how to improve this, but if anyone has advice, that would be great.

BeckyKey

Great, thank you for this! I'll take them on board.

Becky

ThompsonPaul

You may be overthinking this, Becky. Once the bot has crawled a page, there's no reason (or benefit to you) for it to crawl the page again unless its content has changed. The usual way for it to detect this is through your XML sitemap. If it's properly coded, it will have a <lastmod> date for Googlebot to reference.

Googlebot does continue to recrawl pages it already knows about "just in case", but your biggest focus should be on ensuring that your most recently added content is crawled quickly after publishing. This is where it helps to make sure your sitemap updates quickly and accurately, that it pings the search engines when it updates, and that you have links from solid existing pages to the new content. If you have blog content, many folks don't know that you can submit the blog's RSS feed as an additional sitemap! That's one of the quickest ways to get it noticed.

The other thing you can do to improve crawl effectiveness is to make certain you're not forcing the crawler to waste its time on superfluous, duplicate, thin, or otherwise useless URLs.
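To make the <lastmod> point concrete, here is a minimal sketch that writes a sitemap with a <lastmod> entry for each URL, using Python's standard-library ElementTree. The URLs and dates are made up; in practice you'd take each page's real last-modified date from your CMS or database. Treat it as an illustration of the sitemap format rather than a drop-in generator.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol (schema 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string for an iterable of (url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # <lastmod> uses W3C Datetime format; a plain YYYY-MM-DD date is valid.
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages: update the date whenever a page's content actually changes.
pages = [
    ("https://www.example.com/", date(2016, 5, 1)),
    ("https://www.example.com/blog/new-post", date(2016, 5, 3)),
]
print(build_sitemap(pages))
```

Only bump a page's <lastmod> when its content really changes; otherwise the date stops being a useful signal.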

Hope that helps?

Paul

seoman10

There are actually several aspects to your question.

1. Google will make its own decision as to how important a page is, and therefore how often it should be crawled.

2. Site speed is a ranking factor.

3. Most SEOs believe that Google has a maximum timeframe in which to crawl each page/site. However, I have seen some chronically slow sites that have still been crawled and indexed.

I forgot to mention that using an XML sitemap can help search engines find pages.

Again, be very careful not to confuse crawling and indexing. Crawling only updates the index; once a page is indexed, if it doesn't rank, you have a different SEO problem, not a technical crawling problem.

Anything a user can access, a crawler should be able to find without a problem; however, if you have hidden pages, the crawler may not find them.
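One way to picture the "hidden pages" point: a crawler that discovers pages only by following links can't reach a page that nothing links to (an orphan page). The sketch below models a hypothetical site as a dictionary of internal links and runs a breadth-first search from the home page; any page it never reaches is a candidate orphan. It's a simplification, since real crawlers also discover URLs through sitemaps and external links.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first search over an internal link graph, following links as a simple crawler would."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages it links to.
site_links = {
    "/": ["/products", "/blog"],
    "/products": ["/products/widget"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/"],
    "/old-landing-page": ["/products"],  # links out, but nothing links to it
}

all_pages = set(site_links) | {t for targets in site_links.values() for t in targets}
orphans = all_pages - reachable_pages(site_links, "/")
print(sorted(orphans))  # ['/old-landing-page']
```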

BeckyKey

Hi

Yes, working on that.

I just read something that said: A “scheduler” directs Googlebot to crawl the URLs in the priority order, under the constraints of the crawl budget. URLs are being added to the list and prioritized.
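The quoted description can be modelled as a priority queue with a fetch budget. This is a toy sketch only, not how Googlebot's scheduler actually works; the priority scores, the budget, and the function name schedule_crawl are all hypothetical. It shows why low-priority URLs can sit uncrawled: once the budget for a cycle is spent, whatever is left waits for a later one.

```python
import heapq


def schedule_crawl(urls_with_priority, budget):
    """Toy scheduler: fetch the highest-priority URLs first, stopping when the budget is used up.

    urls_with_priority: iterable of (url, priority) pairs; higher priority means crawl sooner.
    budget: maximum number of fetches allowed in this crawl cycle.
    """
    # heapq is a min-heap, so store negated priorities to pop the highest first.
    heap = [(-priority, url) for url, priority in urls_with_priority]
    heapq.heapify(heap)
    crawled = []
    while heap and len(crawled) < budget:
        _, url = heapq.heappop(heap)
        crawled.append(url)
    return crawled


# Hypothetical frontier with made-up priority scores.
frontier = [("/new-product", 0.9), ("/home", 0.8), ("/tag/red", 0.1), ("/archive/2009", 0.2)]
print(schedule_crawl(frontier, budget=2))  # ['/new-product', '/home']
```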

So, if you have pages that haven't been crawled/indexed because they're seen as low priority for crawling, how can I improve or change this if need be?

Can I even influence it at all? Can I help crawlers be more efficient at finding/crawling the pages I want to rank, or not?

Does any of this even help SEO?

seoman10

As a general rule, pages will be indexed unless there is a technical issue or a penalty involved.

What you need to be more concerned with is the position of those pages within the index. That obviously comes back to the whole SEO game.

You can use the site: operator followed by a search term that appears on the page you want to check, to make sure the page is indexed, e.g.: site:domain.com "page name"

BeckyKey

OK, thank you. So there must be ways to increase the number of pages Google indexes?

seoman10

You can of course do a fetch and submit through Search Console, but that is designed for one-off changes. Even if you submit pages and send all sorts of signals, Google will still make up its own mind about what it's going to do and when.

If your content isn't changing much, it is probably a disadvantage to have the Google crawler coming back too often, as it will slow the site down. If a page changes regularly, Googlebot will normally gobble it up pretty quickly.
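The idea that frequently changing pages get revisited sooner can be illustrated with a simple multiplicative heuristic: halve the revisit interval when a page has changed since the last fetch, and double it when it hasn't. This is a hypothetical sketch chosen for illustration, not Google's algorithm; the function name next_revisit_hours and the 1-hour / 30-day bounds are assumptions.

```python
def next_revisit_hours(current_hours, page_changed, min_hours=1.0, max_hours=24.0 * 30):
    """Toy revisit heuristic: come back sooner to pages that change, less often to pages that don't.

    Halves the interval if the page changed since the last fetch, doubles it otherwise,
    and clamps the result to [min_hours, max_hours].
    """
    proposed = current_hours / 2 if page_changed else current_hours * 2
    return max(min_hours, min(max_hours, proposed))


# A page checked daily that is unchanged twice, then changes once.
interval = 24.0
for changed in [False, False, True]:
    interval = next_revisit_hours(interval, changed)
print(interval)  # 48.0
```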

If it were me, I would let it make its own decision, unless it is causing you a problem.

Also keep in mind that crawling and indexing are two separate kettles of fish: the Google crawler will crawl every site and every page that it can find, but it doesn't necessarily index them.

BeckyKey

Hi, yes, it's the default.

I know we can't figure out exactly what Google is doing, but we can improve crawl efficiency.

If those pages haven't been crawled for weeks, isn't there a way to improve this? How did you find out they haven't been crawled for weeks?

seoman10

P.S. I think the crawl rate setting you are referring to is the Google default that shows if you move the radio button to manual.

seoman10

Google is very clever at working out how often it needs to crawl your site: pages that get updated more often will get crawled more often. There is no way of influencing exactly what Googlebot does; mostly it will make its own decisions.

If you are talking about other web crawlers, you may need to put guidelines in place in robots.txt or in the settings of the specific control panel.
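For crawlers that honour it, robots.txt is where those guidelines usually go. The sketch below uses Python's standard-library urllib.robotparser to check how a made-up robots.txt applies to two user agents; ExampleBot and the example.com URLs are hypothetical. Note that Crawl-delay is a non-standard directive: some crawlers respect it, but Googlebot ignores it, so it won't change Google's crawl rate.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a made-up "ExampleBot" is slowed down and kept out of /archive/,
# while every other crawler is only kept out of internal search results.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 10
Disallow: /archive/

User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("ExampleBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://www.example.com/archive/2009")
    print(agent, "may fetch /archive/2009:", allowed, "| crawl delay:", parser.crawl_delay(agent))
```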

20,000 pages isn't a problem for Google! Yes, it may take time. You say it is crawling at '0.5 requests every 2 seconds'; if I've got my calculation right, in theory Google will have crawled 20,000 URLs in less than a day!
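Checking that calculation with the figures quoted in the thread, and assuming (unrealistically) that every request fetches a new, distinct URL with no recrawls, redirects, or errors:

```python
# Crawl rate as quoted in the thread: 0.5 requests every 2 seconds.
requests_per_second = 0.5 / 2                 # 0.25 requests per second
urls_per_minute = requests_per_second * 60    # 15 URLs per minute
urls_per_day = urls_per_minute * 60 * 24      # 21,600 URLs per day

site_size = 20_000
hours_for_full_pass = site_size / urls_per_minute / 60
print(f"{urls_per_day:.0f} URLs/day; {hours_for_full_pass:.1f} h to fetch {site_size} URLs once")
# 21600 URLs/day; 22.2 h to fetch 20000 URLs once
```

So at 15 URLs a minute, a single pass over 20,000 URLs takes roughly 22 hours, which does come in under a day.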

On my site, I have a page that I updated about 2 hours ago, and the change has already been picked up by Google, yet I know for a fact that other pages haven't been crawled for weeks.
