Improving Crawl Efficieny
-
Hi
I'm reading about crawl efficiency & have looked in WMT at the current crawl rate - letting Google optimise this as recommended.
What it's set to is 0.5 requests every 2 seconds, which is 15 URLs every minute.
To me this doesn't sound very good, especially for a site with over 20,000 pages at least?
I'm reading about improving this but if anyone has advice that would be great
-
Great thank you for this! I'll take them on board
Becky
-
You may be overthinking this, Becky. Once the bot has crawled a page, there's no reason (or benefit to you) for it to crawl the page again unless its content has changed. The usual way for it to detect this is through your xml sitemap,. If it's properly coded, it will have a <lastmod>date for Googlebot to reference.
Googlebot does continue to recrawl pages it already knows about "just in case", but your biggest focus should be on ensuring that your most recently added content is crawled quickly upon publishing. This is where making sure your sitemap is updating quickly and accurately, making sure it is pinging search engines on update, and making sure you have links from solid existing pages to the new content will help. If you have blog content many folks don't know that you can submit the blog's RSS feed as an additional sitemap! That's one of the quickest ways to get it noticed.
The other thing you can do to assist the crawling effectiveness is to make certain you're not forcing the crawler to waste its time crawling superfluous, duplicate, thin, or otherwise useless URLs.</lastmod>
Hope that helps?
Paul
-
There are actually several aspects to your question.
1. Google will make its own decision as to how important pages and therefore how often it should be crawled
2. Site speed is a ranking factor
3. Most SEO's belief that Google has a maximum timeframe in which to crawl each page/site. However, I have seen some chronically slow sites which have still crawl and indexed.
I forgot to mention about using an xml site map can help search engines find pages.
Again, be very careful not to confuse crawling and indexing. Crawling is only updating the index, once indexed if it doesn't rank you have another SEO problem, not a technical crawling problem.
Any think a user can access a crawler should be able to find it no problem, however if you have hidden pages the crawler may not find them.
-
Hi
Yes working on that
I just read something which said - A “scheduler” directs Googlebot to crawl the URLs in the priority order, under the constraints of the crawl budget. URLs are being added to the list and prioritized.
So, if you have pages which havent been crawled/indexed as they're seen as a low priority for crawling - how can I improve or change this if need be?
Can I even impact it at all? Can I help crawlers be more efficient at finding/crawling pages I want to rank or not?
Does any of this even help SEO?
-
As a general rule pages will be indexed unless there is a technical issue or a penalty involved.
What you need to be more concerned with is the position of those pages within the index. That obviously comes back to the whole SEO game.
You can use the site parameter followed by a search term that is present on the page you want to check to make sure the pages indexed, like: site:domain.com "page name"
-
Ok thank you, so there must be ways to improve on the number of pages Google indexes?
-
You can obviously do a fetch and submit through search console, but that is designed for one-off changes. Even if you submit pages and make all sorts of signals Google will still make up its own mind what it's going to do and when.
If your content isn't changing much it is probably a disadvantage to have the Google crawler coming back too often as it will slow the site down. If a page is changing regularly the Google bot will normally gobble it pretty quick.
If it was me I would let you let it make its own decision, unless it is causing your problem.
Also keep in mind that crawl and index are two separate kettles of fish, Google crawler will crawl every site and every page that it can find, but doesn't necessarily index.
-
Hi - yes it's the default.
I know we can't figure out exactly what Google is doing, but we can improve crawl efficiency.
If those pages aren't being crawled for weeks, isnt there a way to improve this? How have you found out they haven't been crawled for weeks?
-
P.S. I think the crawl rate setting you are referring to is the Google default if you move the radio button to manual
-
Google is very clever working out how often it needs to crawl your site, pages that get updated more often will get crawled more often. There is no way of influencing exactly what the Google bot does, mostly it will make its own decisions.
If you are talking about other web crawlers, you may need to put guidelines in place in terms of robots.txt or settings on the specific control panel.
20,000 pages to Google isn't a problem! Yes, it may take time. You say it is crawling at '0.5 requests every 2 seconds' - if I've got my calculation right in theory Google will have crawled 20,000 URLs in less than a day!
On my site I have a page which I updated about 2 hours ago, and the change has already replicated to Google, and yet other pages I know for a fact haven't been crawled for weeks.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
When I crawl my website I have urls with (#!162738372878) at the end of my urls
When I crawl my website I have urls with (#!162738372878) at the end of my urls. I used screaming frog to look check my website and I seen these. My normal urls are in there too, but each of them have a copy with this strange symbol and number at the end. I used a website builder called homestead to make the website and I seen a bunch of there urls in my crawl as well - http://editor.homestead.com/faq is an example I recently created a new website with their new website builder and transferred it to my old domain. However, I didnt know they didnt offer 301 redirects or canonical tags(learned about those afterwards) and I changed my page names. So they recommended I leave the old website published along with the new website. So if I search my website name on google, sometimes both will show in the results. I just want to sort this all out somehow. My website is www.coastlinetvinstalls.com Any feedback is greatly appreciated. Thanks, Matt
Intermediate & Advanced SEO | | Matt160 -
Google crawling 200 page site thousands of times/day. Why?
Hello all, I'm looking at something a bit wonky for one of the websites I manage. It's similar enough to other websites I manage (built on a template) that I'm surprised to see this issue occurring. The xml sitemap submitted shows Google there are 229 pages on the site. Starting in the beginning of December Google really ramped up their intensity in crawling the site. At its high point Google crawled 13,359 pages in a single day. I mentioned I manage other similar sites - this is a very unusual spike. There are no resources like infinite scroll that auto generates content and would cause Google some grief. So follow up questions to my "why?" is "how is this affecting my SEO efforts?" and "what do I do about it?". I've never encountered this before, but I think limiting my crawl budget would be treating the symptom instead of finding the cure. Any advice is appreciated. Thanks! *edited for grammar.
Intermediate & Advanced SEO | | brettmandoes0 -
Do CTR manipulation services actually work to improve rankings?
I've seen a variety of services on the fringe of the SEO world that send a flow of (fake) traffic to your website via Google, to drive up your SERP CTR and site engagement. Seems gray hat, but I'm curious as to whether it actually works. The latest data I've seen from trustworthy sources (example and example 2) seems mixed on whether CTR has a direct impact on search rankings. Google claims it doesn't. I think it's possible it directly impacts rankings, or its possible Google is using some other metric to reward high engagement pages and CTR correlates with that. Any insight on whether CTR manipulation services actually work?
Intermediate & Advanced SEO | | AdamThompson1 -
Of the two examples of markup (microdata, schema) code below, which of the two is better designed for its purpose of Q&A, and what might be suggested to improve upon these lines of code (context: questions and answers within article content.
ANSWER SEEN 'WITHIN THE QUESTION' BRACKET So you ask, why is the sky blue?
Intermediate & Advanced SEO | | RedFrog
Well, the answer is not so simple; In the day-time, when it's clear and cloudless,
the sky is blue because molecules in the air scatter blue light from the sun more than they scatter red light.
When we look towards the sun at sunset, we see red and orange colours because the blue light has been scattered out and away from the line of sight. See Structured Data Testing Results 'QUESTION' AND 'ANSWER' IN 2 SEPARATE BRACKETS Why Is The Sky Blue? Well, the answer is not so simple; In the day-time, when it's clear and cloudless,
the sky is blue because molecules in the air scatter blue light from the sun more than they scatter red light.
When we look towards the sun at sunset, we see red and orange colours because the blue light has been scattered out and away from the line of sight. See Structured Data Testing Results Thanks, Mark0 -
How I can improve my website On page and Off page
My Website is guitarcontrol.com, I have very strong competition in market. Please advice me the list of improvements on my websites. In regarding ON page, Linkbuiding and Social media. What I can do to improve my website ranking?
Intermediate & Advanced SEO | | zoe.wilson170 -
Magento E-Commerce Crawl Issues
Hi Guys, First post here! I am responsible for a Magento e-commerce store and there are a few crawl issues and potential solutions that I am working and would like to get some advice to see if you agree with my approach. Old Product Pages - The majority of our stock is seasonal, therefore when a product sells out, it is not usually going to come back into stock. However the approach for Magento websites is to leave the page present but take the product off the category pages, so users can still find these pages from the search engines and they are orphaned pages as not linked to from elsewhere and not totally clear products are out of stock (just doesn't show the size pulldown or 'Add to Basket' button). There is no process in place to 301 redirect these pages either. My solution to this problem is to: 1. Change design of these pages so a clear message is shown to users that the product is out of stock and suggest related products to reduce bounce rates. I was also planning on having a link from an 'Out of Stock' page on the site to these products so they are orphaned but is this required do you think? 2. When I know for sure (e.g. over a month) that the product will not be returned (e.g. refund) by the user, then 301 redirect the product pages back to category page. How do other users 301 redirect their pages in Magento, I would like an easy to use system. Crawl Errors Identified in Google Webmaster Tools It seems in the last 2 weeks there has been a sharp increase in the number of soft 404 pages identified on the website. When I inspect these pages they seem to be categories and sub categories that no longer have any products in them. However, I don't want to delete these pages as new products might come in and go onto these category pages, therefore how should I approach this? A suggestion I have thought of is to put related products on to these pages? Any better ideas? Thanks, Graeme
Intermediate & Advanced SEO | | graeme19940 -
Crawl diagnostic how important is these 2 types of errors and what to do?
Hi,
Intermediate & Advanced SEO | | nicolaj1977
I am trying to SEO optimized my webpage dreamesatehuahin.com When I saw SEO Moz webpage crawl diagnostic I kind of got a big surprise due to the high no. of errors. I don’t know if this is the kind of errors that need to be taken very serious i my paticular case, When I am looking at the details I can see the errors are cause by the way my wordpress theme is put together. I don’t know how to resolve this. But If important I might hire a programmer. DUPLICATE ERRORS (40 ISSUES HIGH PRIORITY ACCORDING TO MOZ)
They are all the same as this one.
http://www.dreamestatehuahin.com/property-feature/restaurent/page/2/
is eaqual to this one
http://www.dreamestatehuahin.com/property-feature/restaurent/page/2/?view=list This one exsist
http://www.dreamestatehuahin.com/property-feature/car-park/
while a level down don’t exsit
http://www.dreamestatehuahin.com/property-feature/ DUPLICATE PAGE TITLE (806 ISSUES MEDIUM PRIORITY ACCORDING TO MOZ)
This is related to search results and pagination.
Etc. Title for each of these pages is the same
http://www.dreamestatehuahin.com/property-search/page/1 http://www.dreamestatehuahin.com/property-search/page/2 http://www.dreamestatehuahin.com/property-search/page/3 http://www.dreamestatehuahin.com/property-search/page/4 Title element is to long (405)
http://www.dreamestatehuahin.com/property-feature/fitness/?view=list
this is not what I consider real pages but maybe its actually is a page for google. The title from souce code is auto generated and in this case it not makes sense
<title>Fitness Archives - Dream Estate Hua Hin | Property For Sale And RentDream Estate Hua Hin | Property For Sale And Rent</title> I know at the moment there are properly more important things for our website like content, title, meta descriptions, intern and extern links and are looking into this and taking the whole optimization seriously. Have for instance just hired a content writer rewrite and create new content based on keywords research. I WOULD REALLY APPRICIATE SOME EXPERIENCE PEOPLE FEEDBACK ON HOW IMPORTANT IS IT THAT I FIX THIS ISSUES IF AT ALL POSSIBLE? best regards, Nicolaj1 -
Page 1 Reached, Further Page Improvements and What Next ?
Moz, I have a particularly tricky competitive keyword that i have finally climbed our website to the 10th position of page 1, i am particularly pleased about this as all of the website and content is German which i have little understanding of and i have little support on this, I am pleased with the content and layout of the page and i am monitoring all Google Analytics values very closely, as well as the SERP positions, So as far as further progression with this page and hopefully climbing further up page 1, where do you think i should focus my efforts ? Page Speed optimization?, Building links to this page ?, blogging on this topic (with links) , Mobile responsive design (More difficult), further improvements to pages and content linked from this page ? Further improvements to the website in general?,further effort on tracking visitors and user experience monitoring (Like setting up Crazyegg or something?) Any other ideas would be greatly appreciated, Thanks all, James
Intermediate & Advanced SEO | | Antony_Towle0