What To Do About Yahoo Slurp Bot Bogging My Site Down?
-
Hello,
Our IT department has informed me that they have seen extremely heavy traffic from the Yahoo Slurp bot in recent days. They are claiming this bot has single-handedly caused one of our servers to crash.
I am a bit skeptical of this, as I have not found these particular legitimate search engine bots to be aggressive resource hogs, especially for an enterprise-level web server.
I have requested to examine the server logs myself, but have not had success with this. IT is requesting to block this particular bot, but I am apprehensive about doing this, as I don't want this to have any negative implications on our site showing in Yahoo News or other Yahoo properties.
Does anyone else have experience with this bot being an overly-zealous resource drag, and if so, what is the best course of action to satisfy all parties?
-
Examining the server logs yourself probably wont help your understanding of the issue unless you know what your looking at specifically. On the Yahoo note, i have found Slurp to be really bad in the past, but no legitimate bot should be able to bring down a properly configured web server, especially an 'enterprise-level' one.
I would check your .htaccess and apache settings for bad redirects (or web.conf if on windows) before considering banning the bot. Other things to check would be website code or if a bot hits a massive and horribly optimised Database Query for example, that could bring the server down.
Ask IT exactly what the bot did that caused the server to go down, they should atleast be able to tell you that. If not then they need to run load tests against the website itself to try and reproduce the scenario and thus debug the issue, if indeed there is one.
Tl;dr :- Normally bad config or code / queries are to blame for this kind of thing. I'd review that before blocking a bot that crawls hundreds of thousands of other sites without issue.
-
You should be able to can control the rate at which the bot accesses you pages by adding a crawl delay in your robots.txt file. Robots.txt and crawl delay is discussed here: http://en.wikipedia.org/wiki/Robots_exclusion_standard, and Slurp bot here: https://help.yahoo.com/kb/SLN22600.html.
Should look like this in your robots.txt file:
User-agent: Slurp
Crawl-delay: 30
The crawl delay is the number of seconds the bot should wait between pageview (ask your IT guys what's appropriate for you). I stuck 30 in there, meaning the Slurp bot would only be able to access up to 2 pages a minute.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
On-site Search - Revisited (again, *zZz*)
Howdy Moz fans! Okay so there's a mountain of information out there on the webernet about internal search results... but i'm finding some contradiction and a lot of pre-2014 stuff. Id like to hear some 2016 opinion and specifically around a couple of thoughts of my own, as well as some i've deduced from other sources. For clarity, I work on a large retail site with over 4 million products (product pages), and my predicament is thus - I want Google to be able to find and rank my product pages. Yes, I can link to a number of the best ones by creating well planned links via categorisation, silos, efficient menus etc (done), but can I utilise site search for this purpose? It was my understanding that Google bots don't/can't/won't use a search function... how could it? It's like expeciting it to find your members only area, it can't login! How can it find and index the millions of combinations of search results without typing in "XXXXL underpants" and all the other search combinations? Do I really need to robots.txt my search query parameter? How/why/when would googlebot generate that query parameter? Site Search is B.A.D - I read this everywhere I go, but is it really? I've read - "It eats up all your search quota", "search results have no content and are classed as spam", "results pages have no value" I want to find a positive SEO output to having a search function on my website, not just try and stifle Mr Googlebot. What I am trying to learn here is what the options are, and what are their outcomes? So far I have - _Robots.txt - _Remove the search pages from Google _No Index - _Allow the crawl but don't index the search pages. _No Follow - _I'm not sure this is even a valid idea, but I picked it up somewhere out there. _Just leave it alone - _Some of your search results might get ranked and bring traffic in. It appears that each and every option has it's positive and negative connotations. It'd be great to hear from this here community on their experiences in this practice.
Intermediate & Advanced SEO | | Mark_Elton0 -
Is my site being penalized?
I've gone through all the points on https://moz.com/blog/technical-site-audit-for-2015 but the site only ranks for its brand name after months. The website is not ranking in the top 100 for any main keywords (2,3,4 word phrases), only for a handful of very long phrases (4+). All of the content is unique, all pages are indexed, the website is fast and doesn't contain any crawl errors and there are a couple of links pointing to it. There is a sitewide follow link in the footer pointing to another domain, its parent company and vice-versa. This is not done for any SEO reasons but the companies are related and also the products are supplementary of each other. Could this be an issue? Or is my site being penalized by something else?
Intermediate & Advanced SEO | | Robbern0 -
Site been plagiarised - duplicate content
Hi, I look after two websites, one sells commercial mortgages the other sells residential mortgages. We recently redesigned both sites, and one was moved to a new domain name as we rebranded it from being a trading style of the other brand to being a brand in its own right. I have recently discovered that one of my most important pages on the residential mortgages site is not in Google's index. I did a bit of poking around with Copyscape and found another broker has copied our page almost word-for-word. I then used copyscape to find all the other instances of plagiarism on the other broker's site and there are a few! It now looks like they have copied pages from our commercial mortgages site as well. I think the reason our page has been removed from the index is that we relaunced both these sites with new navigation and consequently new urls. Can anyone back me up on this theory? I am 100% sure that our page is the original version because we write everything in-house and I check it with copyscape before it gets published, Also the fact that this other broker has copied from several different sites corroborates this view. Our legal team has written two letters (not sent yet) - one to the broker and the other to the broker's web designer. These letters ask the recipient to remove the copied content within 14 days. If they do remove our content from our site, how do I get Google to reindex our pages, given that Google thinks OUR pages are the copied ones and not the other way around? Does anyone have any experience with this? Or, will it just happen automatically? I have no experience of this scenario! In the past, where I've found duplicate content like this, I've just rewritten the page, and chalked it up to experience but I don't really want to in this case because, frankly, the copy on these pages is really good! And, I don't think it's fair that someone else could potentially be getting customers that were persuaded by OUR copy. Any advice would be greatly appreciated. Thanks, Amelia
Intermediate & Advanced SEO | | CommT0 -
Why is this site not indexed by Google?
Hi all and thanks for your help in advance. I've been asked to take a look at a site, http://www.yourdairygold.ie as it currently does not appear for its brand name, Your Dairygold on Google Ireland even though it's been live for a few months now. I've checked all the usual issues such as robots.txt (doesn't have one) and the robots meta tag (doesn't have them). The even stranger thing is that the site does rank on Yahoo! and Bing. Google Webmaster Tools shows that Googlebot is crawling around 150 pages a day but the total number of pages indexed is zero. It does appear if you carry out a site: search on Google however. The site is very poorly optimised in terms of title tags, unnecessary redirects etc which I'm working on now but I wondered if you guys had any further insights. Thanks again for your help.
Intermediate & Advanced SEO | | iProspect-Ireland0 -
Baffled why my site is not improving in rankings.
Site shows up in the Map results when ever Google shows them. But for all other organic terms site ranks way back. Have lots of unique content and one page grade of an A. The site is http://alexpadillabailbonds.com The main page is optimized for "sacramento bail bonds" with a Moz grade of A yet its not included in the search results. It was before. Any insight from any one will greatly help. Thanks.
Intermediate & Advanced SEO | | andreyzolnikov0 -
Removing A Blog From Site...
Hi Everyone, One of my clients I am doing marketing consulting for is a big law firm. For the past 3 years they have been paying someone to write blog posts everyday in hopes of improving search traffic to site. The blog did indeed increase traffic to the site, but analyzing the stats, the firm generates no leads (via form or phone) from any of the search traffic that lands in the blog. Furthermore, I'm seeing Google send many search queries that people use to get to the site to blog pages, when it would be much more beneficial to have that traffic go to the main part of the website. In short, the law firm's blog provides little to no value to end users and was written entirely for SEO purposes. Now the law firm's website has 6,000 unique pages, and only 400 pages of the site are NON-blog pages (the good stuff, essentially). About 35% of the site's total site traffic lands on the blog pages from search, but again... this traffic does not convert, has very high bounce rate and I doubt there is any branding benefit either. With all that said, I didn't know if it would be best to delete the blog, redirect blog pages to some other page on the site, etc? The law firm has ceased writing new blog posts upon my recommendation, as well. I am afraid of doing something ill-advised with the blog since it accounts now for 95% of the pages of the website. But again, it's useless drivel in my eyes that adds no value and was simply a misguided SEO effort from another marketer that heard blogs are good for SEO. I would certainly appreciate any guidance or advice on how best to handle this situation. Thank you for your kind help!
Intermediate & Advanced SEO | | gbkevin0 -
SEO for bigcommerce site
I have a site on bigcommerce platform .from Where do i need start SEO for these types of ecommerce sites.Looking for Experts ideas . Thank you.
Intermediate & Advanced SEO | | innofidelity0 -
Badges For a B2b site
love this seo tactic but it seems hard to get people to adopt it Has anyone seen a successful badge campaign for a b2b site? please provide examples if you can.
Intermediate & Advanced SEO | | DavidKonigsberg0