Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
May know what's the meaning of these parameters in .htaccess?
-
Begin HackRepair.com Blacklist
RewriteEngine on
Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bolt\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CazoodleBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Default\ Browser\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} discobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ecxi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GT::WWW [NC,OR]
RewriteCond %{HTTP_USER_AGENT} heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTP::Lite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IDBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} id-search.org [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IRLbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc\ Search\ 2.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LinksManager.com_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} linkwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Maxthon$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MFC_Tear_Sample [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^microsoft.url [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft\ URL\ Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} panscient.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PECL::HTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PeoplePal [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PleaseCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Rippers\ 0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SBIder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SeaMonkey$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snoopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Steeler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Toata\ dragostea\ mea\ pentru\ diavola [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URI::Fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webalta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCollage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wells\ Search\ II [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW-Mechanize [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zermelo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(.)Zeus.Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ZyBorg [NC]
RewriteRule ^. - [F,L]Abuse bot blocking rule end
End HackRepair.com Blacklist
-
Now it's clear. Thanks a lot ThompsonPaul!

-
Thanks!

Typically these blacklists are created and maintained by security specialists who have done testing on the different bots to determine which are legit/beneficial and which are crapbots. They then provide these lists for others to use. Often the lists are amalgamations of bots detected and analysed on a number of different sites and by a number of different specialists to act as a double-check for each other.
You do need to be careful that you are using a well-curated list, as carelessly blocking bots can cause problems for legitimate bots. You would check out the creator of such a list the same way you'd check out the creator of a plugin you're considering using - check reviews, look at comments and responses on the post that provides the blacklist etc.
That answer your question?
Paul
-
Hi ThompsonPaul,
Wow! Superb explanation. One thing I just want to clarify, how would I know if these bots are "bad bots".
Thanks a lot!

-
As Lynn mentions, these entries form a blacklist for "bad bots". These are bots that are identified as being harmful (or at least non-helpful) to the real use of a website. Bots are essentially spiders that crawl and record the pages of your site the same way the GoogleBot does.There are 2 main reasons for blocking them
-
Too many unnecessary bots can put a real strain on server resources, causing the site to slow down for real users. This can be especially problematic with bad bots as they do not respect the entries in your robots.txt file and so will crawl even blocked pages. This can mean huge numbers of extra pages get crawled, leading to even more load.
-
Many (most?) of these bots are collecting data for nefarious purposes. Some are scrapers to collect your site content in order to re-use it illegally on another site, some are scanning for certain files/plugins on your site known to be insecure so they can target them for attack, etc.
Best case scenario, these bots waste your bandwidth and can cause site slowdowns on low-powered (e.g. shared) servers. Worst case, they can actually cause harm to your site.
There are literally many thousands of these types of bots out there, and their creators often change their identifying user agents just to get around these types of blacklists. But many have been around for some time and still use the same identifier. So having a blacklist to block the most common of them is actually very good security practice. To be totally proactive however, you'd need to update the list every couple of months.
Bottom line - those entries are providing some security and overload protection for your site, and there's essentially no downside to having them in place even if they're not catching everything.
Hope that helps - if any of my explanation isn't clear, just holler

Paul
-
-
Thanks Lynn! I'll just remove these parameters and leave this one:
BEGIN WordPress
<ifmodule mod_rewrite.c="">RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
Rewritecond %{http_host} ^domain.com [NC]
Rewriterule ^(.*)$ http://www.domain.com/$1 [R=301,NC]</ifmodule>END WordPress
-
I dont use something like this myself. I suppose if you are having some problem with bots it might be useful, maybe someone else can chime in if they have some experience with this kind of blocking.
-
Thanks Lynn! Is this really necessary?
-
HI,
It is checking to see if the visiting user agent contains any of these strings (NC is telling it non case sensitive) and if it does to return a 403 forbidden message.
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
How does educational organization schema interact with Google's knowledge graph?
Hi there! I was just wondering if the granular options of the Organization schema, like Educational Organization (http://schema.org/EducationalOrganization) and CollegeOrUniversity (http://schema.org/CollegeOrUniversity) schema work the same when it comes to pulling data into the knowledge graph. I've typically always used the Organization schema for customers but was wondering if there are any drawbacks for going deep into the hierarchy of schema. Cheers 😄
Intermediate & Advanced SEO | | Corbec8880 -
Should I redirect a domain we control but which has been labeled 'toxic' or just shut it down?
Hi Mozzers: We recently launched a site for a client which involved bringing in and redirecting content which formerly had been hosted on different domains. One of these domains still existed and we have yet to bring over the content from it. It has also been flagged as a suspicious/toxic backlink source to our new domain. Would I be wise to redirect this old domain or should I just shut it down? None of the pages seem to have particular equity as link sources. Part of me is asking myself 'Why would we redirect a domain deemed toxic, why not just shut it down.' Thanks in advance, dave
Intermediate & Advanced SEO | | Daaveey0 -
How do I know if I am correctly solving an uppercase url issue that may be affecting Googlebot?
We have a large e-commerce site (10k+ SKUs). https://www.flagandbanner.com. As I have begun analyzing how to improve it I have discovered that we have thousands of urls that have uppercase characters. For instance: https://www.flagandbanner.com/Products/patriotic-paper-lanterns-string-lights.asp. This is inconsistently applied throughout the site. I directed our website vendor to fix the issue and they placed 301 redirects via a rule to the web.config file. Any url that contains an uppercase character now displays as a lowercase. However, as I use screaming frog to monitor our site, I see all these 301 redirects--thousands of them. The XML sitemap still shows the the uppercase versions. We have had indexing issues as well. So I'm wondering what is the most effective way to make sure that I'm not placing an extra burden on Googlebot when they index our site? Should I have just not cared about the uppercase issue and let it alone?
Intermediate & Advanced SEO | | webrocket0 -
Is Chamber of Commerce membership a "paid" link, breaking Google's rules?
Hi guys, This drives me nuts. I hear all the time that any time value is exchanged for a link that it technically violates Google's guidelines. What about real organizations, chambers of commerce, trade groups, etc. that you are a part of that have online directories with DO-follow links. On one hand people will say these are great links with real value outside of search and great for local SEO..and on the other hand some hardliners are saying that these technically should be no-follow. Thoughts???
Intermediate & Advanced SEO | | RickyShockley0 -
What happens to a domain in SERPs when it's set to redirect to another?
We have just acquired a competing website and are wondering whether to leave it running as is for now, or set the domain to redirect to our own site. If we set up this redirect, what would happen to the old site in Google SERPs? Would the site drop off from results? If so, would we capture this new search traffic or is it a free for all and all sites compete for the search traffic as normal? Thanks in advance. Paul
Intermediate & Advanced SEO | | kevinliao0 -
Using the same content on different TLD's
HI Everyone, We have clients for whom we are going to work with in different countries but sometimes with the same language. For example we might have a client in a competitive niche working in Germany, Austria and Switzerland (Swiss German) ie we're going to potentially rewrite our website three times in German, We're thinking of using Google's href lang tags and use pretty much the same content - is this a safe option, has anyone actually tries this successfully or otherwise? All answers appreciated. Cheers, Mel.
Intermediate & Advanced SEO | | dancape1 -
Soft 404's from pages blocked by robots.txt -- cause for concern?
We're seeing soft 404 errors appear in our google webmaster tools section on pages that are blocked by robots.txt (our search result pages). Should we be concerned? Is there anything we can do about this?
Intermediate & Advanced SEO | | nicole.healthline4 -
Is there any negative SEO effect of having comma's in URL's?
Hello, I have a client who has a large ecommerce website. Some category names have been created with comma's in - which has meant that their software has automatically generated URL's with comma's in for every page that comes beneath the category in the site hierarchy. eg. 1 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/ eg. 2 : http://shop.deliaonline.com/store/music,-dvd-and-games/dvds-and-blu_rays/action-and-adventure/ etc... I know that URL's with comma's in look a bit ugly! But is there 'any' SEO reason why URL's with comma's in are any less effective? Kind Regs, RB
Intermediate & Advanced SEO | | RichBestSEO0