Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc. so that it is read as a literal period.
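To illustrate the escaping point, here is a minimal sketch in Python. Python's `re` module is used only for demonstration (Analytics evaluates the filter pattern itself), and the name `stumbleupon incx` is made up to show the difference.

```python
import re

# Unescaped: "." is a wildcard, so "inc." also matches "incx".
loose = re.compile(r"^(stumbleupon inc.)$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"^(stumbleupon inc\.)$")

# The second name is hypothetical, used only to show the wildcard effect.
names = ["stumbleupon inc.", "stumbleupon incx"]

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print("unescaped matches:", loose_hits)
print("escaped matches:  ", strict_hits)
```

A name with no period in it, such as rackspace cloud servers, has nothing to escape.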
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
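
If you want to sanity-check the pattern before saving the filter, a rough sketch like the one below can help. It assumes the pattern is written with the periods escaped, uses Python's `re` module as a stand-in for the Analytics filter, and assumes matching is not case-sensitive. The helper `is_filtered` and every test name except rackspace cloud servers are hypothetical.

```python
import re

# The filter pattern with rackspace added and literal periods escaped.
PATTERN = (
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

# Assumption: the filter is applied without case sensitivity.
bot_filter = re.compile(PATTERN, re.IGNORECASE)


def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name would be caught by the pattern."""
    return bot_filter.search(service_provider) is not None


# Only the first name comes from the thread; the rest are hypothetical.
results = {
    name: is_filtered(name)
    for name in [
        "rackspace cloud servers",
        "rackspace cloud servers inc",
        "gomez networks",
        "comcast cable communications",
    ]
}

for name, hit in results.items():
    print(f"{name!r}: {'filtered' if hit else 'kept'}")
```

The names inside ^( ... )$ have to match the whole service provider value, which is why the exact spelling matters, while gomez sits outside the anchors and matches anywhere in the name.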
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
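As a rough sketch of what combining factors could look like, the Python below flags a service provider only when several signals line up, instead of relying on time on page alone. The thresholds, the `ProviderStats` and `looks_like_bot` names, and every row except the rackspace figures quoted in this thread are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    service_provider: str
    sessions: int
    pages_per_session: float
    avg_time_on_page_s: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(row: ProviderStats, min_sessions: int = 100) -> bool:
    """Flag a provider only when volume, depth, time, and bounce all look automated."""
    return (
        row.sessions >= min_sessions      # hypothetical volume threshold
        and row.pages_per_session <= 1.0  # never goes past the landing page
        and row.avg_time_on_page_s == 0   # not enough on its own: real visits can record 0
        and row.bounce_rate >= 1.0        # every session bounces
    )


rows = [
    # Figures quoted in this thread.
    ProviderStats("rackspace cloud servers", 848, 1.0, 0.0, 1.0),
    # Hypothetical rows for contrast.
    ProviderStats("small example isp", 3, 1.0, 0.0, 1.0),
    ProviderStats("large example isp", 500, 3.2, 45.0, 0.4),
]

flagged = [r.service_provider for r in rows if looks_like_bot(r)]
print(flagged)
```

The small hypothetical provider also shows 0 seconds and a 100% bounce rate, but with only a handful of sessions it is not flagged, which is the reason for not leaning on time on page by itself.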
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them. Otherwise they are read as RegEx operators (an unescaped . matches any character) and the filter may not match what you intend.
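One way to avoid escaping by hand is to build the pattern from a plain list of names. This is a minimal sketch in Python, assuming the names are copied exactly as they appear in the service provider column. The helpers `escape_name` and `build_filter` are hypothetical, not part of Analytics, and the output is just a string to paste into the filter field.

```python
import re

# Characters escaped with a backslash, including . and / as mentioned above.
SPECIAL = set(r".^$*+?()[]{}|\/")


def escape_name(name: str) -> str:
    """Put a backslash before every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def build_filter(exact_names, partial_names=()):
    """Build ^(name1|name2)$|partial from exact-match and partial-match names."""
    parts = []
    if exact_names:
        parts.append("^(" + "|".join(escape_name(n) for n in exact_names) + ")$")
    parts.extend(escape_name(n) for n in partial_names)
    return "|".join(parts)


pattern = build_filter(
    exact_names=["google inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)
print(pattern)
```

Names in the first list must match the whole value, and names in the second list match anywhere in it, mirroring how gomez sits outside the parentheses in the pattern above.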
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris