Regular Expressions for Filtering BOT Traffic?

AWCthreads

I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they appear to eliminate most of the bots.

However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.

How do I determine what the regular expression is for additional bots so I can apply them to the filter?

I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.

Chris_CM

No problem, feel free to reach out if you have any other RegEx related questions.

Regards,

Chris

AWCthreads

I will definitely do that for Rackspace bots, Chris.

Thank you for taking the time to walk me through this and tweak my filter.

I'll give the site you posted a visit.

Chris_CM

If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.

AWCthreads

Crap.

Well, I guess the vernacular is what I need to know.

Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?

I could really see myself botching this filtering business.

Chris_CM

Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
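To see why the backslash before the period matters, here is a minimal Python sketch. It is only a way to test patterns locally, not the Analytics filter itself, and the sample strings and the `matches` helper are made up for illustration.

```python
import re

# Unescaped: "." matches any single character.
loose = re.compile(r"stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"stumbleupon inc\.$")
# A name with no special characters needs no backslash at all.
plain = re.compile(r"rackspace cloud servers$")

def matches(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

print(matches(loose, "stumbleupon incx"))        # unescaped dot accepts the stray "x"
print(matches(strict, "stumbleupon incx"))       # escaped dot rejects it
print(matches(strict, "stumbleupon inc."))       # literal period is accepted
print(matches(plain, "rackspace cloud servers"))
```

A name such as "rackspace cloud servers" contains no period, so there is nothing to escape before the closing parenthesis.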

AWCthreads

Does it need the \. before the )?

Chris_CM

Ok, try this:

Just added rackspace as another match, it should work if the name is exactly right.

Hope this helps,

Chris

SErOb

Agreed! That's why I suggest using it in combination with the variables you mentioned above.

AWCthreads

rackspace cloud servers

Maybe my problem is I'm not looking in the right place.

I'm in Audience > Technology > Network and the column shows "Service Provider."

Chris_CM

How is it titled in the ISP report exactly?

AWCthreads

For example,

Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.

What is the regular expression for rackspace?

Chris_CM

Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
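As a rough sketch of treating zero time on page as only one signal among several, the Python below flags a report row only when several indicators line up. The `ProviderRow` fields, the `looks_like_bot` function, and the thresholds are assumptions for illustration. They borrow the visit, page, and bounce figures quoted in this thread rather than the operating system and location factors, and none of it is an Analytics feature.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    # Hypothetical summary of one row from the Service Provider report.
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    # Zero time on page alone is not enough, because real visits can record it too,
    # so it only counts together with the other signals. Thresholds are illustrative.
    return (
        row.visits >= min_visits
        and row.pages_per_visit <= 1.0
        and row.avg_time_on_page_seconds == 0
        and row.bounce_rate >= 0.99
    )

# Figures from the thread: 848 visits, 1 page each, 0 seconds, 100% bounce.
rackspace = ProviderRow("rackspace cloud servers", 848, 1.0, 0.0, 1.0)
# A handful of real one-page visits can also show 0 seconds.
small_isp = ProviderRow("example isp", 3, 1.0, 0.0, 1.0)

print(looks_like_bot(rackspace))
print(looks_like_bot(small_isp))
```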

SErOb

"...a combination of operating system, location, and some other factors can do the trick."

Yep, combined with those, look for "Avg. Time on Page = 00:00:00"

Chris_CM

Ok, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez

Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
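One way to try out an extended list before pasting anything into the filter is a small local check. The sketch below is Python, not the Analytics filter field. The ISP names are examples taken from this thread, `is_bot` is a hypothetical helper, and the case-insensitive flag is an assumption about how you want names compared.

```python
import re

# Example ISP organization names as they might appear in the Service Provider column.
isp_names = ["microsoft corp", "stumbleupon inc.", "rackspace cloud servers"]

# re.escape puts a backslash before special characters such as "." so they match literally.
# It also escapes spaces, which does not change what the pattern matches.
escaped = [re.escape(name) for name in isp_names]

# Same shape as the list above: names anchored to the end of the field, plus "gomez" anywhere.
pattern = "(" + "|".join(escaped) + ")$|gomez"
bot_filter = re.compile(pattern, re.IGNORECASE)

def is_bot(service_provider: str) -> bool:
    """Hypothetical helper: True if the provider name matches the filter pattern."""
    return bot_filter.search(service_provider) is not None

print(is_bot("rackspace cloud servers"))
print(is_bot("stumbleupon inc."))
print(is_bot("comcast cable"))
```

Because the names are anchored with $, a provider listed as "rackspace cloud servers ltd" would not match, which is why the name has to be copied exactly as the report shows it.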

AWCthreads

Sure. Here's the post for filtering the bots.

Chris_CM

If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?

Regards,

Chris
