Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moved brand's shop to a new domain. will our organic traffic recuperate?
Hello, We are a healthcare company with a strong domain authority and several thousand pages of service related content at brand.com. We've been operating an ancillary ecommerce store that sells related 3rd party products at brand.com/shop for a little over a year. We recently invested in a platform upgrade and moved our site to a new domain, brandshop.com. We implemented page-level 301 redirects including all category pages, product detail pages, canonical and non-canonical URLs, etc.. which the understanding that there would not be any loss in page rank. What we're seeing over the last 2 months is an initial dive in organic traffic, followed by a ramp-up period of if impressions (but not position) in the following weeks, another drop and we've steady at this low for the last 2 weeks. Another area that might have hurt us, the 301 redirects were implemented correctly immediately post launch (on a wednesday), but it was discovered on the following Monday that our .htaccess file had reverted to an old version without the redirect rules. For 3-4 days, all traffic was being redirected from brand.com/shop/url to brandshop.com/badurl. Can we expect to recover our organic traffic giving the launch screw up with the .htaccess file, or is it more of an issue with us separating from the brand.com domain? Thanks,
Intermediate & Advanced SEO | | eugene_p
Eugene0 -
My site lost a lot of traffic in the lastest update - what to do?
Hi all, there seems to have been an algorithm update on February 7. One of my big sites www.poussette.com, lost about 25 % of its organic traffic afterwards and has not revovered yet. What are the best steps to take right now? It is 7 years old we continuously did conservative SEO (technical, link building, adding content). Thanks in advance. Dieter
Intermediate & Advanced SEO | | Storesco0 -
Onpage Reviews, SEO & Traffic Uplift
Hi I wondered if anyone knew of any case studies to reinforce the importance of on page reviews for SEO & increasing traffic. I'd like to push it in my company, however it would be great to show them some results from a case study. Thank you!
Intermediate & Advanced SEO | | BeckyKey1 -
Natural Fluctuation in Search Traffic
This is going to sound like a weird question... I'm curious to know whether there is a natural fluctuation in the actual number of searches being made online each week. It would be great to relate this to the performance of my own organic traffic each week. For example, if organic search traffic is down 10% week on week, is that because search in general is down 10%? Has anybody ever looking into this?
Intermediate & Advanced SEO | | ausmed0 -
95% of organic traffic lands in my homepage, despite having a 250 page website with a "seo optimized" hierarchical structure. Any suggestion as to what might be happening?
Challenging issue All the "usual suspects" have been discarded: all pages included in google index, no google penalties, metas optimized, kw's segregated by pages/cluster of pages to avoid cannibalization... BUT, we know we are missing something website is www.e-florex.com and is an e-commerce site based on magento Any ideas you might think are worth exploring? Thanks in advance for your help Juan
Intermediate & Advanced SEO | | juanmarn0 -
90% Traffic Drop...
Hey Moz Community Our team has been racking our collective brains about a 90% drop in traffic a niche client of ours has seen. The traffic drop occurred September 20th. The site averages 300-400 unique visits a month (very targeted & niche) & dropped to 30-40 uvm. The domain is on an exact match but other then that I can't see anything that would be deserving of such a significant drop in traffic. Our campaign had been performing very well & we were seeing steady gains in traffic & ranking over the long term until September 20th. I'm curious if the exact match update would have such a big impact months after it was released (July/July if I remember correctly). I also didn't think it could be such a big issue because that domain has been used by us for a couple years. Is the Exact Domain Match such a big deal? eJGe3Vw
Intermediate & Advanced SEO | | hendersondavidp0 -
What is the best way to allow content to be used on other sites for syndication without taking the chance of duplicate content filters
Cookstr appears to be syndicating content to shape.com and mensfitness.com a) They integrate their data into partner sites with an attribution back to their site and skinned it with the partners look. b) they link the image back to their image hosted on cookstr c) The page does not have microformats or as much data as their own page does so their own page is better SEO. Is this the best strategy or is there something better they could be doing to safely allow others to use our content, we don't want to share the content if we're going to get hit for a duplicate content filter or have another site out rank us with our own data. Thanks for your help in advance! their original content page: http://www.cookstr.com/recipes/sauteacuteed-escarole-with-pancetta their syndicated content pages: http://www.shape.com/healthy-eating/healthy-recipes/recipe/sauteacuteed-escarole-with-pancetta
Intermediate & Advanced SEO | | irvingw
http://www.mensfitness.com/nutrition/healthy-recipes/recipe/sauteacuteed-escarole-with-pancetta0 -
Fading Text Links Look Like Spammy Hidden Links to a g-bot?
Ah, Hello Mozzers, it's been a while since I was here. Wanted to run something by you... I'm looking to incorporate some fading text using Javascript onto a site homepage using the method described here; http://blog.thomascsherman.com/2009/08/text-slideshow-or-any-content-with-fades/ so, my question is; does anyone think that Google might see this text as a possible dark hat SEO anchor text manipulation (similar to hidden links)? The text will contain various links (4 or 5) that will cycle through one another, fading in and out, but to a bot the text may appear initially invisible, like so; style="display: none;"><a href="">Link Here</a> All links will be internal. My gut instinct is that I'm just being stupid here, but I wanted to stay on the side of caution with this one! Thanks for your time 🙂 http://blog.thomascsherman.com/2009/08/text-slideshow-or-any-content-with-fades
Intermediate & Advanced SEO | | PeterAlexLeigh0