Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
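To see what the escaping changes, here is a minimal sketch using Python's `re` module as a stand-in for the Analytics filter field (an assumption; the two engines are not identical). The name "stumbleupon incx" is made up purely to show the over-matching.

```python
import re

# Two versions of the same alternative: one with a bare dot, one with the dot escaped.
unescaped = re.compile(r"^(stumbleupon inc.)$")
escaped = re.compile(r"^(stumbleupon inc\.)$")


def matches(pattern, name):
    """Return True if the compiled pattern matches anywhere in name."""
    return pattern.search(name) is not None


print(matches(unescaped, "stumbleupon inc."))  # True
print(matches(unescaped, "stumbleupon incx"))  # True: the bare "." accepted the "x"
print(matches(escaped, "stumbleupon inc."))    # True
print(matches(escaped, "stumbleupon incx"))    # False: "\." only accepts a literal dot
```

So a name with no dot in it, like rackspace cloud servers, has nothing to escape.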
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
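If you want to check the pattern before saving the filter, a rough sketch like the one below can help. It uses Python's `re` module as a stand-in for the Analytics filter (an assumption), writes the dots escaped, and assumes case-insensitive matching. The test names other than "rackspace cloud servers" are invented for illustration.

```python
import re

# The ISP pattern from this post, with the dots escaped.
# re.IGNORECASE is an assumption about how the filter is configured.
ISP_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)


def is_filtered(service_provider):
    """Return True if the service provider name would be caught by the pattern."""
    return ISP_FILTER.search(service_provider) is not None


# Hypothetical service provider values, only for testing the pattern.
for name in [
    "rackspace cloud servers",
    "rackspace cloud servers inc",
    "gomez networks",
    "comcast cable",
]:
    print(name, "->", is_filtered(name))
```

The names inside ^( ... )$ have to match the whole service provider value, which is why "rackspace cloud servers inc" is not caught, while gomez sits outside the anchors and matches anywhere in the name.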
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
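As a rough illustration of combining several signals instead of relying on time on page alone, here is a small Python sketch. The field names and the `min_sessions` threshold are assumptions made up for this example, not anything Analytics provides; the rackspace numbers are the ones quoted in this thread.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """One row of a service provider report (field names are hypothetical)."""
    name: str
    sessions: int
    pages_per_session: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(stats, min_sessions=100):
    """Flag a provider only when every signal points the same way.

    min_sessions is an arbitrary threshold chosen for this sketch, so that a
    handful of real zero-second visits does not get flagged.
    """
    return (
        stats.sessions >= min_sessions
        and stats.pages_per_session <= 1.0
        and stats.avg_time_on_page_seconds == 0
        and stats.bounce_rate >= 1.0
    )


rackspace = ProviderStats("rackspace cloud servers", 848, 1.0, 0.0, 1.0)
print(looks_like_bot(rackspace))  # True
```

A provider with only a few zero-second visits would not be flagged here, which is the point of not trusting time on page by itself.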
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx may not match what you expect (an unescaped . matches any character).
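If hand-escaping feels error-prone, the list can be assembled by a small script instead. This is only a sketch in Python: `escape_meta` and `build_isp_pattern` are hypothetical helper names, and the set of characters treated as special is an assumption based on common regex syntax.

```python
import re


def escape_meta(name):
    """Put a backslash before common regex metacharacters in a literal name."""
    return re.sub(r"([.^$*+?()\[\]{}|\\])", r"\\\1", name)


def build_isp_pattern(exact_names, substrings=()):
    """Build ^(name1|name2|...)$ and append any unanchored substring alternatives."""
    pattern = "^(" + "|".join(escape_meta(n) for n in exact_names) + ")$"
    for s in substrings:
        pattern += "|" + escape_meta(s)
    return pattern


print(build_isp_pattern(
    ["microsoft corp", "yahoo! inc.", "rackspace cloud servers"],
    ["gomez"],
))
# ^(microsoft corp|yahoo! inc\.|rackspace cloud servers)$|gomez
```

You type the names exactly as they appear in the service provider column, and the printed pattern is what you would paste into the filter.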
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris