Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter, and I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add it to the filter?
I read an Analytics "how to," but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
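In a regular expression, a bare . matches any single character, while \. matches only a literal period. Here is a minimal sketch of the difference, assuming Python's `re` module behaves like the Analytics filter engine for this case; the second sample name is made up for illustration.

```python
import re

# Unescaped: "." is a wildcard and matches any single character.
unescaped = re.compile(r"^stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
escaped = re.compile(r"^stumbleupon inc\.$")

# The second name is hypothetical, used only to show the wildcard effect.
names = ["stumbleupon inc.", "stumbleupon incx"]

for name in names:
    print(name, bool(unescaped.search(name)), bool(escaped.search(name)))
```

Both patterns match the real name, but only the unescaped one also matches the made-up name ending in "x". A name with no period in it, such as rackspace cloud servers, needs no backslash at all.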
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
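If you want to check the pattern before saving the filter, a quick test along these lines may help. It is only a sketch: it assumes the filter does a case-insensitive partial match, which `re.search` with `re.IGNORECASE` approximates, and every sample name except the Rackspace one is hypothetical.

```python
import re

# The filter pattern, with the periods escaped.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the filter ignores case
)


def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name matches the bot pattern."""
    return PATTERN.search(service_provider) is not None


samples = [
    "rackspace cloud servers",      # exact name: matches
    "rackspace cloud servers ltd",  # hypothetical variant: "$" blocks the match
    "gomez networks",               # hypothetical: "gomez" may appear anywhere
    "comcast cable",                # hypothetical ordinary ISP: no match
]
for name in samples:
    print(f"{name!r}: {is_filtered(name)}")
```

The names inside ^( ... )$ have to match the whole field, which is why the name must be exactly right, while gomez sits outside the anchors and matches anywhere in the field.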
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example, since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
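One way to picture "using other factors" is to flag a service provider only when several signals agree, rather than on time on page alone. The sketch below is illustrative only: the field names and thresholds are assumptions, not anything Analytics exposes under these names, and the second row is made up.

```python
from dataclasses import dataclass


@dataclass
class ProviderRow:
    """One row of a service-provider report (field names are hypothetical)."""
    name: str
    visits: int
    pages_per_visit: float
    avg_seconds_on_page: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    """Flag a provider only when several signals agree.

    Zero time on page alone is not enough, since real visits can also
    record 00:00:00; the thresholds here are illustrative assumptions.
    """
    return (
        row.visits >= min_visits
        and row.pages_per_visit <= 1.0
        and row.avg_seconds_on_page == 0
        and row.bounce_rate >= 0.99
    )


rows = [
    # Figures quoted in this thread for rackspace cloud servers.
    ProviderRow("rackspace cloud servers", 848, 1.0, 0, 1.0),
    # Made-up row: same zero time on page, but too few visits to judge.
    ProviderRow("example isp", 3, 1.0, 0, 1.0),
]
for row in rows:
    print(row.name, looks_like_bot(row))
```

Operating system and location could be added as further conditions in the same way.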
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: ^(microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc\.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
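If adding names by hand feels error-prone, the escaping can be done mechanically. This is a rough sketch, not an official tool: `escape_name` and `build_isp_pattern` are hypothetical helpers, and the list of special characters is an assumption based on common regex syntax.

```python
import re

# Characters that get a backslash in front of them (assumed list).
SPECIAL = set(r".^$*+?()[]{}|\/")


def escape_name(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def build_isp_pattern(exact_names, partial_names=()):
    """Build a filter pattern.

    exact_names must match the whole field; partial_names may appear
    anywhere in it (like "gomez" in the original pattern).
    """
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = [f"^({exact})$"]
    parts.extend(escape_name(n) for n in partial_names)
    return "|".join(parts)


pattern = build_isp_pattern(
    ["microsoft corp", "google inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)
print(pattern)
```

The printed pattern can then be pasted into the filter. Names without special characters pass through unchanged.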
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris