Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
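For anyone who wants to check a pattern like this before pasting it into the filter, here is a minimal Python sketch. It assumes the filter does a case-insensitive partial match against the Service Provider value, which is my assumption about how Analytics applies it. The helper name `is_bot_isp` and the sample provider names are made up for illustration, and the dots are written as `\.` so they match a literal period.

```python
import re

# Pattern from this thread, with each trailing dot escaped as "\." so it
# matches a literal period. IGNORECASE is an assumption about the filter.
BOT_ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)


def is_bot_isp(service_provider: str) -> bool:
    """Return True when the service provider name matches the bot filter."""
    return BOT_ISP_PATTERN.search(service_provider) is not None


# Hypothetical values as they might appear in the Service Provider column.
samples = [
    "rackspace cloud servers",
    "rackspace cloud servers llc",
    "google inc.",
    "gomez networks",
    "comcast cable communications",
]

for name in samples:
    print(f"{name!r}: {is_bot_isp(name)}")
```

The names inside `^( ... )$` only match when the whole value is exactly that name, while `gomez` matches anywhere in the value. That is why the Rackspace entry only works if the name is exactly right.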
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx may match more than you intend or fail outright.
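As a rough illustration of the escaping advice, here is a small Python sketch that builds the list from plain ISP names and adds the backslashes for you. The helper names `escape_literal` and `build_filter` are hypothetical, and the set of characters treated as special is an assumption for illustration, not an official list from Analytics.

```python
import re

# Characters treated as special here. This set is an assumption for
# illustration, not an official list from Analytics.
SPECIAL_CHARS = set(".^$*+?()[]{}|\\")


def escape_literal(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_filter(exact_names, partial_names=()):
    """Build a pattern shaped like ^(name1|name2)$|partial from plain names."""
    exact = "|".join(escape_literal(n) for n in exact_names)
    parts = [f"^({exact})$"] + [escape_literal(p) for p in partial_names]
    return "|".join(parts)


pattern = build_filter(
    ["google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    ["gomez"],
)
print(pattern)

# Why escaping matters: an unescaped "." stands for any single character.
print(bool(re.search(r"^(google inc.)$", "google incx")))   # unescaped dot
print(bool(re.search(r"^(google inc\.)$", "google incx")))  # escaped dot
```

The last two lines show the difference: without the backslash, the dot also accepts names that merely look similar, so the filter can catch more than you meant it to.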
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris