Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appear to be most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
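As a quick sanity check (my own illustration, not part of the original reply), the pattern above can be tried in Python's `re` module against a few sample Service Provider values:

```python
import re

# The filter pattern from the reply above, with the dots escaped so each
# matches a literal "." (per the escaping advice elsewhere in this thread).
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

for isp in ("rackspace cloud servers", "gomez networks", "comcast cable"):
    # The anchored alternation must match the whole name exactly;
    # "gomez" is unanchored, so it matches anywhere in the string.
    print(isp, "->", "filtered" if pattern.search(isp) else "kept")
```

Note that the anchored part only fires on an exact, full match of the ISP name, which is why the name has to be exactly right.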
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
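A hypothetical sketch of that heuristic (the field names are illustrative, not actual Google Analytics fields):

```python
def looks_like_bot(session):
    """Flag a session that fits the bot profile described above.

    `session` is a hypothetical dict of per-visit metrics; the keys are
    made up for illustration, not real Google Analytics field names.
    """
    return (
        session["avg_time_on_page"] == 0     # Avg. Time on Page = 00:00:00
        and session["pages_per_visit"] == 1  # single-page visit
        and session["bounced"]               # 100% bounce
    )

print(looks_like_bot(
    {"avg_time_on_page": 0, "pages_per_visit": 1, "bounced": True}
))  # True
```

As noted above, real visits can also record 00:00:00, so this only flags candidates to investigate, combined with the other signals (operating system, location, ISP).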
-
Ok, can you provide some information on the bots that are getting through that you want to sort out? If they can be filtered through the ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
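For instance (a sketch of my own, not from the original post), Python's `re` module shows the difference an escaped dot makes:

```python
import re

# Unescaped, . is a wildcard that matches any single character;
# escaped as \. it matches only a literal period.
loose = re.compile(r"^stumbleupon inc.$")
strict = re.compile(r"^stumbleupon inc\.$")

print(bool(loose.match("stumbleupon incx")))   # True - the . matched "x"
print(bool(strict.match("stumbleupon incx")))  # False
print(bool(strict.match("stumbleupon inc.")))  # True

# re.escape() adds the backslashes for you.
print(re.escape("stumbleupon inc."))
```

In a Google Analytics filter an unescaped dot usually still "works" because the real name contains a literal period, but it will also match names that differ in that one character, so escaping is the safer habit.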
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris