Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc. so that it is matched as a literal period.
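To illustrate the escaping point: in regular expression syntax an unescaped . matches any single character, while \. matches only a literal period. The short Python sketch below compares the two. The second test name ("stumbleupon incx") is made up for illustration, and Python's `re` module is only standing in for the Analytics filter engine here, which may differ in details.

```python
import re

# Unescaped dot: matches ANY single character after "inc".
unescaped = re.compile(r"^stumbleupon inc.$")
# Escaped dot: matches only a literal period after "inc".
escaped = re.compile(r"^stumbleupon inc\.$")

# The second name is a made-up example, not a real ISP name.
for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(name, "| unescaped:", bool(unescaped.search(name)),
          "| escaped:", bool(escaped.search(name)))
```

Both patterns accept the real name, but only the unescaped one also accepts the made-up variant, which is why escaping is the safer habit.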
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
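If you'd like to check a pattern like this before putting it in the filter, here is a rough Python sketch that runs the expression from this post (written with the periods escaped) against a few service provider names. Apart from "rackspace cloud servers", the sample names are made up for illustration. The case-insensitive flag is an assumption meant to approximate how Analytics filters behave by default.

```python
import re

# Pattern from the post above, with periods escaped.
# IGNORECASE is an assumption approximating the default filter behavior.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)

def is_bot_isp(service_provider: str) -> bool:
    """Return True if the service provider name matches the filter pattern."""
    return PATTERN.search(service_provider) is not None

# Hypothetical service provider values, for illustration only.
samples = [
    "rackspace cloud servers",      # exact name, inside ^( ... )$
    "rackspace cloud servers inc",  # extra text after the name
    "gomez networks",               # contains the unanchored word "gomez"
    "comcast cable",                # not in the list at all
]
for name in samples:
    print(f"{name!r}: {is_bot_isp(name)}")
```

The names inside ^( ... )$ have to match the whole service provider value exactly, while gomez sits outside the anchors and matches any value that contains it. That is why the exact title from the report matters.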
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
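As a rough sketch of the "add to the list and escape special characters" step, the Python snippet below builds the pattern from a plain list of ISP names so the backslashes are added for you. The helper names (`escape_name`, `build_isp_pattern`) and the set of characters treated as special are my own assumptions for illustration, not anything provided by Analytics.

```python
# Characters treated as special here (an assumption; includes . and / as
# mentioned above). Each one gets a backslash placed in front of it.
SPECIAL = set(r".^$*+?()[]{}|\/")

def escape_name(name: str) -> str:
    """Put a backslash before every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_isp_pattern(exact_names, substrings=()):
    """Join exact names as ^(a|b|c)$ and append unanchored substrings."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = [f"^({exact})$"]
    parts.extend(escape_name(s) for s in substrings)
    return "|".join(parts)

# ISP names taken from this thread; add your own to the first list.
pattern = build_isp_pattern(
    ["microsoft corp", "yahoo! inc.", "rackspace cloud servers"],
    substrings=["gomez"],
)
print(pattern)
```

You would then paste the printed pattern into the filter. Keeping the names in a list makes it harder to forget a backslash or a | when adding another ISP.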
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris