Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
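One way to check a pattern like this before saving the filter is to run it against a few provider names. This is only a rough sketch in Python. It assumes the literal dots are escaped with \ and that the filter does a case-insensitive partial match, which re.search with re.IGNORECASE approximates. The function name is_filtered and the sample names that are not in the pattern are made up for illustration.

```python
import re

# The ISP pattern with rackspace added and the literal dots escaped.
ISP_BOT_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the Analytics filter ignores case
)


def is_filtered(service_provider: str) -> bool:
    """Return True if the provider name would be caught by the pattern."""
    return ISP_BOT_PATTERN.search(service_provider) is not None


# Hypothetical values for the "service provider" column.
samples = [
    "rackspace cloud servers",      # exact name, so it is caught
    "rackspace cloud servers ltd",  # extra text, so ^...$ rejects it
    "gomez networks",               # contains "gomez", so it is caught
    "comcast cable",                # not in the list
]

for name in samples:
    print(f"{name!r}: {is_filtered(name)}")
```

The second sample shows why the name has to be exactly right: everything inside ^( ... )$ must match the whole provider name, while gomez sits outside the anchors and matches anywhere in the name.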
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
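To make the idea of combining factors concrete, here is a hedged Python sketch that flags a service provider only when several signals line up, instead of relying on time on page alone. The class, field names, and the min_visits threshold are assumptions for illustration. The first row uses the Rackspace figures quoted in this thread, and the second row is hypothetical. Operating system or location checks could be added as further conditions in the same way.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # fraction between 0.0 and 1.0


def looks_like_bot(stats: ProviderStats, min_visits: int = 100) -> bool:
    """Flag a provider only when every signal agrees.

    Zero time on page by itself is not enough, because real
    single-page visits can also record 00:00:00.
    """
    return (
        stats.visits >= min_visits
        and stats.pages_per_visit <= 1.0
        and stats.avg_time_on_page_seconds == 0
        and stats.bounce_rate >= 1.0
    )


rows = [
    ProviderStats("rackspace cloud servers", 848, 1.0, 0, 1.0),
    ProviderStats("example home isp", 3, 1.0, 0, 1.0),  # hypothetical row
]

for row in rows:
    print(row.service_provider, looks_like_bot(row))
```

The hypothetical second row has the same zero time on page but only a handful of visits, so it is not flagged.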
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them. An unescaped . matches any character, so without the \ your RegEx may not match what you intend.
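As a rough illustration of adding names to the list and escaping them, here is a small Python sketch. The helpers escape_name and build_isp_pattern and the ISP name "example hosting co." are hypothetical. Python's regex syntax is not guaranteed to be identical to what Analytics accepts, so treat the result as a starting point to check by hand.

```python
import re

# Characters that get a backslash in front of them, including the
# . and / mentioned above.
SPECIAL_CHARS = set(r".^$*+?()[]{}|\/")


def escape_name(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_isp_pattern(exact_names, contains=()):
    """Build '^(name1|name2|...)$|fragment' from plain ISP names."""
    exact = "|".join(escape_name(name) for name in exact_names)
    pattern = f"^({exact})$"
    for fragment in contains:
        pattern += "|" + escape_name(fragment)
    return pattern


names = ["microsoft corp", "yahoo! inc.", "example hosting co."]
print(build_isp_pattern(names, contains=["gomez"]))

# An unescaped . matches any character, so it can match too much.
print(bool(re.search(r"^google inc.$", "google incx")))   # True
print(bool(re.search(r"^google inc\.$", "google incx")))  # False
```

Keeping the names as a plain list and escaping them in one place means a new ISP can be added without retyping the whole expression by hand.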
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris