Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Keywords and keyword traffic
Hi I am struggling to know what keywords i should be targeting and how the website should be best optimised for said keywords. The website offers bespoke service in the lake district UK a popular tourist destination, The business operates within say a 30 km riadus of the area. So target vistors to the website would specifically be looking for services in the lake district. The trouble is for many targeted keywords for the area are quite low or no data shown. For example: tipi camping lake district, tipi hire lake district, Glamping lake district However nationally keywords for the service have a lot higher traffic i.e. tipi hire or tipi camping, glamping what keywords should be my target? and should I targeting my website for? I don't want to target customers looking for these services outside of the lake district and also by targeting keywords without the term lake district means my competition is greater as i'm competing with the whole of the Uk for serivces It can't provide. please advise thanks
Intermediate & Advanced SEO | | Bengo-990 -
HELP!!! Steep Drop in Organic Traffic Starting 11/1/16
Starting November 1st, organic web traffic from Google dropped from an average of about 60 visits a day to about 5 per day. So we are more than 90% off!!!! At the end of September, we modified the header of the site to simplify it. We also added a snippet of code to each page to enable Zoho "Sales IQ" to work. Sales IQ enables us to track visitors and engage in chat sessions with them. Apart from that no changes have been made from the site. Any ideas as to what could have caused this drop in traffic? Was there a Google update at that time that could have caused the drop? Or could the recent site changes have caused this? I have attached a Google Webmasters Tool report showing the drop in traffic. I would very much appreciate some insight into this, as all organic traffic to our site has ceased. Thanks,
Intermediate & Advanced SEO | | Kingalan1
Alan 9VNB1O50 -
My direct traffic went up and my organic traffic went down. Help!
So on Oct. 21, our direct traffic increased 3x and our organic traffic decreased 3x. And it has been that way ever since. Almost like they flip flopped. Additionally, that was the same day I started retargeting to our site. I have tagged all the links from the ads and they're being counted as google paid clicks in GA. And our accounts are linked. I am just dumbfounded as to how this could happen.
Intermediate & Advanced SEO | | Eric_OWPP1 -
Website architecture - levels vs filters and authority loss - Enterprise SEO
Hi Everyone, I am participating in the development of a marketplace website where the main channel will be traffic via SEO. We have encountered the directories (levels) vs filters situation. 1. Does everyone still agree that if we have too many levels, authority is loss as you do down through the levels? Does everyone agree that there should be a max of 3 levels and never 4. Example 1 www.domain.com/level1/level2/level3 vs www.domain.com/level1 In theory, the content on "level 3" will have a lower DA than the content on "level1". 2. Does everyone agree that for enterprise SEO (huge marketplace websites) filters are a better idea than levels? Example 2 www.domain.com/level1/level2/level3 vs www.domain.com/filter-option1 In theory, the content on "level 3" will have a lower DA than the content on "filter-option1". Thanks so much in advance
Intermediate & Advanced SEO | | Carla_Dawson0 -
What will happen if we 302 a page that is ranking #1 in google for a high traffic term?
We're planning to test something and we want to 302 a page to another page for a period of time. The question is, the original page is ranking #1 for a high traffic term. I want to know what will happen if we do this? Will we lose our rank? Will the traffic remain the same? Ultimately I do not want to lose traffic and I do not want to 301 until it has been properly tested.
Intermediate & Advanced SEO | | maxcdn0 -
New site. How important is traffic for a new site? And what about domain age?
Hi guys. I've been building a new site because i've seen a real SEO opportunity out there. I'm a mixing professional by trade and so I wanted to take advantage of SEO to help gain more work. Here's the site: www.signalchainstudios.co.uk I'm curious about domain age. This site fairly well optimised for my keywords, and my site got pretty good content on it (i think so anyway). But it's no where to be seen on the SERP's (link at all). Is this just a domain age issue? I'd have though it might be in the top 50 because my site's services are not hard to rank for at all! Also what about traffic? Does Google want to see an 'active' site before it considers 'promoting' it up the ranks? Or are back links and good content the main factor in the equation? Thanks in advance. I love this community to bits 🙂 Isaac.
Intermediate & Advanced SEO | | isaac6631 -
Wordpress to HubSpot CMS - I had major crawl issues post launch and now traffic is down 400%
Hi there good looking person! Our traffic went from 12k visitors in july to 3k visitors in july. << www.thedsmgroup.com >>When we moved our site from wordpress to the hubspot COS (their CMS system), I didnt submit a new sitemap to google webmaster tools. I didn't know that I had to... and to be honest, I've never submitted or re-submitted a sitemap to GWT. I have always built clean sites with fresh content and good internal linking and never worried about it. Yoast kind of took care of the rest, as all of my sites and our clients' sites were always on wordpress. Well, lesson learned. I got this message on June 27th in GWT_http://www.thedsmgroup.com/: Increase in not found errors__Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages._One month after our site launched we had 1,000 404s on our website. Ouch. Google thought we had a 1,200 page website with only 200 good pages and 1,000 error pages. Not very trust worthy... We never had a 404 ever before this, as we added a plugin to wordpress that would 301 any 404 to the homepage, so we never had a broken link on our site, which is not ideal for UX, but as far as google was concerned, our site was always clean. Obviously I have submitted a new sitemap to GWT a few weeks ago, and we are moving in the right direction... **but have I taken care of everything I need to? I'm not sure. Our traffic is still around 100 visitors per day, not 400 per day as it was before we launched the new site.**Thoughts?I'm not totally freaking out or anything, but a month ago we ranked #1 and #2 for "marketing agency nj", now we aren't in the top 100. I've never had a problem like this. _I added a few screen grabs from Google Webmaster Tools that should be helpful.__Bottom line, have I done everything I need to or do I need to do something with all of these "not found" error details that I have in GWT?_None of these "not found" pages have any value and I'm not sure how Google even found them... For example: http://www.thedsmgroup.com/supersize-page-test/screen-shot-2012-11-06-at-2-33-22-pmHelp! -JasonuhLLtou&h4QmGCW#0 uhLLtou&h4QmGCW#1
Intermediate & Advanced SEO | | Charlene-Wingfield0