Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used the regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter, but I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a period after the word "servers" in the name. The \. escapes the literal period at the end of "stumbleupon inc."
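To see why the backslash matters, here is a small check using Python's re module as a stand-in for the Analytics filter engine. That substitution is an assumption, though "." and "\." behave the same way in most regex flavors: a bare "." matches any single character, while "\." matches only a real period.

```python
import re

loose = re.compile(r"^stumbleupon inc.$")   # "." matches ANY single character
strict = re.compile(r"^stumbleupon inc\.$")  # "\." matches only a literal period

# "stumbleupon incx" is a made-up name showing what the unescaped dot lets through.
for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(f"{name!r}: loose={bool(loose.search(name))}, strict={bool(strict.search(name))}")

# The name has no trailing period, so no "\." is needed after "servers".
print(bool(re.search(r"^rackspace cloud servers$", "rackspace cloud servers")))  # True
# Adding "\." would demand a period the name doesn't have.
print(bool(re.search(r"^rackspace cloud servers\.$", "rackspace cloud servers")))  # False
```

In short, only add "\." where the provider name really ends in a period.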
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
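A quick way to sanity-check the pattern before saving the filter is to run it against a few provider names. This sketch uses Python's re module as a stand-in for the Analytics filter engine, which is an assumption; re.search looks for a match anywhere in the text, which is what the unanchored "|gomez" branch implies the filter does. Apart from the names already in the filter, the sample provider names are hypothetical.

```python
import re

# The filter pattern from the post above, with rackspace added and dots escaped.
BOT_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

samples = [
    "rackspace cloud servers",       # exact match: filtered
    "rackspace cloud servers inc.",  # hypothetical variant: extra text breaks ^...$
    "google inc.",                   # exact match: filtered
    "gomez networks",                # hypothetical: "gomez" sits outside the anchors
    "comcast cable",                 # hypothetical: no match
]

for name in samples:
    print(f"{name:30} filtered={bool(BOT_PATTERN.search(name))}")
```

The ^( ... )$ part only matches names that are exactly right, which is why the name has to match the ISP report character for character, while "gomez" matches anywhere in the field.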
-
Agreed! That's why I suggest using it in combination with the variables you mentioned earlier.
-
rackspace cloud servers
Maybe my problem is that I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
What exactly is it called in the ISP report?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, viewed 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be tricky, because real visits sometimes record 00:00:00 due to the way it is measured. I'd recommend using other factors, like the ones I mentioned earlier.
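The "combine several factors" idea can be sketched as a simple vote: flag a visit only when several independent signals agree, so a real visitor who happens to record 00:00:00 isn't caught. This is not Analytics filter syntax; it's a hypothetical sketch of the reasoning you might apply to exported report data before deciding which provider names to add to the RegEx. The field names, the "(not set)" operating-system signal, and the threshold of 3 are all assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Session:
    # Hypothetical fields standing in for columns exported from Analytics reports.
    service_provider: str
    operating_system: str
    country: str
    pages: int
    seconds_on_page: int

def looks_like_bot(s: Session, suspect_providers: set[str],
                   expected_countries: set[str], threshold: int = 3) -> bool:
    """Flag a session only when several independent signals agree.

    Zero seconds alone is not enough: a real one-page visit can also be
    recorded as 00:00:00 because of how time on page is measured.
    """
    signals = [
        s.service_provider in suspect_providers,
        s.pages == 1,
        s.seconds_on_page == 0,
        s.operating_system == "(not set)",  # assumed signal, for illustration
        s.country not in expected_countries,
    ]
    return sum(signals) >= threshold  # threshold of 3 is an arbitrary assumption

bot_like = Session("rackspace cloud servers", "(not set)", "United States", 1, 0)
human = Session("comcast cable", "Windows", "United States", 1, 0)
suspects = {"rackspace cloud servers"}
home = {"United States"}
print(looks_like_bot(bot_like, suspects, home))  # True: 4 of 5 signals
print(looks_like_bot(human, suspects, home))     # False: only 1 page and 0 seconds
```

Raising or lowering the threshold trades missed bots against accidentally filtering real visitors.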
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots getting through that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx will fail.
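If the list keeps growing, escaping by hand gets error-prone. Here is a minimal Python sketch that puts the backslashes in for you and assembles the ^( ... )$|gomez shape used in this thread. The helper names are hypothetical, and "example hosting co./llc" is a made-up provider name used only to show "." and "/" being escaped (escaping "/" is harmless in Python's re; whether Analytics needs it is per the advice above).

```python
import re

# Characters with special meaning in a regex, plus "/" as the thread advises.
SPECIAL = set(r".^$*+?()[]{}|\/")

def escape_name(name: str) -> str:
    """Put a backslash in front of every special character in a provider name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_filter(exact_names: list[str], partial_terms: list[str]) -> str:
    """Exact names go inside ^(...)$; partial terms (like gomez) match anywhere."""
    pattern = "^(" + "|".join(escape_name(n) for n in exact_names) + ")$"
    if partial_terms:
        pattern += "|" + "|".join(escape_name(t) for t in partial_terms)
    return pattern

names = [
    "microsoft corp", "inktomi corporation", "yahoo! inc.",
    "google inc.", "stumbleupon inc.", "rackspace cloud servers",
]
print(build_filter(names, ["gomez"]))
# prints: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
print(escape_name("example hosting co./llc"))  # hypothetical name
# prints: example hosting co\.\/llc
```

To add a new bot, append its exact service-provider name to the list and paste the printed pattern into the filter.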
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What is the RegEx you have in place to filter out the other bots?
Regards,
Chris