Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, and I'm having a hard time determining the regular expressions for them.
How do I work out the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem. Feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the Rackspace bots. If you want to learn more about regular expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
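To make the escaping point concrete, here is a minimal sketch in Python (an assumption for illustration only; Analytics has its own regex engine, though this behavior of . is standard). The name "stumbleupon incx" is a made-up string used only to show the difference.

```python
import re

# Unescaped: "." matches ANY single character, not just a period.
loose = re.compile(r"^(stumbleupon inc.)$")

# Escaped: "\." matches only a literal period.
strict = re.compile(r"^(stumbleupon inc\.)$")

# Both patterns match the real name.
print(bool(loose.search("stumbleupon inc.")))   # True
print(bool(strict.search("stumbleupon inc.")))  # True

# Only the unescaped pattern also matches the made-up name.
print(bool(loose.search("stumbleupon incx")))   # True
print(bool(strict.search("stumbleupon incx")))  # False
```

So an unescaped . usually still matches what you wanted; the risk is that it also matches names you did not intend.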
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
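If you want to check the expression before saving the filter, a quick sketch like the one below can help. It is written in Python as an assumption for illustration (Analytics applies the filter itself), the periods are escaped, and names are lowercased on the assumption that the report shows them in lowercase. `is_filtered` is a hypothetical helper, and the provider names other than "rackspace cloud servers" are made-up test strings.

```python
import re

# The filter pattern, with literal periods escaped as "\.".
ISP_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name would be caught by the filter."""
    return ISP_FILTER.search(service_provider.lower()) is not None

# Exact name from the report: caught.
print(is_filtered("rackspace cloud servers"))       # True
# Trailing period makes it a different name: not caught (^...$ is exact).
print(is_filtered("rackspace cloud servers."))      # False
# "gomez" is not anchored, so it matches anywhere in the name.
print(is_filtered("some gomez network"))            # True
# An ordinary ISP is left alone.
print(is_filtered("comcast cable communications"))  # False
```

The names inside ^( ... )$ must match the whole service provider value, which is why the name has to be exactly right, while gomez on the end matches any provider containing that word.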
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers has visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: ^(microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match as intended.
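The "add it to the list and escape special characters" step can be sketched as a small script that builds the expression for you. This is a rough illustration in Python under stated assumptions: `escape_name` and `build_filter` are hypothetical helpers, not part of Analytics, and the set of characters treated as special is a common subset rather than a complete list for every regex flavor.

```python
# Characters to prefix with a backslash (assumed subset, includes "." and "/").
SPECIAL = set(".^$*+?()[]{}|\\/")

def escape_name(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_filter(exact_names, partial_names=()):
    """Build ^(name1|name2|...)$ for exact names, plus |word for partial matches."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = [f"^({exact})$"] + [escape_name(n) for n in partial_names]
    return "|".join(parts)

pattern = build_filter(
    ["microsoft corp", "inktomi corporation", "yahoo! inc.", "google inc.",
     "stumbleupon inc.", "rackspace cloud servers"],
    ["gomez"],
)
print(pattern)
# ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
```

To add another bot, append its service provider name, copied exactly from the report, to the first list and paste the printed expression into the filter.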
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris