Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
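To see what the escape changes, here is a minimal sketch using Python's `re` module. A backslash before a period makes it match only a literal period, while a bare period matches any single character. The name "stumbleupon incs" is a made-up example, not something from the report.

```python
import re

# A bare "." matches any single character; "\." matches only a literal period.
unescaped = re.compile(r"^stumbleupon inc.$")
escaped = re.compile(r"^stumbleupon inc\.$")

# "stumbleupon incs" is a hypothetical name used only to show the difference.
for name in ["stumbleupon inc.", "stumbleupon incs"]:
    print(name, "| unescaped:", bool(unescaped.search(name)),
          "| escaped:", bool(escaped.search(name)))
```

Both patterns match the real name, but only the unescaped one also matches the look-alike, which is why the escape is worth keeping for names that end in a period.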
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
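
One way to check a pattern like this before saving the filter is to run it against a few service provider names locally. The sketch below uses Python's `re` module with the periods escaped. Case-insensitive matching is an assumption about how the filter is configured, and the names that are not in the pattern are made up for illustration.

```python
import re

# The pattern from the reply, with literal periods written as \.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the filter is not case sensitive
)

def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name matches the pattern."""
    return PATTERN.search(service_provider) is not None

# Names that do not appear in the pattern are hypothetical examples.
for name in ["rackspace cloud servers", "rackspace cloud servers llc",
             "gomez networks", "example isp"]:
    print(name, "->", is_filtered(name))
```

The names inside the parentheses must match the whole field because of the ^ and $ anchors, so a name with anything extra on the end is not caught. The gomez part sits outside the anchors and matches anywhere in the name.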
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
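
As a rough sketch of combining signals rather than relying on time on page alone, the Python below flags a service provider only when several metrics agree. The field names, the `min_visits` threshold, and the "example isp" row are assumptions for illustration; the rackspace figures are the ones quoted in this thread.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row: ProviderRow, min_visits: int = 100) -> bool:
    """Flag a row only when several signals agree, not on time on page alone."""
    return (
        row.visits >= min_visits          # assumed threshold, tune for your site
        and row.pages_per_visit <= 1.0
        and row.avg_time_on_page_seconds == 0
        and row.bounce_rate >= 1.0
    )

rows = [
    ProviderRow("rackspace cloud servers", 848, 1.0, 0, 1.0),  # figures from the thread
    ProviderRow("example isp", 3, 1.0, 0, 1.0),  # hypothetical: too few visits to judge
]
suspects = [r.service_provider for r in rows if looks_like_bot(r)]
print(suspects)
```

A handful of zero-second visits from a small provider is left alone here, since real visits can also record 00:00:00.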
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
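
If escaping by hand feels error-prone, the pattern can be assembled from a plain list of names. This is a minimal Python sketch, not part of Analytics itself: `escape_name` and `build_pattern` are hypothetical helpers, and the set of characters to escape is an assumption based on the advice above.

```python
import re

# Characters that carry special meaning in a regular expression. The "/" is
# included because the advice above suggests escaping it as well.
SPECIAL = r".\+*?[]^$(){}|/"

def escape_name(name: str) -> str:
    """Put a backslash before each special character in a name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)

def build_pattern(exact_names, contains_terms):
    """Combine exact-match names and match-anywhere terms into one pattern."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = ["^(" + exact + ")$"] + [escape_name(t) for t in contains_terms]
    return "|".join(parts)

pattern = build_pattern(
    ["google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    ["gomez"],
)
print(pattern)
```

Adding another bot then means adding its service provider name to the first list exactly as it appears in the report, and pasting the printed pattern into the filter.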
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris