Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they eliminate what appears to be most of the bot traffic.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine the regular expressions for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The . in the pattern is there to match the . at the end of "stumbleupon inc."
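Just to make that concrete, here's a tiny Python sketch (the strings are illustrative, not your actual report values) of what an unescaped . does in a pattern like this:

import re

# An unescaped . matches ANY single character, so "stumbleupon inc."
# happens to match the literal period - but it would also match any
# other character in that last position.
pattern = re.compile(r"^stumbleupon inc.$")

print(bool(pattern.match("stumbleupon inc.")))  # True  - the intended match
print(bool(pattern.match("stumbleupon incx")))  # True  - . is a wildcard, so this matches too
print(bool(pattern.match("stumbleupon inc")))   # False - . still requires one character to be there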
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
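If you want to sanity-check it before dropping it into the filter, here's a rough Python sketch (Python's re behaves the same as the Analytics regex flavor for a simple pattern like this; the ISP strings below are just made-up examples of what the Service Provider dimension might report):

import re

# The ^(...)$ part has to match the ISP name exactly; the |gomez part
# sits outside those anchors, so "gomez" can appear anywhere in the name.
bot_filter = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|"
    r"stumbleupon inc.|rackspace cloud servers)$|gomez"
)

for isp in [
    "rackspace cloud servers",     # exact match - filtered
    "rackspace cloud servers llc", # extra text, no exact match - kept
    "gomez monitoring network",    # contains "gomez" - filtered
    "comcast cable",               # ordinary visitor ISP - kept
]:
    print(isp, "->", "filtered" if bot_filter.search(isp) else "kept")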
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match exactly what you intend.
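If you'd rather not escape things by hand, here's a minimal Python sketch using re.escape (the ISP names are placeholders for whatever your own report shows; note that re.escape may also backslash-escape spaces, which is harmless for matching in Python but worth tidying up before pasting the output into an Analytics filter):

import re

# Names copied exactly as they appear in the Service Provider report
# (placeholders here - swap in the names from your own data).
isp_names = ["microsoft corp", "stumbleupon inc.", "rackspace cloud servers"]

# re.escape puts a \ in front of regex metacharacters such as . so each
# name matches only itself instead of treating . as a wildcard.
exact_names = "|".join(re.escape(name) for name in isp_names)
bot_filter = re.compile("^(" + exact_names + ")$|gomez")

print(bot_filter.pattern)
print(bool(bot_filter.search("stumbleupon incx")))  # False - the escaped dot is now literal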
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots, I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris