Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article, and they appear to eliminate most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the Rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
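To see why the escape matters, here is a minimal Python sketch. It assumes the filter behaves like an ordinary case-insensitive regular expression; the name "stumbleupon incx" is a made-up test value, not a real service provider.

```python
import re

# Unescaped: the final "." is a wildcard that matches any single character.
loose = re.compile(r"^(stumbleupon inc.)$", re.IGNORECASE)

# Escaped: "\." matches only a literal period.
strict = re.compile(r"^(stumbleupon inc\.)$", re.IGNORECASE)


def matches(pattern, name):
    """Return True if the compiled pattern matches the given name."""
    return pattern.search(name) is not None


# "stumbleupon incx" is a hypothetical name used only to show the difference.
print(matches(loose, "stumbleupon inc."))   # True
print(matches(loose, "stumbleupon incx"))   # True, probably not intended
print(matches(strict, "stumbleupon inc."))  # True
print(matches(strict, "stumbleupon incx"))  # False
```

A name with no trailing period, such as rackspace cloud servers, has nothing to escape.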
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
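If you want to check a pattern before pasting it into the filter, a rough Python sketch like the one below can help. It uses the pattern from this post with the periods escaped. The case-insensitive flag is an assumption about how the filter compares names, and every sample provider name other than "rackspace cloud servers" is invented for illustration.

```python
import re

# Pattern from the post, periods escaped. IGNORECASE is an assumption
# about how the Analytics filter compares service provider names.
BOT_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)


def is_filtered(service_provider):
    """Return True if the filter pattern would match this provider name."""
    return BOT_FILTER.search(service_provider) is not None


# Hypothetical values as they might appear in the service provider column.
samples = [
    "rackspace cloud servers",   # exact match inside ^( ... )$
    "rackspace cloud servers.",  # trailing period, so no exact match
    "rackspace hosting",         # different name, not in the list
    "gomez inc.",                # "gomez" matches anywhere in the name
    "comcast cable",             # ordinary provider, should pass through
]

for name in samples:
    print(f"{name!r}: filtered={is_filtered(name)}")
```

The names between ^( and )$ have to match the whole service provider value, while gomez sits outside that group and matches wherever it appears.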
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
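As a rough illustration of adding names to the list and escaping them, here is a small Python sketch. The helper names escape_literal and build_isp_filter are hypothetical, and the set of characters escaped is an assumption covering the common regex metacharacters.

```python
import re


def escape_literal(text):
    """Put a backslash before common regex metacharacters in text."""
    return re.sub(r"([.^$*+?()\[\]{}|\\])", r"\\\1", text)


def build_isp_filter(exact_names, substrings=()):
    """Build a pattern of the form ^(name1|name2)$|substring.

    exact_names must match the whole service provider value;
    substrings may match anywhere in it.
    """
    exact = "|".join(escape_literal(name) for name in exact_names)
    pattern = f"^({exact})$"
    for piece in substrings:
        pattern += "|" + escape_literal(piece)
    return pattern


# Names taken from the thread; add your own ISP names to the first list.
pattern = build_isp_filter(
    ["microsoft corp", "yahoo! inc.", "google inc.", "rackspace cloud servers"],
    substrings=["gomez"],
)
print(pattern)
```

The printed pattern can then be pasted into the filter field. Check each name against the service provider column first, since an exact-match entry fails if even one character differs.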
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris