Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of the bots.
However, there are other bots I would like to filter, but I'm having a hard time determining the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem. Feel free to reach out if you have any other RegEx-related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word "servers" in the name. The \ is escaping the . at the end of stumbleupon inc.
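To illustrate why the escape matters, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Analytics filter engine for this simple case, which may not hold in every detail. An unescaped . matches any single character, while \. matches only a literal period. The name "stumbleupon incx" is made up purely for the comparison.

```python
import re

# Dot left unescaped: "." matches any single character.
unescaped = re.compile(r"^(stumbleupon inc.)$")
# Dot escaped: "\." matches only a literal period.
escaped = re.compile(r"^(stumbleupon inc\.)$")

real_name = "stumbleupon inc."
lookalike = "stumbleupon incx"  # hypothetical name, for illustration only

unescaped_hits = [bool(unescaped.search(n)) for n in (real_name, lookalike)]
escaped_hits = [bool(escaped.search(n)) for n in (real_name, lookalike)]

print(unescaped_hits)  # [True, True]
print(escaped_hits)    # [True, False]
```

Since "rackspace cloud servers" contains no period, there is nothing in it to escape.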
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
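If you want to check the pattern before saving the filter, a rough way is to test it against a few service provider names. The sketch below uses Python's `re` module with the periods escaped. It assumes the filter does a case-insensitive partial match, and the names other than the ones in the pattern are hypothetical, so treat it as an approximation of what Analytics does.

```python
import re

# The filter pattern, with literal periods escaped.
# IGNORECASE is an assumption about how the Analytics filter compares names.
BOT_ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,
)


def is_filtered(service_provider: str) -> bool:
    """Return True if the service provider name would match the filter."""
    return BOT_ISP_PATTERN.search(service_provider) is not None


# Exact name inside ^( ... )$ matches.
print(is_filtered("rackspace cloud servers"))      # True
# Extra text breaks the exact match (hypothetical name).
print(is_filtered("rackspace cloud servers llc"))  # False
# "gomez" is outside the anchors, so it matches anywhere (hypothetical name).
print(is_filtered("gomez networks"))               # True
# An unrelated provider is left alone (hypothetical name).
print(is_filtered("example broadband"))            # False
```

This is why the name has to be exactly right: everything inside ^( ... )$ must match the whole service provider string, while gomez matches anywhere in it.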
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
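As a rough illustration of combining signals instead of relying on time on page alone, here is a small Python sketch. The field names, the `min_visits` threshold, and the second sample row are all assumptions made for the example; the first row uses the Rackspace numbers quoted in this thread.

```python
from dataclasses import dataclass


@dataclass
class ProviderStats:
    """One row of a service provider report (field names are hypothetical)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_s: float
    bounce_rate: float  # 0.0 to 1.0


def looks_like_bot(row: ProviderStats, min_visits: int = 100) -> bool:
    """Flag a provider only when every signal agrees and volume is high.

    Zero time on page alone is not enough, because real single-page
    visits can also record 00:00:00.
    """
    signals = [
        row.pages_per_visit <= 1.0,
        row.avg_time_on_page_s == 0,
        row.bounce_rate >= 0.99,
    ]
    return row.visits >= min_visits and all(signals)


rackspace = ProviderStats("rackspace cloud servers", 848, 1.0, 0.0, 1.0)
small_isp = ProviderStats("example broadband", 3, 1.0, 0.0, 1.0)  # made-up row

print(looks_like_bot(rackspace))  # True
print(looks_like_bot(small_isp))  # False
```

The thresholds are arbitrary starting points. Operating system and location could be added as further signals in the same way.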
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match as intended.
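If escaping by hand feels error-prone, the list can be assembled with a few lines of Python. This is only a sketch: `escape_name` and `build_pattern` are hypothetical helpers written for this example, and the set of characters treated as special is an assumption that covers the common regex metacharacters plus the / mentioned above.

```python
import re

# Characters that get a backslash in front of them (assumed set).
SPECIAL_CHARS = set(".^$*+?()[]{}|/\\")


def escape_name(name: str) -> str:
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_pattern(exact_names, partial_terms=("gomez",)) -> str:
    """Build ^(name1|name2|...)$|term from plain ISP names."""
    exact = "|".join(escape_name(n) for n in exact_names)
    partial = "|".join(escape_name(t) for t in partial_terms)
    return "^(" + exact + ")$|" + partial


isp_names = [
    "microsoft corp",
    "inktomi corporation",
    "yahoo! inc.",
    "google inc.",
    "stumbleupon inc.",
    "rackspace cloud servers",
]

pattern = build_pattern(isp_names)
print(pattern)

# Quick sanity check of the generated pattern with Python's re module.
print(bool(re.search(pattern, "rackspace cloud servers")))  # True
```

To add another bot, append its service provider name to `isp_names` exactly as it appears in the report and paste the printed pattern into the filter.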
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris