Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \. is escaping the . at the end of stumbleupon inc., so it matches a literal period instead of any character.
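To make the effect of escaping concrete, here is a small Python sketch. Python's `re` module is only a stand-in for the Analytics matcher (an assumption, since the two engines are not identical), and the helper name `matches` and the misspelled test string are made up for illustration.

```python
import re

# Unescaped dot: "." matches ANY single character.
unescaped = re.compile(r"^stumbleupon inc.$")
# Escaped dot: "\." matches only a literal period.
escaped = re.compile(r"^stumbleupon inc\.$")


def matches(pattern, name):
    """Return True if the compiled pattern matches the given name."""
    return pattern.search(name) is not None


print(matches(unescaped, "stumbleupon inc."))  # True
print(matches(unescaped, "stumbleupon incx"))  # True: "." accepted the "x"
print(matches(escaped, "stumbleupon inc."))    # True
print(matches(escaped, "stumbleupon incx"))    # False: only a real "." is accepted
```

A name with no period in it, like rackspace cloud servers, has nothing to escape.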
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
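If you want to check a pattern like this before saving the filter, a quick Python sketch along these lines may help. It assumes Python's `re` behaves closely enough to the Analytics matcher for this pattern, and that matching is case-insensitive (hence `re.IGNORECASE`); both are assumptions. The dots are written escaped, and `is_filtered` and the sample name "example isp" are hypothetical.

```python
import re

# The filter pattern, with the periods escaped.
FILTER_PATTERN = (
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\."
    r"|stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

# IGNORECASE is an assumption about how the Analytics filter compares names.
bot_filter = re.compile(FILTER_PATTERN, re.IGNORECASE)


def is_filtered(service_provider):
    """Return True if the service provider name would be caught by the filter."""
    return bot_filter.search(service_provider) is not None


for name in [
    "rackspace cloud servers",      # exact name: caught
    "rackspace cloud servers inc",  # extra text: NOT caught, because of ^...$
    "gomez networks",               # caught: "gomez" is allowed anywhere in the name
    "example isp",                  # hypothetical ordinary ISP: not caught
]:
    print(name, "->", is_filtered(name))
```

The ^ and $ only wrap the names inside the parentheses, which is why the name has to be exactly right, while gomez matches as a fragment.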
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
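As a rough illustration of combining signals instead of trusting time on page alone, here is a Python sketch. Everything in it is an assumption made for illustration: the `ProviderRow` fields, the `looks_like_bot` helper and the 100-visit threshold are hypothetical, and the second row is made-up data. Only the rackspace figures come from this thread.

```python
from dataclasses import dataclass


@dataclass
class ProviderRow:
    """One row of a service provider report (field names are hypothetical)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_seconds: float
    bounce_rate_percent: float


def looks_like_bot(row, min_visits=100):
    """Flag a provider only when several signals agree.

    Zero time on page alone is not enough, since real visits can
    also record 00:00:00, so a visit-volume threshold is required too.
    """
    return (
        row.visits >= min_visits
        and row.pages_per_visit <= 1.0
        and row.avg_time_on_page_seconds == 0
        and row.bounce_rate_percent >= 100.0
    )


rows = [
    # Figures reported in this thread.
    ProviderRow("rackspace cloud servers", 848, 1.0, 0.0, 100.0),
    # Made-up row: a handful of real single-page visits can look the same.
    ProviderRow("example isp", 5, 1.0, 0.0, 100.0),
]

for row in rows:
    print(row.service_provider, "->", looks_like_bot(row))
```

Operating system and location could be added as further conditions in the same way.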
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . by putting a \ before them; otherwise the . matches any character and your RegEx can match more than you intend.
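Adding names to the list and escaping them can be sketched in Python like this. The helpers `escape_name` and `build_filter` are hypothetical names, and the set of characters treated as special is an assumption about what the filter's regex engine needs escaped.

```python
# Characters assumed to have special meaning in the filter's RegEx.
SPECIAL_CHARS = set(".^$*+?()[]{}|\\")


def escape_name(name):
    """Put a backslash before every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_filter(exact_names, partial_names=()):
    """Build a pattern: exact names go inside ^(...)$, partial ones are OR-ed on."""
    exact = "|".join(escape_name(name) for name in exact_names)
    pattern = "^(" + exact + ")$"
    for fragment in partial_names:
        pattern += "|" + escape_name(fragment)
    return pattern


pattern = build_filter(
    ["google inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)
print(pattern)  # ^(google inc\.|rackspace cloud servers)$|gomez
```

The printed pattern is what you would paste into the filter field. Names go in exactly as they appear in the service provider column.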
-
Sure. Here's the post for filtering the bots.
Here's the regex posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris