Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminate what appear to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \ before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match; it should work if the name is exactly right.
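If you want to sanity-check the pattern before pasting it into your filter, a few lines of Python will do it (the sample ISP names below are just examples, not your actual report data):

```python
import re

# The filter pattern from above, compiled as-is. The dots are unescaped,
# so "." matches any character -- it still catches the literal dots in
# "inc." and friends, just less strictly.
pattern = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.|rackspace cloud servers)$|gomez"
)

# Example ISP names to test against the filter.
for isp in ["rackspace cloud servers", "stumbleupon inc.",
            "comcast cable", "gomez networks"]:
    print(isp, "->", "filtered" if pattern.search(isp) else "kept")
```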
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page, and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
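As a rough sketch of how those signals could combine (the field names and rows here are invented for illustration; a real Analytics export looks different):

```python
# Hypothetical rows from a Network report export -- field names and
# values are made up for illustration only.
visits = [
    {"isp": "rackspace cloud servers", "avg_time": 0, "bounce_rate": 1.0, "pages": 1},
    {"isp": "comcast cable", "avg_time": 95, "bounce_rate": 0.4, "pages": 3},
]

def looks_like_bot(visit):
    # No single signal is reliable (real visits can log 00:00:00 too),
    # so require several together before treating a row as bot traffic.
    return (visit["avg_time"] == 0
            and visit["bounce_rate"] == 1.0
            and visit["pages"] == 1)

humans = [v for v in visits if not looks_like_bot(v)]
```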
-
Ok, can you provide some information on the bots that are getting through that you want to sort out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
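If you'd rather not escape things by hand, Python's re.escape can build the alternation for you. A quick sketch (the ISP names are placeholders):

```python
import re

# re.escape adds the backslashes for you, so a literal "." or "/" in an
# ISP name can't be misread as a regex metacharacter.
isps_to_block = ["stumbleupon inc.", "yahoo! inc.", "rackspace cloud servers"]
alternation = "|".join(re.escape(name) for name in isps_to_block)
pattern = re.compile(rf"^({alternation})$|gomez")

print(pattern.pattern)
```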
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris