Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc., so that the . is matched as a literal period.
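For anyone who wants to see the escaping in action, here is a minimal sketch in Python (my choice for illustration; Analytics itself only takes the pattern). It assumes the provider name is "stumbleupon inc." and uses a made-up string, "stumbleupon incx", to show what an unescaped period lets through.

```python
import re

# "\." matches only a literal period; a bare "." matches any single character.
escaped = re.compile(r"^stumbleupon inc\.$")
unescaped = re.compile(r"^stumbleupon inc.$")

print(bool(escaped.search("stumbleupon inc.")))    # True: literal period matches
print(bool(escaped.search("stumbleupon incx")))    # False: "x" is not a period
print(bool(unescaped.search("stumbleupon incx")))  # True: "." accepts any character
```

So a name with no period in it, like rackspace cloud servers, needs no backslash at all.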
-
Does it need the \. before the ) ?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
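If you'd like to check a filter like this before saving it, a rough test along these lines may help. This is only a sketch in Python, not how Analytics runs the filter: the case-insensitive flag, the `is_bot` helper, and the sample provider names (including "comcast cable" and "gomez networks") are my own assumptions for illustration.

```python
import re

# The filter from this thread, with each literal period escaped.
BOT_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the filter is applied case-insensitively
)


def is_bot(service_provider: str) -> bool:
    """Return True if the service provider name matches the bot filter."""
    return BOT_FILTER.search(service_provider) is not None


samples = [
    "rackspace cloud servers",      # exact name inside ^( ... )$
    "rackspace cloud servers inc",  # extra text after the name
    "gomez networks",               # "gomez" is allowed anywhere in the name
    "comcast cable",                # hypothetical ordinary ISP
]
for name in samples:
    print(f"{name!r}: {is_bot(name)}")
```

The part between ^( and )$ only matches when the whole name is exactly one of the listed providers, which is why the name has to be exactly right. The gomez part sits outside those anchors, so it matches any name that contains it.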
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx may not match what you intend (an unescaped . matches any character, not just a period).
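To take the guesswork out of adding names, the list can also be assembled by a small script. The following is a sketch in Python under my own assumptions: `escape_name` and `build_bot_filter` are hypothetical helpers, and the set of characters treated as special is a conservative guess, not an official Analytics list.

```python
import re

# Characters that get a backslash in front of them (assumed set).
SPECIAL_CHARS = set(r".^$*+?()[]{}|\/")


def escape_name(name: str) -> str:
    """Put a backslash before each special character in a provider name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_bot_filter(exact_names, substrings=()):
    """Build ^(name1|name2|...)$|substring from plain provider names."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = [f"^({exact})$"]
    parts += [escape_name(s) for s in substrings]
    return "|".join(parts)


pattern = build_bot_filter(
    ["microsoft corp", "google inc.", "rackspace cloud servers"],
    substrings=["gomez"],
)
print(pattern)
print(bool(re.search(pattern, "google inc.")))
```

Type each provider name exactly as the report shows it, and paste the printed pattern into the filter. Spaces are left alone on purpose so the result stays readable.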
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris