Regular Expressions for Filtering Bot Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I used regular expressions posted in an article, and they eliminate what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I work out the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to" but it's over my head, and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
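Not unless there's a . after the word servers in the name. The . at the end of stumbleupon inc. is there to match the period in that ISP name. Strictly speaking, an unescaped . matches any single character (including a literal period), so stumbleupon inc\. is the exact form.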
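A quick way to see the difference between an unescaped and an escaped period is to test both against two names. This is a minimal sketch that uses Python's re module as a stand-in for the Analytics regex engine, assuming it treats . and \. the same way most engines do; the second name is made up for illustration.

```python
import re

# Unescaped ".": matches ANY single character, including a literal period.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped "\.": matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

# "stumbleupon incx" is a hypothetical name used only to show the difference.
for name in ["stumbleupon inc.", "stumbleupon incx"]:
    print(f"{name!r}: loose={bool(loose.search(name))}, strict={bool(strict.search(name))}")
```

Both patterns match the real name; only the unescaped one also matches the made-up one, which is why leaving the . unescaped usually still "works" in practice.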
-
Does it need the . before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
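To check how the ^( ... )$|gomez structure behaves before saving the filter, you can run a few ISP names through it. This is a rough sketch using Python's re module as a stand-in for the Analytics engine (an assumption); apart from "rackspace cloud servers", the sample names are hypothetical. The ^ and $ anchors apply only to the bracketed list, so those names must match exactly, while "gomez" matches anywhere in the name.

```python
import re

# The filter pattern from the post, unchanged.
BOT_FILTER = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc.|google inc."
    r"|stumbleupon inc.|rackspace cloud servers)$|gomez"
)

samples = [
    "rackspace cloud servers",      # exact match inside the ^( ... )$ group
    "rackspace cloud servers ltd",  # hypothetical: extra text, so the $ anchor fails
    "gomez networks",               # hypothetical: "gomez" may appear anywhere
    "comcast cable",                # hypothetical: not in the list
]
for isp in samples:
    print(f"{isp!r}: {'filtered' if BOT_FILTER.search(isp) else 'kept'}")
```

If the name shown in the report differs by even one character (an extra word, a trailing period), the exact-match part will not catch it.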
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
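One way to apply "combine several signals" when deciding which ISPs to add to the filter is to flag a report row only when all the bot-like numbers line up. This is a rough sketch, assuming you have exported rows from the Network report; the field names, the second row, and the 100-session threshold are hypothetical, and only the rackspace figures come from this thread.

```python
from dataclasses import dataclass


@dataclass
class IspRow:
    # One row of a hypothetical export of the Audience > Technology > Network report.
    service_provider: str
    sessions: int
    pages_per_session: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # percent, 0-100


def looks_like_bot(row: IspRow, min_sessions: int = 100) -> bool:
    """Flag a row only when several bot-like signals appear together,
    since 00:00:00 time on page alone can also come from real visits."""
    return (
        row.sessions >= min_sessions
        and row.pages_per_session <= 1.0
        and row.avg_time_on_page_seconds == 0
        and row.bounce_rate >= 100.0
    )


rows = [
    IspRow("rackspace cloud servers", 848, 1.0, 0, 100.0),  # figures from this thread
    IspRow("example residential isp", 900, 1.0, 0, 62.0),   # hypothetical
]
for r in rows:
    print(r.service_provider, looks_like_bot(r))
```

Rows that come back flagged are candidates for the ISP list; the filter itself still has to be a regex on a field Analytics can match.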
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, escape special characters such as . ( ) ? or | by putting a \ before them; otherwise your RegEx may match more than you intend (an unescaped . matches any character) or stop working altogether.
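If the list keeps growing, it can help to generate the pattern from plain ISP names so the escaping is done consistently. This is a minimal sketch; escape_for_filter and build_bot_filter are hypothetical helpers, the set of special characters is an assumption covering the common regex metacharacters, and Python's re module stands in for the Analytics engine.

```python
import re

# Common regex metacharacters; each is prefixed with a backslash
# so it is matched literally.
SPECIAL = set(r"\.^$|?*+()[]{}")


def escape_for_filter(name: str) -> str:
    """Hypothetical helper: escape regex metacharacters in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in name)


def build_bot_filter(isp_names: list[str], anywhere: list[str]) -> str:
    """Exact-match list of ISP names, plus fragments allowed anywhere in the name."""
    exact = "|".join(escape_for_filter(n) for n in isp_names)
    loose = "|".join(escape_for_filter(f) for f in anywhere)
    return f"^({exact})$|{loose}" if loose else f"^({exact})$"


pattern = build_bot_filter(
    ["microsoft corp", "inktomi corporation", "yahoo! inc.", "google inc.",
     "stumbleupon inc.", "rackspace cloud servers"],
    ["gomez"],
)
print(pattern)
```

The generated pattern escapes each period, so it is slightly stricter than the hand-written one; both match the actual ISP names.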
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris