Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies' guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc. so that it matches a literal period.
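
To make the escaping point concrete, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Analytics filter engine for a pattern this simple, which is an assumption and not something confirmed in this thread. The helper name `matches` and the misspelled test name are made up for illustration.

```python
import re

# Unescaped: "." is a wildcard that matches any single character.
loose = re.compile(r"^stumbleupon inc.$")

# Escaped: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")


def matches(pattern, name):
    """Return True if the compiled pattern matches the ISP name."""
    return pattern.search(name) is not None


print(matches(loose, "stumbleupon inc."))   # True
print(matches(loose, "stumbleupon incx"))   # True (the wildcard accepts "x")
print(matches(strict, "stumbleupon inc."))  # True
print(matches(strict, "stumbleupon incx"))  # False
```

A name with no period in it, such as rackspace cloud servers, has nothing to escape, so it can go into the list as plain text.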
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
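
If you want to check the pattern before saving the filter, a rough sketch like the one below can help. It uses a version of the pattern above with the periods escaped. It assumes the filter does a case-insensitive partial match, which is why it uses `re.search` and `re.IGNORECASE`; confirm that against your own filter settings. The function name `is_bot_isp` and every sample name other than rackspace cloud servers are hypothetical.

```python
import re

# The ^( ... )$ group must match the whole service provider name,
# while "gomez" may appear anywhere inside the name.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez",
    re.IGNORECASE,  # assumption: the filter ignores case
)


def is_bot_isp(service_provider):
    """Return True if the service provider name would be caught by the pattern."""
    return PATTERN.search(service_provider) is not None


print(is_bot_isp("rackspace cloud servers"))      # True (exact name)
print(is_bot_isp("rackspace cloud servers inc"))  # False (extra text breaks the exact match)
print(is_bot_isp("gomez networks"))               # True ("gomez" matches anywhere)
print(is_bot_isp("example broadband"))            # False (hypothetical ordinary ISP)
```

The second case is why the name has to be copied exactly as it appears in the Service Provider column.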
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx may fail or match more than you intend.
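
As a rough sketch of that "add them to the list" step, the Python below builds the pattern from a plain list of names and adds the backslashes for you. Everything here is illustrative: `escape_literal` and `build_pattern` are hypothetical helpers, not part of Analytics, and the set of special characters is a common one that may not match the filter engine exactly.

```python
import re

# Characters treated as special in most regular expression engines.
SPECIAL = set(".^$*+?()[]{}|\\")


def escape_literal(text):
    """Put a backslash in front of every special character in text."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def build_pattern(exact_names, substrings=()):
    """Build ^(name1|name2|...)$|substring from plain ISP names."""
    exact = "|".join(escape_literal(name) for name in exact_names)
    parts = ["^(" + exact + ")$"]
    parts.extend(escape_literal(s) for s in substrings)
    return "|".join(parts)


pattern = build_pattern(["google inc.", "rackspace cloud servers"], ["gomez"])
print(pattern)  # ^(google inc\.|rackspace cloud servers)$|gomez

# Sanity check that the result compiles and matches an exact name.
print(re.search(pattern, "rackspace cloud servers") is not None)  # True
```

To add another bot, copy its name exactly as it appears in the Service Provider column into the first list and paste the printed pattern into the filter.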
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris