Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
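To see what the backslash changes, here is a small sketch in Python. Analytics uses its own regex engine, so treat this as an illustration of the idea rather than an exact replica of the filter. The name "stumbleupon incx" is made up for the test.

```python
import re

# Unescaped: "." matches any single character, so near-misses slip through.
loose = re.compile(r"^stumbleupon inc.$")
# Escaped: "\." matches only a literal period.
strict = re.compile(r"^stumbleupon inc\.$")

def matches(pattern, name):
    """Return True if the pattern matches the ISP name."""
    return pattern.search(name) is not None

print(matches(loose, "stumbleupon inc."))   # True
print(matches(loose, "stumbleupon incx"))   # True, probably not intended
print(matches(strict, "stumbleupon inc."))  # True
print(matches(strict, "stumbleupon incx"))  # False
```

A name with no period in it, like rackspace cloud servers, needs no backslash at all.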
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match; it should work if the name is exactly right.
Hope this helps,
Chris
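Before pasting a pattern like this into a filter, it can help to try it against a few names. The following is a rough sketch in Python, assuming the dots after "inc" are escaped as \. and that Python's regex behaviour is close enough to Analytics for this simple pattern. The function name is_bot_isp and the test names other than rackspace cloud servers are invented for illustration.

```python
import re

# Pattern from the post above, assuming the dots after "inc" are escaped.
BOT_ISPS = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

def is_bot_isp(service_provider):
    """True if the service provider name would be caught by the pattern."""
    return BOT_ISPS.search(service_provider) is not None

print(is_bot_isp("rackspace cloud servers"))      # True: exact match
print(is_bot_isp("rackspace cloud servers inc"))  # False: ^...$ needs the whole name
print(is_bot_isp("gomez networks"))               # True: gomez is not anchored
```

This is why the name has to be exactly right: everything inside ^( ... )$ must match the full service provider name, while gomez matches anywhere in the name.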
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
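As a rough sketch of using several signals together rather than time on page alone, the Python below flags a service provider only when visit volume, pages per visit, time on page and bounce rate all point the same way. The ProviderRow fields, the looks_like_bot helper and the min_visits threshold of 100 are assumptions for illustration, not anything Analytics provides. The first row uses the Rackspace figures from this thread and the second row is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_seconds_on_page: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row, min_visits=100):
    """Flag a provider only when several signals agree, not on time alone."""
    return (
        row.visits >= min_visits
        and row.pages_per_visit <= 1.0
        and row.avg_seconds_on_page == 0
        and row.bounce_rate >= 1.0
    )

rows = [
    # Figures reported in this thread.
    ProviderRow("rackspace cloud servers", 848, 1.0, 0, 1.0),
    # Hypothetical row: a few real visitors who each viewed one page.
    ProviderRow("example broadband ltd", 3, 1.0, 0, 1.0),
]

suspects = [r.service_provider for r in rows if looks_like_bot(r)]
print(suspects)  # ['rackspace cloud servers']
```

Operating system and location could be added as further conditions in the same way.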
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to filter out? If they can be filtered by ISP organization like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them. An unescaped . matches any character, so the RegEx can match names you did not intend.
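If the list grows, escaping by hand gets error-prone. Here is a minimal sketch in Python that builds the pattern from a plain list of names. The helper names escape_name and build_filter are made up, and the set of characters it escapes is an assumption (common regex special characters plus the / mentioned above), so check the result before pasting it into a filter.

```python
# Characters escaped here are an assumption: a conservative set of regex
# special characters plus "/" as suggested in the post.
SPECIAL_CHARS = set(".^$*+?()[]{}|\\/")

def escape_name(name):
    """Put a backslash before each special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)

def build_filter(isp_names, partial_matches=()):
    """Exact-match the ISP names; match partial_matches anywhere in the name."""
    exact = "|".join(escape_name(n) for n in isp_names)
    pattern = "^(" + exact + ")$"
    for word in partial_matches:
        pattern += "|" + escape_name(word)
    return pattern

print(build_filter(["google inc.", "rackspace cloud servers"], ["gomez"]))
# ^(google inc\.|rackspace cloud servers)$|gomez
```

Each name should be typed exactly as it appears in the service provider column.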
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris