Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it eliminates what appears to be most of the bots.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \. is escaping the . at the end of stumbleupon inc.
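To see what the escaping actually changes, here is a small sketch in Python. Python's re module is only a stand-in here (Analytics uses its own regex engine), and "stumbleupon incx" is a made-up name used purely to show the difference.

```python
import re

# Unescaped: the final . means "any single character".
unescaped = re.compile(r"^stumbleupon inc.$")
# Escaped: \. means "a literal period".
escaped = re.compile(r"^stumbleupon inc\.$")
# No period in the name, so nothing needs escaping.
rackspace = re.compile(r"^rackspace cloud servers$")


def matches(pattern, name):
    """Return True if the pattern matches the given provider name."""
    return pattern.search(name) is not None


print(matches(unescaped, "stumbleupon inc."))   # True
print(matches(unescaped, "stumbleupon incx"))   # True: . matched the x
print(matches(escaped, "stumbleupon inc."))     # True
print(matches(escaped, "stumbleupon incx"))     # False: \. only matches a period
print(matches(rackspace, "rackspace cloud servers"))  # True
```

So a name with no period, like rackspace cloud servers, can go into the list as-is.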
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
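
If you want to check a pattern like this before saving the filter, one option is to try it against a few provider names locally. This is only a sketch: Python's re module stands in for the Analytics regex engine, the periods are written escaped (\.), and every sample name except "rackspace cloud servers" is hypothetical.

```python
import re

# The pattern from this post, with each period escaped so it matches literally.
BOT_ISP_PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)


def is_filtered(service_provider):
    """Return True if the provider name would be caught by the pattern.

    re.search is used so that the un-anchored "gomez" alternative can
    match anywhere in the name, while the ^( ... )$ group must match the
    whole name exactly.
    """
    return BOT_ISP_PATTERN.search(service_provider) is not None


# Hypothetical names, apart from the Rackspace one quoted in the thread.
samples = [
    "rackspace cloud servers",       # exact match, caught
    "rackspace cloud servers inc",   # extra text, not caught by the exact group
    "gomez networks",                # contains "gomez", caught
    "some local isp",                # not in the list
]

for name in samples:
    print(name, "->", is_filtered(name))
```

This is why the name has to be exactly right: anything inside ^( ... )$ must match the whole Service Provider value, while gomez matches any name that contains it.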
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through and that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: ^(microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else they will be read as regex operators and your RegEx may not match what you intend.
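
As a rough illustration of adding names to the list with the escaping done for you, here is a minimal Python sketch. The helper names (escape_name, build_isp_pattern) and the set of characters treated as special are my own assumptions, not anything Analytics provides; the output is simply a string you could paste into the filter field.

```python
import re

# Characters given a backslash here (an assumed, conservative set).
SPECIAL_CHARS = set(".\\/+*?()[]{}|^$")


def escape_name(name):
    """Put a backslash before each special character so it matches literally."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)


def build_isp_pattern(exact_names, partial_names=()):
    """Build ^(name1|name2|...)$ for exact names, plus bare partial alternatives."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = ["^(" + exact + ")$"]
    parts.extend(escape_name(n) for n in partial_names)
    return "|".join(parts)


pattern = build_isp_pattern(
    ["microsoft corp", "google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)
print(pattern)

# Quick sanity check with Python's re module as a stand-in engine.
print(re.search(pattern, "rackspace cloud servers") is not None)
```

Names go in exactly as they appear in the Service Provider column, and the helper adds the \ where one is needed.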
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris