Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on a regular expression posted in an article, and it appears to eliminate most of them.
However, there are other bots I would like to filter, and I'm having a hard time working out the regular expressions for them.
How do I determine the regular expression for additional bots so I can add them to the filter?
I read an Analytics "how to," but it's over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick, isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The \ is escaping the . at the end of stumbleupon inc.
-
Does it need the \. before the )?
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.|rackspace cloud servers)$|gomez
I just added rackspace as another match. It should work as long as the name is exactly right.
Hope this helps,
Chris
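To check a pattern like this before saving the filter, here is a rough sketch in Python. It assumes Python's `re` module treats this simple pattern the same way the Analytics filter does, and the helper name `is_filtered` and the sample provider names other than "rackspace cloud servers" are made up for illustration:

```python
import re

# The pattern from the reply above, with the trailing periods escaped.
PATTERN = re.compile(
    r"^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|"
    r"stumbleupon inc\.|rackspace cloud servers)$|gomez"
)

def is_filtered(service_provider):
    """Return True if the service provider name matches the bot pattern."""
    # Lower-casing is an assumption made here so the check ignores case.
    return PATTERN.search(service_provider.strip().lower()) is not None

# Hypothetical sample names, as they might appear in the Service Provider column.
samples = [
    "rackspace cloud servers",      # exact name
    "rackspace cloud servers inc",  # extra text, so the anchored part fails
    "gomez networks",               # "gomez" is unanchored, so it matches anywhere
    "comcast cable",                # unrelated provider
]

for name in samples:
    print(name, "->", is_filtered(name))
```

The part inside ^( ... )$ only matches when the whole name is exactly one of the listed values, which is why the exact title in the report matters. The gomez part sits outside the anchors, so it matches any name containing that word.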
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in Audience > Technology > Network, and the column shows "Service Provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the regular expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
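As a rough sketch of why a single metric isn't enough, the check below only flags a provider when several signals agree. It is written in Python for illustration; the `ProviderRow` fields, the thresholds and the `looks_like_bot` helper are all hypothetical, not anything Analytics provides:

```python
from dataclasses import dataclass

@dataclass
class ProviderRow:
    """One row of a service provider report (hypothetical field names)."""
    service_provider: str
    visits: int
    pages_per_visit: float
    avg_time_on_page_seconds: float
    bounce_rate: float  # 0.0 to 1.0

def looks_like_bot(row, min_signals=3):
    """Flag a provider only when at least min_signals indicators agree."""
    signals = [
        row.avg_time_on_page_seconds == 0,  # unreliable on its own
        row.bounce_rate >= 0.99,
        row.pages_per_visit <= 1.0,
        row.visits >= 100,                  # assumed volume threshold
    ]
    return sum(signals) >= min_signals

# Numbers taken from the rackspace example in this thread.
rackspace = ProviderRow("rackspace cloud servers", 848, 1.0, 0.0, 1.0)
# A made-up provider with real visitors that happens to show zero time on page.
ordinary = ProviderRow("example isp", 5, 2.5, 0.0, 0.4)

print(looks_like_bot(rackspace))
print(looks_like_bot(ordinary))
```

Requiring several indicators means a real visit that records 00:00:00 is not flagged on that number alone.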
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
OK, can you provide some information on the bots that are getting through and that you want to sort out? If they can be filtered by ISP organization, like the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc\.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by putting a \ before them, or else your RegEx may not match what you intend (an unescaped . matches any character).
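A minimal sketch of assembling that list so the escaping isn't done by hand, written in Python purely for illustration. The helper names `escape_name` and `build_filter_pattern` are hypothetical, the set of escaped characters is an assumption, and it assumes Python's `re` syntax is close enough to the Analytics filter syntax for patterns this simple:

```python
import re

# Characters that get a backslash in front of them. "/" is included only
# because the advice above mentions it; this set is an assumption.
SPECIAL_CHARS = set(".^$*+?()[]{}|\\/")

def escape_name(name):
    """Put a backslash before every special character in an ISP name."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in name)

def build_filter_pattern(exact_names, partial_names=()):
    """Exact names go inside ^( ... )$; partial names can match anywhere."""
    exact = "|".join(escape_name(n) for n in exact_names)
    parts = ["^(" + exact + ")$"]
    parts.extend(escape_name(n) for n in partial_names)
    return "|".join(parts)

pattern = build_filter_pattern(
    ["microsoft corp", "google inc.", "stumbleupon inc.", "rackspace cloud servers"],
    partial_names=["gomez"],
)

print(pattern)
# Quick sanity check that an escaped name still matches itself.
print(bool(re.search(pattern, "google inc.")))
```

The printed pattern is what would be pasted into the filter field. Adding another bot is then a matter of appending its exact ISP name to the list.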
-
Sure. Here's the post for filtering the bots.
Here's the RegEx posted: ^(microsoft corp|inktomi corporation|yahoo! inc\.|google inc\.|stumbleupon inc\.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris