Moz Q&A is closed.
After more than 13 years, and tens of thousands of questions, Moz Q&A closed on 12th December 2024. Whilst we’re not completely removing the content - many posts will still be possible to view - we have locked both new posts and new replies. More details here.
Regular Expressions for Filtering BOT Traffic?
-
I've set up a filter to remove bot traffic from Analytics. I relied on regular expressions posted in an article that eliminates what appears to be most of them.
However, there are other bots I would like to filter but I'm having a hard time determining the regular expressions for them.
How do I determine what the regular expression is for additional bots so I can apply them to the filter?
I read an Analytics "how to" but its over my head and I'm hoping for some "dumbed down" guidance.
-
No problem, feel free to reach out if you have any other RegEx related questions.
Regards,
Chris
-
I will definitely do that for Rackspace bots, Chris.
Thank you for taking the time to walk me through this and tweak my filter.
I'll give the site you posted a visit.
-
If you copy and paste my RegEx, it will filter out the rackspace bots. If you want to learn more about Regular Expressions, here is a site that explains them very well, though it may not be quite kindergarten speak.
-
Crap.
Well, I guess the vernacular is what I need to know.
Knowing what to put where is the trick isn't it? Is there a dummies guide somewhere that spells this out in kindergarten speak?
I could really see myself botching this filtering business.
-
Not unless there's a . after the word servers in the name. The . is escaping the . at the end of stumbleupon inc.
-
Does it need the . before the )
-
Ok, try this:
^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.|rackspace cloud servers)$|gomez
Just added rackspace as another match, it should work if the name is exactly right.
Hope this helps,
Chris
-
Agreed! That's why I suggest using it in combination with the variables you mentioned above.
-
rackspace cloud servers
Maybe my problem is I'm not looking in the right place.
I'm in audience>technology>network and the column shows "service provider."
-
How is it titled in the ISP report exactly?
-
For example,
Since I implemented the filter four days ago, rackspace cloud servers have visited my site 848 times, , visited 1 page each time, spent 0 seconds on the page and bounced 100% of the time.
What is the reg expression for rackspace?
-
Time on page can be a tricky one because sometimes actual visits can record 00:00:00 due to the way it is measured. I'd recommend using other factors like the ones I mentioned above.
-
"...a combination of operating system, location, and some other factors can do the trick."
Yep, combined with those, look for "Avg. Time on Page = 00:00:00"
-
Ok, can you provide some information on the bots that are getting through this that you want to sort out? If they are able to be filtered through the ISP organization as the ones in your current RegEx, you can simply add them to the list: (microsoft corp| ... ... |stumbleupon inc.|ispnamefromyourbots|ispname2|etc.)$|gomez
Otherwise, you might need to get creative and find another way to isolate them (a combination of operating system, location, and some other factors can do the trick). When adding to the list, make sure to escape special characters like . or / by using a \ before them, or else your RegEx will fail.
-
Sure. Here's the post for filtering the bots.
Here's the reg x posted: ^(microsoft corp|inktomi corporation|yahoo! inc.|google inc.|stumbleupon inc.)$|gomez
-
If you give me an idea of how you are isolating the bots I might be able to help come up with a RegEx for you. What is the RegEx you have in place to sort out the other bots?
Regards,
Chris
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Huge organic traffic drom after a perfect domain migration. What to do?
Hi, I already asked the question on different places. But so far nobody could help me.
Intermediate & Advanced SEO | Jun 12, 2020, 6:06 PM | Dennis1992038
Hope someone can help me out. If possible.
I migrated my website https://vihara.nl to https://meditatieinstituut.nl and lost about 80% traffic (see printscreens). It's over more than a month ago now and there is no sign of getting it back up. Maybe there is nothing to do and
1. I have to be patient and traffic comes back in a few months.
or
2. There is nothing to do and I've lost everything I've build up in the last years. Start over again to get the rankings back.
or maybe, maybe
3. I just forgot something that I still need to do to get the rankings back up. Or there is something I did not think of... This is done: The website is migrated 1 on 1. No changes in content, url, code, etc. Everything is exactly the same as on the previous domain. 301 redirects whole domain (via htaccess a bulk redirect). All the old pages, without exceptions, lead to the exact new page. The new domain is running from CDN (Cloudflare) with the same settings as the previous domain. SSL is installed in the exact same way. Domain migration set up in Search console (working). Uploaded new sitemap (working). Updated internal links. Changed the most important external links (where I could get contact after reaching out) In meanwhile received some new external links and also posted new content Anybody knows what to do? Or do I just have to be more patient and will it come back in a few months by itself? Looking forward to suggetions. Thanks! Gerjan Migratie-Meditatie-Instituut-2048x786.jpg verloop-sinds-de-start-2048x355.jpg0 -
Can Google Crawl AJAX filters?
Can Google crawl and render pages within Ajax Filters?
Intermediate & Advanced SEO | Jun 29, 2015, 2:31 PM | ScottOlson0 -
Robots.txt - Do I block Bots from crawling the non-www version if I use www.site.com ?
my site uses is set up at http://www.site.com I have my site redirected from non- www to the www in htacess file. My question is... what should my robots.txt file look like for the non-www site? Do you block robots from crawling the site like this? Or do you leave it blank? User-agent: * Disallow: / Sitemap: http://www.morganlindsayphotography.com/sitemap.xml Sitemap: http://www.morganlindsayphotography.com/video-sitemap.xml
Intermediate & Advanced SEO | Jun 17, 2015, 5:48 PM | morg454540 -
Spike then Drop in Direct Traffic?
We've been doing some SEO work over the last few weeks and earlier this week we saw a large spike in traffic. Yay we all thought, but then yesterday the traffic levels returned to pre-celebratory levels. I've been doing some digging to try and find out what was different Monday and Tuesday this week. Mondays are usually big traffic days for us anyway, but this week was by far the biggest, and Tuesday was even higher still, our best day ever. After some poking, I found that the direct traffic followed the same pattern as our overall traffic levels (image attached). The first spike coincides with an email we sent out that day, but the later spike we just don't know where it came from? I understand loosely that direct isn't easily traceable, but can anyone help us understand more about this second spike? Thanks! ayqL2wi
Intermediate & Advanced SEO | May 24, 2015, 9:47 PM | HB170 -
Subdomains vs directories on existing website with good search traffic
Hello everyone, I operate a website called Icy Veins (www.icy-veins.com), which gives gaming advice for World of Warcraft and Hearthstone, two titles from Blizzard Entertainment. Up until recently, we had articles for both games on the main subdomain (www.icy-veins.com), without a directory structure. The articles for World of Warcraft ended in -wow and those for Hearthstone ended in -hearthstone and that was it. We are planning to cover more games from Blizzard entertainment soon, so we hired a SEO consultant to figure out whether we should use directories (www.icy-veins.com/wow/, www.icy-veins.com/hearthstone/, etc.) or subdomains (www.icy-veins.com, wow.icy-veins.com, hearthstone.icy-veins.com). For a number of reason, the consultant was adamant that subdomains was the way to go. So, I implemented subdomains and I have 301-redirects from all the old URLs to the new ones, and after 2 weeks, the amount of search traffic we get has been slowly decreasing, as the new URLs were getting index. Now, we are getting about 20%-25% less search traffic. For example, the week before the subdomains went live we received 900,000 visits from search engines (11-17 May). This week, we only received 700,000 visits. All our new URLs are indexed, but they rank slightly lower than the old URLs used to, so I was wondering if this was something that was to be expected and that will improve in time or if I should just go for subdomains. Thank you in advance.
Intermediate & Advanced SEO | Jun 4, 2014, 9:32 AM | damienthivolle0 -
Can a large fluctuation of links cause traffic loss?
I've been asked to look at a site that has lost 70/80% if their search traffic. This happened suddenly around the 17th April. Traffic dropped off over a couple of days and then flat-lined over the next couple of weeks. The screenshot attached, shows the impressions/clicks reported in GWT. When I investigated I found: There had been no changes/updates to the site in question There were no messages in GWT indicating a manual penalty The number of pages indexed shows no significant change There are no particular trends in keywords/queries affected (they all were.) I did discover that ahrefs.com showed that a large number of links were reported lost on the 17th April. (17k links from 1 domain). These links reappeared around the 26th/27th April. But traffic shows no sign of any recovery. The links in question were from a single development server (that shouldn't have been indexed in the first place, but that's another matter.) Is it possible that these links were, maybe artificially, boosting the authority of the affected site? Has the sudden fluctuation in such a large number of links caused the site to trip an algorithmic penalty (penguin?) Without going into too much detail as I'm bound by client confidentiality - The affected site is really a large database and the links pointing to it are generated by a half dozen or so article based sister sites based on how the articles are tagged. The links point to dynamically generated content based on the url. The site does provide a useful/valuable service/purpose - it's not trying to "game the system" in order to rank. That doesn't mean to say that it hasn't been performing better in search than it should have been. This means that the affected site has ~900,000 links pointing to is that are the names of different "entities". Any thoughts/insights would be appreciated. I've expresses a pessimistic outlook to the client, but as you can imaging they are confused and concerned. LVSceCN.png
Intermediate & Advanced SEO | Jun 4, 2014, 3:44 AM | DougRoberts0 -
Lost 86% of traffic after moving old static site to WordPress
I hired a company to convert an old static website www.rawfoodexplained.com with about 1200 pages of content to WordPress. Four days after launch it lost almost 90% of traffic. It was getting over 60,000 uniques while nobody touched the site for several years. It’s been 21 days since the WordPress launch. I read a lot of stuff prior to moving it (including Moz's case study) and I was expecting to lose in short term 30% of traffic max… I don’t understand what is wrong. The internal link structure is the same, every url is 301 to the same url only without[dot]html (ie www.rawfoodexplained.com/science.html is 301′s to http://www.rawfoodexplained.com/science/ ), it’s added to Google Webmaster tool and Google indexed the new pages… Any ideas what could be possible wrong? I do understand the website is not optimized (meta descriptions etc, but it wasn't before either) .... Do you think putting back the old site would recover the traffic? I would appreciate any thoughts Thank you
Intermediate & Advanced SEO | Oct 20, 2013, 12:01 PM | JakubH0 -
Subdomain Blog Sitemap link - Add it to regular domain?
Example of setup:
Intermediate & Advanced SEO | Jun 14, 2013, 3:21 PM | EEE3
www.fancydomain.com
blog.fancydomain.com Because of certain limitations, I'm told we can't put our blogs at the subdirectory level, so we are hosting our blogs at the subdomain level (blog.fancydomain.com). I've been asked to incorporate the blog's sitemap link on the regular domain, or even in the regular domain's sitemap. 1. Putting the a link to blog.fancydomain.com/sitemap_index.xml in the www.fancydomain.com/sitemap.xml -- isn't this against sitemap.org protocol? 2. Is there even a reason to do this? We do have a link to the blog's home page from the www.fancydomain.com navigation, and the blog is set up with its sitemap and link to the sitemap in the footer. 3. What about just including a text link "Blog Sitemap" (linking to blog.fancydomain.com/sitemap_index.html) in the footer of the www.fancydomain.com (adjacent to the text link "Sitemap" which already exists for the www.fancydomain.com's sitemap. Just trying to make sense of this, and figure out why or if it should be done. Thanks!0