Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Does Search Console data include GMB traffic? Branded CTR is 37.8%- Good or Bad?
Hey all, Per Search Console our branded keyword CTR is 37.8%. But when that keyword is searched our GMB listing shows up on top of the #1 result. For the same 90 day period GMB shows another 35% visits to our GMB (based on the number of impressions and visits to our GMB page) listing when the same keyword is searched. My question is this. Does Search console data include clicks that came from our GMB listing or not? My thinking is like this: If GMB traffic is not calculated in search console then it means that 72.8% of people looking for our brand will end up on our site on way or another 9organic #1 result plus GMB listing visits) We are also doing PPC for this very keyword that has gets almost 20% of the remaining traffic. So after adding all up we are loosing about 8% of our branded traffic to people who are doing adwords. When you search our brand you normally see 2, 3 competitor's adwords ads. Does anyone know how this works exactly? And if you don't mind sharing your branded keyword CTR's, so I can compare to ours please. I would love to compare to a site that actually has a GMB listing ranking for the same keyword Thanks in advance, Davit
Reporting & Analytics | | Davit19850 -
Search Console Crawl Errors/Not Found - Strange URLs
Hello, In Google Search Console under Crawl > Crawl Errors > Not found I have strange URLs like the following: https://www.domain.com//UbaOZ/
Reporting & Analytics | | chuck-layton
https://www.domain.com//UPhXZ/
https://www.domain.com//KaUpZ/WYdhZ/SnQZZ/MOcUZ/ There is no info in Linked From tab. Have you seen this type of error??
Does anyone know whats causing it??
How should it be fixed?? Thanks for reading and the help!0 -
How to Diagnose "Crawled - Currently Not Indexed" in Google Search Console
The new Google Search Console gives a ton of information about which pages were excluded and why, but one that I'm struggling with is "crawled - currently not indexed". I have some clients that have fallen into this pit and I've identified one reason why it's occurring on some of them - they have multiple websites covering the same information (local businesses) - but others I'm completely flummoxed. Does anyone have any experience figuring this one out?
Reporting & Analytics | | brettmandoes2 -
Organic reports showing a URL that isn't in Search Ask Question
In the image I've attached you can see that I have pulled a source/medium > google organic report. I've also made "landing page" my secondary dimension. The first landing page that is showing up is /v3/?slug=fnl, that is this page (https://orders.freshnlean.com/v3/?slug=fnl). You can see that the page has 230 sessions from Sep 3 - 9 and 17 transactions during that same time frame. The only thing is, that landing page is nowhere to be found in the SERPs. So how is it showing up in this report as having received google organic visitors that converted if it's not even in search? 05OclDp
Reporting & Analytics | | tdastru0 -
In Google Analytics > Acquisition > Queries. The top query just says (other), can someone fill me in on to what this means?
I have linked the account to the Google Search Console. In top spot is says (other) followed by other keywords the (other) data is the majority.
Reporting & Analytics | | GSAstonMoz0 -
Having Issue with Site Search in Analytics
Hi Mozzers, We launched a website in October 2012 and have added in the settings(Google analytics) of that profile "Do Track Site Search" since we have a search box on the website. The site search report worked for 10 days and it was over(from end of december till beginning of January 2013). Since then I have been trying to understand this issue. I have added all the query search terms possible, but still not showing any signs of life. At this point I am not sure what to do? Some Help would be appreciated! Search URL= subdomain.example.com**/search/node/**.... Thanks! z93cGUZ.png
Reporting & Analytics | | Ideas-Money-Art0 -
Filter out IP address of Site Search analytics
Hi Mozzers, I have a filter that excludes all internal traffic from my sites. But this does not seem to work on site search > Search Terms See here:- http://productforums.google.com/forum/#!searchin/analytics/filter$20site$20search/analytics/pO18L31hEO4/tJ3lKVNT3YYJ Any ideas? Or is it a bug, etc Thanks S
Reporting & Analytics | | Metropolis0 -
My own brand searches stopped appearing in google
Hi everyone, I've never asked a question here before, so go easy on me 🙂 I've noticed recently that my site is not showing up for my own brand searches anymore, and that my organic google results are way down. For example, it used to be if you would search for "grouvee kings quest 6" you would either see http://www.grouvee.com/news/kings-quest-6-retro-rerun-playthrough-part-1/63/ or http://www.grouvee.com/games/kings-quest-vi-heir-today-gone-tomorrow/69/ show up first in google. Now they're nowhere to be found. Here's another example. If you would search for "grouvee mass effect 3" you'd either see http://www.grouvee.com/games/mass-effect-3/75/ or http://www.grouvee.com/news/mass-effect-3-lets-take-the-earth-back/136/ show up first in google. Once again, nowhere to be found. I could go on with several other examples, but I don't really know what I've done. I 301 redirected 2 pages a few weeks ago because I added a /reviews/ structure to my site and needed to move a couple of pages that used to be under /news/ over to the new section, but those shouldn't have had anything to do with the pages I referenced above. I haven't done any link building really. I submit some of my articles occasionally to places like reddit or n4g, but that's about it. Anyone have any tips or things I should be looking at?
Reporting & Analytics | | petecorsaro0