Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Query on google analytic event report and customized report - Please help.
For my site I placed Black Friday offer.Now I have highlighted one product at homepage banner and tracking that with event. I am sharing with you actual figures of 1 day from Google analytic. Your proper response on this query will really really really helpful to me for understanding google analytic deeply and properly. So pls give your precious time for answering me, I will be thankful to you from bottom of my heart. I am checking event report from behavior, following is report for by that product event - Sessions - 30
Reporting & Analytics | | pragnesh9639
Unique Events - 30
Total Events - 34
Ecommerce Conversion Rate - 3.33% I am checking now report from All pages via Behaviour section - Page views - 97, Unique Page Views - 62, Entrances - 7, Bounce Rate - 57.14%, %Exit - 17.53% Now I create customize report for the same by adding
session -7
Ecommerce conversion rate - 0
unique events -4 MY query - when I added session in customized report then it is different from Event session why (7 and 30) ? same way Ecommerce conversion rate in customized report showing 0% why ( 0% and 3.33% ) ? same way unique events 4 and 30 ? do you suggest what is the best way to analysis such page or what will be best customize report for such analysis? Thanks0 -
What is the "UPDATE" indicate in the Google Search Console Query Reports?
We recently noticed an update note in the Google Search Console that happened on April 27th. Does this denote an algorithm update? Any feedback or article would really be helpful. Thanks! gfQ8FG9.jpg
Reporting & Analytics | | RosemaryB0 -
New GSC Search Analytics report: position mixes web and image
Dear all, I am auditing a site in Google Seach Console (GSC, formerly Google Webmaster Tools) and find the Position data in the new Search Analytics report very, very improbable. I suspect that even if you filter by "SearchType = web", the Position data does count the ranking of images in the Image search widget as a search position. Has anybody observed this as well? Here is the case: the site targets a quite broad search query in the bath room domain. I have made a number of searches with private browser sessions, different browsers, alternative IP address via a VPN, etc, and the look of the search result in the relevant geographical market is consistently the following. Three Adwords ads #1 organic result Images universal results widget #2-10 organic results The site’s first page ranks consistently around #15 of the organic results, hence on the second SERP. But it also consistently has an image in the Images universal results widget (usually #2 or #3). This is consistent with the data I have in Moz Analytics. Yet, the GSC Search Analytics report shows 2.2 as average position with the default SearchType=Web setting. I have done the search over and over, and never has a PAGE of the site ranked that high. Is there any public information how exactly the position is calculated? I mean, something more precise than the very general information on https://support.google.com/webmasters/answer/6155685?hl=en Is there any way to get the correct position/ranking? Thanks for sharing your experience!
Reporting & Analytics | | QRN0 -
Can not divide in different properties a domain in Search Console (Webmaster Tools)
Dear Moz Community, I hope you can give me a hand with the following questions. Im in charge of SEO of an ecommerce site in LATAM. It´s service is available in several countries, therefore each country has it subdirectory Eg. /ar /pe /co /bo /cl /br,etc... (in the future we will move to differente ccTLDs). I have been recomended to split or create different Search Console or Webmaster Tools properties (one for each subdirectory) but when Im creating a new property with a subdirectory, lets say www.domain.com/ar, Webmaster tools starts creating a property for www.domain.com/ar/ (NOTICE THE LAST SLASH) and it returns since that page doesn´t exist, what do you recomend me to do? Best wishes, Pablo Lòpez C
Reporting & Analytics | | pablo_carrara0 -
Has anyone seen their GWT Search Queries data resolve?
My GWT data seems to be have resolved through 9/28, but still looking low for the week starting 9/29 through present.
Reporting & Analytics | | EmpireToday0 -
Search Traffic Drops Before It Improves?
I'm working on a site with tons of great, useful content....the owners of the site implemented a new site layout and design (complete overhaul) and they were lacking basics such as meta descriptions, 301 redirects, and, shockingly, they had the same Title tag for every single page on a site with thousands of unique how-to articles. Unsurprisingly their traffic dropped by about 300%. They generate most of their traffic from people learning how to build stairs, how to install crown molding, and other related matters. Beginning last Thursday I've been performing basic on-site SEO, things like having unique titles for each page and similar tasks. The week from Thursday when I began until yesterday (Wednesday), Google traffic dropped -29.73% - 17,715 vs 25,210 I believe this is a normal part of the "Google Shuffle" -- does anyone have a Matt Cutts link or similar proof that this is a normal part of the process?
Reporting & Analytics | | wattssw0 -
Reasons for drop in URLs Receiving Entrances Via Search
Hi I'm having trouble understanding why I'm getting the results I am for my organic traffic data. I've been focussing on a few keywords throughout my website and the most recent results show that there is a big increase in the Organic Search Visits and the Non-Paid Keywords Sending Search Visits for both Branded Keywords and Non-branded Keywords, but the results for URLs Receiving Entrances Via Search are the complete opposite. Down by a few percent. I don't understand why this would happen and was hoping that someone could maybe explain and give a few reasons for why this is happening and maybe give some tips on how to stop it from happening in the future if possible. Thanks.
Reporting & Analytics | | Bonx0 -
SERPS different based on location of search even with non-personalization
Hello Mozzers, Our agency's website, www.kenta.ro - ranked for a long time at #1 for "ann arbor seo" and similar keywords. For the past several (4-5) months we've been sitting around #5. My guess was that this was Google playing around with the results but I'm not sure why we have been at this position for such a long time. I have a vpn that I use for checking rankings overseas and if I connect to a server in Chicago, LA, Ontario, etc we show up as #1 - only when you search for "ann arbr seo" in our area do we get a lower ranking. All rank checking programs including seomoz show us at the #1 position because of this. What this means for us is that all of the traffic we target with this keyword sees the poor result, while the rest of the world sees the great result (should they search for it). How can we ensure that our target market finds us at #1 like the rest of the world does? Thank you in advance. weabi.png
Reporting & Analytics | | kentaro-2569290