Ways to analyze a 1M rows dataset of search queries
-
Hi,
I have this large dataset, about 1 million search queries with visits, bounce rate and a few other metrics. I'm trying to explore this data to find keyword "buckets" (such as include product name, location name, transactional objective, informational, etc.), as well as explore the density of certain keywords (keywords as in instances of a single word amongst all queries)
My idea was to use Excel and a macro to split all queries in separate words (also clearing punctuation and uppercase/lowercase), then storing this word in a new worksheet, adding to another column the visit counts from the row where the word was extracted (as to give a sense of weight). Before adding the word to the new worksheet, the script will look if the word already existed, if so it would just add the current value of visits to the existing visit counts etc.
In the end it will create sort of a "dictionary" of all the keywords in all search queries ranked by weight (= visits from search query including this keyword)
This would help me get started I believe, because I can't segment and analyze 1M raw search queries...
My issue is: this VBA has been running on my (fast) PC for the last 24hr and it doesn't seem to get to an end. Obviously excel+VBA is not the best way to do text mining and manipulation in such a large dataset (although it's just a 30mb file)
What would you do if you had this dataset and would like to mine the text/semantic as I am doing? Any idea of tools? process?
I'm considering dumping this data into a MySQL db and doing the processing through PHP (the only backend language I'm versed in), and getting the "summified" data stored into another table, which I'll then be able to export to a Excel for analysis. But I'm afraid that I'll be facing memory limit issues and such...
In the meantime, I'm definitely interested into knowing what you guys would do if you had this data and wanted to simply start exploring its constituencies
Thanks!
-
Yeah, Access can process any number of rows. It's Microsoft's database program. You can upload data, and then create queries. They have a design view where you can construct queries in a WYSIWYG fashion, or if you want, you can write your own SQL.
-
Thanks a lot John!
I'm going to try this out tonight!
So, I assume, Access won't have the same processing limitations with 1 million rows, will it?
Once I'll be done with the "discovery phase" I'm going through with this keyword list, I'll definitely use Advanced filters (in Excel) as you recommend to understand keyword groups in details
-
I had a similar problem going through my search query reports. If you're already familiar with VB you could do this with a Microsoft Access database rather than setting up a MySQL one w/PHP. I've been working on creating an Access database that I can import my data into, and have it spit out all sorts of useful info (for example negative keywords and placements), but it's only in its early stages right now.
If you just want to see it for a few terms and don't mind doing it one at a time, in the past I've filtered data like this in Excel without VB using advanced filters. I found that using advanced filters rather than VB sped up the process quite a bit; I'd imagine because it's an innate Excel function. Using 4 filters you can match whole words in the queries. For example, to find queries with "blah", you'd set a filter for "blah", "* blah", "blah " and " blah *". Then you can use the Subtotal command to do calculations over the visible rows and calculate the data.
More about advanced filters: http://office.microsoft.com/en-us/excel-help/filter-by-using-advanced-criteria-HP005200178.aspx
Got a burning SEO question?
Subscribe to Moz Pro to gain full access to Q&A, answer questions, and ask your own.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why would on-site search queries show up as referral traffic?
The site analytics have been set up for over a year and suddenly last month there was a huge spike in referral traffic (1100+ sessions). Upon further investigation, the majority of it was coming directly from internally, either as mysite.com or search.mysite.com and the landing pages from the referrals are all /search.html?query=* This was never an issue before so I'm trying to understand what could have changed. I'm following up with the client to find out if their dev team may have changed anything related to their search engine but I'm wondering if there may be another explanation. A few notes: previously mysite.com / search.mysite.com were not in the Referral Exclusion list. I've added them now but this was never an issue before. Thanks in advance!
Reporting & Analytics | | SEMnMs0 -
Google Analytics SEO Queries Not Showing
Hi All, This might be a silly question, but for all the properties I monitor in Google Analytics, I'm now showing no data for SEO Queries under Acquisition for the past 6 days. Normally I would expect a few day delay in queries, but nothing for 6 days is somewhat peculiar especially as it was functioning fine prior to November 12th. Does anyone have insight into what might be going on? Thanks! URaNMa3
Reporting & Analytics | | amichaels0 -
Google Web Master Tools show that my site has been crawled, but search results show old title tags, ect.
Does the index report reflect what will be displayed on Google results? Google seems to be indexing my site every Sunday...
Reporting & Analytics | | 928shopper0 -
Cross-Domain Tracking Urgent Query :-(
Hi Mozzers! One of my clients is having an issue with cross-domain tracking, in other words their own domain is seen as a huge referrer. When you land on their site, which is www.sunway.ie, and then choose a holiday to book it then takes you to another domain which is www.sunwayholidays.ie, during the booking process. I'm just wondering if there is a Google Analytics genius out there who may be able to take a quick look and let me know if there is any obvious solution to this within the Google Analytics code? Thanks in advance everyone! Gavin
Reporting & Analytics | | strategemilabs0 -
Has anyone seen their GWT Search Queries data resolve?
My GWT data seems to be have resolved through 9/28, but still looking low for the week starting 9/29 through present.
Reporting & Analytics | | EmpireToday0 -
What is the best way to track mobile sites in Google Analytics?
Hello! I am wondering what the pros and cons of using the regular Google Analytics tracking code on a mobile site versus the tracking documentation from Google specifically on it found at http://code.google.com/mobile/analytics/docs/web/ which is still in labs mode. Does the mobile specific tracking have the same features as the regular one to be able to track events and report the same statistics? Thanks for the help on this one!
Reporting & Analytics | | CabbageTree0 -
Google Analytics - April Search Report Data Gone For All Clients - Anyone Else Seeing This?
Hey everyone, In the last hour I was doing some SEO referral reportingin Google Analytics and discovered that my April data was completely gone (flat lined to zero). Specifically, if i select a date range for any point in april and select search from the advanced segments area I am seeing this issue. This is happening for ALL of my active clients... anyone else seeing this?
Reporting & Analytics | | dpeddle0 -
How to measure number of visits from Google News coming from Google Universal Search (NOT referral coming directly coming from news.google.com) with google analyitcs
I'm running a news site, and I have a problem of accuratly measuring which traffic is REALLY coming from google news. I analyzed a lot of individual articles and I come to the conclusion, that the visits, that come from the google news section in the universal search results are counted as "normal" search engine traffic in google analytics. So if you do a Google search for a topic that includes links from Google news, you don't get an accurate referral count. As an example, if you do a search for "eBay", incorporated into the page 1 search results you may also see Google news results as well.
Reporting & Analytics | | Mulle
If someone clicks on that Google news link that appears in Google search, it shows up in Google analytics as a referral from Google search, when it was actually from a Google news referral. I was already checking google analytics and google news help forums and searched SEO blogs for this. But I wasn't able to find a working solution. Can anybody help me out with this problem? Thanks so much, Matthias0