Goal: Needed to do some simple analytics on a large set of data (plenty of columns and rows) stored in an Excel sheet. But my Excel skills are very limited, so I couldn’t easily figure out how to group data by ranges in different columns, or how to count the number of people who had specific values in specific columns. For example, how many rabbits are in age range 5-10? Or to make it a two column problem, how many rabbits are white and weight more than 10kg? Assuming color, age, and weight would be columns.
My cunning plan. Import the excel sheet into an SQL database and run queries. At least I can do something with that. So to give it a try using MariaDB:
1. Save the Excel sheet as .CSV file. By default the columns are separated by “;” which works fine for me.
2. Check the column names and data types and create a matching DB schema.
3. Start MariaDB with option to allow reading data from file
mysql --local-infile -u root. This allows reading data from a file into the database, as otherwise it is forbidden (some security feature?)
4. Run a command in the DB console along the lines of:
LOAD DATA LOCAL INFILE 'mydata.csv' INTO TABLE mydb.mytable FIELDS TERMINATED BY ';' ENCLOSED BY '' LINES TERMINATED BY '\r\n' (color, age, weight);
5. Run my queries. For example:
SELECT SUM(CASE WHEN age < 5 THEN 1 ELSE 0 END) AS 'Under 5', SUM(CASE WHEN age BETWEEN 6 AND 10 THEN 1 ELSE 0 END) AS '6-10', SUM(CASE WHEN age > 10 THEN 1 ELSE 0 END) AS 'Over 10' FROM mytable;
6. Dump the results into another file to process them again in Excel: add
INTO OUTFILE '/mypath/myfile.csv' to the end of the query above. Note that the path is better set as absolute, otherwise the file seems to go somewhere else than current directory..
Or maybe I could just dump all this into Elasticsearch and Kibana as in my previous posts for Minecraft/Prism, and play with that as well 🙂