Elasticsearch and Kibana (or add Logstash and call it the ELK stack) seem to be getting quite popular, so I figured I would try them out and see what they are about.
A common use case seems to be monitoring syslog entries by using Logstash to push the log data into Elasticsearch (ES), ES to index and analyze it, and Kibana to visualize the results. I wanted to try something a bit different, and use it to analyze performance data for a server instance. I had previously done some Minecraft server setups for other purposes so I used that as a test target.
The goal here is to see how different player actions and usage profiles impact the server resource consumption.
I did this at home so the setup was pretty simple. This figure illustrates what I had running:
Briefly summarizing, this contains the following components:
- Mac/OSX. The physical computer hosting all this.
- Ubuntu VM. A VirtualBox instance hosting Ubuntu Desktop 14.04LTS in a virtual machine.
- Spigot: A Minecraft server version capable of hosting the required plugin (Prism). I used the build for Minecraft 1.7.10, since this is what Prism supports and I (thought I) needed Prism for my experiments.
- Prism: A Minecraft server plugin that tracks a number of events happening inside the server.
- Resource probes: A set of probes collecting information on resource use inside the Ubuntu VM.
- MariaDB: The database where Prism dumps its event logs. Prism supports MySQL and MariaDB is compatible so I use that.
- ES importer: Since I want to analyze the data through Elasticsearch, I need to get the Prism data there. This component dumps the data of choice from MariaDB to ES.
- Elasticsearch: Elasticsearch v1.4.4.
- Kibana: Kibana webserver v4.0.0.
- Browser: Any regular browser to set up and view the visualizations.
The goal is to isolate the monitored server from everything else as much as possible, so I run it inside the VM along with only the components needed to collect monitoring data.
ES integration for the data collection
Everything else here is regular open source components, but I wrote the resource probes and the ES importer myself to get the data of interest into ES. The resource probes component uses a Python library (psutil) to capture resource usage data and dumps it directly into ES. The ES Python API/library is pretty simple to use, so no problem there. The resource probes could use some more configuration options, as they currently dump a lot of information that is not used in this experiment. But I managed to run my experiment, so..
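To give an idea of what such a probe can look like, here is a minimal sketch. It is not the actual probe code: the function names, the choice of metrics, the "time" field name, and the "session1"/"probe" index and type names are all assumptions made for illustration, and it posts over plain HTTP with the standard library instead of using the ES Python library.

```python
import json
import time
import urllib.request


def sample_resources():
    """Read a few system-wide counters with psutil (third-party package)."""
    import psutil  # imported lazily so the rest of the sketch works without it

    mem = psutil.virtual_memory()
    net = psutil.net_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),  # blocks for 1 second
        "mem_percent": mem.percent,
        "mem_used": mem.used,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }


def build_document(metrics, now=None):
    """Copy the metrics and attach an epoch timestamp in milliseconds."""
    if now is None:
        now = time.time()
    doc = dict(metrics)
    doc["time"] = int(now * 1000)  # field name "time" is an assumption
    return doc


def send_to_es(doc, url="http://localhost:9200/session1/probe"):
    """POST one document; index 'session1' and type 'probe' are assumptions."""
    request = urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def run(interval_seconds=5):
    """Sample and send forever; stop with Ctrl+C."""
    while True:
        send_to_es(build_document(sample_resources()))
        time.sleep(interval_seconds)
```

Calling run() inside the VM would then push one document every few seconds, assuming psutil is installed and ES is reachable at the given URL.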
I also wrote the ES importer component myself, after spending a few evenings wondering why I could not get the ES river/feeder component configured to do the same thing. I finally figured I would spend less time writing my own than debugging something that seemed to lack documentation. At least then I also know how to use it, since I made it :).
To make a pointless rant, the ES Java API/library is another instance of the typical Java Maven approach of “insert this dependency into your Maven pom”… And the pom for the ES API itself is what, hundreds of lines to go through trying to figure out all the transitive dependencies and configurations. As I just wanted a single jar file I could drop in to run the importer by itself, forget that.
So the ES importer component is actually just two small classes packaged as a single jar file, with zero dependencies outside the standard JRE, except for your JDBC driver. Well, that is how I like it. I am sure most of the Java community disagrees and loves the Maven way, but I digress..
The ES importer uses the ES REST API to do bulk inserts into ES. Initially it reads everything it is configured to read from the DB and dumps that into ES. It then polls the DB for new data at given intervals and sends any new items to ES when found.
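The importer itself is Java, but the bulk format is easy to show in a few lines of Python. This is only a sketch of how a request body for the ES bulk endpoint can be put together: one action line followed by one document line per row, each terminated by a newline. The helper names and the "session1"/"prism" index and type names are assumptions, not what the importer actually uses.

```python
import json
import urllib.request


def build_bulk_body(rows, index="session1", doc_type="prism"):
    """Turn a list of row dicts into a newline-delimited bulk request body."""
    if not rows:
        return ""  # nothing to send; the caller should skip the request
    lines = []
    for row in rows:
        # Action line: index this document, reusing the DB row id as the ES id.
        action = {"index": {"_index": index, "_type": doc_type, "_id": row["id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # The bulk endpoint expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def post_bulk(body, url="http://localhost:9200/_bulk"):
    """Send the body to the bulk endpoint and return the parsed response."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Reusing the row ID as the document ID means that sending the same row twice overwrites the earlier copy instead of duplicating it, which seemed like a reasonable choice for a polling importer.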
To visualize our data in Kibana, we need to be able to associate the different types of data with each other. Since Kibana uses the Elasticsearch query features to access the data, we need to get the data into ES in a format that suits those queries. Timestamps would be nice.
To have Kibana visualize our data as a time series, the data needs to have a field of type “date”. That is, we must configure ES to recognize our “timestamp” as this type. Internally, ES stores timestamps as Epoch time. That is, milliseconds since 1.1.1970 00:00 UTC. There are two problems related to this.
The first is that if we just dump the data into ES, ES will automatically create the mapping (which is pretty much the same as a schema in a traditional database), and it will think the timestamp is a long integer value and not a “date” value. So we have to explicitly tell it to set the field type to date. The schema definition to do this comes with the resource probes component, so we can just use that. A command such as “curl -XPOST 'http://localhost:9200/session1' --data-binary @schema.json” should do it.
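For reference, a schema file of this kind could be as small as the following sketch. This is not the schema.json shipped with the resource probes; the type name "probe" and the field name "time" are assumptions, and only the date field is mapped explicitly while everything else is left to automatic mapping.

```python
import json

# Hypothetical index definition in the ES 1.x layout:
# mappings -> type name -> properties -> field name -> field settings.
SCHEMA = {
    "mappings": {
        "probe": {  # type name is an assumption
            "properties": {
                # Treat the numeric value as a date (epoch milliseconds).
                "time": {"type": "date"},
            }
        }
    }
}


def schema_json():
    """Render the schema as the JSON text that goes into schema.json."""
    return json.dumps(SCHEMA, indent=2)


def write_schema(path="schema.json"):
    """Write the schema to a file that curl can send with --data-binary."""
    with open(path, "w") as handle:
        handle.write(schema_json())
```

The mapping has to be in place before the first document is indexed, since ES will otherwise have already guessed a long type for the field.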
The second problem is that ES stores the timestamps internally as milliseconds. While Prism also timestamps the events using Epoch time, it does so at one second precision. So the values differ by a factor of 1000, and ES will not properly recognize the timestamp value if given in seconds. To get this right, we have to tell the ES importer component to use a specific SQL query to do the conversion before dumping the data to ES. The SQL I used is:
SELECT prism_data.id, prism_data.epoch*1000 as time, prism_data.action_id,
       prism_data.player_id, prism_data.x, prism_data.y, prism_data.z,
       prism_data.block_id, prism_players.player, prism_actions.action
FROM prism_data
INNER JOIN prism_players on prism_data.player_id = prism_players.player_id
INNER JOIN prism_actions on prism_data.action_id = prism_actions.action_id
WHERE prism_data.id > ?;
Since the ES importer picks the field names from the query results, this will result in storing a field named “time” with the epoch value multiplied by 1000 to turn seconds into milliseconds as expected by Elasticsearch. The JOIN clauses are simply used to give the visualizations more readable names. The WHERE clause gives the importer a way to query only the rows that have been added since its previous run. You could use the timestamp here as well, as it is a long integer. Since Prism adds an autoincrementing ID value, I just use that.
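The polling idea can be tried out without a MariaDB instance. The sketch below uses Python's built-in sqlite3 as a stand-in database, with a cut-down prism_data table and a simplified query without the JOINs. The function names and the sample rows are made up for illustration; only the "epoch*1000 as time" conversion and the "id > ?" filter follow the query above.

```python
import sqlite3

# Simplified version of the importer query: no JOINs, fewer columns.
QUERY = (
    "SELECT id, epoch * 1000 AS time, x, y, z "
    "FROM prism_data WHERE id > ? ORDER BY id"
)


def fetch_new_rows(conn, last_id):
    """Return rows with id above last_id, plus the updated high-water mark."""
    cursor = conn.execute(QUERY, (last_id,))
    # Field names come from the query result, so the alias "time" is used.
    names = [column[0] for column in cursor.description]
    rows = [dict(zip(names, values)) for values in cursor.fetchall()]
    if rows:
        last_id = rows[-1]["id"]
    return rows, last_id


def demo():
    """Run three polls against an in-memory table with made-up events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prism_data ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "epoch INTEGER, x INTEGER, y INTEGER, z INTEGER)"
    )
    insert = "INSERT INTO prism_data (epoch, x, y, z) VALUES (?, ?, ?, ?)"
    conn.execute(insert, (1425000000, 10, 64, -3))
    first, last_id = fetch_new_rows(conn, 0)         # initial full read
    conn.execute(insert, (1425000005, 11, 64, -3))
    second, last_id = fetch_new_rows(conn, last_id)  # only the new row
    third, last_id = fetch_new_rows(conn, last_id)   # nothing new
    conn.close()
    return first, second, third, last_id
```

Each poll only needs to remember the largest ID it has seen, which is the main attraction of filtering on the autoincrementing ID.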
And then off we go to build some visualizations using Kibana (in the next post).