Recently I was upgrading parts of my search app, which uses Elasticsearch. The good people at ES had been promoting ES 5.0 for a long time while it was in beta, and once it came out of beta I figured I might as well upgrade that too. It did not turn out to be quite so simple. Here are some pointers from along the way.
There is a long list of breaking changes on their website. I only had to deal with a few of them, but a few points were not clearly explained there. My experiences:
A few basic notes:
- The “string” mapping type is now either “keyword” or “text”. This is rather straightforward, although it might take a few re-indexes.
- The “index” property in type mappings. The breaking changes list says it now only supports “true” or “false”, as opposed to “analyzed” etc. from before. But the old style of “analyzed” still seems to work for me (at least I get no error). Not sure if I should investigate more.
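To make the “string” split concrete, here is a minimal sketch of how I think about translating an old string field into a new one. The class and method names are made up for illustration. The “analyzed” to “text” and “not_analyzed” to “keyword” cases follow the breaking changes list; how to treat the old “no” value is my own assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StringMappingUpgrade {

    // Hypothetical helper: takes the old "index" value of a 2.x "string" field
    // and returns the properties I would put in the 5.x mapping instead.
    static Map<String, Object> upgrade(String oldIndexSetting) {
        Map<String, Object> field = new LinkedHashMap<>();
        switch (oldIndexSetting) {
            case "analyzed":
                // Full-text content goes to the new "text" type.
                field.put("type", "text");
                break;
            case "not_analyzed":
                // Exact values go to the new "keyword" type.
                field.put("type", "keyword");
                break;
            case "no":
                // Assumption: a field that was not indexed becomes a
                // keyword with indexing switched off.
                field.put("type", "keyword");
                field.put("index", false);
                break;
            default:
                throw new IllegalArgumentException("Unknown index setting: " + oldIndexSetting);
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(upgrade("analyzed"));
        System.out.println(upgrade("not_analyzed"));
        System.out.println(upgrade("no"));
    }
}
```

The resulting map is just the per-field part of the mapping, so it still needs to be placed under the field name in the type's “properties”.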
A bit more complicated:
Accessing fields in the Java API. I used to be able to query specific fields with something like client.prepareSearch().addFields(“a”, “b”).. and read the results from a SearchHit object with hit.getFields(). The addFields() methods are not completely gone, but there is now something called addStoredFields(). That did not work on my old mappings; it just returned null for the fields.
So now I need to either mark my fields as stored in the mapping (“store” set to true), or use source filtering to get the values. I guess 2.x was implicitly reading the values from the source. Once I mark the fields as stored, the addStoredFields() methods start to work.
So what is the difference between using stored fields and source filtering? The ES docs seem to discourage marking fields as stored, but the choice is not always so clear. My understanding is that stored fields require a separate read from disk per field, whereas source filtering loads the whole document source in one go and filters the fields from that. Which one is better depends on the documents: if you have some very large content fields, loading the whole source just to read a few metadata fields can cause high overhead. But if the documents are small, the per-field reads of stored fields might add more overhead. So it depends, I guess.
I also guess this might be a good change as it makes the rationale for schema design more explicit.
Accessing dates in the Java API. With the old addFields() approach I could read dates as long values of epoch milliseconds with just “long time = fields.get(“doc_date”).value()”. That does not work anymore: apparently ES uses a different format on disk, and source filtering just gives me the value as it is in the source. I had thought ES stored dates on disk as an epoch long. Not sure if that ever was so or if it was just my assumption. The docs say something in that direction, but it is a bit open to interpretation.
So to get the date as an epoch long, some conversion is needed now.
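A rough sketch of the kind of conversion I mean, using only the JDK. It assumes the value read from the source is either already a number of epoch milliseconds or an ISO-8601 UTC timestamp string such as “2016-11-01T12:00:00Z”. Your documents may use another date format, in which case the parsing needs to change. The class and method names are made up for illustration.

```java
import java.time.Instant;

public class SourceDates {

    // Hypothetical helper: converts a date value taken from the document
    // source into epoch milliseconds.
    static long toEpochMillis(Object raw) {
        if (raw instanceof Number) {
            // Already numeric, assumed to be epoch milliseconds.
            return ((Number) raw).longValue();
        }
        String text = String.valueOf(raw);
        try {
            // Epoch milliseconds that arrived as a string.
            return Long.parseLong(text);
        } catch (NumberFormatException notANumber) {
            // Assumption: anything else is an ISO-8601 instant in UTC.
            // Instant.parse throws DateTimeParseException for other formats.
            return Instant.parse(text).toEpochMilli();
        }
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2016-11-01T12:00:00Z")); // 1478001600000
        System.out.println(toEpochMillis(1478001600000L));         // 1478001600000
    }
}
```

With source filtering, the raw value would come from the source map of the hit, for example the entry for “doc_date”, and then go through a helper like this.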
The plugin API has changed a lot. So if you depend on some custom plugin, you might be out of luck, or you have to port the plugin yourself. I ended up porting one myself. I found it helpful to look at some examples on GitHub. The source tree has several, even if that direct link is just the Polish analyzer.
The security manager cannot be disabled. In ES 1.x it was not used. In 2.x there was an option to disable it. In 5.x that option has been removed. So if you use a plugin that needs to access JNA or some other lib that is already loaded by ES, you have to do tricks. At least for the security policy, you have to either unpack the ES jar file, modify the policy in it, and repack it, or modify the policy file of the JRE you use to run ES. That is, if your plugin needs special permissions, such as loading some specific native library.
That is all I remember for now. In a few weeks I might not remember even this much, which is why I usually write these things down 🙂