Finnish Topic Modelling

Previously I wrote about a few experiments I ran with topic modelling. I only briefly mentioned some results for a set of Finnish text as an example of a smaller dataset. This is a somewhat deeper look into that. I use two datasets: the Finnish Wikipedia dump and the city of Oulu board minutes. Same […]

Word2Vec with some Finnish NLP

To get a better view of the popular Word2Vec algorithm and its applications in different contexts, I ran experiments with Word2Vec on Finnish-language text. Let’s see. I used two datasets. The first is the traditional Wikipedia dump; I got the Finnish Wikipedia dump from October 20th. Because I ran the first experiments […]

Finnish POS tagging part 2

Previously I wrote about Building a Finnish POS tagger. This post elaborates a bit on training with OpenNLP, which I skimmed over last time, publishes the code for it, and runs some additional tests on it. I am again using the Finnish Treebank to get 4.4M pre-tagged sentences to train on. Start with […]

Topicmodels, topicmodels, …

I have previously done some topic modelling using LDA (Latent Dirichlet Allocation). Back then I followed a nice video tutorial from some nice guy, but somehow I can no longer find it with search engines. Too bad. I implemented LDA in Java based on that tutorial. I learned how it works, not why it works. […]
