Latent variable models in Java.
Latent variable models (LVMs) are well-established statistical models in which some of the variables are not observed. lvm4j implements popular LVMs in the Java programming language. For the sake of simplicity I refer to a model as latent if it consists of two disjoint sets of variables: one that is observed and one that is hidden (e.g. because we have no data for it, or because it is not observable at all). With new versions I will try to cover more latent variable models in lvm4j.
One of the most famous and magnificent of them all, the Hidden Markov Model, is applicable to a diverse range of fields (e.g. secondary-structure prediction or alignment of viral RNA to a reference genome).
Principal Component Analysis is a simple (probably the simplest) method for dimension reduction. Here you try to find a linear orthogonal transformation onto a new feature space in which every basis vector has maximal variance. It is open to debate whether this is a true latent variable model.
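To make the idea concrete, here is a minimal, dependency-free sketch (not part of lvm4j; class and method names are made up for illustration) that finds the first principal component of a centered data matrix by power iteration on its covariance matrix:

```java
import java.util.Arrays;

public class PcaSketch {

    // First principal component of a centered data matrix (rows = samples)
    // via power iteration on the covariance matrix X^T X / (n - 1).
    static double[] firstComponent(double[][] x, int iterations) {
        int n = x.length, p = x[0].length;
        // covariance matrix of the centered data
        double[][] cov = new double[p][p];
        for (int i = 0; i < p; i++)
            for (int j = 0; j < p; j++) {
                double s = 0;
                for (double[] row : x) s += row[i] * row[j];
                cov[i][j] = s / (n - 1);
            }
        // power iteration: repeatedly multiply and renormalize
        double[] v = new double[p];
        Arrays.fill(v, 1.0 / Math.sqrt(p));
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[p];
            for (int i = 0; i < p; i++)
                for (int j = 0; j < p; j++)
                    w[i] += cov[i][j] * v[j];
            double norm = 0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            for (int i = 0; i < p; i++) v[i] = w[i] / norm;
        }
        return v;
    }

    public static void main(String[] args) {
        // centered toy data whose variance is concentrated on the first axis
        double[][] x = {{-3, -0.1}, {-1, 0.1}, {1, -0.1}, {3, 0.1}};
        double[] v = firstComponent(x, 100);
        // the dominant direction is (close to) the first coordinate axis
        System.out.printf("%.3f %.3f%n", Math.abs(v[0]), Math.abs(v[1]));
    }
}
```

Power iteration converges to the eigenvector with the largest eigenvalue, i.e. the direction of maximal variance; real implementations use an eigen- or singular-value decomposition instead.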
You can either install the package by hand, if you do not want to use Maven (why would you?), or use the standard installation via a Maven project (and its pom.xml).
If you use Maven just put this into your pom.xml:
<dependency>
    <groupId>net.digital-alexandria</groupId>
    <artifactId>lvm4j</artifactId>
    <version>0.1</version>
</dependency>
You can also build the jar yourself and then include it in your project:
- Clone the GitHub repository:
$ git clone https://github.com/dirmeier/lvm4j.git
- Then build the package:
$ mvn clean package -P standalone
- This gives you a lvm4j-standalone.jar that can be added to your project (make sure to put it on your classpath correctly).
Here, we briefly describe how the lvm4j library is used. Also make sure to check out the javadocs.
So far the following latent variable models are implemented:
- HMM (a discrete-state-discrete-observation latent variable model)
- PCA (a dimension reduction method with latent loadings and observable scores)
Using an HMM (in v0.1) involves two steps: training of emission and transition probabilities and prediction of the latent state sequence.
First initialize an HMM using:
char[] states = new char[]{'A', 'B', 'C'};
char[] observations = new char[]{'X', 'Y', 'Z'};
HMM hmm = HMMFactory.instance().hmm(states, observations, 1);
It is easier, though, to use the constructor that takes a single string containing the path to an XML file.
String xmlFile = "/src/test/resources/hmm.xml";
HMM hmm = HMMFactory.instance().hmm(xmlFile);
Having the HMM initialized, training is done like this:
Map<String, String> states = new HashMap<String, String>() {{
    put("s1", "ABCABC");
    put("s2", "ABCCCC");
}};
Map<String, String> observations = new HashMap<String, String>() {{
    put("s1", "XYZYXZ");
    put("s2", "XYZYXZ");
}};
hmm.train(states, observations);
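For intuition: maximum-likelihood training of a discrete HMM with fully observed state sequences boils down to counting adjacent state pairs (and state/symbol pairs for the emissions) and normalizing the counts into probabilities. A minimal, library-independent sketch of the transition part (not the lvm4j implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class TransitionCounts {

    // Estimate transition probabilities P(from -> to) from a state sequence
    // by counting adjacent pairs and normalizing per source state.
    static Map<String, Double> transitionProbabilities(String states) {
        Map<String, Integer> pairCounts = new HashMap<>();
        Map<Character, Integer> sourceCounts = new HashMap<>();
        for (int i = 0; i < states.length() - 1; i++) {
            char from = states.charAt(i), to = states.charAt(i + 1);
            pairCounts.merge(from + "" + to, 1, Integer::sum);
            sourceCounts.merge(from, 1, Integer::sum);
        }
        // normalize: count(from, to) / count(from)
        Map<String, Double> probs = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCounts.entrySet()) {
            char from = e.getKey().charAt(0);
            probs.put(e.getKey(), e.getValue() / (double) sourceCounts.get(from));
        }
        return probs;
    }

    public static void main(String[] args) {
        // in "AABAB": A is followed by A once and by B twice,
        // so P(A -> A) = 1/3 and P(A -> B) = 2/3
        System.out.println(transitionProbabilities("AABAB"));
    }
}
```

Training on several sequences, as in the call above, simply pools the counts over all keys before normalizing.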
Take care that states and observations have the same keys and values of equal length. You can write your trained HMM to a file using:
String outFile = "hmm.trained.xml";
hmm.writeHMM(outFile);
That is it!
For prediction, first initialize the HMM again:
String xmlFile = "/src/test/resources/hmm.trained.xml";
HMM hmm = HMMFactory.instance().hmm(xmlFile);
Make sure to use the hmm.trained.xml file containing your trained HMM. Then make a prediction using:
Map<String, String> observations = new HashMap<String, String>() {{
    put("s1", "XYZYXZ");
    put("s2", "XYZYXZ");
}};
Map<String, String> pred = hmm.predict(observations);
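Predicting the most likely latent state sequence for a discrete HMM is typically done with the Viterbi algorithm; whether lvm4j uses exactly this decoder is not stated here, but the idea can be sketched as follows (a self-contained toy, with made-up probabilities, not lvm4j code):

```java
public class ViterbiSketch {

    // Most likely state path for an observation sequence, given start,
    // transition and emission probabilities (log-space dynamic program).
    static String viterbi(String obs, char[] states, char[] symbols,
                          double[] start, double[][] trans, double[][] emit) {
        int n = obs.length(), k = states.length;
        double[][] score = new double[n][k];
        int[][] back = new int[n][k];
        for (int s = 0; s < k; s++)
            score[0][s] = Math.log(start[s])
                + Math.log(emit[s][index(symbols, obs.charAt(0))]);
        for (int t = 1; t < n; t++)
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int r = 0; r < k; r++) {
                    double cand = score[t - 1][r] + Math.log(trans[r][s]);
                    if (cand > best) { best = cand; back[t][s] = r; }
                }
                score[t][s] = best + Math.log(emit[s][index(symbols, obs.charAt(t))]);
            }
        // backtrack from the best final state
        int bestEnd = 0;
        for (int s = 1; s < k; s++)
            if (score[n - 1][s] > score[n - 1][bestEnd]) bestEnd = s;
        char[] path = new char[n];
        for (int t = n - 1; t >= 0; t--) {
            path[t] = states[bestEnd];
            if (t > 0) bestEnd = back[t][bestEnd];
        }
        return new String(path);
    }

    static int index(char[] arr, char c) {
        for (int i = 0; i < arr.length; i++) if (arr[i] == c) return i;
        throw new IllegalArgumentException("unknown symbol: " + c);
    }

    public static void main(String[] args) {
        char[] states = {'A', 'B'}, symbols = {'X', 'Y'};
        double[] start = {0.5, 0.5};
        double[][] trans = {{0.5, 0.5}, {0.5, 0.5}};
        // A almost always emits X, B almost always emits Y
        double[][] emit = {{0.9, 0.1}, {0.1, 0.9}};
        System.out.println(viterbi("XYX", states, symbols, start, trans, emit)); // prints ABA
    }
}
```

With uniform start and transition probabilities, the decoder simply picks the state with the highest emission probability at each position, which is why "XYX" decodes to "ABA" here.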
Congrats! That concludes the tutorial on HMMs.
TODO
- Simon Dirmeier [email protected]