We have seen how observations can be used to condition the outputs of
a model.
In fact, IBAL also allows observations to be used to learn the
parameters of a model.
To start this section, let's first look at an example with no
learning, in nolearn.ibl.
This example is based on the students/professors/courses example we
saw earlier.
The example demonstrates how one can describe a data set.
After defining field, professor, course, student and performance
classes,
the code defines a set of instances of each of these classes.
The instances are interrelated. For example, performance perf1
corresponds to student1 taking course1, which is in
field1 and is taught by prof1, who also specializes
in field1.
After defining the set of instances and their interrelationships, the
example goes on to make some observations, in our case about the
performances of students in different courses.
Such a set of interrelated instances of probabilistic classes, with a
set of observations about them, is called a probabilistic
relational model.
One can ask for the probability of the evidence for this model.
Note, however, that all of the instances have been made private. This
is important, because if they were not private, IBAL would have
attempted to create a complete joint probability distribution over
them, which would have been very large. This is because the nolearn
library defines a block, and the value of a block is the joint value
of all the variables that are not private inside the block.
The variables inside the block should be thought of as part of the
definition of the model of the block; in particular, as we will see in
a moment, they can be used as a training set for learning the
parameters of the model. If one wants to define particular instances
and reason about them and query them, one can create them in the IBAL
interpreter. Alternatively, one can create an expression defining
those instances and the query and place them in a file,
e.g. myinstances.ibl. The file can then be evaluated using the
command
ibal -b nolearn myinstances.
This is not an issue for lazy evaluation, by the way. With lazy
evaluation, IBAL will only produce a distribution over those variables
defined in the block that are needed to answer a query. So, if the
query is student1, it will only produce a distribution over
student1.
One can also imagine using the data in the probabilistic relational
model to learn about the probabilistic classes.
To do that, we turn the probabilities in the flip and
dist expressions into parameters.
The file learn1 shows the model.
It begins by declaring two parameters p1 and p2 to be
used in the field class.
The declaration param p1 2 means that p1 is a
two-dimensional parameter, i.e. p1 defines a probability
distribution over two possible values.
To perform a stochastic choice using a parameter, the pdist
expression form is used. For example,
hard = pdist p1 [ false, true ] means that the value of
hard is either false or
true, with the choice determined according to probabilistic
parameter p1.
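The behaviour of pdist can be mimicked outside IBAL. The following is a minimal Python sketch, not IBAL code: the function name pdist is reused only for illustration, and the numbers in p1 are made-up values, since in IBAL the parameter values are learned rather than written down.

```python
import random

def pdist(param, values, rng=random):
    """Pick one element of `values`, using `param` as its probabilities.

    `param` plays the role of an IBAL parameter: one probability per
    possible value, in the same order as `values`.
    """
    if len(param) != len(values):
        raise ValueError("parameter dimension must match number of values")
    return rng.choices(values, weights=param, k=1)[0]

# Hypothetical two-dimensional parameter (assumed values, not from the tutorial).
p1 = [0.3, 0.7]

# Analogue of: hard = pdist p1 [ false, true ]
hard = pdist(p1, [False, True])
```

With these assumed values, hard comes out true about 70% of the time over many draws.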
The rest of the example is specified similarly.
Now the data in the example can be used both to condition the values
of variables and also to learn the parameters.
No learning is done when you load the file into IBAL,
but when you issue the query student1, IBAL will first learn
the parameter values before answering the query.
In the previous example, IBAL's learning algorithm computes the
maximum likelihood (ML) parameter values, i.e., the values of
parameters that maximize the probability of the data.
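For intuition, here is a minimal Python sketch of maximum likelihood estimation in the simplest case, where every outcome of a choice governed by the parameter is directly observed. Under that assumption the ML values are just the relative frequencies. The function name ml_estimate and the sample data are hypothetical, and the sketch does not describe how IBAL performs learning internally, which must also cope with variables that are not observed.

```python
from collections import Counter

def ml_estimate(observations, values):
    """Return ML probabilities for `values` from fully observed outcomes.

    The estimate for each value is its count divided by the total count.
    """
    counts = Counter(observations)
    total = sum(counts[v] for v in values)
    if total == 0:
        raise ValueError("no observations of the listed values")
    return [counts[v] / total for v in values]

# Hypothetical data: four observed outcomes of one two-way choice.
estimate = ml_estimate([True, True, False, True], [False, True])
```

Here estimate holds the probabilities for false and true, in that order.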
It is also possible to specify prior knowledge over the parameter
values.
This knowledge takes the form of Dirichlet priors, which can most easily be
understood as the imagined number of times a branch associated with
the parameter came out each possible way.
The file learn2 shows learning using prior knowledge.
The declaration param p1 = [ 1.0, 1.0 ] says that our prior for
p1 is as if we have previously seen one example of each branch.
This is not very much prior knowledge.
In contrast, param p7 = [ 10.0, 90.0 ] says that we have seen
100 previous examples, of which 90 took the second branch, so we are
quite confident that the probability p7 assigns to the second branch is close to 0.9.
When a parameter has a prior specified, the learning algorithm takes
the prior into account and computes maximum a posteriori (MAP)
parameter values, which maximize the product of the prior and the
likelihood of the data.
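The effect of such a prior can be sketched in Python for the fully observed case. The sketch takes the "imagined examples" reading literally: prior counts are added to the observed counts and the totals are normalized. This is an assumption. Some Dirichlet parameterizations subtract one from each count before normalizing at the mode, and the text does not say which convention IBAL follows. The function name and the observed counts are hypothetical.

```python
def estimate_with_prior(prior_counts, observed_counts):
    """Combine imagined (prior) and real (observed) branch counts.

    Assumes the pseudo-count reading: each prior entry acts like that
    many previously seen examples of the corresponding branch.
    """
    if len(prior_counts) != len(observed_counts):
        raise ValueError("prior and data must have the same dimension")
    totals = [a + n for a, n in zip(prior_counts, observed_counts)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]

# Hypothetical data: 5 observations of each branch.
weak = estimate_with_prior([1.0, 1.0], [5, 5])      # prior barely matters
strong = estimate_with_prior([10.0, 90.0], [5, 5])  # prior dominates
```

With the weak prior the estimate stays at the observed frequencies, while the strong prior keeps the second branch at 95/110, about 0.86, despite the evenly split data.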