Thus far, we have presented a rich variety of constructs for defining probability distributions, by describing a computational process that generates samples from the distribution. In order to ask interesting queries about the models, it is useful to be able to describe observations that one has about the result of the generative process. Such observations can be used to condition the probability distribution. The answers to a query will then be the conditional probability distribution over the query outcome, given that the observations hold.
Observations in IBAL normally take the form of declarations.
Given a sequence of declarations, one can introduce an observation in
the form
observe <condition> or the shorthand obs <condition>.
Every condition mentions a variable, which could be a simple variable
name like x, or a chain like x.a.b.
The condition could simply name the variable, in which case the
observation asserts that the variable is true. It could have the form
~v, in which case it asserts that v is false. Most
generally, it has the form v=p, where p is any pattern,
and asserts that the value of v matches the pattern p.
To illustrate, and to review some of the concepts from previous
sections, scfg.ibl provides an example of a stochastic context
free grammar (SCFG).
It begins with general-purpose code for SCFGs
(the code assumes grammars in Chomsky normal form).
data Tree = Leaf (Symbol, Symbol) | Branch (Symbol, Tree, Tree) terminal (x,y) = < string : y, parse : Leaf (x,y) > production (x,y,z) = let < string : s1, parse : p1 > = y() in let < string : s2, parse : p2 > = z() in < string : s1 ^ s2, parse : Branch (x, p1, p2) >
The code begins by defining the type of parse trees:
In our construction, a tree is always labeled by a non-terminal symbol
of type Symbol. There are two cases. In the leaf case, the
non-terminal is simply associated with a terminal symbol (the second
argument to the Leaf constructor). In the branching case, the
non-terminal is associated with two subtrees.
Generating a sentence simultaneously produces two things: a string,
and a parse tree of that string. There are two basic functions for
generating a sentence.
terminal(x,y) implements the rule where a non-terminal is
transformed into a terminal.
It takes a non-terminal symbol x, and a terminal symbol
y. The associated string is simply y, while the parse
tree is the leaf associating x with y.
The second function, production(x,y,z) corresponds to a rule
where a non-terminal is transformed into two new non-terminals. The
argument x is a non-terminal symbol as before. However, the
arguments y and z are functions, making
production a higher-order function. We will define a function
for each non-terminal in a grammar, defining the generation of
sentences from that non-terminal.
The production function takes the two functions y and
z and applies them, getting two substrings and parses for each
of them. It then concatenates the two substrings to obtain the full
string (^ is the string concatentation operator), and creates a
new parse tree with the two generated trees as subtrees.
The file scfg.ibl continues with the following code:
// S -> X Y (1) // X -> a (0.4) // X -> A B (0.6) // Y -> c (0.2) // Y -> B C (0.8) // A -> a (1) // B -> b (1) // C -> c (1)
This is a comment. In our case, the comment describes a particular SCFG. Note that the grammar is ambiguous: the string ``abc'' has two different parses. The next six lines of code implement the grammar.
a() = terminal('A,'a)
b() = terminal('B,'b)
c() = terminal('C,'c)
x() = dist [ 0.4 : terminal('X,'a), 0.6 : production('X,a,b) ]
y() = dist [ 0.2 : terminal('Y,'c), 0.8 : production('Y,b,c) ]
s() = production('S,x,y)
Finally, we define a sentence to be generated by the grammar, and then observe the string to be ``abc''.
sentence = s() obs sentence.string = "abc"
Now, we can ask the query sentence.parse, to obtain a
probability distribution over parse trees associated with the observed
sentence. Unfortunately, when we do that we see the result is rather
hard to read. There are so many variables in the result it does not
fit on the screen. One way to get around this is to call IBAL with
the -u option (or set the boring_fields variable in
global.ml to false). This tells IBAL not to display any
``boring'' variables that can only have one value.
When we do this we see a comprehensible result. There are
only two key variables, corresponding to the tags of the left and
right children of the root.
Just as a variable can be defined at the IBAL prompt, an observation
can also be made at the prompt. This has the effect of conditioning
variables previously defined. Evidence entered at the IBAL prompt is
cumulative. Evidence can also be retracted:
retract x retracts the last item of evidence entered about
x. This feature allows a series of ``what if'' queries to be
asked.
Observations can be used to describe constraints about the outcome of
a model. More generally, they can be used to describe Markov
networks. The following example (in map1.ibl) describes a
``noisy'' map-coloring problem.
left = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ] middle = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ] right = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ] test1 = (left == middle) & flip 0.9 test2 = (middle == right) & flip 0.9 obs ~test1 obs ~test2
There are three locations:
left, middle and right. Under the generative
part of the model, the colors of the locations are uniformly
distributed. (Note that the probabilities in dist expressions
do not need to sum to 1, but are normalized.)
Two constraints are then defined by their conditions of violation.
The constraints are noisy, so that even if their conditions of violation
are satisfied, there is a small probability that the constraint is not
violated.
The first constraint says that the left and middle colors should be
different, and the second says that the middle and right should be
different.
We then observe that neither constraint is violated.
When we query <left,middle,right>, we see that there are three
levels of combination. Those in which the middle is different from
the two side colors are ten times more likely than those in which the
middle is the same as one of the sides, which are in turn ten times
more likely than those in which all three colors are the same.
The probability returned by IBAL for each combination of values is the
joint probability of the combination and the observations
holding. We can simply find out the probability of the observations
holding by querying true, and find that it is 0.49.
In general, it is an error to make observations that have zero probability of being true.