next up previous
Next: Learning Up: IBAL Tutorial Previous: Higher-order Functions Revisited

Observations

Thus far, we have presented a rich variety of constructs for defining probability distributions, by describing a computational process that generates samples from the distribution. In order to ask interesting queries about the models, it is useful to be able to describe observations that one has about the result of the generative process. Such observations can be used to condition the probability distribution. The answers to a query will then be the conditional probability distribution over the query outcome, given that the observations hold.

Observations in IBAL normally take the form of declarations. Given a sequence of declarations, one can introduce an observation in the form observe <condition> or the shorthand obs <condition>. Every condition mentions a variable, which could be a simple variable name like x, or a chain like x.a.b. The condition could simply name the variable, in which case the observation asserts that the variable is true. It could have the form ~v, in which case it asserts that v is false. Most generally, it has the form v=p, where p is any pattern, and asserts that the value of v matches the pattern p.

To illustrate, and to review some of the concepts from previous sections, scfg.ibl provides an example of a stochastic context free grammar (SCFG). It begins with general-purpose code for SCFGs (the code assumes grammars in Chomsky normal form).

data Tree = Leaf (Symbol, Symbol) | Branch (Symbol, Tree, Tree)

terminal (x,y) = < string : y, parse : Leaf (x,y) >

production (x,y,z) = 
  let < string : s1, parse : p1 > = y() in
  let < string : s2, parse : p2 > = z() in
  < string : s1 ^ s2, parse : Branch (x, p1, p2) >

The code begins by defining the type of parse trees: In our construction, a tree is always labeled by a non-terminal symbol of type Symbol. There are two cases. In the leaf case, the non-terminal is simply associated with a terminal symbol (the second argument to the Leaf constructor). In the branching case, the non-terminal is associated with two subtrees.

Generating a sentence simultaneously produces two things: a string, and a parse tree of that string. There are two basic functions for generating a sentence. terminal(x,y) implements the rule where a non-terminal is transformed into a terminal. It takes a non-terminal symbol x, and a terminal symbol y. The associated string is simply y, while the parse tree is the leaf associating x with y.

The second function, production(x,y,z) corresponds to a rule where a non-terminal is transformed into two new non-terminals. The argument x is a non-terminal symbol as before. However, the arguments y and z are functions, making production a higher-order function. We will define a function for each non-terminal in a grammar, defining the generation of sentences from that non-terminal. The production function takes the two functions y and z and applies them, getting two substrings and parses for each of them. It then concatenates the two substrings to obtain the full string (^ is the string concatentation operator), and creates a new parse tree with the two generated trees as subtrees. The file scfg.ibl continues with the following code:

// S -> X Y (1)
// X -> a (0.4)
// X -> A B (0.6)
// Y -> c (0.2)
// Y -> B C (0.8)
// A -> a (1)
// B -> b (1)
// C -> c (1)

This is a comment. In our case, the comment describes a particular SCFG. Note that the grammar is ambiguous: the string ``abc'' has two different parses. The next six lines of code implement the grammar.

a() = terminal('A,'a)
b() = terminal('B,'b)
c() = terminal('C,'c)
x() = dist [ 0.4 : terminal('X,'a), 0.6 : production('X,a,b) ]
y() = dist [ 0.2 : terminal('Y,'c), 0.8 : production('Y,b,c) ]
s() = production('S,x,y)

Finally, we define a sentence to be generated by the grammar, and then observe the string to be ``abc''.

sentence = s()

obs sentence.string = "abc"

Now, we can ask the query sentence.parse, to obtain a probability distribution over parse trees associated with the observed sentence. Unfortunately, when we do that we see the result is rather hard to read. There are so many variables in the result it does not fit on the screen. One way to get around this is to call IBAL with the -u option (or set the boring_fields variable in global.ml to false). This tells IBAL not to display any ``boring'' variables that can only have one value. When we do this we see a comprehensible result. There are only two key variables, corresponding to the tags of the left and right children of the root.

Just as a variable can be defined at the IBAL prompt, an observation can also be made at the prompt. This has the effect of conditioning variables previously defined. Evidence entered at the IBAL prompt is cumulative. Evidence can also be retracted: retract x retracts the last item of evidence entered about x. This feature allows a series of ``what if'' queries to be asked.

Observations can be used to describe constraints about the outcome of a model. More generally, they can be used to describe Markov networks. The following example (in map1.ibl) describes a ``noisy'' map-coloring problem.

left = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ]
middle = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ]
right = dist [ 1.0 : 'red, 1.0 : 'blue, 1.0 : 'yellow ]

test1 = (left == middle) & flip 0.9
test2 = (middle == right) & flip 0.9

obs ~test1
obs ~test2

There are three locations: left, middle and right. Under the generative part of the model, the colors of the locations are uniformly distributed. (Note that the probabilities in dist expressions do not need to sum to 1, but are normalized.) Two constraints are then defined by their conditions of violation. The constraints are noisy, so that even if their conditions of violation are satisfied, there is a small probability that the constraint is not violated. The first constraint says that the left and middle colors should be different, and the second says that the middle and right should be different. We then observe that neither constraint is violated.

When we query <left,middle,right>, we see that there are three levels of combination. Those in which the middle is different from the two side colors are ten times more likely than those in which the middle is the same as one of the sides, which are in turn ten times more likely than those in which all three colors are the same. The probability returned by IBAL for each combination of values is the joint probability of the combination and the observations holding. We can simply find out the probability of the observations holding by querying true, and find that it is 0.49.

In general, it is an error to make observations that have zero probability of being true.


next up previous
Next: Learning Up: IBAL Tutorial Previous: Higher-order Functions Revisited
Avi Pfeffer 2006-11-19