next up previous
Next: Defining Variables Up: IBAL Tutorial Previous: Basic Concepts

Types and Data Structures

Every expression in IBAL has a type. The type of the expression determines the set of legal outcomes the expression can have. The type of an expression is distinct from its support. The support is the set of all possible values the expression can take in some execution of the experiment, while the type is the set of all structurally legal values of the expression, which is a superset of the support.

So far we have seen three types, Symbol, Bool and Int. The type Symbol consists of all symbolic constants such as hello world, while Bool consists of the elements true and false, and Int consists, obviously, of the integers.

You can check the type of any expression using the :t command at the IBAL prompt. For example, try

IBAL> :t dist [ 0.3 : 'hello, 0.7 : 'world ]

The type of an expression determines how it can be used in other expressions. IBAL automatically infers the type of every expression, using Hindley-Milner type inference, in a manner similar to ML. For each form of an expression, there are rules that stipulate how the types of the subexpressions must relate to each other, and how the type of the whole expression is determined from the types of the subexpressions. For example, in a dist expression, the subexpressions must all have the same type (or, more precisely, there must be a type that generalizes all of them). The type of the dist expression is the type of the subexpressions. Try entering the following:

dist [ 0.3 : 'hello, 0.7 : 1 ]
You get the message
Type error: cannot unify Symbol and Int
Line 1, character 32: Type error
Let's try to decipher this error message. At the bottom, it tells you the location of the code at which the type error occured. This is at the end of the number 1. The message tells you that IBAL tried to unify Symbol and Int, i.e., to find a type that generalizes Symbol and Int. The reason it did that is because the first subexpression of the dist expression has type Symbol, while the second has type Int. Since there is no type that generalizes Symbol and Int, IBAL complained that it cannot unify Symbol and Int and produced an error.

Other useful typing rules you should know about, using the expressions we have discussed so far:

A warning: sometimes the reported place at which a type error occurs is different from what you would think of as the location of the error. This is a standard problem with languages that do automatic type inference. Debugging type errors grows easier with experience.

The types we have seen so far are primitive -- they define the basic entities with which IBAL works. IBAL currently provides only the primitive types Symbol, Bool, and Int, though there are plans to provide a Real type in the future. The primitive types are a subset of the atomic types -- those describing values that cannot be decomposed. Another kind of atomic type is the function type, which describes a function from one set of values to another. (Note on terminology: the use of atomic with regard to function types is because function values cannot be decomposed, even though the function type itself is made up of simpler types.) In contrast to atomic types, compound types describe values that are made up of simpler values. Examples of compound types are tuple types, which provide the basis for creating data structures, and algebraic data types (or ADTs). We will discuss tuple types here, while postponing the discussion of other non-primitive types until later.

A tuple type has the form <n_1 : t_1, ..., n_m : t_m> where n_i an identifier (i.e., a sequence of letters, digits and underscores) that begins with a letter or underscore, and t_i is a type. n_i is the name of the i-th component, and t_i is its type. For example, <a : Bool, b : Symbol>, is a tuple type in which the component named a has type Bool, and the component named b has type Symbol.

An element of the tuple type <n_1 : t_1, ..., n_m : t_m> is a tuple <n_1 : v_1, ..., n_m : v_m>, where each v_i is an element of the corresponding type t_i. For example, <a : true, b : 'hello> is an element of the type <a : Bool, b : Symbol>.

One can create a tuple with a tuple expression, which has the form <n_1 : e_1, ..., n_m : e_m>. Such an expression defines the experiment ``In parallel, run each e_i and assign its outcome to component n_i''. For example, enter

<a : 1==1, b : 'hello>
at the IBAL prompt. The result is:
      *.a      |      *.b      
     [True]    |    [hello]    
---------------|---------------
     [True]    |    [hello]    | 1
This is the first result with more than one column in the table. Recall that the table has a column for each variable in the result. In this case there are two variables. The first variable, *.a, corresponds to the component named a of the outcome, while *.b corresponds to the component named b. Each row of the table below the horizontal line corresponds to a joint assignment of values to the variables. In this case, the table tells us that the assignment of True to *.a and hello to *.b has probability 1.

An aside about the display. IBAL uses a fixed field width for each column, regardless of how much room is actually needed to display the contents. If the contents are too big to fit in the column, they will be truncated. By default, IBAL uses a field width of 15 characters. You can change this default by changing the definition of field_width in global.ml and recompiling IBAL. Alternatively, you can override the default at the command line with the -w flag followed by the desired field width, with no space after the -w.

Tuples can be embedded. Try

:t <a : true, b : <c : 'hello, a : false>>
The result is <a : Bool, b : <c : Symbol, a : Bool>>. In this case, the component named b itself has a tuple type. If we evaluate
<a : true, b : <c : 'hello, a : false>>
we get the result
      *.a      |     *.b.c     |     *.b.a     
     [True]    |    [hello]    |    [False]    
---------------|---------------|---------------
     [True]    |    [hello]    |    [False]    | 1

The general rule is that the result of evaluating an expression of tuple type has a variable for every simple embedded component of the type. A component is simple if its associated variable is atomic, i.e., not a compound type. In our case, the component named a is simple but b is not. However, b itself has two simple components c and a. These components are embedded components of the top-level type, referred to as b.c and b.a. Thus we have the variables *.b.c and *.b.a.

This ``dot notation'' can also be used to access components of a tuple. In general, if e is an expression whose type is a tuple type with component n, then e.n is an expression that runs e and then extracts n from the outcome. For example, try

<a : true, b : <c : 'hello, a : false>>.b
and make sure you understand the variable names in the result.

Note by the way that it is perfectly legal for the same name a to be used in two different components at two different levels. It is an error to use the same name for two different components of the same tuple.

The order in which components appear in a tuple type is not significant. Only the set of names and the type associated with each name matter. Thus <a : Bool, b : Symbol> and <b : Symbol, a : Bool> are the same type. Try

<a : true, b : 'hello> == <b : 'hello, a : true>

The syntax of patterns that can appear in case statements is enhanced to allow for tuples. If each p_i is a pattern that matches values of type t_i, then <n_1 : p_1, ..., n_m : p_m> matches values of type <n_1 : t_1, ..., n_m : t_m>. Such a pattern matches a particular value <n_1 : v_1, ..., n_m : v_m> if each p_i matches the corresponding v_i. Recall that the pattern _ matches any value. It is also considered to match values of any type. Thus the pattern <a : false, b : _> matches any value that is a tuple with components a and b, in which component a has value false.

As with values, the order of the components does not matter in a pattern, only the pattern associated with each name. Try the following:

IBAL> case <a : true, b : 'hello> of \
    > # <a : false, b : _> : false \
    > # <b : 'hello, a : _> : true

There is a shorthand notation for when you don't want to bother with component names. In an anonymous tuple, the elements are identified by their position -- order does matter. An anonymous tuple is created by an expression <e_0, ..., e_n>. This expression is just shorthand for <0 : e_0, ..., n : e_n>. For example, try <true, 'hello>, and observe the names of the variables in the result. There are anonymous tuple patterns that match anonymous tuples. In fact, the most common use of anonymous tuples may be in case statements.

While IBAL automatically computes the type of every expression, one can also provide an explicit type for an expression. This can be useful for documentation, or for making sure an expression has the type you expect. The syntax e : t is used, where e is an expression and t is a type expression. The expression e : t is the same as the expression e, except that it is constrainted to have type t. If IBAL cannot reconcile the type of e with t, an error is signalled.

Some examples of type expressions are the primitive types Symbol, Bool and Int, and the tuple type expression <n_1 : t_1, ..., n_m : t_m> where the t_i are type expressions. There is also an anonymous tuple type expression. We will see other forms of type expression later.


next up previous
Next: Defining Variables Up: IBAL Tutorial Previous: Basic Concepts
Avi Pfeffer 2006-11-19