Scaling up: adding computational "effects"
At this point, it becomes natural to start introducing more realistic
features into the language, including (1) recursive functions, (2)
recursive types, and (3) references/state. Let's start with recursive
functions in a call-by-value setting, because we've seen most of what
needs to happen there in the simpler setting of IMP.
The usual approach is to simply add a term:
exprs e ::= ... | fix f(x:t1):t2.e
to the language with typing rule:
G,f:t1->t2 |- \x:t1.e : t1->t2
------------------------------
G |- fix f(x:t1):t2.e : t1->t2
Note that to keep type-checking syntax-directed, we need to know the
return type of the function (consider fix f(x).f(x)).
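To make the rule concrete, here is a sketch of a type checker fragment for fix. The tuple-based AST, the `typecheck` function, and the helper names are my own illustrative choices, not part of the notes.

```python
# A minimal type-checker fragment for fix, following the rule above.
# AST: ("var", x) | ("lam", x, t1, e) | ("app", e1, e2) | ("fix", f, x, t1, t2, e)
# Types: "int" or ("arrow", t1, t2)

def typecheck(env, e):
    tag = e[0]
    if tag == "var":
        return env[e[1]]
    if tag == "lam":
        _, x, t1, body = e
        t2 = typecheck({**env, x: t1}, body)
        return ("arrow", t1, t2)
    if tag == "app":
        t_fun = typecheck(env, e[1])
        t_arg = typecheck(env, e[2])
        assert t_fun[0] == "arrow" and t_fun[1] == t_arg
        return t_fun[2]
    if tag == "fix":
        _, f, x, t1, t2, body = e
        # G, f:t1->t2 |- \x:t1.e : t1->t2: check the body with both f and x
        # in scope; the t2 annotation is what keeps this syntax-directed.
        ft = ("arrow", t1, t2)
        assert typecheck({**env, f: ft, x: t1}, body) == t2
        return ft
    raise ValueError(tag)

# The term fix f(x:int):int.f(x) from the notes: without the ":int"
# result annotation we could not compute a type for it.
loop = ("fix", "f", "x", "int", "int", ("app", ("var", "f"), ("var", "x")))
```

Notice that `loop` typechecks at int->int precisely because the annotation supplies the return type that the body alone cannot determine.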
As far as evaluation is concerned, I've seen two basic approaches.
The first says that fix-terms are not values, but rather reduce to
lambdas:
fix f(x:t1):t2.e --> \x:t1.(e[fix f(x:t1):t2.e / f])
The nice thing about this approach is that you don't have to rework
the rest of your semantics (i.e., beta-reduction remains the only way
functions get applied). An alternative is to treat fix-terms as
values themselves and introduce a new reduction rule:
fix f(x:t1):t2.e v --> e[fix f(x:t1):t2.e / f][v/x]
It might then make sense to treat lambda as a derived form of fix where
the function name f does not occur free within the body of the function.
Both approaches are reasonable and I don't see a strong case for one
over the other, so I'll go with the former for now. (However, some
distinctions become clearer when you start to think about
mutually-recursive functions. Good exercise!)
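The first approach (unrolling a fix-term into a lambda) can be sketched as follows. The substitution function is simplified to the case we need here (substituting a closed term); the AST encoding is my own.

```python
# Sketch of the first approach: fix-terms are not values; they step to a
# lambda whose body has the whole fix-term substituted for f.
# AST: ("var", x) | ("lam", x, t1, e) | ("app", e1, e2) | ("fix", f, x, t1, t2, e)

def subst(e, x, v):
    """e[v/x]; capture-avoiding enough when v is closed (our only case)."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        _, y, t1, body = e
        return e if y == x else ("lam", y, t1, subst(body, x, v))
    if tag == "app":
        return ("app", subst(e[1], x, v), subst(e[2], x, v))
    if tag == "fix":
        _, f, y, t1, t2, body = e
        return e if x in (f, y) else ("fix", f, y, t1, t2, subst(body, x, v))
    return e

def unroll(fixterm):
    # fix f(x:t1):t2.e --> \x:t1.(e[fix f(x:t1):t2.e / f])
    _, f, x, t1, t2, body = fixterm
    return ("lam", x, t1, subst(body, f, fixterm))

omega = ("fix", "f", "x", "int", "int", ("app", ("var", "f"), ("var", "x")))
```

One step of `unroll` on `omega` produces a lambda whose body contains `omega` itself, which is why another unrolling is always available: the seed of divergence.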
Of course, the addition of fix breaks the termination properties
of the language, but it's instructive to see how the proof actually
breaks. Recall that to prove termination for the simply-typed lambda
calculus, we constructed a unary logical relation C[t] representing
those expressions that terminate (with a value of type t) and then
proved that every well-typed expression was in C[t] and thus every
well-typed expression terminates. We defined:
C[t] = { e | e ->* v and v in V[t] }
V[int] = { i }
V[t1->t2] = { v | All v' in V[t1].v v' in C[t2] }
V[G] = { g | All x in Dom(G).g(x) in V[G(x)] }
Note that these sets still make sense in our language with fix, so now
we're going to try to prove that if G |- e : t, then for all g in
V[G], g(e) in C[t]. Note that the proof goes through for all of the
typing rules we had before, so we really just have to consider the new
typing rule for fix. So suppose our typing derivation ends with:
G,f:t1->t2 |- \x:t1.e : t1->t2
------------------------------
G |- fix f(x:t1):t2.e : t1->t2
Pick a g in V[G]. We must show g(fix f(x:t1):t2.e) in C[t1->t2].
We know:
g(fix f(x:t1):t2.e) = fix f(x:t1):t2.(g(e))    (by alpha-conversion; f,x not in Dom(g))
                  --> \x:t1.(g(e))[fix f(x:t1):t2.(g(e))/f]    (by evaluation)
                    = \x:t1.(g[fix f(x:t1):t2.(g(e))/f](e))    (by defn. of subst.)
So it suffices to show \x:t1.(g[fix f(x:t1):t2.(g(e))/f](e)) in V[t1->t2].
Now by induction, we know that for all g' in V[G,f:t1->t2], that
g'(\x:t1.e) in C[t1->t2] (and in fact V[t1->t2] since g'(\x:t1.e) =
\x:t1.g'(e) and is a value already.)
At first blush, it appears that picking g' to be
g[fix f(x:t1):t2.(g(e))/f] and invoking the induction hypothesis would
do the trick. But alas, we need to show that
g[fix f(x:t1):t2.(g(e))/f] in V[G,f:t1->t2]. Now it's clear from our
assumptions about g that for all y != f, g[...](y) in V[(G,f:t1->t2)(y)].
But we also need to establish that fix f(x:t1):t2.(g(e)) is in
V[t1->t2] in order to show that the induction hypothesis truly
applies. But then we see that this is exactly what we started off
trying to prove!
So the point is that when constructing logical relations-style
arguments, it's often the case that you can write down well-founded
relations, but that you can't prove the fundamental theorem relating
the proof-system to the relations. In this case, the proof broke down
in a very subtle way, and it's all too easy to miss this problem.
Of course in this case, it's clear that we shouldn't be able to
prove that every well-typed term terminates because:
(fix f(x:int):int.f(x)) 0
certainly diverges. But in other cases, it's not so easy to see
that you've got a problem.
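The diverging term can be replayed in a host language directly; the fuel counter below is my own device to keep the demonstration itself terminating.

```python
# The well-typed term (fix f(x:int):int.f(x)) 0: it typechecks at int,
# yet never produces a value. We bound the number of unfoldings with a
# fuel counter so that the demonstration terminates.

class FuelExhausted(Exception):
    pass

def run_with_fuel(fuel):
    steps = 0
    def f(x):            # the body of fix f(x:int):int. f(x)
        nonlocal steps
        steps += 1
        if steps > fuel:
            raise FuelExhausted
        return f(x)      # each call unrolls the fix one more time
    try:
        return ("value", f(0))
    except FuelExhausted:
        return ("diverged after", steps)
```

No matter how much fuel we supply, the "value" branch is never reached, which is exactly what a termination theorem would forbid.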
Moral: be very careful when invoking your induction hypothesis...
Suppose we weaken our definition of C[t], for instance by writing:
C[t] = { e | e diverges or else e ->* v and v in V[t] }
Intuitively, this definition says that a well-typed expression is
one that either diverges or else evaluates to a value of the right
type (i.e., the language is type-safe). Now we know this is true,
but as the above argument shows, we cannot easily construct a proof
of this using the technique above.
So there are 3 basic ways that people handle this sort of problem:
1) Do a syntactic proof of type soundness: That is, prove that
(a) [preservation] if |- e : t and e -> e', then |- e' : t
(b) [progress] if |- e : t, then either e = v or else e -> e' for some e'
(c) [canonical forms] if |- v : int then v = i for some i,
and if |- v : t1->t2 then v = \x:t1.e for some x and e.
Note that big-step is no help here since we cannot distinguish
between an expression that diverges and one that gets stuck. But
a syntactic proof that uses the small-step semantics ensures that
either a program diverges or it evaluates to a value of the right
type. This is relatively easy to do and scales to other features
(e.g., refs, concurrency, etc.) which is why you see it a lot.
On the other hand, it doesn't help when you want to prove something
more interesting than type-soundness (e.g., some extensional notion
of equality.)
2) Construct a denotational semantics and prove the property in the
model. The primary advantage, especially for recursive functions, is
that the model will give us a way to address recursive functions in an
inductive fashion.
3) Use a step-indexed relational model. This is a relatively new
technique that has only popped up in the last couple of years, but it
combines aspects of the previous two approaches.
I'm assuming that you folks can do 1) so we'll concentrate on 2) and
3). They have the advantage that they extend to other kinds of properties
besides type-soundness. They have the unfortunate property that they
don't scale as well to other language features (e.g., concurrency.)
---------------------------------------------------------------------
Denotational Semantics for Recursive Functions:
The basic idea here is similar to what we did for IMP. We'll model
lambda-calculus functions as set-theoretic functions and we'll use a
fixed-point calculation to construct the meaning of recursive
functions (just as we did for while-loops.) To make the development a
little more general, it's helpful to introduce a few abstractions,
notably CPO's and continuous functions.
A *partial order* (PO) is a set S equipped with an ordering <= that is
reflexive, transitive, and anti-symmetric.
When we have a subset S of a PO and an element x of that PO such that
for all y in S, y <= x, then we say x is an *upper bound* of S. If
for all upper bounds z of S, x <= z then we say x is the *least upper
bound* (lub) of S. [The dual holds for lower bounds and greatest-lower
bounds (glb).]
A function f from a PO C to another PO D is *monotone* if for all x <= y,
f(x) <= f(y).
A *chain* in a PO is a (possibly infinite) sequence of elements that
is totally ordered (e.g., x1 <= x2 <= ... ).
A function f from C to D is *continuous* if it is monotone and
preserves limits. That is, given a chain S in C, f(lub(S)) =
lub(map(f,S)). We write [C -> D] to denote the space of continuous
functions from C to D.
A *complete partial order* (CPO) is a PO where every chain has a least
upper-bound in the PO.
A *pointed* CPO is a CPO with a least element (usually denoted _|_
and pronounced "bottom".)
Whew! A few intuitions behind how we're going to use the definitions:
The partial order business is going to correspond to our notion of
information theoretic approximation. Remember when we built up the
meaning of the while-loop as the limit of a sequence of approximations?
That's what we're going to do here, except that we have to deal with
more domains than stores (functions from locations to integers) and
the key thing is that our fixed-point construction works on any pointed
CPO. Fortunately, the space [C -> D] forms a pointed CPO when D is a
pointed CPO.
For base types, we're going to use the natural set coupled with a
discrete ordering (i.e., x <= y iff x == y). For instance, our
interpretation V[int] will be the set of all integers equipped with
the discrete order. Note that using the usual order on integers
does not result in a CPO since the limit of the chain {1,2,3,...}
is not in the set of integers. However, for any set, the discrete
order makes a CPO since the only chains consist of all the same
elements.
For product types, we're going to just define V[t1 * t2] to be
the cartesian product of V[t1] and V[t2]. For the ordering, we'll
define:
(x1,y1) <= (x2,y2) in V[t1*t2] iff x1 <= x2 in V[t1] and
y1 <= y2 in V[t2].
For computations at type t, we're going to use the value space of
t lifted so as to include an extra bottom element. This will ensure
that the computation space is always a pointed CPO. In particular,
C[int] will be V[int] + {_|_} with the ordering:
_|_ <= i
i <= i
Again, the only chains we have are of finite height, so it's clear
that this forms a CPO.
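The lifted order on C[int] is easy to transcribe. The `BOT` sentinel and helper names below are my own representation of _|_, not notation from the notes.

```python
# The flat ordering on C[int] = V[int] + {_|_}: bottom is below
# everything, and distinct integers are incomparable.

BOT = object()   # our stand-in for _|_

def leq(x, y):
    return x is BOT or x == y

# Every chain in this order is eventually constant, so its lub is just
# its last element; we also check that the input really is a chain.
def lub(chain):
    out = BOT
    for x in chain:
        assert leq(out, x), "not a chain"
        out = x
    return out
```

For example, the chain _|_, _|_, 5, 5 has lub 5, while a sequence like 3, 4 is rejected outright since 3 and 4 are incomparable.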
For functions of type t1 -> t2, we're going to define V[t1->t2] to
be the space of all continuous functions from V[t1] to computations
of C[t2]. That is, V[t1 -> t2] = [V[t1] -> C[t2]]. The ordering
that we'll use on this function space is inherited from the domain
and co-domain as follows:
f <= g in V[t1 -> t2] iff for all x <= y in V[t1],
f(x) <= g(y) in C[t2].
Note that for the space C -> D, where D is pointed, there is a
least element: the function f that maps each element of C to
D's bottom.
In summary, we have:
C[t] = V[t] + {_|_}
V[int] = Z
V[unit] = {()}
V[t1*t2] = V[t1] * V[t2]
V[t1 -> t2] = [V[t1] -> C[t2]]
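The summary can be read as a shallow embedding into a host language: values of function type become host functions that return a value-or-bottom computation. The concrete representation below (a `BOT` sentinel, Python pairs and closures) is an illustrative choice of mine.

```python
# Shallow embedding of the domains:
#   C[t]      = V[t] + {_|_}     -> a Python value, or the BOT sentinel
#   V[int]    = Z                -> int
#   V[unit]   = {()}             -> ()
#   V[t1*t2]  = V[t1] * V[t2]    -> pair (v1, v2)
#   V[t1->t2] = [V[t1] -> C[t2]] -> function from values to values-or-BOT

BOT = object()

def app(c1, c2):
    """Application in the computation space: strict in both arguments,
    so bottom propagates just as divergence would."""
    if c1 is BOT or c2 is BOT:
        return BOT
    return c1(c2)

# An example inhabitant of V[(int*int) -> (int*int)]:
swap = lambda p: (p[1], p[0])
```

Note how `app` is where the V/C distinction earns its keep: functions consume genuine values but may produce bottom.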
For type environments, let us define:
V[.] = V[unit]
V[G,x:t1] = V[G] * V[t1]
Why did we restrict the interpretation of function types to be
continuous functions? The answer is that we want to define
the meaning of fix as the limit of some chain of approximations.
But that limit isn't guaranteed to exist unless we restrict our
attention to continuous functions.
Now we need to define our semantic interpretation for expressions:
E[G |- i : int] = fn r:V[G] => i
E[G,x:t |- x : t] = fn r:V[G,x:t] => pi_2 r
E[G,y:t' |- x : t] = fn r:V[G,y:t'] => E[G |- x:t](pi_1 r)
E[G |- e1 e2 : t] = fn r:V[G] => app(E[G |- e1]r,E[G |- e2]r)
E[G |- \x:t1.e : t1->t2] = fn r:V[G] => fn v:V[t1] => E[G,x:t1 |- e:t2](r,v)
E[G |- fix f(x:t1):t2.e : t1->t2] = fn r:V[G] => lub((F r)^i(_|_))
where F =
fn r:V[G] =>
fn f:[V[t1]->C[t2]] => E[G,f:t1->t2 |- \x:t1.e:t1->t2](r,f)
I'm using (fn x:D => ...) and app(...,...) as meta-level notation
for constructing set-theoretic functions here. Notice that in general,
we have E[G |- e : t] yielding a meta function of type V[G] -> C[t].
In fact, I claim it yields a continuous function in [V[G] -> C[t]],
and indeed we really need to prove that whenever we can show G |- e : t,
E[G |- e : t] yields a continuous function in [V[G] -> C[t]].
To show this, we would need to prove a few theorems about the monotonicity
and continuity of the pairing, projection, and application operations
used at the meta-level. (Good exercise, or see Winskel for details.)
In addition, we need to show that the definition of fix makes sense.
Fortunately, the fact that the functional (F r) is continuous saves us
here, as does a general fixed-point theorem:
If D is a pointed CPO and F : [D -> D], then lub(F^i(_|_)) =
F(lub(F^i(_|_))).
As a corollary, we have that if |- e : t, then E[|-e:t]() in C[t].
Of course, we still need to prove the adequacy of this model:
If e ->* v then E[e] = E[v], and if e diverges, then E[e] = _|_.
The key is that we can appeal to the inductive definition of fix
in the model.
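The fixed-point theorem can be watched in action by iterating a functional from the bottom function. The factorial functional below is my own example; `BOT` is my stand-in for _|_.

```python
# Kleene iteration: compute F^i(_|_) for the functional corresponding to
#   fix fact(n:int):int. if n = 0 then 1 else n * fact(n-1)
# Each iterate is a better approximation, defined on 0..i-1 and bottom
# elsewhere; the lub of the chain is the factorial function itself.

BOT = object()

def bottom(n):
    return BOT           # the least element of [V[int] -> C[int]]

def F(f):
    # One unrolling of the recursion, with bottom propagated strictly.
    def g(n):
        if n == 0:
            return 1
        r = f(n - 1)
        return BOT if r is BOT else n * r
    return g

def approx(i):
    """F^i(_|_)."""
    f = bottom
    for _ in range(i):
        f = F(f)
    return f
```

The chain approx(0) <= approx(1) <= ... grows pointwise: approx(i) answers correctly on inputs below i and is bottom above, so any particular input n is settled once i exceeds n, exactly as the theorem's lub construction requires.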
---------------------------------------------------------------------
Next time: step-indexed model...