Scaling up: adding computational "effects"

At this point, it becomes natural to start introducing more realistic
features into the language, including (1) recursive functions, (2) recursive
types, and (3) references/state.

Let's start with recursive functions in a call-by-value setting, because
we've seen most of what needs to happen there in the simpler setting of IMP.
The usual approach is to simply add a term:

    exprs e ::= ... | fix f(x:t1):t2.e

to the language with typing rule:

    G,f:t1->t2 |- \x:t1.e : t1->t2
    ------------------------------
    G |- fix f(x:t1):t2.e : t1->t2

Note that to keep type-checking syntax-directed, we need to know the return
type of the function (consider fix f(x).f(x)).

As far as evaluation is concerned, I've seen two basic approaches. The first
says that fix-terms are not values, but rather reduce to lambdas:

    fix f(x:t1):t2.e --> \x:t1.(e[fix f(x:t1):t2.e / f])

The nice thing about this approach is that you don't have to re-work the
rest of your semantics (i.e., beta is the way all functions get applied).
An alternative is to treat fix-terms as values themselves and introduce a
new reduction rule:

    (fix f(x:t1):t2.e) v --> e[fix f(x:t1):t2.e / f][v/x]

It might then make sense to treat lambda as a derived form where the
function name f does not occur free within the body of the function.

Both approaches are reasonable and I don't see a strong case for one over
the other, so I'll go with the former for now. (However, some distinctions
become clearer when you start to think about mutually-recursive functions.
Good exercise!)

Of course, the addition of fix breaks the termination properties of the
language, but it's instructive to see how the proof actually breaks.
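Before looking at the proof, here is a minimal Python sketch of the first
evaluation strategy (fix-terms step to lambdas, and beta does the rest). It
is only an illustration under some assumptions that are mine, not part of
the notes: terms are encoded as tuples, type annotations are erased since
they play no role in evaluation, substitution only ever inserts closed
terms, and the names subst, step, run and the "fuel" bound are hypothetical.

```python
# Terms (types erased):
#   ("int", n) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
#   ("fix", f, x, body)      -- assumes f and x are distinct names

def subst(e, name, s):
    """Replace free occurrences of `name` in e by the CLOSED term s."""
    tag = e[0]
    if tag == "int":
        return e
    if tag == "var":
        return s if e[1] == name else e
    if tag == "lam":
        _, x, body = e
        return e if x == name else ("lam", x, subst(body, name, s))
    if tag == "app":
        return ("app", subst(e[1], name, s), subst(e[2], name, s))
    if tag == "fix":
        _, f, x, body = e
        return e if name in (f, x) else ("fix", f, x, subst(body, name, s))
    raise ValueError(tag)

def is_value(e):
    # Under the first approach, fix-terms are NOT values.
    return e[0] in ("int", "lam")

def step(e):
    """One call-by-value step, or None if e is a value or is stuck."""
    tag = e[0]
    if tag == "fix":
        # fix f(x).e --> \x.(e[fix f(x).e / f])
        _, f, x, body = e
        return ("lam", x, subst(body, f, e))
    if tag == "app":
        _, e1, e2 = e
        if not is_value(e1):
            s = step(e1)
            return None if s is None else ("app", s, e2)
        if not is_value(e2):
            s = step(e2)
            return None if s is None else ("app", e1, s)
        if e1[0] == "lam":
            return subst(e1[2], e1[1], e2)   # ordinary beta
    return None

def run(e, fuel):
    """Take at most `fuel` steps; return (final term, steps taken)."""
    steps = 0
    while steps < fuel:
        nxt = step(e)
        if nxt is None:
            break
        e, steps = nxt, steps + 1
    return e, steps

# (fix f(x). f x) 0
loop = ("app", ("fix", "f", "x", ("app", ("var", "f"), ("var", "x"))),
        ("int", 0))
```

Running run(loop, 2) lands back on loop itself: one step unrolls the fix
into a lambda and one beta step rebuilds the original application. So the
term can always take another step, which is the divergence we should expect
any termination proof to trip over.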
Recall that to prove termination for the simply-typed lambda calculus, we
constructed a unary logical relation C[t] representing those expressions
that terminate (with a value of type t) and then proved that every
well-typed expression was in C[t], and thus every well-typed expression
terminates. We defined:

    C[t]      = { e | e ->* v and v in V[t] }
    V[int]    = { i }
    V[t1->t2] = { v | All v' in V[t1]. v v' in C[t2] }
    V[G]      = { g | All x in Dom(G). g(x) in V[G(x)] }

Note that these sets still make sense in our language with fix, so now we're
going to try to prove that if G |- e : t, then for all g in V[G], g(e) in
C[t]. The proof goes through for all of the typing rules we had before, so
we really just have to consider the new typing rule for fix. So suppose our
typing derivation ends with:

    G,f:t1->t2 |- \x:t1.e : t1->t2
    ------------------------------
    G |- fix f(x:t1):t2.e : t1->t2

Pick a g in V[G]. We must show g(fix f(x:t1):t2.e) in C[t1->t2]. We know:

    g(fix f(x:t1):t2.e)
      =   fix f(x:t1):t2.g(e)                     (by alpha-variance)
      --> \x:t1.(g(e))[fix f(x:t1):t2.g(e)/f]     (by evaluation)
      =   \x:t1.g[fix f(x:t1):t2.g(e)/f](e)       (by defn. of subst.)

So it suffices to show \x:t1.g[fix f(x:t1):t2.g(e)/f](e) in V[t1->t2].

Now by induction, we know that for all g' in V[G,f:t1->t2], g'(\x:t1.e) in
C[t1->t2] (and in fact in V[t1->t2], since g'(\x:t1.e) = \x:t1.g'(e) is a
value already). At first blush, it appears that picking g' to be
g[fix f(x:t1):t2.g(e)/f] and invoking the induction hypothesis would do the
trick. But alas, we need to show that g[fix f(x:t1):t2.g(e)/f] in
V[G,f:t1->t2]. Now it's clear from our assumptions regarding g that for all
y != f,

    g[...](y) in V[(G,f:t1->t2)(y)]

But we also need to establish that fix f(x:t1):t2.g(e) in V[t1->t2] in order
to show that the induction hypothesis truly applies. But then we see that
this is exactly what we started off trying to prove!
So the point is that when constructing logical relations-style arguments,
it's often the case that you can write down well-founded relations, but that
you can't prove the fundamental theorem relating the proof system to the
relations. In this case, the proof broke down in a very subtle way, and it's
all too easy to miss this problem. Of course in this case, it's clear that
we shouldn't be able to prove that every well-typed term terminates because:

    (fix f(x:int):int.f(x)) 0

certainly diverges. But in other cases, it's not so easy to see that you've
got a problem. Moral: be very careful when invoking your induction
hypothesis...

Suppose we weaken our definition of C[t], for instance by writing:

    C[t] = { e | e diverges or else e ->* v and v in V[t] }

Intuitively, this definition says that a well-typed expression is one that
either diverges or else evaluates to a value of the right type (i.e., the
language is type-safe). Now we know this is true, but as the above argument
shows, we cannot easily construct a proof of this using the technique above.
So there are 3 basic ways that people handle this sort of problem:

1) Do a syntactic proof of type soundness. That is, prove that

     (a) [preservation] if |- e : t and e -> e', then |- e' : t
     (b) [progress] if |- e : t then either e = v or else e -> e'
     (c) [canonical forms] if |- v : int then v = i for some i, and
         if |- v : t1->t2 then v = \x:t1.e for some \x:t1.e.

   Note that big-step is no help here since we cannot distinguish between an
   expression that diverges and one that gets stuck. But a syntactic proof
   that uses the small-step semantics ensures that either a program diverges
   or it evaluates to a value of the right type. This is relatively easy to
   do and scales to other features (e.g., refs, concurrency, etc.), which is
   why you see it a lot. On the other hand, it doesn't help when you want to
   prove something more interesting than type-soundness (e.g., some
   extensional notion of equality).
2) Construct a denotational semantics and prove the property in the model.
   The primary advantage, especially for recursive functions, is that the
   model will give us a way to address recursive functions in an inductive
   fashion.

3) Use a step-indexed relational model. This is a relatively new technique
   that has only popped up in the last couple of years, but it combines
   aspects of the previous two approaches.

I'm assuming that you folks can do 1) so we'll concentrate on 2) and 3).
They have the advantage that they extend to other kinds of properties
besides type-soundness. They have the unfortunate property that they don't
scale as well to other language features (e.g., concurrency).

---------------------------------------------------------------------
Denotational Semantics for Recursive Functions:

The basic idea here is similar to what we did for IMP. We'll model
lambda-calculus functions as set-theoretic functions and we'll use a
fixed-point calculation to construct the meaning of recursive functions
(just as we did for while-loops). To make the development a little more
general, it's helpful to introduce a few abstractions, notably CPOs and
continuous functions.

A *partial order* (PO) is a set P equipped with an ordering <= that is
reflexive, transitive, and anti-symmetric.

When we have a subset S of a PO and an element x of that PO such that for
all y in S, y <= x, then we say x is an *upper bound* of S. If, in addition,
x <= z for all upper bounds z of S, then we say x is the *least upper bound*
(lub) of S. [The dual holds for lower bounds and greatest lower bounds
(glb).]

A function f from a PO C to another PO D is *monotone* if x <= y in C
implies f(x) <= f(y) in D.

A *chain* in a PO is a (possibly infinite) sequence of elements that is
totally ordered (e.g., x1 <= x2 <= ... ).

A function f from C to D is *continuous* if it is monotone and preserves
limits of chains. That is, given a chain S in C, f(lub(S)) = lub(map(f,S)).
We write [C -> D] to denote the space of continuous functions from C to D.

A *complete partial order* (CPO) is a PO where every chain has a least
upper bound in the PO.

A *pointed* CPO is a CPO with a least element (usually denoted _|_ and
pronounced "bottom").

Whew! A few intuitions behind how we're going to use the definitions:

The partial order business is going to correspond to our notion of
information-theoretic approximation. Remember when we built up the meaning
of the while-loop as the limit of a sequence of approximations? That's what
we're going to do here, except that we have to deal with more domains than
stores (functions from locations to integers), and the key thing is that our
fixed-point construction works on any pointed CPO. Fortunately, the space
[C -> D] forms a pointed CPO when D is a pointed CPO.

For base types, we're going to use the natural set coupled with a discrete
ordering (i.e., x <= y iff x == y). For instance, our interpretation V[int]
will be the set of all integers equipped with the discrete order. Note that
using the usual order on integers does not result in a CPO, since the chain
1 <= 2 <= 3 <= ... has no least upper bound in the set of integers. However,
for any set, the discrete order makes a CPO, since the only chains are those
that repeat a single element.

For product types, we're going to just define V[t1 * t2] to be the cartesian
product of V[t1] and V[t2]. For the ordering, we'll define:

    (x1,y1) <= (x2,y2) in V[t1*t2] iff
        x1 <= x2 in V[t1] and y1 <= y2 in V[t2].

For computations at type t, we're going to use the value space of t lifted
so as to include an extra bottom element. This will ensure that the
computation space is always a pointed CPO. In particular, C[int] will be
V[int] + {_|_} with the ordering:

    _|_ <= i
      i <= i

Again, the only chains we have are of finite height, so it's clear that this
forms a CPO.
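To make the orderings concrete, here is a small Python sketch of the
discrete order, the lifted order on C[int], and the componentwise order on
pairs. It is only an illustration: representing _|_ by None and the function
names below are my own assumptions, and lub_of_chain only handles finite
chains (which is all a flat domain has, up to repetition).

```python
BOT = None  # stands for _|_; this sketch assumes None is never a proper value

def leq_discrete(x, y):
    """Discrete order on a value space such as V[int]: x <= y iff x == y."""
    return x == y

def leq_lifted(a, b):
    """Order on C[int] = V[int] + {_|_}: _|_ is below everything."""
    return a is BOT or (b is not BOT and leq_discrete(a, b))

def leq_pair(p, q, leq1=leq_discrete, leq2=leq_discrete):
    """Componentwise order on V[t1 * t2]."""
    return leq1(p[0], q[0]) and leq2(p[1], q[1])

def lub_of_chain(chain, leq=leq_lifted):
    """lub of a finite, nonempty chain: simply its last (largest) element."""
    for a, b in zip(chain, chain[1:]):
        if not leq(a, b):
            raise ValueError("not a chain")
    return chain[-1]
```

Every chain in C[int] looks like _|_, ..., _|_, i, i, ..., so for example
lub_of_chain([BOT, BOT, 7, 7]) is 7, while [3, 4] is rejected because 3 and
4 are incomparable in the lifted order.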
For functions of type t1 -> t2, we're going to define V[t1->t2] to be the
space of all continuous functions from V[t1] to computations in C[t2]. That
is, V[t1 -> t2] = [V[t1] -> C[t2]]. The ordering that we'll use on this
function space is inherited from the domain and co-domain as follows:

    f <= g in V[t1 -> t2] iff
        for all x <= y in V[t1], f(x) <= g(y) in C[t2].

Note that for the space C -> D, where D is pointed, there is a least
element: the function f that maps each element of C to D's bottom.

In summary, we have:

    C[t]        = V[t] + {_|_}

    V[int]      = Z
    V[unit]     = {()}
    V[t1*t2]    = V[t1] * V[t2]
    V[t1 -> t2] = [V[t1] -> C[t2]]

For type environments, let us define:

    V[.]       = V[unit]
    V[G,x:t1]  = V[G] * V[t1]

Why did we restrict the interpretation of function types to be continuous
functions? The answer is that we want to define the meaning of fix as the
limit of some chain of approximations. But that limit isn't guaranteed to
exist unless we restrict our attention to continuous functions.

Now we need to define our semantic interpretation for expressions:

    E[G |- i : int]     = fn r:V[G] => i
    E[G,x:t |- x : t]   = fn r:V[G,x:t] => pi_2 r
    E[G,y:t' |- x : t]  = fn r:V[G,y:t'] => E[G |- x : t](pi_1 r)   (x != y)
    E[G |- e1 e2 : t]   = fn r:V[G] => app(E[G |- e1]r, E[G |- e2]r)
    E[G |- \x:t1.e : t1->t2]
                        = fn r:V[G] => fn v:V[t1] => E[G,x:t1 |- e : t2](r,v)
    E[G |- fix f(x:t1):t2.e : t1->t2]
                        = fn r:V[G] => lub((F r)^i(_|_))
      where F = fn r:V[G] => fn f:[V[t1]->C[t2]] =>
                  E[G,f:t1->t2 |- \x:t1.e : t1->t2](r,f)

I'm using (fn x:D => ...) and app(...,...) as meta-level notation for
constructing set-theoretic functions here. Notice that in general, we have
E[G |- e : t] yielding a meta function of type V[G] -> C[t]. I claim that it
actually yields a continuous function in [V[G] -> C[t]], and we really do
need to prove this: whenever we can show G |- e : t, then E[G |- e : t]
yields a continuous function [V[G] -> C[t]].
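To see what lub((F r)^i(_|_)) looks like concretely, here is a small Python
sketch of the chain of approximations for one recursive function. It rests
on assumptions that are mine: the object language is extended with a
conditional and arithmetic (which these notes have not introduced), the
example term is factorial, _|_ is represented by None, and the names F,
bottom and approx are hypothetical.

```python
BOT = None  # stands for _|_ in C[int]; assumes None is never a proper value

def bottom(_v):
    """Least element of [V[int] -> C[int]]: undefined everywhere."""
    return BOT

def F(f):
    """One unrolling of the (hypothetical) term
         fix fact(x:int):int. if x = 0 then 1 else x * fact(x - 1)
       i.e. the functional whose least fixed point is the meaning of fact."""
    def g(x):
        if x == 0:
            return 1
        r = f(x - 1)
        return BOT if r is BOT else x * r   # multiplication is strict in _|_
    return g

def approx(i):
    """F^i(_|_): the i-th element of the chain whose lub interprets fix."""
    f = bottom
    for _ in range(i):
        f = F(f)
    return f
```

Here approx(i) is defined exactly on the arguments 0, ..., i-1, and each
approximation agrees with the previous one wherever that one was already
defined. For instance approx(3)(2) is 2 while approx(3)(3) is still _|_. The
lub of the chain is the function that is defined on every n >= 0 and is _|_
on negative arguments, where the recursion never reaches the base case.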
To show that E[G |- e : t] is continuous, we would need to prove a few
theorems about the monotonicity and continuity of the pairing, projection,
and application operations used at the meta-level. (Good exercise, or see
Winskel for details.) In addition, we need to show that the definition of
fix makes sense. Fortunately, the fact that the functional (F r) is
continuous saves us here, as does a general fixed-point theorem:

    If D is a pointed CPO and F : [D -> D], then
    lub(F^i(_|_)) = F(lub(F^i(_|_))).

As a corollary, we have that if |- e : t, then E[|- e : t]() in C[t].

Of course, we still need to prove the adequacy of this model:

    If e ->* v then E[e] = E[v], and if e diverges, then E[e] = _|_.

The key is that we can appeal to the inductive definition of fix in the
model.

---------------------------------------------------------------------
Next time: step-indexed model...