Recursive types:
Last time, we saw that adding recursive functions to the language
makes it somewhat difficult to construct logical relations. We had to
build a denotational model of the language, based on CPOs and
continuous functions, prove the model adequate, and then use the model
to argue that if all the finite approximations of a recursive function
are in a relation, then so is the limit. In my opinion, this is a
painful detour, especially when you consider that the syntactic proof
of type soundness is so simple.
Worse, things get harder when we generalize the language to include
refs and/or recursive types. To understand the issue, it helps to
first understand that recursive types really underlie both refs and
recursive functions. That is, if you give me recursive types, then I
can faithfully simulate these other features.
Consider, for instance, how to write an interpreter for core-ML
with refs:
e ::= x | i | () | \x.e | e1 e2 | ref e | !e | e1 := e2
If I were writing an interpreter for this language in ML, and I didn't
want to use refs, then I could write:
datatype D = Int of int | Unit | Lam of D -> store -> (store * D)
| Ref of int
withtype store = {heap: int -> D, next:int}
fun return (v:D) (s:store) : (store*D) = (s,v)
infix >>=
fun (f : store -> store*D) >>= (g : D -> store -> store*D) =
  fn (s : store) => let val (s2, v) = f s in g v s2 end
fun alloc (v:D) ({heap,next}) =
let val heap2 = fn j => if j = next then v else heap j
in ({heap=heap2,next=next+1},Ref next)
end
fun deref (Ref i) (s:store) =
(s,(#heap s) i)
fun assign (Ref i) (v:D) ({heap,next}) =
let val heap2 = fn j => if j = i then v else heap j
in ({heap=heap2,next=next},Unit)
end
fun interp (e:exp) (env:var->D) : store -> store*D =
case e of
Var_e x => return (env x)
| Int_e i => return (Int i)
| Unit_e => return Unit
| Lam_e(x,e) =>
return (Lam (fn v => interp e (fn z => if z = x then v else env z)))
| App_e(e1,e2) =>
    (interp e1 env) >>= (fn (Lam f) => (interp e2 env) >>= f)
| Ref_e(e) => (interp e env) >>= alloc
| Deref_e(e) => (interp e env) >>= deref
| Assign_e(e1,e2) =>
    (interp e1 env) >>= (fn v1 =>
    (interp e2 env) >>= (fn v2 => assign v1 v2))
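For concreteness, here is one way the interpreter might be run. This is a sketch: the exp and var declarations are my guesses at what the case analysis above assumes (in real code they would have to precede interp), and empty_env/empty_store are illustrative names.

```sml
type var = string
datatype exp = Var_e of var | Int_e of int | Unit_e
             | Lam_e of var * exp | App_e of exp * exp
             | Ref_e of exp | Deref_e of exp | Assign_e of exp * exp

val empty_env : var -> D = fn x => raise Fail ("unbound: " ^ x)
val empty_store : store = {heap = fn _ => raise Fail "bad address", next = 0}

(* (\r. !r) (ref 1): allocate a cell holding 1, then read it back *)
val prog = App_e (Lam_e ("r", Deref_e (Var_e "r")), Ref_e (Int_e 1))
val result = interp prog empty_env empty_store   (* (s, Int 1) *)
```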
Notice that our interpreter's definition depends crucially upon the
recursively defined type D.
Furthermore, it's clear that the language we're interpreting supports
recursive functions in the sense that we can always do something like:
let val r = ref (fn (x:int) => x)
val loop = fn (i:int) => (!r)(i)
in
r := loop;
(!r)(0)
end
That is, recursion can be encoded in terms of refs, and the
interpreter above shows that refs can be encoded in a language that
only provides recursive types. (Although the interp function is
"recursive", in some sense, it's really inductive since the size of
the expression only goes down. That is, in principle, we can
represent expressions as trees and write interp using only a "fold" on
those trees.)
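To make the "fold" remark concrete, here is a sketch on a toy tree type (the names are mine, not part of the language above): a single fold captures all structural recursion on trees, so any function that only recurses on immediate subterms -- like interp -- is an instance of it.

```sml
datatype tree = Leaf of int | Node of tree * tree

(* the one structural-recursion scheme for trees *)
fun fold (leaf : int -> 'a) (node : 'a * 'a -> 'a) (t : tree) : 'a =
  case t of
    Leaf i => leaf i
  | Node (l, r) => node (fold leaf node l, fold leaf node r)

(* e.g., summing the leaves is an instance of fold: *)
val sum = fold (fn i => i) (op +)
(* sum (Node (Leaf 1, Node (Leaf 2, Leaf 3))) = 6 *)
```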
So to some degree, to tackle the problems of recursive
functions or refs, you really have to tackle the problem of recursive
types. Another way to see this is that if you can write an
interpreter for the *untyped* lambda calculus, then you're dealing
with something that's Turing complete. But fundamentally, the untyped
lambda calculus consists of functions, which take untyped-lambda
calculus functions as arguments, and return them as results. That is,
the type we need for this language is really:
datatype D = Fn of D->D
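As a sanity check that this one equation really buys untyped computation, here is the classic divergent self-application written against D (a sketch; app and omega are my names):

```sml
datatype D = Fn of D -> D

(* application: unpack the function and apply it *)
fun app (Fn f) (x : D) = f x

(* \x. x x, packaged as a D *)
val omega = Fn (fn x => app x x)

(* app omega omega unfolds to itself forever -- the well-typed
   counterpart of (\x. x x) (\x. x x) *)
```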
Now building a set-theoretic interpretation of this equation is
problematic. To see this, note that we need some set D such that D =
D->D. But for a set D with more than 1 element, the functions in D->D
cannot be put in one-to-one correspondence with D. So a simple
counting argument tells us that from a simple set theoretic point of
view, this equation is unsolvable.
It's important to note that not all "recursive" types are really
recursive. Consider for instance:
datatype int_list = Nil | Cons of int * int_list
We can build a simple set-theoretic model of this by defining:
INT_LIST[0] = { Nil }
INT_LIST[i+1] =
INT_LIST[i] + {Cons(n,v) | n in Int & v in INT_LIST[i]}
and then taking the meaning of int_list to be the union of INT_LIST[i]
for all i >= 0. The problem comes up when we have a *negative*
occurrence of the recursive type variable. (Another problem arises if
the recursive type is non-expansive -- e.g., datatype d = D of d.)
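The union construction can be read off in code: every int_list value lands in some finite stage, and an (illustrative) stage function computes which one. Nothing analogous exists for D = Fn of D -> D, because the argument of a Fn is not a smaller value.

```sml
datatype int_list = Nil | Cons of int * int_list

(* the least i with v in INT_LIST[i], in the indexing above *)
fun stage Nil = 0
  | stage (Cons (_, t)) = 1 + stage t

(* stage (Cons (7, Cons (8, Nil))) = 2 *)
```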
Of course, there is a way to deal with this by finding a D and some
subset of D->D (corresponding to the "computable" functions on D) such
that D is *isomorphic* to the subset of D->D. This is the famous
construction by Dana Scott and requires doing for types essentially
what we were doing for recursive functions: building up approximations
and then taking limits. This requires considerably more technical
material. (Winskel uses one formulation based on information systems,
Gunter uses another based on category theory, etc.)
It's not that all of this math is that hard, but rather, it seems like
we're throwing big hammers at what should be a reasonably simple task:
modeling the core features of programming languages.