[Im]Predicativity:
------------------

Suppose we had the following type system, which extends System F (aka F2)
with a base type of "ints":

  t ::= int | t -> t | 'a | All 'a.t
  e ::= i | x | \x:t.e | e1 e2 | /\'a.e | e t

and suppose we wish to define a set-theoretic, denotational semantics for
the language. As with the simply-typed lambda calculus, we might start off
writing:

  T[int]    = Z               -- interpret int as the set of integers
  T[t1->t2] = T[t1] -> T[t2]  -- interpret arrow as functions

Oops. How do we interpret type variables and All-types? One idea is to
parameterize the interpretation T of types by an environment mapping type
variables to sets. That is, we will interpret judgments:

  D |- t : *

under a type environment d such that for all 'a in D, d('a) yields a set.
Then we might rework our definition as:

  T[D |- int : *]d      = Z
  T[D |- t1 -> t2 : *]d = (T[D |- t1 : *]d) -> (T[D |- t2 : *]d)
  T[D |- 'a : *]d       = d('a)

How about All-types? Remember that I said one idea is that we could think
of All 'a.t as an infinite product indexed by types (in the same way that
t -> t' is an infinite product of t' values, indexed by t values)? So we
might try this:

  T[D |- All 'a.t : *]d = Pi S.(T[D |- t : *]d['a|->S])

The problem with this definition is that we have not specified what S can
range over. Intuitively, it needs to range over all interpretations of
types. But that means we can't define the interpretation of All 'a.t
without first defining the interpretation of all types -- which, alas,
includes All 'a.t itself.

As an example of the kind of "circular" reasoning that can be done, note
that given:

  id : All 'a.'a -> 'a
  id = /\'a.\x:'a.x

it's possible to pass the identity function to itself:

  id (All 'a.'a->'a)    : (All 'a.'a->'a) -> (All 'a.'a->'a)
  id (All 'a.'a->'a) id : All 'a.'a->'a

That should make you very, very suspicious. (A GHC transcription of this
self-application appears below.) In particular, it makes me feel like
there ought to be some way to build a looping construct in the language.
Thus, it is very surprising to discover that no, we can't construct
non-terminating functions in F2. But the key thing is that there's not
going to be a constructive way to build up the interpretation of types in
the fashion we were trying to sketch above.

It's this kind of circularity that is termed "impredicative". (For
historical notes on this term, see Pierce's book.) Intuitively, whenever
you have a quantified type, and the quantifier ranges over things that
include the quantified type itself, the language is going to be
impredicative -- you're going to run into this kind of circular reasoning.

There are at least two possible ways to fix this conundrum:

1. Stratify the language so that there's no circularity. This simplifies
   the semantics but loses some polymorphism. Constructive type theorists
   like this approach because they are somewhat suspicious of the
   alternative, based on philosophical arguments (that I don't necessarily
   appreciate). Perhaps the best argument for this approach is that it's
   *simple* (and thus somewhat trustworthy).

2. Define a universe U that's *bigger* than all of the interpretations
   from the beginning, but doesn't appeal to itself or the interpretations
   in its definition. Then, when we encounter a quantifier, let it range
   over the already-defined U. Finally, show that all of your resulting
   interpretations fit within U.
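As an aside, the suspicious self-application above is not hypothetical:
GHC Haskell, which (as discussed below) is genuinely impredicative, will
typecheck it. A minimal sketch, assuming GHC 9.2 or later with the
ImpredicativeTypes and TypeApplications extensions (the names idPoly and
suspicious are ours):

  {-# LANGUAGE ImpredicativeTypes, TypeApplications #-}

  -- id : All 'a.'a -> 'a
  idPoly :: forall a. a -> a
  idPoly x = x

  -- id (All 'a.'a->'a) id, transcribed literally: the quantified
  -- variable is instantiated at the quantified type itself.
  suspicious :: forall a. a -> a
  suspicious = idPoly @(forall a. a -> a) idPoly

(Haskell has general recursion anyway, so this proves nothing about
looping; the point is just that the instantiation at a quantified type is
accepted.)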
Let's talk about the stratified approach first. In particular, let's split
the types into two levels, the first of which I'll call "monotypes" (types
without quantifiers) and the second of which I'll call "polytypes" (types
with quantifiers). The key is that the quantifiers will range only over
monotypes and not polytypes:

  t ::= int | 'a | t -> t
  s ::= t | All 'a.s | s -> s

Now let us define an interpretation for *closed* monotypes:

  M[int]      = Z
  M[t1 -> t2] = M[t1] -> M[t2]

Now we can define:

  U1 = { M[t] | t a closed monotype }

the collection of all monotype interpretations. Next, let us define an
interpretation for polytypes, this time parameterized to support type
variables. Note, however, that our d will map type variables to elements
of U1, i.e., to interpretations of closed monotypes:

  S[int]d      = Z
  S['a]d       = d('a)
  S[t1 -> t2]d = (S[t1]d) -> (S[t2]d)
  S[All 'a.s]d = Pi A:U1.(S[s]d['a|->A])
  S[s1 -> s2]d = S[s1]d -> S[s2]d

Now the definition of S is well-founded, and it relies crucially upon the
fact that we established U1 first, and then defined the meaning of
S[All 'a.s] in terms of U1. Note that we can define

  U2 = { S[s] | s a closed polytype }

and of course, we could build up a U3 whose quantifiers range over U2
types, and so on and so on.

The stratification that you see here is typical of ML. Indeed, Harper,
Moggi, and Mitchell created a core language called XML (not to be confused
with today's XML) that was meant to capture the essence of ML-style
polymorphism. In fact, ML goes a bit further and restricts polytypes to be
of the form:

  s ::= t | All 'a.s

That is, it restricts the quantifiers to prenex positions. So, in
particular, you can't write a function that takes a polymorphic function
as an argument, nor a function that returns a polymorphic function as a
result. For instance, you can't pass the identity function to itself. What
you can do is pass an *instantiation* of the identity function to itself,
but that instantiation must reduce the number of quantifiers. Another way
to say this is that polymorphic functions are second-class in ML.

The restriction to prenex form is really an issue of type inference, not a
semantic issue. In particular, the way the Damas-Milner algorithm works
forces the prenex restriction. Some languages, such as the current version
of GHC, get rid of the prenex restriction using a combination of
user-supplied annotations coupled with more advanced type inference.
However, when GHC added support for this, it crossed a semantic divide,
because it actually allowed type variables to range over quantified types.
That is, GHC is actually impredicative.

Personally, I hate the fact that ML is so limited in expressive power, and
I applaud GHC for pushing the boundaries. Part of the reason is that many
of the encodings we talked about last time (products, sums, ...) require
first-class polymorphism. As another great example, if you have
first-class All-types, then you can simulate Exist-types. Consider:

  T[Exists 'a.t] = All 'b.(All 'a.t -> 'b) -> 'b

It's perhaps easiest to read this logically:

  Exists 'a.t =~= not(All 'a.not(t))
              =~= (All 'a.not(t)) => false
              =~= (All 'a.(t => false)) => false
              =~= All 'b.((All 'a.(t => 'b)) => 'b)

where the last step generalizes false to an arbitrary 'b. As we'll see
later on, existential types are the key to understanding abstract data
types (ADTs) and objects. Regardless, note that an existential needs to
take a polymorphic function as an argument. With the prenex restriction,
there's no way to express this!
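To see the encoding in action, here is a transcription into GHC Haskell,
which supports exactly the first-class polymorphism it requires (a minimal
sketch assuming the RankNTypes extension; the names Exists, pack, and
unpack are ours):

  {-# LANGUAGE RankNTypes #-}

  -- Exists 'a.t encoded as All 'b.(All 'a.t -> 'b) -> 'b. The body t
  -- is represented as a type constructor f applied to the witness a.
  newtype Exists f = Exists (forall b. (forall a. f a -> b) -> b)

  -- pack hides a concrete witness type behind the quantifier.
  pack :: f a -> Exists f
  pack v = Exists (\k -> k v)

  -- unpack hands the hidden value to a client that must work for
  -- every possible witness, e.g.: unpack (pack [True, False]) length
  unpack :: Exists f -> (forall a. f a -> b) -> b
  unpack (Exists e) k = e k

Note that unpack takes the polymorphic client function as an argument;
under the prenex restriction, its type is simply unwritable.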
My conclusion: don't let the type inference of today restrict the language
design. Tomorrow, we're likely to come up with better inference
algorithms, and there are very, very clever encodings (read: useful
abstractions) that you're likely to prevent by imposing such a restriction
on the language.

--------------------------------------------------------------------

So stratifying the types is one way to solve the apparent circularity
involved in an interpretation of polymorphism. The price paid is that we
can't instantiate a quantifier at level k+1 except with types from level
k. Is there not some way to collapse all of the levels?

The answer is yes, and the solution is really found in Girard's PhD
thesis, but was simplified and perfected by many others, including Statman
and Tait. There are two parts to the trick: First, we are going to
interpret All-types as *intersections* instead of dependent products.
Second, we are going to use sets of the underlying untyped lambda terms
(obtained by erasing all of the capital-lambdas and types in the code) as
our big, already-defined universe. Along the way, what we're going to end
up doing is proving that reduction is strongly normalizing.

First let me define:

  erase(i)      = i
  erase(x)      = x
  erase(e1 e2)  = erase(e1) erase(e2)
  erase(\x:t.e) = \x.erase(e)
  erase(/\'a.e) = erase(e)
  erase(e t)    = erase(e)

Recall that we consider an expression e to be strongly normalizing if
there is no infinite sequence of reductions starting with e. I'll write SN
for the set of all strongly normalizing terms.

Defn: a set S of (erased) lambda terms is *saturated* if:

  1. S <= SN (i.e., every term in S is strongly normalizing)
  2. if e1,...,en are in SN, then x e1 e2 ... en is in S for all
     variables x.
  3. if e[u/x] e1 ... en is in S, and u is in SN, then
     (\x.e) u e1 ... en is in S.

Note that condition 1 is what we're trying to prove. The other conditions
are going to make it possible to push through the proof of strong
normalization.

Let SAT be the set of all saturated sets. Note that SN is an element of
SAT:

  1. SN <= SN (trivial)
  2. let e1,...,en be in SN; then x e1 ... en is in SN, since the only
     reductions we can do are within the ei.
  3. [exercise]

and SN is in fact the largest element of SAT. Since SAT is non-empty, and
since it is closed under arbitrary non-empty intersections and unions, we
can treat it (ordered by inclusion) as a complete lattice.

Now we can define T[t] so that it yields a saturated set of terms. In
particular, let us assume d maps type variables to saturated sets:

  T['a]d       = d('a)
  T[t1 -> t2]d = { e1 | All e2 in T[t1]d. e1 e2 in T[t2]d }
  T[All 'a.t]d = intersect(S in SAT):(T[t]d['a|->S])

  T[G]d = { g | for all x in Dom(G), g(x) in T[G(x)]d }

Theorem: If D;G |- e : t, then for all such d and all g in T[G]d,
g(erase(e)) is in T[t]d.

Proof: exercise.

Corollary: If D;G |- e : t, then e is strongly normalizing. (Pick g to map
each x to itself: condition 2, with n = 0, puts every variable in every
saturated set, so g is in T[G]d, and condition 1 on T[t]d then gives
strong normalization.)

The key in all of this is that we already had a definition of SAT lying
around before we started constructing the interpretation of types.
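As a quick sanity check, the erasure function defined earlier in this
section is directly executable. Here is a transcription into Haskell (a
minimal sketch; the datatypes Ty, Exp, and UExp are our own rendering of
the grammar from the start of these notes):

  -- Object-language syntax: the t and e grammars from above.
  data Ty  = TInt | TVar String | TArrow Ty Ty | TAll String Ty
    deriving (Eq, Show)

  data Exp = EInt Int             -- i
           | EVar String          -- x
           | EAbs String Ty Exp   -- \x:t.e
           | EApp Exp Exp         -- e1 e2
           | ETyAbs String Exp    -- /\'a.e
           | ETyApp Exp Ty        -- e t
    deriving (Eq, Show)

  -- Untyped lambda terms: the target of erasure.
  data UExp = UInt Int | UVar String | UAbs String UExp | UApp UExp UExp
    deriving (Eq, Show)

  -- erase drops type annotations, type abstractions, and type
  -- applications, clause for clause as in the definition above.
  erase :: Exp -> UExp
  erase (EInt i)     = UInt i
  erase (EVar x)     = UVar x
  erase (EApp e1 e2) = UApp (erase e1) (erase e2)
  erase (EAbs x _ e) = UAbs x (erase e)
  erase (ETyAbs _ e) = erase e
  erase (ETyApp e _) = erase e

For example, erasing /\'a.\x:'a.x, i.e.
erase (ETyAbs "a" (EAbs "x" (TVar "a") (EVar "x"))), yields \x.x.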
As a side note, let's consider:

  T[All 'a.'a]d = intersect(S in SAT):(T['a]d['a|->S])
                = intersect(S in SAT):S

So T[All 'a.'a]d is the least element of SAT. At first blush, you might
think this is the empty set, but it's not! Rather, by condition 2, it must
include every term of the form:

  x e1 ... en    where the ei's are strongly normalizing

But notice that no such term is closed. A more careful examination will
reveal that every term in the bottom of SAT has a free variable! Thus, we
can conclude that there are no closed terms in the type T[All 'a.'a]. Of
course, it's actually easier to argue this by appealing to the proof rules
of the type system, but nonetheless, this is an appealing model for the
language.

Next up: representation independence.