[Im]Predicativity:
------------------
Suppose we had the following type system, which extends
System F (aka F2) with a base type of "ints":
t ::= int | t -> t | 'a | All 'a.t
e ::= i | x | \x:t.e | e1 e2 | /\'a.e | e t
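For concreteness, here is one way to transcribe this syntax as data. The representation (TypeScript tagged unions, the names Ty and Tm) is my own, not anything fixed by the notes:

```typescript
// Types t of F2-with-ints as a tagged union.
type Ty =
  | { tag: "int" }                            // int
  | { tag: "arrow"; from: Ty; to: Ty }        // t -> t
  | { tag: "tyvar"; name: string }            // 'a
  | { tag: "all"; bound: string; body: Ty };  // All 'a.t

// Terms e of F2-with-ints.
type Tm =
  | { tag: "lit"; value: number }                    // i
  | { tag: "var"; name: string }                     // x
  | { tag: "lam"; param: string; ty: Ty; body: Tm }  // \x:t.e
  | { tag: "app"; fn: Tm; arg: Tm }                  // e1 e2
  | { tag: "tylam"; bound: string; body: Tm }        // /\'a.e
  | { tag: "tyapp"; fn: Tm; ty: Ty };                // e t

// Example: the polymorphic identity /\'a.\x:'a.x
const a: Ty = { tag: "tyvar", name: "a" };
const id: Tm = {
  tag: "tylam",
  bound: "a",
  body: { tag: "lam", param: "x", ty: a, body: { tag: "var", name: "x" } },
};
```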
Now suppose we wish to define a set-theoretic, denotational
semantics for this language. As with the simply-typed lambda
calculus, we might start off writing:
T[int] = Z -- interpret int as the set of integers
T[t1->t2] = T[t1] -> T[t2] -- interpret arrow as functions
Oops. How do we interpret type variables and All-types?
One idea is to parameterize the interpretation T of types
by an environment mapping type variables to sets. That is,
we will interpret judgments:
D |- t : *
under a type environment d such that for all 'a in D, d('a)
yields a set. Then we might rework our definition as:
T[D |- int : *]d = Z
T[D |- t1 -> t2 : *]d = (T[D |- t1 : *]d) -> (T[D |- t2 : *]d)
T[D |- a : *]d = d(a)
How about All-types? Remember that I suggested we could think
of All 'a.t as an infinite product indexed
by types (in the same way that t -> t' is an infinite product
of t' values, indexed by t values)? So we might try this:
T[D |- All 'a.t : *]d = Pi S.(T[D |- t : *]d['a|->S])
The problem with this definition is that we have not specified
what S can range over. Intuitively, it needs to range over
all interpretations of types. But that means we can't define
the interpretation of All 'a.t without first defining the
interpretation of all types, which alas, includes All 'a.t.
As an example of the kind of "circular" reasoning that can
be done, note that given:
id : All 'a.'a -> 'a
id = /\'a.\x:'a.x
it's possible to pass the identity function to itself:
id (All 'a.'a->'a) : (All 'a.'a->'a) -> (All 'a.'a->'a)
id (All 'a.'a->'a) id : All 'a.'a->'a
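TypeScript's generics happen to let a type variable be instantiated with a polymorphic function type, so we can replay this self-application directly (a sketch of the idea, not a claim about F2 itself; the names Id, idAtId, idAgain are mine):

```typescript
// The polymorphic identity: roughly All 'a.'a -> 'a.
const id = <A>(x: A): A => x;

// The type of id, so we can instantiate id *at its own type*.
type Id = <A>(x: A) => A;

// id (All 'a.'a->'a) : (All 'a.'a->'a) -> (All 'a.'a->'a)
const idAtId: (f: Id) => Id = id;

// id (All 'a.'a->'a) id : All 'a.'a->'a
const idAgain: Id = idAtId(id);

// The result is still just the identity:
const n = idAgain(42);
```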
That should make you very, very suspicious. In particular, it
makes me feel like there ought to be some way to build a looping
construct in the language. Thus, it is very surprising to
discover that no, we can't construct non-terminating functions
in F2.
But the key thing is that there's not going to be a constructive
way to build up the interpretation of types in a fashion like
what we were trying to sketch above. It's this kind of circularity
that is termed "impredicative". (For historical notes on this
term, see Pierce's book.) Intuitively, whenever you have a quantified
type, and the quantifier ranges over things that include the
quantified type itself, then the language is going to be impredicative--
you're going to run into this kind of circular reasoning.
There are at least two possible ways to fix this conundrum:
1. Stratify the language so that there's no circularity.
This simplifies the semantics but loses some polymorphism.
Constructive type theorists like this approach because they
are somewhat suspicious of the alternative, based on philosophical
arguments (that I don't necessarily appreciate). Perhaps the
best argument for this approach is that it's *simple* (and
thus somewhat trustworthy).
2. Define a universe U that's *bigger* than all of the interpretations
from the beginning, but doesn't appeal to itself or the interpretations
in its definition. Then when we encounter a quantifier, let it
range over the already defined U. Finally, show that all of your
resulting interpretations fit within U.
Let's talk about the stratified approach first. In particular,
let's split the types into two levels, the first of which I'll
call "monotypes" (types without quantifiers) and the second
of which I'll call "polytypes" (types with quantifiers). The
key is that the quantifiers will range only over mono-types and
not poly-types:
t ::= int | 'a | t -> t
s ::= t | All 'a.s | s -> s
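One way to see the stratification is as a predicate on the type syntax: monotypes are exactly the types with no quantifier anywhere inside. A small classifier over my own type representation (the names are mine):

```typescript
// Type syntax, covering both levels of the stratified grammar.
type Ty =
  | { tag: "int" }
  | { tag: "tyvar"; name: string }
  | { tag: "arrow"; from: Ty; to: Ty }
  | { tag: "all"; bound: string; body: Ty };

// Accepts exactly the quantifier-free level t (the monotypes).
function isMonotype(t: Ty): boolean {
  switch (t.tag) {
    case "int":
    case "tyvar":
      return true;
    case "arrow":
      return isMonotype(t.from) && isMonotype(t.to);
    case "all":
      return false; // quantifiers live only at the polytype level
  }
}

const intTy: Ty = { tag: "int" };
const mono: Ty = { tag: "arrow", from: intTy, to: intTy }; // int -> int
const poly: Ty = {                                          // All 'a.'a -> 'a
  tag: "all",
  bound: "a",
  body: {
    tag: "arrow",
    from: { tag: "tyvar", name: "a" },
    to: { tag: "tyvar", name: "a" },
  },
};
```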
Now let us define an interpretation for *closed* monotypes:
M[int] = Z
M[t1 -> t2] = M[t1] -> M[t2]
Now we can define:
U1 = union of all M[t] s.t. t is a closed monotype
Next, let us define an interpretation for polytypes, this
time parameterized to support type variables. Note, however,
that our d will map type variables to mono-types:
S[int]d = Z
S['a]d = d('a)
S[t1 -> t2]d = (S[t1]d) -> (S[t2]d)
S[All 'a.s]d = Pi X:U1.(S[s]d['a|->X])
S[s1 -> s2]d = S[s1]d -> S[s2]d
Now the definition of S is well-founded, and it relies crucially
upon the fact that we established U1 first, and then defined
the meaning S[All 'a.s] in terms of U1. Note that we can define
U2 = union of all S[s] s.t. s is a closed polytype. And of course,
we could build up a U3 such that its quantifiers range over U2
types, and so on and so on.
The stratification that you see here is typical of ML. Indeed,
Harper, Moggi and Mitchell created a core language called XML
(not to be confused with today's XML) that was meant to capture
the essence of ML-style polymorphism. In fact, ML goes a bit
further and restricts polytypes to be of the form:
s ::= t | All 'a.s
That is, it restricts the quantifiers to prenex positions. So,
in particular, you can't write a function which takes a polymorphic
function as an argument, nor write a function that returns a polymorphic
function as a result. So, for instance, you can't pass the
identity function to itself. What you can do is pass an
*instantiation* of the identity function to itself, but that
instantiation must reduce the number of quantifiers. Another way
to say this is that polymorphic functions are second-class in ML.
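To see what the prenex form rules out, here is a function whose *argument* is itself polymorphic, so its quantifier sits to the left of an arrow. TypeScript accepts this as written; in GHC you would need the RankNTypes extension plus an annotation. (The names are mine.)

```typescript
// applyToBoth's argument f must be polymorphic: it is used at two
// different types in the body, so no single monomorphic f will do.
// Under the prenex restriction, this type is not even expressible.
function applyToBoth(f: <A>(x: A) => A): [number, string] {
  return [f(1), f("one")];
}

const result = applyToBoth(<A>(x: A) => x);
```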
The restriction to prenex form is really an issue of type-inference
and not a semantic issue. In particular, the way the Damas-Milner
algorithm works forces the prenex restriction. Some compilers, such
as the current version of GHC, get rid of the prenex restriction using
a combination of user-supplied annotations coupled with more advanced
type inference. However, when GHC added support for this, they
crossed a semantic divide because they actually allowed type variables
to range over quantified types. That is, GHC is actually
impredicative.
Personally, I hate the fact that ML is so limited in expressive
power and applaud GHC for pushing the boundaries. Part of the
reason is that many of the encodings we talked about last time
(products, sums, ...) require first-class polymorphism. As another
great example, if you have first-class All-types, then you
can simulate Exist-types:
Consider:
T[Exists 'a.t] = All 'b.(All 'a.t -> 'b) -> 'b
It's perhaps easiest to read this logically:
Exists 'a.t =~= not(All 'a.not(t))
=~= (All 'a.not(t)) => false
=~= (All 'a.t => false) => false
=> All 'b.(All 'a.t => 'b) => 'b   (generalizing false to an arbitrary 'b)
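This encoding runs in any language with first-class polymorphism. Here is a sketch in TypeScript, specialized to a body type of my choosing (a hidden value paired with an observer); the names Body, Ex, pack, and observed are all mine:

```typescript
// The body t, with 'a as the hidden representation type:
type Body<A> = { value: A; observe: (x: A) => number };

// Exists 'a.t  encoded as  All 'b.(All 'a.t -> 'b) -> 'b
type Ex = <B>(k: <A>(body: Body<A>) => B) => B;

// pack: hide a concrete representation behind the existential.
const pack = <A>(body: Body<A>): Ex => (k) => k(body);

// Two packages with different hidden types but the same interface:
const p1: Ex = pack<number>({ value: 21, observe: (x) => x * 2 });
const p2: Ex = pack<string>({ value: "hello", observe: (s) => s.length });

// To use a package, the client supplies a *polymorphic* continuation:
// it must work uniformly in the hidden type 'a.
const observed = (p: Ex): number => p((body) => body.observe(body.value));
```

Note that the continuation handed to p1 and p2 is the same polymorphic function, even though the hidden representations differ.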
As we'll see later on, existential types are the key to understanding
abstract data types (ADTs) and objects. Regardless, note that
an existential needs to take as an argument a polymorphic function.
With the prenex restriction, there's no way to express this!
My conclusion: don't let the type inference of today restrict
the language design. Tomorrow, we're likely to come up with
better inference algorithms, and there are very, very clever
encodings (read: useful abstractions) that you're likely to prevent
by imposing such a restriction on the language.
--------------------------------------------------------------------
So stratifying the types is one way to solve the apparent circularity
involved with an interpretation of polymorphism. The price paid
is that we can't instantiate a quantifier at level k+1 except with
types from level k. Is there not some way to collapse all of the
levels?
The answer is yes and the solution is really found in Girard's PhD
thesis, but was simplified and perfected by many others, including
Statman and Tait. There are two parts to the trick: First, we are
going to interpret All-types as *intersections* instead of dependent
products. Second, we are going to use sets of the underlying untyped
lambda terms (obtained by erasing all of the capital-lambda's and
types in the code) as our big, already defined universe. Along
the way, what we're going to end up doing is proving that reduction
is strongly normalizing.
First let me define:
erase(i) = i
erase(x) = x
erase(e1 e2) = erase(e1) erase(e2)
erase(\x:t.e) = \x.erase(e)
erase(/\'a.e) = erase(e)
erase(e t) = erase(e)
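erase is just a fold over the syntax. Here is a direct transcription over my own term representation (types kept as opaque strings, since erasure never inspects them):

```typescript
// Typed terms; the ty fields are opaque strings, as erase ignores them.
type Tm =
  | { tag: "lit"; value: number }
  | { tag: "var"; name: string }
  | { tag: "lam"; param: string; ty: string; body: Tm } // \x:t.e
  | { tag: "app"; fn: Tm; arg: Tm }
  | { tag: "tylam"; bound: string; body: Tm }           // /\'a.e
  | { tag: "tyapp"; fn: Tm; ty: string };               // e t

// Erased (untyped) terms: no type annotations, no /\, no type application.
type Erased =
  | { tag: "lit"; value: number }
  | { tag: "var"; name: string }
  | { tag: "lam"; param: string; body: Erased }
  | { tag: "app"; fn: Erased; arg: Erased };

function erase(e: Tm): Erased {
  switch (e.tag) {
    case "lit":
    case "var":
      return e;
    case "app":
      return { tag: "app", fn: erase(e.fn), arg: erase(e.arg) };
    case "lam":
      return { tag: "lam", param: e.param, body: erase(e.body) };
    case "tylam":
      return erase(e.body); // erase(/\'a.e) = erase(e)
    case "tyapp":
      return erase(e.fn);   // erase(e t)    = erase(e)
  }
}

// erase((/\'a.\x:'a.x) int) = \x.x
const idTm: Tm = {
  tag: "tyapp",
  fn: {
    tag: "tylam",
    bound: "a",
    body: { tag: "lam", param: "x", ty: "'a", body: { tag: "var", name: "x" } },
  },
  ty: "int",
};
const erasedId = erase(idTm);
```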
Recall that we consider an expression e to be strongly normalizing
if there is no infinite sequence of reductions starting with e.
I'll write SN for the set of all strongly normalizing terms.
Defn: a set S of (erased) lambda terms is *saturated* if:
1. S <= SN (i.e., every term in S is strongly normalizing)
2. if e1,...,en are in SN then x e1 e2 ... en is in S for all variables x.
3. if e[u/x] e1 ... en is in S, and u is in SN, then (\x.e) u e1 ... en is
in S.
Note that condition 1 is what we're trying to prove. The other
conditions are going to make it possible to push through the proof
of strong normalization.
Let SAT be the set of all saturated sets.
Note that SN is an element of SAT:
1. SN <= SN (trivial)
2. let e1,...,en in SN, then x e1 ... en is SN since the only reductions
we can do are within the ei.
3. [exercise]
and is in fact the largest element of SAT.
Since SAT is non-empty, we can treat it as a complete lattice
that is closed under arbitrary non-empty intersections and unions.
Now we can define:
T[t] so that it yields a saturated set of terms.
In particular, let us assume d maps type variables to saturated sets:
T['a]d = d('a)
T[t1 -> t2]d = { e1 | All e2 in T[t1]d.e1 e2 in T[t2]d }
T[All 'a.t]d = intersect(S in SAT):(T[t]d['a|->S])
T[G]d = { g | for all x in Dom(G), g(x) in T[G(x)]d }
Theorem: If D;G |- e : t, then for all d and g in T[G]d,
g(erase(e)) in T[t]d.
Proof: exercise.
Corollary: If D;G |- e : t, then erase(e) is strongly normalizing.
(Pick g to map each variable x to itself; by condition 2, x is in
every saturated set, so this g is in T[G]d.)
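We can at least watch normalization happen. Below is a tiny normal-order reducer over erased terms with a step budget (a de Bruijn representation; all names are mine). It is only an illustration: the corollary guarantees termination for erasures of well-typed terms, while the untyped omega term shows the budget is genuinely needed for arbitrary terms:

```typescript
// Erased terms with de Bruijn indices, so substitution avoids capture.
type E =
  | { tag: "var"; index: number }
  | { tag: "lam"; body: E }
  | { tag: "app"; fn: E; arg: E };

const v = (index: number): E => ({ tag: "var", index });
const lam = (body: E): E => ({ tag: "lam", body });
const app = (fn: E, arg: E): E => ({ tag: "app", fn, arg });

// Shift free variables with index >= cutoff by d.
function shift(e: E, d: number, cutoff = 0): E {
  switch (e.tag) {
    case "var": return e.index >= cutoff ? v(e.index + d) : e;
    case "lam": return lam(shift(e.body, d, cutoff + 1));
    case "app": return app(shift(e.fn, d, cutoff), shift(e.arg, d, cutoff));
  }
}

// Substitute s for variable j in e.
function subst(e: E, j: number, s: E): E {
  switch (e.tag) {
    case "var": return e.index === j ? s : e;
    case "lam": return lam(subst(e.body, j + 1, shift(s, 1)));
    case "app": return app(subst(e.fn, j, s), subst(e.arg, j, s));
  }
}

// One leftmost-outermost (normal-order) step, or null at normal form.
function step(e: E): E | null {
  if (e.tag === "app" && e.fn.tag === "lam") {
    return shift(subst(e.fn.body, 0, shift(e.arg, 1)), -1); // beta
  }
  if (e.tag === "app") {
    const f = step(e.fn);
    if (f) return app(f, e.arg);
    const a = step(e.arg);
    return a ? app(e.fn, a) : null;
  }
  if (e.tag === "lam") {
    const b = step(e.body);
    return b ? lam(b) : null;
  }
  return null;
}

// Reduce to normal form within a fuel bound; null if fuel runs out.
function normalize(e: E, fuel = 1000): { result: E; steps: number } | null {
  for (let steps = 0; steps <= fuel; steps++) {
    const next = step(e);
    if (!next) return { result: e, steps };
    e = next;
  }
  return null;
}

const idE = lam(v(0));
const typedExample = app(idE, idE); // erase(id (All 'a.'a->'a) id)
const omega = app(lam(app(v(0), v(0))), lam(app(v(0), v(0)))); // untyped!
```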
The key in all of this is that we already had a definition of SAT
lying around before we started constructing the interpretation of
types.
As a side note, let's consider:
T[All 'a.'a]d = intersect(S in SAT):(T['a]d['a|->S])
              = intersect(S in SAT):S
So T[All 'a.'a]d is the least element of SAT. At first blush,
you might think this is the empty set, but it's not! Rather,
by condition 2, it must include:
x e1 ... en such that the ei's are strongly normalizing
But notice that this is not a closed term. A more careful
examination will reveal that every term in the bottom of SAT
has a free variable! Thus, we can conclude that there are no
closed terms in the type T[All 'a.'a]. Of course,
it's actually easier to argue this by appealing to the proof
rules of the type system, but nonetheless, this is an appealing
model for the language.
Next up: representation independence.