Polymorphism:
------------
There are at least 3 kinds of polymorphism that show up in programming
languages:
* subtype polymorphism
* parametric polymorphism
* so-called `ad hoc' polymorphism
and these days, languages such as Java and C# include all of these.
The primary goal of polymorphism is to make it possible to (a) re-use
code (i.e., support code abstraction) and (b) introduce user-defined
type abstractions. The addition of parametric polymorphism, in
particular, is just as important as the addition of procedural
abstraction in that just as we can define new functions or commands
that are application-specific, we can also define types (and
interpretations of those types) that are application-specific.
We're going to examine the type theory behind these features to at
least some degree, but it's a very fertile ground for research
even today.
Subtype Polymorphism:
---------------------
Let me begin by assuming that we've augmented the STLC with n-tuples:
   t ::= ... | <t1,...,tn>
   e ::= ... | <e1,...,en> | #i e

   #i <v1,...,vn> -> vi
The thing to note is that in some sense every <t1,...,tn,tn+1> is
also a <t1,...,tn> in the sense that the operations we can perform
on the second type are perfectly valid when applied to the first type.
To me, it's annoying that in ML:
val snd = fn (x:'a*'b) => #2 x
can only be applied to pairs. If I want to use the snd function on
triples, I'm out of luck in spite of the fact that there's no good
reason for this. This violates a cardinal principle of language
design: never force the programmer to write things twice. That is,
give them the tools to factor out the common bits of code.
In a language with subtyping, this problem goes away. In particular,
we might add the following two rules to our type system:
   -------------------------------
   <t1,...,tn,tn+1> <= <t1,...,tn>

   G |- e : t'    t' <= t
   ------------------------ (subsumption)
        G |- e : t
The first rule is a subtyping relation that says an n+1 tuple can be
used in lieu of an n tuple (with components of the same type
otherwise.) This is sometimes called "width-subtyping".
The second rule augments our normal typing judgment with a subsumption
rule that says whenever we can prove t' is a subtype of t, we can
treat a t' expression as if it were a t expression. It's fairly easy
to extend a subject reduction-style argument to show that even with
these rules added, programs can't go "wrong" (i.e., get stuck because
we attempt to apply a primitive operation to values of the wrong
type.)
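As an aside, TypeScript's structural object types behave much like
these two rules, except that subsumption there happens implicitly at
every use. A small sketch (all names here are mine, not from the text):

```typescript
// A record type playing the role of a 2-tuple of named fields.
type Pt2d = { x: number; y: number };

// getx only needs the x and y fields to exist.
const getx = (p: Pt2d): number => p.x;

// p3 has an extra field z, making its type a (width) subtype of Pt2d.
const p3 = { x: 1, y: 2, z: 3 };

// Subsumption: p3 is accepted wherever a Pt2d is expected.
const result = getx(p3);
```

The "snd on triples" annoyance from the ML example disappears for the
same reason: the extra field is simply forgotten at the use site.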
Usually, the subtyping relation is a partial order meaning that we
also have rules witnessing reflexivity and transitivity of subtyping:
                t1 <= t2    t2 <= t3
   ------       --------------------
   t <= t             t1 <= t3
Additionally, we may be able to lift the subtyping relation through
type constructors. For instance, because our n-tuples are immutable we can
also allow "deep subtyping":
   t1 <= t1'   t2 <= t2'   ...   tn <= tn'
   ---------------------------------------
        <t1,...,tn> <= <t1',...,tn'>
Of course, you shouldn't take my word that this is actually type-safe,
but rather should try to push through a proof of soundness.
For function types, a curious thing happens:
t1' <= t1 t2 <= t2'
-------------------------------------
t1 -> t2 <= t1' -> t2'
To understand the rule, consider a function f : t1->t2 and consider a
context of the form:
let x:t2' = [] (a:t1') in e
Is it safe to plug f into the hole? f expects a t1 as an argument but
we're feeding it a t1'. So it had better be the case that any
operation we can perform on t1 values can also be performed on t1'
values. That's why we require t1' <= t1. f returns a t2 value, but
the context wants to use the value as if it were a t2'. So it had
better be the case that any operation we can perform on a t2 can also
be performed on a t2'. That's why we have t2 <= t2'.
We say that the arrow type constructor is co-variant in the result
type and contra-variant in the argument type: contra- because the
order gets flipped around. Tuples are co-variant in their type
components.
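If it helps to see the arrow rule concretely, here is a sketch in
TypeScript (under strictFunctionTypes, which checks function
parameters contravariantly; the type names are mine):

```typescript
type Pt2d = { x: number; y: number };
type Pt3d = { x: number; y: number; z: number }; // Pt3d <= Pt2d (width)

// A function that consumes the supertype Pt2d...
const getx: (p: Pt2d) => number = (p) => p.x;

// ...can be used where a function on the subtype Pt3d is expected:
// contravariance flips the order, so (Pt2d -> number) <= (Pt3d -> number).
const f: (p: Pt3d) => number = getx;

const n = f({ x: 1, y: 2, z: 3 });
```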
Exercise: what is the right subtyping rule for n-ary sums?
(i.e., non-recursive ML datatypes?)
Semantically, there are two ways that we can treat sub-typing.
The first interpretation is a "sub-types as sub-sets" principle
with the idea that if t1 <= t2, then the set denoted by
V[t1] is a subset of V[t2]. This is sometimes called subtyping
by inclusion because V[t2] includes all of the elements of V[t1].
Inclusion subtyping demands a sort of uniform representation for t1
and t2 values but justifies certain mathematical subtyping relations
that we'd like to see, such as V[int] <= V[real]. Of course, math and
CS never quite align and in reality, we choose very different
representations for values of these two types. On the other hand,
this can and does make sense for things like n-tuples.
An alternative interpretation of subtyping is a coercion-based
approach. The idea is that whenever t1 <= t2, then there exists
a function f : t1 -> t2 that can be used to coerce t1 values to
t2 values. Indeed, we might rewrite the typing rules to witness
the actual coercion that tells us how to go from t1 to t2:
                       t1 <= t2 : f    t2 <= t3 : g
   t <= t : \x.x       ----------------------------
                            t1 <= t3 : g o f

   ---------------------------------------------------------
   <t1,...,tn,tn+1> <= <t1,...,tn> : \x.<#1 x,#2 x,...,#n x>
So reflexivity is witnessed by the identity function, transitivity
by composition, and for width subtyping, we have a coercion that
takes an n+1 tuple as an argument and builds the n-tuple out of it.
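A sketch of these witnesses in TypeScript, using fixed-size tuples in
place of the n-tuples above (the function names are mine):

```typescript
// Transitivity: composing two coercions gives a coercion.
const compose =
  <A, B, C>(g: (b: B) => C, f: (a: A) => B) =>
  (x: A): C =>
    g(f(x));

// Width coercions: drop the last component of the tuple.
const w4to3 = ([x, y, z, _w]: [number, number, number, number]):
  [number, number, number] => [x, y, z];
const w3to2 = ([x, y, _z]: [number, number, number]):
  [number, number] => [x, y];

// A 4-tuple coerces to a 2-tuple by composing the two width coercions.
const w4to2 = compose(w3to2, w4to3);
```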
Exercise: what's the coercion for depth subtyping on tuples?
what's the coercion for n-ary sums?
In some sense, the coercion approach is really more permissive
than the inclusion approach, because it appears that we can
have an arbitrary coercion between any two types. But there are
limits on what coercions make sense. In particular, we would
like to have *coherence*. This is the idea that no matter how
we construct a typing derivation for a given program, the
meaning of the program doesn't change.
Notice that when we added the subtyping rules to the proof system
that there now are many ways to type the same program. For
instance, even the simple expression "3" can be typed with a
number of different proofs:
1. directly using an axiom: |- 3 : int
2. |- 3 : int    int <= int
   ------------------------
         |- 3 : int
...
Now consider what happens if we do something like add:
int <= real : \x.Real.fromInt(x)
as a subtyping axiom and consider a program:
MAXINT + 4
One possible typing derivation is:
   |- MAXINT : int    |- 4 : int
   -----------------------------
   |- MAXINT + 4 : int             int <= real
   -------------------------------------------
             |- MAXINT + 4 : real
Another possible typing derivation is:
   |- MAXINT : int   int <= real     |- 4 : int   int <= real
   -----------------------------     ------------------------
        |- MAXINT : real                   |- 4 : real
   ---------------------------------------------------------
                     |- MAXINT + 4 : real
The first corresponds to doing integer addition first, and then
converting to a floating-point number. The second corresponds to
coercing the two values to a floating point number and then using
floating point addition to generate the result. In all likelihood,
we'll get different results, so this treatment of subtyping is not
coherent. One way to establish coherence is to give a (faithful)
model based on inclusion polymorphism and show that, in the model, all
of the coercions are equivalent to the identity function.
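We can observe the incoherence concretely. Here is a sketch in
TypeScript, using 32-bit integer arithmetic (via `| 0`) to stand in
for "int" and single-precision floats (`Math.fround`) for "real":

```typescript
const MAXINT = 2147483647; // largest 32-bit int

// Derivation 1: add as ints (the sum wraps around), then coerce to real.
const intThenCoerce = Math.fround((MAXINT + 4) | 0);

// Derivation 2: coerce both operands to real, then add as reals.
const coerceThenAdd = Math.fround(Math.fround(MAXINT) + Math.fround(4));

// The two derivations disagree: the first wraps to a large negative
// number, the second stays large and positive.
```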
There are two more issues with coercions: First, a coercion-based
approach is fundamentally incompatible with mutable references.
The problem is that we're making a *copy* of the value when we
do a coercion. If we make a copy of a mutable value, then we
have to maintain coherence among the copies when an update is done.
Another problem is that these coercions can be expensive: for
big data structures (e.g., a list) we have to map the element coercion
across the components and generate a whole new list. For these
reasons, coercion-based interpretations are minimized in many
realistic programming languages.
The issue of multiple typing derivations for a given term presents
another challenge: type-checking is no longer syntax directed. At a
minimum, we can apply the subsumption rule, coupled with reflexivity,
at any time we like. In practice, it turns out that we can "normalize"
the proofs so that we restrict subsumption to certain places in
the proof (e.g., function application). If we can come up with
such a normalized proof system, then we are obligated to show that
if we can derive |- e : t using the original, non-deterministic
rules, then there exists a derivation using the normalized rules
of the form |- e : t' where t' <= t. The non-deterministic rules
serve as a sort of declarative specification and are often easier
for the user to reason about. The normalized rules serve as an
algorithmic specification that corresponds more closely to what
the type-checker will do in practice.
For more details on how to do type-checking with subtyping,
I suggest looking at Pierce's chapters 15 and 16.
For type inference reasons, it's often useful to formulate the
subtype relation as a complete lattice so that every pair of
types has a least upper bound and greatest lower bound. For
Java's reference types, we can think of Object as the top-most
element of the subtype relation (i.e., every reference type
can be coerced to Object.) When doing type inference, we
generate a system of inequations of the form:
t1 <= a <= t2
where a is a variable and t1 and t2 are lower and upper bounds
on that variable respectively. For instance, for an if-then-else
we might generate:
e : a e1 : a1 e2 : a2
------------------------ (a <= bool && a1 <= a && a2 <= a)
if e then e1 else e2 : a
with the understanding that a <= t is shorthand for Bot <= a <= t
where Bot is the least type (usually void/empty/0).
Often, we can simplify the constraints while preserving solutions.
For instance, if we end up with:
t1 <= a <= t2 && t1' <= a <= t2'
then we can simplify this to:
lub(t1,t1') <= a <= glb(t2,t2')
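A minimal sketch of this simplification in TypeScript, over a toy
four-point lattice with Bot <= Int <= Top and Bot <= Str <= Top (all
names are mine):

```typescript
type Ty = "Bot" | "Int" | "Str" | "Top";

// Least upper bound and greatest lower bound in the toy lattice.
const lub = (a: Ty, b: Ty): Ty =>
  a === "Bot" ? b : b === "Bot" ? a : a === b ? a : "Top";
const glb = (a: Ty, b: Ty): Ty =>
  a === "Top" ? b : b === "Top" ? a : a === b ? a : "Bot";

// Merge t1 <= a <= t2 with t1' <= a <= t2' into a single constraint.
const merge = ([l1, u1]: [Ty, Ty], [l2, u2]: [Ty, Ty]): [Ty, Ty] =>
  [lub(l1, l2), glb(u1, u2)];

const subtype = (a: Ty, b: Ty): boolean => lub(a, b) === b;

// A constraint is satisfiable only if its lower bound is below its upper.
const consistent = ([l, u]: [Ty, Ty]): boolean => subtype(l, u);
```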
It's the existence of lub's and glb's that ensure we can always
simplify the constraints for a given variable to a single lower and
upper bound. A type error occurs when we end up with a constraint of
the form:
t1 <= a <= t2
and t1 is not less-than-or-equal to t2. One problem with a complete
lattice is that, in some sense, it tends to make fragments type-check
that really shouldn't. For instance, consider the fragment:
let x = if e then 1 else "foo"
in
x + 2
end
In ML, we'd get an error that says 1 has type int, whereas "foo" has
type string. In a system based on subtyping, the if-then-else would
be well-typed with the type Top. So we'd get an error at the point
where we try to add x to 2.
In realistic languages, we tend to see subtyping arise on records.
For instance, it seems to make sense to have:
{x:int,y:int,z:int} <= {x:int,y:int}
However, this adds a serious implementation constraint. First,
consider a language such as C where the order of fields in a
record matters. In particular, it is not the case that values
of type:
struct {int x; double y;}
also have the type
struct {double y; int x;}
In such languages, an (inclusion-based) approach to subtyping only
makes sense when we drop fields on the right. That is, it does
make sense in C (from a representation perspective) to have
struct {int x; int y; int z;} <= struct {int x; int y;}. This
is essentially what we did with the tuples above, and essentially
what Java/C# do for classes.
Now consider a language like ML where the order of members doesn't
matter. In particular, as far as ML is concerned, {y:int,z:int,x:int} =
{x:int,y:int,z:int}. Usually, an ML implementation will sort the
field names of a record to get a canonical form. If we do this,
then we need to know the *whole* record type to determine the offset
of a given field. So, we cannot simply treat an
{x:int,y:int,z:int} as a {y:int,z:int} because in the
first case, we think y is at offset 1 word, whereas in the second
case, we think it's at offset 0 words.
If we wanted to support (inclusion) subtyping for ML, we'd need
another strategy. One idea is to have a "wrapper dictionary" around the
record that holds the actual offsets for the fields. For instance,
if the underlying real record r is {x=3,y=4,z=5} but we're treating
it as if it has type {y:int,z:int}, then the wrapper would tell
us that y is at offset 1 word and z is at offset 2 words:
+----+----+---+ +---+---+---+
| 1 | 2 | o------->| 3 | 4 | 5 |
+----+----+---+ +---+---+---+
Every time we use subsumption, we would build a new wrapper
dictionary but share the same underlying record. This is
the essence of what interfaces in Java/C# do. Of course, a
good compiler will track the *actual* runtime type as accurately
as possible and avoid the overheads of constructing the wrapper.
But the point is that this actually utilizes a combination of
coercion and inclusion subtype polymorphism. Furthermore, it
affords more code reuse because we can use the same function
regardless of the layout of the fields.
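Here is a sketch of the wrapper-dictionary idea in TypeScript (the
representation is mine; a real compiler would use flat offset tables
rather than string-keyed maps):

```typescript
// A view pairs an offset dictionary with shared underlying storage.
type View = { offsets: Record<string, number>; data: number[] };

// Field access goes through the dictionary to find the real offset.
const get = (v: View, field: string): number => v.data[v.offsets[field]];

// The "real" record {x=3, y=4, z=5}, laid out as a flat array.
const r = [3, 4, 5];

// Treat r as a {y:int, z:int}: a fresh wrapper, but shared storage.
const asYZ: View = { offsets: { y: 1, z: 2 }, data: r };

// Updates to the underlying record are visible through the view.
r[1] = 40;
```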
Finally, you may be thinking about *nominal* subtyping. This
is the idea that we should declare types (i.e., give them names)
and only consider N1 <= N2 when the programmer declares that
it should be so. As we'll argue, this is simply an instance
of the more general idea of bounded, parametric polymorphism.
Regardless, we still need a notion of *structural* subtyping
to ensure that whatever types N1 and N2 map to, they are
subtype compatible.
Parametric Polymorphism:
------------------------
Subtype polymorphism is extremely useful but it's also forgetful.
Consider:
pt2d = {x:int,y:int}
pt3d = {x:int,y:int,z:int}
bumpx : pt2d -> pt2d
If we call bumpx on a pt3d, then we get out a pt2d. In contrast,
parametric polymorphism lets us track input/output relationships.
And of course, the problem with pure parametric polymorphism is
that the types are completely abstract. It's the combination of
parametric and subtype polymorphism (known as bounded polymorphism)
that really gives us the kind of expressiveness that we want.
But before we get there, let's examine parametric polymorphism
more carefully.
We'll start by augmenting our language as follows:
t ::= 'a | t -> t | All 'a.t
v ::= \x:t.e | /\'a.e
e ::= x | v | e1 e2 | e t
with the added evaluation rules:
e -> e'
-----------
e t -> e' t
(/\'a.e) t -> e{t/'a}
This is the polymorphic lambda calculus (also known as System F, or
F2) and was discovered independently by Girard (a logician) and
Reynolds (a computer scientist.)
We're going to modify our typing judgment for F2 to be of the form:
D;G |- e : t
where D is a set of type-variables (those in scope) and G is a
set of value variables along with their types. Additionally,
we're going to add a judgment:
D |- t : *
meaning t is a well-formed type under D. The rules are as follows:
D |- 'a : * ('a in D)
D |- t1:* D |- t2:*
--------------------
D |- t1 -> t2 : *
D+{'a} |- t : *
-----------------
D |- All 'a.t : *
The rules for terms look like this:
D;G |- x : G(x)
D |- t : * D;G,x:t |- e : t'
--------------------------------
D;G |- \x:t.e : t->t'
D;G |- e : t'->t D;G |- e' : t'
----------------------------------
D;G |- e e' : t
D+{'a};G |- e : t
------------------------
D;G |- /\'a.e : All 'a.t
D;G |- e : All 'a.t D |- t' : *
----------------------------------
D;G |- e t' : t{t'/'a}
They're surprisingly simple -- in essence, /\'a.e lets us parameterize
a function by a type in the same way that \x:t.e lets us parameterize
a function by a value. The term "e t" lets us call a type-abstraction,
passing t as the actual type argument. The key thing is that when
we do call a type-abstraction, the type becomes *specialized* to
the argument type.
It's also important to note that there are no operations on values
of variable type. That is, they are abstract.
Let's think about how we might encode some higher-level language features
using the facilities of F2. First, we know that we can encode let:
let x:t = e1 in e2 ==> (\x:t.e2) e1
Next, I claim that we can encode pairs t1 * t2. We want some
encoding T[a * b] such that:
fst : All a,b.T[a * b] -> a
snd : All a,b.T[a * b] -> b
pair : All a,b.a -> b -> T[a * b]
The trick is to make the constructor for pairs incorporate the
elimination forms. In particular, let us define:
T[a * b] = All c.(a->b->c) -> c
The idea is that when we want to use a pair, we want to get at
its two components to produce some result. Now we can define
fst = /\a,b.\p.p a (\x.\y.x)
snd = /\a,b.\p.p b (\x.\y.y)
pair = /\a,b.\x.\y./\c.\f.f x y
and you can check for yourself that these are type-correct
and furthermore:
fst a b (pair a b v1 v2) ->* v1
snd a b (pair a b v1 v2) ->* v2
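The same encoding can be written in TypeScript, whose generic function
types supply the rank-2 polymorphism that T[a * b] needs (a sketch;
the names are mine):

```typescript
// T[a * b] = All c.(a -> b -> c) -> c
type Pair<A, B> = <C>(f: (x: A, y: B) => C) => C;

// The constructor packages the components behind the elimination form.
const pair = <A, B>(x: A, y: B): Pair<A, B> => (f) => f(x, y);
const fst = <A, B>(p: Pair<A, B>): A => p((x, _y) => x);
const snd = <A, B>(p: Pair<A, B>): B => p((_x, y) => y);
```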
We can also encode sums. How do we use a sum? We pattern
match on it. So we want to take:
T[a + b] = All c.(a -> c) -> (b -> c) -> c
That is, a case expression takes one branch for the a->c
case, and another branch for the b->c case, and depending
upon whether we have an a or b, calls the right branch
to generate the c. So now we can define:
inl : All a,b.a -> T[a + b]
inr : All a,b.b -> T[a + b]
case : All a,b,c.T[a + b] -> (a -> c) -> (b -> c) -> c
as follows:
case = /\a,b,c.\s.\f.\g.s c f g
inl = /\a,b.\x./\c.\f.\g.f x
inr = /\a,b.\y./\c.\f.\g.g y
and you can confirm that these have the right types and:
case a b c (inl a b v) f g ->* f v
case a b c (inr a b v) f g ->* g v
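And the sum encoding, in the same TypeScript style (again a sketch
with names of my own choosing):

```typescript
// T[a + b] = All c.(a -> c) -> (b -> c) -> c
type Sum<A, B> = <C>(f: (x: A) => C, g: (y: B) => C) => C;

// Each injection remembers which branch to invoke.
const inl = <A, B>(x: A): Sum<A, B> => (f, _g) => f(x);
const inr = <A, B>(y: B): Sum<A, B> => (_f, g) => g(y);
const caseOf = <A, B, C>(s: Sum<A, B>, f: (x: A) => C, g: (y: B) => C): C =>
  s(f, g);
```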
Of course, if you've got sums, then you've got booleans and
if-then-else.
What shall we take for unit and zero? How about:
T[unit] = All a.a->a
T[zero] = All a.a
It turns out that semantically, there's only one (math) function that
occupies the type All a.a->a: the identity function. Or put another
way, every (closed) F2 function of type All a.a->a can be normalized
to the identity function. And, mathematically, the type All a.a is
empty. That is, there is no normal form that has the type All a.a.
We can also define natural numbers. The thing you use a natural
number for is to loop. So let us define:
T[nat] = All a.(a -> a) -> a -> a
The idea is that when we take a natural n, and apply it to a function
f, we'll get back a function equivalent to n iterations of f.
0 = /\a.\f.\x.x
n = /\a.\f.\x.f(f(...(f x)...))
Now we can define inc : T[nat] -> T[nat] as follows:
inc = \n./\a.\f.\x.f(n a f x)
Once we have inc, it's easy to define plus:
plus : T[nat] -> T[nat] -> T[nat]
plus = \n.\m.n _ inc m
(Exercise: fill in the "_" with the right type.)
So plus just increments m n times.
Once you have plus, you can define times and so on.
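The natural-number encoding also goes through in TypeScript (a sketch
in an uncurried style, with a toInt helper of my own for observing
results):

```typescript
// T[nat] = All a.(a -> a) -> a -> a
type Nat = <A>(f: (x: A) => A, x: A) => A;

const zero: Nat = (_f, x) => x;

// inc n iterates f one more time than n does.
const inc = (n: Nat): Nat => (f, x) => f(n(f, x));

// plus n m increments m, n times (instantiating n at type Nat).
const plus = (n: Nat, m: Nat): Nat => n(inc, m);

// Convert a Church numeral to an ordinary number.
const toInt = (n: Nat): number => n((k: number) => k + 1, 0);
```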
Fun Exercises: try to define comparison n < m and then define
subtraction n - m so that when m < n, we get n-m but when m > n, we
get zero.
It turns out that naturals are a special case of the more
general inductive type of lists:
T[list a] = All c.(a -> c -> c) -> c -> c
The type is the type of a foldr for lists of a elements.
Exercise: define cons, foldr, and map for lists.
In general, any inductive datatype can be coded up using this style of
encoding for F2. The fact that we can iterate using naturals, and do
things like fold/map, etc. makes it seem like we might have lost
control of this language. But in fact, just like the simply-typed
lambda calculus, every F2 program terminates and every F2 program has
a unique normal form (and thus a sound equational theory.) Nonetheless,
it's a very powerful programming language and one that deserves
a lot of study because it is surprisingly useful.
Nonetheless, we *do* lose something when we move from F1 to F2.
In particular, there is no simple set-theoretic semantics for
F2. That is, we can't just interpret types as sets of values.
Next time, we'll see why.