Polymorphism:
------------

There are at least 3 kinds of polymorphism that show up in programming
languages:

  * subtype polymorphism
  * parametric polymorphism
  * so-called `ad hoc' polymorphism

and these days, languages such as Java and C# include all of these.
The primary goal of polymorphism is to make it possible to (a) re-use
code (i.e., support code abstraction) and (b) introduce user-defined
type abstractions. The addition of parametric polymorphism, in
particular, is just as important as the addition of procedural
abstraction: just as we can define new functions or commands that are
application-specific, we can also define types (and interpretations of
those types) that are application-specific.

We're going to examine the type theory behind these features to at
least some degree, but it's a very fertile ground for research even
today.

Subtype Polymorphism:
---------------------

Let me begin by assuming that we've augmented the STLC with n-tuples:

  t ::= ... | <t1,...,tn>
  e ::= ... | <e1,...,en> | #i e

  #i <v1,...,vn> -> vi          (1 <= i <= n)

The thing to note is that in some sense every <t1,...,tn,tn+1> is also
a <t1,...,tn>, in the sense that the operations we can perform on the
second type are perfectly valid when applied to the first type. To me,
it's annoying that in ML:

  val snd = fn (x:'a*'b) => #2 x

can only be applied to pairs. If I want to use the snd function on
triples, I'm out of luck, even though there's no good reason for this.
This violates a cardinal principle of language design: never force the
programmer to write things twice. That is, give them the tools to
factor out the common bits of code.

In a language with subtyping, this problem goes away. In particular,
we might add the following two rules to our type system:

  <t1,...,tn,tn+1> <= <t1,...,tn>

  G |- e : t'    t' <= t
  ------------------------ (subsumption)
        G |- e : t

The first rule is a subtyping axiom that says an n+1 tuple can be used
in lieu of an n tuple (with components of the same type otherwise).
This is sometimes called "width subtyping".
The second rule augments our normal typing judgment with a subsumption
rule that says whenever we can prove t' is a subtype of t, we can treat
a t' expression as if it were a t expression. It's fairly easy to
extend a subject-reduction-style argument to show that even with these
rules added, programs can't go "wrong" (i.e., get stuck because we
attempt to apply a primitive operation to values of the wrong type).

Usually, the subtyping relation is a partial order (or at least a
preorder), so we also have rules witnessing reflexivity and
transitivity of subtyping:

  t <= t

  t1 <= t2    t2 <= t3
  --------------------
        t1 <= t3

Additionally, we may be able to lift subtyping through type
constructors. For instance, because our n-tuples are immutable, we can
also allow "depth subtyping":

  t1 <= t1'   t2 <= t2'   ...   tn <= tn'
  ---------------------------------------
      <t1,...,tn> <= <t1',...,tn'>

Of course, you shouldn't take my word that this is actually type-safe;
rather, you should try to push through a proof of soundness.

For function types, a curious thing happens:

  t1' <= t1    t2 <= t2'
  ------------------------
  t1 -> t2 <= t1' -> t2'

To understand the rule, consider a function f : t1->t2 and a context of
the form:

  let x:t2' = [] (a:t1') in e

Is it safe to plug f into the hole? f expects a t1 as an argument, but
we're feeding it a t1'. So it had better be the case that any
operation we can perform on t1 values can also be performed on t1'
values. That's why we require t1' <= t1. f returns a t2 value, but the
context wants to use the value as if it were a t2'. So it had better be
the case that any operation we can perform on a t2' can also be
performed on a t2. That's why we have t2 <= t2'.

We say that the arrow type constructor is co-variant in the result type
and contra-variant in the argument type: "contra" because the order
gets flipped around. Tuples are co-variant in their component types.

Exercise: what is the right subtyping rule for n-ary sums? (i.e.,
non-recursive ML datatypes?)
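The declarative rules above can be read as a recursive procedure for deciding t1 <= t2. Below is a minimal TypeScript sketch over a toy type language with only int, bool, tuples, and arrows; the representation and the names `Ty`, `sub`, `tup`, and `arr` are assumptions made for illustration. Width and depth subtyping for tuples are folded into one case, and the arrow case flips the comparison on the argument. There is no separate case for reflexivity or transitivity: for this small system, both already hold of the recursive check.

```typescript
// A toy type language: int, bool, n-tuples, and function types.
type Ty =
  | { tag: "int" }
  | { tag: "bool" }
  | { tag: "tuple"; elems: Ty[] }
  | { tag: "arrow"; dom: Ty; cod: Ty };

const tInt: Ty = { tag: "int" };
const tBool: Ty = { tag: "bool" };
const tup = (...elems: Ty[]): Ty => ({ tag: "tuple", elems });
const arr = (dom: Ty, cod: Ty): Ty => ({ tag: "arrow", dom, cod });

// Decide s <= t.
function sub(s: Ty, t: Ty): boolean {
  if (s.tag === "tuple" && t.tag === "tuple") {
    const se = s.elems;
    const te = t.elems;
    // Width: s may carry extra components on the right.
    // Depth: each shared component must itself be a subtype.
    return se.length >= te.length && te.every((ti, i) => sub(se[i], ti));
  }
  if (s.tag === "arrow" && t.tag === "arrow") {
    // Contra-variant in the argument, co-variant in the result.
    return sub(t.dom, s.dom) && sub(s.cod, t.cod);
  }
  // Otherwise the types are related only if they are the same base type.
  return s.tag === t.tag;
}

const pairTy = tup(tInt, tBool);
const tripleTy = tup(tInt, tBool, tInt);
console.log(sub(tripleTy, pairTy)); // true: width subtyping
console.log(sub(pairTy, tripleTy)); // false
// A snd-like function on pairs may be used where one on triples is expected.
console.log(sub(arr(pairTy, tBool), arr(tripleTy, tBool))); // true
console.log(sub(arr(tripleTy, tBool), arr(pairTy, tBool))); // false
```

The last two checks are the snd example from above: a function of type <int,bool> -> bool is acceptable wherever a function on triples is expected, but not the other way around.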
Semantically, there are two ways that we can treat subtyping. The
first interpretation is a "subtypes as subsets" principle: if t1 <= t2,
then the set denoted by V[t1] is a subset of V[t2]. This is sometimes
called subtyping by inclusion, because V[t2] includes all of the
elements of V[t1]. Inclusion subtyping demands a sort of uniform
representation for t1 and t2 values, but it justifies certain
mathematical subtyping relations that we'd like to see, such as
int <= real (every integer is a real number). Of course, math and CS
never quite align, and in reality we choose very different
representations for values of these two types. On the other hand,
inclusion can and does make sense for things like n-tuples.

An alternative interpretation of subtyping is a coercion-based
approach. The idea is that whenever t1 <= t2, there exists a function
f : t1 -> t2 that can be used to coerce t1 values to t2 values. Indeed,
we might rewrite the subtyping rules to witness the actual coercion
that tells us how to go from t1 to t2:

  t <= t : \x.x

  t1 <= t2 : f    t2 <= t3 : g
  ----------------------------
       t1 <= t3 : g o f

  <t1,...,tn,tn+1> <= <t1,...,tn> : \x.<#1 x,#2 x,...,#n x>

So reflexivity is witnessed by the identity function, transitivity by
composition (first apply f, then g), and for width subtyping, we have a
coercion that takes an n+1 tuple as an argument and builds the n-tuple
out of it.

Exercise: what's the coercion for depth subtyping on tuples? What's
the coercion for n-ary sums?

In some sense, the coercion approach is more permissive than the
inclusion approach, because it appears that we can have an arbitrary
coercion between any two types. But there are limits on what coercions
make sense. In particular, we would like to have *coherence*: no matter
how we construct a typing derivation for a given program, the meaning
of the program doesn't change. Notice that once we add the subtyping
rules to the proof system, there are many ways to type the same
program.
For instance, even the simple expression "3" can be typed with a number
of different proofs:

  1. directly using an axiom:

       |- 3 : int

  2. |- 3 : int    int <= int
     ------------------------
           |- 3 : int

  ...

Now consider what happens if we add

  int <= real : \x.Real.fromInt(x)

as a subtyping axiom, and consider the program:

  MAXINT + 4

One possible typing derivation is:

  |- MAXINT : int    |- 4 : int
  -----------------------------
      |- MAXINT + 4 : int           int <= real
  ---------------------------------------------
              |- MAXINT + 4 : real

Another possible typing derivation is:

  |- MAXINT : int    int <= real     |- 4 : int    int <= real
  ------------------------------     --------------------------
        |- MAXINT : real                    |- 4 : real
  -------------------------------------------------------------
                      |- MAXINT + 4 : real

The first corresponds to doing integer addition first and then
converting to a floating-point number. The second corresponds to
coercing the two values to floating-point numbers and then using
floating-point addition to generate the result. In all likelihood,
we'll get different results (in SML, for instance, the integer addition
in the first derivation raises Overflow), so this treatment of
subtyping is not coherent. One way to establish coherence is to give a
(faithful) model based on inclusion polymorphism and show that, in the
model, all of the coercions are equivalent to the identity function.

There are two more issues with coercions. First, a coercion-based
approach is fundamentally incompatible with mutable references. The
problem is that we're making a *copy* of the value when we do a
coercion. If we make a copy of a mutable value, then we have to keep
all of the copies consistent whenever an update is done. Second,
coercions can be expensive: for big data structures (e.g., a list), we
have to map the element coercion across the components and generate a
whole new list. For these reasons, coercion-based interpretations are
minimized in many realistic programming languages.
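To make the coercion reading and the coherence failure concrete, here is a small TypeScript sketch that treats coercions as ordinary functions. It assumes that int addition wraps around at 32 bits (modelled with `| 0`; SML would raise Overflow instead) and that real is an ordinary JavaScript number; `MAXINT`, `intAdd`, `realAdd`, `fromInt`, and the tuple helpers are illustrative stand-ins, not a real library API. The two derivations of MAXINT + 4 become two different programs.

```typescript
// A coercion witnesses t1 <= t2 as a function from t1 values to t2 values.
type Coercion<A, B> = (x: A) => B;

// Reflexivity: the identity coercion.
function refl<A>(): Coercion<A, A> {
  return (x: A) => x;
}

// Transitivity: f : t1 -> t2 and g : t2 -> t3 compose to g o f : t1 -> t3.
function trans<A, B, C>(f: Coercion<A, B>, g: Coercion<B, C>): Coercion<A, C> {
  return (x: A) => g(f(x));
}

// Width subtyping: each step copies out all but the last component.
function dropLast3<A, B, C>(t: [A, B, C]): [A, B] {
  return [t[0], t[1]];
}
function dropLast2<A, B>(t: [A, B]): [A] {
  return [t[0]];
}

// <number,string,boolean> <= <number> by transitivity; builds fresh tuples.
const narrow = trans<[number, string, boolean], [number, string], [number]>(dropLast3, dropLast2);

// int <= real : \x.Real.fromInt(x), with both modelled as JS numbers (assumption).
const MAXINT = 2147483647;
const intAdd = (a: number, b: number): number => (a + b) | 0; // wraps at 32 bits
const realAdd = (a: number, b: number): number => a + b;
const fromInt: Coercion<number, number> = (n) => n;

// Derivation 1: integer addition, then coerce the result.
const viaInt = fromInt(intAdd(MAXINT, 4));
// Derivation 2: coerce both operands, then use real addition.
const viaReal = realAdd(fromInt(MAXINT), fromInt(4));

console.log(narrow([1, "a", true]));
console.log(viaInt, viaReal); // -2147483645 2147483651
```

Note that `narrow` allocates new tuples, which is exactly the copying that makes coercions clash with mutable data.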
The issue of multiple typing derivations for a given term presents
another challenge: type-checking is no longer syntax-directed. At a
minimum, we can apply the subsumption rule (together with reflexivity)
anywhere we like, so the shape of the term no longer determines the
shape of its derivation. In practice, it turns out that we can
"normalize" the proofs so that we restrict subsumption to certain
places in the proof (e.g., function application). If we can come up
with such a normalized proof system, then we are obligated to show that
if we can derive |- e : t using the original, non-deterministic rules,
then there exists a derivation using the normalized rules of the form
|- e : t' where t' <= t. The non-deterministic rules serve as a sort of
declarative specification and are often easier for the user to reason
about. The normalized rules serve as an algorithmic specification that
corresponds more closely to what the type-checker will do in practice.
For more details on how to do type-checking with subtyping, I suggest
looking at chapters 15 and 16 of Pierce's Types and Programming
Languages.

For type inference reasons, it's often useful to formulate the subtype
relation as a complete lattice, so that every pair of types has a least
upper bound and a greatest lower bound. For Java's reference types, we
can think of Object as the top-most element of the subtype relation
(i.e., every reference type can be coerced to Object). When doing type
inference, we generate a system of inequations of the form:

  t1 <= a <= t2

where a is a variable, and t1 and t2 are lower and upper bounds on that
variable, respectively. For instance, for an if-then-else we might
generate:

  e : a0    e1 : a1    e2 : a2
  ---------------------------- (a0 <= bool && a1 <= a && a2 <= a)
    if e then e1 else e2 : a

with the understanding that a <= t is shorthand for Bot <= a <= t,
where Bot is the least type (usually void/empty/0), and similarly
t <= a is shorthand for t <= a <= Top. Often, we can simplify the
constraints while preserving solutions.
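The bookkeeping for these bounds can be sketched in TypeScript over a tiny, hand-picked lattice of base types (Bot, Int, Real, Bool, Str, Top, with Int <= Real). The lattice and the helper names `leq`, `lub`, `glb`, `atLeast`, and `atMost` are assumptions for illustration, not part of any real inference engine. Each variable keeps a single lower and upper bound, tightened with lub and glb as constraints arrive, and a type error shows up as a lower bound that is no longer below the upper bound. The example replays the if-then-else rule on `if e then 1 else "foo"` followed by an integer addition.

```typescript
// A tiny lattice of base types: Bot <= everything <= Top, plus Int <= Real.
type Base = "Bot" | "Int" | "Real" | "Bool" | "Str" | "Top";
const ALL: Base[] = ["Bot", "Int", "Real", "Bool", "Str", "Top"];

function leq(s: Base, t: Base): boolean {
  return s === t || s === "Bot" || t === "Top" || (s === "Int" && t === "Real");
}

// Least upper bound: the upper bound that lies below every other upper bound.
function lub(s: Base, t: Base): Base {
  const ubs = ALL.filter((u) => leq(s, u) && leq(t, u));
  for (const u of ubs) if (ubs.every((v) => leq(u, v))) return u;
  throw new Error("not a lattice");
}

// Greatest lower bound, dually.
function glb(s: Base, t: Base): Base {
  const lbs = ALL.filter((l) => leq(l, s) && leq(l, t));
  for (const l of lbs) if (lbs.every((v) => leq(v, l))) return l;
  throw new Error("not a lattice");
}

// All constraints on one variable, simplified to lo <= a <= hi.
interface Bounds { lo: Base; hi: Base; }

const fresh = (): Bounds => ({ lo: "Bot", hi: "Top" });
// t <= a tightens the lower bound; a <= t tightens the upper bound.
const atLeast = (b: Bounds, t: Base): Bounds => ({ lo: lub(b.lo, t), hi: b.hi });
const atMost = (b: Bounds, t: Base): Bounds => ({ lo: b.lo, hi: glb(b.hi, t) });
// A type error is a lower bound that is not below the upper bound.
const consistent = (b: Bounds): boolean => leq(b.lo, b.hi);

// let x = if e then 1 else "foo" in x + 2 end
let x = fresh();
x = atLeast(x, "Int"); // a1 <= a, with a1 = int
x = atLeast(x, "Str"); // a2 <= a, with a2 = string
console.log(x.lo, consistent(x)); // Top true
x = atMost(x, "Int"); // integer addition x + 2 needs a <= int
console.log(x.lo, x.hi, consistent(x)); // Top Int false
```

The conditional on its own is accepted with type Top; the inconsistency only appears once the addition adds an upper bound.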
Concretely, if we end up with:

  t1 <= a <= t2  &&  t1' <= a <= t2'

then we can simplify this to:

  lub(t1,t1') <= a <= glb(t2,t2')

It's the existence of lubs and glbs that ensures we can always simplify
the constraints for a given variable to a single lower and upper bound.
A type error occurs when we end up with a constraint of the form:

  t1 <= a <= t2

where t1 is not less than or equal to t2.

One problem with a complete lattice is that, in some sense, it tends to
make fragments type-check that really shouldn't. For instance, consider
the fragment:

  let x = if e then 1 else "foo"
  in
    x + 2
  end

In ML, we'd get an error that says 1 has type int, whereas "foo" has
type string. In a system based on subtyping, the if-then-else would be
well-typed with the type Top (the lub of int and string). So we'd get
an error only at the point where we try to add x to 2.

In realistic languages, we tend to see subtyping arise on records. For
instance, it seems to make sense to have:

  {x:int,y:int,z:int} <= {x:int,y:int}

However, this adds a serious implementation constraint. First, consider
a language such as C, where the order of fields in a record matters. In
particular, it is not the case that values of type

  struct {int x; double y;}

also have the type

  struct {double y; int x;}

In such languages, an (inclusion-based) approach to subtyping only
makes sense when we drop fields on the right. That is, it does make
sense in C (from a representation perspective) to have

  struct {int x; int y; int z;} <= struct {int x; int y;}

This is essentially what we did with the tuples above, and essentially
what Java/C# do for classes.

Now consider a language like ML, where the order of members doesn't
matter. In particular, as far as ML is concerned,
{y:int,z:int,x:int} = {x:int,y:int,z:int}. Usually, an ML
implementation will sort the field names of a record to get a canonical
form. If we do this, then we need to know the *whole* record type to
determine the offset of a given field.
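To make the offset problem concrete, here is a small TypeScript sketch. It assumes one word per field and a canonical layout in which fields are sorted by label; `offsetOf` is a hypothetical helper, not part of any ML compiler. The same label gets a different offset depending on which record type we compute it from.

```typescript
// Offset of a field, assuming fields are laid out in sorted label order, one word each.
function offsetOf(fields: string[], label: string): number {
  const sorted = fields.slice().sort();
  const i = sorted.indexOf(label);
  if (i < 0) throw new Error(`no field ${label}`);
  return i;
}

// The record {x=3,y=4,z=5}, stored in canonical (sorted) order.
const words = [3, 4, 5];
const fullTy = ["z", "y", "x"]; // {x:int,y:int,z:int}, written in any order
const narrowTy = ["y", "z"]; // {y:int,z:int}

console.log(words[offsetOf(fullTy, "y")]); // 4: the y field
console.log(words[offsetOf(narrowTy, "y")]); // 3: actually x, since y is at offset 0 in {y,z}
```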
So we cannot simply treat {x:int,y:int,z:int} <= {y:int,z:int}: in the
first type, y is at an offset of 1 word, whereas in the second it is at
an offset of 0 words. If we wanted to support (inclusion) subtyping for
ML, we'd need another strategy. One idea is to have a "wrapper
dictionary" around the record that holds the actual offsets of the
fields. For instance, if the underlying record r is {x=3,y=4,z=5} but
we're treating it as if it has type {y:int,z:int}, then the wrapper
would tell us that y is at offset 1 word and z is at offset 2 words:

  +----+----+---+      +---+---+---+
  | 1  | 2  | o------->| 3 | 4 | 5 |
  +----+----+---+      +---+---+---+

Every time we use subsumption, we would build a new wrapper dictionary
but share the same underlying record. This is the essence of what
interfaces in Java/C# do. Of course, a good compiler will track the
*actual* runtime type as accurately as possible and avoid the overhead
of constructing the wrapper. But the point is that this actually
utilizes a combination of coercion and inclusion subtype polymorphism.
Furthermore, it affords more code reuse, because we can use the same
function regardless of the layout of the fields.

Finally, you may be thinking about *nominal* subtyping. This is the
idea that we should declare types (i.e., give them names) and only
consider N1 <= N2 when the programmer declares that it should be so.
As we'll argue, this is simply an instance of the more general idea of
bounded parametric polymorphism. Regardless, we still need a notion of
*structural* subtyping to ensure that whatever types N1 and N2 map to,
they are subtype-compatible.

Parametric Polymorphism:
------------------------

Subtype polymorphism is extremely useful, but it's also forgetful.
Consider:

  pt2d = {x:int,y:int}
  pt3d = {x:int,y:int,z:int}
  bumpx : pt2d -> pt2d

If we call bumpx on a pt3d, then we get out a pt2d: the result type has
forgotten the z field. In contrast, parametric polymorphism lets us
track input/output relationships.
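The difference can be sketched in TypeScript, whose object types use structural width subtyping. The names, and the choice of returning a fresh record, are assumptions for illustration. `bumpx` relies on subtyping alone, so its result type forgets z; `bumpxKeep` quantifies over every `T extends Pt2d`, which is TypeScript's form of the bounded polymorphism mentioned next, so the output type tracks the input type.

```typescript
interface Pt2d { x: number; y: number; }
interface Pt3d { x: number; y: number; z: number; }

// Subtype polymorphism: any Pt3d may be passed in, but the result is only a Pt2d.
function bumpx(p: Pt2d): Pt2d {
  return { x: p.x + 1, y: p.y };
}

// Bounded parametric polymorphism: the result has the same type T as the argument.
function bumpxKeep<T extends Pt2d>(p: T): T {
  return { ...p, x: p.x + 1 };
}

const q: Pt3d = { x: 1, y: 2, z: 3 };
const r2 = bumpx(q); // static type Pt2d; r2.z would be a type error
const r3 = bumpxKeep(q); // static type Pt3d
console.log(r2.x, r3.x, r3.z); // 2 2 3
```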
Of course, the problem with pure parametric polymorphism is that the
types are completely abstract. It's the combination of parametric and
subtype polymorphism (known as bounded polymorphism) that really gives
us the kind of expressiveness that we want. But before we get there,
let's examine parametric polymorphism more carefully. We'll start by
augmenting our language as follows:

  t ::= 'a | t -> t | All 'a.t
  v ::= \x:t.e | /\'a.e
  e ::= x | v | e1 e2 | e t

with the added evaluation rules:

      e -> e'
  -------------
   e t -> e' t

  (/\'a.e) t -> e{t/'a}

This is the polymorphic lambda calculus, or F2; it was independently
discovered by Girard (a logician) and Reynolds (a computer scientist).

We're going to modify our typing judgment for F2 to be of the form:

  D;G |- e : t

where D is a set of type variables (those in scope) and G is a set of
value variables along with their types. Additionally, we're going to
add a judgment:

  D |- t : *

meaning t is a well-formed type under D. The rules are as follows:

  D |- 'a : *    ('a in D)

  D |- t1 : *    D |- t2 : *
  --------------------------
      D |- t1 -> t2 : *

   D+{'a} |- t : *
  -----------------
  D |- All 'a.t : *

The rules for terms look like this:

  D;G |- x : G(x)

  D |- t : *    D;G,x:t |- e : t'
  -------------------------------
      D;G |- \x:t.e : t->t'

  D;G |- e : t'->t    D;G |- e' : t'
  ----------------------------------
          D;G |- e e' : t

     D+{'a};G |- e : t
  ------------------------
  D;G |- /\'a.e : All 'a.t

  D;G |- e : All 'a.t    D |- t' : *
  ----------------------------------
       D;G |- e t' : t{t'/'a}

They're surprisingly simple: in essence, /\'a.e lets us parameterize a
term by a type in the same way that \x:t.e lets us parameterize a
function by a value. The term "e t" lets us call a type abstraction,
passing t as the actual type argument. The key thing is that when we do
call a type abstraction, the type becomes *specialized* to the argument
type. It's also important to note that there are no operations on
values of variable type. That is, they are abstract.
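The typing rules above translate almost line for line into a checker. The TypeScript sketch below is illustrative only: the data representation and the function names are invented here, and it assumes all bound type variables have distinct names, so substitution needs no renaming to avoid capture. Type equality is checked up to renaming of bound variables, so All a.a->a and All b.b->b count as the same type.

```typescript
// Types: t ::= 'a | t -> t | All 'a.t
type Ty =
  | { tag: "tvar"; name: string }
  | { tag: "arrow"; dom: Ty; cod: Ty }
  | { tag: "all"; tv: string; body: Ty };

// Terms: x | \x:t.e | e1 e2 | /\'a.e | e t
type Term =
  | { tag: "var"; name: string }
  | { tag: "lam"; x: string; ty: Ty; body: Term }
  | { tag: "app"; fn: Term; arg: Term }
  | { tag: "tlam"; tv: string; body: Term }
  | { tag: "tapp"; fn: Term; ty: Ty };

// G: value variables with their types (later entries shadow earlier ones).
type Ctx = { name: string; ty: Ty }[];

// D |- t : *   (D is the list of type variables in scope)
function wellFormed(d: string[], t: Ty): boolean {
  switch (t.tag) {
    case "tvar": return d.indexOf(t.name) >= 0;
    case "arrow": return wellFormed(d, t.dom) && wellFormed(d, t.cod);
    case "all": return wellFormed([...d, t.tv], t.body);
  }
}

// t{s/'a}, assuming distinct binder names so no variable of s is captured.
function subst(t: Ty, a: string, s: Ty): Ty {
  switch (t.tag) {
    case "tvar": return t.name === a ? s : t;
    case "arrow": return { tag: "arrow", dom: subst(t.dom, a, s), cod: subst(t.cod, a, s) };
    case "all": return t.tv === a ? t : { tag: "all", tv: t.tv, body: subst(t.body, a, s) };
  }
}

// Type equality up to renaming of bound type variables.
function eqTy(s: Ty, t: Ty): boolean {
  if (s.tag === "tvar" && t.tag === "tvar") return s.name === t.name;
  if (s.tag === "arrow" && t.tag === "arrow") return eqTy(s.dom, t.dom) && eqTy(s.cod, t.cod);
  if (s.tag === "all" && t.tag === "all") {
    return eqTy(s.body, subst(t.body, t.tv, { tag: "tvar", name: s.tv }));
  }
  return false;
}

// D;G |- e : t, one case per typing rule; throws when no rule applies.
function typeOf(d: string[], g: Ctx, e: Term): Ty {
  switch (e.tag) {
    case "var": {
      for (let i = g.length - 1; i >= 0; i--) {
        if (g[i].name === e.name) return g[i].ty;
      }
      throw new Error(`unbound variable ${e.name}`);
    }
    case "lam": {
      if (!wellFormed(d, e.ty)) throw new Error("ill-formed type annotation");
      const cod = typeOf(d, [...g, { name: e.x, ty: e.ty }], e.body);
      return { tag: "arrow", dom: e.ty, cod };
    }
    case "app": {
      const tf = typeOf(d, g, e.fn);
      const ta = typeOf(d, g, e.arg);
      if (tf.tag !== "arrow" || !eqTy(tf.dom, ta)) throw new Error("bad application");
      return tf.cod;
    }
    case "tlam":
      return { tag: "all", tv: e.tv, body: typeOf([...d, e.tv], g, e.body) };
    case "tapp": {
      const tf = typeOf(d, g, e.fn);
      if (tf.tag !== "all" || !wellFormed(d, e.ty)) throw new Error("bad type application");
      return subst(tf.body, tf.tv, e.ty);
    }
  }
}

function show(t: Ty): string {
  switch (t.tag) {
    case "tvar": return t.name;
    case "arrow": return `(${show(t.dom)} -> ${show(t.cod)})`;
    case "all": return `All ${t.tv}.${show(t.body)}`;
  }
}

// id = /\'a.\x:'a.x
const a: Ty = { tag: "tvar", name: "a" };
const id: Term = {
  tag: "tlam", tv: "a",
  body: { tag: "lam", x: "x", ty: a, body: { tag: "var", name: "x" } },
};
console.log(show(typeOf([], [], id))); // All a.(a -> a)
```

Because type arguments may themselves be polymorphic types, the checker also accepts instantiating id at its own type and applying the result to id.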
Let's think about how we might encode some higher-level language
features using the facilities of F2. First, we know that we can encode
let:

  let x:t = e1 in e2   ==>   (\x:t.e2) e1

Next, I claim that we can encode pairs t1 * t2. We want some encoding
T[a * b] such that:

  fst  : All a,b.T[a * b] -> a
  snd  : All a,b.T[a * b] -> b
  pair : All a,b.a -> b -> T[a * b]

The trick is to make the constructor for pairs incorporate the
elimination forms. In particular, let us define:

  T[a * b] = All c.(a->b->c) -> c

The idea is that when we want to use a pair, we want to get at its two
components to produce some result. Now we can define

  fst  = /\a,b.\p.p a (\x.\y.x)
  snd  = /\a,b.\p.p b (\x.\y.y)
  pair = /\a,b.\x.\y./\c.\f.f x y

(eliding the type annotations on lambda-bound variables), and you can
check for yourself that these are type-correct and, furthermore:

  fst a b (pair a b v1 v2) ->* v1
  snd a b (pair a b v1 v2) ->* v2

We can also encode sums. How do we use a sum? We pattern-match on it.
So we want to take:

  T[a + b] = All c.(a -> c) -> (b -> c) -> c

That is, a case expression takes one branch for the a->c case and
another branch for the b->c case, and, depending upon whether we have
an a or a b, calls the right branch to generate the c. So now we can
define:

  inl  : All a,b.a -> T[a + b]
  inr  : All a,b.b -> T[a + b]
  case : All a,b,c.T[a + b] -> (a -> c) -> (b -> c) -> c

as follows:

  case = /\a,b,c.\s.\f.\g.s c f g
  inl  = /\a,b.\x./\c.\f.\g.f x
  inr  = /\a,b.\y./\c.\f.\g.g y

and you can confirm that these have the right types and:

  case a b c (inl a b v) f g ->* f v
  case a b c (inr a b v) f g ->* g v

Of course, if you've got sums, then you've got booleans and
if-then-else. What shall we take for unit and zero? How about:

  T[unit] = All a.a->a
  T[zero] = All a.a

It turns out that semantically, there's only one (mathematical)
function that occupies the type All a.a->a: the identity function. Or,
put another way, every closed F2 term of type All a.a->a can be
normalized to the identity function. And, mathematically, the type
All a.a is empty.
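The pair and sum encodings can be transcribed into TypeScript, whose generic function types play the role of All c. This is a sketch under a few assumptions: arguments are uncurried for readability, and the names `Pair`, `Sum`, and `caseOf` are our own (`case` is a reserved word). The type application is written out in `fst`, `snd`, and `caseOf` to mirror the encodings, although TypeScript would normally infer it.

```typescript
// T[a * b] = All c.(a -> b -> c) -> c   (uncurried)
type Pair<A, B> = <C>(f: (x: A, y: B) => C) => C;

// pair = /\a,b.\x.\y./\c.\f.f x y
function pair<A, B>(x: A, y: B): Pair<A, B> {
  return <C>(f: (x: A, y: B) => C): C => f(x, y);
}
// fst = /\a,b.\p.p a (\x.\y.x)
function fst<A, B>(p: Pair<A, B>): A {
  return p<A>((x, _y) => x);
}
// snd = /\a,b.\p.p b (\x.\y.y)
function snd<A, B>(p: Pair<A, B>): B {
  return p<B>((_x, y) => y);
}

// T[a + b] = All c.(a -> c) -> (b -> c) -> c   (uncurried)
type Sum<A, B> = <C>(f: (x: A) => C, g: (y: B) => C) => C;

// inl = /\a,b.\x./\c.\f.\g.f x
function inl<A, B>(x: A): Sum<A, B> {
  return <C>(f: (x: A) => C, _g: (y: B) => C): C => f(x);
}
// inr = /\a,b.\y./\c.\f.\g.g y
function inr<A, B>(y: B): Sum<A, B> {
  return <C>(_f: (x: A) => C, g: (y: B) => C): C => g(y);
}
// case = /\a,b,c.\s.\f.\g.s c f g
function caseOf<A, B, C>(s: Sum<A, B>, f: (x: A) => C, g: (y: B) => C): C {
  return s<C>(f, g);
}

const p = pair(1, "two");
console.log(fst(p), snd(p)); // 1 two
const s = inl<number, string>(41);
console.log(caseOf(s, (n) => n + 1, (str) => str.length)); // 42
```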
The emptiness of All a.a means that there is no closed normal form of
that type.

We can also define natural numbers. The thing you use a natural number
for is to loop. So let us define:

  T[nat] = All a.(a -> a) -> a -> a

The idea is that when we take a natural n and apply it to a function f,
we'll get back a function equivalent to n iterations of f:

  0 = /\a.\f.\x.x
  n = /\a.\f.\x.f(f(...(f x)...))      (n applications of f)

Now we can define inc : T[nat] -> T[nat] as follows:

  inc = \n./\a.\f.\x.f(n a f x)

Once we have inc, it's easy to define plus:

  plus : T[nat] -> T[nat] -> T[nat]
  plus = \n.\m.n _ inc m

(Exercise: fill in the "_" with the right type.) So plus just
increments m n times. Once you have plus, you can define times and so
on.

Fun exercises: try to define the comparison n < m, and then define
subtraction n - m so that when m < n, we get n - m, but when m > n, we
get zero.

It turns out that naturals are a special case of the more general
inductive type of lists:

  T[list a] = All c.(a -> c -> c) -> c -> c

This is the type of foldr for lists of a elements.

Exercise: define cons, foldr, and map for lists.

In general, any inductive datatype can be coded up using this style of
encoding in F2.

The fact that we can iterate using naturals, and do things like
fold/map, etc., makes it seem like we might have lost control of this
language. But in fact, just like the simply-typed lambda calculus,
every F2 program terminates and every F2 program has a unique normal
form (and thus a sound equational theory). It's a very powerful
programming language, and one that deserves a lot of study because it
is surprisingly useful.

However, we *do* lose something when we move from F1 to F2. In
particular, there is no simple set-theoretic semantics for F2. That
is, we can't just interpret types as sets of values. Next time, we'll
see why.
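Here is a TypeScript sketch of the numerals, with a generic function type standing in for All a. The helpers `toNumber`, `fromNumber`, and `times` are made up for illustration: `toNumber` reads a numeral back by instantiating a to number and iterating k => k + 1 from 0, and `times` follows the remark that multiplication can be built from plus.

```typescript
// T[nat] = All a.(a -> a) -> a -> a
type Nat = <A>(f: (x: A) => A) => (x: A) => A;

// 0 = /\a.\f.\x.x
const zero: Nat = <A>(_f: (x: A) => A) => (x: A): A => x;

// inc = \n./\a.\f.\x.f(n a f x)
function inc(n: Nat): Nat {
  return <A>(f: (x: A) => A) => (x: A): A => f(n<A>(f)(x));
}

// plus n m: apply inc to m, n times.
function plus(n: Nat, m: Nat): Nat {
  return n(inc)(m);
}

// times n m: add m to zero, n times.
function times(n: Nat, m: Nat): Nat {
  return n((k: Nat) => plus(m, k))(zero);
}

// Hypothetical helpers for converting to and from ordinary numbers.
function toNumber(n: Nat): number {
  return n<number>((k) => k + 1)(0);
}
function fromNumber(k: number): Nat {
  let n = zero;
  for (let i = 0; i < k; i++) n = inc(n);
  return n;
}

const two = fromNumber(2);
const three = fromNumber(3);
console.log(toNumber(plus(two, three)), toNumber(times(two, three))); // 5 6
```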