A few notes on implementing polymorphism:
We've discussed the semantics of polymorphic languages such as
System-F, but not the implementation issues. There are about
5 ways to handle polymorphism which I'll briefly discuss below:
(1) Monomorphize the code (C++, MLton, ...): In languages such
as SML, the type system forces a prenex restriction on quantifiers:
monotypes: t ::= a | t1 -> t2 | t1 * t2 | ...
polytypes: s ::= t | All a.s
which effectively makes the polymorphic definitions second-class
(i.e., note that functions can't take/return a polytype, and
that products only range over monotypes, etc.) Another way to
see this is that the only polymorphic definitions are introduced
by "let" expressions. We can simply inline/duplicate the
let declaration everywhere it is used and do the reductions
of the type applications at compile time:
let x = /\a1,...,an.e1 in e -> e[e1/x]
(/\a1,...,an.e) t1 ... tn -> e[ti/ai]
Of course, we risk blowing up the code (because we're duplicating it).
In practice, this can be avoided by doing CSE on the type applications
before doing the inlining/duplication and by specializing the
code once for each unique type instantiation, so this is not a
real problem in practice.
The key advantage of monomorphizing the code is that we can compile
the resulting language like Pascal -- in particular, we need not
pick a uniform representation for values. For instance, floats can
be represented as 64-bit values and passed in fp registers, while ints
can be represented as 32-bit values and passed in gp registers. If we
pass a float to "the" identity function, then we'll really end up passing
it to a specialized version of the identity function which expects
its argument in an fp register, and if we pass an int to "the" identity
function then we'll really pass it to a specialized version that
accepts the argument in a gp register.
The key disadvantage of monomorphizing code is that destroys separate
compilation. In particular, it's impossible to compile a polymorphic
library (say, the list library) separate from the clients. Another
way to say this is that we're forced to expose the implementation in
the interfaces (as in C++ and Ada) so that if the implementation
changes, we may have to recompile the clients. In practice, this
hurts if you change something very deep in the library dependency graph.
Note that one option that can minimize these costs is to cache the
instantiations of polymorphic code so that we don't necessarily have
to generate specilized code for each client. For instance, we might
specialize the list library to integers, floats, and (say) pointers
eagerly and then a client wouldn't have to worry about paying the
cost of generating & compiling these specializations.
Another disadvantage of monomorphization is that we're really forced
to have 2nd-class polymorphism which I believe is way too limiting.
For instance, it precludes things like existentials, GADTs, and data
structures like you'll find in Okasaki's book that demand support for
polymorphic recursion.
(2) Monomorphize @ JIT time (C#): To avoid the problems mentioned
above, an alternative is to use a JIT compiler to generate the
specialized code at run-time. Again, caching is important to avoid
generating the code over and over again. The key advantages of this
approach are that (a) we get all of the wins of static monomorphization
in terms of native representations, (b) we are no longer limited to
2nd class polymorphic functions, and (c) we can support the illusion
of separate compilation. The downside is that we don't really get
separate compilation, but rather just delay when we do the compilation
and that latency may be of concern. Another downside is that it's
complicated and only works in relatively "heavyweight" environments
where we can afford to have a JIT around at run-time. Nonetheless,
this is probably the best of the options in terms of raw performance.
(2) Use a uniform representation: (O'Caml, Java, Modula-3):
Consider the identity function: /\a.\x:a.x -- if we ensure that
all values passed to it have the same size and calling convention,
then we can use one piece of code. For instance, we may "box" floats
(i.e., represent them by a pointer to a cell holding the actual float)
to ensure that they can be represented in 32-bits and be passed in
a gp register.
There are obvious disadvantages to boxing in that we pay the price
of constructing the box, and reading out the real value. (There are
subtle disadvantages too, such as the fact that Java does not lift
"equality" on boxed floats in the right way...)
In languages such as Java and Modula-3, there is another disadvantage
in that we are forced to make a distinction at the source level
between the "uniform" types (i.e., reference types) and other
types (e.g., int, float, etc.) Programmers must explicitly coerce
non-uniform values to and from a uniform representation if they
want to pass such values to a polymorphic function.
In languages such as O'Caml, there are no non-uniform types. Rather,
the compiler is responsible for making sure that every type is
unifom. Now at first blush, you might think that we only need to
insert coercions around polymorphic functions, and indeed this is
true, but it's a bit more subtle than you think and doesn't always
work (see below.) So most compilers (including older versions of
SML/NJ) just made *every* value uniform (e.g., a 32-bit gp value.)
That meant that floats were always boxed for instance. The advantage
of this approach is that polymorphic code is fast and programmers
don't have to deal with anything. The disadvantage is that monomorphic
code pays the price...
(3) Coercions (See Leroy, POPL'92): To see the subtlety with
coercions, consider the following attempted translation from a source
language where types are meant to be uniform, to a target language
where they aren't.
Source types: t ::= a | int | float | t1 -> t2 | t1*t2
s ::= All a1,...,an.t
Target types: u ::= a | int | u1 -> u2 | u1*u2 | Float
t ::= u | float | t -> t | t1*t2
s ::= All a1,...,an.t
The intention is that in the target language, Float is a boxed float,
whereas float is an unboxed float. Furthermore, the type variables in
the language range only over uniform types (u). That allows the
compiler to know, for instance, sizeof(a) and regtype(a) when emitting
code.
Let us start our compiler by defining a type translation mapping
source types to target types. The goal of the translation is to
use floats by default and only use Floats when we have to. We'll
assume the target language has primitives:
box : float -> Float
unbox: Float -> float
to mediate the mismatch. The naive translation might look like this:
T[a] = a
T[int] = int
T[float] = float
T[t1 -> t2] = T[t1] -> T[t2]
T[t1 * t2] = T[t1] * T[t2]
T[All a1,...,an.t] = All a1,...,an.T[t]
The problem with this is that when we get to translating polymorphic
instantiation at the term level:
D;G |- e : All a.t D |- t1
------------------------------
D;G |- e t : t[t1/a]
something will break. In particular, consider the identity
function. At source level it has type /\a.\x:a.a so we can
conclude:
D;G |- id : All a.a->a D |- float
-----------------------------------
D;G |- id float : float -> float
Now we see what the problem is -- at the target level, we can't
instantiate a with float because it's not a uniform type. So
that suggests that we need another type translation which produces
a uniform type out of a source-level type:
U[a] = a
U[int] = int
U[float] = Float
U[t1 -> t2] = U[t1] -> U[t2]
U[t1 * t2] = U[t1] * U[t2]
U[All a1,...,an.t] = All a1,...,an.U[t]
Then we could translate instantiation:
E[e t] = E[e] U[t]
For example, E[id float] = E[id] U[float] = E[id] Float. Now
presumably, E[id] : T[All a.a->a] = All a.T[a->a], so we can
conclude that E[id] Float : T[a->a][Float/a] = Float -> Float.
More generally, if D;G |- e : All a1,...,an.t and D |- t1,
then E[e] U[t1] : T[t]([U[t1]/a]).
That seems good except that we want our translation to have
the property that if D;G |- e : t, then D;T[G] |- E[e] : T[t],
and it is *not* the case that T[t]([U[t1]/a]) = T[t[t1/a]].
For instance, at source level, id float : float -> float
so at the target level, it should have type T[float -> float] =
float -> float. But the translation we've used yields
Float -> Float. Oops!
Notice that we could fix this problem by demanding that
D;U[G] |- E[e] : U[t] because U[t](U[t1]/a) = U[t[t1/a]].
That is, the U translation commutes with substitution.
But also note that this would force:
E[3.14] = box 3.14
That is, every float expression would have to be boxed!
The fix, which Leroy noted, is that at polymorphic instantiation,
what we need to do is construct a coerion S of type:
S : T[t]([U[t1]/a]) -> T[t1/a]
Then we can use S to coerce the polymorphic object to the
right thing. For instance, to fix up the identity function
we want E[id float] to yield something like:
\x:float.(unbox(id Float (box float)))
This function has type float->float (i.e., T[float->float]).
It boxes the argument and then unboxes the result. We can
rewrite this as:
(\f:Float->Float.unbox o f o box) (id Float)
and see that S for this case should be (\f:Float->Float.unbox o f o box).
More generally, we can define S indexed by the substitution we're
trying to perform as follows:
S[a][Float/a] = unbox
S[a1][u/a2] = \x:a1.x (u != Float or a1 != a2)
S[int][u/a] = \x:int.x
S[float][u/a] = \x:float.x
S[t1 * t2][u/a] = \p.(S[t1][u/a](#1 p), S[t2][u/a](#2 p))
S[t1 -> t2][u/a] = \f:T[t1->t2].S[t2][u/a] o f o G[t1][u/a]
G[a][Float/a] = box
G[a1][u/a2] = \x:a1.x (u != Float or a1 != a2)
G[int][u/a] = \x:int.x
G[float][u/a] = \x:float.x
G[t1 * t2][u/a] = \p.(G[t1][u/a](#1 p), G[t1][u/a](#2 p))
G[t1 -> t2][u/a] = \f:T[t1->t2].G[t2][u/a] o f o S[t1][u/a]
The "S" stands for specialization and the "G" stands for generalization.
Note that for functions, we need to be able to box arguments (i.e.,
generalize them) and unbox results. More generally, for positive
occurrences of the type variable we're specializing, we need to
use S to unbox, and for negative occurrences, we need to use G to
box. You can check that these two coercions have the properties:
S[T[t]][U[t1]/a] : (T[t][U[t1]/a]) -> T[t1/a]
G[T[t]][U[t1]/a] : T[t1/a] -> (T[t][U[t1]/a])
and that furthermore, S o G = id and G o S = id (i.e., S and G
form an isomorphism.) Given this, we can define the translation
on terms in a straightforward fashion. Technically, we need to
do this in a type-directed fashion (i.e., on the judgements) but
I'll assume we've decorated terms with types as needed:
E[i] = i (integer constants)
E[f] = f (floating-point constants)
E[x] = x
E[\x:t.e] = \x:T[t].E[e]
E[e1 e2] = E[e1] E[e2]
E[(e:All a.t) t1] = (S[T[t]][U[t1]/a] (E[e] U[t1]))
E[/\a.e] = /\a.E[e]
And you can verify that if D;G |- e : t, then D;T[G] |- E[e] : T[t].
The beauty of this translation is that if you don't use polymorphic
values, then you don't pay any boxing/unboxing overhead. Of course,
where we do use a polymorphic function, we'll be paying some overhead
and in all likelyhood, we'll want to inline and beta-reduce the hell
out of those coercions.
Unfortunately, there's a problem with this translation when it comes
to refs, inductive, or recursive types. For instance, consider
what needs to happen to a call to map, when we specializing it
to operate on a list of floating-point values. We end up constructing
a coercion S which maps list(float) to list(Float). Hopefully, a
really good compiler could do the deforestation and push the S/G
coercions into the map itself. But that's hard to do with
separate compilation, and without very aggressive optimization.
To avoid this overhead, Leroy's Gallium compiler forced values in
inductive data structures to be boxed.
For references, the problem is that the S/G coercions want to make
*copies* of the values. But you can't make a copy of the reference
cell because you'll end up destroying the sharing. (Technically,
you could represent the ref by a pair of functions to get/set
the unboxed contents and then coerce those functions, but doing so kills
all of the wins of the unboxed representations.)
A third subtle problem that arose with coercions when implemented
in SML/NJ is that they can pile up. In particular, it's possible
with certain kinds of loops to end up with a G o S wrapped around
a value each time you go around the loop.
Nonetheless, I find this one of the most interesting and somehow
satisfying translations that really demonstrates why type-directed
translation can be such a powerful tool for understanding compilers.
(4) A fourth approach to dealing with polymorphism is called
"intensional type analysis" or "runtime-type dispatch". The
basic idea was laid out in a POPL'95 paper by Harper and Morrisett
and was implemented in the TIL, TIL(T), and SML/NJ compilers.
(Actually, SML/NJ uses a combination of coercions and intensional
type analysis.) The observation is that we could translate
a polymorphic function, such as the identity so that it actually
took a representation of the type as a value at run-time. Then
it could look at this type-representation to decide what to do.
For instance, the identity function might get compiled to something
like this:
/\a.\t:R(a).
if (t == R(float))
\x:float.x
else
\x:a.x
Here, R(a) is a type corresponding to the representation of the
unknown type a. It turns out that you can compile ML-like languages
(really, full, predicative System-F) in this fashion and get all of
the benefits of the coercion-based approach without some of the
drawbacks. In particular, this technique is compatible with refs,
inductive data structures, recursive types, etc.
Of course, it has its own costs in terms of constructing, passing, and
testing representations of types at run-time. But those
representations can be used for other things (e.g., GC information, ad
hoc polymorphic operations such as ML's polymorphic equality, etc.)
I'm not sure this is really a win compared to say, the JIT approach.
It does have the advantage that like the coercion-based approach, you
only pay for polymorphism if you use it. Still, I think that it's
probably not worth the trouble.
(5) Polymorphism as products: Another approach, pioneered by the
Church project, is to really think of a polymorphic value as a
product. For a language without polymorphic recursion, the set of
types at which we can apply a polymorphic value can be
(conservatively) computed at compile time. Then we could build the
specialized versions and put them all in a big tuple. Then
polymorphic instantiation becomes a projection off of that tuple.
(Dually, for existentials, we can use a datatype, and unpacking
corresponds to doing a pattern-match.) I suspect that this technique
could be used to make MLton support 1st class polymorphism (though not
polymorphic recursion.)
In summary, there are a number of approaches for implementing
polymorphism all with different tradeoffs and limitations. The
issues start to multiply when you consider languages such as
C# and the latest versions of Java where you combine parametric
polymorphism with subtyping (and run-time downcasts, reflection,
etc.)