## CS252r: Advanced Functional Language Compilation
## Fall 2012: Maxwell Dworkin 323, Mon-Wed-Fri 3-4pm
## Greg Morrisett

### Homework 2: Due Friday Oct 5

There are two major tasks for this homework, and one optional task for advanced students:

- Implement Common Sub-Expression Elimination (CSE)
- Implement Closure Conversion for Non-Recursive Functions
- Implement Closure Conversion for Recursive Functions

Common sub-expression elimination amounts to replacing

```
let x = e1 in ... let y = e1 in e3
```

with

```
let x = e1 in ... e3[x/y]
```

That is, if we've already evaluated an expression `e1` and bound it to a variable `x`, then we don't need to evaluate it again and bind it to `y`. Instead, we can just replace all the uses of `y` with `x`. This is only safe when evaluating `e1` has no side effects (for example, it must not allocate or read a mutable reference). This is particularly useful for avoiding reconstructing tuples multiple times, or avoiding projecting something from a record multiple times.
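To make the rewrite concrete, here is a minimal CSE sketch in Python over straight-line code. The representation is an assumption for illustration, not the course's intermediate language: a program is a list of `(var, rhs)` bindings in A-normal form, each `rhs` is a tuple `(op, arg1, ..., argk)` of variable names, every `op` is pure, and every variable is bound exactly once.

```python
def cse(bindings, result):
    """Common sub-expression elimination over straight-line A-normal-form code.

    bindings: list of (var, rhs) pairs, rhs = (op, arg1, ..., argk)
    result:   the variable holding the program's value
    Assumes pure ops and unique variable names (hypothetical representation).
    """
    seen = {}   # canonical rhs -> first variable bound to it (the "x")
    subst = {}  # eliminated variable (the "y") -> variable replacing it

    def rename(v):
        return subst.get(v, v)

    out = []
    for var, rhs in bindings:
        # Apply earlier substitutions first, so #0 y and #0 x compare equal.
        key = (rhs[0],) + tuple(rename(a) for a in rhs[1:])
        if key in seen:
            subst[var] = seen[key]  # drop "let y = e1"; uses of y become x
        else:
            seen[key] = var
            out.append((var, key))
    return out, rename(result)


prog = [
    ("x", ("mkTuple", "a", "b")),
    ("p", ("#0", "x")),
    ("y", ("mkTuple", "a", "b")),  # recomputes x
    ("q", ("#0", "y")),            # recomputes p once y is replaced by x
    ("r", ("add", "p", "q")),
]
new_prog, res = cse(prog, "r")
for var, rhs in new_prog:
    print(var, "=", rhs)
# x = ('mkTuple', 'a', 'b')
# p = ('#0', 'x')
# r = ('add', 'p', 'p')
```

Both the second tuple and the second projection disappear. With nested scopes, the `seen` table has to follow scoping so that `x` is only reused where its binding is visible. One simple way to do this is to copy the table when entering a nested block.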

Of course, nothing limits CSE to just primitive computations. We can also perform CSE on function definitions (and even recursive blocks of declarations) but it becomes increasingly expensive in terms of compile time, so most compilers don't do more than CSE on primitive computations. I'll leave this choice up to you.

Closure conversion is the process of making a number of things explicit
that are otherwise implicit. In particular, we make the construction
of environments explicit (as tuples of values), the lookup of a variable
in an environment explicit (as a projection from the environment), and the
construction of closure objects explicit (as pairs consisting of an
environment data structure, and a pointer to a *closed* function).

When we have closure converted an expression, all of its function definitions become closed (except for references to other, possibly mutually recursive, functions). This allows us to lift the functions out of nested expressions to the top level and treat them like C functions (i.e., with no nested scopes).
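Once every function is closed, lifting is a simple tree walk. The sketch below assumes a hypothetical tuple-based AST, not the course's: `("var", x)`, `("int", n)`, `("prim", op, [args])`, `("app", fn, [args])`, `("let", x, rhs, body)`, and `("letfun", f, [params], body, rest)`. It moves each function definition into a top-level list. This is only correct on closure-converted input, where no function body refers to a variable bound by an enclosing expression.

```python
def hoist(e, top):
    """Remove every letfun from e, appending (name, params, body) to `top`.

    Only valid on closure-converted code: each function body is closed
    (apart from other code names), so moving it to the top level does not
    change what its variables refer to.
    """
    tag = e[0]
    if tag in ("var", "int"):
        return e
    if tag == "prim":
        return ("prim", e[1], [hoist(a, top) for a in e[2]])
    if tag == "app":
        return ("app", hoist(e[1], top), [hoist(a, top) for a in e[2]])
    if tag == "let":
        return ("let", e[1], hoist(e[2], top), hoist(e[3], top))
    if tag == "letfun":
        _, f, params, body, rest = e
        top.append((f, params, hoist(body, top)))  # nested functions land first
        return hoist(rest, top)
    raise ValueError(f"unknown expression {tag!r}")


# Closure-converted input:
#   let y = 10 in
#   let f_code (env, x) = let y = #0 env in add(x, y) in
#   let f_env = mkTuple(y) in
#   f_code (f_env, 5)
prog = ("let", "y", ("int", 10),
        ("letfun", "f_code", ["env", "x"],
         ("let", "y", ("prim", "#0", [("var", "env")]),
          ("prim", "add", [("var", "x"), ("var", "y")])),
         ("let", "f_env", ("prim", "mkTuple", [("var", "y")]),
          ("app", ("var", "f_code"), [("var", "f_env"), ("int", 5)]))))

top = []
main = hoist(prog, top)
print([name for name, _, _ in top])  # ['f_code']
```

The remaining `main` expression no longer contains any function definitions, so the program becomes a list of C-like top-level functions plus a main body.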

To closure convert a function

```
let f (x1,...,xn) = e in ...
```

you first need to calculate the set of free variables of the function. These are the variables that are used in `e` but not bound within `e`, excluding the arguments `x1,...,xn`. In general, the free variables of a function will be a subset of the variables that are in scope.
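A minimal sketch of this free-variable computation follows. It assumes a hypothetical tuple AST chosen for illustration (`var`, `int`, `prim`, `app`, `let`, and a non-recursive `letfun`), not the course's actual datatype.

```python
def free_vars(e):
    """Free variables of an expression in a small, hypothetical tuple AST:
    ("var", x) | ("int", n) | ("prim", op, [args]) | ("app", fn, [args])
    | ("let", x, rhs, body) | ("letfun", f, [params], fbody, rest)
    where letfun is non-recursive (f is not in scope inside fbody)."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "int":
        return set()
    if tag == "prim":
        return set().union(*(free_vars(a) for a in e[2]))
    if tag == "app":
        return free_vars(e[1]).union(*(free_vars(a) for a in e[2]))
    if tag == "let":
        _, x, rhs, body = e
        return free_vars(rhs) | (free_vars(body) - {x})
    if tag == "letfun":
        _, f, params, fbody, rest = e
        return (free_vars(fbody) - set(params)) | (free_vars(rest) - {f})
    raise ValueError(f"unknown expression {tag!r}")


def fun_free_vars(params, body):
    """Free variables of `let f (params) = body`: used in body, minus the params."""
    return free_vars(body) - set(params)


# let f (x) = let z = 1 in add(x, add(y, z)) in ...   -- only y is free
body = ("let", "z", ("int", 1),
        ("prim", "add", [("var", "x"),
                         ("prim", "add", [("var", "y"), ("var", "z")])]))
print(fun_free_vars(["x"], body))  # {'y'}
```

For closure conversion, the resulting set has to be turned into a list in some fixed order, because position `i` in that list determines the projection `#i env` used to fetch the variable.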

Suppose the set of free variables in the function definition
is `{y1,y2,...,ym}`.
Then you need to replace the definition of `f` with
something that looks like this:

```
let f_code (env,x1,...,xn) =
  let y1 = #0 env in
  let y2 = #1 env in
  ...
  let ym = #(m-1) env in
  e
in
let f_env = mkTuple(y1,y2,...,ym) in
let f = mkTuple(f_code, f_env) in
...
```

where `f_code`, `env`, and `f_env` are all fresh variables.

Note that `f_code` no longer has any free variables in it.
Rather, we've made its environment explicit (as an extra argument
to the function) and extract the free variables from the
environment variable at the beginning of the function. Note also
that the value of `f`
is now a pair of a code pointer and environment pointer.
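The definition rewrite described above can be sketched as a function that builds the converted AST. Several pieces are assumptions for illustration. The AST is a hypothetical tuple encoding (`var`, `prim`, `let`, `letfun`). The free variables are supplied as an ordered list computed beforehand. `fresh` is an illustrative name supply that appends numeric suffixes, and `show` is a small pretty-printer added only for the demonstration.

```python
import itertools

_ids = itertools.count()


def fresh(base):
    """Hypothetical fresh-name supply: base name plus a unique suffix."""
    return f"{base}_{next(_ids)}"


def convert_fun(f, params, body, fvs, rest):
    """Closure-convert `let f (params) = body in rest` (non-recursive).

    fvs is the ordered list [y1, ..., ym] of the function's free variables.
    Returns
      let f_code (env, params) = let y1 = #0 env in ... let ym = #(m-1) env in body in
      let f_env = mkTuple(y1, ..., ym) in
      let f = mkTuple(f_code, f_env) in rest
    """
    code, env, f_env = fresh(f + "_code"), fresh("env"), fresh(f + "_env")
    new_body = body
    for i in reversed(range(len(fvs))):  # wrap innermost first, so y1 ends outermost
        new_body = ("let", fvs[i], ("prim", f"#{i}", [("var", env)]), new_body)
    return ("letfun", code, [env] + list(params), new_body,
            ("let", f_env, ("prim", "mkTuple", [("var", y) for y in fvs]),
             ("let", f, ("prim", "mkTuple", [("var", code), ("var", f_env)]),
              rest)))


def show(e):
    """Print the AST in the ML-like notation used in the text."""
    tag = e[0]
    if tag == "var":
        return e[1]
    if tag == "prim" and e[1].startswith("#"):
        return f"{e[1]} {show(e[2][0])}"
    if tag == "prim":
        return f"{e[1]}({', '.join(show(a) for a in e[2])})"
    if tag == "let":
        return f"let {e[1]} = {show(e[2])} in {show(e[3])}"
    if tag == "letfun":
        return f"let {e[1]} ({', '.join(e[2])}) = {show(e[3])} in {show(e[4])}"
    raise ValueError(f"unknown expression {tag!r}")


# let f (x) = add(x, y) in f   with free variables [y]
converted = convert_fun("f", ["x"], ("prim", "add", [("var", "x"), ("var", "y")]),
                        ["y"], ("var", "f"))
print(show(converted))
```

One design choice worth noting: `body` should itself already be converted. Nested functions and call sites inside it are therefore handled first, in a bottom-up traversal.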

Dually, for each call site

```
f (v1,...,vn)
```

we need to unpack the closure for the function, extract its environment and code, and invoke the code with the extra environment argument. For instance:

```
let f_code = #0 f in
let f_env = #1 f in
f_code (f_env,v1,...,vn)
```

where, of course, `f_code` and `f_env` are fresh variables.
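To see that the converted definition and call site compute the same result as the original, here is a hand-converted program modeled in Python. Tuples stand in for the `mkTuple` objects, and a Python function stands in for the closed code `f_code`. This is a runtime illustration only, not compiler output.

```python
# Source (conceptually):
#   let y = 10 in
#   let f (x) = x + y in
#   f (5)
# Hand closure-converted following the recipe in the text.

def f_code(env, x):
    """Closed code: its only non-argument input arrives through env."""
    y = env[0]          # let y = #0 env
    return x + y


y = 10
f_env = (y,)            # let f_env = mkTuple(y)
f = (f_code, f_env)     # let f = mkTuple(f_code, f_env)

# Call site f (5): unpack the closure, then pass the environment explicitly.
code = f[0]             # let f_code = #0 f
env = f[1]              # let f_env = #1 f
result = code(env, 5)   # f_code (f_env, 5)
print(result)           # 15
```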

Closure-converting recursive functions is a bit more difficult because we need the environment to include the closure for each function defined in the let-rec, but we can't build the closures without first building their environment. The way out of this problem is to use a recursive data structure (mutually recursive tuples).
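One way to realize those mutually recursive tuples is to allocate the shared environment first and backpatch it once the closures exist. The Python model below is a hand-converted sketch under that assumption, with lists playing the role of mutable tuples. How your intermediate language expresses the cycle is a separate design decision.

```python
# Hand-converted model of
#   let rec even n = if n = 0 then true  else odd (n - 1)
#       and odd  n = if n = 0 then false else even (n - 1)
#   in even 10
# Both closures share one environment that holds the closures themselves,
# so the data structure is cyclic: env -> even closure -> env.

def even_code(env, n):
    odd = env[1]                      # let odd = #1 env
    if n == 0:
        return True
    return odd[0](odd[1], n - 1)      # call site: unpack code and environment


def odd_code(env, n):
    even = env[0]                     # let even = #0 env
    if n == 0:
        return False
    return even[0](even[1], n - 1)


# Allocate the environment with placeholder slots, build the closures that
# point to it, then backpatch the slots with the closures.
env = [None, None]
even = (even_code, env)
odd = (odd_code, env)
env[0], env[1] = even, odd

print(even[0](even[1], 10))  # True
print(odd[0](odd[1], 7))     # True
```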

Selective closure conversion is a slight optimization that avoids constructing closures for functions that do not *escape*.
A function escapes if it is used in any way other than being called. For instance, if we pass a function to another function, it escapes. Or if we place a function in a tuple, then it escapes. In general, we need to build a closure for escaping functions.
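A minimal escape check follows, assuming a hypothetical tuple AST (`var`, `int`, `prim`, `app`, `let`, `letfun`) with unique variable names. Under this check, a function name escapes if it appears anywhere except as the callee of an application.

```python
def escaping(e, funs):
    """Return the names in `funs` that escape in expression e.

    Hypothetical AST: ("var", x) | ("int", n) | ("prim", op, [args])
      | ("app", fn, [args]) | ("let", x, rhs, body)
      | ("letfun", f, [params], body, rest)
    A function escapes if its name occurs anywhere other than as the callee
    of an application (passed as an argument, stored in a tuple, returned...).
    Assumes every variable is bound exactly once, so names are unambiguous.
    """
    out = set()

    def walk(e):
        tag = e[0]
        if tag == "var":
            if e[1] in funs:
                out.add(e[1])
        elif tag == "prim":
            for a in e[2]:
                walk(a)
        elif tag == "app":
            if e[1][0] != "var":      # a direct call f (...) is not an escape
                walk(e[1])
            for a in e[2]:
                walk(a)
        elif tag == "let":
            walk(e[2])
            walk(e[3])
        elif tag == "letfun":
            walk(e[3])
            walk(e[4])
        elif tag != "int":
            raise ValueError(f"unknown expression {tag!r}")

    walk(e)
    return out


# let f (x) = x in            -- only called
# let g (z) = z in            -- stored in a tuple and passed as an argument
# let h (k) = k (1) in        -- only called
# let t = mkTuple(g, 2) in
# add(f (3), h (g))
prog = ("letfun", "f", ["x"], ("var", "x"),
        ("letfun", "g", ["z"], ("var", "z"),
         ("letfun", "h", ["k"], ("app", ("var", "k"), [("int", 1)]),
          ("let", "t", ("prim", "mkTuple", [("var", "g"), ("int", 2)]),
           ("prim", "add", [("app", ("var", "f"), [("int", 3)]),
                            ("app", ("var", "h"), [("var", "g")])])))))
print(escaping(prog, {"f", "g", "h"}))  # {'g'}
```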

But for functions that don't escape, we can avoid building the closure and instead just pass the environment in at the call site. The optimizer will actually take care of doing this for us in simple cases. For instance, once we've closure converted code according to the recipe above, we will have:

```
let f_code (env,x1,...,xn) = ... in
let f_env = (y1,...,ym) in
let f = (f_code, f_env) in
...
let f_code' = #0 f in
let f_env' = #1 f in
f_code' (f_env', v1,...,vn)
```

The optimizer should notice that `f` is bound to a known tuple and reduce the primitives `#0 f` and `#1 f` to `f_code` and `f_env`, producing this:

```
let f_code (env,x1,...,xn) = ... in
let f_env = (y1,...,ym) in
let f = (f_code, f_env) in
...
f_code (f_env, v1,...,vn)
```

At this point, if `f` does not escape, then the closure object is dead and we will optimize to this:

```
let f_code (env,x1,...,xn) = ... in
let f_env = (y1,...,ym) in
...
f_code (f_env, v1,...,vn)
```

But if the environment is "small", we could go a bit further and unbox the environment, passing the free values `y1,...,ym` directly as arguments:

```
let f_code (y1,...,ym,x1,...,xn) = ... in
...
f_code (y1,...,ym,v1,...,vn)
```

This change of calling convention is only possible because `f` does not escape, so every call site is known. Deciding whether and when to unbox the environment like this requires building the rest of the compiler and measuring the tradeoffs.
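The two optimizer steps in the walk-through above can be sketched on straight-line code. The first step folds `#0 f` and `#1 f` when `f` is bound to a known tuple. The second deletes the closure once nothing uses it. The representation is hypothetical, the sketch ignores function bodies, and it treats `f_code` as an already-defined name.

```python
def simplify(bindings, result):
    """Fold projections from known tuples, then drop dead pure bindings.

    Hypothetical straight-line representation: (var, rhs) pairs with
      ("tuple", [vars]) | ("proj", i, var) | ("call", fn, [args]).
    Tuples and projections are pure; calls are always kept.
    """
    known = {}   # var -> components of the tuple it is bound to
    subst = {}   # var -> the variable it is now a copy of

    def r(v):
        return subst.get(v, v)

    folded = []
    for var, rhs in bindings:
        if rhs[0] == "tuple":
            rhs = ("tuple", [r(a) for a in rhs[1]])
            known[var] = rhs[1]
        elif rhs[0] == "proj":
            i, t = rhs[1], r(rhs[2])
            if t in known:               # #i of a known tuple is just a copy
                subst[var] = known[t][i]
                continue
            rhs = ("proj", i, t)
        else:
            rhs = ("call", r(rhs[1]), [r(a) for a in rhs[2]])
        folded.append((var, rhs))

    # Dead-code elimination: walk backwards, tracking which variables are used.
    live = {r(result)}
    kept = []
    for var, rhs in reversed(folded):
        if rhs[0] != "call" and var not in live:
            continue                     # unused pure binding, e.g. a dead closure
        kept.append((var, rhs))
        if rhs[0] == "tuple":
            live.update(rhs[1])
        elif rhs[0] == "proj":
            live.add(rhs[2])
        else:
            live.add(rhs[1])
            live.update(rhs[2])
    return kept[::-1], r(result)


prog = [
    ("f_env", ("tuple", ["y1", "y2"])),
    ("f", ("tuple", ["f_code", "f_env"])),
    ("f_code'", ("proj", 0, "f")),
    ("f_env'", ("proj", 1, "f")),
    ("res", ("call", "f_code'", ["f_env'", "v1"])),
]
out, res = simplify(prog, "res")
for var, rhs in out:
    print(var, "=", rhs)
# f_env = ('tuple', ['y1', 'y2'])
# res = ('call', 'f_code', ['f_env', 'v1'])
```

If `f` escaped (say, it were also passed to some other call), it would stay live and the closure tuple would be kept. This is exactly the distinction that selective closure conversion relies on.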

Finally, for recursive functions, it can help to first eta-expand the recursive functions that escape, like this:

```
let rec f1 xs1 = e1
        f2 xs2 = e2
        ...
        fn xsn = en
        f1' xs1 = f1 xs1
        f2' xs2 = f2 xs2
        ...
        fn' xsn = fn xsn
```

and then replace all of the escaping uses of `fi` with `fi'`. This ensures that `f1,...,fn` do not escape (they are now only called), so direct calls avoid constructing closures and, if you unbox them, any environments. Only the escaping wrappers `fi'` pay the overhead of closure construction.
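A minimal sketch of this eta-expansion follows. It assumes a hypothetical tuple AST (`var`, `int`, `prim`, `app`, `let`), a `let rec` block given as a list of `(name, params, body)` triples, and a set of escaping names computed beforehand. The `'` suffix for wrapper names mirrors the text, and all of these names are illustrative.

```python
def eta_expand(funs, rest, escaping):
    """Eta-expand the escaping functions of one let rec block.

    funs:     list of (name, params, body) for `let rec f1 xs1 = e1 ... fn xsn = en`
    rest:     the expression after the block
    escaping: names among funs that escape (e.g. from an escape analysis)
    AST (hypothetical): ("var", x) | ("int", n) | ("prim", op, [args])
                        | ("app", fn, [args]) | ("let", x, rhs, body)
    Assumes fi is never shadowed inside the block or in rest.
    """
    def redirect(e):
        tag = e[0]
        if tag == "var":   # a non-call use: send escaping fi to fi'
            return ("var", e[1] + "'") if e[1] in escaping else e
        if tag == "int":
            return e
        if tag == "prim":
            return ("prim", e[1], [redirect(a) for a in e[2]])
        if tag == "app":   # a direct call keeps fi itself as the callee
            fn = e[1] if e[1][0] == "var" else redirect(e[1])
            return ("app", fn, [redirect(a) for a in e[2]])
        if tag == "let":
            return ("let", e[1], redirect(e[2]), redirect(e[3]))
        raise ValueError(f"unknown expression {tag!r}")

    new_funs = [(f, ps, redirect(body)) for f, ps, body in funs]
    for f, ps, _ in funs:
        if f in escaping:
            # fi' xsi = fi xsi; reuses the parameter names as in the text
            # (a compiler that relies on unique binders would rename them).
            new_funs.append((f + "'", ps, ("app", ("var", f), [("var", p) for p in ps])))
    return new_funs, redirect(rest)


# let rec f (n) = g (n)
#         g (y) = mkTuple(f, y)      -- f escapes into a tuple
# in g (f)                           -- and is passed as an argument
funs = [("f", ["n"], ("app", ("var", "g"), [("var", "n")])),
        ("g", ["y"], ("prim", "mkTuple", [("var", "f"), ("var", "y")]))]
new_funs, new_rest = eta_expand(funs, ("app", ("var", "g"), [("var", "f")]), {"f"})
for name, params, body in new_funs:
    print(name, params, body)
print(new_rest)
```

After this pass `f` appears only as a callee, so it needs no closure of its own. Only the wrapper `f'` is ever stored or passed.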