cs252r : Advanced Functional Programming - Fall 2006
These pages are a record of the in-class discussions for the graduate class "Advanced Functional Programming" given at Harvard University in the Fall of 2006.
November 27, 2006
- The essence of compiling with continuations. Cormac Flanagan and Amr Sabry and Bruce F. Duba and Matthias Felleisen.
I know some of you have been having difficulty with your projects, so I thought last night I'd sit down and tackle an A-normalizer for the language we discussed in class Monday. What follows are some notes that may help you.
If you recall, in class we had something like the following:
data Val = Unit | Pair Val Val | Inl Val | Inr Val | Var Name
data ANF = Return Name
         | Consume Name
         | Send Name Val ANF
         | Send2 Name Val Name Val ANF
         | Split Name Name Name ANF
         | Use1 Name Name Name ANF
         | Use2 Name Name Name Name ANF
         | Case Name (Name, ANF) (Name, ANF)
For reasons that may become clear below I've elected to call this type ANF rather than the M we used in class. Also, because I haven't implemented any support for typing, I used an untyped form here. You would be wise to use a typed form, where the alternatives in the case construct are suitably labelled, e.g., Case Name (Name, Ty, ANF) (Name, Ty, ANF).
As noted, a Name corresponds to a wire and an ANF (the new M form) corresponds to a circuit. The Case construct is unique in that it connects to two distinct downstream circuits, and on any execution, at most one of those circuits is activated. Both the Split and the Send2 have multiple output wires but those wires feed into a single circuit.
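To make the circuit reading concrete, here is a small sketch that counts boxes and execution paths in an ANF term. The synonym `type Name = String`, the functions `boxes` and `paths`, and the `example` term are all my own illustrative assumptions, not part of what we wrote in class; only the two datatypes are copied from above.

```haskell
type Name = String  -- assumption: wires are named by strings

data Val = Unit | Pair Val Val | Inl Val | Inr Val | Var Name
  deriving (Eq, Show)

data ANF = Return Name
         | Consume Name
         | Send Name Val ANF
         | Send2 Name Val Name Val ANF
         | Split Name Name Name ANF
         | Use1 Name Name Name ANF
         | Use2 Name Name Name Name ANF
         | Case Name (Name, ANF) (Name, ANF)
  deriving (Eq, Show)

-- Number of boxes (constructors) in a circuit. (hypothetical helper)
boxes :: ANF -> Int
boxes (Return _)             = 1
boxes (Consume _)            = 1
boxes (Send _ _ k)           = 1 + boxes k
boxes (Send2 _ _ _ _ k)      = 1 + boxes k
boxes (Split _ _ _ k)        = 1 + boxes k
boxes (Use1 _ _ _ k)         = 1 + boxes k
boxes (Use2 _ _ _ _ k)       = 1 + boxes k
boxes (Case _ (_, l) (_, r)) = 1 + boxes l + boxes r

-- Number of distinct execution paths. Case is the only constructor with
-- two downstream circuits, so it is the only place where paths add up.
paths :: ANF -> Int
paths (Return _)             = 1
paths (Consume _)            = 1
paths (Send _ _ k)           = paths k
paths (Send2 _ _ _ _ k)      = paths k
paths (Split _ _ _ k)        = paths k
paths (Use1 _ _ _ k)         = paths k
paths (Use2 _ _ _ _ k)       = paths k
paths (Case _ (_, l) (_, r)) = paths l + paths r

-- A made-up circuit: a case whose right branch has one extra box.
example :: ANF
example = Case "s" ("l", Return "l") ("r", Send "w" Unit (Return "w"))
```

Every constructor other than Case has exactly one continuation, which is why Split and Send2 do not increase the path count even though they produce two wires.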
Now by itself this datatype is useless, because any value of this type is already A-normalized. (I will continue to talk about A-normalization and A-normal form although this is obviously a different language.) So you also need a source language. What does it need?
- You don't need lambda, because that's been defunctionalized away
-
Now because I am an old dog and have learned a trick or two, I observe that
the language above already contains Send, which is equivalent to let binding,
and it also contains Case. They just don't have the right types. So I used a
well-known trick: instead of defining a recursive datatype in the normal way,
I used open recursion. We'll discuss this tomorrow in the context of the
day's reading. Suffice it to say I defined a new type M, which is like ANF
but also includes:
- App1 Name M,
- App2 Name M M,
- Val Val
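For those who have not seen open recursion before, here is a minimal sketch of the idea on a cut-down language. This is a guess at the general shape, not my actual definitions: the shared type `TermF`, its two parameters, the wrapper constructors `ANF` and `Core`, and the function `embed` are all hypothetical names chosen for illustration.

```haskell
type Name = String  -- assumption: names are strings

data Val = Unit | Pair Val Val | Inl Val | Inr Val | Var Name
  deriving (Eq, Show)

-- One non-recursive "shape" shared by both languages.
--   v: what may be bound by Send
--   r: what a subterm looks like
data TermF v r
  = Return Name
  | Send Name v r
  | Case Name (Name, r) (Name, r)
  deriving (Eq, Show)

-- Target language: tie the knot directly. Only a Val may be bound.
newtype ANF = ANF (TermF Val ANF)
  deriving (Eq, Show)

-- Source language: the same shape, but any M may be bound, plus the
-- extra forms listed above.
data M
  = Core (TermF M M)
  | App1 Name M
  | App2 Name M M
  | Val Val
  deriving (Eq, Show)

-- Every normalized term is also a source term.
embed :: ANF -> M
embed (ANF t) = Core $ case t of
  Return x              -> Return x
  Send x v k            -> Send x (Val v) (embed k)
  Case x (a, l) (b, r)  -> Case x (a, embed l) (b, embed r)
```

The point of the design is that Send and Case are written once and reused at two different types, while the type checker still refuses an App1 inside an ANF.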
normalize :: M -> (ANF -> S ANF) -> S ANF
nval :: M -> (Val -> S ANF) -> S ANF
nname :: M -> (Name -> S ANF) -> S ANF
deval :: Val -> (Val -> S ANF) -> S ANF
As you might guess, when the context requires an ANF you call normalize; when it requires a Val you call nval; and when it requires a Name (wire) you call nname.
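As a rough illustration of how a function with the type of nname can be written, here is a sketch over a two-constructor source language. Several things here are assumptions of mine: that S is a fresh-name supply (implemented below as a hand-rolled counter monad), that `Use1 f x r k` means "call f on wire x and bind the result to wire r", and the helper names `fresh`, `run`, and `toANF`. Treat it as one possible shape, not as the solution.

```haskell
type Name = String  -- assumption

data Val = Unit | Pair Val Val | Inl Val | Inr Val | Var Name
  deriving (Eq, Show)

-- Cut-down target and source languages (hypothetical subsets).
data ANF = Return Name | Send Name Val ANF | Use1 Name Name Name ANF
  deriving (Eq, Show)
data M = Val Val | App1 Name M

-- Assumed meaning of S: a counter that supplies fresh wire names.
newtype S a = S { runS :: Int -> (a, Int) }

instance Functor S where
  fmap f (S g) = S $ \n -> let (a, n') = g n in (f a, n')
instance Applicative S where
  pure a = S $ \n -> (a, n)
  S f <*> S g = S $ \n ->
    let (h, n1) = f n
        (a, n2) = g n1
    in (h a, n2)
instance Monad S where
  S g >>= f = S $ \n -> let (a, n') = g n in runS (f a) n'

fresh :: S Name
fresh = S $ \n -> ("t" ++ show n, n + 1)

run :: S a -> a
run m = fst (runS m 0)

-- The continuation k says what to do once the term has a wire name.
nname :: M -> (Name -> S ANF) -> S ANF
nname (Val (Var x)) k = k x              -- already a wire: do not rename
nname (Val v)       k = do
  x <- fresh
  Send x v <$> k x                       -- let-bind the value to a new wire
nname (App1 f m)    k =
  nname m $ \x -> do                     -- first name the argument
    r <- fresh
    Use1 f x r <$> k r                   -- then name the result

toANF :: M -> ANF
toANF m = run (nname m (return . Return))
```

For instance, `toANF (App1 "f" (App1 "g" (Val Unit)))` names the unit value first, then the result of g, then the result of f, so the nested application comes out as a straight line of three boxes followed by a Return.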
'deval' is a bonus function that establishes an additional invariant that is not captured in the static type definitions above: it ensures that any Val passed to a continuation contains at most two free variables. If you construct the contexts correctly, you can then guarantee that each box [== M-form == ANF constructor] contains at most two free variables, which can then be assigned to the north and west edges of the box on a later pass.
There are actually several ways to deal with this additional invariant:
- Be very stupid about Val: guarantee each Val has at most one free variable by aggressively A-normalizing all the Val forms.
- Be moderately clever and name only elements of pairs, ensuring at most two free variables per Val.
- Be extremely clever and count free variables, making values as large as possible without exceeding two free vars per box.
I chose the middle path. To minimize the number of boxes it probably makes sense to do either of the first two algorithms and then inline as many let-bound values as possible without violating the invariant of at most two free variables per box.
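Whichever strategy you pick, you will probably want a way to check the invariant. A minimal sketch, assuming `type Name = String`; the functions `freeVars` and `fitsInBox` are hypothetical helpers of my own naming:

```haskell
import Data.List (nub)

type Name = String  -- assumption

data Val = Unit | Pair Val Val | Inl Val | Inr Val | Var Name
  deriving (Eq, Show)

-- Distinct variables of a value, in order of first occurrence.
-- Val has no binders, so every variable that occurs is free.
freeVars :: Val -> [Name]
freeVars = nub . go
  where
    go Unit       = []
    go (Var x)    = [x]
    go (Inl v)    = go v
    go (Inr v)    = go v
    go (Pair a b) = go a ++ go b

-- The invariant discussed above: at most two free variables per value.
fitsInBox :: Val -> Bool
fitsInBox v = length (freeVars v) <= 2
```

Because duplicates are removed, `Pair (Var "x") (Var "x")` counts as one free variable, not two.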
Note that it's perfectly safe to include the same variable multiple times in a single box, so it probably makes sense to linearize the code *after* A-normalizing it. Since the static type system enforces the normal form (including no merge after case), the Haskell compiler won't let you screw it up.
A few further notes:
- Watch out for infinite loops when A-normalizing values or variables; the rest of the code is pretty easy.
- Unlike the example of if0 in Figure 9 in the paper, you *must* duplicate the context of the case term. In other words, your code should be consistent with Figure 6, not with the output of Figure 9. (If you can't tell the difference, run the code I handed out and try some examples using if0.)
- Keep an eye on exactly which forms might have a continuation applied to them; understanding this invariant is critical to defining your nname function.
- My normalizer is about 100 lines of code including type definitions (but not including a big pile of Show instances). Writing the whole thing took about four hours, including several false starts. The most difficult part was fixing the type errors introduced by my use of open recursion and explicit fixed points in the type system. But I think the flexibility of the resulting target language is worth it.
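To illustrate the note about duplicating the context of a case term, here is a pure sketch with the fresh-name monad stripped out. The source type with constructors `V` and `CaseM`, the function `plug`, the sample `context`, and the reading of `Use1 f x r k` as "call f on wire x, bind the result to r" are all assumptions made for the example; a real normalizer would thread S through as in the signatures above.

```haskell
type Name = String  -- assumption

-- Cut-down target language (hypothetical subset of ANF).
data ANF = Return Name
         | Use1 Name Name Name ANF
         | Case Name (Name, ANF) (Name, ANF)
  deriving (Eq, Show)

-- Hypothetical source: a variable, or a case with source-term branches.
data M = V Name | CaseM Name (Name, M) (Name, M)

-- plug m k: normalize m in the context k.
plug :: M -> (Name -> ANF) -> ANF
plug (V x) k = k x
plug (CaseM s (a, l) (b, r)) k =
  -- k is applied once per branch, so the context is copied into both
  -- arms and the two arms never merge again.
  Case s (a, plug l k) (b, plug r k)

-- A sample context: "call f on the result, then return it".
context :: Name -> ANF
context x = Use1 "f" x "y" (Return "y")

example :: ANF
example = plug (CaseM "s" ("a", V "a") ("b", V "b")) context
```

In `example` the call to f appears twice, once in each arm, which is exactly the "no merge after case" shape that the ANF type forces on you.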
BibTeX
@InProceedings{Cormac-Flanagan-and-Amr-Sabry-and-Bruce-F.-Duba-and-Matthias-Felleisen1993,
  isbn = "0-89791-598-4",
  author = "Cormac Flanagan and Amr Sabry and Bruce F. Duba and Matthias Felleisen",
  year = 1993,
  publisher = "ACM Press",
  title = "The essence of compiling with continuations",
  address = "New York, NY, USA",
  location = "Albuquerque, New Mexico, United States",
  url = "http://doi.acm.org/10.1145/155090.155113",
  pages = "237--247",
  booktitle = "Conference on Programming language design and implementation"
}