CS252r: Advanced Functional Language Compilation
Fall 2012: Maxwell Dworkin 323, Mon-Wed-Fri 3-4pm
- Homework three is posted.
- Piazza class discussion site.
- code.seas git repository
- Coq Proof Assistant
- Proof General (emacs mode for Coq)
Background Reading and Resources
- Pierce et al.'s Software Foundations
Continuation Passing Style
- A. Kennedy's Compiling with Continuations, Continued
- A. Appel's Compiling with Continuations
- G. Steele's RABBIT: A Compiler for SCHEME
- D. Kranz et al.'s ORBIT: An Optimizing Compiler for Scheme
- A. Appel and T. Jim's Shrinking Lambda Expressions in Linear Time
- Appel and Jim's Continuation-Passing, Closure-Passing Style
- Matt Might's Closure Conversion: How to compile lambda
- Steckler and Wand's Lightweight Closure Conversion
- John Hannan's Type Systems for Closure Conversions
- Morrisett and Harper's Typed Closure Conversion for Recursively-Defined Functions
- Danvy and Nielsen's Defunctionalization at Work
- Appel's A Runtime System
- Appel's Garbage Collection Can Be Faster Than Stack Allocation
- Diwan, Tarditi, and Moss's Memory subsystem performance of programs using copying garbage collection
- Paul Wilson's Uniprocessor Garbage Collection Techniques
- Richard Jones's Garbage Collection Page
- Jones, Hosking, and Moss's Garbage Collection Handbook
- Van Horn and Might's Abstracting Abstract Machines
- Shivers' Control-Flow Analysis in Scheme
- Shivers' Super-Beta: Copy, Constant, and Lambda Propagation in Scheme
- Midtgaard and Jensen's A Calculational Approach to Control-Flow Analysis by Abstract Interpretation
- Serrano's Inline expansion: when and how?
- Peyton Jones and Marlow's Secrets of the Glasgow Haskell Compiler Inliner
- Wadler's Deforestation: Transforming programs to eliminate trees
- Gill et al.'s A short cut to deforestation
- Chitil's Type Inference Builds a Short Cut to Deforestation
- Takano and Meijer's Shortcut deforestation in calculational form
- Gill's dissertation Cheap Deforestation for Non-strict Functional Languages
- Johann's A Generalization of Short-Cut Fusion and Its Correctness Proof
- Ghani et al.'s Monadic Augment and Generalised Short Cut Fusion
- Turchin's The concept of a supercompiler
- P. Jonsson's dissertation, Time- and Size-Efficient Supercompilation
This semester's version of CS252r will be a group-oriented project course aimed at building a compiler for a functional language. In particular, the class as a whole will build an optimizing compiler for the Coq Proof Assistant.
As a development environment, Coq lets you write functional programs with very rich, dependent types and write formal, machine-checked proofs about those programs. Furthermore, Coq lets you extract the functional bits into Haskell, Scheme, or OCaml code that can then be executed.
Unfortunately, the extracted code is not very efficient, and existing compilers do a poor job of removing the inefficiencies. In part, this is because extracted Coq code uses idioms that do not arise in hand-written Haskell, Scheme, or OCaml code, and in part because the compilers for these languages lack deep knowledge about the semantics of the code. For instance, because all Coq functions are effect free and terminate, a compiler can evaluate any function application using call-by-value, call-by-name, or call-by-need without changing the result, whereas Haskell, Scheme, and OCaml each commit to a single evaluation strategy.
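To make the point about evaluation strategies concrete, here is a minimal sketch in Python (chosen only for illustration; it is not the course's implementation language). The names `expensive`, `Thunk`, and `run` are hypothetical. The call counter exists purely as instrumentation; the computation itself is pure and terminating, so all three strategies agree on the result and differ only in how much work they do.

```python
# Illustrative sketch: one pure, terminating argument passed three ways.
# All names are hypothetical; the counter only records how often the
# argument expression is evaluated.

calls = {"count": 0}

def expensive():
    """A pure computation, instrumented with an evaluation counter."""
    calls["count"] += 1
    return 6 * 7

class Thunk:
    """Call-by-need: evaluate at most once, then reuse the cached value."""
    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._compute()
            self._done = True
        return self._value

def use_twice_by_value(x):
    # The argument was already evaluated by the caller.
    return x + x

def use_twice_by_name(x):
    # The argument is re-evaluated at every use.
    return x() + x()

def use_twice_by_need(x):
    # The argument is evaluated on first use only.
    return x.force() + x.force()

def run(strategy):
    """Return (result, number of times the argument was evaluated)."""
    calls["count"] = 0
    if strategy == "value":
        result = use_twice_by_value(expensive())
    elif strategy == "name":
        result = use_twice_by_name(expensive)
    elif strategy == "need":
        result = use_twice_by_need(Thunk(expensive))
    else:
        raise ValueError(strategy)
    return result, calls["count"]
```

Here `run("value")`, `run("name")`, and `run("need")` all return the same result; only the evaluation count differs (call-by-name evaluates the argument twice). For a language with effects or nontermination the strategies could disagree, which is why a general-purpose compiler cannot switch between them freely.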
Along the way, students will learn the basics of how to build a functional language compiler (e.g., CPS and closure conversion), as well as key topics in program analysis and optimization. We are going to develop the compiler as a group, so students will also gain practical team skills, such as how to perform code reviews.
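As a rough illustration of what CPS conversion involves, the sketch below applies a naive call-by-value CPS transform to a tiny lambda calculus encoded as Python tuples. The term encoding, the `fresh` naming scheme, and the small `evaluate` interpreter are all assumptions made for this example, not part of the course materials, and the transform makes no attempt to avoid administrative redexes.

```python
import itertools

# Terms (hypothetical encoding):
#   ("const", n) | ("var", x) | ("lam", x, body) | ("app", fn, arg)

_fresh = itertools.count()

def fresh(prefix):
    """Return a new variable name (assumes source names never contain '%')."""
    return f"{prefix}%{next(_fresh)}"

def cps(term, k):
    """Naive call-by-value CPS: the value of `term` is passed to term `k`."""
    tag = term[0]
    if tag in ("var", "const"):
        return ("app", k, term)
    if tag == "lam":
        _, x, body = term
        k2 = fresh("k")
        # A converted function takes its argument, then a continuation.
        return ("app", k, ("lam", x, ("lam", k2, cps(body, ("var", k2)))))
    if tag == "app":
        _, fn, arg = term
        f, v = fresh("f"), fresh("v")
        # Evaluate the function, then the argument, then call with k.
        call = ("app", ("app", ("var", f), ("var", v)), k)
        return cps(fn, ("lam", f, cps(arg, ("lam", v, call))))
    raise ValueError(f"unknown term: {term!r}")

def evaluate(term, env):
    """A small environment-based interpreter used to check the transform."""
    tag = term[0]
    if tag == "const":
        return term[1]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        _, x, body = term
        return lambda value: evaluate(body, {**env, x: value})
    if tag == "app":
        _, fn, arg = term
        return evaluate(fn, env)(evaluate(arg, env))
    raise ValueError(f"unknown term: {term!r}")

# Example: (twice succ) 5, where twice = \f. \x. f (f x).
twice = ("lam", "f", ("lam", "x",
         ("app", ("var", "f"), ("app", ("var", "f"), ("var", "x")))))
program = ("app", ("app", twice, ("var", "succ")), ("const", 5))
identity = ("lam", "r", ("var", "r"))

direct = evaluate(program, {"succ": lambda n: n + 1})
# In the CPS world, the primitive also takes a continuation.
via_cps = evaluate(cps(program, identity),
                   {"succ": lambda n: lambda k: k(n + 1)})
```

Both `direct` and `via_cps` evaluate to the same number. Note that every call in the converted program is a tail call and every intermediate value is named, which is what makes the CPS form convenient for later passes such as closure conversion.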
We will also be reading a number of classic functional language compiler papers on topics ranging from intermediate representations to run-time systems.
We will develop the compiler in Coq, though we will not attempt to prove anything about Coq. Writing code in Coq is similar to writing in OCaml, so most students will have no trouble adapting, as long as they have experience with some ML dialect. Ideally, students should have taken CS153 (Compilers), but exceptions will be entertained at the discretion of the instructor.
Format and Assessment
Programming tasks for the compiler will be categorized into one of two tiers. Tier 1 tasks must be implemented individually by each student, whereas Tier 2 tasks will be done by a group of students. Each student will work on at least two or three Tier 2 tasks.
Tier 1 tasks include the following:
- call-by-value CPS conversion,
- closure conversion,
- basic optimizations such as constant folding and function inlining,
- translation to LLVM intermediate code,
- a simple copying garbage collector.
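To give a flavor of the simplest of these tasks, here is a minimal sketch of constant folding over a toy arithmetic expression tree, written in Python for illustration only. The tuple encoding and the function name `fold` are assumptions made for this example; a real pass in the class compiler would operate on the compiler's own intermediate representation.

```python
# Expressions (hypothetical encoding):
#   ("num", n) | ("var", x) | ("add", left, right) | ("mul", left, right)

def fold(expr):
    """Bottom-up constant folding: evaluate operators whose operands
    are both known constants, and leave everything else unchanged."""
    tag = expr[0]
    if tag in ("num", "var"):
        return expr
    if tag not in ("add", "mul"):
        raise ValueError(f"unknown expression: {expr!r}")
    left, right = fold(expr[1]), fold(expr[2])
    if left[0] == "num" and right[0] == "num":
        if tag == "add":
            return ("num", left[1] + right[1])
        return ("num", left[1] * right[1])
    return (tag, left, right)

# (2 * 3) + x  folds to  6 + x
example = fold(("add", ("mul", ("num", 2), ("num", 3)), ("var", "x")))
```

After folding, `example` is `("add", ("num", 6), ("var", "x"))`: the constant subexpression has been computed at compile time, while the part that depends on `x` is left for run time.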
Tier 2 tasks include at least the following:
- call-by-name and call-by-need CPS conversion,
- control-flow analysis,
- selective closure conversion,
- aggressive inlining heuristics,
- interprocedural dead code elimination,
- strictness analysis and transformations,
- representation changes, and
- a generational garbage collector.