Computer Science 253r
Advanced Topics in Programming Language Compilation

Syllabus for Spring 2005

Traditionally, the goals of research on programming language implementation were high performance for software users and high productivity for software developers. Today, security has joined performance and productivity as a goal for language implementers. This year's offering of CS 253r has two themes: (1) the use of program analysis and transformation to optimize performance, and (2) the use of language-implementation techniques to establish and enforce security policies.

Performance

Rapid developments in computer architecture and in programming language design have kept compilation an exciting research area. We'll cover traditional approaches to code optimization, but we'll also show how compilation is changing.

As computers have come to power everything from cell phones to massive search engines, the design of processor hardware has evolved in many ways at once. To make computing machines faster and less power hungry, processor architects have increasingly relied on parallel computation, cached data access, and clever compilers. These trends have made it harder to generate effective machine-language programs. The compiler not only has to know the language of the machine it's targeting, but it also has to model the hardware implementation of that machine in order to create the best code for it.

Trends in modern programming languages are also making code optimization challenging. Liberal use of small functions and methods means that the compiler must work harder to create regions where concentrated optimization yields the most benefit. Web-oriented computing has led to languages like Java and C# in which programs can be extended while they're running. That means that some compilation must occur at run time, which makes the performance of the compiler itself critical.

A sampling of the specific topics we'll cover in this phase of the course are scalar optimization, register allocation, instruction scheduling, code and data layout, and dynamic optimization.

Security

For many modern software applications, the "network is the computer". Information is exchanged within and among such applications in executable form. In the wild and sometimes hostile environment of the Internet, such "mobile code" must be run with the smallest set of permissions possible. Since traditional OS access and network traffic controls are considered too coarse and too static in this domain, programming languages and run-time systems are being specified with the ability to express fine-grained security policies.

In this part of the course, we will read about compilation and run-time technologies used for both high performance and enforcing security policies. Some examples include:

The material is too new to be in textbooks, so we will read key research papers in the area of language- and compiler-based security.

Administrative Stuff

Class: T-Th at 10-11:30am in Maxwell Dworkin 135

Software Lab: Access to software systems will be announced.

Instructor:

Prof. Michael D. Smith
Maxwell Dworkin 329
617-496-5661
Email: smith@eecs.harvard.edu
Office hours: Fridays from 1-3pm, by appointment only.

Teaching Assistant:

Glenn Holloway
Maxwell Dworkin 308
617-495-4117
holloway@eecs.harvard.edu

Course Administrative Assistant:

MaryFran Skovera
Maxwell Dworkin 143
617-496-8360
mskovera@deas.harvard.edu

Prerequisites

A strong background in programming, a solid understanding of data structures and algorithms, and a basic knowledge of compiler design (CS 153).

Familiarity with UNIX, C, C++, and an assembly language is assumed.

Course Requirements

Weekly Readings

Each Thursday's class meeting will be spent discussing papers (typically two) assigned the previous week and posted on this web page. (See Discussion Thursdays below.) By 5pm on the Wednesday before each of these discussion meetings, you're expected to contribute to a wiki page for that week (also linked below). Add your comments about the reading and/or suggest questions that ought be touched on in our class discussion. Feel free to use the wiki page for pre-class discussion, i.e., to annotate or respond to the remarks of other class members.

Contribution to the wiki pages is considered part of class participation.

Homework Assignments

Three or four homework sets will be assigned. The primary purpose of these homework assignments is to re-enforce the concepts discussed in lecture. The homework assignments also provide you with hands-on experience with sophisticated optimization tools. You may elect to use a subset of these tools in your term project. Homework will be submitted electronically, and no credit will be given for late homework.

Examinations

The only exam in this class is a midterm. The midterm will take place in class on Tuesday, March 22, 2005. The midterm is open-book and open-notes (and probably multiple choice). There is no final exam in this course.

Term Project

Instead of a final exam, there is a term project. You can choose either to implement and demonstrate a significant optimization algorithm, or to write a conference-quality paper containing novel ideas for security enhancement.

The project may be done in groups of two. It will be graded both on the success of your design and on the clarity of your presentation and writing. Each student (or pair of students) is responsible for the following items:

The presentation schedule is on the following wiki page: PresentationSchedule

Cooperation

Please feel free to talk with other class members about the homework assignments, but always turn in only your own work (e.g., never copy someone else's code!). The midterm is open book, but it is not open neighbor. It is in your best interest to attend the lectures and to read the assigned papers. For the term project, again feel free to talk with other class members. It is also possible to cooperate with other groups to achieve a larger goal. For these efforts, you may jointly design and share common, low-level infrastructure; however, the individual components must be clearly defined. This sort of cooperation must be approved beforehand by Prof. Smith.

Grading

10% Midterm
20% Class participation
30% Homework assignments
40% Term project

Reading

Required Text (draft chapters available in class)

A. Aho, R. Sethi, J. Ullman, and M. Lam. 21st Century Compilers, Addison-Wesley, Boston, MA, 2005.

The following wiki page is for comments and corrections on the draft text: NewDragonBook

Recommended Supplemental Texts

A. Aho, R. Sethi, and J. Ullman. Compilers Principles, Techniques, and Tools, Addison-Wesley, Reading, MA, 1986.

A. Appel. Modern Compiler Implementation in C, Cambridge University Press, Cambridge, United Kingdom, 1997.

K. Cooper and L. Torczon. Engineering a Compiler, Morgan Kaufmann, San Francisco, CA, 2003.

S. Muchnick. Advanced Compiler Design and Implementation, Morgan Kaufmann, San Francisco, CA, 1997.

Other Readings
The following articles are recommended, but not required, background reading for lectures. Hardcopies will be provided on request. Send email to MaryFran Skovera for each paper that you'd like her to print. Pick up copies in MaryFran's office, MD 143, when they're ready.

L. Almagor, Keith D. Cooper, Alexander Grosul, Timothy J. Harvey, Steven W. Reeves, Devika Subramanian, Linda Torczon, Todd Waterman. Finding Effective Compilation Sequences, Proc. 2004 Conf. on Languages, Compilers, and Tools for Embedded Systems (LCTES'04), June 2004.

Cliff Click and Keith D. Cooper. Combining Analyses, Combining Optimizations, ACM Transactions on Programming Languages and Systems, March 1995, pp. 181-196.

Sorin Lerner, David Grove, and Craig Chambers. Composing Dataflow Analyses and Transformations, Proc. 29th Annual Symposium on Principles of Programming Languages (POPL), January 2002.

Homework Assignments

Homework 1 is an introduction to Machine SUIF. It's due at the start of class on Tuesday, March 1.

Homework 2 familiarizes you with the Machine SUIF data-flow framework. It's due at the start of class on Tuesday, March 15.

Homework 3 asks you to apply the result of data-flow analysis to perform dead-code elimination. It's due at the start of class on Thursday, April 7.

Discussion Thursdays

Reminder: By 5pm of the Wednesday before each discussion Thursday, please post your comments and questions about the papers on the wiki page for that week.

February 10

Gary McGraw, Greg Morrisett. Attacking Malicious Code: A Report to the Infosec Research Council, IEEE Software, September/October 2000 (Vol. 17, No. 5), pp. 33-41.

George Necula, et al. CCured: Type-Safe Retrofitting of Legacy Code, Proceedings of 29th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'02), January 2002, pp. 128-139.

Jeremy Condit, et al. CCured in the Real World, Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation (PLDI'03), June 2003, pp. 232-244.

PDF links and wiki discussion page: WeekOne

February 17

George Necula, et al. CCured Documentation, version 1.3.2, January 2005.

George Necula, et al. Type-Safe Retrofitting of Legacy Code, to appear in ACM Transactions on Programming Languages and Systems, 2005

Document links and wiki discussion page: WeekTwo

February 24

Yichen Xie and Dawson Engler. Using Redundancies to Find Errors, to appear in IEEE Transactions on Software Engineering.

Mihai Christodorescu and Somesh Jha. Static Analysis of Executables to Detect Malicious Patterns, Usenix Security Symposium, August 2003.

Document links and wiki discussion page: WeekThree

March 3

Hao Chen and David Wagner. MOPS: an Infrastructure for Examining Security Properties of Software, Proceedings of the 9th ACM Conference on Computer and Communications Security, November 2002, pp. 235-244.

Dawson Engler and Madanlal Musuvathi. Static Analysis versus Software Model Checking for Bug Finding, Invited paper, Fifth International Conference on Verification, Model Checking and Abstract Interpretation (VMCAI04), January 2004.

Document links and wiki discussion page: WeekFour

March 10

Ulfar Erlingsson and Fred B. Schneider. SASI Enforcement of Security Policies: A Retrospective, Proceedings of the New Security Paradigm Workshop, Ontario Canada, September 1999, pp. 87-95.

David Evans and Andrew Twyman. Flexible Policy-Directed Code Safety, Proceedings of the 1999 IEEE Symposium on Security and Privacy, May 1999, pp. 32-45.

Document links and wiki discussion page: WeekFive

March 17

Dan S. Wallach and Edward W. Felten. Understanding Java Stack Inspection, Proceedings of 1998 IEEE Symposium on Security and Privacy, May 1998, pp. 52-63.

Martin Abadi and Cedric Fournet. Access Control based on Execution History, Network and Distributed System Symposium (NDSS'03), February 2003, pp. 107-121.

Document links and wiki discussion page: WeekSix

April 7

Sudheendra Hangal and Monica S. Lam. Tracking Down Software Bugs Using Automatic Anomaly Detection, Proceedings of the 24th International Conference on Software Engineering, May 2002, pp. 291-301.

Document links and wiki discussion page: WeekSeven

April 14

William R. Bush, Jonathan D. Pincus and David J. Sielaff. A Static Analyzer for Finding Dynamic Programming Errors, Software Practice and Experience, June 2000, pp. 775-802.

Thomas Ball and Sriram K. Rajamani. Automatically Validating Temporal Safety Properties of Interfaces, SPIN 2001, Workshop on Model Checking of Software, LNCS 2057, May 2001, pp. 103-122.

Document links and wiki discussion page: WeekEight

April 21

Galen C. Hunt, James R. Larus. Singularity Technical Report 1: Singularity Design Motivation, Technical report MSR-TR-2004-105, Microsoft Research, Microsoft Corporation, December 17, 2004.

Mark Aiken, Paul Barham, Manuel Fähndrich, Galen Hunt, Orion Hodson, James Larus, Steven Levi, Nick Murphy, Bjarne Steensgaard, David Tarditi, Brian Zill. Uniform Extensibility in Singularity using Software Isolated Processes, Draft submitted for publication, 2005.

Document links and wiki discussion page: WeekNine

April 28

For the final week before project presentations, we'll divide into pairs and play with tools of the sort we've been reading about. Sign up on this wiki page: WeekTen