My research spans the boundary between hardware and software. I focus on improving performance and reducing cost by choosing the best place to perform work and making the best implementation trade-offs. My work to date concentrates on handling branches in modern microprocessors. In static correlated branch prediction (SCBP), I used path profiles to detect correlations between branches, then exploited those correlations by transforming the control flow graph (CFG) of the program. A follow-up paper explained the fundamental differences between SCBP and prior dynamic branch prediction schemes, and suggested ways to improve both hardware and software techniques. Later papers showed that SCBP improves performance on modern processors, and that it is important to measure full-system traces when evaluating hardware prediction schemes.
My thesis expands on the idea of path profiling. It shows how to collect path profiles efficiently and uses them to improve other optimizations such as global instruction scheduling and loop unrolling. Making modern uniprocessors run efficiently requires a steady stream of useful instructions. Path profiles strengthen the optimizations that produce linear control flow, which in turn improves performance on these deeply pipelined, wide-issue machines.
I have also been working on optimal code layout. Good code layout reduces both branch penalties and cache miss penalties. By reducing the code layout problem to a Directed Traveling Salesman Problem (DTSP), we can reuse DTSP solver and lower-bound machinery. Lower bounds tell us how well any code layout can possibly perform. Solvers give us layouts that come close to or meet those bounds. We have found a precise reduction for code layout applied to branch penalties (sometimes called branch alignment); recently developed solvers can produce optimal layouts for real programs.
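The idea behind the reduction can be illustrated with a toy sketch. This is not the precise reduction described above: it assumes a deliberately simplified cost model in which every executed control transfer that is not a fall-through pays one unit of penalty, and the CFG, edge counts, and all names (EDGE_COUNTS, TAKEN_PENALTY, distance, best_layout) are hypothetical. Basic blocks play the role of cities, and the directed "distance" from block i to block j is the penalty paid by i's outgoing edges when j is placed immediately after i. A brute-force search stands in for a real DTSP solver.

```python
from itertools import permutations

# Hypothetical toy CFG: edge (src, dst) -> profiled execution count.
EDGE_COUNTS = {
    ("A", "B"): 90,
    ("A", "C"): 10,
    ("B", "D"): 90,
    ("C", "D"): 10,
}
BLOCKS = ["A", "B", "C", "D"]
ENTRY = "A"
TAKEN_PENALTY = 1  # assumed cost per executed non-fall-through transfer


def out_weight(block):
    """Total execution count of edges leaving `block`."""
    return sum(c for (src, _), c in EDGE_COUNTS.items() if src == block)


def distance(i, j):
    """Directed distance: penalty paid by i's outgoing edges when j follows i."""
    return (out_weight(i) - EDGE_COUNTS.get((i, j), 0)) * TAKEN_PENALTY


def layout_cost(order):
    """Cost of a layout is the sum of distances between adjacent blocks."""
    cost = sum(distance(i, j) for i, j in zip(order, order[1:]))
    # The last block has no fall-through successor, so all its exits pay.
    return cost + out_weight(order[-1]) * TAKEN_PENALTY


def best_layout():
    """Exhaustive search over layouts that begin with the entry block."""
    rest = [b for b in BLOCKS if b != ENTRY]
    candidates = ([ENTRY, *p] for p in permutations(rest))
    return min(candidates, key=lambda order: (layout_cost(order), order))


if __name__ == "__main__":
    layout = best_layout()
    print(layout, layout_cost(layout))
```

Under these assumptions the hot path A, B, D is laid out contiguously and only the cold edges pay a penalty. Exhaustive search is feasible only for a handful of blocks, which is why reusing DTSP solvers and lower bounds matters for real programs.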
I have built a number of pieces of infrastructure for the SUIF compiler system. I helped design Machine SUIF, a set of extensions that support machine-specific compilation and optimization. Within Machine SUIF, I built the CFG library, the HALT instrumentation and profiling tool, and the port to the PowerPC architecture.
I have also worked in computer networking, where I helped design the CreditNet flow-controlled ATM switch and built a mobile IP implementation.