Implementation II - Extended Discussion

Implementation session was chaired by Peter Druschel from Rice University and John Carter from University of Utah. It featured four presentations. Jay Lepreau from University of Utah presented the Flux OS Toolkit which provides a low-level OS infrastructure that allows OS code reuse and facilitates OS implementation. Patrick Tullmann described the use of formal methods to verify properties of the implementation of the IPC subsystem in the Fluke kernel.


Extended Discussion

The first speaker was Jay Lepreau(picture below) from University of Utah. The title of his talk was "The Flux OS Toolkit: Reusable Components for OS Implementation".

Jay started with the size of the Flux OS Toolkit. It's about 250,000 lines of code, of which they wrote about 15,000 (the rest they steal). There are basically three kinds of components. The first is low-level kernel support library in boot-strap that sets up the PC and the segment tables, a minimal C library, etc. This provides a subset of POSIX envrionment for applications that want to be kernel and allows low-level kernel code re-use. Another class of components are device drivers. Besides Linux and FreeBSD drivers, there is also a subset of COM interface (which they steal from Microsoft - good ideas come from everywhere). A third type of component is the high-level components such as networking and file system. The idea is that we provide the body parts, you provide the head and the brain.

The fundamental ideas behind the toolkit is that there is no structure and no abstraction - it takes exo-kernel's idea further. However, it provides a framework that has naming (COM), popular softwares (gnu) and even 300 hundred pages of documentation (which is largely blank).

Jay then gave a "hello world" kernel example. It's only about 5 lines of code. It boots and prints the message on the console. A more complicated example was shown which use device drivers and the file system. Function calls such as fdev_init(), fdev_linux_initall() etc. initialize the device drivers. fs_netbsd_init() initialize the NetBSD file system.

Jay then went on describing the design issues. The toolkit is designed to be usable either as a whole or in arbitrary subsets. You don't have to use the components if you don't want to. Also the OS Kit currently runs on x86 PCs, It is designed to be portable to other architectures.

There were quite a few success stories: A graduate stuent who didn't know anything abut OS and a part-time undergraduate programmer took over a month to get ML booting. SR took a couple of weeks, x-kernel took only one day, etc.

Finally Jay concluded his talk by quoting Gabriel's paper which argues that implementation simplicity is more important than interface perfection, or in other words, "Worse is Better". Jay and his collegues believe the OS toolkit idea is the essence of "worse is better". The OS Kit was evolved from 5 years of OS research. Its surprising popularity shows that pragmatic and client-driven approach to reuse software component is the right approach.

Discussion:


(In Jay's words) Jay has told us why "worse is better", now Patrick, the next speaker also from the University of Utah is going to tell us why "better is better". The title of his talk is "Formal Methods: A Practical Tool For OS Implementors". The idea is to bring mathematics and strong consistent logic into design of engineering to make sure the system is running inthe right direction.

Patrick started by pointing out that formal methods are very relevant to OS becuase operating systems are designed to be concurrent and non-deterministic. At the same time, they must be highly reliable and robust. Formal methods were designed to attack problems of complex systems that need to be done reliably. Patrick attributed the lack of formally verified OSs to three common believes: First, the tools that the formal methods community has come up with are complicated and inaccessible; Second, thre would be a gap between the abstract model to be verified and the actual implementation; Third, the OSs are just too big to be modeled this way.

Patrick then gave a brief overview of formal methods. There are two classes of formal method tools: theorem provers and model checkers. Theorem provers operate at a high level that take axioms, pre-conditions and generate proves. The disadvantage is that they require skilled intervention. Model checkers are more accessible tools. You provide a model, the model checker then tries everything the model can do and reports deadlocks if it finds any. The model checker they used is SPIN from New Jersy.

Their target is the IPC subsystem, a large (about a third to a quarter of the kernel source) and important part of the kernel. The IPC subsystem is also quite complicated. It supports multiple fine-grain locks and roll-back operations. It's also been optimized for speed a number of times. The goal is to get concurrency analysis of the system, e.g. finding dead-lock and live-lock issues, and apply these results to an SMP or a pre-emptive implementation of the kernel. Their approach is to manually translate the C code into Promela, the input specification for the SPIN model checker. They were able to find a couple of bugs in the IPC subsystem, one deadlock and one live lock.

Patrick then briefly described some issues relevant to using formal methods in operating systems. First is the scale of validated systems. They currently cannot verify the entire operating systems yet. This leads to the open issue of composition of separately validated sub-systems. The other issue is maintenance of the model with the implementation since the model is translated from the language the system is implemented in. Some methods to automatically generate both versions should be used in the future. The next issue is the cost vs. traditional methods and finally expoiting code structure that hinders effort of translation.

In conclusion, the formal methods are accessable (to naive grad students). The tools provided by the formal methods community are adequate. Significant components of an OS can be verified and checked. These results can apply directly to the implementation of the OS, not just the model you intend to implement.

Discussion:


Panel Discussion