Recent implementations of the UNIX Fast File System [McKusick84] have used clustering to improve file system throughput. Studies have shown that on an empty file system, these clustering enhancements improve sequential I/O performance by a factor of two to three [Seltzer93][McVoy91]. Unfortunately, most real-world file systems are not empty; they are subject to fragmentation, which reduces the opportunities for file clustering. Other researchers have suggested that after a year or more of active use, UNIX file systems become so fragmented that clustering provides little, if any, performance improvement.

The goals of this work are to understand the effects of fragmentation on UNIX file system performance, and to analyze the causes of this fragmentation in the hope of discovering better algorithms for disk allocation and file layout.

In order to study the impact of fragmentation on UNIX file system performance, I have collected a series of snapshots from the file servers used by the Harvard University Division of Applied Sciences. Each snapshot is a summary of a file system's meta-data, including a list of the blocks allocated to each file and a map of the file system's free blocks. The complete data set includes nightly snapshots of 48 file systems over a period of a year.
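Because each snapshot records the list of disk blocks allocated to every file, layout quality can be quantified directly from that list. The sketch below shows one such metric as an illustration; it is a hypothetical score of my own construction, not necessarily the metric used in this study:

```python
def layout_score(blocks):
    """Fraction of block-to-block transitions in a file that are
    physically contiguous on disk: 1.0 means perfectly laid out,
    0.0 means no two consecutive logical blocks are adjacent.
    (Hypothetical metric for illustration only.)"""
    if len(blocks) < 2:
        return 1.0  # a zero- or one-block file cannot be fragmented
    contiguous = sum(1 for prev, cur in zip(blocks, blocks[1:])
                     if cur == prev + 1)
    return contiguous / (len(blocks) - 1)
```

For example, a file whose blocks land at addresses 1, 2, 10, 11 scores 2/3, since two of its three transitions are contiguous.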

Analyzing the snapshot data uncovered several trends in the on-disk layout of FFS:

I also used the snapshot data to create facsimiles of the source file systems on the disk of a test machine. I used a simple benchmark, which created and read 32 MB of files, to compare the performance of the different replicated file systems. After each run of the benchmark program, I took a snapshot of the test file system to determine the fragmentation of the files created by the benchmark. Some of my findings were:

The sources, tools, and data used for this research are also available on-line.


Keith A. Smith