THE EFFECTS OF AGE AND FRAGMENTATION ON FILE SYSTEM PERFORMANCE
Recent implementations of the UNIX Fast File System [McKusick84] have used
clustering to improve file system throughput.
Studies have shown that on an empty file system, these clustering
enhancements provide a two- to three-fold improvement in sequential I/O
performance [Seltzer93][McVoy91].
Unfortunately, most real-world file systems are not empty, and are
subject to fragmentation, which decreases the opportunities for file
clustering.
Other researchers have suggested that after a year or more of active use,
UNIX file systems become so fragmented that clustering provides little,
if any, performance improvement.
The goals of this work are to understand the effects of fragmentation
on UNIX file system performance, and to analyze the causes of this
fragmentation in the hope of discovering better algorithms for
disk allocation and file layout.
To study the impact of fragmentation on UNIX file system performance,
I have collected a series of snapshots from the file servers used by
the Harvard University Division of Applied Sciences.
Each snapshot is a summary of a file system's meta-data, including a
list of the blocks allocated to each file and a map of the file system's
free blocks.
The complete data set includes nightly snapshots of 48 file systems over
a period of a year.
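As an illustration of what such a snapshot might contain, here is a minimal Python sketch. The `Snapshot` class, its field names, and the `free_runs` helper are hypothetical names chosen for this example; they are not the format or tools actually used in this research. The sketch assumes each file is recorded as an ordered list of block addresses and that the free-block map is a simple list of booleans.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    """Hypothetical in-memory form of one nightly snapshot."""
    # inode number -> block addresses in file order (logical block 0 first)
    file_blocks: Dict[int, List[int]] = field(default_factory=dict)
    # free_map[b] is True when block b is unallocated
    free_map: List[bool] = field(default_factory=list)


def free_runs(snapshot: Snapshot) -> List[Tuple[int, int]]:
    """Return (start_block, length) for each maximal run of free blocks."""
    runs: List[Tuple[int, int]] = []
    start = None
    for block, is_free in enumerate(snapshot.free_map):
        if is_free and start is None:
            start = block                       # a new free run begins here
        elif not is_free and start is not None:
            runs.append((start, block - start))  # the run ended at block - 1
            start = None
    if start is not None:                        # run extends to the last block
        runs.append((start, len(snapshot.free_map) - start))
    return runs


# Tiny made-up example: six blocks, two files, two free runs.
snap = Snapshot(
    file_blocks={2: [0, 1], 3: [4]},
    free_map=[False, False, True, True, False, True],
)
print(free_runs(snap))
```

Listing the free runs in this way is one simple means of judging how well clustered the free space in a region of the disk is: longer runs leave more room for contiguous allocation.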
Analyzing the snapshot data uncovered several trends in the on-disk
layout of FFS:
- Small files suffer from more fragmentation than large files do.
Fewer than 35% of the blocks in two-block files are allocated contiguously.
In contrast, more than 80% of the blocks in files larger than 32 blocks
are contiguously allocated.
- A major cause of fragmentation in small files is the policy
that FFS uses when allocating fragments. (A fragment is a partial block
at the end of a file.)
Fragments are seldom contiguous with the previous block of the file
because the location of the previous block is not considered when
allocating a fragment to a file.
- Free space is unevenly distributed within FFS cylinder groups.
Most of the free space in a cylinder group is located toward the end of
the cylinder group. This free space is better clustered than the
free space at the beginning of the cylinder group.
- The file systems that serve Usenet news articles suffer from
extreme fragmentation.
- On all of the file systems studied, except for the two news
partitions, at least 60% of the file data blocks were laid out
contiguously.
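The contiguity percentages quoted above can be computed from a list of each file's block addresses. The sketch below shows one plausible way to do so, under the assumption (not stated on this page) that a block counts as contiguous when its address is exactly one greater than that of the previous block in the same file, and that the first block of each file is excluded because it has no predecessor. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def contiguous_fraction(blocks: List[int]) -> Optional[float]:
    """Fraction of a file's blocks that directly follow the previous block.

    The first block has no predecessor and is excluded from the count,
    so files with fewer than two blocks have no defined fraction.
    """
    if len(blocks) < 2:
        return None
    contiguous = sum(
        1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1
    )
    return contiguous / (len(blocks) - 1)


def aggregate_fraction(files: Dict[int, List[int]]) -> Optional[float]:
    """The same measure taken over every block of every file."""
    contiguous = 0
    counted = 0
    for blocks in files.values():
        for prev, cur in zip(blocks, blocks[1:]):
            counted += 1
            if cur == prev + 1:
                contiguous += 1
    return contiguous / counted if counted else None


# Made-up example: a four-block file with one gap, and a two-block
# file whose second block is not adjacent to the first.
files = {10: [100, 101, 102, 200], 11: [50, 60]}
for inode, blocks in files.items():
    print(inode, contiguous_fraction(blocks))
print("all files:", aggregate_fraction(files))
```

Under this definition a two-block file scores either 0 or 1, so a figure such as "fewer than 35%" for two-block files would be read as the share of those files whose second block immediately follows the first.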
I also used the snapshot data to create facsimiles of the source file
systems on the disk of a test machine.
I used a simple benchmark, which created and read 32 MB of files, to
compare the performance of the different replicated file systems.
After each run of the benchmark program, I took a snapshot of the test
file system to determine the fragmentation of the files created by the
benchmark. Some of my findings were:
- File system performance is closely correlated with the fragmentation
of the files that are being read and written on that file system.
- The amount of fragmentation in existing files on a file system
is not a good predictor of the fragmentation of newly created files.
- File fragmentation caused performance degradation of up to 30% (on
one of the news file systems). The greatest performance decrease on
any of the non-news file systems was 15%.
The sources, tools, and data used for
this research are also available on-line.
References
- Keith A. Smith and Margo Seltzer, File Layout and File System
Performance. Harvard Computer Science Technical Report TR-35-94.
This paper provides a complete description of the research described
on this page.
- Seltzer, M., Smith, K., Balakrishnan, H., Chang, J., McMains, S.,
Padmanabhan, V. File System Logging versus
Clustering: A Performance Comparison. Proceedings of the 1995 Usenix
Technical Conference.
This paper uses the results presented here as part of a comparison
between the benefits of clustering (as used by FFS) and log structuring
(as used by the Log-Structured File System).
Background Reading
- Marshall K. McKusick, William N. Joy, Samuel J. Leffler, and
Robert S. Fabry,
"A Fast File System for UNIX,"
ACM Transactions on Computer Systems
Vol. 2, No. 3, August 1984, pp. 181-197.
This is the original paper describing the design and implementation
of the Berkeley Fast File System (FFS).
- L.W. McVoy and S.R. Kleiman,
Extent-like Performance from a UNIX File System,
Proceedings of the 1991 Winter Usenix Conference,
Dallas, TX, January 1991, pp. 33-44.
This paper describes the original implementation of clustering
under the SunOS version of FFS.
- Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin,
An Implementation of a Log-Structured File System for UNIX,
Proceedings of the 1993 Winter Usenix Conference,
San Diego, CA, January 1993, pp. 307-326.
In addition to describing the 4.4BSD implementation of LFS,
this paper also discusses the design and performance of clustering
under BSD FFS.
Keith A. Smith /
keith@eecs.harvard.edu