Tracking Back References in a Write-Anywhere File System
Peter Macko, Margo Seltzer, Keith A. Smith
Many file systems reorganize data on disk, for example to
defragment storage, shrink volumes, or migrate data between
different classes of storage. Advanced file system
features such as snapshots, writable clones, and deduplication make
these tasks complicated, as moving a single
block may require finding and updating dozens, or even
hundreds, of pointers to it.
We present Backlog, an efficient implementation of explicit back references, to address this problem. Back references are file system meta-data that map physical block numbers to the data objects that use them. We show that by using LSM-Trees and exploiting the write-anywhere behavior of modern file systems such as NetApp R WAFL R or btrfs, we can maintain back reference meta-data with minimal overhead (one extra disk I/O per 102 block operations) and provide excel- lent query performance for the common case of queries covering ranges of physically adjacent blocks.