Provenance (also known as pedigree or lineage) refers to the
complete history of a document. In the scientific
community, provenance refers to the information that describes data in
sufficient detail to facilitate reproduction and enable validation of
results. In the archival community, provenance refers to the chain of
ownership and the transformations a document has undergone. However,
in most computer systems today, provenance is an afterthought,
implemented as an auxiliary indexing structure parallel to the actual
data.
Yet provenance is merely a particular type of meta-data, so the
operating system should be responsible for collecting it and the
storage system should be responsible for managing it. We define a new class of storage system,
called a provenance-aware storage system (PASS), that supports the
automatic collection and maintenance of provenance. A PASS collects
provenance as new objects are created in the system and maintains that
provenance just as it maintains conventional file system meta-data. In
addition to collecting and maintaining provenance, a PASS also
supports queries over it.
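To make the "collect at creation time, query later" idea concrete, here is a minimal Python sketch under a toy model. In this model, each new object simply records the objects it was derived from. The class `ProvenanceStore` and its methods `record` and `ancestors` are hypothetical names chosen for illustration. This is not the PASS kernel interface or PQL, and it folds process nodes into direct file-to-file edges.

```python
class ProvenanceStore:
    """Hypothetical, in-memory stand-in for a provenance-aware store.

    Maps each object name to the set of object names it was directly
    derived from. Simplification: processes are not modeled as nodes.
    """

    def __init__(self) -> None:
        self.parents: dict[str, set[str]] = {}

    def record(self, output: str, inputs: list[str]) -> None:
        # Collect provenance when the object is created, alongside its
        # other meta-data, rather than reconstructing it after the fact.
        self.parents.setdefault(output, set()).update(inputs)

    def ancestors(self, obj: str) -> set[str]:
        # Lineage query: transitive closure over "derived from" edges.
        seen: set[str] = set()
        stack = [obj]
        while stack:
            for parent in self.parents.get(stack.pop(), set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


store = ProvenanceStore()
store.record("sorted.txt", ["raw.txt", "/usr/bin/sort"])
store.record("report.pdf", ["sorted.txt", "template.tex"])
print(sorted(store.ancestors("report.pdf")))
# ['/usr/bin/sort', 'raw.txt', 'sorted.txt', 'template.tex']
```

Because the edges are recorded as each object is written, the lineage query is just a graph traversal. A real system would also need to track processes and other attributes, which this sketch omits.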
We have implemented two PASS prototypes. The latest prototype (v2)
runs on Linux 2.6.23.17. The new features in v2 are:
- In addition to the provenance collected by the system, applications can record
application-specific provenance.
- The v2 architecture is network-enabled, so we can collect provenance across NFS.
- We have also designed and built a recovery scheme, which we call Write Ahead Provenance.
- We have designed a new query language for provenance, called PQL (see our IPAW '08 paper and poster).
- We are working on a provenance security model and plan to publish a paper on this topic soon.
The v1 prototype was implemented on Linux 2.4.29. It
recorded relevant system activity and stored it persistently in an
in-kernel database.
If you are interested in running a version of our system, please
send us an email.
Version 0.4.1 of the PQL query engine has been
released.
Thank you to IBM and NetApp, who made the workshops possible.
- Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
Provenance for the Cloud. (PDF)
8th USENIX Conference on File and Storage Technologies (FAST '10), February 2010.
- Kiran-Kumar Muniswamy-Reddy and Margo Seltzer.
Provenance as First-Class Cloud Data. (PDF)
3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS'09), October 2009.
- Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor.
Layering in Provenance Systems. (PDF)
In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.
- Dan Margo and Margo Seltzer.
The Case for Browser Provenance. (PDF)
1st Workshop on the Theory and Practice of Provenance (TaPP'09), February 2009.
- Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
Making a Cloud Provenance-Aware. (PDF)
1st Workshop on the Theory and Practice of Provenance (TaPP'09), February 2009.
- Kiran-Kumar Muniswamy-Reddy and David A. Holland.
Causality Based Versioning. (PDF)
7th USENIX Conference on File and Storage Technologies (FAST '09), February 2009.
Selected as a top paper and forwarded to the November 2009 issue of ACM Transactions on Storage (TOS).
- Kiran-Kumar Muniswamy-Reddy, Joseph Barillari, Uri Braun, David A. Holland, Diana Maclean, Margo Seltzer, and Stephen D. Holland.
Layering in Provenance-Aware Storage Systems. (PDF)
Harvard University Computer Science Technical Report TR-04-08.
- Uri Braun, Avraham Shinnar, and Margo Seltzer.
Securing Provenance. (PDF, HTML)
In Proceedings of the 3rd USENIX Workshop on Hot Topics in Security (HotSec), San Jose, CA, July 2008.
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance. (PDF)
In Proceedings of the 2nd International Provenance and Annotation Workshop, Salt Lake City, UT, June 2008.
- David A. Holland, Margo I. Seltzer, Uri Braun, and Kiran-Kumar Muniswamy-Reddy.
PASSing the Provenance Challenge. (PDF)
Concurrency and Computation: Practice and Experience, 20:531-540, 2008.
- Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer.
Provenance-Aware Storage Systems. (PDF)
In Proceedings of the 2006 USENIX Annual Technical Conference, Boston, MA, June 2006.
- Uri Braun, Simson Garfinkel, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Margo Seltzer.
Issues in Automatic Provenance Collection. (PDF)
In Proceedings of the 2006 International Provenance and Annotation Workshop, Chicago, IL, May 2006.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems. (PDF)
Harvard University Computer Science Technical Report TR-18-05, July 2005.
- Jonathan Ledlie, Chaki Ng, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Margo Seltzer.
Provenance-Aware Sensor Data Storage. (PDF, HTML)
In Proceedings of NetDB 2005, Tokyo, Japan, April 2005.
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance. (PDF)
2nd International Provenance and Annotation Workshop (IPAW'08), June 2008.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems. (PDF)
Harvard Industrial Partnership (HIP) 2005, October 2005.
- Source of Support: NSF CSR-PDOS (CNS-0614784)
Proposal Number: 0614784
Title: CSR-PDOS: Support for Atomic Sequences of File System Operations
Funded amount: $561,727
Period: 09/01/06 to 08/31/09
Location: SUNY at Stony Brook
Subcontract: Harvard University
Months: Cal:0; Acad:0; Sumr:1(?)