Provenance (also known as pedigree or lineage) refers to the
complete history of a document. In the scientific
community, provenance refers to the information that describes data in
sufficient detail to facilitate reproduction and enable validation of
results. In the archival community, provenance refers to the chain of
ownership and the transformations a document has undergone. In most
computer systems today, however, provenance is an afterthought,
implemented as an auxiliary indexing structure parallel to the actual
data.
Yet provenance is merely a particular type of meta-data. The operating
system should therefore be responsible for collecting provenance, and the
storage system should be responsible for managing it. We define a new class of storage system,
called a provenance-aware storage system (PASS), that supports the
automatic collection and maintenance of provenance. A PASS collects
provenance as new objects are created in the system and maintains that
provenance just as it maintains conventional file system meta-data. A
PASS also supports queries over the provenance it maintains.
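To make the model concrete, here is a minimal Python sketch, assuming provenance is kept as a graph in which each new object records the objects it was derived from, and a query walks that graph. `ProvenanceStore`, `record_derivation`, and `ancestors` are hypothetical names chosen for illustration; this is not the PASS implementation or its interface.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical toy store: maps each object to the set of objects
    it was directly derived from. Illustration only, not PASS itself."""

    def __init__(self):
        self.inputs = defaultdict(set)  # object -> direct inputs

    def record_derivation(self, output, sources):
        # Called when a new object is created from existing ones,
        # e.g. a process reads raw.csv and writes sorted.csv.
        self.inputs[output].update(sources)

    def ancestors(self, obj):
        # Query: every object that obj transitively depends on.
        # The `seen` set keeps the walk finite even if the graph has cycles.
        seen, stack = set(), [obj]
        while stack:
            for parent in self.inputs.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


store = ProvenanceStore()
store.record_derivation("proc:sort", {"raw.csv", "/usr/bin/sort"})
store.record_derivation("sorted.csv", {"proc:sort"})
store.record_derivation("report.pdf", {"sorted.csv"})
print(sorted(store.ancestors("report.pdf")))
```

Treating processes (here `proc:sort`) as graph nodes alongside files is what lets a query answer "which program and which inputs produced this report?" rather than only "which files did it come from?".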
We have implemented two PASS prototypes. The latest prototype (v2)
runs on Linux 2.6.23.17. The new features in v2 are:
- In addition to the provenance collected by the system, applications can record
application-specific provenance.
- The v2 architecture is network-enabled, so we can collect provenance across NFS.
- We have also designed and built a recovery scheme, which we call Write Ahead Provenance.
- We have designed a new query language for provenance, called PQL (see our IPAW '08 paper and poster).
- We are working on a provenance security model (we plan to publish a paper on this topic soon!)
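This page names Write Ahead Provenance without describing it. Going only by the name, one plausible reading is the write-ahead logging rule applied to provenance: a provenance record is made durable before the data it describes is written. The Python sketch below illustrates that ordering under this assumption; the class, the tab-separated log format, and the simplified recovery check are all hypothetical and are not the PASS v2 design.

```python
import os


class WriteAheadProvenance:
    """Hypothetical sketch: a provenance record is forced to a log
    before the data it describes is written."""

    def __init__(self, log_path):
        self.log_path = log_path

    def write(self, data_path, data, provenance):
        # Step 1: append the provenance record and force it to disk.
        with open(self.log_path, "a") as log:
            log.write(f"{data_path}\t{provenance}\n")
            log.flush()
            os.fsync(log.fileno())
        # Step 2: only then write the data itself.
        with open(data_path, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())

    def recover(self):
        # Simplified recovery (an assumption of this sketch): a logged
        # object whose data file is missing was interrupted between
        # steps 1 and 2, so it is reported for repair. Because of the
        # ordering above, data never exists without its provenance.
        if not os.path.exists(self.log_path):
            return []
        with open(self.log_path) as log:
            paths = [line.split("\t", 1)[0] for line in log if line.strip()]
        return [p for p in paths if not os.path.exists(p)]
```

Calling `recover()` after a clean run returns an empty list; an entry appears only when a crash left a provenance record without its data.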
The v1 prototype was implemented on Linux 2.4.29. It recorded relevant
system activity and stored it persistently in an in-kernel database.
If you are interested in running a version of our system, please
send email.
The PASS group entered the First IPAW Provenance Challenge. Our results
are posted on the PASS page of the challenge wiki. We also gave a short
talk at the challenge workshop.
The Spring '06 PASS workshop was a fabulous success; thanks to
everyone who participated. The wiki with the notes from the workshop
is being populated and should be fully online soon.
The wiki from the October '05 workshop got wikispammed to death and is
currently offline, but is being resuscitated.
Thank you to IBM and NetApp, who made the workshops possible.
- Uri Braun, Avraham Shinnar, and Margo Seltzer.
Securing Provenance.
(PDF,
HTML)
In Proceedings of the 3rd USENIX Workshop on Hot Topics in Security (HotSec), San Jose, CA, July 2008.
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance.
(PDF)
In Proceedings of the 2nd International Provenance and Annotation Workshop, Salt Lake City, UT, June 2008.
- David A. Holland, Margo I. Seltzer, Uri Braun, and Kiran-Kumar Muniswamy-Reddy.
PASSing the provenance challenge.
(PDF)
In Concurrency and Computation: Practice and Experience, 20:531-540, 2008.
- Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer.
Provenance-Aware Storage Systems.
(PDF)
In Proceedings of the 2006 USENIX Annual Technical Conference, Boston, MA, June 2006.
- Uri Braun, Simson Garfinkel, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Margo Seltzer.
Issues in Automatic Provenance Collection.
(PDF)
In Proceedings of the 2006 International Provenance and Annotation Workshop, Chicago, IL, May 2006.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems.
(PDF)
Harvard University Computer Science Technical Report TR-18-05, July 2005.
- Jonathan Ledlie, Chaki Ng, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Margo Seltzer.
Provenance-Aware Sensor Data Storage.
(PDF,
HTML)
In Proceedings of NetDB 2005, Tokyo, Japan, April 2005.
- David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
Choosing a Data Model and Query Language for Provenance.
(PDF)
2nd International Provenance and Annotation Workshop (IPAW'08), June 2008.
- Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
Provenance-Aware Storage Systems.
(PDF)
Harvard Industrial Partnership (HIP) 2005, October 2005.
- Source of Support: NSF CSR-PDOS (CNS-0614784)
Proposal Number: 0614784
Title: CSR-PDOS: Support for Atomic Sequences of File System Operations
Funded amount: $561,727
Period: 09/01/06 -- 08/31/09
Location: SUNY at Stony Brook
SubContract: Harvard University
Months: Cal:0; Acad:0; Sumr:1(?)