Harvard University

PASS: Provenance-Aware Storage Systems

Provenance (also known as pedigree or lineage) refers to the complete history of a document. In the scientific community, provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results. In the archival community, provenance refers to the chain of ownership and the transformations a document has undergone. In most computer systems today, however, provenance is an afterthought, implemented as an auxiliary indexing structure parallel to the actual data.

Yet provenance is merely a particular type of metadata. The operating system should be responsible for collecting provenance, and the storage system should be responsible for managing it. We define a new class of storage system, called a provenance-aware storage system (PASS), that supports the automatic collection and maintenance of provenance. A PASS collects provenance as new objects are created in the system and maintains that provenance just as it maintains conventional file system metadata. In addition to collecting and maintaining provenance, a PASS supports queries over it.
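
As a rough illustration of the collect, maintain, and query cycle described above, the following is a minimal user-level sketch, not the PASS implementation. The class `ToyProvenanceStore`, its methods, and the file and process names are all hypothetical; a real PASS collects this information inside the operating system rather than through an explicit call.

```python
from collections import defaultdict


class ToyProvenanceStore:
    """Toy in-memory stand-in for a provenance-aware store (hypothetical API)."""

    def __init__(self):
        self.data = {}                   # object name -> contents
        self.parents = defaultdict(set)  # object name -> names it was derived from

    def write(self, name, contents, inputs=(), process=None):
        """Store an object and record its provenance in the same call."""
        self.data[name] = contents
        self.parents[name].update(inputs)
        if process is not None:
            self.parents[name].add(process)

    def ancestors(self, name):
        """Return every object or process that `name` transitively depends on."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


store = ToyProvenanceStore()
store.write("raw.csv", "1,2,3")
store.write("clean.csv", "1,2,3", inputs=["raw.csv"], process="proc:clean.py")
store.write("plot.png", "<image>", inputs=["clean.csv"], process="proc:plot.py")

# Query: what does plot.png ultimately depend on?
print(sorted(store.ancestors("plot.png")))
```

The query walks the recorded dependencies transitively, so `plot.png` reports both intermediate files and the processes that produced them.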

We have implemented two PASS prototypes. The latest prototype (v2) is implemented on Linux 2.6.23.17. The new features in v2 are:

  • In addition to the provenance collected by the system, applications can record application-specific provenance.
  • The v2 architecture is network-enabled, so provenance can be collected across NFS.
  • We have designed and built a recovery scheme that we call Write Ahead Provenance.
  • We have designed a new query language for provenance, called PQL (see our IPAW '08 paper and poster).
  • We are working on a provenance security model and plan to publish a paper on this topic soon.

The v1 prototype was implemented on Linux 2.4.29. It recorded relevant system activity and stored it persistently in an in-kernel database.
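
This page does not describe how the Write Ahead Provenance recovery scheme in the v2 feature list works. The sketch below assumes, by analogy with write-ahead logging, that a provenance record is made durable before the data it describes, so that recovery can detect data that never reached disk. The function names, the `provenance.log` file, and the JSON record layout are hypothetical and chosen only for illustration.

```python
import json
import os
import tempfile


def write_with_provenance(directory, name, contents, inputs):
    """Persist the provenance record first, then the data it describes."""
    log_path = os.path.join(directory, "provenance.log")  # hypothetical log file
    record = {"object": name, "inputs": sorted(inputs)}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())  # provenance is durable before the data is written
    with open(os.path.join(directory, name), "w", encoding="utf-8") as data:
        data.write(contents)
        data.flush()
        os.fsync(data.fileno())


def recover(directory):
    """After a crash, list logged objects whose data never reached disk."""
    log_path = os.path.join(directory, "provenance.log")
    missing = []
    if not os.path.exists(log_path):
        return missing
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if not os.path.exists(os.path.join(directory, record["object"])):
                missing.append(record["object"])
    return missing


with tempfile.TemporaryDirectory() as workdir:
    write_with_provenance(workdir, "out.txt", "result", ["in.txt"])
    # Simulate a crash between the log append and the data write.
    with open(os.path.join(workdir, "provenance.log"), "a", encoding="utf-8") as log:
        log.write(json.dumps({"object": "lost.txt", "inputs": ["out.txt"]}) + "\n")
    orphaned = recover(workdir)
    print(orphaned)
```

Under this assumed ordering, a crash can leave provenance without data but never data without provenance, which is the property the recovery pass relies on.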

If you are interested in running a version of our system, please send us email.

News

Version 0.4.1 of the PQL query engine has been released.

Thank you to IBM and NetApp, who have made the workshops possible.

Publications

  • Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
    Provenance for the cloud. (PDF)
    8th USENIX Conference on File and Storage Technologies (FAST '10), February 2010.

  • Kiran-Kumar Muniswamy-Reddy and Margo Seltzer.
    Provenance as First-Class Cloud Data. (PDF)
    3rd ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS'09), October 2009.

  • Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor.
    Layering in Provenance Systems. (PDF)
    In proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.

  • Dan Margo and Margo Seltzer.
    The Case for Browser Provenance. (PDF)
    1st Workshop on the Theory and Practice of Provenance (TaPP'09), February 2009.

  • Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer.
    Making a Cloud Provenance-Aware. (PDF)
    1st Workshop on the Theory and Practice of Provenance (TaPP'09), February 2009.

  • Kiran-Kumar Muniswamy-Reddy and David A. Holland.
    Causality Based Versioning. (PDF)
    7th USENIX Conference on File and Storage Technologies (FAST '09), February 2009.
    Selected as a top paper and forwarded to the Nov. 2009 issue of Transactions on Storage (TOS).

  • Kiran-Kumar Muniswamy-Reddy, Joseph Barillari, Uri Braun, David A. Holland, Diana Maclean, Margo Seltzer, and Stephen D. Holland.
    Layering in Provenance-Aware Storage Systems. (PDF)
    Harvard University Computer Science Technical Report TR-04-08.

  • Uri Braun, Avraham Shinnar, and Margo Seltzer.
    Securing Provenance. (PDF, HTML)
    In Proceedings of the 3rd USENIX Workshop on Hot Topics in Security (HotSec), San Jose, CA, July 2008.

  • David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
    Choosing a Data Model and Query Language for Provenance. (PDF)
    In proceedings of the 2nd International Provenance and Annotation Workshop, Salt Lake City, UT, June 2008.

  • David A. Holland, Margo I. Seltzer, Uri Braun, and Kiran-Kumar Muniswamy-Reddy.
    PASSing the provenance challenge. (PDF)
    In Concurrency and Computation: Practice and Experience, 2008; 20:531-540.

  • Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer.
    Provenance-Aware Storage Systems. (PDF)
    In proceedings of the 2006 USENIX Annual Technical Conference, Boston, MA, June 2006.

  • Uri Braun, Simson Garfinkel, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Margo Seltzer.
    Issues in Automatic Provenance Collection. (PDF)
    In proceedings of the 2006 International Provenance and Annotation Workshop, Chicago, IL, May 2006.

  • Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
    Provenance-Aware Storage Systems. (PDF)
    Harvard University Computer Science Technical Report TR-18-05, July 2005.

  • Jonathan Ledlie, Chaki Ng, David A. Holland, Kiran-Kumar Muniswamy-Reddy, Uri Braun, and Margo Seltzer.
    Provenance-Aware Sensor Data Storage. (PDF, HTML)
    In Proceedings of NetDB 2005, Tokyo, Japan, April 2005.

Talks

Posters

  • David A. Holland, Uri Braun, Diana Maclean, Kiran-Kumar Muniswamy-Reddy, and Margo Seltzer.
    Choosing a Data Model and Query Language for Provenance. (pdf)
    2nd International Provenance and Annotation Workshop (IPAW'08), June 2008.

  • Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie.
    Provenance-Aware Storage Systems. (pdf)
    Harvard Industrial Partnership (HIP) 2005, October 2005.

Current Members

Alumni

Sponsors

  • Source of Support: NSF
    CSR-PDOS (CNS-0614784)
    Proposal Number: 0614784
    Title: CSR---PDOS: Support for Atomic Sequences of File System Operations
    Funded amount: $561,727
    Period: 09/01/06 -- 08/31/09
    Location: SUNY at Stony Brook
    SubContract: Harvard University
    Months: Cal:0; Acad:0; Sumr:1(?)