Provenance-Aware Storage Systems (PASS):
Provenance (also known as pedigree or lineage) refers to the complete
history of a document. In the scientific community, provenance
refers to the information that describes data in sufficient detail
to facilitate reproduction and enable validation of results. In the
archival community, provenance refers to the chain of ownership and
the transformations a document has undergone. In most computer
systems today, however, provenance is an afterthought, implemented
as an auxiliary indexing structure parallel to the actual data.
Yet provenance is merely a particular type of metadata. The
operating system should be responsible for the collection of
provenance and the storage system should be responsible for its
management. We define a new class of storage system, called a
provenance-aware storage system (PASS), that supports the automatic
collection and maintenance of provenance. A PASS collects provenance
as new objects are created in the system and maintains that provenance
just as it maintains conventional file system metadata. In addition
to collecting and maintaining provenance, a PASS supports queries
over that provenance.
Currently, we have implemented a prototype that records relevant
system activity, stores it persistently in an in-kernel database,
and responds to user queries about a file's provenance.
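The collect-then-query idea can be illustrated with a minimal user-space
sketch. It is not the in-kernel prototype: the class ProvenanceStore, its
methods, and the file and process names are all hypothetical, and the rule
that a written file depends on everything its writer has read so far is a
simplifying assumption made only for this example.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy user-space model of provenance collection (hypothetical API)."""

    def __init__(self):
        # Files each process has read so far.
        self._inputs = defaultdict(set)
        # For each file, the (process, input file) pairs that produced it.
        self._parents = defaultdict(set)

    def record_read(self, process, path):
        self._inputs[process].add(path)

    def record_write(self, process, path):
        # Assumption: a written file depends on everything the writer read.
        for source in self._inputs[process]:
            if source != path:
                self._parents[path].add((process, source))

    def ancestors(self, path):
        """Return every file that `path` transitively derives from."""
        seen = set()
        stack = [path]
        while stack:
            current = stack.pop()
            for _process, source in self._parents.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


store = ProvenanceStore()
store.record_read("sort", "raw.dat")
store.record_write("sort", "sorted.dat")
store.record_read("plot", "sorted.dat")
store.record_read("plot", "style.cfg")
store.record_write("plot", "figure.png")
print(sorted(store.ancestors("figure.png")))
# prints ['raw.dat', 'sorted.dat', 'style.cfg']
```

The query walks the recorded dependencies backwards, so asking about
figure.png also reports raw.dat, which the plotting process never touched
directly.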
Hourglass:
The Hourglass project is building a scalable, robust data
collection system to support geographically diverse sensor network
applications. Hourglass is an Internet-based infrastructure for
connecting a wide range of sensors, services, and applications in
a robust fashion. In Hourglass, streams of data elements are routed
to one or more applications. These data elements are generated by
sensors inside sensor networks whose internals can be entirely
hidden from participants in the Hourglass system. The Hourglass
infrastructure consists of an overlay network of well-connected
dedicated machines that provides service registration, discovery,
and routing of data streams from sensors to client applications.
In addition, Hourglass supports a set of in-network services such
as filtering, aggregation, compression, and buffering of stream data
between source and destination. Hourglass also allows third-party
services to be deployed and used in the network.
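The flow described above (register services, look them up, and route a
stream through them to one or more applications) can be sketched in a few
lines. This is a single-process illustration only, not the Hourglass
overlay or its API: Registry, threshold_filter, window_average, route, and
the sample readings are all hypothetical names and values chosen for the
example.

```python
class Registry:
    """Toy stand-in for service registration and discovery (hypothetical)."""

    def __init__(self):
        self._services = {}

    def register(self, name, service):
        self._services[name] = service

    def lookup(self, name):
        return self._services[name]


def threshold_filter(minimum):
    """Build a filtering service that drops readings below `minimum`."""
    def service(stream):
        return (value for value in stream if value >= minimum)
    return service


def window_average(size):
    """Build an aggregation service that averages fixed-size windows."""
    def service(stream):
        window = []
        for value in stream:
            window.append(value)
            if len(window) == size:
                yield sum(window) / len(window)
                window = []
        if window:  # flush a final partial window
            yield sum(window) / len(window)
    return service


def route(source, service_names, registry, consumers):
    """Pass a stream through the named services, then deliver each
    resulting element to every consumer."""
    stream = iter(source)
    for name in service_names:
        stream = registry.lookup(name)(stream)
    for element in stream:
        for consumer in consumers:
            consumer(element)


registry = Registry()
registry.register("filter", threshold_filter(10))
registry.register("aggregate", window_average(2))

received_a, received_b = [], []
readings = [4, 12, 18, 7, 30, 10, 25]
route(readings, ["filter", "aggregate"], registry,
      [received_a.append, received_b.append])
print(received_a)
# prints [15.0, 20.0, 25.0]
```

Because each service only consumes and produces a stream, a third-party
service can be added by registering another function of the same shape,
and the consumers never see how the readings were produced.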