|
Digital provenance describes the ancestry or history of a
digital object. Most existing provenance systems, however,
operate at only one level of abstraction: the system
call layer, a workflow specification, or the high-level
constructs of a particular application. The provenance
collectable in each of these layers is different, and all of
it can be important. Single-layer systems fail to account
for the different levels of abstraction at which users need
to reason about their data and processes. These systems
cannot integrate data provenance across layers and cannot
answer questions that require an integrated view of
the provenance.
We have designed a provenance collection structure
that facilitates the integration of provenance across multiple
levels of abstraction, including a workflow engine,
a web browser, and an initial runtime Python provenance-tracking
wrapper. We layer these components atop
provenance-aware network storage (NFS) that builds
upon a Provenance-Aware Storage System (PASS). We
discuss the challenges of building systems that integrate
provenance across multiple layers of abstraction, describe
how we augmented systems in each layer to integrate
provenance, and present use cases that demonstrate how
provenance spanning multiple layers provides functionality
not available in existing systems. Our evaluation
shows that the overheads imposed by layering provenance
systems are reasonable.
|