Provenance Integration Requires Reconciliation

Elaine Angelino, Uri Braun, David Holland, Peter Macko, Daniel Margo, Margo Seltzer


While there has been a great deal of research on provenance systems, there has been little discussion about challenges that arise when making different provenance systems interoperate. In fact, most of the literature focuses on provenance systems in isolation and does not discuss interoperability – what it means, its requirements, and how to achieve it. We designed the Provenance-Aware Storage System to be a general-purpose substrate on top of which it would be “easy” to add other provenance-aware systems in a way that would provide “seamless integration” for the provenance captured at each level. While the system did exactly what we wanted on toy problems, when we began integrating StarFlow, a Python-based workflow/provenance system, we discovered that integration is far trickier and more subtle than anyone has suggested in the literature. This work describes our experience undertaking the integration of StarFlow and PASS, identifying several important additions to existing provenance models necessary for interoperability among provenance systems.
