|
The cloud is poised to become the next computing environment for
both data storage and computation due
to its pay-as-you-go and provision-as-you-go models.
Cloud storage is already being used to back up desktop
user data, host shared scientific data, store web application
data, and serve web pages. Today's cloud stores,
however, are missing an important ingredient: provenance.
Provenance is metadata that describes the history of
an object. We make the case that provenance is crucial
for data stored on the cloud and identify the properties of
provenance that make it useful. We then examine current
cloud offerings, and design and implement three protocols for
maintaining data and its provenance in current cloud
stores. The protocols represent different points in the design
space and satisfy different subsets of the provenance
properties. Our evaluation indicates that the overheads
of all three protocols are comparable to each other and
reasonable in absolute terms. Thus, one can select a
protocol based upon the properties it provides without
sacrificing performance. While it is feasible to provide
provenance as a layer on top of today's cloud offerings,
we conclude by presenting the case for incorporating
provenance as a core cloud feature and discussing the issues involved in doing so.
|