Research Projects


Margo's Students
  • Elaine Angelino -- Python Provenance and Medical Informatics
  • Uri Braun -- Provenance Security
  • Peter Macko -- File System Back References
  • Daniel Margo -- Provenance-Aware Browsing
  • Kiran-Kumar Muniswamy-Reddy -- Provenance-Aware Systems
  • Nick Murphy -- System Support for Scientific Computing
  • Chaki Ng -- Strategyproof resource allocation in distributed systems
  • Robin Smogor -- Data Mining in Healthcare
  • Research Project Ideas
    Provenance-Aware Storage Systems (PASS): Provenance (also known as pedigree or lineage) refers to the complete history of a document. In the scientific community, provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results. In the archival community, provenance refers to the chain of ownership and the transformations a document has undergone. However, in most computer systems today, provenance is an afterthought, implemented as an auxiliary indexing structure parallel to the actual data.

    Provenance, however, is merely a particular type of metadata. The operating system should be responsible for collecting provenance, and the storage system should be responsible for managing it. We define a new class of storage system, called a provenance-aware storage system (PASS), that supports the automatic collection and maintenance of provenance. A PASS collects provenance as new objects are created in the system and maintains that provenance just as it maintains conventional file system metadata. A PASS also supports queries over the provenance it collects.

    Currently, we have implemented a prototype that records relevant system activity, stores it persistently in an in-kernel database, and responds to user queries about a file's provenance.
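
    To make the record-then-query idea concrete, here is a minimal sketch in Python. It is not the PASS prototype: it assumes provenance can be modeled as a simple dependency graph held in memory, and the class name `ProvenanceStore`, its methods, and the file and process names are all hypothetical.

```python
from collections import defaultdict


class ProvenanceStore:
    """Toy in-memory stand-in for a provenance database (hypothetical API)."""

    def __init__(self):
        # Maps each object to the set of objects it was directly derived from.
        self._parents = defaultdict(set)

    def record(self, output, process, inputs):
        """Record that `process` created `output` after reading `inputs`."""
        self._parents[output].add(process)
        self._parents[process].update(inputs)

    def ancestors(self, obj):
        """Return every object that `obj` transitively depends on."""
        seen = set()
        stack = [obj]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical activity: one script cleans raw data, another plots the result.
store = ProvenanceStore()
store.record("clean.csv", "proc:filter.py", ["raw.csv"])
store.record("plot.png", "proc:plot.py", ["clean.csv", "style.cfg"])
print(sorted(store.ancestors("plot.png")))
```

    Asking for the ancestors of `plot.png` walks back through both processes to `raw.csv`, which is the kind of question a user might pose about a file's provenance. A real system would capture these records automatically rather than through explicit `record` calls.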

    Scalable Data Management: This project addresses the challenge of making petabyte-scale storage systems easily usable. The techniques we developed forty years ago are simply not up to the task of managing billions of files. We leverage provenance (a record of the data and processes that contributed to a file's creation), content, and other attributes to provide a scalable and searchable file namespace that tracks data as it moves through the scientific workflow.
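
    One way to picture a searchable, attribute-based namespace is as an inverted index from attribute values to files. The sketch below is only an illustration under that assumption, not the project's design; `AttributeNamespace`, the attribute names, and the file identifiers are hypothetical.

```python
from collections import defaultdict


class AttributeNamespace:
    """Toy attribute-based namespace (hypothetical; not the project's design)."""

    def __init__(self):
        # Inverted index: (attribute, value) -> set of file identifiers.
        self._index = defaultdict(set)

    def add(self, file_id, **attributes):
        """Index a file under each of its attribute=value pairs."""
        for key, value in attributes.items():
            self._index[(key, value)].add(file_id)

    def search(self, **attributes):
        """Return the files that match every given attribute=value pair."""
        if not attributes:
            return set()
        matches = [self._index.get(item, set()) for item in attributes.items()]
        return set.intersection(*matches)


# Hypothetical files described by provenance and other attributes.
ns = AttributeNamespace()
ns.add("f1", created_by="align.py", instrument="telescope-a", kind="image")
ns.add("f2", created_by="align.py", instrument="telescope-b", kind="image")
ns.add("f3", created_by="stack.py", instrument="telescope-a", kind="image")
hits = ns.search(created_by="align.py", instrument="telescope-a")
print(hits)
```

    Here files are found by what produced them and what they describe rather than by a directory path. An in-memory dictionary would not scale to billions of files; it only shows the shape of the lookup.
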
    Hourglass: The Hourglass project is building a scalable, robust data collection system to support geographically diverse sensor network applications. Hourglass is an Internet-based infrastructure for connecting a wide range of sensors, services, and applications in a robust fashion. In Hourglass, streams of data elements are routed to one or more applications. These data elements are generated by sensors inside sensor networks whose internals can be entirely hidden from participants in the Hourglass system. The Hourglass infrastructure consists of an overlay network of well-connected dedicated machines that provides service registration, discovery, and routing of data streams from sensors to client applications. In addition, Hourglass supports a set of in-network services such as filtering, aggregation, compression, and buffering of stream data between source and destination. Hourglass also allows third-party services to be deployed and used in the network.
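
    The registration, discovery, and routing roles described above can be sketched in a single process. This is an illustration only, not the Hourglass implementation or its API: the `Overlay` class, the two example services, and the stream name are hypothetical, and a real overlay would span many machines.

```python
class Overlay:
    """Toy single-process stand-in for an overlay network (hypothetical API)."""

    def __init__(self):
        # Stream name -> list of (services, deliver) pairs, one per subscriber.
        self._streams = {}

    def register(self, name):
        """Announce a sensor stream so that applications can find it."""
        self._streams.setdefault(name, [])

    def discover(self, prefix):
        """List registered streams whose names start with `prefix`."""
        return sorted(n for n in self._streams if n.startswith(prefix))

    def subscribe(self, name, deliver, services=()):
        """Route `name` to `deliver`, passing through `services` in order."""
        self._streams[name].append((tuple(services), deliver))

    def publish(self, name, element):
        """Send one data element to every subscriber of the stream."""
        for services, deliver in self._streams[name]:
            items = [element]
            for service in services:
                # Each service may emit zero or more elements per input.
                items = [out for item in items for out in service(item)]
            for item in items:
                deliver(item)


def threshold_filter(minimum):
    """Filtering service: drop readings below `minimum`."""
    def service(value):
        if value >= minimum:
            yield value
    return service


def windowed_mean(size):
    """Aggregation service: emit the mean of every `size` readings."""
    window = []
    def service(value):
        window.append(value)
        if len(window) == size:
            yield sum(window) / size
            window.clear()
    return service


overlay = Overlay()
overlay.register("volcano/seismic-1")
received = []
overlay.subscribe("volcano/seismic-1", received.append,
                  services=[threshold_filter(10), windowed_mean(2)])
for reading in [3, 12, 8, 20, 14, 16]:
    overlay.publish("volcano/seismic-1", reading)
print(overlay.discover("volcano/"), received)
```

    The subscriber sees only the filtered, aggregated stream and never interacts with the sensor directly, which mirrors the idea that a sensor network's internals can stay hidden from applications.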

    Copyright © 2009 Margo I. Seltzer All rights reserved.