Examples of Undergrad Research Projects and Theses

Research Publications Courses Software Schedule
Optimal Bayesian Rule Lists -- Nicholas Larus-Stone (2017)

We demonstrate a new algorithm that finds an optimal rule list as well as providing proof of that optimality. Rule lists, which are lists composed of If-Then statements, are similar to decision tree classifiers and are useful because each step in the model's decision making process is understandable by humans. Our algorithm finds the optimal rule list through the use of three types of bounds: bounds inherent to the rules themselves, bounds based on the current best solution, and bounds based on symmetry between rule lists. We propose novel data structures to minimize the memory usage and runtime of our algorithm on this exponentially difficult problem. One advantage of our algorithm is that it generates not only the optimal rule list and a certificate of optimality, but also the entire solution space of nearly optimal solutions. Our algorithm therefore allows for the analysis and discovery of all optimal and near-optimal solutions on problems requiring human-interpretable algorithms.

Dependency Tracking in ASC -- Peter Kraft (2017)

ASC is a new method of automatic parallelization that transforms parallelization into a machine learning problem. ASC works by predicting the future state of a program's memory and registers and speculatively executing from predicted states. It is limited by its ability to accurately predict these future states. We introduce a new learning model for ASC as well as a notion of data dependency that reduces the size of the state that ASC needs to predict correctly to produce improved performance. Using these, we substantially increase the set of programs ASC can automatically parallelize, demonstrating speedup on several important classes of programs, such as maps.

The Climate for Women in Harvard CS -- Michelle Danoff (2017)

In collaboration with Jim Waldo, we are working to better understand the intersection of gender and computer science at Harvard. The work includes surveys and undergraduate interviews with the goal to understand how students became interested in computer science and how their experience with computer science has been at Harvard. This research involves many components such as survey design, data analysis, and research about efforts at other universities. Our goal is to gain a better understanding of what the experience at Harvard is like now, and pass that on so SEAS and student groups so they can better direct efforts to narrow the gender gap.

Examining the feasibility of crowd-sourcing EpiPens -- Gregory Hewitt (2017)

Epinephrine autoinjectors (e.g. EpiPens) are needed within minutes of anaphylactic shock to be effective, and emergency defibrillators are needed just as quickly to counteract a heart attack or heart arrhythmia. However, if someone is in need of one of these devices and does not have one on their person, it can be difficult and time consuming to find one. The goal of my research project is to find and implement the best way to make devices such as these more accessible in times of emergency. My analysis determines that, based on population densities of certain areas and the probability that any given person will have an autoinjector on their person, an app-based, crowdsourcing network can be implemented to solve this accessibility issue. I also discuss ways to safely and securely share, authenticate, and store a users location by examining what work and implementations have already been done in this area. I discuss why using Internet-connected containers that communicate with this network can replace current Automated Emergency Defibrillator (AED) containers drastically improves the accessibility of these devices and provides a significantly higher chance of survival in an emergency situation. I give detail on what this network requires for implementation and what proptocols are required. Finally, I discuss an evaluation of my own implementation of the hardware and software I used to build it.

Identifying and Visualizing Controversy in Wikipedia -- Brigette Pare (2017)

Information Flow Audit on Streaming Provenance Graphs -- Mohammed Hafeezul Rahman (2017)

Audit compliance on information/data flow graphs has been extensively studied, and there exist numerous applications that allow for provenance capture and checking for data flow violations on these provenance graphs. However, the subgraph containing the violation is usually small compared to the entire information flow graph generated by the system. Thus, we can save significant storage space by developing a framework capable of verifying compliance and enforcing application-specific information security guarantees from the data provenance graphs generated by the application, directly at runtime. The goal of the project is to achieve an in-memory solution to this by designing single-pass graph algorithms to detect violations from the massive streaming provenance graphs.

Predicting the Performance of Automatically Scalable Computation (ASC) -- Serena Wang (2016)

The Automatically Scalable Computation (ASC) architecture is a new approach to auto- matic parallelization that transforms parallelization into a machine learning problem. The underlying principle is that if we observe the state of the machine repeatedly at a given place in a sequential program, we can build a model to predict the state of the machine at that place over time. The "place" in the program is defined by an address loaded into the instruction pointer (IP). We present a machine learning approach to automatic IP selection. Our approach relies on the observation that the error function of ASC's internal machine learning model decreases over time for a good IP value. For a given program and IP value, we build a predictive model for ASC's error function value over time. The inputs are a program and IP value and the target is ASC's error function value. We present methods for representing the program and IP value as a feature set. We also present and evaluate three different machine learning models for predicting the error function values.

Community-Attribute Models for Bibliographic Reference Information via Dynamic Graph Evolution -- Ross Rheingans-Yoo (2016)

We present Cambridge, the first technique for evolutionary analysis of dynamic graphs via a community-attribute graph model. Community-attribute models have been shown to be superior to models conventionally used for evolutionary analysis, particularly in modeling ecommunity structures in networks where communities exhibit dense overlaps. Thus, our use of a community-attribute model for analysis of a bibliographic network evolving in time allows us to observe not only the evolution of discrete clusters, but also the evolution of the ‘core’ of nodes which are strongly linked to multiple communities simultaneously. In particular, our approach allows us to observe and quantify how the sibling communities resulting from community-splitting events share and compete for external intercommunity influence inherited from parent communities. We present evidence that indicates that in such splitting events, highly-connected nodes which were part of the parent networks ‘strong intercommunity ties’ become concentrated in the siblings’ intersection, whereas highly-connected nodes that are part of ‘weak intercommunity ties’ are dispersed to the individual sibling communities. We discuss the implications of our findings for the field of evolutionary graph analysis, and address the evident promise of dynamic community-attribute models in providing fully generative models for dynamic networks.

The Consumer's View on Security and Privacy of Health Wearbles -- Margot Taylor (2016)

This study investigates the health wearable market, in particular, consumer attitudes towards security and privacy of their data and the factors that might cause them to modify their attitudes. It is clear that security of health wearables and implantables has room for improvement. However, security is not one of consumers’ top purchasing criteria at the moment. A case study on the advertisement of security in the personal computers industry revealed that a large scale hack affecting thousands of users has the potential to increase consumers concern about security sufficiently to make it a purchasing criteria. Furthermore, a survey of 301 American adults revealed factors that could modify consumer’s attitudes towards security. First, consumers are significantly more concerned about medical history and diagnostic data than they are about biometric data. Once a wearable contains medical history data, security becomes a top purchasing criteria. Second, Consumer’s concern levels are affected by the distribution channel, as they trust devices prescribed by a doctor to be more secure. The nuancing of consumer attitudes towards security and privacy of health wearables has the potential to impact health companies advertising techniques, value proposition models, and design process to focus on security more. It furthermore suggests that the speed of improvement of security may increase once devices that store medical history and diagnostic data come to market.


Copyright © 2009 Margo I. Seltzer All rights reserved.