Michael Mitzenmacher: Recent Projects

Overview

I'm an applied algorithms person. I look for problems in the intersection of algorithms, networks, and information theory, but in practice I work on problems in all three areas. The Internet is a great source of problems; you need algorithms for effectively transmitting information over a complex network.

Algorithms for the Internet

Most of my recent work in this area focuses on applications of low-density parity-check codes (see more on codes below) and Bloom filters. A Bloom filter is a data structure designed to succinctly represent a set of objects so that you can answer queries of the form "Is this an element of the set?" very quickly. The price you pay for the gains in speed and space is that sometimes the data structure will tell you an element is in the set when it is not; that is, there are false positives. Whenever you want to use or send a set, but the space requirements are too high, you should think about using a Bloom filter instead. This data structure has recently been suggested for several network applications, as Andrei Broder and I detail in our survey of Bloom filters in network applications.

I had the opportunity to combine codes and Bloom filters while working with John Byers and the people on the Shapeshifter project at Boston University. Our SIGCOMM 2002 paper discusses how to use codes and Bloom filters to achieve informed content delivery across adaptive overlay networks. A technical report gives details about a new data structure we developed that uses a Bloom filter for fast approximate reconciliation, a useful primitive in overlay networks.
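
To make the Bloom filter idea concrete, here is a toy version in Python. The hashing scheme and parameters are illustrative choices for the sketch, not anything from the papers above:

    import hashlib

    class BloomFilter:
        """A simple Bloom filter: a bit array plus k hash functions.

        Membership tests may return false positives, never false negatives.
        """

        def __init__(self, num_bits=1024, num_hashes=4):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item):
            # Derive k bit positions from salted SHA-1 digests (illustrative).
            for i in range(self.num_hashes):
                digest = hashlib.sha1(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos] = True

        def __contains__(self, item):
            # Reported in the set only if all k bits are set; this is
            # where false positives can arise.
            return all(self.bits[pos] for pos in self._positions(item))

    f = BloomFilter()
    f.add("10.0.0.1")
    print("10.0.0.1" in f)   # True
    print("10.0.0.99" in f)  # almost certainly False

With 1024 bits, 4 hashes, and only a few items, false positives here are rare; in general their probability trades off against the space used.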

My most recent endeavor in this area has been rank aggregation: combining rankings from independent sources into a single global ranking, with applications to search and meta-search. I was introduced to this area by my esteemed collaborator Cynthia Dwork (Microsoft Research, San Jose). We should have a draft of our latest results up shortly.
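
To give a taste of the problem, here is the classical Borda count in Python. This is just one simple baseline aggregation rule, not necessarily the method in our work:

    def borda_aggregate(rankings):
        """Combine rankings by Borda count.

        rankings: a list of lists, each ordering the same candidates,
        best first. A candidate scores (n - position) points in each
        ranking; the aggregate orders candidates by total score.
        """
        scores = {}
        for ranking in rankings:
            n = len(ranking)
            for position, candidate in enumerate(ranking):
                scores[candidate] = scores.get(candidate, 0) + (n - position)
        return sorted(scores, key=scores.get, reverse=True)

    # Three sources ranking the same four pages:
    rankings = [["a", "b", "c", "d"],
                ["b", "a", "d", "c"],
                ["a", "c", "b", "d"]]
    print(borda_aggregate(rankings))  # ['a', 'b', 'c', 'd']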

Codes

With Michael Luby, Amin Shokrollahi, and Dan Spielman, I co-developed Tornado codes, a particularly efficient type of low-density parity-check code for erasures. These ideas have been further developed and commercialized by the company Digital Fountain. I worked with Digital Fountain colleagues Michael Luby and John Byers on network applications for these codes, which led to some related work on congestion-control protocols for reliable multicast.
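
The decoding idea behind these erasure codes is easy to sketch: each parity check is the XOR of a few symbols, and decoding repeatedly finds a check with exactly one erased neighbor and solves for it. Here is a toy Python version of that peeling process; the tiny code structure below is made up for illustration, whereas real Tornado codes use carefully designed irregular graphs:

    def peel_decode(symbols, checks):
        """Recover erased symbols (None) by peeling.

        symbols: list of values, with None marking erasures.
        checks: list of (neighbor_indices, xor_value) pairs; the XOR of
        the neighbor symbols must equal xor_value.
        """
        progress = True
        while progress:
            progress = False
            for neighbors, value in checks:
                erased = [i for i in neighbors if symbols[i] is None]
                if len(erased) == 1:
                    # Exactly one unknown: solve for it by XOR-ing the rest.
                    known = value
                    for i in neighbors:
                        if symbols[i] is not None:
                            known ^= symbols[i]
                    symbols[erased[0]] = known
                    progress = True
        return symbols

    # Four symbols, two erased; three toy parity checks.
    syms = [5, None, 7, None]
    checks = [([0, 1], 5 ^ 3), ([1, 2, 3], 3 ^ 7 ^ 9), ([2, 3], 7 ^ 9)]
    print(peel_decode(syms, checks))  # [5, 3, 7, 9]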

Our work also led to analyses of low-density parity-check codes over erasure channels. Richardson and Urbanke extended our analysis, leading to low-density parity-check codes that provably come very, very close to the Shannon capacity for basic channels. For this work, I am told the six of us are sharing the IEEE Information Theory Society Paper Award, but I haven't received written confirmation yet.
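
The flavor of these analyses: track the probability that a message in the iterative decoder remains erased from round to round. For a (3,6)-regular code on the binary erasure channel, the standard recursion is x <- p(1 - (1 - x)^5)^2, and iterating it numerically locates the threshold erasure probability (about 0.4294) below which decoding succeeds. The regular (3,6) code is a textbook example, not one drawn from the specific papers above:

    def decodes(p, iters=2000, tol=1e-9):
        """Density evolution for a (3,6)-regular LDPC code on the
        binary erasure channel with erasure probability p.

        x is the probability a variable-to-check message is still
        erased; decoding succeeds iff x is driven to (near) zero.
        """
        x = p
        for _ in range(iters):
            x = p * (1 - (1 - x) ** 5) ** 2
            if x < tol:
                return True
        return False

    for p in (0.40, 0.42, 0.43, 0.44):
        print(p, decodes(p))
    # True below the threshold (about 0.4294), False above it.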

I currently work on coding problems with Alek Kavcic, of the Harvard EE department, extending these ideas to more complex channels. I am also working on a new low-density parity-check code variation called Verification Codes, which combines the benefits of the error and erasure codes developed previously.

Human-Guided Search

I serve as an algorithmic consultant for the Human-Guided Search project at Mitsubishi Electric Research Labs. The project is led by the truly outstanding Neal Lesh, and more information can be found on the Human-Guided Search project page. Interactive, or human-in-the-loop, optimization systems leverage people's abilities in areas in which they outperform computers, such as visual and strategic thinking. Users can steer interactive optimization systems toward solutions that satisfy real-world constraints. Furthermore, people can better understand, justify, and modify solutions if they have participated in their construction.

The Human-Guided Search (HuGS) framework and Java toolkit allow rapid development of interactive optimization systems. The framework and code include visual metaphors for focusing and constraining optimization algorithms. The user can select from different algorithms, including a human-guidable version of a powerful heuristic called tabu search. We have developed a wide variety of applications with the HuGS toolkit, including interactive systems to solve scheduling, vehicle routing, layout, and protein-folding problems.
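
For flavor, here is a bare-bones tabu search loop in Python. This is generic textbook tabu search, not the HuGS implementation; the neighborhood and cost functions are placeholders supplied per application:

    import random

    def tabu_search(start, neighbors, cost, iters=500, tabu_len=20):
        """Move to the best non-tabu neighbor each step, even uphill,
        keeping a short memory of recent solutions to avoid cycling."""
        current = best = start
        tabu = [start]
        for _ in range(iters):
            candidates = [s for s in neighbors(current) if s not in tabu]
            if not candidates:
                break
            current = min(candidates, key=cost)
            tabu.append(current)
            if len(tabu) > tabu_len:
                tabu.pop(0)  # forget the oldest entry
            if cost(current) < cost(best):
                best = current
        return best

    # Toy use: minimize a bumpy function over the integers.
    f = lambda x: (x - 17) ** 2 + 10 * (x % 5)
    step = lambda x: (x - 1, x + 1)
    print(tabu_search(random.randint(-100, 100), step, f))

The tabu list lets the search accept uphill moves to escape local optima without immediately cycling back; in HuGS, the human guides where this search effort gets focused.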

Balls and Bins Problems

My Ph.D. thesis was on balls and bins problems, and every time I tell myself I've written my last paper on this theme, I find I'm mistaken.

One model I've been exploring has just two bins, where the probability that the next ball lands in a given bin is proportional to some function of the number of balls already in that bin. Such models are used to describe positive feedback, which some economists claim really happens in the real world. Our results show that monopoly happens, quickly! The first work in this area was done with my graduate student Eleni Drinea and Alan Frieze.
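
A quick simulation conveys the effect. Here I take the feedback function to be x^p, one common choice; with p > 1, one bin quickly takes nearly everything:

    import random

    def feedback_run(num_balls=100_000, p=2.0):
        """Two bins; the next ball joins a bin with probability
        proportional to (that bin's current count) ** p."""
        loads = [1, 1]  # seed each bin with one ball
        for _ in range(num_balls):
            w0, w1 = loads[0] ** p, loads[1] ** p
            r = random.random() * (w0 + w1)
            loads[0 if r < w0 else 1] += 1
        return loads

    print(feedback_run())  # e.g. [99973, 29]: monopoly, quickly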

Another model I've recently worked on, with Balaji Prabhakar and Devavrat Shah of Stanford, is balls and bins games with memory. Suppose you throw n balls into n bins, with each ball (after the first) choosing the less loaded of two candidate bins as follows: one bin is chosen independently and uniformly at random; the other choice is the better of the two bins left over after the previous ball. This yields a "better" maximum load than choosing two bins independently and uniformly at random for each ball. (Well, sort of... check out the paper.)
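
A simulation sketch of the comparison; the memory rule here is my paraphrase, so treat the details as an approximation of the model in the paper:

    import random

    def max_load(n, use_memory):
        """Throw n balls into n bins and return the maximum load.

        use_memory=False: classical two-choice (two independent bins).
        use_memory=True: one fresh random bin plus a remembered bin;
        after placing, remember the less loaded of the two candidates.
        """
        loads = [0] * n
        remembered = random.randrange(n)
        for _ in range(n):
            a = remembered if use_memory else random.randrange(n)
            b = random.randrange(n)
            target = a if loads[a] <= loads[b] else b
            loads[target] += 1
            remembered = a if loads[a] <= loads[b] else b
        return max(loads)

    n = 100_000
    print("two independent choices:", max_load(n, False))
    print("one choice plus memory: ", max_load(n, True))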