The Power of Two Choices in Randomized Load Balancing

Suppose that $n$ balls are placed into $n$ bins, each ball being placed into a bin chosen independently and uniformly at random. Then, with high probability, the maximum load in any bin is approximately $\log n \over \log \log n$. Suppose instead that each ball is placed sequentially into the least full of $d$ bins chosen independently and uniformly at random. It has recently been shown that the maximum load is then only ${\log \log n \over \log d} + \O(1)$ with high probability. Thus giving each ball two choices instead of just one leads to an exponential improvement in the maximum load. This result demonstrates the power of two choices, and it has several applications to load balancing in distributed systems.

In this thesis, we expand upon this result by examining related models and by developing techniques for studying similar randomized load balancing schemes. We first remove the restriction above that the balls be placed sequentially. We provide a lower bound demonstrating a tradeoff between the number of rounds of communication used and the maximum load achieved. Our lower bound shows that one cannot achieve a maximum load of $\O(\log \log n)$ with only a constant number of rounds of communication. We also provide simple algorithms that match our lower bounds within a constant factor.

We then consider dynamic models, where balls enter and leave the system over time. This leads to a natural queueing problem: customers arrive as a Poisson stream at a collection of $n$ servers. Each customer chooses $d$ of these servers independently and uniformly at random and queues at the server currently containing the fewest customers. Customers require an exponentially distributed amount of service time before leaving the system. We call this model the {\em supermarket model}. We determine the behavior of the supermarket model by defining an idealized process, corresponding to a system of infinite size, which is cleaner and easier to analyze. We then relate the idealized system to the finite system, bounding the error between them. Using this technique, we also study several generalizations of the supermarket model and many other load balancing schemes. Our results demonstrate the effectiveness of having two choices in many situations.

Originally appeared as my thesis.