by
James S. Gwertzman
to
Computer Science
in partial fulfillment of the honors requirements
for the degree of
Bachelor of Arts.
Harvard College
Cambridge, Massachusetts
April 10, 1995
Traditional wide-area caching schemes are client initiated. Decisions on where and when to cache information are made without the benefit of the server's global knowledge of the situation. We introduce a technique, geographical push-caching, that is server initiated; it leaves caching decisions to the server. The server may use its knowledge of network topology, geography, and access patterns to minimize network traffic and server load.
The World Wide Web is an example of a large-scale distributed information system that will benefit from geographically distributed caching, and we present an architecture that allows a Web server to autonomously replicate HTML pages. We use a trace-driven simulation of the Internet to evaluate several competing caching strategies. Our results show that while simple client caching reduces server load and network bandwidth demands by up to 30%, adding server-initiated caching reduces server load by an additional 20% and network bandwidth demands by an additional 10%. Furthermore, push-caching is more efficient than client caching, using an order of magnitude less cache space for comparable bandwidth and load savings.
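As an illustration of the kind of decision a push-caching server could make from its own access log, the following is a minimal sketch and not the algorithm evaluated in this thesis. It assumes that each logged request carries an approximate client location and that the server knows the locations of a few candidate replica sites. The function names, the `min_requests` threshold, and the distance-weighted cost are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def choose_push_target(origin, candidates, access_log, min_requests=100):
    """Pick the candidate site whose replica would most reduce total
    request distance, or return None if no push seems worthwhile.

    origin:      (lat, lon) of the primary server (assumed known)
    candidates:  dict mapping site name -> (lat, lon) (hypothetical sites)
    access_log:  list of (lat, lon) client locations, one per request
    """
    if not candidates or len(access_log) < min_requests:
        return None  # too little demand to justify replication

    demand = Counter(access_log)  # client location -> request count

    def cost(site):
        # Assume each client is served by the nearer of origin and replica.
        return sum(n * min(great_circle_km(c, origin), great_circle_km(c, site))
                   for c, n in demand.items())

    # Cost when every request goes to the origin server.
    baseline = sum(n * great_circle_km(c, origin) for c, n in demand.items())
    best = min(candidates, key=lambda name: cost(candidates[name]))
    return best if cost(candidates[best]) < baseline else None
```

Geographic distance stands in here for network cost, which is only a rough proxy. A real server would also have to account for replica load and available cache space.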
To choose a cache-consistency protocol, we used a generic server simulator to evaluate several candidates. We found that weak consistency protocols are sufficient for the World Wide Web: they use the same bandwidth as an atomic protocol, impose less server load, and return stale data less than 1% of the time.
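To make the notion of weak consistency concrete, the sketch below shows one common form, a fixed time-to-live (TTL). It is an assumption made for illustration and is not necessarily one of the protocols evaluated here. The class name `TTLCache`, the `fetch` callback, and the explicit `now` argument are hypothetical. A cached copy is served without contacting the server until its TTL expires, so a client may briefly see stale data in exchange for lower server load.

```python
class TTLCache:
    """Weakly consistent cache: entries are trusted for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch        # callable url -> body (contacts the server)
        self.ttl = ttl_seconds    # how long a cached copy is assumed valid
        self.entries = {}         # url -> (body, time the copy was fetched)

    def get(self, url, now):
        """Return the body for url, refetching only if the copy has expired."""
        entry = self.entries.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # possibly stale, but no server contact
        body = self.fetch(url)    # expired or missing: ask the server
        self.entries[url] = (body, now)
        return body
```

With a 60-second TTL, a page changed on the server 10 seconds after it was cached is served stale for up to 50 more seconds. An atomic protocol would instead have the server invalidate every cached copy before the change becomes visible.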