Autonomous Replication in Wide-Area Internetworks

A thesis presented

by

James S. Gwertzman

to

Computer Science

in partial fulfillment of the honors requirements

for the degree of

Bachelor of Arts.

Harvard College

Cambridge, Massachusetts

April 10, 1995




Abstract:

The number of users connected to the Internet has been growing at an exponential rate, resulting in similar increases in network traffic and Internet server load. Advances in microprocessors and network technologies have so far kept up with growth, but we are reaching the limits of hardware solutions. In order for the Internet's growth to continue, we must efficiently distribute server load and reduce the network traffic generated by its various services.

Traditional wide-area caching schemes are client initiated. Decisions on where and when to cache information are made without the benefit of the server's global knowledge of the situation. We introduce a technique, geographical push-caching, that is server initiated; it leaves caching decisions to the server. The server may use its knowledge of network topology, geography, and access patterns to minimize network traffic and server load.
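
As a rough illustration of a server-initiated placement decision, the sketch below picks the candidate cache host that would most reduce the request-weighted distance between clients and the nearest copy of a page. It is a minimal sketch under stated assumptions, not the architecture presented in this thesis: the function names, the use of great-circle distance as a stand-in for network cost, and the sample coordinates are all hypothetical.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def choose_push_target(origin, candidates, requests):
    """Return the name of the candidate host worth pushing a replica to, or None.

    origin:     (lat, lon) of the primary server (hypothetical input)
    candidates: dict mapping host name -> (lat, lon)
    requests:   list of ((lat, lon), count) pairs taken from the server's access log
    Assumption: geographic distance approximates network cost.
    """
    def cost(replicas):
        # Each client is served by whichever copy is geographically closest.
        return sum(count * min(great_circle_km(client, r) for r in replicas)
                   for client, count in requests)

    best_name, best_cost = None, cost([origin])
    for name, location in candidates.items():
        c = cost([origin, location])
        if c < best_cost:  # only push if the replica actually helps
            best_name, best_cost = name, c
    return best_name

# Example: most requests come from the US west coast.
origin = (42.37, -71.11)  # Cambridge, MA
candidates = {"west": (37.77, -122.42), "europe": (51.50, -0.13)}
requests = [((34.05, -118.24), 100), ((40.71, -74.00), 10), ((48.85, 2.35), 5)]
target = choose_push_target(origin, candidates, requests)
```

The point of the sketch is only that the server, unlike any single client, sees the whole access log and can therefore weigh all clients when choosing where a copy should go.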

The World Wide Web is an example of a large-scale distributed information system that will benefit from this geographical distribution, and we present an architecture that allows a Web server to autonomously replicate HTML pages. We use a trace-driven simulation of the Internet to evaluate several competing caching strategies. Our results show that while simple client caching reduces server load and network bandwidth demands by up to 30%, adding server-initiated caching reduces server load by an additional 20% and network bandwidth demands by an additional 10%. Furthermore, push-caching is more efficient than client caching, using an order of magnitude less cache space for comparable bandwidth and load savings.

To choose a cache-consistency protocol, we used a generic server simulator to evaluate several candidates. We found that weak consistency protocols are sufficient for the World Wide Web: they use the same bandwidth as an atomic protocol, impose less server load, and return stale data less than 1% of the time.
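
To make the idea of weak consistency concrete, the sketch below shows one common scheme, a fixed time-to-live (TTL) cache. The abstract does not name the protocols that were evaluated, so the choice of TTL here is an assumption, and the class and parameter names are hypothetical. A cached copy is served without contacting the server until its TTL expires, which is why the cache may briefly return stale data.

```python
class TTLCache:
    """Weakly consistent cache: entries are trusted until their TTL expires."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch        # callable url -> body (contacts the origin server)
        self.ttl = ttl_seconds    # assumed fixed lifetime for every entry
        self.entries = {}         # url -> (body, time fetched)

    def get(self, url, now):
        entry = self.entries.get(url)
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]       # served locally; may be stale
        body = self.fetch(url)    # expired or missing: go back to the server
        self.entries[url] = (body, now)
        return body

# Example: the page changes at the server while a cached copy is still fresh.
origin_pages = {"/index.html": "v1"}
server_hits = []

def fetch(url):
    server_hits.append(url)
    return origin_pages[url]

cache = TTLCache(fetch, ttl_seconds=10)
first = cache.get("/index.html", now=0)    # miss: fetched from the server
origin_pages["/index.html"] = "v2"         # server-side update
second = cache.get("/index.html", now=5)   # still within TTL: stale copy
third = cache.get("/index.html", now=10)   # TTL expired: refetched
```

In this example only two of the three requests reach the server, at the cost of one stale response; an atomic protocol would avoid the stale response but requires the server to track and notify its caches.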








James Gwertzman
Wed Apr 12 00:26:11 EDT 1995