The remaining open issue is the parameterization of our simulator. Our algorithm relies on the server's ability to distribute objects to remote sites in a manner consistent with the request stream, which leaves several important decisions to make.
Deciding when to push is very important because of the great range of popularity experienced by servers on the Internet. A globally popular Web server that receives thousands of requests per hour should obviously have push-caching priority over a more obscure server that only receives a few hundred requests per day. To address this question we used an absolute push-threshold parameter.
Each server maintains a load factor based on how many requests it services per hour. Every time the server satisfies a request for a document it increments its load. Every half hour the server recomputes the load as a decaying average of recent requests; we modeled this on the load-average formula used by Unix in order to achieve some continuity across load levels. When a server's load exceeds the push-threshold it replicates its most popular files to the optimal location, resetting the load to zero in the process.
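The exact recomputation formula and threshold are simulator parameters rather than fixed values; the following is a minimal sketch, assuming a Unix-style exponentially decayed load. The constants PUSH_THRESHOLD and DECAY_PERIOD are illustrative and not taken from our traces.

    import math

    # Illustrative values only; the real threshold and decay rate are tunable
    # simulator parameters.
    PUSH_THRESHOLD = 500          # decayed request count that triggers a push
    HALF_HOUR = 30 * 60           # recomputation interval, in seconds
    DECAY_PERIOD = 4 * HALF_HOUR  # assumed time constant of the decay

    class OriginServer:
        def __init__(self):
            self.load = 0.0       # exponentially decayed request count

        def record_request(self):
            # Every satisfied request increments the load.
            self.load += 1

        def recompute_load(self):
            # Called every half hour: decay the old load, Unix-load-average
            # style, so momentary spikes do not dominate the push decision.
            self.load *= math.exp(-HALF_HOUR / DECAY_PERIOD)
            if self.load > PUSH_THRESHOLD:
                self.push_most_popular_files()
                self.load = 0.0   # pushing resets the load

        def push_most_popular_files(self):
            ...                   # replicate the most popular files remotely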
In this manner we arbitrate between popular servers and the less popular. The push-threshold will need to be reset from time to time to keep up with the growth of the Web; we envision that the same service that maintains a list of available servers will also maintain the current threshold value.
A related issue concerns stability; we want to prevent a rogue server from flooding caches with thousands of unpopular pages, denying access to legitimate servers. There is no way to deal with this problem directly without imposing a bottleneck at the point where access is controlled, but the problem is also self-correcting. The replacement policy on push-cache servers should be directly tied to the popularity of the pages cached such that servers replace the least-popular pages first. Pages pushed illegitimately that are not actually popular will therefore quickly be replaced by genuinely popular pages. Caches should also establish a cap on the number of pages they accept from any one server.
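A minimal sketch of the replacement policy just described, assuming a popularity-ordered eviction and a fixed per-origin cap; the cap value and class layout are illustrative, not taken from our simulator.

    MAX_PAGES_PER_ORIGIN = 100    # illustrative cap on pages accepted per server

    class PushCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.pages = {}       # url -> (origin, hit_count)

        def accept(self, url, origin):
            # Enforce the per-origin cap so one rogue server cannot flood us.
            held = sum(1 for (o, _) in self.pages.values() if o == origin)
            if held >= MAX_PAGES_PER_ORIGIN:
                return False
            if len(self.pages) >= self.capacity:
                # Evict the least popular page first; pages pushed
                # illegitimately are therefore replaced quickly.
                victim = min(self.pages, key=lambda u: self.pages[u][1])
                del self.pages[victim]
            self.pages[url] = (origin, 0)
            return True

        def hit(self, url):
            origin, count = self.pages[url]
            self.pages[url] = (origin, count + 1)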
We expected that pushing pages frequently would be more efficient than pushing infrequently, but we also thought that there would be a point of diminishing returns. Pushing too frequently would inundate push-cache servers with requests and would be too sensitive to momentary aberrations in request patterns.
The server must record information describing the source of Web requests in order to decide efficiently where to push documents. Originally we tried to reduce the amount of information the server needed to store by recording access history at a coarse granularity. Our first attempt recorded the number of requests arriving from each state.
Using states, however, proved to be particularly inefficient because network access does not fall along state lines. Consider the case of a file that is popular on both the east coast and the west coast. We found in early simulations that almost all of the west coast accesses were from California, whereas the east coast accesses were spread across several states, including New York, Massachusetts, Rhode Island, Vermont, and Maryland. The figure below shows the per-state requests graphically for the NCSA access log analyzed above.
Figure: Visualization of each state's requests to one of NCSA's servers. Each state's label is placed over an arbitrary city located near the center of that state.
Ten thousand requests clustered in New England should outweigh 9,500 requests from California, yet due to arbitrary political boundaries the 10,000 New England requests are divided among eight states. California will therefore dominate the decision of where to cache documents, because its single large count overwhelms the collection of much smaller counts from New England.
The solution is not to make the grouping coarser still; dividing the country into seven regions, as suggested by Claffy and Braun, would help distribute server load but would not have a noticeable effect on network bandwidth.
Instead we record the number of requests that arrive from each individual subnetwork. This information can be used to calculate the optimal location of file replicas, but would be overwhelming if a separate log were kept for each individual file. An analysis of the server logs showed that this was not necessary.
As shown in Section , a few files account for a large percentage of the requests to a given server. Since these are precisely the pages that are most efficiently replicated, we are most interested in recording the access history for these files. It turns out that the request pattern for the entire server closely parallels the request pattern for the most popular files, so only the request pattern for the server as a whole needs to be kept.
As a rough measure of fit we recorded, for each file and for the server as a whole, how many requests arrived from each state. We then computed the correlation between the request pattern of each of the 10 most popular files in the NCSA server trace and the request pattern of the server as a whole. These correlations are listed in the table below. Their average is 0.98; such extremely high correlations indicate that it is sufficient to record access information for the server as a whole rather than for each file individually.
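The fit measure can be sketched as follows, assuming Pearson correlation over per-state request counts; the helper names and data layout are illustrative rather than our actual log-analysis code.

    from collections import Counter
    from statistics import correlation   # Pearson correlation, Python 3.10+

    def state_histogram(requests):
        # requests: iterable of (url, state) pairs from the access log
        return Counter(state for _, state in requests)

    def fit(requests, url):
        # Correlate one file's per-state counts with the server-wide counts.
        server_hist = state_histogram(requests)
        file_hist = state_histogram(r for r in requests if r[0] == url)
        states = sorted(server_hist)
        return correlation([server_hist[s] for s in states],
                           [file_hist.get(s, 0) for s in states])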
Table: Correlation between the location of requests for the 10 most popular files on an NCSA server and the location of requests for the entire server
Our simulated server maintains a list of the subnets from which requests have been received, along with the number of requests received from each. This list is used to decide where pages get replicated, and it is cleared whenever a pushing decision is made. The list should never grow beyond several thousand networks, since a list that large would indicate that pages should already have been pushed to remote servers. Old entries should also be removed periodically to avoid using stale information in push-caching decisions.
To make the actual replication decision, the server uses a modified friends-of-friends algorithm (see section ). We ``seed'' the algorithm with a selection of existing remote servers, one remote server per cluster. The server then iterates through the access history list just described, adding each subnet's request count to the cluster nearest to that subnet. Once this is complete, the server pushes its most popular files to the push-cache server in the cluster responsible for the greatest number of requests.
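A minimal sketch of this decision, assuming subnets and cache servers are represented by latitude/longitude pairs and using a miles metric; a hop-count table or a random ordering would be substituted for the other caching strategies. The great_circle_miles helper is illustrative, not part of our simulator.

    import math

    def great_circle_miles(a, b):
        # a, b: (latitude, longitude) in degrees; rough spherical distance.
        lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
        d = math.acos(min(1.0, math.sin(lat1) * math.sin(lat2) +
                          math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2)))
        return 3959 * d          # mean Earth radius in miles

    def choose_push_target(access_history, cache_servers, metric=great_circle_miles):
        # access_history: dict mapping subnet location -> request count.
        # cache_servers: seed push-cache servers, one per cluster.
        load = {cache: 0 for cache in cache_servers}
        for subnet, count in access_history.items():
            # Credit each subnet's requests to the nearest cluster.
            nearest = min(cache_servers, key=lambda c: metric(subnet, c))
            load[nearest] += count
        # Push the most popular files to the busiest cluster's server.
        return max(load, key=load.get)

The same routine serves all three caching strategies by swapping the metric function, which is why the strategy is a single simulator parameter.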
The caching-strategy parameter decides what defines ``nearest''. We experimented with three different methods: by network hops, by miles, and by random choice. Under the hops metric, the nearest server is the one the fewest network hops away; under the miles metric, it is the one the fewest miles away; and under the random metric, the nearest server is chosen at random.
We expected that choosing randomly would perform worst and that choosing by network hops would perform best. We hoped that choosing by miles would approximate the performance of choosing by hops.
Server-initiated caching is most efficient when overhead operations, such as pushing documents, are minimized. Servers should therefore push multiple pages at once. We experimented with the number of most-popular files pushed at a time, the pages-to-push parameter. We expected that pushing many pages at a time would be more efficient than pushing a few at a time because it would amortize the overhead of push-caching over more pages.
It is not enough to place a replicated copy of several files in the optimal location; local clients must also learn the location of these cached files in order to use them. Other groups are working on sophisticated ways to handle local resource discovery. This question is outside the scope of this work, so we used the simplest method, suggested by Blaze and discussed in section . Clients contact the original server on their first request; the server may then send the client a redirect message instructing it to use a different cache instead. Using a more efficient location scheme can only improve the performance of push-caching.
To decide which push-cache is closest to the client, our simulator uses the same three metrics listed above: miles, network hops, and random selection. The lookup-strategy parameter decides which policy is in effect. For every request it receives, the server checks whether one of the caches to which it has pushed that document is closer to the client than the server itself. If not, it simply returns the document. If so, it sends the client a redirect message for each of its documents currently cached at that server. This policy is designed to minimize the number of server queries, amortizing the cost of querying a distant server over the many local requests that are thereby made possible.
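The server-side redirect logic can be sketched as follows, reusing the great_circle_miles metric from the previous sketch as a stand-in for whichever lookup strategy is in effect. The reply tuples and the pushed_to mapping are illustrative assumptions, not our wire format.

    def handle_request(server_loc, client_loc, url, pushed_to,
                       metric=great_circle_miles):
        # pushed_to: dict mapping each document URL to cache locations holding it.
        caches = pushed_to.get(url, [])
        closer = [c for c in caches
                  if metric(client_loc, c) < metric(client_loc, server_loc)]
        if not closer:
            return ("document", url)      # no closer copy: serve the file directly
        best = min(closer, key=lambda c: metric(client_loc, c))
        # Redirect for every one of our documents cached at that server,
        # amortizing the cost of this query over the local requests it enables.
        ours_at_best = [u for u, locs in pushed_to.items() if best in locs]
        return ("redirect", best, ours_at_best)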
On average, clients request several documents (4-20) from a server. If the server acts preemptively, sending the client redirects for all of its documents currently being cached, it should be able to avoid multiple queries. Given that the average file size in the traces we examined is 2000 bytes and that a redirect consists of a URL (average length 30 bytes) plus a machine name (4 bytes), we can send approximately 60 redirects (2000/34 ≈ 59) for the price of one file. Additionally, if push-caching halves the expected latency of a transaction, then transferring only two documents from a closer push-cache server makes up for querying the original server.
On the client side, caching redirects takes much less space than caching full documents. The client always tries a cached redirect if it has one, since the cache, not the client, manages consistency. If the cache is no longer storing the document, it returns an invalid message and the client contacts the original server. Since the client should have been redirected to a nearby cache server, the invalid message will be returned quickly.
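A client-side sketch under the same assumed reply format as the previous sketch: cached redirects are tried first, and an invalid reply from the cache falls back to the origin server. The send primitive and message strings are hypothetical.

    def client_fetch(url, origin, redirect_cache, send):
        # redirect_cache: dict mapping URL -> cache server previously suggested.
        # send(host, url) -> reply: the underlying request primitive (assumed).
        target = redirect_cache.get(url)
        if target is not None:
            reply = send(target, url)
            if reply != "invalid":
                return reply               # the cache still holds the document
            del redirect_cache[url]        # stale redirect; fall back to origin
        reply = send(origin, url)
        if isinstance(reply, tuple) and reply[0] == "redirect":
            _, cache, urls = reply
            for u in urls:                 # remember redirects for all pushed docs
                redirect_cache[u] = cache
            return send(cache, url)
        return reply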
As an implementation note, many sites currently use proxy servers. The client side of this algorithm is more easily implemented in a proxy server than in the client, since the proxy server can be assumed to have the space to cache many tens of thousands of redirects as well as thousands of documents.
We expected here as well that using hops would be most efficient, followed closely by miles, and more distantly by random selection.
For push-caching to be effective, there must obviously be push-cache servers available onto which objects may be pushed. We experimented with the number of servers available, the server-number parameter. This parameter must be distinguished from one of the metrics that we will use in the next chapter to measure push-cache performance, the number of active push-cache servers. We did not expect all of the available push-cache servers to be used, because once a set of objects was sufficiently replicated we did not expect the load on any one server to exceed the push-cache threshold.
We did, however, expect network traffic to diminish as more servers were made available, because the larger the pool of available servers, the more likely that a server exists where it can optimally reduce traffic.