The remaining open issue is the parameterization of our simulator. Our algorithm relies on the server's ability to distribute objects to remote sites in a manner consistent with the request stream. That leaves several important decisions to make:
.
Deciding when to push is very important because of the great range of popularity experienced by servers on the Internet. A globally popular Web server that receives thousands of requests per hour should obviously have push-caching priority over a more obscure server that only receives a few hundred requests per day. To address this question we used an absolute push-threshold parameter.
Each server maintains a load factor based on how many requests it
services per hour. Every time the server satisfies a request for a
document it increments its load. Every half hour the server computes
where
. We chose this
formula based on the load formula used by Unix [22] in
order to achieve some continuity across load levels. When a server
exceeds the push-threshold it replicates its most popular
files to the optimal location, resetting the load to zero in the
process.
In this manner we arbitrate between popular servers and the less popular. The push-threshold will need to be reset from time to time to keep up with the growth of the Web; we envision that the same service that maintains a list of available servers will also maintain the current threshold value.
A related issue concerns stability; we want to prevent a rogue server from flooding caches with thousands of unpopular pages, denying access to legitimate servers. There is no way to deal with this problem directly without imposing a bottleneck at the point where access is controlled, but the problem is also self-correcting. The replacement policy on push-cache servers should be directly tied to the popularity of the pages cached such that servers replace the least-popular pages first. Pages pushed illegitimately that are not actually popular will therefore quickly be replaced by genuinely popular pages. Caches should also establish a cap on the number of pages they accept from any one server.
We expected that pushing pages frequently would be more efficient than pushing infrequently, but we also thought that there would be a point of diminishing returns. Pushing too frequently would inundate push-cache servers with requests and would be too sensitive to momentary abberations in request patterns.
The server must record information describing the source of Web requests in order to efficiently decide where to push documents. Originally we tried reducing the amount of information that the server needed to store by storing access history information at a coarse level. Our first try recorded the number of requests arriving from each state.
Using states, however, proved to be a particularly inefficient because
network access does not fall along state lines. Consider
the case of a file that is popular on both the east coast and the west
coast. We found in early simulations that almost all of the west coast
accesses will be from California, whereas the east coast accesses will
be spread across several states including New York, Massachusetts,
Rhode Island, Vermont, and Maryland. Figure
shows the state requests graphically for
the NCSA access log analyzed above.
Figure: Visualization of each state's requests to one of NCSA's
servers. Each state's label is placed over an arbitrary city located
near the center of that state.
10,000 requests clustered in New England compares favorably to 9500 requests from California, yet due to arbitrary political distinctions the 10,000 requests from New England are divided among 8 states. California will dominate the decision of where to cache documents because its single large number will dominate the collection of much smaller numbers from New England.
The solution is not to make the grouping even more coarse; dividing the country into seven regions as suggested by Claffy and Braun [7] would help distribute server load but would not have a noticeable effect on network bandwidth.
Instead we record the number of requests that arrive from each individual subnetwork. This information can be used to calculate the optimal location of file replicas, but would be overwhelming if a separate log were kept for each individual file. An analysis of the server logs showed that this was not necessary.
As shown in Section
, a few files make up a large
percentage of the requests to a given server. Since these are
precisely the pages that are most efficiently replicated, we are most
interested in recording the access history for these files. It turns
out that the request pattern for the entire server closely parallels
the request pattern for the most popular files, and therefore only the
request patterns for the server as a whole must be kept.
As a rough measure of fit we computed the access history for each file
by recording how many requests arrived from each state as well as for
the server as a whole. We computed the correlation between the request
pattern for the top 10 most popular files from the NCSA server trace
and for the server as a whole. These correlations are listed in
table
. The average correlation in this
table is 0.98; these extremely high correlations indicate that it is
sufficient to record information only about the entire server as a
whole, and not each file individually.
Table: Correlation between the location of requests for the 10 most
popular files on an NCSA server and the location of requests for the
entire server
Our simulated server maintains a list of the subnets from which requests have been received along with the number of requests received from each. This list is used to decide where pages get replicated. When a pushing decision is made the list is cleared. It should never grow above several thousand networks since that would indicate that pages should be sent to remote servers. Old entries should also be removed periodically to avoid using stale information in making push-caching decisions.
To actually make the replication decision, the server uses a modified
friends-of-friends algorithm (see
section
). We ``seed'' the algorithm with a
selection of existing remote servers, one remote server per
cluster. The server then iterates through the access history list just
described, adding each subnet's number of requests to the cluster
nearest to that subnet. Once this is complete the server then pushes
its most popular files to the push-cache server in the cluster
responsible for the greatest number of requests.
The caching-strategy parameter helps decide what defines ``nearest'' . We experimented with three different methods: by network hops, by miles, and by random choice. According to the hops metric, the nearest server is the fewest network hops away. According to the miles metric, the nearest server is the fewest miles away. And according to the random metric, the nearest server is chosen randomly.
We expected that making the decision randomly would have the worst performance, and that choosing optimally by using the hops to make the choice would have the best performance. We had hoped that using miles to decide would approximate the performance of using hops.
Server-initiated caching is most efficient when overhead operations, such as pushing documents, are minimized. Servers should therefore push multiple pages. We experimented with the the number of the most popular files pushed at a time, the pages-to-push parameter. We had expected that pushing many pages at a time would be more efficient than pushing a few pages at a time because it would amortize the overhead of push-caching over more pages.
It is not enough to place a replicated copy of several files in the
optimal location; local clients must also learn about the location of
these cached files in order to use them. Other groups are working on
sophisticated ways to handle local resource discovery [19][6]. This question is outside the domain of this work, so we
used the simplest method suggested by Blaze and discussed in
section
. Clients contact the original server on first
request. The server may then send the client a redirect message
instructing it to use a different cache instead. Using a more
efficient location scheme can only improve the performance of
push-caching.
To decide which push-cache is closest to the client our simulator uses the same three metrics listed above: miles, network hops, and random selection. The parameter lookup-strategy decides which policy is in effect; for every request it receives the server checks to see if one of the caches to which it has pushed that document is closer to the client than the server itself. If not, it simply returns the document. If so, it sends the client a redirect message for each of its documents currently cached at that server. This policy is designed to minimize the number of server queries necessary, amortizing the cost of querying a distant server over the many local requests that are thereby made possible.
On average, clients request several documents (4-20) from a server. If the server acts preemptively, sending the client redirects for all of its documents that are currently being cached it should be able to avoid multiple queries. Given that the average file size on the traces we examined is 2000 bytes and that a redirect consists of a URL (average length 30 bytes) and a machine name (4 bytes), we can send approximately 60 redirects for the price of one file. Additionally, if push-caching enables us to halve the expected latency of a transaction, then transferring only two documents from a closer push-cache server makes up for querying the original server.
On the client side, caching redirects takes much less space than caching full documents. The client always tries a cached redirect if it has one since the cache manages consistency, not the client. If the cache is no longer storing the document then it will return an invalid message, and the client contacts the original server. Since the client should have been redirected to a nearby cache server the invalid message will be returned quickly.
As an implementation note, many sites are currently using proxy servers. The client-side of this algorithm is implemented in the proxy server more easily than it is implemented in the client, since the proxy server is assumed to have the space to cache many tens of thousands of redirects as well as thousands of documents.
We expected here as well that using hops would be most efficient, followed closely by miles, and more distantly by random selection.
For push-caching to be effective, there must obviously be push-cache servers available onto which objects may be pushed. We experimented with the number of servers available, the server-number parameter. This parameter must be distinguished from one of the metrics that we will use in the next chapter to measure push-cache performance, the number of active push-cache servers. We did not expect all of the available push-cache servers to be used, because once a set of objects was sufficiently replicated we did not expect the load on any one server to exceed the push-cache threshold.
We do, however, expect network traffic to diminish as more servers are made available, because the larger the pool of available servers, the more likely that a server will be available where it can optimally reduce traffic.