We had expected that an adaptive protocol like the Alex protocol would do better than the static TTL protocol, so we decided to examine the factors that contributed to its performance. In examining Worrell's simulator closely we realized that he was calculating file lifetimes with a flat distribution across the minimum and maximum he had observed. This scheme meant that there was no correlation among the series of generated lifetimes for a file. Our own observations of server traces taken from our modified Web server show that this is not the case; frequently a file will go unmodified for a long time, then it will be modified many times within a short time-period as the file is updated, and then it will be unmodified again for a long period of time.
An adaptive protocol like Alex will work well with a file exhibiting this behaviour; while the file is changing rapidly Alex will also check rapidly. Once the file stabilizes Alex will not check consistency as often.
Bestavros provides further reasons why this random lifetime generation is flawed; he found that on a given server only a few files change rapidly [2]. Furthermore he notes that globally popular files are the least likely to change, evidence that Worrell's random file selection is flawed as well. If file request distribution is skewed toward the most popular files then the overall number of stale hits would go down even further.
To test these results we modified Worrell's simulator to use actual
Web server logs in generating file lifetimes. These were taken from
the campus Web servers modified to store last-modified
timestamps, described in section
. A script reads
through these access logs generating another trace that records
everytime a file changes.
Table: Summary of mutability statistics for various campus servers
over a one-month period. Mutable files changed more than once over the
time period. Very mutable files changed more than 5 times. Any request
that was not generated by a client in the harvard.edu domain was
considered remote, and any files added in the middle of this time
period were not included in these statistics. Notice that the most
popular server, the FAS server, is also the one with
the fewest mutable files.
Information pertaining to file changes from the three Web server
traces is summarized in
table
. Information about the
traces in this table confirms Bestavros' observation that the files
with the greatest popularity are also the ones with the smallest
mutability.
These files also change far less often than the files with randomly generated lifetimes. One run, for example, with randomly generated lifetimes involved 2085 files over a 56 day simulated run. Those 2085 files changed 19,898 times for an average probability that a given file would change per day of 17%. Our HCS trace on the other hand, the trace that changed the most frequently, involved 573 files over 25 days. Those files changed a total of 260 times for an average probability that a given file would change per day of 1.8%. This result is confirmed by Bestavros, who found that an average file-change probability per day between 0.5% and 2.0% based on file popularity. Bestavros collapsed all changes made on a single day into one change; if we followed suit our probability drops even lower.
This represents an order of magnitude difference in rapidity of change; we expect therefore to obtain far fewer stale cache hits with the Alex and TTL protocols using the trace data than we did with Worrell's random lifetime generation.
We used these traces to drive the modified simulator, and we found that our hypothesis was confirmed. Since the files in our traces changed far less often than the files with randomly generated lifetimes, the stale hit count was therefore much less. We simulated the performance of an invalidation protocol, Alex, and TTL for each of our three traces, and then averaged the results together.
Figure: Comparison
of bandwidth used by invalidation protocol, Alex scheme, and TTL
scheme from trace-driven simulations. Shown is the average from the
FAS trace, the HCS trace, and the DAS trace. Only those files that
were already in the primary host at the beginning of the month were
simulated. Note that the Alex scheme and the TTL scheme both use less
bandwidth than the Invalidation Protocol.
Figure: Comparison of cache hit ratios from Alex scheme and TTL scheme for
trace-driven simulations. Note that either scheme results in a low
stale data return rate when driven by real-world traces.
Figure: Comparison of
server loads imposed by the invalidation protocol, the Alex scheme, and
the TTL scheme. Notice that Alex imposes less load on the server than
TTL, but that neither is superior to the invalidation protocol.
Figures
and
display
the results. These graphs differ significantly from those generated by
Worrell's random file-length generation, but are still
inconclusive. Neither Alex nor TTL yields many stale cache hits, but
their bandwidth savings over an invalidation protocol is not significant
either. Figure
shows the number of server
operations generated by the various protocols: these include requests
for documents, queries to determine whether documents are stale or
not, and the invalidation of documents by the server. These graphs
are also inconclusive: Alex is more efficient than TTL, but neither is
superior to an invalidation protocol.