We had expected that an adaptive protocol like the Alex protocol would do better than the static TTL protocol, so we decided to examine the factors that contributed to its performance. In examining Worrell's simulator closely we realized that he was calculating file lifetimes with a flat distribution across the minimum and maximum he had observed. This scheme meant that there was no correlation among the series of generated lifetimes for a file. Our own observations of server traces taken from our modified Web server show that this is not the case; frequently a file will go unmodified for a long time, then it will be modified many times within a short time-period as the file is updated, and then it will be unmodified again for a long period of time.
An adaptive protocol like Alex will work well with a file exhibiting this behaviour; while the file is changing rapidly Alex will also check rapidly. Once the file stabilizes Alex will not check consistency as often.
Bestavros provides further reasons why this random lifetime generation is flawed; he found that on a given server only a few files change rapidly . Furthermore he notes that globally popular files are the least likely to change, evidence that Worrell's random file selection is flawed as well. If file request distribution is skewed toward the most popular files then the overall number of stale hits would go down even further.
To test these results we modified Worrell's simulator to use actual Web server logs in generating file lifetimes. These were taken from the campus Web servers modified to store last-modified timestamps, described in section . A script reads through these access logs generating another trace that records everytime a file changes.
Table: Summary of mutability statistics for various campus servers over a one-month period. Mutable files changed more than once over the time period. Very mutable files changed more than 5 times. Any request that was not generated by a client in the harvard.edu domain was considered remote, and any files added in the middle of this time period were not included in these statistics. Notice that the most popular server, the FAS server, is also the one with the fewest mutable files.
Information pertaining to file changes from the three Web server traces is summarized in table . Information about the traces in this table confirms Bestavros' observation that the files with the greatest popularity are also the ones with the smallest mutability.
These files also change far less often than the files with randomly generated lifetimes. One run, for example, with randomly generated lifetimes involved 2085 files over a 56 day simulated run. Those 2085 files changed 19,898 times for an average probability that a given file would change per day of 17%. Our HCS trace on the other hand, the trace that changed the most frequently, involved 573 files over 25 days. Those files changed a total of 260 times for an average probability that a given file would change per day of 1.8%. This result is confirmed by Bestavros, who found that an average file-change probability per day between 0.5% and 2.0% based on file popularity. Bestavros collapsed all changes made on a single day into one change; if we followed suit our probability drops even lower.
This represents an order of magnitude difference in rapidity of change; we expect therefore to obtain far fewer stale cache hits with the Alex and TTL protocols using the trace data than we did with Worrell's random lifetime generation.
We used these traces to drive the modified simulator, and we found that our hypothesis was confirmed. Since the files in our traces changed far less often than the files with randomly generated lifetimes, the stale hit count was therefore much less. We simulated the performance of an invalidation protocol, Alex, and TTL for each of our three traces, and then averaged the results together.
Figure: Comparison of bandwidth used by invalidation protocol, Alex scheme, and TTL scheme from trace-driven simulations. Shown is the average from the FAS trace, the HCS trace, and the DAS trace. Only those files that were already in the primary host at the beginning of the month were simulated. Note that the Alex scheme and the TTL scheme both use less bandwidth than the Invalidation Protocol.
Figure: Comparison of cache hit ratios from Alex scheme and TTL scheme for trace-driven simulations. Note that either scheme results in a low stale data return rate when driven by real-world traces.
Figure: Comparison of server loads imposed by the invalidation protocol, the Alex scheme, and the TTL scheme. Notice that Alex imposes less load on the server than TTL, but that neither is superior to the invalidation protocol.
Figures and display the results. These graphs differ significantly from those generated by Worrell's random file-length generation, but are still inconclusive. Neither Alex nor TTL yields many stale cache hits, but their bandwidth savings over an invalidation protocol is not significant either. Figure shows the number of server operations generated by the various protocols: these include requests for documents, queries to determine whether documents are stale or not, and the invalidation of documents by the server. These graphs are also inconclusive: Alex is more efficient than TTL, but neither is superior to an invalidation protocol.