For a server to make the optimal decision about where to cache data, it
must have an accurate representation of network topology. As we saw
earlier, there is currently no way to determine a priori the Internet's
topology. We hypothesized that geographical information could be used
to hint at which servers were topologically close.
We surveyed the Internet using the traceroute [21] program to measure Internet topology, and we used a file maintained by Merit [7][30] that lists the address of the subnet administrator for each of the 42,000 subnets on the Internet today. The critical datum in the Merit file is the zip code listed in the address; in conjunction with a geography server [24], it provides enough information to establish the latitude and longitude of each network administrator. As long as the zip code of the subnet administrator matches the zip code of the subnet as a whole, we can accurately place the subnet geographically. From this information it is simple to compute the distance between two arbitrary hosts on the Internet, accurate to within a zip code and the size of the subnet. This approach is not effective for subnets that span multiple zip codes, such as backbone networks or regional networks, but it is effective for the local networks that account for a large fraction of the client requests.
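The distance calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming the great-circle (haversine) formula and a mean Earth radius of 3958.8 miles; the text does not say which formula was actually used, and the function name and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not from the text


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length on the unit sphere.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Approximate, illustrative coordinates (not taken from the survey data).
boston = (42.36, -71.06)
washington_dc = (38.91, -77.04)
distance = great_circle_miles(*boston, *washington_dc)
```

In this sketch each subnet would be represented by the latitude and longitude of its administrator's zip code, so the result inherits the zip-code and subnet-size uncertainty described above.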
To test the correlation between these two types of information, we selected several hundred hosts in the United States and surveyed each one's distance from Harvard. We measured the latency between our host and each target, and counted the number of network hops between them using traceroute. We also calculated the distance between them in miles as described above. Since individual workstations are frequently not accessible, our survey settles for any computer it can reach on the same subnet as the desired host. If it is not possible to access any host on the desired subnet, a failure is recorded for that host. We also ran this program from several other locations around the Internet, including the west coast and Colorado.
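The fallback rule described above (accept any reachable host on the target's subnet, otherwise record a failure) might look like the sketch below. The `Measurement` record, the `survey_subnet` function, and the `fake_probe` stand-in are all hypothetical names introduced for illustration; a real probe would wrap traceroute, which is not shown here.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Measurement(NamedTuple):
    host: str          # host on the subnet that actually answered
    hops: int          # network hops reported by the probe
    latency_ms: float  # latency to that host in milliseconds


def survey_subnet(
    candidates: Iterable[str],
    probe: Callable[[str], Optional[Measurement]],
) -> Optional[Measurement]:
    """Return the first successful measurement on a subnet, or None (a failure)."""
    for host in candidates:
        result = probe(host)
        if result is not None:
            return result
    return None  # no host on the subnet could be reached


def fake_probe(host: str) -> Optional[Measurement]:
    """Stand-in probe with made-up data; only one address 'answers'."""
    reachable = {"192.0.2.7": Measurement("192.0.2.7", 12, 48.5)}
    return reachable.get(host)


found = survey_subnet(["192.0.2.5", "192.0.2.7"], fake_probe)
failed = survey_subnet(["192.0.2.5"], fake_probe)
```

Passing the probe in as a function keeps the fallback logic separate from however hops and latency are actually measured.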
We did not expect extremely high correlations because Internet connectivity varies widely. While some hosts are connected by a high-speed network connection, other hosts are connected by slower, less well-connected networks. Different backbone connections are another source of error; because these backbones only connect to one another at a few sites, a file exchanged between two hosts on different backbones, no matter how close to each other geographically, may have to travel quite far on the Internet. As an example, the hosts maddog.harvard.edu and carrara.bos.marble.com are both located near Boston, but since one is on the MCI backbone and the other is on the Sprintlink backbone, packets between them must pass through Washington, DC, where the two backbones connect.
The two figures below display the data from our east coast observations
for distance versus network hops, distance versus latency, distance
versus backbone hops, and network hops versus latency. The Colorado and
west coast observing runs yielded similar results.
Figure: Results of Network Survey: Network Hops and Network Backbone Hops.
Note that geographical distance establishes a lower bound for network
hops. Note also the number of hosts in the sub-100 mile range that are
0 backbone hops away.
Figure: Results of Network Survey: Network Latency. Note that the latency graph was
cropped at 200 ms for clarity; there were 17 hosts with latencies ranging from
200 ms to 1 s that were removed.
In looking for signs that geographical distance predicts network distance (network hops, backbone hops, and latencies), we were encouraged by the apparent correlation shown in the graphs. We also noticed the trend that nearby hosts show the greatest correlation between geographic distance and network distance. Once the distance exceeds 500 miles, the importance of geographic distance decreases.
We hypothesized that if we limited our analysis to hosts on the same
backbone network, we would find a stronger correlation between
geographical distance and network distance. To examine this hypothesis
we divided the hosts into several groups, one group for each backbone,
and then computed the correlations for each backbone separately. The
table below presents the results of this study.
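The per-backbone analysis can be sketched as follows. The text does not state which correlation coefficient was computed, so this sketch assumes the Pearson coefficient; the function names and the `(backbone, miles, hops)` sample layout are hypothetical, and the sample values are invented for illustration only.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # no variation, so correlation is undefined
    return sxy / math.sqrt(sxx * syy)


def correlations_by_backbone(samples):
    """Group (backbone, miles, hops) samples and correlate miles with hops."""
    groups = defaultdict(list)
    for backbone, miles, hops in samples:
        groups[backbone].append((miles, hops))
    return {
        backbone: pearson([m for m, _ in points], [h for _, h in points])
        for backbone, points in groups.items()
    }


# Invented sample data; "198.51" is a placeholder backbone label.
samples = [
    ("204.70", 10, 5), ("204.70", 100, 8), ("204.70", 500, 14),
    ("198.51", 20, 11), ("198.51", 300, 9), ("198.51", 900, 16),
]
by_backbone = correlations_by_backbone(samples)
```

The same grouping works for the other pairs in the study (distance versus latency, distance versus backbone hops, hops versus latency) by changing which two columns are correlated.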
Table: Backbone-based correlations for geographical distance versus network
hops, latency, and backbone hops, as well as network hops versus
network latency. We have divided our samples into groups based on the
backbone to which they are connected. Measurements were taken from a
host on the 204.70 backbone (the NSFNET); notice how correlations are strongest
overall for other hosts on 204.70.
These observations affirm our hypothesis that the correlations between geographic distance and Internet distance are higher overall when looking at hosts on the same backbone than when looking at all hosts. This result suggests that it will be advantageous to steer clients toward host caches that are both geographically close and on the same backbone network.
We included a comparison of network hops to latency because
calculating expected latency on a network is hard: it requires
knowledge of network bandwidth and expected loading. If the number of
network hops between two computers is related to the latency, then by
optimizing to reduce network hops we are also optimizing to reduce
latency. The network-hops-versus-latency figure indicates that there is
a moderate correlation between hops and latency: fewer-than-average
hops imply low latency, and more-than-average hops imply high latency.
This is helpful, because it implies that steering a host from a distant
cache to a close cache will decrease latency as well as network
traffic. As we saw earlier, this should be one of the primary goals of
replication schemes.
We investigated latency further by following up on a suspicion voiced
by Bestavros in a private conversation. He suspected that latency was
primarily caused by crossing between backbones, not necessarily by the
number of individual backbone hops. This hypothesis would make sense
if connection points between backbones proved to be bottlenecks and
sources of congestion. We therefore modified our survey to also
include the number of backbones traversed. The results of this new
survey are shown in the figure below. There is a clear correlation
between the maximum latency observed and the number of backbones
crossed, although we cannot draw any further conclusions from the
data. We hope to follow up on this finding in future work; if it turns
out that latency is strongly related to the number of backbones
crossed, then simply mirroring web sites on multiple backbones should
reduce latency considerably.
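One way to derive the backbone count from a traceroute path is sketched below. It assumes that backbones can be recognized by the leading octets of each hop's address (as with the 204.70 label used in the table), and that "backbones crossed" means the number of transitions from one backbone to a different one; both are assumptions, as are the function names and the placeholder prefix "198.51".

```python
def backbone_of(address, backbone_prefixes):
    """Return the backbone prefix that a dotted IPv4 address falls under, or None."""
    for prefix in backbone_prefixes:
        if address.startswith(prefix + "."):
            return prefix
    return None


def backbones_crossed(path, backbone_prefixes):
    """Count transitions between different backbones along a list of hop addresses.

    Hops that belong to no known backbone (local or regional networks) are skipped.
    """
    crossings = 0
    previous = None
    for address in path:
        current = backbone_of(address, backbone_prefixes)
        if current is None:
            continue
        if previous is not None and current != previous:
            crossings += 1
        previous = current
    return crossings


# Invented path: local hop, two hops on 204.70, one on a placeholder
# backbone, then a local hop at the destination.
prefixes = ["204.70", "198.51"]
path = ["192.0.2.1", "204.70.1.1", "204.70.2.9", "198.51.100.3", "203.0.113.8"]
crossings = backbones_crossed(path, prefixes)
```

Pairing this count with the measured latency for each host gives the data points for a backbones-versus-latency plot.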
Figure: Number of backbones versus latency. There is a clear correlation
between the maximum latency observed and the number of backbones
crossed. This would support Bestavros' suspicion that crossing
backbones accounts for the majority of the Internet's latency. Notice
that no latencies greater than 100 ms were observed without crossing
at least one backbone boundary.