The final component of large-scale autonomous replication is efficiently locating the nearest replica of a given file. It is easy to make a copy of a piece of data; deciding which copy to use is difficult. Resource location, for example, was the primary difference between Blaze's distributed file system and a traditional distributed file system. Under his system, a cache miss did not have to be satisfied by the primary host: a host could locate a copy of the file in another host's cache.
Geographical push-caching is similar to Blaze's system in that cache misses can be satisfied out of other caches; it is different in that the locations of these caches are computed so as to minimize network traffic, and cache misses must be satisfied out of the closest cache. Our resource location scheme will therefore need to be able to locate the closest copy of the file.
Guyton and Schwartz are interested in discovering a nearby resource without any sort of centralized database whatsoever. This differs from earlier approaches, such as that used in Grapevine, which required centralized shared databases.
Guyton and Schwartz try to determine how to choose among a collection of replicated servers such that the selection takes network topology into account. They evaluated a variety of approaches using a network simulator, uncovering a number of tradeoffs between ease of deployment, effectiveness, network cost, and portability. They ultimately conclude that there is no obvious ``best approach,'' but only a variety of compromises.
At the heart of this research is the fact that in the current Internet there is no magic black box to determine Internet topology. If this information were known, then optimal resource location would be not only possible but trivial, because a global Internet topology map could be consulted to determine exact host distances. The purpose of Guyton and Schwartz's research is to determine the cost and effectiveness of approximating this information through various means; in a separate section we extend this research by determining how well geographical information approximates Internet topology.
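To make the "trivial if the topology were known" point concrete, the following is a minimal sketch, not a method taken from Guyton and Schwartz. It assumes a hypothetical global map stored as an adjacency list (`TOPOLOGY`), uses hop count as the distance metric, and introduces the illustrative helper names `hop_distances` and `nearest_replica`. Under those assumptions, a single breadth-first search from the client is enough to pick the closest replica.

```python
from collections import deque


def hop_distances(topology, source):
    """Breadth-first search: hop count from source to every reachable host."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in topology.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def nearest_replica(topology, client, replicas):
    """Return the replica with the fewest hops from client, or None."""
    dist = hop_distances(topology, client)
    reachable = [r for r in replicas if r in dist]
    if not reachable:
        return None
    return min(reachable, key=lambda r: dist[r])


# Hypothetical topology map (assumption for illustration only).
TOPOLOGY = {
    "client": ["r1"],
    "r1": ["client", "r2", "r3"],
    "r2": ["r1", "serverA"],
    "r3": ["r1", "r4"],
    "r4": ["r3", "serverB"],
    "serverA": ["r2"],
    "serverB": ["r4"],
}

if __name__ == "__main__":
    print(nearest_replica(TOPOLOGY, "client", ["serverA", "serverB"]))
```

The difficulty in practice is not this computation but obtaining and maintaining the map itself, which is exactly the information the current Internet does not expose.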
Guyton and Schwartz examine the variety of choices that distinguish between various resource discovery approaches. These choices include: does the client passively gather location information, or does the client actively seek the nearest replica? If the client actively seeks, does it do so on the level of the network routing protocols, or on the application level? If on the application level, does the client probe the network looking for the nearest copy, or does the client gather routing tables thereby building a local copy of the network topology? If the client probes the network, does it do so by selective triangulation or by using measurement servers that attempt to build up topology maps for portions of the Internet?
Each of these choices lies on a spectrum with ease of deployment and high network cost at one end, and difficult deployment and low network cost at the other. None of them is optimal, and only the least accurate ones scale in a manner appropriate to the World Wide Web. Route probing, for example, is one of the most accurate methods, but it requires a measurement server to calculate the shortest path in a dense graph, a non-trivial calculation. Multiply this effort by the thousands of clients that would need to use such a service and the option becomes infeasible. The fact that no efficient means of detecting Internet topology exists forces us to turn toward more radical solutions, such as using geography to predict topology. A separate section describes our research in this direction.
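As a rough illustration of what "using geography to predict topology" could look like, the sketch below ranks replicas by great-circle distance from the client instead of by network distance. It is an assumption-laden example rather than the scheme developed in this work: the function names `great_circle_km` and `geographically_nearest`, the replica labels, and the approximate coordinates are all hypothetical, and the standard haversine formula stands in for whatever distance measure an actual system would use.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def geographically_nearest(client_pos, replica_positions):
    """Return the name of the replica geographically closest to the client."""
    return min(
        replica_positions,
        key=lambda name: great_circle_km(client_pos, replica_positions[name]),
    )


# Hypothetical, approximate coordinates (assumptions for illustration only).
CLIENT = (42.36, -71.06)            # roughly Boston
REPLICAS = {
    "east": (40.71, -74.01),        # roughly New York
    "west": (37.77, -122.42),       # roughly San Francisco
}

if __name__ == "__main__":
    print(geographically_nearest(CLIENT, REPLICAS))
```

This requires no probing and no measurement servers, only a location for each host, which is why it scales; whether geographic proximity actually tracks network proximity is the empirical question the approach depends on.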