Distributed file systems

next up previous
Next: Internet Caching Schemes Up: Related Work Previous: Related Work

Distributed file systems


Much of the current research in autonomous replication has been heavily influenced by the caching subsystems of distributed file systems. Efficient operation of a large-scale distributed file system depends on caches to reduce the load on file servers; systems like Sun's NFS [26][27] that do not rely on long-term caches cannot support more than a few hundred clients, whereas systems like XFS [35] that distribute server load among clients are designed to scale to many thousands of clients.

The challenge of offering distributed access to a wide-area information system is almost identical to that faced by a large-scale distributed file system; some researchers have proposed making this relation explicit [33] by actually serving files from the Web over a wide-area distributed system like the Andrew File System [28]. Although this idea is still contentious, because it is not yet clear if the semantics of the World Wide Web match those of a distributed file system closely enough to simply implement the one on top of the other, we can nevertheless still learn much from the distributed file system community about building large-scale systems. In the next section we discuss the large-scale distributed file system designed by Blaze as his PhD thesis.

Large-Scale Hierarchical File Systems


The goal of Blaze's thesis [4] was to design a distributed file system that could scale to a very large size, operating across the Internet. He achieves this goal by building an explicit hierarchical system where clients can not only cache items indefinitely but can also serve them to other clients.

Blaze introduces four types of scale that a successful system must address: population, traffic, administrative, and geographic. Each type has its own set of issues. Scaling with population requires a system to cope with growth in the total number of potential clients and implies that a server should not need to store any information about its clients. Scaling with traffic requires the ability to handle the workload generated by all clients. Scaling administratively implies an ability to span autonomous entities. Finally, coping with geographic scale requires the ability to span large distances, with the inherent latency implied therein.

The Web in its current form addresses these by eliminating many of the frills associated with distributed file systems such as caching and write-sharing. Our challenge is adding an optimized caching system to the Web without reducing its ability to scale along these lines.

Before designing his system, Blaze gathered traces from an NFS server to determine optimal cache strategies for his distributed file system. The most important trend that emerged is that files tend to display strong inertia, where the same files tend to be opened again in the same mode by the same user. It appears, for example, that keeping a file in a client cache for two hours results in an average hit rate of over 66%.

Blaze also found that most files are overwritten while still very young. If a file is not changed soon after creation, it becomes unlikely that it will be changed. Files therefore rapidly move toward a state of being read-only, and shared-read files are even less likely to be written to. Finally, files opened for reading by other machines are likely to be opened for reading by still others. These results imply that cache sharing will work well for a distributed file system because it is unnecessary to update shared-read files often. We will see that a similar analysis drives the Alex FTP cache as well, and in chapter gif we will see that these results also apply to the World Wide Web since Web files that are globally popular changes less often than those that are primarily used locally.

Blaze analyzed a variety of different caching schemes. Flat file systems such as NFS, where all clients connect to the same server, inevitably suffer from a bottleneck effect. Eventually the single server is unable to cope with the connected clients. Even with optimal client caching strategies the server must still deal with 8-12% of the original traffic, and as the system grows this will eventually overwhelm any server.

The only solution is to distribute load among several servers. Replicating all files at secondary servers, however, is too expensive, and caching items at secondary servers as requests pass through them creates delays. Furthermore, having an intermediate cache process all results yields surprisingly little gain. Most objects not in the client cache were not in the intermediate cache either. The answer is to look in the caches maintained by other clients: 60-80% of client cache misses were of files already in another cache.

When the load at a server becomes too high, the server stops serving the file directly and only sends out lists of other servers that it knows have cached the file. Clients attempt to first satisfy file requests from their own file caches, then from servers listed in a local name cache, and finally from the file's primary host. If a client receives a list of servers instead of the file, it caches that list in its name cache, and attempts to contact one of those servers instead. We will use a similar method in section gif to locate replicated files on the Web.

There are several failure modes with this system. When two machines can both talk to a server but not to each other, the server may redirect one to another causing a fault. There are also some consistency issues that arise from computers that are temporarily disconnected from the server and can not receive invalidation messages.

There are several ways in which we apply Blaze's work to our own. We already mentioned that we use his method for locating nearby replicas, and that Web files behave in a similar way to files in a distributed system. We also use his definitions of scalability in section gif to discuss our own system's scalability.



One system that bridges the gap between data replication and distributed file systems is Alex [8]. Alex allows a remote FTP server to be mapped into the local file system so that its files can be accessed using traditional file operations.

This is accomplished through an Alex server that communicates with remote FTP sites through FTP, caches files locally, and communicates with the local file system through NFS. When the user wants to access a remote file, Alex downloads the file using FTP and then caches the file locally.

When another user needs the same file and tries to access it through the file system, it is not necessary to FTP it again. The file will simply be taken out of Alex's cache rather than from the original FTP site. This provides a savings in bandwidth and also provides a measure of reliability in the face of network failures. It also saves local disk space because users do not have to make personal copies of the same FTP files; users simply use the files directly off of the Alex server.

Alex's most novel feature is its cache-consistency mechanism. As we have already seen from Blaze's thesis, the older a file is the less likely it is to be changed. Therefore, the older a file is, the less often Alex has to poll to insure that its local copy is up-to-date. To avoid excessive polling, Alex only guarantees that a file is never more than 10% out of date with its reported age. As an example, if a file is 1 month old, then alex will serve the file for up to three days ( days = 3 days) before checking to see if it is still valid.

This is efficient, not only because FTP servers do not have to propagate file updates, but because it also avoids the necessity of modifying FTP servers to add more sophisticated invalidation techniques. In chapter gif we find that this cache-consistency scheme is applicable to the World Wide Web as well as to FTP. In the next section we examine Web proxies that extend the concepts of Alex to the World Wide Web by caching Web data just as Alex cached FTP data.

World Wide Web proxies


A Web proxy [25] acts as a gateway to the World Wide Web for a campus-sized network or corporation behind a firewall. The proxy accepts a request for a Web documents from a client, retrieves and caches the document, and then makes it available to its client. When another client wants the same document the proxy can serve it from its cache without having to re-request the same document.

The most popular Web browsers support proxies [29] and the HTTP protocol has provisions to help maintain cache consistency. If the proxy wishes to determine if its cached page is up-to-date, a lightweight protocol exists known as the Conditional GET. A browser may make a get request to the server that includes a timestamp, the If-Modified-Since field. If the page has been modified since that time, the server will re-send the page. Otherwise the server will respond with a Not Modified message.

Netscape, a company that sells a commercial web-proxy, claims that a 2 or 3 gigabyte Web proxy can support thousands of internal users and can provide a cache hit ratio as high as 65%. Results such as these indicate that web proxies can help alleviate some web scalability concerns, but as we saw above with Blaze, even optimal client caching still does not help distribute server load sufficiently. Web proxies also require human intervention to set up properly, and proxies must still satisfy cache misses from the primary host.

next up previous
Next: Internet Caching Schemes Up: Related Work Previous: Related Work

James Gwertzman
Wed Apr 12 00:26:11 EDT 1995