Internet As A Big Distributed System - Extended Discussion

The Internet as a Big Distributed System session is chaired by Jeff Mogul from Digital Western Research Center. It consists of four presentations. The first presentation by Paul Leach from Microsoft describes an alternative distributed search approach called query routing, which achieves scalability by incorporating traditional systems methods such as hierarchy and hints.

Bruce Zenel from Columbia University outlined some issues they have encountered in implementing general purpose proxies for mobile computing environment and how they deal with those issues.

Syam Gadde from Duke University presented a distributed Internet object cache model called CRISP which consists of cooperating caching servers and a centralized mapping server and reported some early studies of CRISP caches.

Tzi-cker Chiueh from State University of New York at Stony Brook presented the architecture of Artery, a distributed system specifically designed to support network game development. Some techniques for reducing network backswith are explained in detail.


Extended Discussion

The first speaker was Paul Leach from Microsoft. The title of his talk is "Query Routing: Applying Systems Thinking to Internet Search".

Paul started by pointing out that with the exponential growing rate of the Internet, we need a scalable way of search the contents provided by the Internet. Traditional ways of handling scalability use hierarchy and hints. The example Paul gave are traditional name servers such as DNS. The problem with these systems, however, is that you can search on only one attribute. On the other hand, Paul observed that the reason these systems are scalable is because it has the compression property, namely, index information gets compressed as it goes up the tree from children to the parent node. Paul pointed out that this compression property is essential for scalability.

Paul then gave the architecture of the query routing approach. The architecture is a hierarchical tree, with base servers at the leaves that answer queries, and index servers at the internal nodes that redirect queries to base servers or other index servers that have a "better" idea of which index servers or base servers can answer the query. To achieve scalability, the index information (forward knowledge) must posses the compression property. There are three such indexes: centroids, attributes with hierarchical structures and types. Centroids are essentially unions of the database records. The parent in the indexing tree gets the union of the child's centroids. There are cases where this might not scale, resulting in the root index node flooded with all the indexing information. Paul said they haven't started looking at real databases. But he argued that since there are only 4.3 billion people, we can afford to have the root node containing a few bytes for every single people that tells where to go to look for more details. However, he said they couldn't make the same claim about documents.

Finally, Paul summarized their approach: (1)It generalized the DNS technology to multiple searchable attributes. (2)It identified three scalable types of index information. (3)It identified the requirement for index scalability - the compression property. (4)It used hint and hierarchy to achieve scalability.

Discussion:


The next speaker was Bruce Zenel from Columbia University. The title of his talk is "General Purpose proxies: solved and unsolved problems". This is basically an experience paper that describes the issues Bruce and his colleagues have encountered in implementing general purpose proxies for mobile computing environment and how they deal with those issues.

Bruce started by showing a picture of the computing environment that the proxies operate on. It consists of a group of "mobile hosts" communicating with "fixed hosts" via a "proxy". The purpose of the proxy is to process information flowing from the servers (fixed hosts) to mobile hosts to improve quality of service. Types of processing include compression, removal of parts of the data, store-forward data, and replacement of protocols. The reason the proxy is called general purpose proxy is because it is a general execution environment where one can download code that filters protocols such as HTTP, NFS, TCP and MPEG.

Bruce then described eight issues that they went into in implementation of the general purpose proxies. The first issue is routing. Basically they had to build a new routing protocol to guarantee that the proxy is in the data path between the server and the client. The second issue is connection semantics. They used the technique of socket migration (from the client to the proxy) to achieve user transparency. The third issue is TCP end-to-end semantics, which they didn't deal with because of its complexity. The forth issue is adaptation. To achieve that, two types of controls are applied to the proxy. Internal control directly controls the mobile application. External control takes environment information from the network around the mobile host and feeds the information into the filter, allowing it to make decisions on how to process the data. The fifth issue is security. They developed a two-party authentication system as well as an encryption layer on top of the authentication layer. The sixth issue is proxy location. The routing protocol allows variable placement of the proxy. However, once the proxy is placed, its location is fixed. This relates to the seventh issue: proxy mobility. They didn't allow proxy to move after being placed because proxy is a large process and proxy mobility will bring up many issues related to process migration, which they decided to push off for future work. The last issue is adaptation in terms of application vs protocol. There are basically two approaches: adapting network protocols and adapt the applications. They decided to go with the protocol approach because there are many applications but few protocols. As a result, each protocol filter will have more of an impact than a singular rewritten application.

Discussion:


The next speaker was Syam Gadde (picture below) from Duke University. The title of his talk is "Reduce, Reuse, Recycle: An Approach to Building Large Internet Caches".

He started with the motivation of his work: the increasing Internet usage and the need to develop new techniques to reduce traffic on Internet. Several studies have shown that cache shared among many users can achieve 50% or greater hit ratios. And more sharing could be achieved with more users. So their goal is to design a caching architecture that scales to a large number of users and is easily extensible to accommodate the growth of the number of users.

Syam then showed some existing caching architectures. Centralized proxy is the simplest but the least scalable. Harvest uses cooperative caching which is more scalable. A simple Harvest hierarchy was then given. At the root of the hierarchy is the parent node which has connections to the Internet. The internal nodes are proxies and the leaves are clients. The drawbacks of Harvest are: First, every proxy must respond to a query that is generated because of a miss at any proxy. Second, the proxy must wait till all proxies respond to determine whether the URL is in the cooperative cache. Therefore, as the number of nodes increases, the wait time will increase too.

Syam proceeded to present their new caching architecture, CRISP, which stands for Caching & Replication for Internet Service Performance. CRISP consists of two types of nodes: caching servers that cache objects on behalf of the clients and handle requests, and mapping servers that maintain an index for the whole cooperative cache. As requests miss in caching servers, this generates a query to the mapping service, which either returns a miss at which point the caching server goes to the Internet or returns a re-aim that tells it which proxy to forward the request to.

Syam then claimed that the mapping service is not a bottleneck because CPU usage is very low - only on the order of a cache lookup for every query. I/O is negligible because they assume that the map fits in physical memory. Furthermore, each message includes either a URL or an IP address. As a result, network requirement is also low. Syam added that if it turns out that the mapping server is saturated, one can use multiple mapping servers and partition URL spaces among them. The answer to the centralized point of failure argument is that failure of the mapping service only degraded performance, it doesn't compromise correctness.

Next Syam pointed out three advantages of using a centralized mapping service. First, the mapping service could add to the representativeness for the cooperative caching. Second, it could coordinate global replacement strategies. And finally, the mapping service makes it simple to increase the cache capacity.

Then Syam showed some performance number of the CRISP mapping server and comparison of CRISP to Harvest. In conclusion, performance of CRISP is a little better than that of Harvest for 2,3 and 4 nodes cases.

In summary, Syam said that CRISP could provide scalable cache consistency as well as making it extensible to larger number of users. It could be a powerful tool to implement global replacement policies if that turns out to be a nice thing.

Discussion:


The last speaker was Tzi-cker Chiueh from State University of New York at Stony Brook. The title of his talk was "Distributed Systems Support for Networked Games".

Tzi-cker started by describing the generic model for network games. Basically a virtual state is shared among nodes distributed over the network. When the human participants do something, these update-events are exchanged among the nodes to maintain the consistency of the shared state. He pointed out that there is nothing special about the model compared to a distributed shared virtual memory system.

As far as the communication aspect of network game development is concerned, there are two fundamental challenges. The first is to provide a high-level API that hides all network transport details. The other issue is scalability. The system should work on more than 10,000 nodes. The solution they propose is a middle-ware specifically designed to support network-based multi-user interaction systems. Tzi-cker then pointed out the shortcomings of the earliest work in this area, Distributed Interactive Simulation Protocol(DISP). Early versions of DISP requires full replication and strong consistency. There's no support for environment objects. Also, later generations of DISP model apply ad hoc optimizations which should be generalized.

The proposed model, Artery Distributed Systems, provides a shared memory programming model that frees the application programmers from dealing with communication. The model exploits application-specific semantics to maximize network bandwidth efficiency. Finally, It allows programmers to share common code. Tzi-cker then showed the system architecture of the model. Underneath is the DISP protocol. Each node consists of a simulation process where you specify what you should do when something happens and an artery networking module which provides shared memory abstraction to simulation process. In addition, there is an environment state server that keeps objects under control of any of the participants.

Tzi-cker proceeded to describe in detail the techniques incorporated in Artery and emphasized that none of these ideas is new. The first idea is "dead reckoning". Studies have shown that most network traffic is due to position updates. The basic idea is to predict the position change of the simulated entity and suppress the update message if the difference between the actual position and the predicted one is within certain threshold. The second idea is Dynamic Group Consistency Model(DGCM). The idea is to decompose the virtual world into regions and only maintain consistency within the region. The last idea is to reduce per packet overhead by aggregating multiple small packets into one big packet.

Discussions:


Panel Discussion: