James Gwertzman, Microsoft Corporation, One Microsoft Way, Redmond, WA 98007, jamesgw@microsoft.com
Margo Seltzer, Harvard University, Pierce Hall, Cambridge, MA 02138, margo@eecs.harvard.edu
The Web may be evolving from a static world of graphic images and informative text to an ever-changing source of customized information, but unfortunately, there is nothing inherent in the Web's structure that takes full advantage of the networked nature of the Internet.
Consider a variation on the classic Turing test for artificial intelligence [18]. Take a snapshot of the files from several popular web servers, and place them on a computer's hard drive. Install a modified web server that accesses these files with appropriately added network delays to emulate connecting with a remote server. Now take a second system that is actually connected to the Web, and install a gateway that can only access the selected sites. Our new test, called the Connectivity Test, is passed if a user can distinguish between the local server and the remote server.
We claim that most of today's Web sites would fail the Connectivity Test. This is disappointing, since one of the benefits of being online comes from real-time social interaction with other users. Humans have a profound need to interact with other humans, and the value of online interaction has been proven [17]. It has been estimated that 60% of the activity on America Online, for example, comes from communicating with other users, with 30% of that directly from time spent in their chat rooms [9].
One company, Electronic Communities, realizes this need and is trying to fill it with their cyberspace operating system, COS [5]. We believe that the Web is sufficiently extensible that replacing it is not necessary. Instead, we present extensions possible within the existing Web infrastructure that facilitate full social interaction. The extensions we propose have evolved from the virtual communities that have existed for over ten years. Habitat, one of the first virtual worlds, was built in 1984 for the Commodore 64, and is still around today in various forms [21]. Habitat's creators published one of the seminal works on virtual communities, describing their experiences managing one of the first virtual worlds [11]. One of the most important lessons learned from Habitat is that sophisticated graphics and rapid interactivity are not required for complex behaviors to emerge.
Sociologists quickly picked up this observation, and a number of papers have been written describing the virtual communities that have developed in text-only, online forums [3, 4, 11, 16, 17]. These forums share several characteristics that we believe must be brought to the World Wide Web, the most important of which is the notion of self. In all of the online communities, establishing an online identity is, by far, the most rewarding aspect of being online. Smith states that establishing an online identity is the only reason why people spend so much time online in newsgroups, answering questions left by other users [17].
On the World Wide Web, the popularity of creating one's own "vanity license plate" or personal web page attests to this need to establish identity. In order to incorporate people fully into the Web, we must extend the notion of self further, so that users may be identified as they browse the Web, not merely as they create it. Building a "Great Good Place" on the Web [13] requires giving users the ability to identify each other as they frequent similar locations.
The number of forums that provide such services is growing; Agents, Inc. uses its music recommendation service, Firefly [6], as a means for building community. The lack of an overall Web architecture means that users must restrict their avatars to a single "hangout", since their identity cannot travel with them as they move about the Web. This is good news for the companies that provide these "hangout spots" but bad news for the end user, who would like to be able to traverse the entire Web, identity intact.
Fortunately, adding these new abstractions for people and for places onto the current Web infrastructure is not difficult. HTTP already provides many of the necessary hooks; no new hardware needs to be deployed. Clients and servers simply need to agree on common protocols.
A place is clearly a stateful entity, while most web sites today are stateless. This is partly a direct fallout of the stateless HTTP protocol, and partly the Web's focus on accessing shared information. There is no need to stay connected to a Web site once you have retrieved its information. Sites that provide real-time information do so in non-uniform manners.
According to the above definition, a Web site is not a place for people because no space has been allocated for them. The sense of going somewhere on the Web ("Have you been to that new online art museum?") is generated solely by content, and not by interaction with other users.
Creating places on the Web for people means setting space aside for them. A Web place does not necessarily have to be a Web server; a Web place is a virtual construction that might span several servers, or that might share one server with a number of places.

Figure 1: "Things" on a current Web site: a collection of pages that make up a virtual place.

Figure 2: "Persons" and "Things" on a virtual Web place: the place allows people to see each other as well as the content.
One advantage of using an HTTP header for relaying group information is that the header can be attached to any type of file distributed by the server, including both HTML and VRML pages, as well as PostScript files, graphical images, or applets. Any of these files can be part of a place. Another advantage of using headers to distribute group information is that the client may retrieve the MULTICAST-URI using the HTTP HEAD request, without needing to download the description of the place itself.
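As a rough illustration of the header-based approach, the following Python sketch pulls a MULTICAST-URI field out of the header block returned by a HEAD request. Only the header name comes from the text; the `irc://` value format, the host and channel names, and the helper name `multicast_uri_from_head` are assumptions made for illustration.

```python
from typing import Optional


def multicast_uri_from_head(raw_response: str) -> Optional[str]:
    """Return the MULTICAST-URI header value from a raw HEAD response, or None."""
    lines = raw_response.split("\r\n")
    # lines[0] is the status line; header fields follow until the first blank line.
    for line in lines[1:]:
        if not line:
            break
        name, sep, value = line.partition(":")
        # HTTP header names are case-insensitive.
        if sep and name.strip().lower() == "multicast-uri":
            return value.strip()
    return None


# Hypothetical HEAD response; the URI scheme and channel name are illustrative only.
SAMPLE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "MULTICAST-URI: irc://irc.example.org/#art-museum\r\n"
    "\r\n"
)

print(multicast_uri_from_head(SAMPLE))
```

Because a HEAD response carries no body, a client written along these lines could learn which channel to join before deciding whether to download the place itself.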
A further advantage to this solution is that it makes the proper distinction between static information that may be cached, and dynamic information that may not. The static files that describe a place may be cached by intermediate caching proxies, thus saving bandwidth. The dynamic information describing the ever-changing collection of users in the room is distributed through more appropriate, non-cacheable channels.
As mentioned earlier, we expect that better multi-cast protocols must be used to integrate people and places into the Web seamlessly. In the meantime, however, an acceptable model consists of configuring a web server with a cooperating chat server to create the multi-cast channel for any places that may exist, in part, on that Web server.
Figure 3. Diagram of server operation. 1) The web browser accesses a place. 2) The server returns the IRC channel associated with the place. 3) The browser makes a connection to an IRC server. 4) A user in a different country requests the place through a proxy server. 5) The proxy server retrieves a copy of the place file. 6) The proxy server returns the IRC channel to the user. 7) The user connects to a local IRC server. 8) The local IRC server connects to other IRC servers.
Browsers must also be modified to support the notion of place. When the "place-aware" browser encounters a web object that belongs to a place, it receives the chat channel in the MULTICAST-URI header field. The browser is then able to establish a chat connection on the designated channel. We are experimenting to determine the optimal user interface; our planned prototype will display the pages of content that make up the things in the place with the traditional browser interface, and it will launch a separate chat interface to allow users to interact with each other.
The chat interface must allow users to highlight interesting content. This involves modifying the browser to support a shared "whiteboard" on which a place's occupants can draw or point out content details (e.g. "The answer to that question is here; take a look at those details in the picture.") The interface must also allow users to transport together from place to place (e.g. "If you like antique cars, we should go here."). This facility could be used to produce Internet tours or interactive, online lectures.
Adding places to the existing Web infrastructure is relatively simple, but the addition of people requires significant new infrastructure. This is addressed in the next section.
A person's name and attributes are merely state that must be communicated between browsers and servers, while authentication introduces substantial infrastructure requirements. There is no single Internet authentication model that provides users with globally unique identifiers. Several companies are interested in solving this problem; the Microsoft Network and Verisign [19] are two examples of corporate entities actively pursuing this area. In the meantime, IRC addresses the unique identifier issue by using a nickname server that helps users select unique names.
Users cannot protect their identities or assign permissions based on identifiers unless the nickname server is coupled with an authentication protocol. As the number of companies conducting business on the Internet grows, we expect that standards for authentication infrastructures will evolve to provide for secure financial transactions. Most are based on public key certificate authorities.
The model we will assume for the remainder of this section is that users carry with them their name, attributes, and authentication information. Initially, interaction between people is not authenticated. A user enters a Web place by requesting an object in that place and by joining the appropriate multi-cast group. The multi-cast group handles the notification message announcing the user's arrival, and relays the user's IP address to the group.
Users in the group can then use the IP address and user ID to connect to the user's local Web browser to download the user's public attribute set. The browser's "chat" interface knows how to display and use this information; it may do different things based on the fields that it finds. If it finds an image property, for example, then it may present this photograph to other users.
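One way to picture the public attribute set is sketched below in Python. The text only says that a user carries a name and attributes and that a chat interface may act on fields such as an image property; the class names, the `public` flag, the `describe` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Attribute:
    value: str
    public: bool = False  # private unless the user chooses to reveal it


@dataclass
class Person:
    user_id: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def public_attributes(self) -> Dict[str, str]:
        """Return only the attributes the user has marked as public."""
        return {name: attr.value for name, attr in self.attributes.items() if attr.public}


def describe(public_attrs: Dict[str, str]) -> str:
    """Decide how a chat interface might present a user, based on the fields it finds."""
    label = public_attrs.get("name", "anonymous")
    if "image" in public_attrs:
        # An image property, if present, can be shown alongside the name.
        return f"{label} [photo: {public_attrs['image']}]"
    return label


alice = Person("alice", {
    "name": Attribute("Alice", public=True),
    "image": Attribute("alice.gif", public=True),
    "email": Attribute("alice@example.org"),  # stays private
})
print(describe(alice.public_attributes()))
```

Only the public subset would be handed to other users in the place; private attributes stay with the owner until the owner decides to reveal them.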
Authentication is triggered by mutual consent between two or more parties present in the place. Users can retrieve each other's public keys from their address books, and use these keys to correspond securely. The security hole with this solution is the insecure transfer of IP addresses and user IDs. One user can easily masquerade as another without a central trusted authority performing the IP address distribution.
As a person inhabits a room, that person may choose to reveal private attributes to some or all of the other people in the place. This authentication model can be extended to produce the notion of an authenticated place, giving the server authentication responsibility. The server would only allow certain people access, barring entry to any user that was not properly authenticated.
Anonymous browsing is incorporated easily into this model, simply by having a user's browser not join the multi-cast group. A user could even join the multi-cast group, but not transmit an IP address, or transmit the IP address, but limit the publicly available address book properties. Unfortunately, in today's model there is no way to prevent a server from collecting and/or revealing a user's IP address. One approach to providing anonymity is to follow the lead of anonymous remailers [2]. An anonymous HTTP proxy server could accept HTTP requests on behalf of clients, retrieve pages, and forward the responses back to the initiating client. Implementing such an anonymous proxy server requires only minor modifications to any of the available proxy caches, such as the CERN proxy cache.
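The core of such an anonymizing proxy can be sketched in a few lines of Python. The origin server already sees the proxy's IP address rather than the client's, so the remaining step shown here is dropping request headers that could identify the user before the request is forwarded. The particular set of headers treated as identifying, and the function name `anonymize_headers`, are assumptions for illustration and are not taken from any existing proxy.

```python
from typing import Dict

# Assumed set of request headers that can identify a user (lowercase for comparison).
IDENTIFYING_HEADERS = frozenset({"from", "referer", "user-agent", "cookie"})


def anonymize_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers without the identifying fields."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }


# Hypothetical client request headers.
request_headers = {
    "Accept": "text/html",
    "From": "user@example.org",
    "Referer": "http://www.example.org/previous.html",
    "User-Agent": "ExampleBrowser/1.0",
}
print(anonymize_headers(request_headers))
```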
Adding people to the Web is more challenging than adding places, primarily due to the authentication issues and the competing goals of providing interaction and protecting privacy. Unless it becomes easy for all users to subscribe to such a model, it will be very difficult to achieve the critical mass that any new Internet proposal needs in order to achieve widespread adoption.
The extensions that we have described bring together the Web and virtual environments. Online services such as America Online and CompuServe already support these environments, but it is in their vested commercial interest to maintain them as closed communities, separate from the World Wide Web. The integration of the World Wide Web and virtual environments will evolve from both directions.
The text-based environments are beginning to introduce a graphical component, where users interact with graphical avatars in 3D environments. Similarly, our future work focuses on extending the World Wide Web to incorporate the interaction found in virtual environments. Such a transition requires little additional work, outside of adding the 3D environments, and the VRML [20] description language already provides one standard for building a graphical world.
VRML does not, however, support the display of or interaction between multiple users. Using multi-cast for multi-user communication is sufficient for text-based interactions, and adding graphical interactions is not difficult. The user's personal attributes must be extended to provide a visual representation of the user's avatar. This might be as simple as carrying a bitmap or a set of bitmaps for simple animation, or as complex as a Java class describing the avatar's programmed behaviors. The multi-cast protocol must be extended to filter messages based on geographical proximity, a step that might require the introduction of multi-cast filters that reside on central servers.
These geographical filters prevent users from being overwhelmed by message traffic in large rooms, since a user should logically only be able to hear conversations that occur nearby. Naturally there must be some way to decide how far one's voice carries and to provide the notion of whispering (being heard only by those very close by) and public speaking (being heard by a large number of people in the vicinity).
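A minimal Python sketch of such a proximity filter follows. It assumes a flat two-dimensional room and fixed hearing ranges for each speaking mode; the range values, the mode names, and the `listeners` function are hypothetical and are meant only to show how whispering and public speaking could fall out of a single distance test.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Avatar:
    name: str
    x: float
    y: float


# Hypothetical hearing ranges, in arbitrary room units.
RANGES = {"whisper": 2.0, "speak": 10.0, "public": 50.0}


def listeners(speaker: Avatar, others: List[Avatar], mode: str) -> List[str]:
    """Return the names of avatars close enough to hear the speaker in this mode."""
    limit = RANGES[mode]
    return [
        a.name
        for a in others
        if a != speaker and math.hypot(a.x - speaker.x, a.y - speaker.y) <= limit
    ]


alice = Avatar("alice", 0.0, 0.0)
room = [alice, Avatar("bob", 1.0, 1.0), Avatar("carol", 8.0, 0.0), Avatar("dave", 40.0, 0.0)]

print(listeners(alice, room, "whisper"))
print(listeners(alice, room, "speak"))
print(listeners(alice, room, "public"))
```

A filter residing on a central server could apply the same test before relaying each message, so that distant clients never receive traffic they would discard anyway.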
The multi-cast protocol must also be used to distribute location information; each avatar must broadcast its location to nearby clients so that users see similar views of the environment. The distance filtering function is important here as well, since the exact location of users who are far away will not be as important. In practice, the frequency with which each avatar broadcasts its location information will differ with different "shouting" ranges. An avatar might broadcast its location every two or three seconds very loudly (heard at long range), once a second more moderately (heard in moderate range), and twice a second very softly. This reduces the amount of traffic that needs to flow across the Internet and through each user's modem, since users only need accurate distance information for avatars to whom they are very close.
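The tiered schedule can be sketched as follows, assuming a half-second tick and taking "every two or three seconds" as every three seconds. Each tick sends one location message at a single range; a long-range message is also heard nearby, so close listeners end up with two updates per second while distant ones get one every three seconds. The range values and function names are hypothetical.

```python
TICK_SECONDS = 0.5  # one location message is sent per tick

# Hypothetical "shouting" ranges, in arbitrary world units.
LONG_RANGE, MEDIUM_RANGE, SHORT_RANGE = 100.0, 30.0, 5.0


def broadcast_range(tick: int) -> float:
    """Return how far the location message sent on this tick should carry."""
    if tick % 6 == 0:  # every 3 seconds: heard at long range
        return LONG_RANGE
    if tick % 2 == 0:  # every second: heard at moderate range
        return MEDIUM_RANGE
    return SHORT_RANGE  # remaining ticks: heard only very close by


def updates_heard(distance: float, ticks: int) -> int:
    """Count the location updates a listener at this distance receives."""
    return sum(1 for t in range(ticks) if distance <= broadcast_range(t))


# Updates received over six seconds (12 ticks) at increasing distances.
for d in (3.0, 20.0, 80.0, 200.0):
    print(d, updates_heard(d, 12))
```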
Since most clients on the Internet today are limited in the amount of information they can exchange rather than the amount of processing power they have available, it might also make sense for clients to predict avatar movement based on predictive movement models, similar to those used by the military as part of the SIMNET protocol [8]. Here, each avatar broadcasts not only its location, but also its velocity and acceleration. Browsers use this information to calculate each avatar's current location in the absence of updates. With this model, avatars need to broadcast location information more frequently when changing direction and speed frequently than when moving along a fixed course.
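The prediction step amounts to constant-acceleration extrapolation, p(t) = p0 + v0*dt + a*dt^2/2, applied to the last update received. The Python sketch below shows that calculation, plus one plausible rule for when the sender should broadcast again: when its true position drifts from what receivers would predict by more than a tolerance. That rule, the tolerance value, and all names are assumptions for illustration; the text states only that updates are needed more often when course and speed change.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float]


@dataclass(frozen=True)
class MotionUpdate:
    t: float            # time the update was sent, in seconds
    position: Vec
    velocity: Vec
    acceleration: Vec


def extrapolate(update: MotionUpdate, now: float) -> Vec:
    """Predict the avatar's position at time `now` from its last update."""
    dt = now - update.t
    px, py = update.position
    vx, vy = update.velocity
    ax, ay = update.acceleration
    return (px + vx * dt + 0.5 * ax * dt * dt,
            py + vy * dt + 0.5 * ay * dt * dt)


def needs_update(update: MotionUpdate, now: float, actual: Vec, tolerance: float) -> bool:
    """Assumed rule: rebroadcast once receivers' prediction is off by more than tolerance."""
    predicted = extrapolate(update, now)
    error = math.hypot(actual[0] - predicted[0], actual[1] - predicted[1])
    return error > tolerance


last = MotionUpdate(t=0.0, position=(0.0, 0.0), velocity=(1.0, 2.0), acceleration=(0.0, -1.0))
print(extrapolate(last, 2.0))
print(needs_update(last, 2.0, actual=(4.0, 2.0), tolerance=0.5))
```

An avatar moving along a fixed course stays close to its predicted path and sends little; one that turns or changes speed often exceeds the tolerance quickly and sends more.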
More research is also needed for determining the ideal user interface for these interactive environments. The Computer-Human Interaction community is actively examining these issues, but the results have not yet been applied to the current generation of VRML browsers, which often have a clumsy 3D navigation interface. More research is also needed in the virtual presentation of information and hypertext links. Should links always be presented with a standard interface such as doors? Or should the presentation of links always be left up to the designer of the world? How should users interact with information within a virtual environment? Should the browser switch between 3D navigation mode and 2D content presentation mode? Or should information be presented in a separate window, while the 3D navigation and user interaction takes place in the primary window?
Another area that we are pursuing is electronic commerce. The Habitat group has been very vocal in expressing the importance of a virtual economy in creating a compelling experience. We believe that when the virtual environment being discussed is the World Wide Web, instead of a small, closed environment, electronic commerce will be especially important, since digital cash will actually enable a thriving, virtual economy of information and online entertainment. Digital cash will benefit more than just large corporations; users will be able to set up their own virtual "pushcarts" to sell goods and services.
The current Web defines an object as a stream of data referenced by a URL. The notion of a place expands on the current notion of a web object and provides virtual locations in which people congregate. The notion of a person, currently non-existent on the Web, provides the mechanism for multi-user chats and multi-user games, as well as providing basic marketing information.
[2] Anonymous Remailer. Send mail to "help@anon.penet.fi" to retrieve Penet remailer information.
[3] Clark, Tim. "Putting People in Social Computing," Interactive Week. November 27, 1995.
[4] Curtis, Pavel, and Nichols, David. "MUDs Grow Up: Social Virtual Reality in the Real World," in the IEEE Compcon 1994 Conference Proceedings, pp 186-192. 1994.
[5] Farmer, F. Randall, Morningstar, Chip, Crockford, Douglas, "From Habitat to Global Cyberspace," in the IEEE Compcon 1994 Conference Proceedings, pp 186-192. 1994.
[6] Firefly. "http://www.agentsinc.com/"
[7] Gosling, James and McGilton, Henry. "The Java(tm) Language Environment: A White Paper," available from http://java.sun.com/whitePaper/java-whitepaper-1.html. 1995.
[8] Institute of Electrical and Electronics Engineers. IEEE Standard for Information Technology - Protocols for Distributed Interactive Simulation Applications : Entity Information and Interaction. IEEE, New York. 1984.
[9] Merrill Lynch. Merrill Lynch Global Securities Research, September 14, 1995.
[10] Microsoft, "Microsoft Visual Basic Enters the Online Arena With Visual Basic Script, an Open Scripting Solution for Internet Applications." Press Release, Dec. 7th, 1995. Available from http://www.microsoft.com/internet/vbscripr.htm.
[11] Morningstar, Chip and Farmer, F. Randall, "The Lessons of Lucasfilm's Habitat," in Cyberspace, ed. Michael Benedikt. MIT Press, Cambridge. 1991.
[12] Netscape Chat. Available from http://www.netscape.com/comprod/chat.html
[13] Oldenburg, Ray. The Great Good Place: Cafes, Coffee Shops, Community Centers, Beauty Parlors, General Stores, Bars, Hangouts, and How They Get You Through the Day. Paragon House, New York. 1991.
[14] Oikarinen, J. "Internet Relay Chat Protocol," Network Working Group RFC 1459 (May 1993). Network Information Center.
[15] The Palace. "http://www.thepalace.com/"
[16] Reid, Elizabeth M. "Electropolis: Communication and Community on Internet Relay Chat," Honours Thesis, University of Melbourne. 1991.
[17] Smith, Marc A. "Voices from the WELL: The Logic of the Virtual Commons," Master's Thesis, University of California, Los Angeles. 1991.
[18] Turing, A.M. "Computing Machinery and Intelligence," Mind, 59, pp 433-460. 1950. Weizenbaum, J. Computer Power and Human Reason. W.H. Freeman, San Francisco. 1976.
[19] Verisign. "http://www.verisign.com/"
[20] VRML. "http://rosebud.sdsc.edu/vrml/"
[21] WorldsAway. "http://www.worldsaway.com/"