People, Places, and Things: The Next Generation Web

		James Gwertzman        Margo Seltzer 
		Microsoft Corporation  Harvard University 
		One Microsoft Way      Pierce Hall
		Redmond, WA 98007      Cambridge, MA 02138
		jamesgw@microsoft.com  margo@eecs.harvard.edu

Abstract

Today's World-Wide Web was designed for sharing information among many users, yet it lacks abstractions for people and places. Browsing the Web is a solitary activity because Web clients and servers do not natively support multi-user interaction. This paper discusses these limitations in the current Web architecture and proposes a set of extensions to improve the multi-user, interactive nature of the Web.

1 Current Web Limitations

The current navigation metaphor for the World-Wide Web consists of pages of content and hyper-links between them. The system was designed for sharing information, and it satisfies that need well. However, the Web is moving away from being a static information repository towards becoming an interactive, dynamic environment. Sun's Java [7] has attracted a great deal of attention as one way to facilitate the creation of interactive Web content, but many other solutions are quickly coming to market. In this dynamic environment, hyper-text links no longer simply connect HTML pages; they link miniature "applets" built out of OLE controls and Netscape plug-ins, wired together, for example, with Visual Basic Script [10].

The Web may be evolving from a static world of graphic images and informative text to an ever changing source of customized information, but unfortunately, there is nothing inherent in the Web's structure that takes full advantage of the networked nature of the Internet.

Consider a variation on the classic Turing test for artificial intelligence [18]. Take a snapshot of the files from several popular web servers, and place them on a computer's hard drive. Install a modified web server that accesses these files with appropriately added network delays to emulate connecting with a remote server. Now take a second system that is actually connected to the Web, and install a gateway that can only access the selected sites. Our new test, called the Connectivity Test, is passed if a user can distinguish between the local server and the remote server.

We claim that most of today's Web sites would fail the Connectivity Test. This is disappointing, since one of the benefits of being online comes from real-time social interaction with other users. Humans have a profound need to interact with other humans, and the value of online interaction has been proven [17]. It has been estimated that 60% of the activity on America Online, for example, comes from communicating with other users, with 30% of that directly from time spent in their chat rooms [9].

2 Social Interaction

The need for online interpersonal interaction has not been lost on the purveyors of virtual worlds, companies that create virtual environments in which users can interact, often graphically. WorldsAway [21], The Palace [15], and Microsoft's V-Chat are just a few. In these environments, users can create online alter-egos, frequently called avatars, with which they can explore a graphical environment and interact with other users. MUDs, or Multi-User Dungeons, provide similar capabilities in text-only form, although newer research MUDs incorporate multimedia [4]. The problem with these sites is that they are isolated environs, unconnected with each other or with other network resources. It is possible to conduct very sophisticated interactions between people within one of these sites, but it is impossible to go over the walls of the closed system and interact with other Internet objects.

One company, Electronic Communities, realizes this need and is trying to fill it with their cyberspace operating system, COS [5]. We believe that the Web is sufficiently extensible that replacing it is not necessary. Instead, we present extensions possible within the existing Web infrastructure that facilitate full social interaction. The extensions we propose have evolved from the virtual communities that have existed for over ten years. Habitat, one of the first virtual worlds, was built in 1984 for the Commodore 64, and is still around today in various forms [21]. Habitat's creators published one of the seminal works on virtual communities, describing their experiences managing one of the first virtual worlds [11]. One of the most important lessons learned from Habitat is that sophisticated graphics and rapid interactivity are not required for complex behaviors to emerge.

Sociologists quickly picked up this observation, and a number of papers have been written describing the virtual communities that have developed in text-only, online forums [3, 4, 11, 16, 17]. These forums share several characteristics that we believe must be brought to the World Wide Web, the most important of which is the notion of self. In all of the online communities, establishing an online identity is, by far, the most rewarding aspect of being online. Smith states that establishing an online identity is the only reason why people spend so much time online in newsgroups, answering questions left by other users [17].

On the World Wide Web, the popularity of creating one's own "vanity license plate" or personal web page attests to this need to establish identity. In order to incorporate people fully into the Web, we must extend the notion of self further, so that users may be identified as they browse the web, not merely as they create it. Building a "Great Good Place" on the Web [13] requires giving users the ability to identify each other as they frequent similar locations.

The number of forums that provide such services is growing; Agents, Inc. uses its music recommendation service, Firefly [6], as a means for building community. The lack of an overall Web architecture means that users must restrict their avatars to a single "hangout", since their identity cannot travel with them as they move about the web. This is good news for the companies that provide these "hangout spots" but bad news for the end user, who would like to be able to traverse the entire Web, identity intact.

3 A Proposal

Netscape's Chat product [12] is a step in the right direction, but building community on the World Wide Web is more complicated than simply tacking chat rooms on to existing web pages. People need places online where they can interact with each other to share information, and they need persistent identities with which to interact.

Fortunately, adding these new abstractions for people and for places onto the current Web infrastructure is not difficult. HTTP already provides many of the necessary hooks; no new hardware needs to be deployed. Clients and servers simply need to agree on common protocols.

3.1 Places in Cyberspace

The easiest abstraction to understand is that of the place. The American Heritage Dictionary [1] defines place as:

  1. a. An area with definite or indefinite boundaries; a portion of space. b. Room or space, especially adequate space: There is place for everyone at the back of the room.
  2. a. The particular portion of space occupied by or allocated to a person or thing. b. A building or an area set aside for a specified purpose: a place of worship.

A place is clearly a stateful entity, while most web sites today are stateless. This is partly a direct fallout of the stateless HTTP protocol, and partly the Web's focus on accessing shared information. There is no need to stay connected to a Web site once you have retrieved its information. Sites that provide real-time information do so in non-uniform manners.

According to the above definition, a Web site is not a place for people because no space has been allocated for them. The sense of going somewhere on the Web ("Have you been to that new online art museum?") is generated solely by content, and not by interaction with other users.

Creating places on the Web for people means setting space aside for them. A Web place does not necessarily have to be a Web server; a Web place is a virtual construction that might span several servers, or that might share one server with a number of places.

Figure 1: "Things" on a current Web site: a collection of pages that make up a virtual place.

Figure 2: "Persons" and "Things" on a virtual Web place: the place allows people to see each other as well as the content.


Extending the Web to support places can be very simple. A place consists of three components:

The communication substrate is a multi-cast group. Whenever the HTTP server replies to a request for any of the objects that comprise the place, it must also return a new HTTP response header entitled "MULTICAST-URI" that allows the client to identify and interact with the collection of people who are currently in the room. This URI must be globally unique, and the protocol that the URI refers to must support the requisite semantics for managing a group of users. This group must support the JOIN, LEAVE, and BROADCAST operators and a property that identifies the set of users currently visiting the place. As HTTP currently has no group communication protocol, for the remainder of this discussion, we will adopt the IRC protocol [14] as our multi-cast scheme. In this case, an IRC channel identifies the multi-cast group for a specific place. It is important to note, however, that IRC implies a text-based protocol, and we look forward to the availability of richer multi-cast protocols that will allow for audio, video, etc.
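
As a rough illustration of the group semantics described above, the following Python sketch models the multi-cast group behind a place as an in-memory object. The class name PlaceGroup, its method names, and the announcement strings are hypothetical and are not part of the proposal; a real deployment would delegate these operations to an IRC channel or some other group protocol.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical callback type: (sender, message) -> None
Handler = Callable[[str, str], None]


@dataclass
class PlaceGroup:
    """In-memory stand-in for the multi-cast group attached to a place."""
    uri: str  # globally unique identifier, e.g. the MULTICAST-URI value
    handlers: Dict[str, Handler] = field(default_factory=dict, repr=False)

    def join(self, user: str, handler: Handler) -> None:
        """JOIN: announce the arrival to current members, then add the user."""
        self.broadcast(user, "has entered the place")
        self.handlers[user] = handler

    def leave(self, user: str) -> None:
        """LEAVE: remove the user, then announce the departure."""
        self.handlers.pop(user, None)
        self.broadcast(user, "has left the place")

    def broadcast(self, sender: str, message: str) -> None:
        """BROADCAST: deliver a message to every member except the sender."""
        for name, handler in list(self.handlers.items()):
            if name != sender:
                handler(sender, message)

    @property
    def members(self) -> List[str]:
        """The set of users currently visiting the place, as a sorted list."""
        return sorted(self.handlers)
```

A client would call join() when it first retrieves an object belonging to the place and leave() when the user navigates away; the members property corresponds to the "who is here" information that the chat interface displays.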

One advantage of using an HTTP header for relaying group information is that the header can be attached to any type of file distributed by the server, including both HTML and VRML pages, as well as postscript files, graphical images, or applets. Any of these files can be part of a place. Another advantage to using headers to distribute group information is that the client may retrieve the MULTICAST-URI using the HTTP HEAD request, without needing to download the description of the place itself.
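
To make the header mechanism concrete, here is a minimal Python sketch of what a place-aware client might do with the response to a HEAD request. The function names, the example host, and the irc:// URI form are assumptions made for illustration; the proposal only requires that the MULTICAST-URI value be globally unique. Port 6667 is used as a default because it is the conventional IRC port.

```python
from email.parser import Parser
from typing import Optional, Tuple
from urllib.parse import urlsplit


def multicast_uri(response_head: str) -> Optional[str]:
    """Return the MULTICAST-URI value from a raw HTTP response head.

    Returns None when the object does not belong to a place.
    """
    lines = response_head.splitlines()
    # Skip the status line; the remainder is an RFC 822 style header block.
    headers = Parser().parsestr("\n".join(lines[1:]), headersonly=True)
    return headers.get("MULTICAST-URI")  # lookup is case-insensitive


def irc_target(uri: str) -> Tuple[str, int, str]:
    """Split an irc:// URI (an assumed form) into host, port, and channel."""
    parts = urlsplit(uri)
    if parts.scheme != "irc" or not parts.hostname:
        raise ValueError(f"not an IRC place URI: {uri!r}")
    channel = "#" + parts.path.lstrip("/")
    return parts.hostname, parts.port or 6667, channel
```

For example, a response head containing the line "MULTICAST-URI: irc://irc.example.org/art-museum" would direct the browser to the channel #art-museum on irc.example.org, without the browser ever downloading the body of the place.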

A further advantage to this solution is that it makes the proper distinction between static information that may be cached, and dynamic information that may not. The static files that describe a place may be cached by intermediate caching proxies, thus saving bandwidth. The dynamic information describing the ever changing collec-tion of users in the room is distributed through more appropriate, non-cacheable channels.

As mentioned earlier, we expect that better multi-cast protocols must be used to integrate people and places into the Web seamlessly. In the meantime, however, an acceptable model consists of configuring a web server with a cooperating chat server to create the multi-cast channel for any places that may exist, in part, on that Web server.

Figure 3: Diagram of server operation. 1) The web browser accesses a place. 2) The server returns the IRC channel associated with the place. 3) The browser makes a connection to an IRC server. 4) A user in a different country requests the place through a proxy server. 5) The proxy server retrieves a copy of the place file. 6) The proxy server returns the IRC channel to the user. 7) The user connects to a local IRC server. 8) The local IRC server connects to other IRC servers.

Browsers must also be modified to support the notion of place. When the "place-aware" browser encounters a web object that belongs to a place, it receives the chat channel in the MULTICAST-URI header field. The browser is then able to establish a chat connection on the designated channel. We are experimenting to determine the optimal user interface; our planned prototype will display the pages of content that make up the things in the place with the traditional browser interface, and it will launch a separate chat interface to allow users to interact with each other.

The chat interface must allow users to highlight interesting content. This involves modifying the browser to support a shared "whiteboard" on which a place's occupants can draw or point out content details (e.g. "The answer to that question is here; take a look at those details in the picture.") The interface must also allow users to transport together from place to place (e.g. "If you like antique cars, we should go here."). This facility could be used to produce Internet tours or interactive, online lectures.

Adding places to the existing Web infrastructure is relatively simple, but the addition of people requires significant new infrastructure. This is addressed in the next section.

3.2 People in Cyberspace

The challenge in providing an abstraction for people on the Web is twofold: providing a persona that can be carried with a user from place to place, and providing anonymity for the user who wishes to browse content without social interaction. According to the American Heritage Dictionary, a person is:

  1. A living human being. Often used in combination: chairperson; spokesperson; salesperson.
  2. An individual of specified character: a person of importance.
  3. The composite of characteristics that make up an individual personality; the self.
  4. The living body of a human being: searched the prisoner's person.
  5. Physique and general appearance.
  6. Law. A human being or an organization with legal rights and duties.

The above definition tells us that a person has characteristics, a general appearance, and a set of legal rights and duties. A person should be no different on the Internet. The abstraction for a person on the net must consist of:

The dictionary definition of a person includes the notion of legal rights and duties, but an investigation of these rights and duties in Cyberspace is beyond the scope of this paper. We limit our discussion of rights and duties to a single right: the right of anonymity. As we introduce personae into the Internet, we retain the ability to browse anonymously.

A person's name and attributes are merely state that must be communicated between browsers and servers, while authentication introduces substantial infrastructure requirements. There is no single Internet authentication model that provides users with globally unique identifiers. Several companies are interested in solving this problem; the Microsoft Network and Verisign [19] are two examples of corporate entities actively pursuing this area. In the meantime, IRC addresses the unique identifier issue by using a nickname server that helps users select unique names.

Users cannot protect their identities or assign permissions based on identifiers unless the nickname server is coupled with an authentication protocol. As the number of companies conducting business on the Internet grows, we expect that standards for authentication infrastructures will evolve to provide for secure financial transactions. Most are based on public key certificate authorities.

The model we will assume for the remainder of this section is that users carry with them their name, attributes, and authentication information. Initially, interaction between people is not authenticated. A user enters a Web place by requesting an object in that place and by joining the appropriate multi-cast group. The multi-cast group handles the notification message announcing the user's arrival, and relays the user's IP address to the group.

Users in the group can then use the IP address and user ID to connect to the user's local Web browser to download the user's public attribute set. The browser's "chat" interface knows how to display and use this information; it may do different things based on the fields that it finds. If it finds an image property, for example, then it may present this photograph to other users.

Authentication is triggered by mutual consent between two or more parties present in the place. Users can retrieve each other's public keys from their address books, and use these keys to correspond securely. The security hole with this solution is the insecure transfer of IP addresses and user IDs. One user can easily masquerade as another without a central trusted authority performing the IP address distribution.

As a person inhabits a room, that person may choose to reveal private attributes to some or all of the other people in the place. This authentication model can be extended to produce the notion of an authenticated place, giving the server authentication responsibility. The server would only allow certain people access, barring entry to any user that was not properly authenticated.
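
The split between public and private attributes can be sketched as a small data structure. The Persona class below, including its field and method names, is a hypothetical illustration rather than a format defined by this paper, and it deliberately omits the authentication step that would normally precede any disclosure of private attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Persona:
    """A user's portable identity: a name plus public and private attributes."""
    name: str
    public: Dict[str, str] = field(default_factory=dict)
    private: Dict[str, str] = field(default_factory=dict)
    # Users who have been granted access to the private attributes.
    trusted: Set[str] = field(default_factory=set)

    def reveal_to(self, other: str) -> None:
        """Grant another user in the place access to the private attributes."""
        self.trusted.add(other)

    def view_for(self, requester: str) -> Dict[str, str]:
        """Return the attribute set that the requester is allowed to see."""
        visible = {"name": self.name, **self.public}
        if requester in self.trusted:
            visible.update(self.private)
        return visible
```

In this sketch, a browser answering a request for a user's attributes would call view_for() with the identity of the requester, so that strangers see only the public set.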

Anonymous browsing is incorporated easily into this model, simply by having a user's browser not join the multi-cast group. A user could even join the multi-cast group, but not transmit an IP address, or transmit the IP address, but limit the publicly available address book properties. Unfortunately, in today's model there is no way to prevent a server from collecting and/or revealing a user's IP address. One approach to providing anonymity is to follow the lead of anonymous remailers [2]. An anonymous HTTP proxy server could accept HTTP requests on behalf of clients, retrieve pages, and forward the responses back to the initiating client. Implementing such an anonymous proxy server requires only minor modifications to any of the available proxy caches, such as the CERN proxy cache.
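
Because such a proxy issues each request from its own address, the origin server never sees the client's IP address. A proxy of this kind would presumably also want to drop request headers that identify the user. The sketch below shows that filtering step only; the function name and the particular list of headers are assumptions chosen for illustration, not a list taken from any existing proxy.

```python
from typing import Dict

# Assumed set of request headers that could identify the user (lower-case).
IDENTIFYING_HEADERS = {"from", "referer", "user-agent", "cookie", "authorization"}


def anonymize(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the request headers with identifying fields removed."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in IDENTIFYING_HEADERS
    }
```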

Adding people to the Web is more challenging than adding places, primarily due to the authentication issues and the competing goals of providing interaction and protecting privacy. Unless it becomes easy for all users to subscribe to such a model, it will be very difficult to achieve the critical mass that any new Internet proposal needs in order to achieve widespread adoption.

4 Future Work

We are currently building a community Web prototype to test out our ideas. We will use this prototype to explore other extensions to the Web that will help build community on the Web. One such needed extension is a directory service to maintain both persistent and dynamic user information. In order to locate a friend, for example, it will be necessary to look up that friend's current IP address. This information will be used to locate the user's Web browser, and the Web browser will provide the friend's current location.

The extensions that we have described bring together the Web and virtual environments. Online services such as America Online and CompuServe already support these environments, but it is in their vested commercial interest to maintain them as closed communities, separate from the World Wide Web. The integration of the World Wide Web and virtual environments will evolve from both directions.

The text-based environments are beginning to introduce a graphical component, where users interact with graphical avatars in 3D environments. Similarly, our future work focuses on extending the World Wide Web to incorporate the interaction found in virtual environments. Such a transition requires little additional work, outside of adding the 3D environments, and the VRML [20] description language already provides one standard for building a graphical world.

VRML does not, however, support the display of or interaction between multiple users. Using multi-cast for multi-user communication is sufficient for text-based interactions; adding graphical interactions is not difficult. The user's personal attributes must be extended to provide a visual representation of the user's avatar. This might be as simple as carrying a bitmap or a set of bitmaps for simple animation, or as complex as a Java class describing the avatar's programmed behaviors. The multi-cast protocol must be extended to filter messages based on geographical proximity, a step that might require the introduction of multi-cast filters that reside on central servers.

These geographical filters prevent users from being overwhelmed by message traffic in large rooms, since a user should logically only be able to hear conversations that occur nearby. Naturally there must be some way to decide how far one's voice carries and to provide the notion of whispering (being heard only by those very close by) and public speaking (being heard by a large number of people in the vicinity).
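
A geographical filter of this kind reduces to a distance test. The following Python sketch assumes avatars have two-dimensional coordinates and that each speaking volume carries a fixed distance; the volume names, the distances, and the function name are all hypothetical values chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Position = Tuple[float, float]

# Assumed carrying distance for each volume, in arbitrary world units.
RANGES = {"whisper": 2.0, "speak": 10.0, "shout": 50.0}


def audience(sender: str, volume: str, positions: Dict[str, Position]) -> List[str]:
    """Return the users close enough to the sender to hear a message."""
    origin = positions[sender]
    limit = RANGES[volume]
    return sorted(
        name
        for name, pos in positions.items()
        if name != sender and math.dist(origin, pos) <= limit
    )
```

A filter residing on a central server could apply audience() to each incoming message and forward it only to the users returned, so that traffic in a crowded room grows with local density rather than with total occupancy.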

The multi-cast protocol must also be used to distribute location information; each avatar must broadcast its location to nearby clients so that users see similar views of the environment. The distance filtering function is important here as well, since the exact location of users who are far away will not be as important. In practice, the frequency with which each avatar broadcasts its location information will differ with different "shouting" ranges. An avatar might broadcast its location every two or three seconds very loudly (heard at long range), once a second more moderately (heard in moderate range), and twice a second very softly. This reduces the amount of traffic that needs to flow across the Internet and through each user's modem, since users only need accurate distance information for avatars to whom they are very close.
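
The effect of this tiered schedule on a listener can be worked out with a few lines of Python. The broadcast periods below follow the example in the text (taking three seconds for the long-range tier); the ranges attached to each tier and the function name are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (carrying range in world units, seconds between broadcasts) for each tier.
# The periods follow the example in the text; the ranges are assumed.
TIERS: List[Tuple[float, float]] = [
    (50.0, 3.0),  # very loud: heard at long range
    (10.0, 1.0),  # moderate: heard in moderate range
    (2.0, 0.5),   # very soft: heard only close by
]


def update_interval(distance: float) -> Optional[float]:
    """Return the shortest gap between location updates heard at a distance.

    A listener hears every tier whose range covers it, so the freshest
    information comes from the most frequent of those tiers. Returns None
    when the listener is beyond the range of every tier.
    """
    periods = [period for reach, period in TIERS if distance <= reach]
    return min(periods) if periods else None
```

Under these assumed ranges, a listener one unit away receives an update every half second, while a listener thirty units away receives one only every three seconds.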

Since most clients on the Internet today are limited in the amount of information they can exchange rather than the amount of processing power they have available, it might also make sense for clients to predict avatar movement based on predictive movement models, similar to those used by the military as part of the SIMNET protocol [8]. Here, each avatar broadcasts not only its location, but also its velocity and acceleration. Browsers use this information to calculate each avatar's current location in the absence of updates. With this model, avatars need to broadcast location information more frequently when changing direction and speed frequently than when moving along a fixed course.
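
The prediction step described above amounts to extrapolating with constant acceleration: position + velocity * dt + 0.5 * acceleration * dt^2. The Python sketch below shows that calculation together with one plausible rule for deciding when to send a fresh update, namely when the avatar's true position has drifted from the predicted one by more than some tolerance. The names and the tolerance rule are assumptions for illustration, not details taken from [8].

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float]


@dataclass
class AvatarState:
    """The last state an avatar broadcast to its neighbors."""
    position: Vec
    velocity: Vec
    acceleration: Vec
    timestamp: float  # seconds


def predict(state: AvatarState, now: float) -> Vec:
    """Extrapolate the avatar's position at time `now` from its last broadcast."""
    dt = now - state.timestamp
    x, y = (
        p + v * dt + 0.5 * a * dt * dt
        for p, v, a in zip(state.position, state.velocity, state.acceleration)
    )
    return (x, y)


def needs_update(last_sent: AvatarState, actual: Vec, now: float, tolerance: float) -> bool:
    """Decide whether the sender should broadcast a fresh state.

    True when the position that remote browsers are predicting has drifted
    from the avatar's actual position by more than `tolerance`.
    """
    return math.dist(predict(last_sent, now), actual) > tolerance
```

An avatar moving along a fixed course stays within tolerance and sends nothing, while one that turns or changes speed quickly exceeds it and broadcasts, which matches the behavior described in the text.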

More research is also needed for determining the ideal user interface for these interactive environments. The Computer-Human Interaction community is actively examining these issues, but the results have not yet been applied to the current generation of VRML browsers, which often have a clumsy 3D navigation interface. More research is also needed in the virtual presentation of information and hypertext links. Should links always be presented with a standard interface such as doors? Or should the presentation of links always be left up to the designer of the world? How should users interact with information within a virtual environment? Should the browser switch between 3D navigation mode and 2D content presentation mode? Or should information be presented in a separate window, while the 3D navigation and user interaction takes place in the primary window?

Another area that we are pursuing is electronic commerce. The Habitat group has been very vocal in expressing the importance of a virtual economy in creating a compelling experience. We believe that when the virtual environment being discussed is the World-Wide Web, instead of a small, closed environment, electronic commerce will be especially important, since digital cash will actually enable a thriving, virtual economy of information and online entertainment. Digital cash will benefit more than just large corporations; users will be able to set up their own virtual "pushcarts" to sell goods and services.

5 Conclusions

This paper presents a set of extensions to the current World-Wide Web infrastructure that enable a new class of multi-user applications that are currently either extremely difficult or impossible to develop. Like the current Web infrastructure, these provisions make minimal assumptions about the Web client and are therefore applicable to both browser-based and stand-alone multi-user applications.

The current Web defines an object as a stream of data referenced by a URL. The notion of a place expands on the current notion of a web object and provides virtual locations in which people congregate. The notion of a person, currently non-existent on the Web, provides the mechanism for multi-user chats and multi-user games, as well as providing basic marketing information.

6 Bibliography

[1] The American Heritage® Dictionary of the English Language, Third Edition. Houghton Mifflin Company.

[2] Anonymous Remailer. Send mail to "help@anon.penet.fi" to retrieve Penet remailer information.

[3] Clark, Tim. "Putting People in Social Computing," Interactive Week. November 27, 1995.

[4] Curtis, Pavel, and Nichols, David. "MUDs Grow Up: Social Virtual Reality in the Real World," in the IEEE Compcon 1994 Conference Proceedings, pp 186-192. 1994.

[5] Farmer, F. Randall, Morningstar, Chip, and Crockford, Douglas. "From Habitat to Global Cyberspace," in the IEEE Compcon 1994 Conference Proceedings, pp 186-192. 1994.

[6] Firefly. "http://www.agentsinc.com/"

[7] Gosling, James and McGilton, Henry. "The Java(tm) Language Environment: A White Paper," available from http://java.sun.com/whitePaper/java-whitepaper-1.html. 1995.

[8] Institute of Electrical and Electronics Engineers. IEEE Standard for Information Technology - Protocols for Distributed Interactive Simulation Applications : Entity Information and Interaction. IEEE, New York. 1984.

[9] Merrill Lynch. Merrill Lynch Global Securities Research, September 14, 1995.

[10] Microsoft, "Microsoft Visual Basic Enters the Online Arena With Visual Basic Script, an Open Scripting Solution for Internet Applications." Press Release, Dec. 7th, 1995. Available from http://www.microsoft.com/internet/vbscripr.htm.

[11] Morningstar, Chip and Farmer, F. Randall, "The Lessons of Lucasfilm's Habitat," in Cyberspace, ed. Michael Benedikt. MIT Press, Cambridge. 1991.

[12] Netscape Chat. Available from http://www.netscape.com/comprod/chat.html

[13] Oldenburg, Ray. The Great Good Place: Cafes, Coffee Shops, Community Centers, Beauty Parlors, General Stores, Bars, Hangouts, and How They Get You Through the Day. Paragon House, New York. 1991.

[14] Oikarinen, J. "Internet Relay Chat Protocol," Network Working Group RFC 1459 (May 1993). Network Information Center.

[15] The Palace. "http://www.thepalace.com/"

[16] Reid, Elizabeth M. "Electropolis: Communication and Community on Internet Relay Chat," Honours Thesis, University of Melbourne. 1991.

[17] Smith, Marc A. "Voices from the WELL: The Logic of the Virtual Commons," Master's Thesis, University of California, Los Angeles. 1991.

[18] Turing, A.M. Computing machinery and intelligence. Mind, 59, 433-460. Weizenbaum, J. (1976). Computer power and human reason. San Francisco, CA: W.H. Freeman.

[19] Verisign. "http://www.verisign.com/"

[20] VRML. "http://rosebud.sdsc.edu/vrml/"

[21] WorldsAway. "http://www.worldsaway.com/"