SandStorm HTTP Server Results

Matt Welsh, Harvard University
Last updated 1 Nov 2000

Here are some early results comparing two classic threaded HTTP server designs (a fixed thread pool and a fork-per-connection server) with an event-driven server using my new primitives (codenamed SandStorm).

Basic server setup: The server in question accepts HTTP connections from clients; for each request it spits out a prepackaged 8192-byte HTML page from memory (no disk access is involved). It supports HTTP/1.1 persistent connections; a client socket is used for 100 HTTP requests before it is closed by the server.
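
For concreteness, here is a rough sketch of how the prepackaged response might be built once at startup. The names (PAGE_SIZE, RESPONSE, buildResponse) are illustrative only, not taken from the actual server code:

// Sketch only: build the canned 8192-byte HTML page plus HTTP/1.1 headers once, at startup.
static final int PAGE_SIZE = 8192;              // size of the HTML body, per the description above
static final byte[] RESPONSE = buildResponse();

static byte[] buildResponse() {
  StringBuffer page = new StringBuffer("<html><body>");
  while (page.length() < PAGE_SIZE) page.append('x');    // pad the body out to exactly 8192 bytes
  page.setLength(PAGE_SIZE);
  String header = "HTTP/1.1 200 OK\r\n"
                + "Content-Type: text/html\r\n"
                + "Content-Length: " + PAGE_SIZE + "\r\n"
                + "\r\n";
  return (header + page.toString()).getBytes();
}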

Client setup: The clients are simple load generators which spin in a tight loop, opening an HTTP connection to the server and issuing requests. In between each request the client delays for 20ms. When the server closes the HTTP connection the client immediately attempts to reopen the connection.
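
In rough pseudocode (in the same style as the server listings below), each client behaves something like the following. The helper names (recordConnectTime, doRequestResponse) and the host/port variables are placeholders, not the actual load-generator code:

while (true) {
  long connectStart = System.currentTimeMillis();
  Socket s = new Socket(serverHost, serverPort);            // (re)open the HTTP connection
  recordConnectTime(System.currentTimeMillis() - connectStart);
  InputStream in = s.getInputStream();
  OutputStream out = s.getOutputStream();
  // Issue requests until the server closes the connection (after 100 request/reply pairs)
  while (doRequestResponse(out, in)) {   // sends one GET, reads the 8192-byte reply; false on EOF
    Thread.sleep(20);                    // 20 ms delay between requests
  }
  s.close();                             // then immediately loop around and reconnect
}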

The server and all clients are 4-way 500 MHz Pentium III machines running Linux 2.2.15 with IBM JDK 1.1.8. All nodes are connected with Gigabit Ethernet.

Thread-pool server implementation: This implementation is meant to resemble the setup used by Apache. It forks a fixed-size pool of 150 threads. Each thread spins in a loop doing the following:

while (true) {
  Socket clientSocket;
  // Serialize accept() across the pool; only one thread accepts at a time
  synchronized (lock) {
    clientSocket = serverSocket.accept();
  }
  // 100 request/response pairs per persistent connection
  for (int i = 0; i < 100; i++) {
    readHeader();
    sendResponse();
  }
  clientSocket.close();
}

The choice of 150 threads and 100 requests per connection is identical to that used by "out of the box" Apache configurations.

Forking server implementation: This is meant to emulate a "dumb" threads-based implementation which simply forks one thread for each client connection. It makes no attempt to limit the number of threads that it creates. It uses persistent connections and closes down the socket (and kills a server thread) after 100 request/reply pairs.
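
Its main loop looks roughly like the sketch below, using the same readHeader() and sendResponse() placeholders as the thread-pool listing above:

while (true) {
  final Socket clientSocket = serverSocket.accept();
  // Fork a new thread for every connection; no bound on the total number of threads
  new Thread(new Runnable() {
    public void run() {
      // 100 request/reply pairs per persistent connection, then close and let the thread die
      for (int i = 0; i < 100; i++) {
        readHeader();
        sendResponse();
      }
      clientSocket.close();
    }
  }).start();
}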

Event-driven server implementation: This implementation makes use of the 'SandStorm' system that I'm building. A single application thread receives packet events from an underlying nonblocking network layer. It tests incoming packets for the end of an HTTP request, at which point it pushes the 8192-byte response to the appropriate client socket. Most of the details are hidden in the network layer, which uses 3 threads: one for reading data, one for writing data, and one for accepting new socket connections. Therefore the application has a total of 4 threads.
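
The SandStorm interfaces themselves aren't shown in this note, so the following is only a guess at the shape of the application-level handler; PacketEvent, eventQueue, networkLayer, and the connection-state methods are all hypothetical names:

// Sketch of the single application thread's request-handling loop (hypothetical interfaces).
while (true) {
  PacketEvent event = eventQueue.dequeue();       // filled in by the network layer's read thread
  ConnectionState conn = event.getConnection();
  conn.append(event.getData());                   // accumulate partial HTTP request bytes
  if (conn.sawEndOfRequest()) {                   // blank line ending the request header
    networkLayer.enqueueWrite(conn, RESPONSE);    // hand the 8192-byte page to the write thread
    conn.reset();
  }
}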


Results: Throughput

This graph shows the aggregate throughput (in terms of completions per second) as a function of the number of clients for each server. As we can see, both the event-driven and thread-pool servers sustain high throughput regardless of the number of clients -- although the thread-pool server degrades somewhat at 1000 clients. The forking server cannot run above 500 clients -- it runs out of threads! This is because under Linux kernel 2.2, the maximum number of processes per user is set at 512.

This graph is misleading, however, because it does not tell us how long the clients had to wait to be serviced, or how many clients were serviced at one time. Recall that the threadpool server has just 150 threads, meaning that although it can sustain about 3000 requests/second with 1000 clients, only 150 of those clients are active at any time. The event-driven server should be able to sustain much higher concurrency.


Results: Connect time

This graph shows a histogram of the connect time frequencies for the event-driven server. (The connect time is measured as the amount of time between a client attempting to open a socket and the socket open completing.) As we can see, with 100 clients the connect times are very low -- less than 10 ms in most cases (the maximum time was 3534 ms). With 1000 clients the times are a bit more spread out, with the maximum being 3116 ms.

This graph shows a histogram of the connect time frequencies for the threadpool server. For the 100-client case the maximum time is 3547 ms, but for the 1000-client case the maximum time is 183340 ms. This indicates that the threadpool server (with just 150 threads) is falling behind in processing client requests, even though it sustains high aggregate throughput.

This graph shows a histogram of the connect time frequencies for the forking server. For the 100-client case the maximum time is just 243 ms, but for the 500-client case the maximum time is 45201 ms. This indicates that the server is falling behind in processing client requests, even though it forks one thread per client connection and sustains high aggregate throughput.


Results: Response time

This graph shows a histogram of per-request response times for the event-driven server. For the 100-client case the response times are clustered close to the origin, the maximum being 1255 ms. For the 1000-client case we see more interesting behavior; there is a hump around 400 ms, and the tail is much longer -- the maximum time is 12518 ms.

This graph shows a histogram of per-request response times for the threadpool server. For the 100-client case the response times are again clustered close to the origin, the maximum being only 557 ms. But for the 1000-client case there is a very heavy tail, and the maximum response time is 190766 ms. Also note the interesting bursts at 3000-ms intervals (these continue off the edge of the graph). This is due to Linux's default TCP SYN timeout, which happens to be 3000 ms. If a SYN packet is dropped, the client waits 3000 ms before retransmitting. Under heavy load the server appears to drop more SYN packets (or at least fail to process them in a timely fashion), causing the client to retransmit.

This graph shows a histogram of per-request response times for the forking server. For the 100-client case the response times are again clustered close to the origin, but the maximum is 9617 ms. Note the cluster of response times around 9000 ms; this is no doubt due to the TCP SYN timeout issue described above. Since the retransmission timeout doubles on each attempt, a client whose first two SYN packets are dropped completes its connection roughly 9 seconds (3 s + 6 s) after the first attempt. For the 500-client case there is a very heavy tail, and the maximum response time is 48136 ms.

