CGI and WWW Server Measurement

Michael Courage and Stephen Manley
{courage, manley}@eecs.harvard.edu

Full Paper (postscript 193k)

Introduction

Lots of people have suggestions for improving World Wide Web performance, but very few carefully analyze which aspects of the system should be measured and improved.

It is our belief that latency (as observed by Web clients) is the most important factor in user-perceived performance. Current benchmarks tend to emphasize throughput (Mbit/s, connections/s, etc.), and as a result they are poor predictors of a user's experience. Users are not interested in the number of connections that can be processed every second. They only care about the time between when they click on a link and when the document arrives on their screen.

In addition, no benchmark we have seen takes into account documents generated through the Common Gateway Interface (CGI). Some types of scripts significantly increase the load on the server, and as a result they add latency to all requests.

We have developed a benchmark, WebStone-SS, that simulates realistic workloads in a reproducible way. We analyzed logs taken from real Web servers to produce a configuration file that drives the benchmark.

With this tool we were able to measure both the effect of realistic workloads on CGI performance and the effect of CGI scripts on overall server load.

WebStone-SS

We decided that to conduct a reasonable analysis of CGI traffic and its effect on server performance, we first needed a benchmark that could model real WWW traffic. Statistical analysis of server logs indicated that a site's behavior can be modeled by a set of simple configuration variables.

Access pattern heuristics

To generate the configuration files that drive the benchmark, we extract higher-level usage patterns from WWW logs using simple heuristics. Users are identified by their IP addresses. Any set of files that a user downloads within 15 seconds of each other is considered a single page. A group of pages downloaded with less than 5 minutes between each download is considered a session.

Reduction Heuristics

To reduce the size of the data set, we used three more heuristics to add a level of abstraction.
Combine if                                              Factor of reduction
  • Same file sets                                      2.1
  • Same number of files, 75% of the file sets in
    common, and each of the remaining 25% of files
    about the same size                                 4.7
  • Same front (.html) page                             2.5

CGI Analysis

Languages and Performance

The advanced string handling capabilities built into the Perl programming language are very well suited to CGI programming. As a result, most CGI scripts are written in Perl.

Perl is an interpreted language, which for many operations runs much slower than compiled languages. In addition, the Perl interpreter is a large program that must be launched every time a Perl script runs. Obviously Perl is a poor choice for large, compute-intensive applications. We set out to answer the following two questions:

  1. Does the start-up time for the large interpreter rule out using Perl for latency-sensitive Web applications?
  2. Does the heavy interpreter increase server load enough to add latency to all requests?
Start-up Cost
To answer the first question we created two CGI programs, one in Perl and one in C, that return a one byte result. We used (unmodified) WebStone 2.0 to pound the server with requests for each of these scripts. Although on average the Perl version had twice the latency of the C version, the noise present in this measurement was quite a bit larger than the difference between the two runs.

C response time | Perl response time | Comparison

We conclude that the start-up cost of the Perl interpreter does not add significant latency to a CGI request.

Interpreter effect on Server Load
We used the WebStone-SS benchmark to measure the effect of running short Perl scripts on a realistically loaded server. We made two versions of the software site configuration, one that called the trivial Perl script for every CGI request, and another that loaded a static document instead. The execution of the Perl interpreter produced no observable effect on measured latency of overall traffic on the site.

Perl vs. No CGI

Effect of Server Load on CGI Performance

We used our WebStone 2.0 micro-benchmark test to compare the behavior of a variety of scripts under differing server loads. Most of the latencies increased linearly with the number of clients up to a certain point, after which the server failed to service requests properly.

Script latency comparison

We see that the counter and the one-byte Perl script were equivalent in terms of latency. Both failed when WebStone reached 30 clients; at this point the error rates jumped to about 80%, and the server's error log reported that it was no longer able to fork off new processes. The Perl script returning 300 kB was slower, but also failed at 30 clients. The Glimpse test, however, failed at only 14 clients. Since the Glimpse script takes so long to execute, the WebStone clients timed out while reading the results.

Effect of CGI on Server Load

To measure the effect of CGI on server load, we made another version of the configuration that called a counter script. We compared the results of this test against the static-file configuration. Running the benchmark with these two configurations, we found that the counter had little impact on the system. In fact, the reported latencies were slightly lower when the CGI was running than when it was not.

Counter vs. No CGI

Since some CGI scripts are much more intensive than counters, we also ran the same trace with the GlimpseHTTP search engine handling all CGI requests. Each time, the engine searched the NetBSD source tree for "mbufs". In this case more than half of all requests (including non-CGI requests) took two to three seconds longer to complete.

Glimpse vs. counter vs. no CGI

We believe this slowdown is caused by the search engine's intense use of both the CPU and the disk. The index files the engine uses total over 4 MB for the NetBSD source tree. Scanning these large files evicts Web documents from the buffer cache, so normal request latency comes to depend on the speed of the disk.

This result shows two things: CGI should not be ignored in analyses of Web traffic; Web developers should be careful to keep the resource usage of their scripts to a reasonable level.


Margo Seltzer / margo@eecs.harvard.edu