2-Bit Diff-Serv Routing in the ALTQ FreeBSD Kernel

Jon Bennett and Christopher Stein

Final Paper for CS244

Harvard University

January 18, 1999

  1. Abstract
  2. This paper briefly describes our experience modifying an operating system kernel to do 2-bit differentiated services (diff-serv) routing and conducting experiments to observe its behavior. We modified the ALTQ FreeBSD kernel and Adserv extensions. We encountered difficulties with Ethernet cards and device drivers. Once the system was working, we ran experiments with bursty premium flows and investigated the lower level RIO queue and its parameter design space.

     

  3. Introduction
  4. The 2-bit scheme is described in the IETF draft "A Two-bit Differentiated Services Architecture for the Internet", by Nichols, Jacobson, and Zhang. The scheme is rather straightforward. Two bits of the IP header are used to classify packets. The first bit specifies if the packet is premium (PR) or not. If the packet is not PR, the second bit specifies if it is assured (AF) or best effort (BE). The PR traffic has its own IP queue. This queue is serviced with absolute priority over the lower level BE/AF queue. Random Early detection with In/Out (RIO) bit is applied to the lower level queue. Under this scheme, both AF and BE traffic classes have independent Random Early Detection (RED) parameters.

     

  5. Organization
  6. Section 4 describes the software systems that we have worked with in developing a working 2-bit diffserv router. Section 5 discusses our first experiments with PR and BE traffic and how this led us to the discovery of unexpected Ethernet card behavior. Section 6 briefly discusses our PR experiments. Section 7 looks at RIO and the BE/AF queue,

     

  7. Implementation
  8. We built on top of two extant systems. The ALTQ kernel developed by Kenjiro Cho of Sony Research is a modified FreeBSD kernel that provides support for several queueing disciplines. This kernel provides an infrastructure for adding new queueing disciplines and configuring device parameters. The Adserv contribution to ALTQ developed by Octavio Medina of France Telecom provides support for a rough approximation of 2-bit Diff-Serv. Our code development includes modifications to the Adserv queueing code so that premium traffic is served with absolute priority. As well, we wrote device interface routines for specifying queue lengths and RIO parameters from the shell.

     

  9. Ethernet Device Driver
  10. The implementation of 2-bit diff-serv in the ALTQ kernel contained a number of problems that had to be addressed before we could begin generating useful experimental results,. However, it was the interaction of the ALTQ diff-serv code with the device driver for the Inter FXP Ethernet card that caused the most difficulty.

    1. Experimental Results
    2. For our first experiments we chose simply to verify that the ALTQ code successfully supported the multiple queues for BE traffic and PR traffic. Testing BE vs. PR was the "easiest" test of the ALTQ code that we could perform, in the sense that implementing two FIFOs should be more straightforward then implementing the double accounting needed for running red on the BE and AF flows. To perform this test we sent five BE flows from one source workstation and one PR flow from the second source workstation. The queue sizes were set at 200 packets for the BE queue and 200 packets for the PR queue. Since the PR queue is served with highest priority and there is only one PR flow we should expect that there would be no more than one packet at a time in the PR queue. Furthermore, we should also expect to see the PR flow receive 100% of the available bandwidth (~1.2Kbps) since its window size of 64KB is larger than the bandwidth delay product between the source and sink workstations.

      Figure 1 Initial Results

      In Figure 1 we see the results of out first experiment with the PR flow shown in blue achieving only 400Kbps! Even more surprisingly, the accounting output on the bottleneck router indicated that not only was the PR queue growing but if the queue size was set to only 20 packets there was significant packet loss by the PR flow. Another odd behavior that we observed was that it took many 10’s of seconds for the PR flow to ramp up to its maximum speed, despite the fact that it should have ramped up almost instantly. Increasing the PR queue to 200 packets only eliminated the packet loss but not any of the other behaviors. These results seemed to defy explanation and they forced us to conduct a significant number of tests to track down the source of this behavior.

    3. Explanation of Anomalies
    4. At first we suspected a number of areas in the ATLQ diff-serv code of being responsible for these strange behaviors. After removing all the suspect code the problem still remained. Eventually we even went as far as implementing from scratch a two queue static priority server, which still displayed the same behavior! While still not being able to understand how it was possible for there to be queue growth in the PR queue we decided to first explore why the PR flow was not receiving 100% of the bandwidth. Since we were using TCP flows for our traffic sources and there was packet loss on the PR queue it was reasonable to assume that TCP congestion control was causing the PR flow to not receive 100% of the bandwidth.

      To test this theory we first increased the PR queue size until there was no packet loss by the PR flow. This change had no effect on the behavior. Next we tried using a greedy UDP source for the PR flow. With a UDP PR flow we achieved exactly the results that we expected. The PR flow immediately received 100% of the bandwidth. When no packets were being lost the only difference between the TCP and UDP sources was that the TCP flow still was constrained by its window flow control to not put more that 64KB of data into the network at a time. This would turn out to be the key factor. Increasing the TCP window size of the PR flow from 64KB to 256KB caused the PR flow to receive 100% of the link bandwidth. However the TCP PR flow still took many 10’s of seconds to ramp up while a UDP PR flow would ramp up immediately.

      The explanation of these results turned out to be caused by a 128 packet FIFO used to allow direct memory access (DMA) transfers between the kernel and the Ethernet card. This 128 packet DMA FIFO was added to the FreeBSD kernel to reduce the number of interrupts that the kernel would have to handle from 1 interrupt per packet to 1 interrupt per 128 packets. Unfortunately this had the effect that the PR and BE traffic would share a queue. With a TCP window off 64KB and 1500 byte packets the TCP flows would have no more than 43 packets in flight at one time. This is almost exactly 1/3 of the size of the DMA FIFO which matches the results shown in Figure 1 where the PR flow receives 1/3 of the bandwidth.

    5. Further Results
    6. To further explore the effects of the DMA FIFO we compiled a number of kernels with different sized DMA FIFOs.

      Figure 2 shows an experiment with a kernel with a 4 packet DMA FIFO in which the PR flow shown in blue receives 100% of the link bandwidth which seems to make it clear that the DMA FIFO was the cause of the strange behavior shown in Figure 1.

      Figure 2: 4 Packet DMA FIFO

       

      Figure 3 shows the same experiment with a 96 packet DMA FIFO in which the PR flow received ~44% of the link bandwidth. With 43 packets being 44.79% of the 96 packet FIFO we now seem to have a very clear demonstration of cause and effect.

      Figure 3: 96 Packet DMA FIFO

       

      Figure 4 shows the same experiment with a 64 packet DMA FIFO in which the PR flow received ~66% of the link bandwidth. As before this matches the expected results as 43 packets is 67.18% of the 64 packet FIFO.

       

      Figure 4: 64 Packet DMA FIFO

       

      In the last experiment with a 32 packet DMA FIFO the PR flow received 100% of the link bandwidth which is to be expected as the 43 packet TCP window is larger than the 32 packet DMA FIFO.

      Figure 5: 32 Packet DMA FIFO

       

       

       

  11. Premium Service
  12. Our first experiments attempted to discover the impact of a single greedy PR flow on a BE flow. We expected to see a quick ramping up of PR traffic and starvation of BE. At first we did not see this. Our continued attempts to reach this experimental result drove our debugging process, leading us to the discovery of the strange behavior of the Intel EtherExpress cards and the impact of this behavior on the queueing algorithms and end-to-end behavior. Once we had played around with the single flows we moved to tests with a single bursty PR source and several BE senders. Stein discusses these experiments further in his paper "The Effects of Bursty Premium Traffic on Best Efforts Traffic in 2-bit Diff-Serv".

     

  13. Assured/Best Effort Service
  14. The lower level queue runs the RIO scheme. RIO is heavily parameterized. Random early detection (RED) is applied to both BE and AF packets. Each of these two classes has it's own RED parameters. These are the minimum queue threshold; the point at which probabilistic packet dropping begins, the maximum queue threshold; after which all packets are dropped, the drop probability at the maximum threshold, and the alpha used for computing the weighted average length of the queue. The weighted average queue length is used for all drop probability calculations. For the computation of the BE average queue length both BE and AF packets are counted. Only AF packets are counted for the computation of the AF queue length. So the presence of an AF packet in the queue will affect the drop probability of an arriving BE packet while the presence of a BE packet in the queue will have no affect on the drop probability for AF packets. For this single RIO queue we have 8 parameters to worry about. How do we tune these parameters to engineer behavior? Our experimental results indicate that this is a difficult and challenging task. Bennett discusses this further in his paper "Parameter Tuning for Best Effort and Assured Traffic in a Diff-Serv Implementation".

     

  15. Conclusion
  16. While bringing the system to a working state that met the 2-bit specification we had the most difficulty with the Ethernet card and its device driver.

    In a high speed environment DMA transfers may be a requirement to avoid subjecting the main processor to the high interrupt load that would otherwise occur. We expect that the DMA logic of network cards will become more and more complex as network speeds require that processing is batched or offloaded from the main processor. We have observed that these policies at the device and driver levels can have significant effects on end-to-end traffic behavior.