Exponential Smoothing

Dr. Dobb's Journal March 1998

By William Stallings

William is the author of High-Speed Networks: TCP/IP and ATM Design Principles (Prentice Hall, 1998). He can be reached at http://www.shore.net/~ws/ or ws@shore.net.

Whenever your software interfaces with hardware or other software, it's hard to know what to expect. How much memory will be required? How long will it take? The most robust systems measure past performance, then use that to predict future response. In this way, your software can smoothly adapt to varying situations.

In "Improving Kermit Performance" (DDJ, February 1996; available electronically, see "Resource Center," page 3), I used a "decaying average" to predict future delays, then used those predictions to optimize error handling. This month, William explores this technique -- also known as "exponential smoothing" -- in more detail and explains a variety of uses.

-- Tim Kientzle

You often need to guess the next value in a time series based on previous values in the time series. For example, the nonpreemptive scheduling policy known as "shortest process next" (SPN) selects the process with the shortest expected processing time (before it is blocked by an I/O or system call) to run next. For this purpose, the operating system keeps a running average of all of the previous processing bursts for each process, and uses these averages to estimate the next bursts. Running averages for estimating future values also show up in many areas of communications protocol design.

Smoothing Techniques

Suppose you have a series of measured values V(1), V(2), V(3),..., where V(i) is the value observed at the ith observation. Assume you want to predict (estimate) the value V(K+1) based on the value up through V(K). One approach would be simply to take the average of observed values, as in Figure 1(a), where AV(K) is the average of the first K values. The estimate for the next value in the series, V(K+1), is equal to AV(K).

Figure 1(b) shows the expression reformulated so that it is not necessary to recalculate the entire summation each time.

Each term in the summation is given equal weight; that is, each term is multiplied by the same constant 1/K. Typically, you would like to give greater weight to more recent instances because they are more likely to reflect future behavior. A common technique for predicting the next value on the basis of a series of past values is known as "exponential averaging," or "exponential smoothing." Figure 2(a) shows how you compute SV(K), the "smoothed estimate."

Compare Figure 2(a) with Figure 1(b). By using a constant value of (0<<1), independent of the number of past observations, you have a circumstance in which all past values are considered, but more distant ones have less weight. To see this more clearly, consider Figure 2(b). Since both and (1-) are less than 1, each successive term in the preceding equation is smaller. For example, for a=0.8, the expansion is Figure 2(c). The older the observation, the less it is counted in the average.

Figure 3 shows the size of the coefficient as a function of its position in the expansion. The smaller the value of , the greater the weight given to the more-recent observations. For a=0.5, virtually all of the weight is given to the four or five most recent observations, whereas for =0.875, the averaging is effectively spread out over the ten or so most recent observations. The advantage of using a small value of is that the estimate will quickly reflect a rapid change in the observed quantity. The disadvantage is that brief surges in the value of the observed quantity will result in jerky changes in the smoothed average.

Figure 4 compares simple averaging with exponential averaging (for two different values of ). In Figure 4(a), the observed value begins at 1, grows gradually to a value of 10, and then stays there. In Figure 4(b), the observed value begins at 20, declines gradually to 10, and then stays there. In both cases, you start out with an estimate of SV(1)=0. Exponential averaging tracks changes in process behavior faster than does simple averaging and the smaller value of a results in a more rapid reaction to the change in the observed value.

TCP Retransmission Timer

In the Transmission Control Protocol (TCP), two TCP entities exchange segments over a TCP connection. Each side retains a copy of each segment that it sends and, if an acknowledgment is not received within a retransmission timeout (RTO) interval, resends the segment. Thus, if a segment is lost in transit, it will automatically be resent. If RTO is set too small, then a TCP entity may retransmit a segment for which an acknowledgment is on its way but delayed because of network congestion; such unnecessary retransmissions actually worsen the congestion. On the other hand, if RTO is too large, the protocol will be sluggish in responding to a lost segment.

Initially, TCP used a smoothed average to estimate the round-trip time. However, this didn't work well if the round-trip time varied significantly. To cope with this, Van Jacobson proposed a refinement in the standard algorithm that has now been officially adopted for TCP. Figure 5 summarizes this algorithm.

The Van Jacobson algorithm provides an improvement by estimating both the round-trip time (RTT) and the standard deviation of the RTT. The RTO value is then set to the estimated value of RTT plus a constant multiplied by the estimated standard deviation. This enables the use of more reasonable values of the retransmission timer. Standard implementations of TCP use the parameters g=1/8=0.125, h=1/4=0.25, and f=4. Experience has shown that Van Jacobson's algorithm can significantly improve TCP performance.

ATM ABR Service

Support for bursty data traffic on ATM networks, such as traffic generated by TCP/IP-based applications, has traditionally been carried on the so-called Unspecified Bit Rate (UBR) service. UBR is designed to make use of available capacity on the ATM network not consumed by time-sensitive traffic such as voice and video. All of this unused capacity could be made available for the UBR service. This service is suitable for applications that can tolerate variable delays and some cell losses, which is typically true of TCP-based traffic. With UBR, cells are forwarded on a first-in-first-out (FIFO) basis using the capacity not consumed by other services; both delays and variable losses are possible. No initial commitment is made to a UBR source and no feedback concerning congestion is provided; this is referred to as a "best-effort service."

To improve the service provided to bursty sources that would otherwise use UBR, the Available Bit Rate (ABR) service has been defined. An application using ABR specifies a peak cell rate (PCR) that it will use and a minimum cell rate (MCR) that it requires. The network allocates resources so that all ABR applications receive at least their MCR capacity. Any unused capacity is then shared in a fair and controlled fashion among all ABR sources. The ABR mechanism uses explicit feedback to sources to assure that capacity is fairly allocated. Any capacity not used by ABR sources remains available for UBR traffic.

To determine how much capacity to allow to each connection, each ATM switch must monitor the traffic through itself. One of the techniques recommended by the ATM Forum for this purpose uses the equation MACR(I)=(1-)×MACR(I-1) +×CCR(I) for each virtual connection through the switch, where CCR is a measure of the current cell rate on a given connection, and MACR (mean allowed cell rate) is an estimate of the cell rate for the connection in the next sampling period. Typically, =1/16, so that more weight is given to past values of CCR than to the current value. Based on the estimates for each connection, if a switch experiences congestion, it restricts the cell rate of the connections passing through it in a manner that is proportional to the estimated demand from each connection.

Real-Time Transport Protocol (RTP)

Real-time transport protocol (RTP) is designed to support both point-to-point and multicast real-time applications. RTP overcomes three deficiencies in TCP for real-time applications:

TCP is a point-to-point protocol that sets up a connection between two end points. Therefore, it is not suitable for multicast distribution.
TCP includes mechanisms for retransmission of lost segments, which then arrive out of order. Such segments are not usable in most real-time applications.
TCP contains no convenient mechanism for associating timing information with segments, which is another real-time requirement.

One key requirement for receivers of real-time traffic is to estimate the amount of delay variation, or jitter, experienced on a connection in order to determine buffering requirements. In essence, the receiver would like to pass incoming traffic through a delay buffer that will smooth out delay variability so that the received data has the same timing characteristics as the transmitted data. There is no simple way to measure this quantity at the receiver, but it is possible to estimate the average jitter in the following way. At a particular receiver, you define the following parameters for a given source:

S(I) = Timestamp from RTP data packet I.
R(I) = Time of arrival for RTP data packet I, expressed in RTP timestamp units. The receiver must use the same clock frequency (increment interval) as the source, but need not synchronize time values with the source.
D(I) = The difference between the interarrival time at the receiver and the spacing between adjacent RTP data packets leaving the source.
J(I) = Estimated average interarrival jitter up to the receipt of RTP data packet I.

The value of D(I) is calculated as D(I)=(R(I)-R(I-1))-(S(I)-S(I-1)). Thus, D(I) measures how much the spacing between arriving packets differs from the spacing between transmitted packets. In the absence of jitter, the spacings will be the same and D(I) will have a value of 0. The interarrival jitter J(I) is calculated continuously as each data packet I is received, according to the formula in Figure 6. J(I) is calculated as an exponential average of observed values of D(I). Only a small weight is given to the most recent observation, so that temporary fluctuations do not invalidate the estimate. The estimated value of jitter is reported to other participants in the connection. The jitter measure may provide a warning of increasing congestion before it leads to packet loss.

Internet Congestion Control

When routers in a network become congested to the point of buffer saturation, it becomes necessary to discard incoming or buffered packets. It may be desirable for routers to begin to discard packets before total saturation occurs. In this way, TCP connections receive early warning of increased congestion as the result of a few lost packets. The TCP connections can then back off the volume that they generate to avoid more catastrophic congestion. The most important example of a proactive packet discard scheme is Random Early Detection (RED), introduced by Sally Floyd. RED has been implemented by a number of router vendors.

In essence, RED monitors the queue length for an output port on the router. Associated with each output buffer are two thresholds THmax and THmin. When the queue length is less than THmin, no discard occurs; when it is greater than THmax, any arriving packet is discarded. Between these two thresholds, an incoming packet has an increasing probability of being discarded as the queue length grows from the lower to the higher threshold. The use of two thresholds and a sliding probability scale affords the opportunity to fine-tune the algorithm, and experience has shown that it can be effective in anticipating congestion without wasteful discard.

However, rather than use the actual queue length for the calculation, RED uses a smoothed average queue length SQ(K)=(1-W)×SQ(K-1)+W×Q(K), where Q is the actual queue length, SQ is the smoothed average, and W is a weighting factor.

You might wonder why an average queue size is used when it would be simpler to use the actual queue size. The purpose of using an average queue size is to filter out transient congestion at the router. The weight W determines how rapidly SQ changes in response to changes in actual queue size. Floyd recommends a quite small value of 0.002. As a result, SQ lags considerably behind changes in actual queue size. The use of this small weight prevents the algorithm from reacting to short bursts of congestion.

Summary

Exponential averaging is used in a surprisingly wide variety of protocols and congestion-control algorithms. By varying the weighting factor used in the average, more or less weight can be given to recent observations compared to more distant ones. Anyone needing a simple technique for making estimates based on past observations should be aware of this versatile tool.

References

The algorithm used by TCP to adjust its retransmission timeout was described by Van Jacobson in "Congestion Avoidance and Control." Proceedings, SIGCOMM '88, Computer Communication Review, August 1988. The article was reprinted in Computer Communication Review, January 1995. A revised version is available at ftp.ee.lbl.gov/papers/congavoid.ps.Z.

The RED technique was described by Sally Floyd and Van Jacobson in "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, August 1993.

DDJ

Back to Table of Contents