I l@ve RuBoard

A.3 Atomic Operations on Integral Types

Assuming x is a variable of type int, consider this statement:



++x;

It might seem awkward that a book focused on design analyzes a simple increment statement, but this is the thing with multithreading—little issues affect big designs.

To increment x, the processor has to do three operations:

Fetch the variable from memory.
Increment the variable inside the arithmetic logic unit (ALU) of the processor. The ALU is the only place where an operation can take place; memory does not have arithmetic capabilities of its own.
Write the variable back to memory.

Because the first operation reads, the second modifies, and the third writes the data, this troika is known as a read-modify-write (RMW) operation.

Now suppose this increment happens in a multiprocessor architecture. To maximize efficiency, during the modify part of the RMW operation the processor unlocks the memory bus. This way another processor can access the memory while the first increments the variable, leading to better resource use.

Unfortunately, another processor can start an RMW operation against the same integer. For instance, assume there are two increments on x, which initially has value 0, performed by two processors P1 and P2 in the following sequence:

P1 locks the memory bus and fetches x.
P1 unlocks the memory bus.
P2 locks the memory bus and fetches x (which is still 0). At the same time, P1 increments x inside its ALU. The result is 1.
P2 unlocks the memory bus.
P1 locks the memory bus and writes 1 to x. At the same time, P2 increments x inside its ALU. Because P2 fetched a 0, the result is, again, 1.
P1 unlocks the memory bus.
P2 locks the memory bus and writes 1 to x.
P2 unlocks the memory bus.

The net result is that although two increment operations have been applied to x starting from 0, the final value is 1. This is an erroneous result. Worse, neither processor (thread) can figure out that the increment failed and will retry it. In a multithreaded world, nothing is atomic—not even a simple increment of an integer.

There are a number of ways to make the increment operation atomic. The most efficient way is to exploit processor capabilities. Some processors offer locked-bus operations—the RMW operation takes place as described previously, except that the memory bus is locked throughout the operation. This way, when P2 fetches x from memory, it will be after its increment by P1 has completed.

This low-level functionality is usually packaged by operating systems in C functions that provide atomic increment and atomic decrement operations.

If an OS defines atomic operations, it usually does so for the integral type that has the width of the memory bus—most of the time, int. The threading subsystem of Loki (file Threads.h) defines the type IntType inside each ThreadingModel implementation.

The primitives for atomic operations, still inside ThreadingModel, are outlined in the following:



template <typename T>


class SomeThreadingModel


{


public:


   typedef int IntType; // or another type as dictated by the platform


   static IntType AtomicAdd(volatile IntType& lval, IntType val);


   static IntType AtomicSubtract(volatile IntType & lval, IntType val);


   ... similar definitions for AtomicMultiply, AtomicDivide,


   ... AtomicIncrement, AtomicDecrement ...


   static void AtomicAssign(volatile IntType & lval, IntType val);


   static void AtomicAssign(IntType & lval, volatile IntType & val);


};

These primitives get the value to change as the first parameter (notice the pass by non-const reference and the use of volatile), and the other operand (absent in the case of unary operators) as the second parameter. Each primitive returns a copy of the volatile destination. The returned value is very useful when you're using these primitives because you can inspect the actual result of the operation. If you inspect the volatile value after the operation,



volatile int counter;


...


SomeThreadingModel<Widget>::AtomicAdd(counter, 5);


if (counter == 10) ...

then your code does not inspect counter immediately after the addition because another thread can modify counter between the call to AtomicAdd and the if statement. Most of the time, you need to see what value counter has immediately after your call to AtomicAdd, in which case you write



if (AtomicAdd(counter, 5) == 10) ...

The two AtomicAssign functions are necessary because even the copy operation can be nonatomic. For instance, if your machine bus is 32 bits wide and long has 64 bits, copying a long value involves two memory accesses.

I l@ve RuBoard