
Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 10.  Threading


10.2. Synchronizing Block Access: Critical Regions

For most synchronization requirements, a single atomic operation is not sufficient. Such cases require exclusive access to what is known as a critical region [Bulk1999]. For example, if you have two variables to update atomically, you must use a synchronization object to ensure each thread is granted exclusive access to the critical region, as in:



// Shared objects
SYNC_TYPE   sync_obj;
SomeClass   shared_instance;
. . .
// Somewhere in code called by multiple threads
lock(sync_obj);
int i = shared_instance.Method1(. . .);
shared_instance.Method2(i + 1, . . .);
unlock(sync_obj);



The two operations Method1() and Method2() must be conducted in an uninterrupted sequence. Hence, the code wherein they are called is encapsulated within calls to acquire and release a synchronization object. In general, the use of all such synchronization objects is very expensive, and so ways in which their costs can be minimized or avoided are desirable.
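To make the pattern concrete, here is how the same critical region might look with a PTHREADS mutex standing in for the generic SYNC_TYPE and the lock()/unlock() calls. This is a minimal sketch only; SomeClass, Method1(), and Method2() remain the placeholder names from the fragment above:

#include <pthread.h>

// Shared objects
pthread_mutex_t sync_obj = PTHREAD_MUTEX_INITIALIZER;
SomeClass       shared_instance;

// Somewhere in code called by multiple threads
void update_shared_instance()
{
  pthread_mutex_lock(&sync_obj);      // enter the critical region
  int i = shared_instance.Method1();  // both calls complete before any
  shared_instance.Method2(i + 1);     //  other thread can enter
  pthread_mutex_unlock(&sync_obj);    // leave the critical region
}

The acquire and release calls bracket both method invocations, so no other thread can observe shared_instance between the two updates.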

There are two causes of the high cost of using synchronization objects to secure critical regions. The first is that the cost of the acquire and release operations themselves can be high. For example, consider the timings (in milliseconds) shown in Table 10.1 for 10 million acquire-release cycles of four Win32 synchronization objects (plus an atomic increment operation), and for a control scenario consisting of two empty function calls. The results plainly show that the cost of using synchronization objects is considerable, up to 150 times that of a normal function call.

Table 10.1. Time for 10 million acquire-release cycles, in milliseconds

Synchronization Object    Uniprocessor machine    SMP machine
None (control)                             117            172
CRITICAL_SECTION                          1740            831
Atomic inc                                1722            914
Mutex                                    17891          22187
Semaphore                                18235          22271
Event                                    17847          22130
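The figures in Table 10.1 come from timing a tight loop of acquire-release pairs. A minimal sketch of how the CRITICAL_SECTION row might be measured is shown below; the use of GetTickCount() for timing and the main() harness are assumptions for illustration, not the book's actual test code:

#include <stdio.h>
#include <windows.h>

int main()
{
  CRITICAL_SECTION  cs;
  ::InitializeCriticalSection(&cs);

  DWORD const start = ::GetTickCount();
  for(int i = 0; i < 10000000; ++i)   // 10 million acquire-release cycles
  {
    ::EnterCriticalSection(&cs);
    ::LeaveCriticalSection(&cs);
  }
  DWORD const elapsed = ::GetTickCount() - start;

  printf("CRITICAL_SECTION: %lu ms\n", static_cast<unsigned long>(elapsed));

  ::DeleteCriticalSection(&cs);

  return 0;
}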


The second cost associated with synchronization objects for protecting critical regions is that incurred by any threads that are blocked from entering the critical region. The longer the critical region, the more likely this cost is to be incurred, and therefore it is good to keep critical regions as short as possible, or to break them into subcritical sections, as was discussed in section 6.2. However, because the costs of the acquire and/or release calls can themselves be very high, the balance between breaking up critical regions and causing long waits for pending threads is a delicate one. Only performance profiling can give you definitive answers on a case-by-case basis.
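For instance, if the two updates in the earlier fragment did not have to appear atomic with respect to each other, the one long critical region could be traded for two shorter ones. The sketch below reuses the hypothetical lock()/unlock() and shared objects from above; whether the trade is worthwhile is exactly the kind of question only profiling can answer:

// One long critical region: a single acquire-release, but pending
// threads are blocked for the duration of both calls.
lock(sync_obj);
int i = shared_instance.Method1(. . .);
shared_instance.Method2(i + 1, . . .);
unlock(sync_obj);

// Alternatively, two subcritical regions: pending threads wait for less
// time, but the acquire-release cost is paid twice, and the two calls
// no longer execute as one uninterrupted sequence.
lock(sync_obj);
int i = shared_instance.Method1(. . .);
unlock(sync_obj);

lock(sync_obj);
shared_instance.Method2(i + 1, . . .);
unlock(sync_obj);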

10.2.1 Interprocess and Intraprocess Mutexes

Mutexes are the most common form of synchronization object for guarding critical regions and, depending on the operating system, can be of two kinds: interprocess and intraprocess. An interprocess mutex is one that may be referenced in more than one process, and can therefore provide interprocess synchronization. On Win32, such a mutex is normally created by calling CreateMutex() and naming the object. Other processes may then access the same mutex by specifying the same name to CreateMutex() or OpenMutex().[6] With PTHREADS [Bute1997], the POSIX standard UNIX threading library, a mutex may be passed from a parent process to its child across a fork(), or may be shared between processes via mapped memory.

[6] An unnamed mutex handle may also be passed to a child process through other IPC mechanisms, but naming is the most straightforward mechanism.
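In code, the named-mutex approach looks something like the following in each cooperating process; the mutex name "MyAppMutex" and the surrounding function are purely illustrative:

#include <windows.h>

// Called in each process that needs access to the shared resource
void access_shared_resource()
{
  HANDLE hMutex = ::CreateMutex(NULL, FALSE, TEXT("MyAppMutex"));

  if(NULL != hMutex)
  {
    ::WaitForSingleObject(hMutex, INFINITE);  // acquire, across processes
    // . . . use the shared resource . . .
    ::ReleaseMutex(hMutex);                   // release

    ::CloseHandle(hMutex);
  }
}

If the named mutex already exists, CreateMutex() simply returns a handle to the existing kernel object, which is what allows unrelated processes to synchronize on the same name.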

Intraprocess mutexes, by contrast, are only visible to threads within a process. As they do not need to exist across process borders, they can partially or completely avoid the costly trips to the kernel, since their state can be maintained within process memory. Win32 has such a construct, known as a CRITICAL_SECTION, which is a lightweight mechanism that keeps most of the processing out of the kernel, only making kernel calls when the ownership is to be transferred to another thread. There are considerable performance gains to be had by using intraprocess mutexes where suitable, as can be seen in Table 10.1, whose results were obtained by an executable with a single thread. We'll see later how the CRITICAL_SECTION performs when the application is multithreaded.

10.2.2 Spin Mutexes

There's a special kind of intraprocess mutex, which is based on the ordinarily bad practice of polling. Simply, polling is waiting for a condition to change by repeatedly testing it, as in:



int g_flag;

// Waiting thread
while(0 == g_flag)
{}
. . . // Now do what we've been waiting for



This kind of thing chews up the cycles, as the polling thread is often[7] given equal priority with the thread that will be changing the flag to enable it to proceed. Polling is one of those bad ideas that ordinarily mark one out as a multithreading neophyte, suitable for apple-pie desks or unlooked-for severance pay.

[7] Depending on respective thread priorities and on any external events on which other threads may be waiting.

However, there are circumstances in which spinning is eminently suitable. Let's first look at an implementation of a spin mutex, the imaginatively named UNIXSTL[8] class spin_mutex, shown in Listing 10.2.

[8] The STLSoft subproject that maps UNIX APIs to STL.

Listing 10.2.


class spin_mutex
{
public:
  typedef spin_mutex  class_type;

public:
  explicit spin_mutex(sint32_t *p = NULL)
    : m_spinCount((NULL != p) ? p : &m_internalCount)
    , m_internalCount(0)
  {}
  void lock()
  {
    // Atomically exchange 1 into the spin variable; if its previous
    // value was not 0, another thread holds the mutex, so yield and retry
    for(; 0 != atomic_write(m_spinCount, 1); sched_yield())
    {}
  }
  void unlock()
  {
    // Return the spin variable to 0, releasing the mutex
    atomic_set(m_spinCount, 0);
  }

// Members
private:
  sint32_t  *m_spinCount;
  sint32_t  m_internalCount;

// Not to be implemented
private:
  spin_mutex(class_type const &rhs);
  spin_mutex &operator =(class_type const &rhs);
};



The mechanism of a spin mutex is very simple. When lock() is called, an atomic write is performed to set the spin variable *m_spinCount (an integer) to 1. If its previous value was 0, then the caller was the first thread to set it, and has "acquired" the mutex, so the method returns. If its previous value was 1, then the caller was beaten to it by another thread, and thus does not own the mutex. It then calls the PTHREADS function sched_yield() to yield the processor to another thread and, when it wakes up again, it tries again, repeating until it succeeds. Until it does succeed, it is locked out of ownership of the mutex.

When the thread that acquired the mutex calls unlock(), the spin variable is set back to 0, and another thread may then acquire it. The slight complication with the constructor and the m_internalCount is that this class can be constructed on an external spin variable, which can be very useful in certain circumstances (as we'll see in Chapters 11 and 31).

Spin mutexes are not a good solution where there is a high degree of contention, but where the likelihood of contention is very low and/or the cost of acquiring and releasing the synchronization mechanism must itself be very low, they can be effective. Given their potentially high cost under contention, I tend to use them only for initialization, where contention is exceedingly rare but theoretically possible, and must be accounted for.
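By way of a sketch, the following shows how a spin mutex constructed over an external spin variable might guard one-time initialization; ensure_initialized() and do_one_time_setup() are hypothetical names for the example, not part of the library:

// Shared spin variable and initialization flag
static sint32_t   s_spinVar     = 0;
static bool       s_initialized = false;

void ensure_initialized()
{
  spin_mutex  mx(&s_spinVar);   // every caller shares s_spinVar

  mx.lock();
  if(!s_initialized)
  {
    do_one_time_setup();        // hypothetical one-time initialization
    s_initialized = true;
  }
  mx.unlock();
}

Because each call constructs its own spin_mutex instance, the instances must share the external spin variable for the locking to mean anything; this is precisely the circumstance in which the constructor's pointer parameter earns its keep.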

