Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 31. Return Value Lifetime

31.4. Solution 2—TSS

Two of the three parameters to the integer_to_string() functions are provided to ensure thread safety. If we could somehow arrange to have a thread-safe internal buffer, then we'd only need to supply the integer to be converted. Sounds like a job for some Thread-Specific Storage (TSS) (see section 10.5). Using TSS, we can extend the original functions in a new function, say int_to_string(),^[4] which could be defined as:

^[4] There are actually eight functions, corresponding to the eight integer_to_string functions, for each numeric integer type (signed and unsigned 8-, 16-, 32-, and 64-bit integers). It's more effort for the poor library writer (sigh), and occasionally inconvenient for the user, but this way helps ensure that other integral types, for example, wchar_t and bool, are not (mis-)used with these purely numeric functions (see section 19.4).



template< typename C
        , typename I
        >
C const *int_to_string(I value)
{
  const size_t  CCH     = 21; // fits 64-bit + sign
  C             *buffer = i2str_get_tss_buffer<C, CCH>();
  return integer_to_string(buffer, CCH, value);
}

We make CCH = 21, to provide enough space for any integer type up to 64 bits (signed or unsigned). The implementation relies on the function i2str_get_tss_buffer() to return a buffer, of static storage, of the given character type C on a thread-specific basis.
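The choice of 21 can be checked with a little arithmetic. The following is a minimal sketch, assuming a C++11 compiler; the helper names (chars_needed, worst_case_signed64, worst_case_unsigned64) are hypothetical and are not part of the library. It uses std::to_string purely as a reference formatter to count the characters needed for the two worst-case 64-bit values, including the terminating NUL.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Characters needed to hold the decimal form of `value`, counting the
// terminating NUL. std::to_string serves only as a reference formatter.
template <typename I>
std::size_t chars_needed(I value)
{
    return std::to_string(value).size() + 1;
}

// Worst case for signed 64-bit integers.
inline std::size_t worst_case_signed64()
{
    // "-9223372036854775808": sign + 19 digits + NUL
    return chars_needed(std::numeric_limits<std::int64_t>::min());
}

// Worst case for unsigned 64-bit integers.
inline std::size_t worst_case_unsigned64()
{
    // "18446744073709551615": 20 digits + NUL
    return chars_needed(std::numeric_limits<std::uint64_t>::max());
}
```

Both worst cases come to 21 characters, so a single buffer length serves every integer type up to 64 bits.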

31.4.1 __declspec(thread)

On the Win32 platform, TSS is available in two forms, as we saw in section 10.5.3. For executables and dynamic libraries that are implicitly loaded during application startup [Rich1997], several Win32 compilers provide the Microsoft extension __declspec(thread), which declares a thread-specific variable. Using __declspec(thread) we can offer an implementation of i2str_get_tss_buffer() as follows:^[5]

^[5] Note that in the actual WinSTL implementation each of the eight int_to_string<>() functions uses 21 as its buffer length. This actually saves space, in both code and data, since the number of instantiations of i2str_get_tss_buffer<>() drops from a potential maximum of 8 to just 2. In addition, it also reduces time costs where the implementation of the function contains nontrivial logic, as we see with the next implementation.



template< typename C
        , size_t   CCH
        >
C *i2str_get_tss_buffer()
{
  __declspec(thread) static C s_buffer[CCH];
  return s_buffer;
}

This implementation has extremely good performance characteristics, being only marginally slower than integer_to_string() itself (see section 31.3). Alas, the restrictions on the use of __declspec(thread) (see section 10.5.3) mean we can't seriously consider it for our conversion library functions.
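For readers without a Win32 compiler to hand, the same idea can be tried with the standard thread_local keyword. This is a sketch only, assuming C++11; it stands in for __declspec(thread) and is not the WinSTL code. The names tss_buffer_sketch and new_thread_gets_own_buffer are hypothetical. It shows that repeated calls on one thread yield the same buffer, while a second thread sees a different one.

```cpp
#include <cstddef>
#include <thread>

// Portable analogue of the __declspec(thread) version: one buffer per
// thread for each distinct <C, CCH> instantiation.
template <typename C, std::size_t CCH>
C *tss_buffer_sketch()
{
    thread_local C s_buffer[CCH];
    return s_buffer;
}

// Starts a second thread and reports whether the buffer it sees differs
// from the calling thread's buffer. The comparison is made while both
// threads (and therefore both buffers) are still alive.
template <typename C, std::size_t CCH>
bool new_thread_gets_own_buffer()
{
    C    *mine     = tss_buffer_sketch<C, CCH>();
    bool  distinct = false;
    std::thread worker([&] {
        distinct = (tss_buffer_sketch<C, CCH>() != mine);
    });
    worker.join();
    return distinct;
}
```

Because the buffer belongs to the <C, CCH> instantiation rather than to any particular integer type, every conversion function that asks for <char, 21> shares the one per-thread buffer.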

31.4.2 Win32 TLS

The other form of Win32 TSS is the TLS API, which we discussed in section 10.5.2. Using Win32 TLS, the implementation of i2str_get_tss_buffer becomes:

Listing 31.3.



template< typename C
        , size_t   CCH
        >
C *i2str_get_tss_buffer()
{
  static Key<C, CCH>  s_key;
  Slot<C, CCH>        *slot = s_key.GetSlot();
  if(NULL == slot)
  {
    slot = s_key.AllocSlot();
  }
  return slot->buff;
}

All the work is done by the Slot (see Listing 31.4) and Key (see Listing 31.5) classes, in which we get to give our threading skills a good workout.

Listing 31.4.



template< typename C
        , size_t   CCH
        >
struct Slot
{
  Slot(Slot *next)
    : next(next)
  {}
  ~Slot()
  {
    delete next;
  }

  C     buff[CCH];
  Slot  *next;
};

The Key class allocates a TLS key via TlsAlloc(), which is then used in its GetSlot() and AllocSlot() methods. GetSlot() simply returns the suitably cast return value from TlsGetValue(). AllocSlot() is a little more complicated, since it needs to allocate a Slot instance and add it onto Key's linked list of Slot instances, within a thread-safe block. This block only needs to guard the integrity of the linked list, held by the m_top member, and so does not incorporate the call to TlsSetValue(). (The Slot instances are all destroyed in Key's destructor, which occurs at module/process shutdown and does not need to use any thread-safe measures.)

Listing 31.5.



template< typename C
        , size_t   CCH
        >
struct Key
{
  typedef Slot<C, CCH>  Slot;

  Key()
    : m_key(::TlsAlloc())
    , m_top(NULL) // the slot list starts out empty
  {
    if(TLS_OUT_OF_INDEXES == m_key)
    {
      . . . // throw an exception
    }
  }
  ~Key()
  {
    // Walk the slot list and free. This can be as slow as you
    // like, since performance is not important at this point.
    delete m_top;
    ::TlsFree(m_key);
  }

  Slot *GetSlot()
  {
    // NOTE: This does not need to be thread-safe
    return reinterpret_cast<Slot*>(::TlsGetValue(m_key));
  }

  Slot *AllocSlot()
  {
    Slot  *next;
    { // Protect linked-list manipulation
      lock_scope<thread_mutex>  lock(m_mx);
      m_top = next = new Slot(m_top);
    }
    ::TlsSetValue(m_key, next);
    return next;
  }

private:
  dword_t const m_key;
  Slot          *m_top;
  thread_mutex  m_mx;
};
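The division of labour in AllocSlot() can be tried out in portable form. The following is a sketch under stated assumptions, not the WinSTL implementation: it assumes C++11, uses std::mutex in place of thread_mutex, and uses a thread_local pointer as a stand-in for the TlsGetValue()/TlsSetValue() pair. The names slot_sketch and key_sketch are hypothetical. Note one simplification: the thread_local pointer belongs to the <C, CCH> instantiation rather than to an individual key object, which matches the single static key per instantiation used above but would not suit several keys of the same type.

```cpp
#include <cstddef>
#include <mutex>

template <typename C, std::size_t CCH>
struct slot_sketch
{
    explicit slot_sketch(slot_sketch *next) : next(next) {}

    C            buff[CCH];
    slot_sketch *next;
};

template <typename C, std::size_t CCH>
class key_sketch
{
public:
    typedef slot_sketch<C, CCH> slot_type;

    key_sketch() : m_top(nullptr) {}

    ~key_sketch()
    {
        // Iterative walk of the list; assumed to run at shutdown, when
        // no other thread is using the key, so no lock is taken.
        while (m_top != nullptr)
        {
            slot_type *next = m_top->next;
            delete m_top;
            m_top = next;
        }
    }

    // Returns the calling thread's slot, or nullptr if it has none yet.
    slot_type *get_slot() const { return current(); }

    slot_type *alloc_slot()
    {
        slot_type *slot;
        {   // Only the shared linked list needs the lock
            std::lock_guard<std::mutex> lock(m_mx);
            m_top = slot = new slot_type(m_top);
        }
        current() = slot; // per-thread store, so no lock is required
        return slot;
    }

private:
    // Stand-in for the TLS API: one pointer per thread.
    static slot_type *&current()
    {
        thread_local slot_type *s_current = nullptr;
        return s_current;
    }

    slot_type  *m_top;
    std::mutex  m_mx;
};
```

A thread that has never converted a value gets nullptr from get_slot(), calls alloc_slot() once, and thereafter reaches its buffer without touching the mutex.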

There's a fair bit of code involved here, but once the Key has been constructed the first time through the function (on any thread), and a Slot has been allocated the first time through for each thread, there is very little further cost, as we'll see in section 31.8. The very observant among you may have noticed a potential race condition^[6] in that there is no thread-serialization protection visible in i2str_get_tss_buffer() for the static construction of the Key instance (see Chapter 11).

^[6] I'm sure that those of you who spotted it will have reasoned just how incredibly unlikely this race condition is to be encountered. However "incredibly unlikely" doesn't cut the mustard in multithreaded software development, so it must be rendered impossible.

The solution in this case is that the constructor for the Key class is itself thread safe, via the use of spin mutexes—what else?—and you can find the full implementation on the CD.
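To give a flavor of how a spin mutex can serialize one-time construction, here is a minimal sketch, assuming C++11 atomics. It is not the implementation from the CD, and every name in it (spin_mutex_sketch, resource_sketch, get_resource_sketch, g_construct_count) is hypothetical. The guarded object is created on first use and the creation count shows that it happens only once.

```cpp
#include <atomic>

// Minimal spin mutex built on std::atomic_flag: lock() busy-waits until
// the flag is observed clear, unlock() clears it again.
class spin_mutex_sketch
{
public:
    void lock()
    {
        while (m_flag.test_and_set(std::memory_order_acquire))
        {} // spin
    }
    void unlock()
    {
        m_flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};

struct resource_sketch
{
    int id;
};

static spin_mutex_sketch  g_init_mx;
static resource_sketch   *g_resource        = nullptr;
static int                g_construct_count = 0;

// Creates the resource the first time through, on whichever thread
// arrives first; later callers receive the same instance. The object is
// intentionally never freed in this sketch.
resource_sketch *get_resource_sketch()
{
    g_init_mx.lock();
    if (g_resource == nullptr)
    {
        g_resource = new resource_sketch{42};
        ++g_construct_count;
    }
    resource_sketch *result = g_resource;
    g_init_mx.unlock();
    return result;
}
```

Compilers conforming to C++11 or later already serialize the initialization of function-local statics, so on such compilers a guard of this kind would be needed only where that guarantee is unavailable or has been disabled.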

So the DLL dynamic-loading issue has been addressed, but the garden's not all green. One problem is that the number of keys on a given Win32 system is finite. On later operating systems, this is not a big problem, but on earlier versions of Windows there are very few TLS keys available (see section 10.5.2). It's not difficult to imagine very sophisticated software having a great many components that utilize TLS, so it is quite conceivable that exhaustion may occur.

Another downside is that this form of TSS is slower (though not by much; see section 31.8) than __declspec(thread). As noted earlier, making all the int_to_string() overloads use CCH set to 21 is efficient in space and time terms. However, there is another benefit. In light of what we now know about the potential scarcity of keys, we can see that we will now use a maximum of two TLS keys (for char and wchar_t^[7]) rather than up to eight, which makes a catastrophic failure to allocate a key significantly less likely.

^[7] If you want to be perverse, you may say three, since you might, if you were brave enough, be doing variable length internationalized string handling on Windows 95/98 using the unsigned char type.

Nonetheless, there is the issue of what to do if and when a TLS key is unavailable. From a practical point of view, one can ensure that both char and wchar_t variants of one of the conversion functions are called in the application initialization. While this does not proof the application against failure, it will at least precipitate any failure sooner, and so is a significant help when testing practical robustness. Naturally it does absolutely nothing to guarantee that exhaustion cannot happen. Where absolute robustness is required, we must take another approach.

31.4.3 Platform-Independent APIs

The previous two solutions were both variants of the same function(s) in the WinSTL project.^[8] In my commercial incarnation as a consultant for Synesis Software, I implemented a platform-independent version of the int-to-string functions, which have been recently updated to use the STLSoft integer_to_string() function(s). In one of the Synesis core DLLs exists the function LongToStringA()—along with its unsigned, Unicode and 64-bit siblings—defined within the SynesisStd namespace (see Listing 31.6).

^[8] The implementation defaults to the TlsAlloc() version, but allows you to specify the appropriately ugly _WINSTL_INT_TO_STRING_USE_DECLSPECTHREAD_FOR_EXES to use __declspec(thread). If you do so in a DLL build, however, you'll receive some warnings strongly advising you not to do so.

It utilizes the platform-independent TSS library which we saw in section 10.5.4. The implementation of the library involves intraprocess mutual exclusion synchronization objects and ordered thread-identifier slot lists, so it shouldn't come as a great surprise that the disadvantage with this approach is that it performs substantially less well than the other two approaches.

Listing 31.6.



PCAChar LongToStringA(Long value)
{
  const size_t  I2S_LIMIT = 0x7f;
  TssValue      slotValue = Tss_GetSlotValue(sg_hkeyA);
  PAChar        buffer;
  if(slotValue == 0)
  {
    slotValue = (TssValue)Mem_Alloc_NoTrack(
                              sizeof(AChar) * (1 + I2S_LIMIT));
    . . .
    Tss_SetSlotValue(sg_hkeyA, slotValue, NULL);
  }
  buffer = SyCastRaw(PAChar, slotValue);
  return integer_to_string(buffer, 1 + I2S_LIMIT, value);
}

31.4.4 RVL

This solution is immune to RVL-LV and RVL-PDP. At first glance it also looks as if we've addressed the problem of RVL-LS by virtue of providing thread-safe buffers, albeit at a substantial increase in complexity. However, this RVL is tricky stuff, and we'll see in the next section that there are still problems with this solution.