
Imperfect C++ Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 32.  Memory


32.3. Allocators

We'll look now at another trade-off, between compile time and run time selection of allocators. Once again, this is a trade-off between efficiency and flexibility. We saw in Chapter 9 that passing memory between link units is an unappealing prospect whether we require all link units to share the same underlying allocator, or we require the client code of each link unit to return every allocated piece of memory from whence it came. The former approach is restrictive, and the latter can sometimes be hard to adhere to. In either case, the decision of the source of memory is made at compile time.

A third way to achieve this is to pass the allocator into the library, thereby saving the decision for run time. There is a cost to this, effectively the cost of a vtable call, although this is usually, but not always, insignificant compared with the cost of the allocation, and the use made of the block. But it also has a potentially large payoff.

First, it relieves the author of the library from having to care about making the decision regarding which memory allocation library to use. You may wonder why this is even an issue. If so, you should probably take a peek inside the C run time library of your favorite compiler. Most vendors go to some serious effort to tune the allocation of memory, often allocating from separate pools depending on the size of the memory allocations. The reason for this is that a single heap subject to a wide spectrum of block size requests will become seriously fragmented, and this can have a significant deleterious effect on performance. Second, you can plug in the allocator at run time, in response to conditions at the time. Third, you don't need to recompile the library in order to change allocators. Finally, you can de-/re-allocate in client code something that was allocated within the library, which can be a useful simplification in many circumstances.

There are different mechanisms for specifying allocators at run time, but most tend to boil down to the same thing:[12] specifying pointers to allocation and deallocation functions in a structure.

[12] Another technique worth mentioning is interposing, or interpositioning [Lind1994], whereby a new library containing custom definitions of memory (or other) routines is linked in, and the calls to these libraries within the process are hooked into those from the new library. The author comments that "[o]ver the years we have seen no convincing examples where [it] was essential [and] the effect could not be achieved in a different . . . manner" and "[it is] only for people who enjoy a walk on the wild side." I couldn't agree more, and although I've used it on occasion, I think the techniques shown in this section represent the robust and maintainable choice.

32.3.1 Function Pointers

The ubiquitous zlib library defines two function typedefs:

typedef void *(*alloc_func)(void *opaque, uInt items, uInt size);
typedef void  (*free_func) (void *opaque, void *address);

These function typedefs are used within the z_stream structure, which is used by the client to exchange information with the functions in the zlib API.

typedef struct z_stream_s
{
  . . .
  alloc_func zalloc;  /* used to allocate the internal state */
  free_func  zfree;   /* used to free the internal state */
  voidpf     opaque;  /* private data object passed to zalloc and zfree */
  . . .
} z_stream;

All you need do is set the zalloc and zfree members to point to your functions, and zlib will use them. If you don't have any special requirements, you can specify NULL, and zlib will use its own internal default allocation and deallocation functions.
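
For illustration, here is a minimal sketch of plugging custom functions into zlib, assuming only the standard deflate API; the names my_zalloc and my_zfree are invented for the example, and simply forward to the C run time heap.

#include <cstdlib>
#include <zlib.h>

/* Forwarding allocators matching zlib's alloc_func/free_func signatures. The
   opaque pointer is whatever was placed in z_stream::opaque; it is unused here. */
static voidpf my_zalloc(voidpf /* opaque */, uInt items, uInt size)
{
  return std::malloc(items * size);
}

static void my_zfree(voidpf /* opaque */, voidpf address)
{
  std::free(address);
}

int compress_with_custom_allocator()
{
  z_stream  strm;

  strm.zalloc = my_zalloc;  /* zlib allocates its internal state via these */
  strm.zfree  = my_zfree;
  strm.opaque = Z_NULL;     /* passed back as the first argument to each */

  int ret = deflateInit(&strm, Z_DEFAULT_COMPRESSION);
  if (Z_OK != ret)
  {
    return ret;
  }
  /* . . . drive deflate() with input/output buffers . . . */
  return deflateEnd(&strm);
}

With zalloc and zfree set this way, every internal allocation zlib makes for this stream goes through the two forwarding functions.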

32.3.2 Allocator Interfaces

An alternative, and more object-oriented, approach is to bind the functions into an interface. The extensible parser project I described in section 8.3 used this approach. The Synesis core libraries define an allocator interface, IAllocator, which is portable (see section 8.2) and looks something like the following C++-ified version:[13]

[13] Any resemblance to the COM IMalloc interface can only be as a result of coincidence...

struct IAllocator
  : public IRefCounter
{
  virtual LPVoid  Alloc(Size cb);
  virtual LPVoid  Realloc(LPVoid pv, Size cb);
  virtual void    Free(LPVoid pv);
  virtual Size    GetSize(LPCVoid pv) const;
  virtual void    Compact();
};

The advantages here are fourfold. First, it puts all the functions in one place, which reduces the potential for erroneously specifying a mismatched pair. If you're really unlucky, such a combination might actually work with some compilers and/or operating environments, postponing maintenance conundrums for when your memory has faded and your timescales have shortened.

Second, by binding them in an interface, you can provide class instances for your allocators, which is by far the more convenient way to manipulate them (see Chapter 8).

Third, by using an interface you can feel free to provide more functions than are provided by just an allocation/deallocation function pair. Reallocation is an obvious choice, but there are others, as can be seen in IAllocator.

Finally, if you supply a reference-counted interface, you can allow the library to hang on to it. We'll see how this can be useful shortly.
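
To make the idea concrete, the following sketch shows what a simple CRT-heap implementation of such an interface might look like. It is illustrative only: the Synesis types and the IRefCounter base are not reproduced in the listing above, so simplified stand-ins are assumed here (LPVoid/LPCVoid as plain pointers, Size as size_t, and reference counting reduced to AddRef()/Release()), and CRTAllocator is an invented name.

#include <cstddef>
#include <cstdlib>

typedef void        *LPVoid;    /* assumed stand-ins for the Synesis typedefs */
typedef void const  *LPCVoid;
typedef std::size_t  Size;

struct IRefCounter                  /* assumed minimal reference-counting base */
{
  virtual void  AddRef()  = 0;
  virtual void  Release() = 0;
  virtual ~IRefCounter() {}
};

struct IAllocator
  : public IRefCounter
{
  virtual LPVoid  Alloc(Size cb)              = 0;
  virtual LPVoid  Realloc(LPVoid pv, Size cb) = 0;
  virtual void    Free(LPVoid pv)             = 0;
  virtual Size    GetSize(LPCVoid pv) const   = 0;
  virtual void    Compact()                   = 0;
};

class CRTAllocator
  : public IAllocator
{
public:
  CRTAllocator()
    : m_refCount(1)
  {}

  virtual void    AddRef()              { ++m_refCount; }
  virtual void    Release()             { if (0 == --m_refCount) delete this; }  /* assumes heap allocation */

  virtual LPVoid  Alloc(Size cb)              { return std::malloc(cb); }
  virtual LPVoid  Realloc(LPVoid pv, Size cb) { return std::realloc(pv, cb); }
  virtual void    Free(LPVoid pv)             { std::free(pv); }
  virtual Size    GetSize(LPCVoid) const      { return 0; }  /* size not queryable portably via the CRT */
  virtual void    Compact()                   {}             /* nothing to compact in the CRT heap */

private:
  long  m_refCount;   /* not thread safe; a real implementation would use an atomic count */
};

Client code would then create an instance (for example, with new CRTAllocator()), pass the resulting IAllocator * around, and call Release() when it has finished with it.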

32.3.3 Per-Library Initialization

Passing in the memory functions to each method, even via a structure or an interface, can still be tiresome and slightly inefficient if it requires an extra function parameter. One option is to supply it to the initialization function of the library API you intend to use.

int   AcmeLib_Init(IAllocator *ator);
void  AcmeLib_Uninit();

void *AcmeLib_GiveMeBlock(size_t cb);

You can easily combine it with link-unit global state and an allocator adaptor, and happily use it with any STL classes within the library. An allocator adaptor would look something like that shown in Listing 32.5.

Listing 32.5.

template< typename    T
        , IAllocator  *A
        >
struct IAllocator_adaptor
{
  typedef T           value_type;
  typedef value_type  *pointer;
  . . .
  typedef size_t      size_type;
  typedef ptrdiff_t   difference_type;

  static pointer allocate(size_type n, void const*)
  {
    return (pointer)(A->Alloc(sizeof(value_type) * n));
  }
  static void deallocate(pointer p, size_type)
  {
    A->Free(p);
  }
  . . .
};

It could be combined with an STL class as in the following:

IAllocator  *s_ator;
typedef IAllocator_adaptor<int, s_ator>   Acme_allocator;

void AcmeLib_DoOtherStuff(int i)
{
  std::vector<int, Acme_allocator>                                  v(i);
  std::basic_string<char, std::char_traits<char>, Acme_allocator>   s(. . .);
  . . .
}

This is a convenient mechanism, but it has some serious flaws. First, it needs to be made reentrant, so it needs thread-safe link-unit global state, including the allocator pointer, initialization counter, and a locking object to protect the pointer.

Second, it is inconsistent with what we know about initializing APIs (see section 11.2)—that we like multiply-initializable APIs—if two calls stipulate different allocators. Both options—first wins and last wins—present problems. If first wins then the second initialization must fail, otherwise that client code will use the wrong allocator, the one specified in the first call. Conversely, if last wins then the relationship between the client code of the first initialization and the API will be broken. The only way such an API is usable is to make it singly-initializable.

Because of these problems, it is better to restrict use of this mechanism to a library that is only internally initialized in a link unit, and is not accessible to client code external to the authors of the library. In such contexts, however, it can be very useful, and I've used it to good effect on many occasions.
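
To make the threading and initialization concerns concrete, here is one way the link-unit global state behind a singly-initializable AcmeLib_Init()/AcmeLib_Uninit() pair might be arranged. This is a sketch under stated assumptions, not a recipe: std::mutex (C++11) stands in for whatever synchronization primitive would really be used, and the IAllocator stand-in, with its AddRef()/Release(), is borrowed from the sketch in section 32.3.2.

#include <stddef.h>
#include <mutex>

// Assumes the IAllocator/IRefCounter stand-ins from the sketch in section 32.3.2.

namespace
{
  IAllocator  *s_ator      = NULL;  // allocator registered by AcmeLib_Init()
  long         s_initCount = 0;     // 0 = uninitialized, 1 = initialized
  std::mutex   s_lock;              // guards the two variables above
} // anonymous namespace

int AcmeLib_Init(IAllocator *ator)
{
  std::lock_guard<std::mutex>  guard(s_lock);

  if (0 != s_initCount)
  {
    return -1;          // singly initializable: refuse any second initialization
  }
  s_ator = ator;
  s_ator->AddRef();     // hold a reference for the lifetime of the library
  s_initCount = 1;
  return 0;
}

void AcmeLib_Uninit()
{
  std::lock_guard<std::mutex>  guard(s_lock);

  if (0 != s_initCount)
  {
    s_ator->Release();
    s_ator      = NULL;
    s_initCount = 0;
  }
}

void *AcmeLib_GiveMeBlock(size_t cb)
{
  return s_ator->Alloc(cb);   // uses whichever allocator was registered
}

Note how the reference-counted interface earns its keep here: the library can safely hang on to the allocator between AcmeLib_Init() and AcmeLib_Uninit().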

32.3.4 Per-Call Specification

We can obviate all the initialization, threading, and reentrancy issues of per-library initialization by doing as zlib does (see Appendix A), and passing the allocator along with every library API call. The downside to this is that the client code is responsible for maintaining the relationship between a memory block and its allocator. In most cases, the client code will only be using one allocator itself, so this is a no-brainer, but it's something you need to be aware of.
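
For comparison, a per-call version of the hypothetical AcmeLib API might look like the following; again the names are invented, and the IAllocator stand-in from the earlier sketches is assumed.

#include <stddef.h>

// Assumes the IAllocator stand-in from the sketch in section 32.3.2.
// The allocator travels with every call, so the library keeps no allocator state.
void *AcmeLib_GiveMeBlock(IAllocator *ator, size_t cb)
{
  return ator->Alloc(cb);
}

void AcmeLib_TakeBlockBack(IAllocator *ator, void *pv)
{
  ator->Free(pv);
}

// Client code is responsible for pairing each block with the allocator it came from.
void client_code(IAllocator *ator)
{
  void  *p = AcmeLib_GiveMeBlock(ator, 100);

  // . . . use the block . . .

  AcmeLib_TakeBlockBack(ator, p);   // must be the same allocator as above
}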

One interesting feature of this approach is that you can provide thread-specific allocators. Since most allocators are written to be thread safe, they must guard their internal structures with synchronization objects (see section 10.2). There are costs associated with such synchronization locking, and for a frequently used allocator they can be significant. One possible optimization is to use a different, nonlocking allocator for each thread, and per-call specification facilitates that.
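
A sketch of that optimization, building on the per-call functions above: each thread owns its own allocator instance, so the allocator needs no internal locking. thread_local is C++11, and CRTAllocator is the stand-in from section 32.3.2; a real per-thread allocator would instead carve blocks from a private pool without taking any locks.

// One allocator instance per thread; it is only ever touched by that thread,
// so its internals need no synchronization.
thread_local CRTAllocator  t_ator;

void thread_worker()
{
  void  *p = AcmeLib_GiveMeBlock(&t_ator, 256);  // no lock taken inside the allocator

  // . . . thread-local work . . .

  AcmeLib_TakeBlockBack(&t_ator, p);
}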

