I l@ve RuBoard

4.5 The Fixed-Size Allocator

The next layer of our small-object allocator consists of FixedAllocator. FixedAllocator knows how to allocate and deallocate blocks of a fixed size but is not limited to a chunk's size. Its capacity is limited only by available memory.

To achieve this, FixedAllocator aggregates a vector of Chunk objects. Whenever an allocation request occurs, FixedAllocator looks for a Chunk that can accommodate the request. If all Chunks are filled up, FixedAllocator appends a new Chunk. Here is the relevant part of the definition of FixedAllocator:



class FixedAllocator


{


   ...


private:


   std::size_t blockSize_;


   unsigned char numBlocks_;


   typedef std::vector<Chunk> Chunks;


   Chunks chunks_;


   Chunk* allocChunk_;


   Chunk* deallocChunk_;


};

To achieve a speedy lookup, FixedAllocator does not iterate through chunks_ looking for a space for each allocation. Instead, it holds a pointer to the last chunk that was used for an allocation (allocChunk_). Whenever an allocation request comes, FixedAllocator::Allocate first checks allocChunk_ for available space. If allocChunk_ has a slot available, the allocation request is satisfied—presto—using allocChunk_. If not, a linear search occurs (and, possibly, a new Chunk is appended to the chunks_ vector). In any case, allocChunk_ is updated to point to that found or added chunk. This way we increase the likelihood of a fast allocation next time. Here's the code that implements this algorithm:



void* FixedAllocator::Allocate()


{


   if (allocChunk_ == 0 ||


      allocChunk_->blocksAvailable_ == 0)


   {


      // No available memory in this chunk


      // Try to find one


      Chunks::iterator i = chunks_.begin();


      for (;; ++i)


      {


         if (i == chunks_.end())


         {


            // All filled up-add a new chunk


            chunks_.reserve(chunks _.size()+1);


            Chunk newChunk;


            newChunk.Init(blockSize_, numBlocks_);


            chunks_.push_back(newChunk);


            allocChunk_ = &chunks_.back();


            deallocChunk_ = &chunks_.back();


            break;


         }


         if (i->blocksAvailable_ > 0)


         {


            // Found a chunk


            allocChunk_ = &*i;


            break;


         }


      }


   }


   assert(allocChunk_ != 0);


   assert(allocChunk_->blocksAvailable_ > 0);


   return allocChunk_->Allocate(blockSize_);


}

Using this strategy, FixedAllocator satisfies most of the allocations in constant time, with occasional slowdowns caused by finding and adding a new block. Some memory allocation trends make this scheme work inefficiently; however, they tend not to appear often in practice. Don't forget that every allocator has an Achilles' heel.

Memory deallocation is more problematic, because at deallocation time there's a piece of information missing—all we have is a pointer to deallocate, and we don't know to what Chunk that pointer belongs. We can walk through chunks_ and check whether the given pointer falls in between pData_ and pData_ + blockSize_ * numBlocks_. If that's the case, then we pass the pointer to that Chunk's Deallocate member function. The problem is, this takes time. Although allocations are fast, deallocations take linear time. We need an additional device that speeds up deallocations.

We could use a supplemental cache memory for freed blocks. When user code frees a block using FixedAllocator::Deallocate(p), the FixedAllocator does not pass p back to the corresponding Chunk; instead, it adds p to an internal memory—a cache that holds available blocks. When a new allocation request comes, FixedAllocator first looks up the cache. If the cache is not empty, FixedAllocator fetches the last available pointer from cache and returns it right away. This is a very fast operation. Only when the cache is depleted does FixedAllocator need to go the standard route by forwarding the allocation request to a Chunk. This is a promising strategy, but it works badly with certain allocation and deallocation trends that are common to small-object use.

There are four main trends in allocating small objects:

Bulk allocation. Many small objects are allocated at the same time. For example, this is the case when you initialize a collection of pointers to small objects.
Deallocation in same order. Many small objects are deallocated in the same order as they were allocated. This happens when most STL containers are destroyed.^[1]
^[1] Standard C++ does not define the order of destruction of objects within a standard container. Therefore, each implementer has to make a choice. Usually, containers are destroyed through a simple forward iteration. However, some implementers go for the more "natural" order by destroying objects in reverse order. The rationale is that in C++ objects are destroyed in reverse order of their creation.
Deallocation in reverse order. Many small objects are deallocated in the reverse order of their allocation. This trend occurs naturally in C++ programs as you call functions that manipulate small objects. Function arguments and temporary stack variables follow this trend.
Butterfly allocation and deallocation. Objects are created and destroyed without following a certain order. This happens as your program is running and has occasional need for small objects.

A caching strategy serves the butterfly trend very well because it can sustain quick allocations and deallocations if they arrive randomly. However, for bulk allocation and deallocation, caching is not of help. Worse, caching slows down deallocations because clearing the cache memory takes its own time.^[2]

^[2] I could not come up with a reasonable caching scheme that works equally well for same-order deallocation and reverse-order deallocation. Caching hurts either one or the other. Because both trends are likely to occur in real programs, caching is not a good choice.

A better strategy is to rely on the same concept as with allocation. The member variable FixedAllocator::deallocChunk_ points to the last Chunk object that was used for a deallocation. Whenever a deallocation occurs, deallocChunk_ is checked first. Then, if it's the wrong chunk, Deallocate performs a linear search, satisfies the request, and updates deallocChunk_.

Two important tweaks increase the speed of Deallocate for the allocation trends enumerated earlier. First, the Deallocate function searches the appropriate Chunk starting from deallocChunk_'s vicinity. That is, chunks_ is searched starting from deallocChunk_ and going up and down with two iterators. This greatly improves the speed of bulk deallocations in any order (normal or reversed). During a bulk allocation, Allocate adds chunks in order. During deallocation, either deallocChunk_ hits right away or the correct chunk is fetched in the next step.

The second tweak is the avoidance of borderline conditions. Say both allocChunk_ and deallocChunk_ point to the last Chunk in the vector, and there is no space left in the current set of chunks. Now imagine the following code is run:



for (...)


{


   // Some smart pointers use the small-object


   // allocator internally (see Chapter 7)


   SmartPtr p;


   ... use p ....


}

Each pass through the loop creates and destroys a SmartPtr object. At creation time, because there's no more memory, FixedAllocator::Allocate creates a new Chunk and appends it to the chunks_ vector. At destruction time, FixedAllocator::Deallocate detects an empty block and frees it. This costly cycle gets repeated for each iteration of the for loop.

This inefficiency is unacceptable. Therefore, during deallocation, a chunk is freed only when there are two empty chunks. If there's only one empty chunk, it is efficiently swapped to the end of the chunks_ vector. Thus, we avoid costly vector<Chunk>::erase operations by always deleting the last element.

Of course, some situations defeat this simple heuristic. If you allocate a vector of SmartPtrs of appropriate size in a loop, you are back to the same problem. However, such situations tend to appear more rarely. Besides, as mentioned in the introduction of this chapter, any allocator may perform worse than others under specific circumstances.

The deallocation strategy chosen also fits the butterfly allocation trend acceptably. Even if not allocating data in an ordered manner, programs tend to foster a certain locality; that is, they access a small amount of data at a time. The allocChunk_ and deallocChunk_ pointers deal nicely with such memory use because they act as a cache for the latest allocations and deallocations.

In conclusion, we now have a FixedAllocator class that can satisfy fixed-memory allocation requests in an acceptably fast and memory-efficient manner. FixedAllocator is optimized for typical trends in small-object allocation.

I l@ve RuBoard