Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson
	Table of Contents

	Chapter 25. Fast, Non-intrusive String Concatenation

25.1. fast_string_concatenator<>

Even if you're like me and find most programming books impossible to digest whole, I'd advise you to have a copy of Bjarne Stroustrup's The C++ Programming Language [Stro1997] by your bed, or in the bathroom, or next to the goggle-box,^[2] and idly dip into it every now and then. It's by far the best catalyst of new ideas I've come across in C++, probably because Bjarne makes only brief but pithy mention of an idea and then moves on. (I sometimes wonder whether this is a deliberate tactic, so that all we second stringers can feel like we're inventing things that he's already anticipated.)

^[2] What my dad used to call the television, mostly when he'd catch me watching cartoons at unearthly hours.

Chapter 22 of [Stro1997] describes how several operators can be used in a chain to represent a single logical operation for matrix manipulation. This had me wondering about whether I could apply these principles to string concatenation to achieve performance improvements: the result is the fast_string_concatenator template.^[3]

^[3] Unusually for me, this one wasn't dreamt up in my sleep. It was more of an Archimedes moment...

25.1.1 Working with Custom String Classes

Before we get into the details of the implementation of fast_string_concatenator, let's see how we might use it with a custom string class, String.

The canonical way to implement concatenation [Meye1998] is as a nonmember function that is implemented in terms of the member operator +=():



String operator +(String const &lhs, char const *rhs)


{


  String result(lhs);


  result += rhs;


  return result;


}

There's an alternate form, if operator +=() returns a reference to the instance:



String operator +(String const &lhs, char const *rhs)


{


  return String(lhs) += rhs;


}

It's quite clear from this implementation where all the costly intermediate instances, allocation, and copying come from. There would be similar implementations for the four other overloads:



String operator +(String const &lhs, char rhs);


String operator +(char const *lhs, String const &rhs);


String operator +(char lhs, String const &rhs);


String operator +(String const &lhs, String const &rhs);

When using fast_string_concatenator, the five operators share a seemingly identical, simple definition:

Listing 25.1.



fast_string_concatenator<String>


  operator +(String const &lhs, char const *rhs)


{


  return fast_string_concatenator<String>(lhs, rhs);


}


fast_string_concatenator<String>


  operator +(String const &lhs, char rhs)


{


  return fast_string_concatenator<String>(lhs, rhs);


}


fast_string_concatenator<String>


  operator +(String const &lhs, String const &rhs);


fast_string_concatenator<String>


  operator +(char lhs, String const &rhs);


fast_string_concatenator<String>


  operator +(String const &lhs, String const &rhs);

The most obvious difference is that they no longer return instances of String, instead returning instances of fast_string_concatenator<String>. The implementation of each operator is simply to pass the two arguments to the constructor of the anonymous concatenator instance, which will be subject to RVO (see section 12.2).

25.1.2 Tying the Concatenators Together

Since each operator +() returns a concatenator, we clearly need more if we're going to make multipart concatenation sequences. This is facilitated by a number of standard operator +() overloads that are defined along with the class. The first three we'll look at each take a concatenator instance as their left-hand parameter:



template <typename S, typename C, typename T>


fast_string_concatenator<S, C, T>


  operator +(fast_string_concatenator<S, C, T> const &lhs, S const &rhs);


. . .


  operator +(fast_string_concatenator<S, C, T> const &lhs, C const *rhs);


. . .


  operator +(fast_string_concatenator<S, C, T> const &lhs, C const rhs);

Since the + operator is left-to-right associative, these three operators enable the result of the left-most concatenation to be bound with the next, and so on. Looking at our example, "Goodbye" and ' ' are concatenated to form a concatenator instance using one of the custom String operator +()s, which is then concatenated with "cruel" via the second of the standard ones shown earlier.

In this way, any permutation of character, C-style string and string class can be combined in a concatenation sequence, resulting in a final fast_string_concatenator instance. There's more to it, such as concatenation seeding (see section 24.4) and pathological bracing (see section 25.5), but that's essentially it for the operators in normal circumstances.

25.1.3 The Concatenator Class

Now it's time to look at the concatenator class itself. We've already seen that it is a template, and that it takes three template parameters. The three parameters S, C, and T represent the string type, the character type, and the traits type, respectively



template< typename S


        , typename C = typename S::value_type


        , typename T = std::char_traits<C>


        >


class fast_string_concatenator;

In the definition of the class, C defaults to S::value_type, and T defaults to std::char_traits<C>. These are provided as default parameters rather than just assumed to allow wider application of the template.

In most cases, therefore, one only has to be concerned about the string type, and leave the rest to the defaults. Since most modern string classes define a value_type member type, it works swimmingly. For those that do not, it is not difficult to stipulate both first and second types in the concatenator instantiation, or to use a type veneer (see Chapter 21) to provide the necessary member types for the string.

We've seen that all the operator +() implementations follow precisely the same form by returning an anonymous instance of the concatenator, constructed from the left and right hand parameters to the operator. This is reflected in the public interface of the class, as shown in Listing 25.2.

Listing 25.2.



template< . . . >


class fast_string_concatenator


{


public:


  typedef S                                 string_type;


  typedef C                                 char_type;


  typedef T                                 traits_type;


  typedef fast_string_concatenator<S, C, T> class_type;


// Construction


public:


  fast_. . . (string_type const &lhs, string_type const &rhs);


  fast_. . . (string_type const &lhs, char_type const *rhs);


  fast_. . . (string_type const &lhs, char_type const rhs);


  fast_. . . (char_type const *lhs, string_type const &rhs);


  fast_. . . (char_type const lhs, string_type const &rhs);





  fast_. . . (class_type const &lhs, string_type const &rhs);


  fast_. . . (class_type const &lhs, char_type const *rhs);


  fast_. . . (class_type const &lhs, char_type const rhs);


// Conversion


public:


  operator string_type() const;


  . . .

You'll also note the only other public method of the class, which is an implicit conversion operator to the string_type. (You see, I told you there were rare occasions when it's good to have such a thing!) Now you can see how we get back the resultant string from the concatenator. This method has a simple implementation:



template< . . . >


fast_string_concatenator<S, C, T>::operator S() const


{


  size_t      len = length();


  string_type result(len, '~');


  write(&result[0]);


  assert(len == traits_type::length(result.c_str()));


  return result;


}

The concatenator's length() method is called, then a string instance result is created of the given length, using the standard library's String model's constructor (C++-98: 21.3.1) that takes a length and the character to assign to all elements. This is how we achieve the single memory allocation for the entire concatenation sequence. You could use any character here—the code shown uses '~' as an eye-catcher—since they will all be overwritten. In fact, this represents a possible future optimization of the technique, whereby one could have a private constructor of the string class, to which the concatenator had access, which would allocate and zero terminate the string contents but not otherwise initialize its contents. (Though as we'll see later, the excellent performance of the current solution probably makes this unnecessary.)

Next the write() method is called, passing the address of the start of result's internal memory. This method walks the concatenation sequence and writes the contents into the given memory. When it returns, result contains the result of the concatenation and is returned. If your compiler supports NRVO, then result will actually be the variable receiving the result of the concatenation sequence in the client code.

25.1.4 Internal Implementation

Before we look at the implementation of the length() and write() methods, I'll show you the remainder of the class definition—Listing 25.3—so you'll see how the arguments to the individual concatenation operators are represented as a network.

Listing 25.3.



template< . . . >


class fast_string_concatenator


{


  . . . // Public interface


private:


  struct Data


  {


    struct CString


    {


      size_t          len;


      char_type const *s;


    };


    union DataRef


    {


      CString           cstring;


      char_type         ch;


      class_type  const *concat;


    };


    enum DataType


    {


        seed    // Argument was the seed type


      , single  // Argument was a single character


      , cstring // Argument was a C-string or string object


      , concat  // Argument was another concatenator


    };


    Data(string_type const &s);


    Data(char_type const *s);


    Data(char_type ch s);


    Data(class_type const &fc);


    Data(fsc_seed const &fc);





    size_t    length() const;


    char_type *write(char_type *s) const;





    DataType const  type;


    DataRef         ref;


  };


  friend struct Data;


private:


  char_type *write(char_type *s) const;


private:


  Data          m_lhs;


  Data          m_rhs;


  . . .


};

The nested structure Data is a discriminated union of a single character ch, a character sequence cstring (of type struct CString), and a pointer to a concatenator concat. Each concatenator instance contains two instances of this structure, m_lhs and m_rhs, representing the two arguments to operator +(). Now it's clear how the network is represented. Except for single characters, all the other types—c-strings, string objects, and concatenator instances—are referenced by pointers.

This is perfectly valid because C++ requires that temporary objects remain alive (i.e., do not have their destructors called) until the end of the statement in which they are constructed. Hence, all the temporary objects in the concatenation sequence are still valid at the point that the final fast_string_concatenator<String> instance is used to create the final string instance.

In our original example, the network builds as follows:



// FC = shorthand for temporary fast_string_concatenator<String>


FC#1: lhs == c-string ("Goodbye"); m_rhs == character (' ')


FC#2: lhs == ptr to FC (FC#1);     m_rhs == c-string ("cruel")


FC#3: lhs == ptr to FC (FC#2);     m_rhs == character (' ')


FC#4: lhs == ptr to FC (FC#3);     m_rhs == c-string ("world!")

FC#4 is the instance on which operator string_type() const is called to return the result to the client code.

Given that we know how the network is constructed, the implementations of length() and write() are very straightforward. First length().



template< . . . >


size_t fast_string_concatenator<S, C, T>::length() const


{


  return m_lhs.length() + m_rhs.length();


}

The length of a given concatenator is simply the sum of the two arguments it represents. Hence it defers to its m_lhs and m_rhs members. The implementation of Data::length() switches on the data type, recursing in the concat case:

Listing 25.4.



. . .


switch(type)


{


  case    seed:


    len = 0;


    break;


  case    single:


    len = 1;


    break;


  case    cstring:


    len = ref.cstring.len;


    break;


  case    concat:


    len = ref.concat->length();


    break;


}


. . .

Note that the normal wisdom of having a default case is not followed here. This is because tests showed that it cost 1 to 5% performance on several compilers involved in the test (see below). The actual code contains a run time assertion (see section 1.4) such as the following:



assert( type == cstring || type == single ||


        type == concat || type == seed);

The result of the outermost length() call is used in the implicit conversion operator to create a string of exactly the required size; hence only one allocation is needed.

But how do we achieve the single write operation? This starts out in the concatenator's write() method:



template< . . . >


C *fast_string_concatenator<S, C, T>::write(C *s) const


{


  return m_rhs.write(m_lhs.write(s));


}

The left-hand side writes out its contents into the buffer that is passed into the method via Data::write(), which returns the point at which it finished writing. This is then passed to the right-hand side, and the result of that is passed back to the caller.

The implementation of Data::write() follows a similar pattern to the length() method, in switching on the data type. Single characters are copied in and the write pointer advanced by one. The contents of string objects and C-style strings are copied into the buffer via memcpy(), and the pointer is advanced by the appropriate amount. For concatenators, the write() method is called and its return value returned.

Listing 25.5.



template< . . . >


C *fast_string_concatenator<S, C, T>::Data::write(C *s) const


{


  size_t  len;


  switch(type)


  {


    case    seed:


      break;


    case    single:


      *(s++) = ref.ch;


      break;


    case    cstring:


      len = ref.cstring.len;


      memcpy(s, ref.cstring.s, sizeof(C) * len);


      s += len;


      break;


    case    concat:


      s = ref.concat->write(s);


      break;


  }


  return s;


}

Hence, write() copies in the requisite number of characters for the data type, and then passes back the address of the next location in which to write. That means that the reversed double call m_rhs.write(m_lhs.write(s)) in fast_string_concatenator<>::write() results in the string being written in correct order, and in a single pass.

It may seem like a lot of code, but you can be assured that good compilers make short work of such things. In optimized builds it is very efficient indeed.

It's worth noting that by using only standard expected public operations of a string classes, application of the concatenator to arbitrary string types is nonintrusive.