
Imperfect C++ Practical Solutions for Real-Life Programming
By Matthew Wilson
Part Five.  Operators


Chapter 25. Fast, Non-intrusive String Concatenation

A well-known inefficiency in Java [Larm2000] is in string concatenation. The way around it is to ensure that the concatenation is done within a single statement, which facilitates a compiler optimization in which the successive arguments to the + operator are silently translated into calls to a hidden StringBuffer instance, resulting in a more efficient construction of the string from its constituent parts. Hence,



String s = s1 + " " + s2 + " " + s3;



is automatically converted to



StringBuffer  sb = new StringBuffer();
sb.append(s1);
sb.append(" ");
sb.append(s2);
sb.append(" ");
sb.append(s3);
String        s = sb.toString();



This results in significant increases in performance [Wils2003e] compared with manual concatenation over several statements.

Most C++ string libraries overload operator +() to perform string concatenation, and suffer similar inefficiencies because of the intermediate objects generated along a chain of concatenations. The inefficiency of C++ string concatenation sequences stems from two factors.

First, the intermediate string associated with each intermediate concatenation (each + operator subexpression) will involve at least one memory allocation[1] to accommodate the total string contents of the concatenation's two parameters.

[1] Except in the cases where the string class uses the small-string optimization (SSO) [Meye2001], and it is applicable. Of course, as soon as the combined length of an intermediate exceeds the SSO internal capacity, the allocation is required after all, and SSO then becomes a pessimization.

Second, every intermediate will copy the contents of its two arguments. For an expression with N concatenations (N + operators), arguments 0 and 1 will be copied N times, argument 2 will be copied N-1 times, and so on.

Even assuming that the following code is compiled with a compiler that supports NRVO (see section 12.2.1), there are still likely to be between 4 and 8 memory allocations, and the contents of strings s1, s2 and s3 will be copied 4, 3, and 1 times, respectively (the two ' ' characters are copied 4 and 2 times).



String      s1  = "Goodbye";
char const  *s2 = "cruel";
String      s3  = "world!";
String      s   = s1 + ' ' + s2 + ' ' + s3;
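To make the cost of the statement above concrete, the following sketch tallies what a left-to-right chain of binary + operators copies. It uses std::string in place of the String class, and the names concat_cost and naive_concat_cost are hypothetical helpers invented for this illustration. It models each + subexpression as building a fresh string from both operands, and it does not attempt to count allocations, which depend on the library and on SSO.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Tally for one concatenation chain (hypothetical helper type).
struct concat_cost {
    std::size_t results;       // strings produced by the + subexpressions
    std::size_t chars_copied;  // characters copied into those strings
};

// Joins the pieces left to right, as a chain of binary + operators would,
// and records how many characters each step copies.
inline concat_cost naive_concat_cost(const std::vector<std::string>& pieces,
                                     std::string* result = nullptr) {
    assert(!pieces.empty());
    concat_cost cost = {0, 0};
    std::string acc = pieces[0];
    for (std::size_t i = 1; i < pieces.size(); ++i) {
        // Each + yields a new string holding copies of both operands.
        std::string next = acc + pieces[i];
        ++cost.results;
        cost.chars_copied += next.size();
        acc = next;  // bookkeeping for the model only; not counted
    }
    if (result != nullptr) {
        *result = acc;
    }
    return cost;
}
```

For the five operands "Goodbye", " ", "cruel", " " and "world!", the four + subexpressions produce strings of 8, 13, 14 and 20 characters, so 55 characters are copied to build a 20-character result.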



In principle, since none of the intermediate results are used (or useful) outside of the statement, all that is required is one memory allocation and one copy of the contents of each source string. Ideally, we would like to see each individual concatenation subexpression resulting only in a record being taken of the arguments, and being passed "up" the chain, until the string needs to be generated, at which point the allocation of memory, and the copying into that memory of the individual pieces of the resultant string, can be done just once.
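The idea of recording the arguments and passing them "up" the chain can be sketched as follows. This is a minimal illustration under stated assumptions, not the component developed in this chapter: piece, chain, empty_chain and lazy are hypothetical names, the chain must be started explicitly with lazy() (so it is not fully non-intrusive), and std::string stands in for the string class. Each + only records a pointer and a length; the single reserve() and the copies happen when the chain is converted to a string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

namespace sketch {

// Non-owning record of one operand (hypothetical type).
class piece {
public:
    piece(const std::string& s) : data_(s.data()), len_(s.size()), ch_('\0') {}
    piece(const char* s) : data_(s), len_(std::strlen(s)), ch_('\0') {}
    piece(char c) : data_(nullptr), len_(1), ch_(c) {}

    std::size_t size() const { return len_; }

    void append_to(std::string& out) const {
        if (data_ != nullptr) out.append(data_, len_);
        else out.push_back(ch_);
    }

private:
    const char* data_;  // null when the operand is a single character
    std::size_t len_;
    char ch_;
};

// Terminates the recursion at the left end of the chain.
struct empty_chain {
    std::size_t size() const { return 0; }
    void append_to(std::string&) const {}
};

// One link: everything to the left, plus one recorded operand.
template <typename Left>
class chain {
public:
    chain(const Left& left, const piece& right) : left_(left), right_(right) {}

    std::size_t size() const { return left_.size() + right_.size(); }

    void append_to(std::string& out) const {
        left_.append_to(out);
        right_.append_to(out);
    }

    // The only place where memory is reserved and characters are copied.
    operator std::string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        assert(out.size() == size());
        return out;
    }

private:
    Left left_;    // links hold only pointers and lengths, so copying is cheap
    piece right_;
};

// Starts a chain from its first operand.
inline chain<empty_chain> lazy(const piece& first) {
    return chain<empty_chain>(empty_chain(), first);
}

// Records the right-hand operand; no string data is copied here.
template <typename Left>
chain<chain<Left>> operator+(const chain<Left>& left, const piece& right) {
    return chain<chain<Left>>(left, right);
}

}  // namespace sketch
```

With this in place, std::string s = sketch::lazy(s1) + ' ' + s2 + ' ' + s3; copies the contents of each operand once. Because the links store raw pointers, a chain must be converted within the statement that builds it; storing one in a variable would risk dangling references to temporaries.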

