Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Part Five. Operators
Chapter 25. Fast, Non-intrusive String Concatenation

A well-known inefficiency in Java [Larm2000] is in string concatenation. The way around it is to ensure that the concatenation is done within a single statement, which enables a compiler optimization in which the successive arguments to the + operator are silently translated into calls on a hidden StringBuffer instance, resulting in a more efficient construction of the string from its constituent parts. Hence,

    String s = s1 + " " + s2 + " " + s3;

is automatically converted to

    StringBuffer sb = new StringBuffer();
    sb.append(s1);
    sb.append(" ");
    sb.append(s2);
    sb.append(" ");
    sb.append(s3);
    String s = sb.toString();

This results in significant increases in performance [Wils2003e] compared with manual concatenation over several statements.

Most C++ libraries overload operator +() for string concatenation, and suffer similar inefficiencies because of the intermediate objects generated in concatenation chains.

The inefficiency of C++ string concatenation sequences stems from two factors. First, each intermediate concatenation (each + operator subexpression) produces an intermediate string, which involves at least one memory allocation[1] to accommodate the combined contents of the concatenation's two operands.
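The first factor can be made visible with an instrumented toy class. The following is a minimal sketch, not the String class the chapter has in mind: `NaiveString`, its `allocations` counter, and `count_chain_allocations()` are hypothetical names introduced purely for illustration, and the separator is a string rather than a `char` to keep the class small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy string used only to count allocations; it is not the
// String class the chapter refers to.
class NaiveString
{
public:
    static int allocations; // buffers allocated since the last reset

    NaiveString(char const *s)
        : m_len(std::strlen(s))
        , m_buf(allocate(m_len))
    {
        std::memcpy(m_buf, s, m_len + 1);
    }
    NaiveString(NaiveString const &rhs)
        : m_len(rhs.m_len)
        , m_buf(allocate(m_len))
    {
        std::memcpy(m_buf, rhs.m_buf, m_len + 1);
    }
    ~NaiveString() { delete [] m_buf; }

    char const *c_str() const { return m_buf; }

    // Conventional concatenation: every call builds a new intermediate.
    friend NaiveString operator +(NaiveString const &lhs, NaiveString const &rhs)
    {
        return NaiveString(lhs, rhs);
    }

private:
    // Builds the concatenation of lhs and rhs in a freshly allocated buffer.
    NaiveString(NaiveString const &lhs, NaiveString const &rhs)
        : m_len(lhs.m_len + rhs.m_len)
        , m_buf(allocate(m_len))
    {
        std::memcpy(m_buf, lhs.m_buf, lhs.m_len);
        std::memcpy(m_buf + lhs.m_len, rhs.m_buf, rhs.m_len + 1);
    }
    NaiveString &operator =(NaiveString const &); // assignment not needed here

    static char *allocate(std::size_t len)
    {
        ++allocations;
        return new char[len + 1]; // +1 for the terminating NUL
    }

    std::size_t m_len;
    char       *m_buf;
};

int NaiveString::allocations = 0;

// Returns the number of allocations made while evaluating a chain of
// four + operators.
int count_chain_allocations()
{
    NaiveString const s1("Goodbye"), sp(" "), s2("cruel"), s3("world!");

    NaiveString::allocations = 0; // ignore the operands' own buffers
    NaiveString const s = s1 + sp + s2 + sp + s3;

    assert(0 == std::strcmp(s.c_str(), "Goodbye cruel world!"));
    return NaiveString::allocations;
}
```

With copy elision applied (guaranteed for these returns from C++17), `count_chain_allocations()` reports one allocation per + operator, even though only the last buffer is ever used. Operands that first need converting to the string type would add further allocations on top of that.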
Second, every intermediate will copy the contents of its two arguments. For an expression with N concatenations (N + operators), arguments 0 and 1 will be copied N times, argument 2 will be copied N-1 times, and so on. Even assuming that the following code is compiled with a compiler that supports NRVO (see section 12.2.1), there are still likely to be between 4 and 8 memory allocations, and 4, 3, and 1 copies taken of the contents of strings s1, s2, and s3, respectively.

    String s1 = "Goodbye";
    char const *s2 = "cruel";
    String s3 = "world!";
    String s = s1 + ' ' + s2 + ' ' + s3;

In principle, since none of the intermediate results are used (or useful) outside of the statement, all that is required is one memory allocation and one copy of the contents of each source string. Ideally, each individual concatenation subexpression would only take a record of its arguments and pass it "up" the chain until the string needs to be generated. At that point the allocation of memory, and the copying into that memory of the individual pieces of the resultant string, can be done just once.
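The idea of recording arguments and deferring the work can be sketched in a few lines. What follows is an illustration under stated assumptions, not the chapter's own implementation: `Piece`, `Chain`, `concat()`, and `build_greeting()` are hypothetical names, `std::string` stands in for the generic String, and an explicit `concat()` call starts the chain so that the standard library's own operator +() is left untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical names throughout (Piece, Chain, concat).

// Non-owning record of one operand: where its characters are and how many.
struct Piece
{
    char const *ptr;
    std::size_t len;

    Piece(char const *s) : ptr(s), len(std::strlen(s)) {}
    Piece(std::string const &s) : ptr(s.data()), len(s.size()) {}
};

// One link per + subexpression. It records its right-hand operand and a
// pointer to the link on its left; no characters are copied here.
class Chain
{
public:
    Chain(Chain const *prev, Piece const &piece)
        : m_prev(prev)
        , m_piece(piece)
        , m_total((prev ? prev->m_total : 0) + piece.len)
    {}

    std::size_t total_length() const { return m_total; }

    // The string is generated only here: one reserve, then one copy of
    // each operand's characters.
    operator std::string() const
    {
        std::string result;
        result.reserve(m_total);
        append_to(result);
        assert(result.size() == m_total);
        return result;
    }

private:
    void append_to(std::string &out) const
    {
        if (m_prev)
            m_prev->append_to(out); // left-hand operands first
        out.append(m_piece.ptr, m_piece.len);
    }

    Chain const *m_prev;
    Piece        m_piece;
    std::size_t  m_total;
};

// Starts a chain from its first operand.
inline Chain concat(Piece const &first) { return Chain(nullptr, first); }

// Extends a chain. lhs is a temporary that lives until the end of the
// full expression, so keeping its address is safe within that expression.
inline Chain operator +(Chain const &lhs, Piece const &rhs)
{
    return Chain(&lhs, rhs);
}

// Builds the chapter's example string with a single deferred concatenation.
std::string build_greeting()
{
    std::string const s1 = "Goodbye";
    char const *s2 = "cruel";
    std::string const s3 = "world!";

    std::string const s = concat(s1) + " " + s2 + " " + s3;
    return s;
}
```

Each + here costs only a small stack object; the characters are copied once, when the chain is converted to `std::string`. The links point at temporaries, so in this sketch a chain must be converted within the same statement that builds it; storing one in a named variable (for example with `auto`) would leave dangling pointers.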