Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson
	Chapter 25. Fast, Non-intrusive String Concatenation

25.2. Performance

I've tested the concatenator with quite a number of different string classes, and it has the same excellent performance characteristics with all of them. We look at the performance of three string classes in particular.

The first, trivial_string, is a custom class written specifically for this test. It has two members, m_len (size_t) and m_s (char_type *), which hold the length and a pointer to a dynamically allocated character buffer. It uses new and delete to manage the memory, and allocates exactly the amount required for the current contents and the terminating null character. In other words, it is a bog-standard string class.
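To make the description concrete, here is a minimal sketch of what such a class might look like. Only the member names m_len and m_s, the use of new and delete, and the exact-size allocation come from the text; the constructors, swap(), length(), c_str(), and the shape of operator +() are assumptions made for illustration, not the class used in the tests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sketch of a minimal string class of the kind described above.
// Only m_len and m_s are named in the text; all other members are assumed.
class trivial_string
{
public:
    typedef char char_type;

    explicit trivial_string(char_type const *s = "")
        : m_len(std::strlen(s))
        , m_s(new char_type[m_len + 1])   // exactly contents + terminating null
    {
        std::memcpy(m_s, s, m_len + 1);
    }

    trivial_string(trivial_string const &rhs)
        : m_len(rhs.m_len)
        , m_s(new char_type[m_len + 1])
    {
        std::memcpy(m_s, rhs.m_s, m_len + 1);
    }

    trivial_string &operator =(trivial_string const &rhs)
    {
        trivial_string copy(rhs);   // copy first, then swap, so a failed
        swap(copy);                 // allocation leaves *this unchanged
        return *this;
    }

    ~trivial_string() { delete [] m_s; }

    void swap(trivial_string &rhs)
    {
        std::size_t len = m_len; m_len = rhs.m_len; rhs.m_len = len;
        char_type  *s   = m_s;   m_s   = rhs.m_s;   rhs.m_s   = s;
    }

    std::size_t      length() const { return m_len; }
    char_type const *c_str()  const { return m_s; }

    // Conventional concatenation: every call allocates a new exact-size
    // buffer and copies both operands into it.
    friend trivial_string operator +(trivial_string const &lhs, trivial_string const &rhs)
    {
        return trivial_string(lhs.m_s, lhs.m_len, rhs.m_s, rhs.m_len);
    }

private:
    // Builds the concatenation of two character ranges in one allocation.
    trivial_string(char_type const *a, std::size_t na, char_type const *b, std::size_t nb)
        : m_len(na + nb)
        , m_s(new char_type[m_len + 1])
    {
        assert(a != 0 && b != 0);
        std::memcpy(m_s, a, na);
        std::memcpy(m_s + na, b, nb);
        m_s[m_len] = '\0';
    }

    std::size_t m_len;   // length, excluding the terminating null
    char_type  *m_s;     // dynamically allocated buffer of m_len + 1 characters
};
```

With this sketch, an expression such as a + b + c creates one temporary (and one allocation) per operator +() call, which is the cost the concatenator is meant to avoid.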

The second string used is the standard library's basic_string. Since different compiler vendors use different implementations, the differences in performance between the standard strings reflect the library implementation as much as they reflect the compiler. Despite this likely variation, I've included this string since it represents, for many developers, the string class; the other two strings here will clearly demonstrate compiler, rather than library, differences in application of the concatenator.

The third string is STLSoft's basic_simple_string<>, which stores its capacity and length along with the string contents in a resizable dynamically allocated buffer and which performs optimistic allocation on a granularity of 32 characters.
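As an illustration of what allocation "on a granularity of 32 characters" means, the following sketch rounds a requested size up to the next multiple of the granularity. The function name round_up_capacity is hypothetical, and whether basic_simple_string counts the terminating null before or after rounding is not stated here, so treat this as the general idea rather than the library's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds a requested character count up to the next
// multiple of the allocation granularity (32 in the description above).
inline std::size_t round_up_capacity(std::size_t required, std::size_t granularity = 32)
{
    assert(granularity != 0);
    return ((required + granularity - 1) / granularity) * granularity;
}
```

Under this scheme a string needing 33 characters would be given room for 64, so later appends of up to 31 characters need no reallocation.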

The same test program was used for the three string types, measuring the performance of sequences of 1, 2, 3, 4, 8, 16, and 32 concatenations, with and without fast_string_concatenator.^[4] The results shown here represent the relative times, as a percentage, of the fast_string_concatenator with respect to the normal operator +() implementations provided with the given string class. Values lower than 100% indicate superior performance for the concatenator. The program was compiled and tested with Borland (v5.6), CodeWarrior (v8), Digital Mars (v8.38), GCC (v3.2), Intel (v7.0), and Visual C++ (versions 6.0 and 7.1).

^[4] The test program and full results are all included on the CD.
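The measurement just described can be sketched roughly as follows. This is not the test program from the CD: the names time_ms, relative_percent, and concat3 are hypothetical, the timer uses std::chrono (which postdates the compilers listed above), and it assumes that a sequence of n concatenations means n applications of operator +().

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Runs a callable 'iterations' times and returns the elapsed wall-clock
// time in milliseconds.
template <typename F>
double time_ms(F f, std::size_t iterations)
{
    typedef std::chrono::steady_clock clock_type;

    clock_type::time_point const start = clock_type::now();
    for (std::size_t i = 0; i != iterations; ++i)
    {
        f();
    }
    std::chrono::duration<double, std::milli> const elapsed = clock_type::now() - start;

    return elapsed.count();
}

// Relative time of the candidate with respect to the baseline, as a
// percentage. Values below 100 mean the candidate is faster.
inline double relative_percent(double candidate_ms, double baseline_ms)
{
    assert(baseline_ms > 0.0);
    return 100.0 * candidate_ms / baseline_ms;
}

// A sequence of 3 concatenations using the string class's own operator +().
// The length is returned so that the result is actually used.
inline std::size_t concat3(std::string const &a, std::string const &b,
                           std::string const &c, std::string const &d)
{
    std::string const s = a + b + c + d;
    return s.length();
}
```

A real harness would time the same sequence twice, once with the normal operator +() and once with the concatenator enabled, and pass the two totals to relative_percent(). An optimizing compiler may elide work whose result is unused, so the result of each concatenation should be consumed in some way.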

Table 25.1 shows the performances for the trivial_string class, Table 25.2 shows the performances for std::basic_string<char>, and Table 25.3 shows the performances for stlsoft::basic_simple_string<char>.

Table 25.1. Relative performance of the concatenator for the trivial_string class.

| #Concats | Borland | CodeWarrior | Digital Mars | GCC | Intel | VC++ (6.0) | VC++ (7.1) | Average |
|---|---|---|---|---|---|---|---|---|
| 1 | 90.8% | 77.8% | 87.8% | 77.5% | 64.8% | 92.6% | 99.6% | 84.4% |
| 2 | 47.8% | 42.8% | 47.9% | 38.4% | 36.8% | 52.3% | 52.8% | 45.5% |
| 3 | 35.2% | 29.7% | 34.5% | 29.1% | 25.9% | 36.9% | 37.0% | 32.6% |
| 4 | 29.1% | 24.3% | 28.4% | 25.0% | 23.9% | 29.7% | 29.6% | 27.1% |
| 8 | 24.2% | 19.6% | 24.0% | 21.3% | 19.1% | 24.3% | 23.6% | 22.3% |
| 16 | 13.3% | 7.7% | 13.8% | 12.2% | 11.0% | 11.4% | 10.7% | 11.4% |
| 32 | 9.6% | 4.2% | 8.9% | 10.2% | 7.7% | 6.5% | 7.7% | 7.8% |

Table 25.2. Relative performance of the concatenator for std::basic_string<char>.

| #Concats | Borland | CodeWarrior | Digital Mars | GCC | Intel | VC++ (6.0) | VC++ (7.1) | Average |
|---|---|---|---|---|---|---|---|---|
| 1 | 101.4% | 89.4% | 132.8% | 91.3% | 55.1% | 188.1% | 170.8% | 118.4% |
| 2 | 56.1% | 57.4% | 91.7% | 52.9% | 44.9% | 108.7% | 137.5% | 78.5% |
| 3 | 43.0% | 44.2% | 71.3% | 39.0% | 32.1% | 76.2% | 76.3% | 54.6% |
| 4 | 36.9% | 33.8% | 64.4% | 35.2% | 23.7% | 59.5% | 48.1% | 43.1% |
| 8 | 30.7% | 30.6% | 63.9% | 31.8% | 20.5% | 49.0% | 47.1% | 39.1% |
| 16 | 15.6% | 14.3% | 49.3% | 25.3% | 10.7% | 26.2% | 16.2% | 22.5% |
| 32 | 12.9% | 10.2% | 26.1% | 16.7% | 8.5% | 19.6% | 11.0% | 15.0% |

Table 25.3. Relative performance of the concatenator for stlsoft::basic_simple_string<char>.

| #Concats | Borland | CodeWarrior | Digital Mars | GCC | Intel | VC++ (6.0) | VC++ (7.1) | Average |
|---|---|---|---|---|---|---|---|---|
| 1 | 98.6% | 127.2% | 102.1% | 67.7% | 50.8% | 99.3% | 99.7% | 92.2% |
| 2 | 62.4% | 82.8% | 98.8% | 42.8% | 33.9% | 67.0% | 65.4% | 64.7% |
| 3 | 50.3% | 61.5% | 74.7% | 31.7% | 29.4% | 51.1% | 49.8% | 49.8% |
| 4 | 44.6% | 53.6% | 66.3% | 29.0% | 27.6% | 46.5% | 41.2% | 44.1% |
| 8 | 37.6% | 45.6% | 55.3% | 23.3% | 23.8% | 40.8% | 35.6% | 37.4% |
| 16 | 21.8% | 21.4% | 25.7% | 13.7% | 15.6% | 21.9% | 18.0% | 19.7% |
| 32 | 17.9% | 13.7% | 15.2% | 10.6% | 13.7% | 14.6% | 13.8% | 14.2% |

Before I conducted the tests, my expectation was that using the concatenator for sequences of one or two concatenations would actually result in a performance hit, with only longer sequences feeling the benefit of the optimization. Thankfully, in most cases, it seems that I underestimated the effect of the optimization. Though the data points for 16 and 32 concatenations are merely academic (if you're writing code with 32 concatenations, you probably need to take a holiday), the savings to be had for up to, say, 8 concatenations are considerable: up to 80%.

With trivial_string (see Table 25.1), every data point demonstrates a superior performance for the concatenator, so using it for this class represents an unconditional win. Roughly speaking, a concatenation sequence of length 2 is twice as fast, and one of length 3 is three times as fast (the average relative times of 45.5% and 32.6% correspond to speedups of about 2.2 and 3.1 times). This is very encouraging, but the string implementation is quite rudimentary so perhaps we shouldn't count any chickens just yet.

With STLSoft's basic_simple_string (see Table 25.3), the performance results are almost as encouraging, except that for a single concatenation CodeWarrior and Digital Mars both suffer a performance penalty, of 27% and 2%, respectively. Despite that, I think it's clear that using the concatenator represents a definite win.

The results for std::basic_string (see Table 25.2) are somewhat less conclusive. Some standard library implementations use reference counting and copy-on-write, and this may affect the performance advantage of the concatenator. For a single concatenation, Borland and Digital Mars both incur a cost, although at 1% and 33%, respectively, they're not particularly off-putting. Of more concern is that both Visual C++ versions exhibit a performance loss for concatenation sequences of length 1 and 2. Of course, we might simply surmise that the Visual C++ runtime library has such a good string implementation that the concatenator cannot keep up until sequences of three or more elements. However, since the Intel performance was obtained with the Visual C++ 7.0 library, whose string implementation is virtually identical to that for version 7.1,^[5] that explanation doesn't really hold water. In other words, where Visual C++ 7.1 has relative performances of 171% and 138% for sequences of length 1 and 2, Intel has relative performances of 55% and 45%. This goes to show the effect that the template optimizing capabilities of the compiler can have on the use of such a scheme. It is my belief that as compilers continue to improve in their abilities to optimize templates, the exceptional levels of performance afforded by the concatenator with the Intel compiler will be more broadly applicable.

^[5] Indeed, performing the same test with Intel 7.1 and the Visual C++ 7.1 libraries shows virtually identical performance to the Intel 7.0 performance shown here.

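Table 25.4. Absolute concatenation times, in milliseconds, for Visual C++ 6.0 and 7.1.

| #Concats | VC++ (6.0) | VC++ (6.0) + concatenator | VC++ (7.1) | VC++ (7.1) + concatenator |
|---|---|---|---|---|
| 1 | 143 | 269 | 226 | 386 |
| 2 | 263 | 286 | 275 | 378 |
| 3 | 408 | 311 | 476 | 363 |
| 4 | 543 | 323 | 761 | 366 |
| 8 | 665 | 326 | 969 | 456 |
| 16 | 2338 | 613 | 4116 | 667 |
| 32 | 5180 | 1016 | 9232 | 1011 |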

Notwithstanding potential future improvements in compilers, I would suggest that use of the fast concatenator with string libraries represents a net win, and a significant one at that, although it must be conceded that, for Visual C++ at least, performance profiling might be needed to prove this. To shed some light on that, it's interesting to take a look at the absolute concatenation times, in milliseconds, for the two Visual C++ versions (see Table 25.4).
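The absolute times can be tied back to the relative figures with a few lines of code. The sketch below copies the values from Table 25.4; the names timing_row, row_percent, and first_winning_length are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// One row of Table 25.4 for a single compiler: the time with the string's
// own operator +() and the time with the concatenator, in milliseconds.
struct timing_row
{
    int    concats;
    double plain_ms;
    double concatenator_ms;
};

// Values copied from Table 25.4.
static timing_row const vc60_rows[] = {
    {  1,  143,  269 }, {  2,  263,  286 }, {  3,  408,  311 }, {  4,  543,  323 },
    {  8,  665,  326 }, { 16, 2338,  613 }, { 32, 5180, 1016 }
};
static timing_row const vc71_rows[] = {
    {  1,  226,  386 }, {  2,  275,  378 }, {  3,  476,  363 }, {  4,  761,  366 },
    {  8,  969,  456 }, { 16, 4116,  667 }, { 32, 9232, 1011 }
};
static std::size_t const row_count = sizeof(vc60_rows) / sizeof(vc60_rows[0]);

// Relative time of the concatenator as a percentage of the plain time.
inline double row_percent(timing_row const &row)
{
    assert(row.plain_ms > 0.0);
    return 100.0 * row.concatenator_ms / row.plain_ms;
}

// Returns the shortest sequence length at which the concatenator is faster
// than the plain operator +(), or 0 if it never is.
inline int first_winning_length(timing_row const *rows, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
    {
        if (row_percent(rows[i]) < 100.0)
        {
            return rows[i].concats;
        }
    }
    return 0;
}
```

Applied to the first Visual C++ 6.0 row, row_percent() gives 269 / 143, or about 188.1%, which matches the corresponding entry in Table 25.2. For both compiler versions, first_winning_length() reports 3 as the shortest sequence at which the concatenator comes out ahead.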

Using the concatenator does not linearize the time cost with respect to the number of concatenation elements, but it does flatten the steep, faster-than-linear growth considerably. Hence, in any software where the average sequence length is greater than 2, there are likely to be significant savings even with Visual C++.
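One way to see why a plain operator +() chain grows faster than linearly is to count the characters copied. The following is a deliberately simplified model, not the fast_string_concatenator implementation (which this section does not show): it assumes every operand has the same length, that each operator +() builds a fresh temporary by copying both operands, and that an idealized single-pass scheme copies each operand exactly once. Allocation costs, reference counting, and growth strategies are all ignored, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Characters copied when n concatenations of strings of length 'len' are
// evaluated pairwise, left to right, each step producing a new temporary.
inline std::size_t chars_copied_pairwise(std::size_t n, std::size_t len)
{
    std::size_t total       = 0;
    std::size_t accumulated = len;      // length of the left-hand operand so far

    for (std::size_t i = 0; i != n; ++i)
    {
        accumulated += len;             // the new temporary holds both operands
        total       += accumulated;     // and every one of its characters is copied
    }
    return total;
}

// Characters copied by an idealized scheme that works out the total length
// first and then copies each of the n + 1 operands exactly once.
inline std::size_t chars_copied_single_pass(std::size_t n, std::size_t len)
{
    assert(n != 0);
    return (n + 1) * len;
}
```

For operands of 10 characters, this model gives 90 versus 40 characters copied for 3 concatenations, and 5600 versus 330 for 32, so the pairwise count grows quadratically while the single-pass count grows linearly. The measured times in Table 25.4 include other costs, so they should not be expected to follow these counts exactly.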
