
Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 33.  Multidimensional Arrays


33.5. Performance

We've looked at several different options for using multidimensional arrays in C++, which are adequate, elegant, or perfectly brilliant, depending on your point of view. But as with many aspects of software engineering, we must balance flexibility against performance, so we're going to finish the chapter by looking at the performance of the various array types, to inform that balancing.

In the tests included here, the code executes N nested loops, where N is the dimensionality of the array being tested, and assigns to and reads back from the [i][j][k]th element within the loops. The values assigned are calculated deterministically, so that they are the same for each array type. Effectively, the test measures the cost of both read and write access to the elements. For the dynamically sized types, the costs of creation and destruction of the array storage are included within the timed sections.
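To make the shape of the tests concrete, here is a minimal sketch of such a timing loop for the three-dimensional case, using a vector of vectors of vectors. The extents, the value formula, and the use of std::chrono for timing are illustrative assumptions, not the actual test code from the CD.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::size_t const d0 = 100, d1 = 100, d2 = 100; // illustrative extents

    typedef std::vector<std::vector<std::vector<int> > > array_t;

    std::chrono::steady_clock::time_point const start = std::chrono::steady_clock::now();

    {
        // For dynamically sized types, creation and destruction of the
        // storage fall inside the timed section.
        array_t a(d0, std::vector<std::vector<int> >(d1, std::vector<int>(d2)));

        long total = 0;
        for(std::size_t i = 0; i < d0; ++i)
        {
            for(std::size_t j = 0; j < d1; ++j)
            {
                for(std::size_t k = 0; k < d2; ++k)
                {
                    a[i][j][k] = int(i + j + k); // deterministic value, same for every array type
                    total += a[i][j][k];         // read it back
                }
            }
        }
        std::printf("total: %ld\n", total); // keep the optimizer honest
    }

    long long const us = std::chrono::duration_cast<std::chrono::microseconds>(
                             std::chrono::steady_clock::now() - start).count();
    std::printf("elapsed: %lld microseconds\n", us);

    return 0;
}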

The element types tested were std::string, int and double, but since the results indicated virtually identical relative performances in all cases, I'm just showing the results for int.

The times are expressed as percentages, relative to the slowest type for each dimension tested. All the test programs are included on the CD.

33.5.1 Sized at Run Time

Table 33.5 shows the relative performances obtained for vector of vectors, Boost's multi_array, and fixed_array_1/2/3d, all using index operator syntax. It also shows the costs for fixed_array using the at() and at_unchecked() direct method access (which circumvent the intermediates required for two or more dimensions).
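For reference, the three fixed_array access styles compared in the table look something like the following in use. This is only a sketch: it assumes the fixed_array_3d interface described earlier in the chapter (a constructor taking the three extents, checked at(), unchecked at_unchecked(), and an operator[] that returns an intermediate slice), and the extents are arbitrary.

// Sketch only: assumes the fixed_array_3d interface presented earlier in the chapter.
fixed_array_3d<int>  a(10, 20, 30);

a[1][2][3]              = 101; // subscript: goes via intermediate 2-D and 1-D slices
a.at(1, 2, 3)           = 102; // checked: bounds test plus a single offset calculation
a.at_unchecked(1, 2, 3) = 103; // unchecked: a single offset calculation, no intermediates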

Table 33.5. Relative performance of arrays sized at runtime.

int                              1 dim      2 dims     3 dims
vector< vector<. . .> >          39.1%      31.6%      100.0%
boost::multi_array               91.0%      100.0%     97.6%
fixed_array – [i][j][k]          41.0%      62.8%      77.7%
fixed_array – at                 100.0%     53.8%      44.3%
fixed_array – at_unchecked       42.0%      31.6%      27.0%

There are several interesting things to note. First, at higher dimensions the cost of the checked access provided by at() becomes less significant than that of the intermediates required by subscript access. When dealing with three dimensions, at() access is approximately twice as fast as unchecked subscript access. This is because no intermediates are generated, and also because the entire offset calculation is done in one operation, rather than in three steps.
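To see why the single calculation matters, compare the two access paths in outline. This is a simplified sketch of the technique only, not the actual fixed_array implementation; the helper function name and parameters are invented for illustration.

#include <cstddef>

// Subscript access conceptually applies the offset arithmetic one dimension
// at a time, materializing an intermediate slice object at each step:
//
//   a[i]       -> 2-D slice over  data + i * (d1 * d2)
//   a[i][j]    -> 1-D slice over  data + i * (d1 * d2) + j * d2
//   a[i][j][k] -> element at      data + i * (d1 * d2) + j * d2 + k
//
// Direct access does the whole calculation in one expression, with no
// intermediate objects (and, for at(), a single bounds check beforehand).
int &element_at(int *data,
                std::size_t d1, std::size_t d2,
                std::size_t i, std::size_t j, std::size_t k)
{
    return data[(i * d1 + j) * d2 + k];
}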

When at_unchecked() is considered, we can see that the benefits of skipping both the intermediate calculations and the intermediate objects are compelling. It is on a par with vector and fixed_array subscript access in one dimension, with vector in two dimensions, and otherwise superior to all the other schemes.

It's equally clear that the flexibility of the design of Boost's multi_array has an impact on performance. At two and three dimensions it is significantly less efficient than fixed_array, being up to four times as slow. It also fares worse than vector, except in three dimensions, where the cost of the large number of individual allocations required by the array of arrays of arrays created by the vector version begins to tell. Serious numerists may not wish to pay such costs.
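The allocation point is easy to see in outline. The following sketch contrasts constructing a 100 x 100 x 100 boost::multi_array, which obtains its elements as a single contiguous block, with the equivalent vector of vectors of vectors, which performs thousands of separate allocations; the extents are illustrative.

#include <vector>
#include <boost/multi_array.hpp>

void compare_construction()
{
    // Element storage is one contiguous block of 1,000,000 ints.
    boost::multi_array<int, 3> ma(boost::extents[100][100][100]);
    ma[1][2][3] = 42;

    // 1 outer buffer + 100 middle buffers + 10,000 inner buffers:
    // over ten thousand separate allocations, with no overall contiguity.
    std::vector<std::vector<std::vector<int> > > vv(
        100,
        std::vector<std::vector<int> >(100, std::vector<int>(100)));
    vv[1][2][3] = 42;
}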

I think we can fairly say that fixed_array has proven its worth in performance terms. Sure, it's not terribly pleasing to have separate classes that do largely the same thing,[8] but being able to sensibly enumerate all elements with begin() and end(), to have size() provide a meaningful result, and to get such good performance seals the argument for me. Of course, you may see it differently, and that's entirely proper; there isn't a black-and-white answer here, as we're deep inside one of C++'s imperfections.

[8] I've not given up on the notion of factoring out much of the repeated functionality, whilst maintaining the distinct nature of the at(), at_unchecked(), begin(), end() and size() methods. If you want to try this, you can find the classes on the CD; please email me with your results.

33.5.2 Sized at Compile Time

Table 33.6 shows the relative performances obtained for built-in arrays, Boost's array, and static_array_1/2/3d, all using index operator syntax. It also shows the costs for static_array using the at() and at_unchecked() direct method access (which circumvent the intermediates required for two or more dimensions).
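For reference, the compile-time-sized forms being compared can be declared as follows. The built-in and boost::array declarations are standard; the static_array_3d lines are a sketch that assumes the form of the class presented earlier in the chapter.

#include <boost/array.hpp>

// Built-in array: extents fixed at compile time, contiguous storage.
int builtin[5][6][7];

// Nested boost::array: same layout, with a container-like interface at each level.
boost::array<boost::array<boost::array<int, 7>, 6>, 5>  nested;

void use()
{
    builtin[1][2][3] = 42;
    nested[1][2][3]  = 42; // boost::array's operator[] at each level

    // static_array_3d<int, 5, 6, 7> sa;   // assumed form, as presented earlier in the chapter
    // sa.at_unchecked(1, 2, 3) = 42;
}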

Table 33.6. Relative performance of arrays sized at compile time.

int                              1 dim      2 dims     3 dims
built-in array                   50.9%      49.1%      35.6%
array< array<. . .> >            52.4%      48.3%      36.7%
static_array – [i][j][k]         54.2%      83.9%      100.0%
static_array – at                100.0%     100.0%     88.4%
static_array – at_unchecked      51.9%      49.0%      35.5%

The picture's quite a bit different from that for the dynamically sized arrays. Simply put, subscript access to built-in arrays and to boost::array, and at_unchecked() access to static_array, are all pretty much equivalent at all dimensions. Using subscript or at() access for static_array is considerably more costly.

As was observed in section 33.3.1, building multidimensional arrays from boost::array is unsuitable because begin(), end(), and size() all return meaningless values for two or more dimensions. But we can see from these performance results that static_array, even though it addresses those concerns, imposes an unacceptable cost, except when using at_unchecked(), which is only able to offer performance on a par with built-in arrays.
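The begin()/end()/size() concern is easy to demonstrate: on a nested boost::array these members describe only the outermost level, so they say nothing useful about the multidimensional array as a whole. A small illustration (the extents are arbitrary):

#include <cstddef>
#include <cstdio>
#include <boost/array.hpp>

int main()
{
    boost::array<boost::array<int, 4>, 3>  a;

    // size() reports the outer extent only: 3 "elements", not the 12 ints actually held.
    std::printf("a.size() = %lu\n", static_cast<unsigned long>(a.size()));

    // begin()/end() enumerate rows (boost::array<int, 4> objects), not ints, so a
    // single pass over the container never visits the individual elements.
    std::size_t rows = 0;
    for(boost::array<boost::array<int, 4>, 3>::iterator it = a.begin(); it != a.end(); ++it)
    {
        (*it)[0] = 0; // each "element" is itself a whole row of four ints
        ++rows;
    }
    std::printf("rows     = %lu\n", static_cast<unsigned long>(rows));

    return 0;
}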

Unlike fixed_array, then, I would say that static_array is not worth the bother. I would suggest that when using statically dimensioned arrays we are best off sticking with the built-in arrays. They're the fastest, and although there is no direct support for treating them as STL containers, at least we're not facing types that give invalid answers to STL method questions.

