C.1. The Computational Model

Can we really say anything useful about a program's compile-time costs without examining the implementation of every compiler? Using the standard techniques for analyzing runtime complexity, we can.[1] When we describe the complexity of runtime programs, we count the number of primitive operations they execute on an abstract machine. An abstract machine is a model of the actual hardware that hides such issues as instruction cycle times, cache locality, and register usage. In the case of template metaprograms, the abstract machine is a model of the compiler implementation that hides such issues as its internal data structures, symbol table lookup efficiency, and the parsing algorithm.
We measure metaprogram complexity in terms of the number of template instantiations required. It's not an entirely arbitrary choice: Compilation times tend to be correlated with the number of template instantiations performed. It's also not a perfect choice, but only by sweeping aside factors that are sometimes relevant can we simplify the abstract machine enough to reason about its performance.

C.1.1. Memoization

Even if we ignore the other factors, thinking about complexity just in terms of template instantiations can be strange, since a particular template specialization is only instantiated once in a translation unit:

    typedef foo<char>::type t1;  // foo<char> instantiated here
    ...
    typedef foo<char>::type t2;  // foo<char> is just looked up

Unlike the way regular function calls work, when a metafunction is called again with the same arguments, the compiler doesn't have to go through the whole computation again. If you're familiar with the idea of "memoization," you can think of all metafunction results as being memoized. At the first invocation, the instantiated class is stored in a lookup table indexed by the template's arguments. For subsequent invocations with the same arguments, the compiler merely looks up the template instantiation in the table.
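For contrast, here is what the same idea looks like in ordinary runtime code. This is a minimal illustrative sketch of our own, not part of any library: a Fibonacci function that caches its results in a table, just as the compiler caches instantiated specializations.

    #include <map>

    // A runtime analogue of the compiler's instantiation cache: results
    // are stored in a table keyed by the argument, so each distinct n is
    // computed only once.
    unsigned memo_fibonacci(unsigned n)
    {
        static std::map<unsigned, unsigned> cache;  // the "lookup table"
        if (n < 2)
            return n;

        std::map<unsigned, unsigned>::iterator pos = cache.find(n);
        if (pos != cache.end())
            return pos->second;                     // "just looked up"

        unsigned result = memo_fibonacci(n - 1) + memo_fibonacci(n - 2);
        cache[n] = result;                          // "instantiated here"
        return result;
    }

The crucial difference, of course, is that the metaprogrammer gets this caching for free: the compiler memoizes every specialization whether we ask it to or not.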
C.1.2. An Example

Consider the classic recursive Fibonacci function. Its naive implementation has exponential complexity: all but the base cases spawn two further calls, so the call tree for fibonacci(n) contains O(2^n) nodes.

    unsigned fibonacci(unsigned n)
    {
        if (n < 2)
            return n;
        else
            return fibonacci(n - 1) + fibonacci(n - 2);
    }

Invoking fibonacci(4) might cause the following series of calls:

    fibonacci(4)
      fibonacci(3)
        fibonacci(2)
          fibonacci(1)
          fibonacci(0)
        fibonacci(1)
      fibonacci(2)
        fibonacci(1)
        fibonacci(0)

Now let's do a direct translation into templates:

    template<unsigned n, bool done = (n < 2)>
    struct fibonacci
    {
        static unsigned const value
            = fibonacci<n-1>::value + fibonacci<n-2>::value;
    };

    template<unsigned n>
    struct fibonacci<n,true>
    {
        static unsigned const value = n;
    };

In this case, fibonacci<4>::value might cause the following sequence of instantiations and lookups, marked on each line:

    fibonacci<4>        // instantiated
      fibonacci<3>      // instantiated
        fibonacci<2>    // instantiated
          fibonacci<1>  // instantiated
          fibonacci<0>  // instantiated
        fibonacci<1>    // looked up
      fibonacci<2>      // looked up
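The memoized results are easy to spot-check at compile time. The following assertions are our own illustrative addition, written with C++11's static_assert; in a C++03 setting, BOOST_STATIC_ASSERT would play the same role:

    // Evaluating fibonacci<10>::value forces at most 11 instantiations in
    // total; every repeated mention of a specialization is a mere lookup.
    static_assert(fibonacci<0>::value == 0,   "base case");
    static_assert(fibonacci<1>::value == 1,   "base case");
    static_assert(fibonacci<10>::value == 55, "fib(10) == 55");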
The complexity of the compile-time fibonacci function is not exponential, but O(n). That's true even if you count lookups: there is at most one instantiation and one lookup per n.

C.1.3. What Are We Hiding?

What's being hidden by this way of describing the abstract machine? Without looking at the compiler's source code, we can't be sure. In the interest of "full disclosure," we'll discuss the things we know are being swept under the rug. As we mentioned earlier, we're hiding implementation details of the compiler. In a moment we'll discuss some ways in which those details can "leak" out of our abstraction and become observable. We're also glossing over a few details of metaprogram implementation. For example, some associative sequences use function overload resolution to implement their lookup strategies.[2] Overload resolution can have a nontrivial cost in the compiler, but we're not considering it.
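To make that last point concrete, here is a minimal sketch of how overload resolution can serve as a compile-time lookup mechanism. All of the names here (key, value, lookup, at) are invented for this example; this shows the general technique, not the implementation of any particular library's associative sequences:

    #include <type_traits>

    template<class T> struct key {};                     // wraps a key type
    template<class T> struct value { typedef T type; };  // wraps a mapped type

    // The "associative sequence": one overload declaration per entry.
    // These functions are never defined or called; resolving the overload
    // set against a key is itself the lookup.
    value<long>   lookup(key<int>);    // entry: int   -> long
    value<double> lookup(key<float>);  // entry: float -> double

    // at<K>::type retrieves the value mapped to K (using C++11 decltype).
    template<class K>
    struct at
    {
        typedef typename decltype(lookup(key<K>()))::type type;
    };

    static_assert(std::is_same<at<int>::type,   long>::value,   "");
    static_assert(std::is_same<at<float>::type, double>::value, "");

Each such lookup makes the compiler weigh every overload in the set, which is exactly the kind of nontrivial cost that our instantiation-counting model ignores.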