Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 14. Arrays and Pointers
14.6. Arrays of Inherited Types

This well-documented pitfall [Meye1996, Stro1997, Dewh2003] is one of C++'s most problematic imperfections. If you have a parent class Base and a derived class Derived, and they differ in size (i.e., instances of Derived are larger than those of Base), then passing a pointer to an array of Derived to a function that takes a pointer to an array of Base will have unpleasant consequences: the function steps through the array in increments of sizeof(Base), so every item except the one at index 0 is read from the wrong offset. The best that can be hoped for in such circumstances is a quick and obvious crash. Consider the code in Listing 14.4.

Listing 14.4.

    #include <stdio.h>

    struct Base
    {
      Base()
        : m_i0(0)
      {}
      int m_i0;
    };
    void print_Base(Base &b)
    {
      printf("%d ", b.m_i0);
    }
    struct Derived // struct, so that the constructor is accessible
      : public Base
    {
      Derived()
        : m_i1(1)
      {}
      int m_i1;
    };
    void print_array(Base ab[], size_t cb)
    {
      for(Base *end = ab + cb; ab != end; ++ab)
      {
        print_Base(*ab); // Process each element
      }
    }
    int main()
    {
      Base    ab[10];
      Derived ad[10];
      print_array(ab, 10); // Ok
      print_array(ad, 10); // Compiles and runs, but badness awaits!
      . . .

In the example, the first call to print_array() will correctly yield "0 0 0 0 0 0 0 0 0 0", but the second will, on a typical implementation, yield "0 1 0 1 0 1 0 1 0 1". This is actually worse than a crash, since the bug can go unnoticed when the symptomatic behavior is as "benign" as this. Thankfully most real cases of this do crash.

It can be argued that this is not an imperfection at all,[8] merely an artifact of C++'s object model [Lipp1996]. But it is so often overlooked and/or misunderstood, it is so dangerous, and compilers are so unable to provide any defense against it, that it adds up to a serious imperfection in my estimation.
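The mismatch behind Listing 14.4 can be observed without actually walking the array through the wrong pointer type. The following is a minimal sketch, not code from the book: stride_of(), base_pointer_missteps(), and demo_strides() are hypothetical helpers, and the sketch assumes the usual layout in which Derived is larger than Base.

```cpp
#include <cassert>
#include <cstddef>

struct Base    { Base() : m_i0(0) {} int m_i0; };
struct Derived : public Base { Derived() : m_i1(1) {} int m_i1; };

// Byte distance between elements 0 and 1 of an array of T, i.e., the real
// stride of the array in memory. (Hypothetical helper, not from the book.)
template <typename T, std::size_t N>
std::size_t stride_of(T (&a)[N])
{
    char const *first  = reinterpret_cast<char const*>(&a[0]);
    char const *second = reinterpret_cast<char const*>(&a[1]);
    return static_cast<std::size_t>(second - first);
}

// True when a Base* stepped through the array (in sizeof(Base) increments)
// would fall out of step with the array's real elements.
template <typename T, std::size_t N>
bool base_pointer_missteps(T (&a)[N])
{
    return stride_of(a) != sizeof(Base);
}

// Assumes sizeof(Derived) > sizeof(Base), as on mainstream compilers.
inline void demo_strides()
{
    Base    ab[2];
    Derived ad[2];
    assert(!base_pointer_missteps(ab)); // Base* walks a Base array correctly
    assert(base_pointer_missteps(ad));  // Base* lands between Derived elements
}
```

Only addresses are compared here; nothing is read through a mis-stepped pointer, so the sketch itself stays clear of the problem it describes.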
In the remainder of this section, we look at several partial solutions and avoidance techniques, and at a fully effective alternate representation of array parameters.

14.6.1 Store Polymorphic Types by Pointer

Since a derived class pointer is a base class pointer, the standard recommended approach [Meye1996, Dewh2003, Sutt2000] to our problem is to store pointers to the instances in an array (or std::vector), and process that. It obviates the problem entirely, and is the preferred solution in many cases.

However, this only covers those cases where you wish to manipulate types via (virtual) functions. This is not always appropriate, since it's often desirable to provide simple wrapper classes for C-API structures (see sections 3.2 and 4.4). Also, it can sometimes be desirable (although this is rare; see Chapter 21) to inherit from non-vtable-polymorphic types. One final disadvantage is that it imposes costs both on the algorithm manipulating the elements, due to the extra indirection incurred for each one, and on the calling code, which must allocate and populate the array of pointers separately from the instances themselves, as in Listing 14.5.

Listing 14.5.

    void process_array(Base *ab[], size_t cb)
    {
      for(Base **end = ab + cb; ab != end; ++ab)
      {
        print_Base(**ab); // Process each element. Might also need
                          // to test *ab is not null!
      }
    }
    int main()
    {
      Derived ad[10];
      Base    *apb[dimensionof(ad)];
      for(size_t i = 0; i < dimensionof(ad); ++i)
      {
        apb[i] = &ad[i];
      }
      process_array(apb, dimensionof(apb)); // Ok, but was it worth the effort?
      . . .

Since we're in the business of exploring all our options, we'll proceed to look at the alternatives.

14.6.2 Provide Nondefault Constructors

The first thing one can do is to prevent any arrays from being created. Unless every element is given an explicit initializer, arrays of class types may only be declared when the class type provides an accessible default constructor, a constructor whose arguments are all defaulted, or no constructor at all.
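The rule can be seen in a small variation on Listing 14.4. This is a sketch only: the constructors taking an int, the Careless class, and demo_nondefault_constructors() are hypothetical, and std::is_default_constructible assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <type_traits>

// Variation on Listing 14.4 in which Base and Derived have no default
// constructor. (Hypothetical classes, for illustration only.)
struct Base
{
    explicit Base(int i) : m_i0(i) {}
    int m_i0;
};
struct Derived : public Base
{
    explicit Derived(int i) : Base(i), m_i1(1) {}
    int m_i1;
};
// Nothing stops a later deriver from putting a default constructor back.
struct Careless : public Base
{
    Careless() : Base(0), m_i1(1) {}
    int m_i1;
};

inline void demo_nondefault_constructors()
{
    // Derived ad[10];                          // would not compile
    Derived  two[2] = { Derived(1), Derived(2) }; // legal: every element initialized
    Careless ac[10];                              // legal: default constructor is back

    assert(!std::is_default_constructible<Derived>::value);
    assert(std::is_default_constructible<Careless>::value);
    assert(two[1].m_i0 == 2);
    assert(ac[9].m_i1 == 1);
}
```

The commented-out declaration is the one this technique is meant to reject; the two declarations that follow it show how easily the protection can be sidestepped.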
If one can ensure that derived classes do not contain default constructors, then the problem is avoided. However, since there is no way to define a base class such that its derived classes may not contain default constructors, this is barely a solution even when all classes are written by a single author, or within a team or development organization that is subject to thorough code reviews. It has no hope in other circumstances.

14.6.3 Hide Vector new and delete

It is possible to influence the array nature of derived types by hiding the vector new and delete operators, operator new[]() and operator delete[](), in the base class:

    class Base
    {
      std::string s;
    private:
      void *operator new [](size_t);
      void operator delete [](void *);
    };
    int main()
    {
      Base    *pbs = new Base[5];     // Illegal - inconvenient
      Derived *pds = new Derived[5];  // Illegal - good
      Base    ab[5];                  // Still legal - good
      Derived ad[5];                  // Still legal - bad!
      . . .

Hiding the vector new and delete operators only proscribes the allocation of arrays from the heap. It does not prevent one from declaring a stack-based array of derived instances and then passing that to a function expecting an array of parent instances. Furthermore, there is nothing to stop the authors of derived classes from providing publicly accessible vector new and delete operators. Hence, the primary effect of this method is to prevent us from creating a heap array of Base; since it doesn't prevent the things we want to avoid, it's pretty useless!

14.6.4 Use std::vector

As I said in the previous section, when it comes to storing and manipulating variable-sized arrays, I believe there are very few reasons (see section 32.2.8 for one such reason) to look past std::vector, and that is certainly the advice from the experts [Sutt2000, Stro1997, Meye1996, Dewh2003].
Most C++ aficionados would suggest that using std::vector would represent the solution to our array processing problem:

    void process_array(std::vector<Base> &ab)
    {
      std::for_each(ab.begin(), ab.end(), . . .); // Process all elements
    }
    int main()
    {
      std::vector<Base>     ab(10);
      std::vector<Derived>  ad(10);
      process_array(ab);  // Ok
      process_array(ad);  // Compile error
      . . .

However, what you might use in your client code is a different matter from what a library function should be able to handle. This solution is certainly type-safe, since std::vector<Base> is an entirely different beast than std::vector<Derived> and the two cannot interoperate (without some sledgehammer casting). But despite this, I think that the advice is flawed.

First, there are circumstances where arrays are required (for example, when they must have static storage) and std::vector (or any other arraylike container) simply will not suffice.

Second, the elements may already be stored somewhere else, perhaps as part of a larger range within another vector. If we want to pass a subset of that vector's contents to our function, we must copy them out, pass them, and, for non-const manipulation, copy them back in again.[9] This is because vector, like all standard library containers, stores and manipulates elements by value. Even if we could somehow guarantee the consistency in a real system during this process, imagine the performance implications! (And don't even start to ponder the exception-safety ramifications!)
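Both points can be made concrete in a few lines. The sketch below is illustrative only: the body of process_array() is a stand-in for the elided processing, process_subrange() and demo_vector() are hypothetical helpers, and std::is_convertible assumes C++11 or later.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

struct Base    { Base() : m_i0(0) {} int m_i0; };
struct Derived : public Base { Derived() : m_i1(1) {} int m_i1; };

// Stand-in for the elided processing: increments m_i0 of every element.
inline void process_array(std::vector<Base> &ab)
{
    for (std::size_t i = 0; i != ab.size(); ++i)
    {
        ++ab[i].m_i0;
    }
}

// Hypothetical helper: processes elements [first, last) of v. Because
// process_array() accepts only a whole vector, the subrange has to be
// copied out, processed, and copied back in.
inline void process_subrange(std::vector<Base> &v, std::ptrdiff_t first, std::ptrdiff_t last)
{
    std::vector<Base> tmp(v.begin() + first, v.begin() + last); // copy out
    process_array(tmp);
    std::copy(tmp.begin(), tmp.end(), v.begin() + first);       // copy back in
}

inline void demo_vector()
{
    // A vector<Derived> cannot be passed where a vector<Base>& is expected.
    assert(!(std::is_convertible<std::vector<Derived>&, std::vector<Base>&>::value));

    std::vector<Base> v(5);
    process_subrange(v, 1, 3); // only elements 1 and 2 are touched
    assert(v[0].m_i0 == 0 && v[1].m_i0 == 1 && v[2].m_i0 == 1 && v[3].m_i0 == 0);
}
```

Every call to process_subrange() pays for a temporary vector and two element-wise copies, which is the cost described in the second objection above.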
14.6.5 Ensure That Types Are the Same Size

None of the previously suggested mechanisms represents an optimal or appropriate defense against the inappropriate use of arrays of inherited types. Before we look at the solution, we should consider the cases where, for all its dangers, using array algorithms on inherited types may be desirable.

What if we could enforce the restriction that any derived class types used by the array processing functions have the same memory footprint as their base and are, therefore, safe to use in this context? If they're the same size, then the slicing issue is moot (see Chapter 21), and since a derived type is a base type, it's perfectly valid to treat it as such in the processing function.

So we allow arrays of inherited types when the sizes are the same. The question, then, is how do we ensure that they are the same size? Expert reviewers may have the skills to determine this in code reviews for limited cases, but the variety of factors that can influence it, including template derivation (see Chapters 21 and 22), structure packing (see Chapters 13 and 14), and derived class overhead (see section 12.4), conspire to make this unrealistic in practice. Even where reviews are conducted, which is all too rarely, they are not guaranteed to catch all errors and are only part of the verification armory [Glas2003].

We could have assertions (see section 1.4) in the code, but run-time assertions may never be fired (e.g., because of incomplete coverage of code paths, or because only release builds are tested). Compile-time assertions are a lot better, but do not provide obvious error messages, and their omission from a particular derived class may slip under the reviewing radar.

A better way is to use a constraint (see section 1.2). A constraint is a special piece of code, usually a template class, which serves to enforce a design assumption. This enforcement usually takes the form of a compile-time error, such as the inability to convert one type to another.
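To make the idea of a constraint concrete, here is a minimal sketch of one that enforces equal size. The name must_be_same_size_sketch and the negative-array-length trick are assumptions chosen for illustration; this is not the book's own implementation.

```cpp
#include <cassert>

struct Base             { Base() : m_i0(0) {} int m_i0; };
struct Derived          : public Base { Derived() : m_i1(1) {} int m_i1; };
struct Derived_SameSize : public Base {};

// Hypothetical constraint template. The array length is 1 when the two
// sizes match and -1 otherwise; an array of negative length is ill-formed,
// so instantiating the template with mismatched types stops compilation.
template <typename T1, typename T2>
struct must_be_same_size_sketch
{
    typedef char check_type[sizeof(T1) == sizeof(T2) ? 1 : -1];
    enum { value = sizeof(check_type) };
};

inline void demo_constraint()
{
    // Compiles: the empty derived class adds nothing to Base's footprint.
    assert((must_be_same_size_sketch<Base, Derived_SameSize>::value == 1));

    // Would not compile: Derived is larger than Base.
    // int bad = must_be_same_size_sketch<Base, Derived>::value;
}
```

The useful property is that the check costs nothing at run time and cannot be skipped by an untested code path: code that violates the assumption simply does not build.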
Since we want our types to be the same size, we use the imaginatively named must_be_same_size constraint (see section 1.2.5).

Now we have the tool to detect inappropriate derived-type arrays, but where do we use it? In fact the answer is suggested by the std::vector solution, in that the parameterization of templates by inheritance-related types results in unrelated types. Our final solution takes the form of an enhanced version of the array_proxy template that we saw in section 14.4. Listing 14.6 shows its full form, incorporating the constraint and some extra member template constructors.

Listing 14.6.

    template <typename T>
    class array_proxy
    {
    public:
      typedef T               value_type;
      typedef array_proxy<T>  class_type;
      typedef value_type      *pointer;
      typedef value_type      *const_pointer;   // Non-const!
      typedef value_type      &reference;
      typedef value_type      &const_reference; // Non-const!
      typedef size_t          size_type;
    // Construction
    public:
      template <size_t N>
      explicit array_proxy(T (&t)[N]) // Array of T
        : m_begin(&t[0])
        , m_end(&t[N])
      {}
      template <typename D, size_t N>
      explicit array_proxy(D (&d)[N]) // Array of T-compatible type
        : m_begin(&d[0])
        , m_end(&d[N])
      {
        // Ensures that D and T are the same size.
        constraint_must_be_same_size(T, D);
      }
      template <typename D>
      array_proxy(array_proxy<D> &d)
        : m_begin(d.begin())
        , m_end(d.end())
      {
        // Ensures that D and T are the same size.
        constraint_must_be_same_size(T, D);
      }
    // State
    public:
      pointer           base();
      const_pointer     base() const;
      size_type         size() const;
      bool              empty() const;
      static size_type  max_size();
    // Subscripting
    public:
      reference       operator [](size_t index);
      const_reference operator [](size_t index) const;
    // Iteration
    public:
      pointer       begin();
      pointer       end();
      const_pointer begin() const;
      const_pointer end() const;
    // Members
    private:
      pointer const m_begin;
      pointer const m_end;
    // Not to be implemented
    private:
      array_proxy &operator =(array_proxy const &);
    };

The first constructor sets the member pointers m_begin and m_end to the start and (one past) the end of the array to which it is applied. Using the array_proxy we can rewrite process_array():

    void process_array(array_proxy<Base> ab)
    {
      std::for_each(ab.begin(), ab.end(), . . .); // Process all elements
    }

In this case, process_array() was written to take a non-const array_proxy by value, because the processing of the elements may want to change them. If your function requires only read-only access, it is potentially slightly more efficient to declare it as taking array_proxy<T> const &, though the difference in performance is unlikely to show up on any profiler you can use, given the likely relative cost of the internals of process_array() itself.

We can expand the example to include a Derived_SameSize type that derives from Base but does not change the memory footprint (see section 12.4). It is now valid to pass arrays of Derived_SameSize to process_array(), and the new array_proxy facilitates that via its other two template constructors.

    class Derived_SameSize
      : public Base
    {};
    int main()
    {
      Base              ab[10];
      Derived           ad[10];
      Derived_SameSize  ads[10];
      process_array(make_array_proxy(ab));  // Ok
      process_array(make_array_proxy(ad));  // Compiler error. Good!
      process_array(make_array_proxy(ads)); // Ok - very smart
    }

This is a complete solution to the problem.
It is efficient (there is no cost on any decent compiler), it is type-safe, and it completely empowers the designers of functions to defend themselves (or rather their code) against potential misuse by derivers of their types. Furthermore, it is smart enough to facilitate the case where the inherited types are of the same size as the parent type and to allow them to be proxied! One final advantage is that it is now not possible to pass the wrong array extent to process_array(), as it certainly was with the original two-parameter version, allowing us to adhere to the DRY principle.

This technique does not impose protection on hierarchies of types, so one could argue that it fails to adequately guard against passing around arrays of inheritance-related types. But I believe that that's the wrong way to look at it. It is the functions themselves (free functions, template algorithms, member functions) that need protecting from being passed arrays of the wrong type. Since it is within the purview of the author of any such functions to stipulate array_proxy<T> rather than T*, he/she can have all the protection considered necessary.
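For readers who want to experiment, the essentials of the technique fit in a much smaller class. The following is a cut-down sketch rather than Listing 14.6 itself: array_proxy_sketch, fill_array(), and demo_array_proxy() are hypothetical names, a C++11 static_assert stands in for the must_be_same_size constraint, and, as with the technique as a whole, it relies on same-sized derived elements being laid out exactly like Base elements.

```cpp
#include <cassert>
#include <cstddef>

struct Base             { Base() : m_i0(0) {} int m_i0; };
struct Derived          : public Base { Derived() : m_i1(1) {} int m_i1; };
struct Derived_SameSize : public Base {};

// Cut-down array proxy (hypothetical; not the full class of Listing 14.6).
template <typename T>
class array_proxy_sketch
{
public:
    // Accepts an array of T, or of a type D that converts to T and has
    // exactly the same size; anything else fails to compile.
    template <typename D, std::size_t N>
    explicit array_proxy_sketch(D (&d)[N])
        : m_begin(&d[0])
        , m_end(&d[0] + N)
    {
        static_assert(sizeof(D) == sizeof(T), "D and T must be the same size");
    }

    T           *begin() const { return m_begin; }
    T           *end()   const { return m_end; }
    std::size_t size()   const { return static_cast<std::size_t>(m_end - m_begin); }

private:
    T *const m_begin;
    T *const m_end;
};

// Hypothetical processing function: sets m_i0 of every element to value
// and returns the number of elements visited.
inline std::size_t fill_array(array_proxy_sketch<Base> ab, int value)
{
    for (Base *p = ab.begin(); p != ab.end(); ++p)
    {
        p->m_i0 = value;
    }
    return ab.size();
}

inline void demo_array_proxy()
{
    Base             ab[4];
    Derived_SameSize ads[3];

    std::size_t const n1 = fill_array(array_proxy_sketch<Base>(ab), 5);
    std::size_t const n2 = fill_array(array_proxy_sketch<Base>(ads), 7);
    assert(n1 == 4u && ab[3].m_i0 == 5);
    assert(n2 == 3u && ads[2].m_i0 == 7);

    // Derived ad[2];
    // fill_array(array_proxy_sketch<Base>(ad), 9); // would fail the static_assert
}
```

Note that the extent is deduced from the array itself, so the caller never states it separately.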