Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 14. Arrays and Pointers

14.6. Arrays of Inherited Types

This well-documented pitfall [Meye1996, Stro1997, Dewh2003] is one of C++'s most problematic imperfections. If you have a parent class Base and a derived class Derived, and they are different in size (i.e., instances of Derived are larger than those of Base), then passing a pointer to an array of Derived to a function that takes a pointer to an array of Base will result in unpleasant consequences, since all but the item at index offset 0 will be misaligned. The best that can be hoped for in such circumstances is a quick and obvious crash. Consider the code in Listing 14.4.

Listing 14.4.

#include <stdio.h>

struct Base
{
  Base()
    : m_i0(0)
  {}
  int m_i0;
};

void print_Base(Base &b)
{
  printf("%d ", b.m_i0);
}

struct Derived // struct, so that the constructor and m_i1 are accessible
  : public Base
{
  Derived()
    : m_i1(1)
  {}
  int m_i1;
};

void print_array(Base ab[], size_t cb)
{
  for(Base *end = ab + cb; ab != end; ++ab)
  {
    print_Base(*ab); // Process each element
  }
}

int main()
{
  Base    ab[10];
  Derived ad[10];
  print_array(ab, 10); // Ok
  print_array(ad, 10); // Compiles and runs, but badness awaits!
  . . .

In the example, the first call to print_array() will correctly yield "0 0 0 0 0 0 0 0 0 0", but the second will yield "0 1 0 1 0 1 0 1 0 1". This is actually worse than a crash, since the bug can go unnoticed when the symptomatic behavior is as "benign" as this. Thankfully most real cases of this do crash.
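The alternating output comes down to pointer stride. The following is a minimal sketch, not part of the original listing, that measures the two strides without dereferencing a misaligned element; the helper names byte_distance, base_stride, and derived_stride are hypothetical.

```cpp
#include <cstddef>

struct Base    { int m_i0; Base() : m_i0(0) {} };
struct Derived : Base { int m_i1; Derived() : m_i1(1) {} };

// Distance in bytes between two addresses inside the same object.
inline std::ptrdiff_t byte_distance(void const *from, void const *to)
{
  return static_cast<char const *>(to) - static_cast<char const *>(from);
}

// Number of bytes a Base* moves when it is incremented once.
inline std::ptrdiff_t base_stride()
{
  Derived ad[2];
  Base *pb = ad;                     // array decays, then Derived* converts to Base*
  return byte_distance(pb, pb + 1);  // pb + 1 is only formed, never dereferenced
}

// Number of bytes between consecutive elements of a Derived array.
inline std::ptrdiff_t derived_stride()
{
  Derived ad[2];
  return byte_distance(&ad[0], &ad[1]);
}
```

Because base_stride() is sizeof(Base) while derived_stride() is sizeof(Derived), every increment inside print_array() falls short of the next Derived element, which is why the m_i1 members show up in the output.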

It can be argued that this is not an imperfection at all,^[8] merely an artifact of C++'s object model [Lipp1996]. But it is so often overlooked or misunderstood, so dangerous, and so far beyond the compiler's power to defend against, that it adds up to a serious imperfection in my estimation.

^[8] As several of my reviewers did!

Imperfection: C++'s array/pointer duality, combined with its support for polymorphic handling of inherited types, is a hazard against which the compiler provides no assistance.

In the remainder of this section, we look at several partial solutions and avoidance techniques and a fully effective alternate representation of array parameters.

14.6.1 Store Polymorphic Types by Pointer

Since a derived class pointer is a base class pointer, the standard recommended approach [Meye1996, Dewh2003, Sutt2000] to our problem is to store pointers to the instances in an array (or std::vector), and process that. It obviates the problem entirely, and is the preferred solution in many cases.

However, this only covers those cases where you wish to manipulate types via (virtual) functions. This is not always appropriate, since it's often desirable to provide simple wrapper classes for C-API structures (see sections 3.2 and 4.4). Also, it can sometimes be desirable (although this is rare; see Chapter 21) to inherit from non-vtable-polymorphic types.

One final disadvantage is that it imposes costs both on the algorithm manipulating the array, due to the extra indirection incurred for each element, and on the calling code, which must allocate and populate the array of pointers separately from the items themselves, as in Listing 14.5.

Listing 14.5.

void process_array(Base *ab[], size_t cb)
{
  for(Base **end = ab + cb; ab != end; ++ab)
  {
    print_Base(**ab); // Process each element. Might also need
                      // to test *ab is not null!
  }
}

int main()
{
  Derived ad[10];
  Base    *apb[dimensionof(ad)];
  for(size_t i = 0; i < dimensionof(ad); ++i) // size_t avoids a signed/unsigned comparison
  {
    apb[i] = &ad[i];
  }
  process_array(apb, 10); // Ok, but was it worth the effort?
  . . .
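The text also mentions std::vector as a home for the pointers. A minimal sketch of that variant follows; sum_i0 and demo_sum are hypothetical names introduced only for illustration, and the element types are cut-down stand-ins for those in Listing 14.4.

```cpp
#include <cstddef>
#include <vector>

struct Base    { int m_i0; Base() : m_i0(0) {} };
struct Derived : Base { int m_i1; Derived() : m_i1(1) {} };

// Sums m_i0 across elements reached through pointers. Each step moves by
// sizeof(Base*), so the size of the pointed-to objects no longer matters.
inline int sum_i0(std::vector<Base*> const &items)
{
  int total = 0;
  for(std::size_t i = 0; i != items.size(); ++i)
  {
    if(0 != items[i])            // a pointer element may be null
    {
      total += items[i]->m_i0;
    }
  }
  return total;
}

// Builds the pointer vector over a Derived array and processes it.
inline int demo_sum()
{
  Derived ad[3];
  ad[0].m_i0 = 1;
  ad[1].m_i0 = 2;
  ad[2].m_i0 = 3;
  std::vector<Base*> items;
  for(std::size_t i = 0; i != 3; ++i)
  {
    items.push_back(&ad[i]);     // Derived* converts to Base*
  }
  return sum_i0(items);
}
```

The extra indirection and the separate setup loop are the costs described above; the vector merely removes the need to pass the extent by hand.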

Since we're in the business of exploring all our options, we'll proceed to look at the alternatives.

14.6.2 Provide Nondefault Constructors

The first thing one can do is to prevent any arrays from being created. Arrays of class types may only be declared when the class type provides an accessible default constructor, one with defaulted arguments, or no constructor at all. If one can ensure that derived classes do not contain default constructors, then the problem is avoided. However, since there is no way to define a base class such that its derived classes may not contain default constructors, this is barely a solution even when all classes are written by a single author, or within a team or by a development organization that is subject to thorough code reviews. It has no hope in other circumstances.
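A minimal sketch of why this cannot be enforced from the base class, assuming a C++11 compiler for std::is_default_constructible and static_assert (neither is used in the original text). The class Sneaky is hypothetical.

```cpp
#include <type_traits>

struct Base
{
  explicit Base(int i) : m_i0(i) {}   // no default constructor
  int m_i0;
};

struct Derived : Base
{
  explicit Derived(int i) : Base(i), m_i1(1) {}
  int m_i1;
};

// A later deriver can quietly restore default construction.
struct Sneaky : Derived
{
  Sneaky() : Derived(0) {}
};

static_assert(!std::is_default_constructible<Base>::value,
              "Base cannot be an array element type");
static_assert(!std::is_default_constructible<Derived>::value,
              "Derived cannot be an array element type");
static_assert(std::is_default_constructible<Sneaky>::value,
              "Sneaky can be an array element type again");

// Derived ad[10];   // would not compile: no default constructor
// Sneaky  as[10];   // compiles, so the hazard returns
```

Nothing in Base or Derived can stop Sneaky from being declared, which is the weakness the paragraph above describes.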

14.6.3 Hide Vector new and delete

It is possible to influence the array nature of derived types by hiding the vector new and delete operators—operator new[]() and operator delete[]()—in the base class:



class Base
{
  std::string  s;
private:
  void *operator new [](size_t);
  void operator delete [](void *);
};

int main()
{
  Base    *pbs  = new Base[5];    // Illegal - inconvenient
  Derived *pds  = new Derived[5]; // Illegal - good
  Base    ab[5];                  // Still legal - good
  Derived ad[5];                  // Still legal – bad!
  . . .

Hiding the vector new and delete operators only proscribes the allocation of arrays from the heap. It does not prevent one from declaring a stack-based array of derived instances and then passing that to a function expecting an array of parent instances.

Furthermore, there is nothing to stop the authors of derived classes from providing publicly accessible vector new and delete operators. Hence, the primary effect of this method is to prevent us from creating a heap array of Base; since it doesn't prevent the things we want to avoid, it's pretty useless!
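Both weaknesses can be checked at compile time. The sketch below swaps the private declarations for C++11 deleted functions and uses a hypothetical detection trait, can_new_array, built on SFINAE; none of these names come from the original text, and the Base here is a cut-down stand-in.

```cpp
#include <cstddef>
#include <new>
#include <type_traits>

struct Base
{
  int m_i0;
  // C++11 spelling of "hidden": the array forms are deleted.
  static void *operator new [](std::size_t) = delete;
  static void operator delete [](void *) = delete;
};

struct Derived : Base { int m_i1; };

// A deriver re-enables heap arrays by supplying its own operators.
struct Reopened : Base
{
  int m_i1;
  static void *operator new [](std::size_t n) { return ::operator new [](n); }
  static void operator delete [](void *p) { ::operator delete [](p); }
};

template <typename T> struct always_void { typedef void type; };

// Primary template: assume 'new T[1]' is not well-formed.
template <typename T, typename = void>
struct can_new_array : std::false_type {};

// Chosen only when the expression 'new T[1]' is well-formed.
template <typename T>
struct can_new_array<T, typename always_void<decltype(new T[1])>::type>
  : std::true_type {};

// Automatic storage never consults operator new[].
inline std::size_t stack_array_bytes()
{
  Derived ad[5];
  return sizeof(ad);
}
```

Under these assumptions can_new_array is false for Base and Derived but true for Reopened, and stack_array_bytes() compiles regardless, matching the two objections raised above.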

14.6.4 Use std::vector

As I said in the previous section, when it comes to storing and manipulating variable-sized arrays, I believe there are very few reasons (see section 32.2.8 for one such reason) to look past std::vector, and that is certainly the advice from the experts [Sutt2000, Stro1997, Meye1996, Dewh2003].

Most C++ aficionados would suggest that using std::vector would represent the solution to our array processing problem:



void process_array(std::vector<Base> &ab)
{
  std::for_each(ab.begin(), ab.end(), . . .); // Process all elements
}

int main()
{
  std::vector<Base>     ab(10);
  std::vector<Derived>  ad(10);
  process_array(ab); // Ok
  process_array(ad); // Compile error
  . . .

However, what you might use in your client code is a different matter from what a library function should be able to handle. This solution is certainly type-safe, since std::vector<Base> is an entirely different beast than std::vector<Derived> and the two cannot interoperate (without some sledgehammer casting). But despite this, I think that the advice is flawed.
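The type-safety claim can be stated as compile-time checks. This is a minimal sketch assuming a C++11 compiler for std::is_convertible; the two structs are cut-down stand-ins for the types used earlier.

```cpp
#include <type_traits>
#include <vector>

struct Base    { int m_i0; };
struct Derived : Base { int m_i1; };

// Pointers to related types convert implicitly, which is where the
// array hazard comes from.
static_assert(std::is_convertible<Derived*, Base*>::value,
              "Derived* converts to Base*");

// Vectors of related types are themselves unrelated types.
static_assert(!std::is_convertible<std::vector<Derived>&,
                                   std::vector<Base>&>::value,
              "vector<Derived>& does not convert to vector<Base>&");
static_assert(!std::is_convertible<std::vector<Derived> const&,
                                   std::vector<Base> const&>::value,
              "nor does the const reference form");
```

The same property, that parameterizing a template by related types yields unrelated types, is what makes a compile-time defense possible at all.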

First, there are circumstances where arrays are required (for example, when the elements must have static storage) and std::vector, or any other arraylike container, simply will not suffice.

Second, the elements may already be stored somewhere else, perhaps as part of a larger range within another vector. If we want to pass a subset of that vector's contents to our function, we must copy them out, pass them, and, for non-const manipulation, copy them back in again.^[9] This is because vector, like all standard library containers, stores and manipulates elements by value. Even if we could somehow guarantee the consistency in a real system during this process, imagine the performance implications! (And don't even start to ponder the exception-safety ramifications!)

^[9] At this point you may well be suggesting that such copying might also be necessary for built-in arrays. Fear not—that will be taken care of in the solution.
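To make the copy-out, copy-in cost concrete, here is a minimal sketch. The function process_subrange is hypothetical, and process_array is given a trivial body (incrementing m_i0) purely so that the effect is observable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Base { int m_i0; Base() : m_i0(0) {} };

// Stand-in for a library function that only accepts a whole vector.
inline void process_array(std::vector<Base> &ab)
{
  for(std::size_t i = 0; i != ab.size(); ++i)
  {
    ++ab[i].m_i0;
  }
}

// Applies process_array() to elements [first, last) of 'all'.
// Requires first <= last <= all.size().
inline void process_subrange(std::vector<Base> &all,
                             std::size_t first, std::size_t last)
{
  std::vector<Base> tmp(all.begin() + first, all.begin() + last); // copy out
  process_array(tmp);
  std::copy(tmp.begin(), tmp.end(), all.begin() + first);         // copy back in
}
```

Every call pays for an allocation and two element-wise copies, and if process_array() throws midway the original range is left unmodified while the work already done in tmp is lost.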

14.6.5 Ensure That Types Are the Same Size

None of the previously suggested mechanisms is an optimal or appropriate defense against the inappropriate use of arrays of derived types. Before we look at the solution, we should consider the cases where, for all its dangers, using array algorithms on inherited types may be desirable. What if we could enforce the restriction that any derived class types used by the array processing functions have the same memory footprint as their base and are, therefore, safe to use in this context? If they're the same size, then the slicing issue is moot (see Chapter 21), and since a derived type is a base type, it's perfectly valid to treat it as such in the processing function.

So we allow arrays of inherited types when the sizes are the same. The question, then, is how do we ensure that they are the same size? Expert reviewers may have the skills to determine this in code reviews for limited cases, but the variety of factors that can influence this effect—template derivation (see Chapters 21 and 22), structure packing (see Chapters 13 and 14), derived class overhead (see section 12.4)—conspire to make this unrealistic in practice. Even where reviews are conducted—which is all too rarely—they are not guaranteed to catch all errors and are only part of the verification armory [Glas2003].

We could have assertions (see section 1.4) in the code, but run time assertions may not be fired (i.e., incomplete coverage of code paths, release build testing). Compile-time assertions are a lot better, but do not provide obvious error messages and their omission from a particular derived class may slip under the reviewing radar. A better way is to use a constraint (see section 1.2). A constraint is a special piece of code, usually a template class, which serves to enforce a design assumption. This enforcement usually takes the form of a compile-time error, such as the inability to convert one type to another. Since we want our types to be the same size, we use the imaginatively named must_be_same_size constraint (see section 1.2.5).
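The constraint itself is defined in section 1.2.5 and is not reproduced here. As a rough illustration of the idea only, the sketch below uses C++11 static_assert, which is an assumption on my part and not the mechanism the text relies on; must_be_same_size_sketch and as_array_of are hypothetical names.

```cpp
#include <cstddef>

// Instantiating this class fails to compile unless the sizes match.
template <typename T1, typename T2>
struct must_be_same_size_sketch
{
  static_assert(sizeof(T1) == sizeof(T2), "types must be the same size");
};

struct Base             { int m_i0; };
struct Derived          : Base { int m_i1; };
struct Derived_SameSize : Base {};

// Views an array of D as a pointer to T, but only when the constraint holds.
template <typename T, typename D, std::size_t N>
inline T *as_array_of(D (&d)[N])
{
  must_be_same_size_sketch<T, D> check;  // instantiation triggers the check
  static_cast<void>(check);
  return d;                              // D* converts to T* only if D derives from T
}
```

Calling as_array_of<Base>() on an array of Derived_SameSize compiles, whereas the same call on an array of Derived is rejected with the message given to static_assert.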

Now we have the tool to detect inappropriate derived-type arrays, but where do we use it? In fact the answer is suggested by the std::vector solution, in that the parameterization of templates by inheritance-related types results in unrelated types. Our final solution takes the form of an enhanced version of the array_proxy template that we saw in section 14.4. Listing 14.6 shows its full form, incorporating the constraint and some extra member template constructors.

Listing 14.6.

template <typename T>
class array_proxy
{
public:
  typedef T                 value_type;
  typedef array_proxy<T>    class_type;
  typedef value_type        *pointer;
  typedef value_type        *const_pointer; // Non-const!
  typedef value_type        &reference;
  typedef value_type        &const_reference; // Non-const!
  typedef size_t            size_type;
// Construction
public:
  template <size_t N>
  explicit array_proxy(T (&t)[N]) // Array of T
    : m_begin(&t[0])
    , m_end(&t[N])
  {}
  template <typename D, size_t N>
  explicit array_proxy(D (&d)[N]) // Array of T-compatible type
    : m_begin(&d[0])
    , m_end(&d[N])
  {
    // Ensures that D and T are the same size.
    constraint_must_be_same_size(T, D);
  }
  template <typename D>
  array_proxy(array_proxy<D> const &d) // const &, so that temporaries can bind
    : m_begin(d.begin())
    , m_end(d.end())
  {
    // Ensures that D and T are the same size.
    constraint_must_be_same_size(T, D);
  }
// State
public:
  pointer           base();
  const_pointer     base() const;
  size_type         size() const;
  bool              empty() const;
  static size_type  max_size();
// Subscripting
public:
  reference       operator [](size_t index);
  const_reference operator [](size_t index) const;
// Iteration
public:
  pointer       begin();
  pointer       end();
  const_pointer begin() const;
  const_pointer end() const;
// Members
private:
  pointer const m_begin;
  pointer const m_end;
// Not to be implemented
private:
  array_proxy &operator =(array_proxy const &);
};

The first constructor sets the member pointers m_begin and m_end to the start and (one past) the end of the array to which it is applied.

Using the array_proxy we can rewrite process_array():



void process_array(array_proxy<Base> ab)
{
  std::for_each(ab.begin(), ab.end(), . . .); // Process all elements
}

In this case, process_array() was written to take a non-const array_proxy by value, because the processing of the elements may want to change them. If your function requires only read-only access, it is potentially slightly more efficient to declare it as taking array_proxy<T> const &, though the difference in performance is unlikely to show up on any profiler you can use, given the likely relative cost of the internals of process_array() itself.

We can expand the example to include a Derived_SameSize type that derives from Base but does not change the memory footprint (see section 12.4). It is now valid to pass arrays of Derived_SameSize to process_array(), and the new array_proxy facilitates that via its other two template constructors.



struct Derived_SameSize
  : public Base
{};

int main()
{
  Base              ab[10];
  Derived           ad[10];
  Derived_SameSize  ads[10];
  process_array(make_array_proxy(ab));   // Ok
  process_array(make_array_proxy(ad));   // Compiler error. Good!
  process_array(make_array_proxy(ads));  // Ok – very smart
}
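The example calls make_array_proxy(), which this section does not define. The following cut-down sketch shows one way the pieces might fit together; it is an assumption-laden reconstruction, not the book's implementation. It substitutes static_assert for constraint_must_be_same_size, omits most members of Listing 14.6, and adds a hypothetical sum_i0() in place of process_array() so that the result can be checked.

```cpp
#include <cstddef>

struct Base             { int m_i0; Base() : m_i0(0) {} };
struct Derived_SameSize : Base {};

template <typename T>
class array_proxy
{
public:
  template <std::size_t N>
  explicit array_proxy(T (&t)[N])              // array of T
    : m_begin(t), m_end(t + N)
  {}

  template <typename D, std::size_t N>
  explicit array_proxy(D (&d)[N])              // array of a T-compatible type
    : m_begin(d), m_end(d + N)
  {
    static_assert(sizeof(D) == sizeof(T), "element types must be the same size");
  }

  template <typename D>
  array_proxy(array_proxy<D> const &d)         // proxy of a T-compatible type
    : m_begin(d.begin()), m_end(d.end())
  {
    static_assert(sizeof(D) == sizeof(T), "element types must be the same size");
  }

  // Const accessors return non-const pointers, as in Listing 14.6.
  T *begin() const { return m_begin; }
  T *end() const   { return m_end; }
  std::size_t size() const { return static_cast<std::size_t>(m_end - m_begin); }

private:
  T *m_begin;
  T *m_end;
};

// Assumed shape of the creator function: deduces both T and the extent.
template <typename T, std::size_t N>
inline array_proxy<T> make_array_proxy(T (&t)[N])
{
  return array_proxy<T>(t);
}

// Stand-in for process_array(): sums m_i0 over the proxied elements.
inline int sum_i0(array_proxy<Base> ab)
{
  int total = 0;
  for(Base *p = ab.begin(); p != ab.end(); ++p)
  {
    total += p->m_i0;
  }
  return total;
}
```

Passing make_array_proxy(ads) to sum_i0() goes through the converting constructor, where the size check is applied. Stepping a Base* across Derived_SameSize elements rests on the same-size premise of this section and is not something the language standard itself guarantees.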

This is a complete solution to the problem. It is efficient (there is no cost on any decent compiler), it is type safe, and it completely empowers the designer of functions to defend themselves (or rather their code) against potential misuse by derivers of their types. Furthermore, it is smart enough to facilitate the case where the inherited types are of the same size as the parent type and to allow them to be proxied!

One final advantage is that it is now not possible to pass the wrong array extent to process_array, as it certainly was with the original two-parameter version, allowing us to adhere to the DRY principle.

This technique does not impose protection on hierarchies of types, so one could argue that it fails to adequately guard against passing around arrays of inheritance-related types. But I believe that that's the wrong way to look at it. It is the functions—free functions, template algorithms, member functions—themselves that need protecting from being passed arrays of the wrong type. Since it is within the purview of the author of any such functions to stipulate array_proxy<T> rather than T*, he/she can have all the protection considered necessary.