Gotcha #89: Arrays of Class Objects

Be wary of arrays of class types, especially of base class types. Consider an "applicator" function that applies a function to each element of an array:

gotcha89/apply.cpp



void apply( B array[], int length, void (*f)( B & ) ) {
   for( int i = 0; i < length; ++i )
       f( array[i] );
}
// . . .
D *dp = new D[3];
apply( dp, 3, somefunc ); // disaster!

The trouble is that the type of the first formal argument to apply is "pointer to B," not "array of B." As far as the compiler is concerned, we're initializing a B * with a D *. This is legal if B is a public base class of D, since a D is-a B. However, an array of D is not an array of B, and the code will fail badly when we attempt pointer arithmetic using B offsets on an array of D objects.

Figure 9-1 illustrates the situation. The apply function expects the array pointer to refer to an array of B (on the left of the diagram), but it actually refers to an array of D (on the right). Recall that indexing is really just shorthand for pointer arithmetic (see Gotcha #7), so the expression array[i] is equivalent to *(array+i). Unfortunately, the compiler will perform the pointer addition with the assumption that array refers to a base class object. If a derived class object is larger than or has a different layout from a base class object, the index operation will result in an incorrect address.
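The mismatch can be made concrete without actually running the faulty indexing. The following is a minimal sketch that compares the byte offsets only. The definitions of B and D and all function names here are assumptions made for illustration; the text does not show the real classes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the B and D of the text.
struct B {
    int b = 0;
};
struct D : public B {
    int d[3] = {0, 0, 0};   // extra data makes a D larger than a B
};

static_assert(sizeof(D) > sizeof(B), "this sketch assumes D is larger than B");

// Byte offset at which array[i] is sought when the pointer's static
// type is B *, which is what apply uses.
std::size_t offsetSeenThroughB(std::size_t i) { return i * sizeof(B); }

// Byte offset at which element i of an array of D really begins.
std::size_t actualOffsetInDArray(std::size_t i) { return i * sizeof(D); }

// True when indexing through a B * would land on the start of element i.
bool strideMatches(std::size_t i) {
    return offsetSeenThroughB(i) == actualOffsetInDArray(i);
}

void demo() {
    assert(strideMatches(0));    // element 0 is found either way
    assert(!strideMatches(1));   // every later element is missed
}
```

Only the first element lines up. From index 1 onward, the address computed through a B * falls somewhere inside an earlier D object.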

Figure 9-1. Pointer arithmetic used to access the elements of an array of base class objects usually doesn't work for an array of derived objects.


Incremental attempts to make the array behave sensibly fail. If the base class B were declared to be abstract (a good idea in general), that would prevent any arrays of B from being created, but the apply function would still be legal (if incorrect), since it deals with pointers to B rather than B objects. Declaring the formal argument to be a reference to an array (as in B (&array)[3]) is effective but not practical, as we must then fix the size of the array to a given bound (in this case, 3) and cannot pass a pointer (to an allocated array, for instance) as an actual argument.
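A short sketch of the reference-to-array variant shows both its safety and its rigidity. The name apply3, the helper bump, and the class definitions are assumptions for illustration, and the compile-time checks rely on std::is_invocable_v, so C++17 or later is assumed.

```cpp
#include <cassert>
#include <type_traits>

struct B { int b = 0; };               // hypothetical base class
struct D : public B { int d = 0; };    // hypothetical derived class

void bump( B &x ) { ++x.b; }           // hypothetical element operation

// The bound 3 is part of the parameter's type.
void apply3( B (&array)[3], void (*f)( B & ) ) {
    for( int i = 0; i < 3; ++i )
        f( array[i] );
}

using Apply3 = decltype( &apply3 );
using Op = void (*)( B & );

// Only an lvalue array of exactly three B objects is accepted.
static_assert( std::is_invocable_v<Apply3, B (&)[3], Op>, "B[3] accepted" );
static_assert( !std::is_invocable_v<Apply3, D (&)[3], Op>, "D[3] rejected" );
static_assert( !std::is_invocable_v<Apply3, B (&)[4], Op>, "B[4] rejected" );
static_assert( !std::is_invocable_v<Apply3, B *, Op>, "B * rejected" );

int applyToThree() {
    B bs[3];
    apply3( bs, bump );
    assert( bs[2].b == 1 );            // the last element was reached
    return bs[0].b + bs[1].b + bs[2].b;
}
```

The array of D is now rejected at compile time, but so is every array of a different length and every pointer to an allocated array.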

Arrays of base class objects are just plain inadvisable, and arrays of class objects in general have to be watched closely.

Using a generic algorithm in place of a function hard-coded to a specific type can be an improvement:



for_each( dp, dp+3, somefunc );

The use of the standard for_each algorithm allows the compiler to perform argument type deduction on the arguments to the function template. The implicit conversion from pointer to derived class to pointer to public base is not a problem, because no such conversion is performed: the compiler instantiates a version of for_each for D *, so the iteration advances in units of sizeof(D), and each D element is bound to somefunc's B & parameter individually. Unfortunately, this is a different solution from our original design, in that we've swapped a runtime polymorphic approach for a compile-time one.
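As a self-contained sketch of the for_each call, assuming illustrative definitions for B, D, and somefunc (none of which are shown in the text), and a hypothetical wrapper function forEachOverDs:

```cpp
#include <algorithm>
#include <cassert>

struct B { int b = 0; };                         // hypothetical base class
struct D : public B { int d[3] = {0, 0, 0}; };   // hypothetical, larger than B

void somefunc( B &x ) { ++x.b; }                 // assumed to take a B &, as in apply

int forEachOverDs() {
    D *dp = new D[3];
    // The iterator type is deduced as D *, so each step advances by sizeof(D).
    // Each D element is converted to B & only at the call to somefunc.
    std::for_each( dp, dp + 3, somefunc );
    int total = dp[0].b + dp[1].b + dp[2].b;
    assert( total == 3 );                        // every element visited exactly once
    delete [] dp;
    return total;
}
```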

A better approach is to use an array of pointers to class objects rather than using an array of objects. This allows polymorphic use of the array without the associated pointer arithmetic issues:



void apply_prime( B *array[], int length, void (*f)( B * ) ) {
   for( int i = 0; i < length; ++i )
       f( array[i] );
}
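A possible use of apply_prime is sketched below. The virtual function weight, the global accumulator, and the helper names are assumptions added for illustration. Because every array element is a B * of the same size, indexing is correct no matter which derived types the pointers refer to.

```cpp
#include <cassert>

struct B {                                   // hypothetical polymorphic base
    virtual ~B() {}
    virtual int weight() const { return 1; }
};
struct D : public B {                        // hypothetical, larger than B
    int extra[4] = {0, 0, 0, 0};
    int weight() const override { return 10; }
};

void apply_prime( B *array[], int length, void (*f)( B * ) ) {
    for( int i = 0; i < length; ++i )
        f( array[i] );
}

int g_total = 0;                             // a plain function pointer cannot capture state
void addWeight( B *p ) { g_total += p->weight(); }

int totalWeight() {
    B b;
    D d1, d2;
    B *items[] = { &b, &d1, &d2 };           // mixed B and D objects, uniform element size
    g_total = 0;
    apply_prime( items, 3, addWeight );
    assert( g_total == 21 );                 // 1 + 10 + 10
    return g_total;
}
```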

Often, an even better approach is to dispense with arrays entirely and employ instead one of the standard containers, generally a vector. The use of a strongly typed container avoids the possibility of pointer arithmetic problems for containers of class objects, and a container of pointers to a base class allows polymorphic use:



vector<B> vb; // no Ds allowed!
vector<B *> vbp; // polymorphic
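The difference between the two containers can be sketched as follows. The virtual function id and the helper functions are hypothetical. The sketch also shows what "no Ds allowed" means in practice: a D copied into a vector<B> is sliced down to its B part.

```cpp
#include <cassert>
#include <vector>

struct B {                                   // hypothetical polymorphic base
    virtual ~B() {}
    virtual int id() const { return 1; }
};
struct D : public B {                        // hypothetical derived class
    int id() const override { return 2; }
};

// A vector<B> stores B objects only: the D is copied as a B.
int slicedId() {
    std::vector<B> vb;
    D d;
    vb.push_back( d );                       // copies only the B subobject
    return vb[0].id();                       // calls B::id
}

// A vector<B *> keeps the dynamic type of each object.
int polymorphicSum() {
    B b;
    D d;
    std::vector<B *> vbp;
    vbp.push_back( &b );
    vbp.push_back( &d );
    int sum = 0;
    for( B *p : vbp )
        sum += p->id();                      // virtual dispatch per element
    assert( sum == 3 );                      // 1 from b, 2 from d
    return sum;
}
```

The vector of raw pointers does not own the objects it refers to, so the objects must outlive the container's use of them.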
