Gotcha #29: Converting through `void *`

Even C programmers know that a void * is second cousin to a cast and should be avoided to the extent possible. As with a cast, converting a typed pointer to void * removes all useful type information. Typically, the original type of the pointer must be "remembered" and restored when the void * is used. If the type is resupplied correctly, everything will work fine (except, of course, that having to remember types for later casting implies that the design needs work).



void *vp = new int(12);
// . . .
int *ip = static_cast<int *>(vp); // will work

Unfortunately, even this simple use of void * can open the door to portability problems. Remember that static_cast is the cast operator we use (when we must cast) for relatively safe and portable conversions. For example, one might use a static_cast to cast from a base class pointer to a publicly derived class pointer. For unsafe, platform-dependent conversions, we're forced to use reinterpret_cast. For example, one might use a reinterpret_cast to cast from an integer to a pointer or between pointers to unrelated types:



char *cp = static_cast<char *>(ip); // error!
char *cp = reinterpret_cast<char *>(ip); // works.

The use of reinterpret_cast is a clear indication to you and to the readers and maintainers of your code that you're not only casting but that you're casting in a potentially nonportable way. Use of a void * intermediary allows that important warning to be circumvented:



char *cp = static_cast<char *>(vp); // put int addr into a char *!

It gets worse. Consider a user interface that allows the address of a "Widget" to be stored and later retrieved:



typedef void *Widget;
void setWidget( Widget );
Widget getWidget();

Users of this interface recognize that they have to remember the type of Widget they set, so they can restore its type information when it's retrieved:



// In some header file . . .
class Button {
   // . . .
};
class MyButton : public Button {
   // . . .
};
// elsewhere . . .
MyButton *mb = new MyButton;
setWidget( mb );

// somewhere else entirely . . .
Button *b = static_cast<Button *>(getWidget()); // might work!

This code will usually work, even though we lose some type information when we extract the Widget. The stored Widget refers to a MyButton but is extracted as a Button. The reason this code will often work has to do with the likely way that the storage for a class object is laid out in memory.

Typically, a derived class object contains the storage for its base class subobject starting at offset 0, as if its base class part were the first data member of the derived class, and simply appends any additional derived class data below that, as in Figure 4-1. Therefore, the address of a derived class object is generally the same as that of its base class. (Note, however, that the standard guarantees correct results only if the address in the void * is converted to exactly the same type used to set the void *. See Gotcha #70 for one way this code could fail even under single inheritance.)

Figure 4-1. Likely layout of a derived class under single inheritance


However, this code is fragile, in that a remote change during maintenance may introduce a bug. In particular, a straightforward and proper use of multiple inheritance may break the code:



// in some header file . . .
class Subject {
   // . . .
};
class ObservedButton : public Subject, public Button {
   // . . .
};
// elsewhere . . .
ObservedButton *ob = new ObservedButton;
setWidget( ob );
// . . .
Button *badButton = static_cast<Button *>(getWidget()); // disaster!

The problem is with the layout of the derived class object under multiple inheritance. An ObservedButton has two base class parts, and only one of them can have the same address as the complete object. Typically, storage for the first base class (in this case, Subject) is placed at offset 0 in the derived class, followed by the storage for subsequent base classes (in this case, Button), followed by any additional derived class data members, as in Figure 4-2. Under multiple inheritance, a single object commonly has multiple valid addresses.

Figure 4-2. Likely layout of an object under multiple inheritance. An `ObservedButton` object contains subobjects for both its `Subject` and `Button` base classes. Loss of type information caused `badButton` to refer to a non-`Button` address.


Ordinarily this is not a problem, since the compiler is aware of the various offsets and can perform the correct adjustments at compile time:



Button *bp = new ObservedButton;
ObservedButton *obp = static_cast<ObservedButton *>(bp);

In the code above, bp correctly points to the Button part of the ObservedButton object, not to the start of the object. When we cast from a Button pointer to an ObservedButton pointer, the compiler is able to adjust the address so that it points to the start of the ObservedButton object. It's not hard, since the compiler knows the offset of each base class part within a derived class, as long as it knows the type of the base and derived classes.

And that's our problem. When we use setWidget, we throw away all useful type information. When we cast the result of getWidget to Button *, the compiler has no way of knowing that the address came from an ObservedButton, so it can't perform the adjustment. As a result, the Button pointer actually holds the address of the Subject part of the object!

Void pointers do have their uses, as do casts, but they should be used sparingly. It's never a good idea to use a void * as part of an interface that requires one use of the interface to resupply type information lost through another use.
