Gotcha #78: Failure to Grok Virtual Functions and Overriding

Many novice C++ programmers have only a superficial understanding of the mechanics of overriding as it's implemented in C++. Sometimes a look at how overriding is actually implemented helps to clarify things. There are several effective mechanisms for implementing virtual functions and overriding in C++; the treatment below describes one common approach.

Let's look first at a simple implementation for single inheritance.

class B {
 public:
   virtual int f1();
   virtual void f2( int );
   virtual int f3( int );
};

In this implementation of virtual functions, each virtual function contained within a class is assigned an index by the compiler. For example, B::f1 is assigned index 0, B::f2 is assigned index 1, and so on. These indexes are used to access a table of pointers to functions. The table element at index 0 contains the address of B::f1, the element at index 1 contains the address of B::f2, and so on. Each object of the class contains a pointer, inserted implicitly by the compiler, to the table of function pointers. An object of type B might be laid out as in Figure 7-5.

Figure 7-5. A simple implementation of virtual functions under single inheritance

Colloquially, the table of function pointers is called the "vtbl," pronounced "vee table," and the pointer to vtbl is called the "vptr," pronounced "vee pointer." The constructors for class B initialize the vptr to refer to the appropriate vtbl (see Gotcha #75). Calling a virtual function involves indirection through the vtbl. The function call

B *bp = new B;
bp->f3(12);

is translated something like this:

(*(bp->vptr)[2])(bp, 12)

We get the address of the function to call by indexing the vtbl with that function's index. We then make an indirect call, passing the address of the object as the implicit "this" argument to the function. The virtual function mechanism in C++ is efficient. The indirect function call is generally highly optimized for each hardware architecture, and all objects of the same type typically share a single vtbl. Under single inheritance, each object has a single vptr, no matter how many virtual functions are declared in the class.
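
The scheme can be imitated by hand in ordinary C++, which makes the indexing concrete. The following is a minimal sketch, not what any particular compiler emits: the names SimB, VFn, B_f1, B_f2, B_f3, makeB, and callF3 are hypothetical. To keep every table slot the same type, each simulated function takes an int and returns an int, although B::f1 takes no argument and B::f2 returns void.

```cpp
#include <cassert>

// Hand-built imitation of the vtbl/vptr scheme (hypothetical names throughout).
struct SimB;

// Every slot has one uniform type; the object address is passed explicitly
// as the first argument, playing the role of the implicit "this".
using VFn = int (*)( SimB *self, int arg );

struct SimB {
    const VFn *vptr;   // stands in for the compiler-inserted vptr
    int state;         // an ordinary data member
};

int B_f1( SimB *self, int )     { return self->state; }           // slot 0
int B_f2( SimB *self, int arg ) { self->state = arg; return 0; }  // slot 1
int B_f3( SimB *self, int arg ) { return self->state + arg; }     // slot 2

// One table shared by every SimB object.
const VFn B_vtbl[] = { B_f1, B_f2, B_f3 };

// Stand-in for the constructor's implicit work: point the vptr at the table.
SimB makeB( int state ) { return SimB{ B_vtbl, state }; }

// The translated call (*(bp->vptr)[2])(bp, 12), with the argument as a parameter.
int callF3( SimB *bp, int arg ) {
    assert( bp != nullptr && bp->vptr != nullptr );
    return (*(bp->vptr)[2])( bp, arg );
}
```

With `SimB b = makeB(30);`, the call `callF3(&b, 12)` indexes slot 2 of the shared table and yields 42.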

Let's look at the implementation of a derived class that overrides some of its base class's virtual functions:

class B {
 public:
   virtual int f1();
   virtual void f2( int );
   virtual int f3( int );
};

class D : public B {
 public:
   int f1();
   virtual void f4();
   int f3( int );
};

An object of type D contains a subobject of type B. Typically, but not universally (see Gotcha #70), the base class subobject is located at the start of the derived class object (that is, at offset 0), and any additional derived class data members are appended after the base class part, as in Figure 7-6.

Figure 7-6. A simple implementation of virtual functions under single inheritance for a derived class object. The base class subobject still contains a vptr, but it refers to a table customized for the derived class.

Let's look at the same virtual member function call we saw earlier, but this time we'll use a D object rather than a B object:

B *bp = new D;
bp->f3(12);

The compiler will generate the same calling sequence, but this time we'll bind at runtime to the function D::f3 rather than B::f3:

(*(bp->vptr)[2])(bp, 12)

The utility of the virtual function mechanism is more obvious in truly polymorphic code, where the precise type of object being manipulated is unknown:

B *bp = getSomeSortOfB();
bp->f3(12);

The virtual calling sequence generated by the compiler is capable of calling, without recompilation, the f3 function of any class derived from B, even of classes that do not yet exist.
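
Here is a compilable version of that scenario. The function bodies, the virtual destructor, the use of std::unique_ptr, and the bool parameter of getSomeSortOfB are assumptions added so the example does something observable; in the text, getSomeSortOfB is simply some source of a B pointer whose dynamic type the caller doesn't know.

```cpp
#include <cassert>
#include <memory>

class B {
 public:
   virtual ~B() = default;
   virtual int f1() { return 1; }
   virtual void f2( int ) {}
   virtual int f3( int x ) { return x; }       // B::f3 returns its argument
};

class D : public B {
 public:
   int f1() override { return 2; }
   virtual void f4() {}
   int f3( int x ) override { return x * 2; }  // D::f3 doubles its argument
};

// Hypothetical factory: the caller cannot tell statically what it gets back.
std::unique_ptr<B> getSomeSortOfB( bool wantD ) {
    if ( wantD )
        return std::make_unique<D>();
    return std::make_unique<B>();
}

// Compiled once; dispatches through whatever vtbl the object's vptr refers to.
int callF3( B *bp ) {
    assert( bp != nullptr );
    return bp->f3(12);
}
```

callF3 is compiled once, yet `callF3(getSomeSortOfB(false).get())` returns 12 (B::f3) while `callF3(getSomeSortOfB(true).get())` returns 24 (D::f3).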

Mechanically speaking, overriding is the process of replacing the address of a base class member function with the address of a derived class member function when constructing a virtual function table for a derived class. In our example above, class D has overridden the base class virtual functions f1 and f3, inherited the implementation of f2, and added a new virtual function f4. This is reflected precisely in the structure of the virtual table for class D.
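
Viewed as a table-building step, overriding amounts to copy, overwrite, and append. The sketch below models each vtbl slot as the name of the function it would point to rather than as an actual address; Vtbl, buildDerivedVtbl, and makeDVtbl are hypothetical helpers, not anything a compiler exposes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Each slot holds the name of the function it would point to (illustrative only).
using Vtbl = std::vector<std::string>;

const Vtbl B_vtbl = { "B::f1", "B::f2", "B::f3" };

// Start from a copy of the base table, overwrite the overridden slots,
// then append any new virtual functions at fresh indexes.
Vtbl buildDerivedVtbl( const Vtbl &base,
                       const std::vector<std::pair<std::size_t, std::string>> &overrides,
                       const std::vector<std::string> &newVirtuals ) {
    Vtbl result = base;                        // inherit every base entry
    for ( const auto &o : overrides ) {
        assert( o.first < result.size() );     // only an existing slot can be overridden
        result[o.first] = o.second;            // replace the entry: overriding
    }
    for ( const auto &nv : newVirtuals )
        result.push_back( nv );                // new virtuals get new indexes
    return result;
}

// D overrides f1 (slot 0) and f3 (slot 2), inherits f2, and adds f4.
Vtbl makeDVtbl() {
    return buildDerivedVtbl( B_vtbl, { { 0, "D::f1" }, { 2, "D::f3" } }, { "D::f4" } );
}
```

makeDVtbl() yields the four slots D::f1, B::f2, D::f3, D::f4, and because slot 2 still holds f3, the unchanged calling sequence reaches D::f3.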

The mechanics of virtual functions under multiple inheritance are more complex in their details but employ essentially the same approach. The additional complexity is the result of a single object's having more than one base class subobject and therefore more than one valid address. Consider the following hierarchy:

class B1 { /* . . . */ };
class B2 { /* . . . */ };
class D : public B1, public B2 { /* . . . */ };

A derived class object can be manipulated through the interface of any of its public base classes; this is the meaning of the is-a relationship. Therefore, an object of type D can be referred to through pointers or references to D, B1, or B2:

D *dp = new D;
B1 *b1p = dp;
B2 *b2p = dp;

Only one base class subobject can be located at offset 0 in a derived class object. Base class subobjects are typically allocated in the order in which they appear on the base class list in the derived class definition, so in the case of D the storage for B1 will come first, followed by that for B2, as in Figure 7-7 (see Gotcha #38).
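
The difference in addresses is easy to observe. In this sketch the data members and the helper functions offsetOfB1, offsetOfB2, and checkDistinctSubobjects are assumptions added for illustration. The exact offsets are implementation-specific; the only portable guarantee used here is that two distinct non-empty base subobjects never share an address.

```cpp
#include <cassert>
#include <cstddef>

class B1 { public: int b1data = 0; };
class B2 { public: int b2data = 0; };
class D : public B1, public B2 { public: int ddata = 0; };

// Byte offset of the B1 subobject within d, found by comparing raw addresses.
std::ptrdiff_t offsetOfB1( D &d ) {
    B1 *b1p = &d;   // implicit derived-to-base conversion
    return reinterpret_cast<char *>( b1p ) - reinterpret_cast<char *>( &d );
}

// Same for B2: here the conversion itself adds the B2 subobject's offset.
std::ptrdiff_t offsetOfB2( D &d ) {
    B2 *b2p = &d;
    return reinterpret_cast<char *>( b2p ) - reinterpret_cast<char *>( &d );
}

// Two distinct non-empty base subobjects cannot occupy the same address.
void checkDistinctSubobjects( D &d ) {
    assert( offsetOfB1( d ) != offsetOfB2( d ) );
}
```

On common implementations offsetOfB1 returns 0 and offsetOfB2 returns sizeof(B1), but the code does not depend on that.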

Figure 7-7. Likely layout of an object under multiple inheritance

Let's flesh out this simple multiple-inheritance hierarchy with some virtual functions:

class B1 {
 public:
   virtual void f1();
   virtual void f2();
};

class B2 {
 public:
   virtual void f2();
   virtual void f3( int );
   virtual void f4();
};

The B1 and B2 classes each have virtual functions, so objects of these types will each contain a vptr to a class-specific vtbl, as in Figure 7-8.

Figure 7-8. Two potential base classes

A D object is-a B1 and is-a B2, so it will have two vptrs and two associated vtbls (see Figure 7-9):

class D : public B1, public B2 {
 public:
   void f2();
   void f3( int );
   virtual void f5();
};

Figure 7-9. Possible implementation of virtual functions under multiple inheritance. The complete object overrides virtual functions for both of its base class subobjects.

Notice that D::f2 overrides the f2 in both of its base classes. An overriding derived class function will override every base class virtual function with the same name and signature (number and type of formal arguments), whether the base class is a direct base class or a base class of a base class (of a base class …). Note that even though D adds a new virtual function (D::f5), the compiler doesn't insert a vptr into the D-specific part of the object. Typically, new derived class virtual functions will be appended to one of the base class virtual function tables.
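
A short program confirms that a single D::f2 serves both base interfaces. The original functions return void; here each returns a std::string naming itself, an assumption made only so the dispatch can be checked, and checkF2Dispatch is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

class B1 {
 public:
   virtual ~B1() = default;
   virtual std::string f1() { return "B1::f1"; }
   virtual std::string f2() { return "B1::f2"; }
};

class B2 {
 public:
   virtual ~B2() = default;
   virtual std::string f2() { return "B2::f2"; }
   virtual std::string f3( int ) { return "B2::f3"; }
   virtual std::string f4() { return "B2::f4"; }
};

class D : public B1, public B2 {
 public:
   std::string f2() override { return "D::f2"; }   // overrides B1::f2 and B2::f2
   std::string f3( int ) override { return "D::f3"; }
   virtual std::string f5() { return "D::f5"; }    // new virtual, no new vptr
};

// Whichever base class interface is used, the call reaches D::f2.
void checkF2Dispatch( D &d ) {
    B1 *b1p = &d;
    B2 *b2p = &d;
    assert( b1p->f2() == "D::f2" );
    assert( b2p->f2() == "D::f2" );
}
```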

We do have a problem, though. Let's look at some possible code:

B2 *b2p = new D;
b2p->f3(12);

We're going to engage in the common practice of manipulating a derived class object through one of its base class interfaces. However, if we generate the same calling sequence we did under the single-inheritance model we examined earlier, we'll wind up with a bad value for the this pointer:

(*(b2p->vptr)[1])(b2p,12)

The reason is that the call is dynamically bound to D::f3, which is expecting an implicit this argument that refers to the start of a D object. Unfortunately, b2p refers to the start of a B2 (sub)object, which is offset some number of bytes into the D object in which it's embedded. (Refer to Figure 7-7.) It's necessary to "fix up" the value of this passed in the call by adjusting the value of b2p to refer to the start of the D object.
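
The fix-up is invisible in source code but can be observed. In this sketch the seenThis member, the CallReport struct, callThroughB2, and the data members of B1 and B2 are illustrative assumptions: D::f3 records the this value it receives, and callThroughB2 compares it with the pointer used at the call site.

```cpp
#include <cassert>

class B1 {
 public:
   virtual ~B1() = default;
   virtual void f1() {}
   int b1data = 0;
};

class B2 {
 public:
   virtual ~B2() = default;
   virtual void f3( int ) {}
   int b2data = 0;
};

class D : public B1, public B2 {
 public:
   void f3( int ) override { seenThis = this; }  // record the adjusted this
   const D *seenThis = nullptr;
};

struct CallReport {
    const void *callerPointer;   // value of b2p at the call site
    const void *calleeThis;      // value of this inside D::f3
    const void *completeObject;  // address of the whole D object
};

CallReport callThroughB2( D &d ) {
    B2 *b2p = &d;
    b2p->f3( 12 );
    assert( d.seenThis == &d );  // guaranteed: D::f3 always sees the full D
    return { b2p, d.seenThis, &d };
}
```

On mainstream compilers callerPointer differs from calleeThis, because b2p refers to the B2 subobject; the language guarantees only that calleeThis equals completeObject, which is exactly what the compiler's fix-up arranges.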

Fortunately, when it's constructing the vtbl for a derived class, the compiler knows precisely what these fix-up values are, since it knows precisely the class for which it's constructing the vtbl and the offsets of the various base class subobjects within the derived class. There are several common ways to apply this fix-up information, from small sections of code (misnamed "thunks") executed before the actual function is entered, to member functions with multiple entry points. Conceptually, the cleanest way to represent the operation is simply to record the required offset value in the vtbl and modify the calling sequence to take the offset into account, as in Figure 7-10.
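
The thunk approach can be simulated with a hand-laid-out object. This is an assumption-laden model, not compiler output: DObj, B1Part, B2Part, D_f3_thunk, and the other names are hypothetical, and the object is an ordinary standard-layout struct so that offsetof can supply the fix-up value. The thunk sits in the f3 slot of the B2 part of D's table, so the unmodified single-inheritance calling sequence still works. Recovering the DObj pointer uses the familiar "container of" idiom.

```cpp
#include <cassert>
#include <cstddef>

struct B2Part;
using B2Slot = int (*)( B2Part *self, int arg );

// Hypothetical layout of a D object: B1 part, then B2 part, then D's own data.
struct B1Part { const void *vptr; int b1data; };
struct B2Part { const B2Slot *vptr; int b2data; };
struct DObj   { B1Part b1; B2Part b2; int ddata; };

// The "real" D::f3 expects a pointer to the start of the whole object.
int D_f3( DObj *self, int arg ) { return self->ddata + arg; }

// Thunk: convert the incoming B2 pointer back to a DObj pointer, then forward.
int D_f3_thunk( B2Part *self, int arg ) {
    assert( self != nullptr );
    char *raw = reinterpret_cast<char *>( self ) - offsetof( DObj, b2 );
    return D_f3( reinterpret_cast<DObj *>( raw ), arg );
}

// Placeholder for slot 0 (f2); slot 1 (f3) holds the thunk.
int D_f2_for_B2( B2Part *, int ) { return 0; }
const B2Slot D_vtbl_for_B2[] = { D_f2_for_B2, D_f3_thunk };

// Same calling sequence as under single inheritance: (*(b2p->vptr)[1])(b2p, arg).
int callF3( B2Part *b2p, int arg ) {
    return (*(b2p->vptr)[1])( b2p, arg );
}
```

With a DObj whose ddata is 30 and whose b2.vptr refers to D_vtbl_for_B2, `callF3(&d.b2, 12)` returns 42.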

Figure 7-10. One of many possible implementations of virtual functions under multiple inheritance. This implementation records the fix-up values for the this pointer in the virtual function table itself.

The vtbl entries are now small structures containing the member function address (fptr) and an offset (delta) to add to the this value, and the calling sequence becomes

(*(b2p->vptr)[1].fptr)(b2p+(b2p->vptr)[1].delta,12)

This code can be heavily optimized, so it's not as expensive as it might look.
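
The delta-in-the-table representation of Figure 7-10 can be modeled the same way. Only fptr and delta come from the text; the struct names, the int return values, and the choice to measure delta in bytes (so the adjustment is applied to a char pointer rather than to a B2 pointer) are assumptions of this sketch. Note that delta is negative here, because it moves a B2 pointer back toward the start of the D object.

```cpp
#include <cassert>
#include <cstddef>

// A vtbl entry: function address plus the byte adjustment to apply to "this".
struct VtblEntry {
    int (*fptr)( void *self, int arg );
    std::ptrdiff_t delta;
};

// Hypothetical layout of a D object with two vptrs.
struct B1Part { const VtblEntry *vptr; int b1data; };
struct B2Part { const VtblEntry *vptr; int b2data; };
struct DObj   { B1Part b1; B2Part b2; int ddata; };

// D's functions expect self to point at the start of the DObj.
int D_f2( void *self, int )     { return static_cast<DObj *>( self )->ddata; }
int D_f3( void *self, int arg ) { return static_cast<DObj *>( self )->ddata + arg; }

// B2-in-D table: both entries lead to D functions, so both carry the same
// negative delta. Slot 2 (the inherited B2::f4, delta 0) is omitted here.
const VtblEntry D_vtbl_for_B2[] = {
    { D_f2, -static_cast<std::ptrdiff_t>( offsetof( DObj, b2 ) ) },  // f2
    { D_f3, -static_cast<std::ptrdiff_t>( offsetof( DObj, b2 ) ) },  // f3
};

// (*(b2p->vptr)[1].fptr)(b2p+(b2p->vptr)[1].delta,12), done on bytes.
int callF3( B2Part *b2p, int arg ) {
    assert( b2p != nullptr && b2p->vptr != nullptr );
    const VtblEntry &e = b2p->vptr[1];
    char *adjusted = reinterpret_cast<char *>( b2p ) + e.delta;
    return (*e.fptr)( adjusted, arg );
}
```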