Gotcha #8: Failure to Distinguish Access and Visibility

The C++ language does not implement data hiding; it implements access protection. Private and protected members of a class are not invisible, just inaccessible. Like many other visible but inaccessible objects (managers come to mind), they can cause problems.

The most obvious problem is the need to recompile code that uses a class even though only an "invisible" aspect of its implementation has changed. Consider a simple class that has added a new data member:

class C {
 public:
   C( int val ) : a_( val ),
         b_( a_ ) // new
   {}
   int get_a() const { return a_; }
   int get_b() const { return b_; } // new
 private:
   int b_; // new
   int a_;
};

In this case, a number of aspects of the class have changed, some of which are visible and some of which are not.

Visibly, the size of the class has changed, due to the addition of the new data member. This will affect all code that uses an object of the class, dereferences or performs arithmetic on a pointer to the class, or in some way references the size of the class or the names of its members. Notice also that the placement of the new data member will affect the offset of a_ within the class, invalidating all existing references to the a_ member and any pointers to members that refer to it. Additionally, the constructor's member initialization list is now incorrect. Members are initialized in the order in which they are declared, not the order in which they appear in the list, so b_ is initialized from a_ before a_ itself has been initialized, and b_ receives an indeterminate value (see Gotcha #52).
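The initialization-order problem is easy to observe directly. The following is a minimal sketch, not code from the text: the helper `Tracer`, the class `Order`, and the function `initialization_order` are hypothetical names introduced only to log which member is constructed first. As in the revised C, the initialization list names a_ first, but b_ is declared first.

```cpp
#include <string>

// Hypothetical helper: appends its tag to a log when it is constructed.
struct Tracer {
    Tracer( std::string &log, char tag ) { log += tag; }
};

// Mirrors the revised class C: the initializer list mentions a_ first,
// but b_ is declared first, so b_ is constructed first.
class Order {
  public:
    explicit Order( std::string &log )
        : a_( log, 'a' ), b_( log, 'b' ) {}  // GCC and Clang flag this with -Wreorder
  private:
    Tracer b_;  // declared first: initialized first
    Tracer a_;  // declared second: initialized second
};

// Returns the sequence in which Order's members were initialized.
std::string initialization_order() {
    std::string log;
    Order o( log );
    return log;  // "ba": declaration order wins
}
```

Declaring a_ before b_ (or initializing b_ from val rather than from a_) is enough to repair the original class.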

The major invisible changes concern the meanings of the implicit copy constructor and copy assignment operator that the compiler supplies for class C. By default, these are defined as inline functions and were therefore expanded into any code that initialized or assigned one C with another (see Gotcha #49).

The major effect of the modification of C (aside from the bug mentioned above) is the need to recompile nearly all uses of C. In large projects, such recompilation can be time-consuming. If C is defined in a header file, all code that (transitively) includes that header file must be recompiled. One way to improve this situation is to "forward declare" the class C by using an incomplete class declaration in contexts where more information about the class is not required:

class C;

Such an incomplete declaration still allows us to declare pointers and references to a C, as long as we perform no operations that require knowledge of C's size or members, including its base class subobjects (but see Gotcha #39).
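A minimal single-file sketch of what an incomplete declaration permits is shown next. In a real project the forward declaration and the definition would come from different headers; here they share one file so the example compiles on its own. The definition of C is pared down, and the functions `read_a` and `pass_through` are hypothetical.

```cpp
// Only an incomplete declaration of C is visible here (as if from cfwd.h).
class C;

// Allowed: pointers, references, and declarations of functions that use them.
int read_a( const C &c );                 // declaration only
C *pass_through( C *cp ) { return cp; }  // copying a pointer needs no size information

// Not allowed while C is incomplete; each of these would fail to compile:
//   C c( 1 );               // needs sizeof(C)
//   int n = sizeof( C );    // needs the definition
//   int x = cp->get_a();    // needs to know C's members

// Later (say, in c.h) the full definition becomes visible...
class C {
  public:
    C( int val ) : a_( val ) {}
    int get_a() const { return a_; }
  private:
    int a_;
};

// ...and code that uses C's members can now be compiled.
int read_a( const C &c ) { return c.get_a(); }
```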

This approach can be effective, but to avoid maintenance problems, it's important to pick up the incomplete class declaration from the same source that supplies the class definition. That is, the provider of a facility of significant complexity used in this way should provide a "forward declaration" header file that supplies an appropriate set of forward declarations.

For example, if the full definition of class C is presented in the header file c.h, we might consider providing a file called cfwd.h that contains the incomplete class declaration. Uses that didn't require the full definition of C would include cfwd.h rather than c.h. The reason for providing the forward declaration file is that the definition of C may change in the future to a form incompatible with a simple forward declaration. For example, C may be reimplemented as a typedef name:

template <typename T>
class Cbase {
   // . . .
};
typedef Cbase<int> C;

Clearly, the provider of the c.h header file is trying to avoid forcing source code changes on the present users of class C, but any code that contains an incomplete declaration of class C will now be in error:

#include "c.h"
// . . .
class C; // error! C is a typedef-name

The availability of a cfwd.h file would circumvent these problems. The standard library uses this approach for iostreams: the header iosfwd provides forward declarations corresponding to the header iostream.
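To see why a provider-supplied cfwd.h helps, consider how it might be revised in step with the typedef version of c.h. The sketch below is an assumption about one reasonable layout, not the text's own files: the three "files" are collapsed into one translation unit, and the client function `pick` and the members of Cbase are hypothetical.

```cpp
// ---- cfwd.h, revised together with c.h (sketch) ----
// Forward-declare the template and repeat the typedef, so clients that
// include only cfwd.h never have to write "class C;" themselves.
template <typename T> class Cbase;
typedef Cbase<int> C;

// ---- a client that includes only cfwd.h (hypothetical function) ----
// Pointers and references to C still work without the full definition.
const C *pick( const C *first, const C *second, bool take_first ) {
    return take_first ? first : second;
}

// ---- c.h (sketch): would #include "cfwd.h", then define the template ----
template <typename T>
class Cbase {
  public:
    explicit Cbase( T val ) : a_( val ) {}
    T get_a() const { return a_; }
  private:
    T a_;
};
```

Because the forward declarations live in one place maintained by the provider, the change from a class to a typedef requires no edits to client code.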

More commonly, the need for recompilation of code that uses C makes it difficult to patch updates (bug fixes, typically) into installed software. Probably the most effective way of separating the interface of a class from its implementation, and thereby achieving true data hiding, is to employ the Bridge pattern.

Applying the Bridge pattern to a class involves separating the class into two parts, an interface and an implementation:

gotcha08/cbridge.h

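class Cimpl; // incomplete: clients never see its definition

class C {
 public:
   C( int val );
   ~C();
   int get_a() const;
   int get_b() const;
 private:
   // Declared but not defined: the implicit versions would copy
   // impl_ and cause the same Cimpl to be deleted twice.
   C( const C & );
   C &operator =( const C & );
   Cimpl *impl_;
};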

gotcha08/cbridge.cpp

#include "cbridge.h"

class Cimpl {
 public:
   Cimpl( int val ) : a_( val ), b_( a_ ) {}
   ~Cimpl() {}
   int get_a() const { return a_; }
   int get_b() const { return b_; }
 private:
   int a_;
   int b_;
};

C::C( int val )
   : impl_( new Cimpl( val ) ) {}

C::~C()
   { delete impl_; }

int C::get_a() const
   { return impl_->get_a(); }

int C::get_b() const
   { return impl_->get_b(); }

The interface class retains the original interface of C, but the implementation has been moved to an implementation class hidden from general use. The new version of C contains just a pointer to the implementation, and the entire implementation, including the bodies of C's member functions, is hidden from client code. Any change to the implementation of C that doesn't affect the class's interface is now confined to a single implementation file.

Employing a Bridge incurs a clear runtime cost, in that a C now requires two objects for its representation rather than one, and each member function call is both indirect and non-inline. However, the advantages of massively reduced compile times and the ability to update the implementation without recompiling client code often outweigh the additional runtime cost. This technique has been used extensively for many years and goes by a number of amusing names, including the "pimpl idiom" and the "Cheshire Cat technique."
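In C++11 and later, a common variation of the idiom holds the implementation in a std::unique_ptr instead of managing impl_ by hand. The sketch below is that variation, not the text's code, with the header and implementation file collapsed into one translation unit. The key detail is that C's destructor must be declared in the class but defined where Cimpl is complete, because std::unique_ptr needs the complete type to delete it.

```cpp
#include <memory>

// ---- cbridge.h (sketch of a C++11 variant) ----
class Cimpl;  // still incomplete from the client's point of view

class C {
  public:
    explicit C( int val );
    ~C();  // declared here, defined below where Cimpl is complete
    int get_a() const;
    int get_b() const;
    // Copying is implicitly deleted, because std::unique_ptr is not copyable.
  private:
    std::unique_ptr<Cimpl> impl_;
};

// ---- cbridge.cpp ----
class Cimpl {
  public:
    explicit Cimpl( int val ) : a_( val ), b_( a_ ) {}
    int get_a() const { return a_; }
    int get_b() const { return b_; }
  private:
    int a_;  // declared before b_, so b_( a_ ) reads an initialized value
    int b_;
};

C::C( int val ) : impl_( new Cimpl( val ) ) {}
C::~C() = default;  // Cimpl is complete here, so unique_ptr can delete it
int C::get_a() const { return impl_->get_a(); }
int C::get_b() const { return impl_->get_b(); }
```

Client code sees only the header portion, so the runtime costs described above (an extra allocation and an indirect, non-inline call per member function) are unchanged.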

Inaccessible members can also affect the meanings of derived classes and base classes when accessed through a derived class interface. For example, consider the following base and derived class:

class B {
 public:
   void g();
 private:
   virtual void f(); // new
};

class D : public B {
 public:
   void f();
 private:
   double g; // new
};

The addition of a private virtual function in the base class B has made a formerly nonvirtual derived class function virtual. The addition of a private data member in D has hidden a function inherited from B. Inheritance is often called "white-box" reuse, because a change to either a base class or a derived class can fundamentally affect the meaning of the other.
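The effect of the new private virtual function can be observed by having a public base class function call it. In this minimal sketch the functions return std::string rather than void, and `call_through_base` is a hypothetical helper; both changes exist only to make the dispatch visible.

```cpp
#include <string>

class B {
  public:
    virtual ~B() {}
    std::string g() { return f(); }  // public function calls the private virtual
  private:
    virtual std::string f() { return "B::f"; }  // the "new" private virtual
};

class D : public B {
  public:
    std::string f() { return "D::f"; }  // now silently overrides B::f
};

// Dispatches to D::f, although D's author never intended an override.
std::string call_through_base( B &b ) { return b.g(); }
```

Access control plays no part in overriding: D::f overrides B::f even though B::f is private.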

One way to mitigate these problems is to employ a simple naming convention that partitions names by their general function. Typically, it's best to have different conventions for type names, private data members, and all other names. In this book, our convention is to capitalize type names, append an underscore to class data members (all of which are private!), and (with few exceptions) start other names with a lowercase letter. Following this convention would have prevented our hiding the base class member function g in D, above. Above all, resist the temptation to establish a complex naming convention, because such a convention is unlikely to be followed.
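The sketch below contrasts the hiding problem with the trailing-underscore convention. The class `Hiding` and its member `call_base` are hypothetical, and g returns an int only so that the call can be checked.

```cpp
class B {
  public:
    int g() const { return 42; }
};

// A data member spelled g hides every B::g in the scope of Hiding,
// so h.g() would no longer compile for a Hiding object h.
class Hiding : public B {
  public:
    int call_base() const { return B::g(); }  // explicit qualification still reaches it
  private:
    double g = 0.0;
};

// Under the convention, the data member is g_ and cannot collide with g.
class D : public B {
  private:
    double g_ = 0.0;
};
```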

Additionally, never attempt to encode a variable's type in its name. For instance, calling an integer index iIndex is actively damaging to understanding and maintaining the code. First, a name should describe a program entity's abstract meaning, not how it's implemented (data abstraction can apply even to predefined types). Second, variables commonly change type, and when they do, their names often fail to change along with them. The variable's name then becomes an effective source of misinformation about its type.

Other approaches are discussed elsewhere, especially in Gotchas #70, #73, #74, and #77.
