Gotcha #56: Direct versus Copy Initialization

I've seen some pretty sloppy initializations in my day. Consider a simple class Y:



class Y {
 public:
   Y( int );
   ~Y();
};

It's not uncommon to see a simple initialization of a Y object written any of three different ways, as if they were equivalent. As if it didn't matter. As if.



Y a( 1066 );
Y b = Y(1066);
Y c = 1066;

In point of fact, all three of these initializations will probably result in the same object code being generated, but they're not equivalent. The initialization of a is known as a direct initialization, and it does precisely what one might expect. The initialization is accomplished through a direct invocation of Y::Y(int).

The initializations of b and c are more complex. In fact, they're too complex. These are both copy initializations. In the case of the initialization of b, we're requesting the creation of an anonymous temporary of type Y, initialized with the value 1066. We then use this anonymous temporary as a parameter to the copy constructor for class Y to initialize b. Finally, we call the destructor for the anonymous temporary. Essentially, we've requested that the compiler generate something like the following code:



Y temp( 1066 ); // initialize temporary
Y b( temp ); // copy construction
temp.~Y(); // destructor activation

The semantics of the initialization of c are the same, but the creation of the anonymous temporary is implicit.

Let's change the implementation of Y somewhat by adding our own copy constructor and see what happens:



#include <cstdlib> // for std::abort

class Y {
 public:
   Y( int );
   Y( const Y & )
       { std::abort(); } // any copy construction terminates the process
   ~Y();
};

Clearly, Y objects have no intention of putting up with any copy construction. However, when we recompile and run our little program, all three initializations may well go off without terminating the process. What gives?

What gives is that the standard explicitly allows the compiler to perform a program transformation to remove the temporary generation and copy constructor call and to generate the same code as in the case of a direct initialization. Note that this is not a simple "optimization," since the actual behavior of the program is altered (in this case, we didn't terminate the process). Most C++ compilers will perform the transformation, but before C++17 they're not required to do so by the standard. (From C++17 on, the elision is mandatory for initializations like those of b and c.) Given this uncertainty, it's always a good idea to say precisely what you mean and to use direct initialization in declarations of class objects:
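To see what a particular compiler does, one can swap the call to abort for a counter. The following is a minimal sketch rather than code from the original program: the class Traced and the three helper functions are hypothetical names introduced for illustration. Only the result for direct initialization is guaranteed to be zero; the other two counts depend on the compiler and on the language standard in effect.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for Y: counts copy constructions instead of aborting.
struct Traced {
    static int copies;
    Traced( int ) {}
    Traced( const Traced & ) { ++copies; }
    ~Traced() {}
};
int Traced::copies = 0;

// Each helper performs one initialization and returns the number of
// copy-constructor calls it caused.
int copiesForDirect() {
    Traced::copies = 0;
    Traced a( 1066 );            // direct initialization
    return Traced::copies;
}

int copiesForExplicitTemp() {
    Traced::copies = 0;
    Traced b = Traced( 1066 );   // copy initialization, explicit temporary
    return Traced::copies;
}

int copiesForImplicitTemp() {
    Traced::copies = 0;
    Traced c = 1066;             // copy initialization, implicit temporary
    return Traced::copies;
}

void report() {
    int direct = copiesForDirect();
    assert( direct == 0 );       // the only count the language guarantees
    std::printf( "Traced a( 1066 ):         %d copies\n", direct );
    std::printf( "Traced b = Traced(1066):  %d copies\n", copiesForExplicitTemp() );
    std::printf( "Traced c = 1066:          %d copies\n", copiesForImplicitTemp() );
}
```

Calling report() from main prints the three counts. A compiler that applies the transformation reports zero for all three; one that does not reports one copy each for b and c.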



Y a(1066), b(1066), c(1066);

Perversely, you may want to ensure that the compiler does not perform the transformation, because you want some side effect that temporary generation and copy construction provide, or you may just want to produce a large, slow application. Unfortunately, it's not easy to ensure these semantics, since any standard compiler is free to perform the transformation. Avoiding the transformation in a portable way (without benefit of a platform-specific compile switch or #pragma) is too horrible to contemplate, so let's just have a quick look at it:



struct {
   alignas(Y) char b_[sizeof(Y)];
} aY; // aligned buffer as big as a Y (alignas requires C++11)

new (&aY) Y(1066); // create temp
Y d( reinterpret_cast<Y &>(aY) ); // copy ctor
reinterpret_cast<Y &>(aY).~Y(); // destroy temp

This will almost duplicate the meaning of the untransformed initialization. (The storage for aY will probably not be reused later in the stack frame, the way the storage for a compiler-generated temporary might. See Gotcha #66.) But there are easier ways to write big and slow programs.
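A runnable variant of this technique can be sketched for C++11 or later. It is an illustration under stated assumptions, not the author's code: Traced, Counts, and forcedCopy are hypothetical names, alignas supplies the buffer's alignment, and the pointer returned by placement new is used in place of the casts. The counters record what actually ran.

```cpp
#include <cassert>
#include <new>      // placement new

// Hypothetical bookkeeping for the calls made on Traced objects.
struct Counts { int fromInt; int copies; int dtors; };
static Counts counts = { 0, 0, 0 };

// Hypothetical stand-in for Y.
struct Traced {
    Traced( int ) { ++counts.fromInt; }
    Traced( const Traced & ) { ++counts.copies; }
    ~Traced() { ++counts.dtors; }
};

Counts forcedCopy() {
    counts = Counts{ 0, 0, 0 };
    {
        // Raw storage with the size and alignment of a Traced.
        alignas(Traced) unsigned char buf[sizeof(Traced)];
        Traced *temp = new (buf) Traced( 1066 ); // create the "temporary"
        Traced d( *temp );   // copy from an lvalue: elision is not permitted here
        temp->~Traced();     // destroy the "temporary"
        assert( counts.copies == 1 );
    }   // d is destroyed here
    return counts;
}
```

Because the source of the copy is an ordinary object in the buffer rather than a compiler-generated temporary, the copy constructor runs exactly once on any conforming compiler.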

An important point to understand about this program transformation is that the compiler applies it after the original semantics have been checked. If the untransformed initialization is incorrect, the compiler will issue an error, even if the transformation would have produced correct code. Consider a class X:



class X {
 public:
   X( int );
   ~X();
   // . . .
 private:
   X( const X & );
};

X a( 1066 ); // OK
X b = 1066; // error!
X c = X(1066); // error!

The untransformed initializations of b and c require access to X's copy constructor, but the designer of X has decided to disallow copy construction of X objects by making the copy constructor private. Even though the transformation would have eliminated the copy constructor calls, the code is still incorrect. (C++17 changed this rule: no temporary is created in either case, so both declarations compile under that standard and later ones.)
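On C++11 and later, the gap between what direct initialization needs and what the untransformed copy initialization needs can be checked at compile time with the standard <type_traits> queries. The sketch below assumes a hypothetical class NoCopy that plays the role of X, with a deleted copy constructor instead of a private one; the function directInit is likewise invented for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical counterpart of X: constructible from int, not copyable.
class NoCopy {
 public:
   NoCopy( int v ) : v_( v ) {}
   NoCopy( const NoCopy & ) = delete;  // C++11 spelling of "no copy construction"
   ~NoCopy() {}
   int value() const { return v_; }
 private:
   int v_;
};

// Direct initialization from an int needs only NoCopy(int).
static_assert( std::is_constructible<NoCopy, int>::value,
               "NoCopy a( 1066 ) is well formed" );

// Copy construction is unavailable, and that is exactly what the
// untransformed forms "NoCopy b = 1066;" and "NoCopy c = NoCopy(1066);"
// ask for.
static_assert( !std::is_copy_constructible<NoCopy>::value,
               "NoCopy cannot be copy constructed" );

int directInit() {
    NoCopy a( 1066 );   // compiles under every standard
    assert( a.value() == 1066 );
    return a.value();
}
```

Both static_assert declarations hold under every standard from C++11 on; what varies with the standard is only whether the copy-initialization forms are accepted for such a class.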

Direct and copy initialization apply to non-class types as well, but the results are predictable and portable:



int i(12); // direct
int j = 12; // copy, same result

For the initialization of these types, feel free to use whichever form is clearest. However, note that it's usually best to use direct initialization within a template, where the type of the variable is not known until template instantiation. Consider a simplified sequence-length generic algorithm parameterized on not only the iterator type of the sequence (In) but also the type of its numeric counter (N):

gotcha56/seqlength.cpp



template <typename N, typename In>
void seqLength( N &len, In b, In e ) {
   N n( 0 ); // this way, NOT "N n = 0;"
   while( b != e ) {
       ++n;
       ++b;
   }
   len = n;
}

With this implementation, the use of direct initialization allows us to employ an (admittedly unusual) user-defined numeric type that doesn't permit copy construction. An implementation of seqLength that employs copy initialization of an N object will not allow us to do so.
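As an illustration of such a type, here is a minimal sketch of a counter that can be assigned and incremented but not copy constructed, used with seqLength. The class Counter and the function lengthOfThree are hypothetical; seqLength is repeated from the text so the example is self-contained.

```cpp
#include <cassert>
#include <vector>

// seqLength as given in the text.
template <typename N, typename In>
void seqLength( N &len, In b, In e ) {
    N n( 0 );   // direct initialization
    while( b != e ) {
        ++n;
        ++b;
    }
    len = n;    // requires copy assignment, not copy construction
}

// Hypothetical numeric type: assignable and incrementable,
// but not copy constructible.
class Counter {
 public:
   Counter( int v ) : v_( v ) {}
   Counter( const Counter & ) = delete;
   Counter &operator =( const Counter &rhs ) { v_ = rhs.v_; return *this; }
   Counter &operator ++() { ++v_; return *this; }
   int value() const { return v_; }
 private:
   int v_;
};

int lengthOfThree() {
    std::vector<int> v( 3, 7 );   // three elements, each equal to 7
    Counter len( 0 );
    seqLength( len, v.begin(), v.end() );
    assert( len.value() == 3 );
    return len.value();
}
```

Before C++17, changing the declaration in seqLength to "N n = 0;" makes this call fail to compile, because the copy constructor must be available. Under C++17 and later that particular failure goes away, although copy initialization would still reject a type whose constructor from int is declared explicit.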

For simplicity and portability, it's a good idea to use direct initialization in declarations of class objects or of objects that might be of class type.
