
Gotcha #44: References and Temporaries

A reference is an alias for its initializer (see Gotcha #7). After initialization, a reference may be freely substituted for its initializer with no change in meaning. Well…mostly.



int a = 12;
int &r = a;
++a; // same as ++r
int *ip = &r; // same as &a
int &r2 = 12; // error! 12 is a literal


A reference to non-const has to be initialized with an lvalue; basically, this means that its initializer must have an address as well as a value (see Gotcha #6). Things are a little more complex with a reference to const. The initializer for a reference to const must still be an lvalue, but the compiler is willing, in this case, to create an lvalue from a non-lvalue initializer:



const int &r3 = 12; // OK. 


The reference r3 is an alias for an anonymous temporary int allocated and initialized implicitly by the compiler. Ordinarily, the lifetime of a compiler-generated temporary is limited to the largest expression that contains it. However, in this case, the standard guarantees that the temporary will exist as long as the reference it initializes. Note that the temporary has no connection to its initializer, so the rather unsightly and dangerous code below will, fortunately, not affect the value of the literal 12:



const_cast<int &>(r3) = 11; // assign to temporary, or abort . . . 
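The lifetime guarantee described above can be observed directly. The sketch below is only an illustration: Tracer, Counts, and observeLifetimes are hypothetical names that do not appear in the original text, and Tracer simply counts how many of its objects are alive at a given moment.

```cpp
#include <cassert>

// Hypothetical helper: tracks how many Tracer objects currently exist.
struct Tracer {
    static int alive;
    Tracer() { ++alive; }
    Tracer(const Tracer &) { ++alive; }
    ~Tracer() { --alive; }
};
int Tracer::alive = 0;

// Hypothetical record of the live-object count at three points.
struct Counts {
    int unbound; // after a temporary that was not bound to a reference
    int bound;   // while a reference to const is bound to a temporary
    int after;   // after that reference has gone out of scope
};

Counts observeLifetimes() {
    Counts c;
    Tracer();                         // ordinary temporary: destroyed at the end of this expression
    c.unbound = Tracer::alive;
    {
        const Tracer &ref = Tracer(); // temporary lives as long as ref does
        (void)ref;                    // ref is valid here; silence unused-variable warnings
        c.bound = Tracer::alive;
    }                                 // ref leaves scope; the temporary is destroyed here
    c.after = Tracer::alive;
    return c;
}
```

Calling observeLifetimes() should report no live object after the unbound temporary, one while the reference is in scope, and none afterward.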


The compiler will also manufacture a temporary for an lvalue initializer whose type differs from the type the reference refers to:



const string &name = "Fred"; // OK.
short s = 123;
const int &r4 = s; // OK.


Here we can run into some semantic difficulties, since the notion of reference as an alias for its initializer is becoming tenuous. It's easy to forget that the reference's initializer is actually an anonymous temporary and not the initializer that appears in the source text. For example, any change to the short s won't be reflected in the reference r4:



s = 321; // r4 still == 123
const int *ip = &r4; // not &s
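For readers who want to check this behavior, here is a minimal sketch that contrasts a reference bound to a converted temporary with one bound directly to its initializer. The function names boundToTemporary and boundDirectly are made up for this illustration.

```cpp
#include <cassert>

// True when r4 refers to a separate temporary int rather than to s.
bool boundToTemporary() {
    short s = 123;
    const int &r4 = s;           // s is converted to int; r4 binds to the temporary result
    s = 321;                     // changes s only
    const void *addrOfS = &s;
    const void *addrOfR4 = &r4;
    return r4 == 123 && addrOfS != addrOfR4;
}

// True when the reference really is an alias for s.
bool boundDirectly() {
    short s = 123;
    const short &r = s;          // same type: no temporary is needed
    s = 321;                     // visible through r
    return r == 321 && &r == &s;
}
```

Both functions are expected to return true: only the reference whose type matches its initializer tracks later changes to s.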


Is this really a problem? It can be, if you help it along. Consider an attempt at portability through the use of typedefs. Perhaps a project-wide header file attempts to fix platform-independent standard names for different-sized integers:



// Header file big/sizes.h
typedef short Int16;
typedef int Int32;
// . . .

// Header file small/sizes.h
typedef int Int16;
typedef long Int32;
// . . .


(Please note that we didn't use #if to jam the typedefs for all platforms into a single file. That's evil and will end up ruining your weekends, reputation, and life. See Gotcha #27.) There's nothing wrong with this, as long as all developers use the names consistently. Unfortunately, they don't always do that:



#include <sizes.h>
// . . .
Int32 val = 123;
const int &theVal = val;
val = 321;
cout << theVal;


If we develop on the "big" platform, where Int32 is int, theVal is an alias for val, and we'll shift 321 to cout. If we later take advantage of our supposed platform independence and recompile for the "small" platform, where Int32 is long, theVal will refer to a temporary int, and we'll shift 123. This change in meaning will occur silently, of course, and will typically not be as obvious as a changed line of output.
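The platform dependence can be reproduced on a single machine. In the sketch below, the template parameter Int32 is an assumption that stands in for whichever typedef a given sizes.h supplies, and the function names are hypothetical. The second function shows one possible remedy, which is to name the typedef consistently in the reference's type.

```cpp
#include <cassert>

// Mirrors the problematic code: the reference type is spelled int,
// regardless of what Int32 happens to be on this platform.
template <typename Int32>
int mismatchedReference() {
    Int32 val = 123;
    const int &theVal = val;   // an alias only if Int32 is int; otherwise a temporary
    val = 321;
    return theVal;
}

// Uses the typedef name for the reference as well, so it always binds to val.
template <typename Int32>
int consistentReference() {
    Int32 val = 123;
    const Int32 &theVal = val; // always an alias for val
    val = 321;
    return static_cast<int>(theVal);
}
```

Instantiating mismatchedReference with int and then with long imitates the two headers: the first result reflects the assignment to val, and the second does not. consistentReference gives the same answer for both.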

Another potential problem is that the initialization of a reference to constant can open up a temporary-lifetime problem. We've seen that the compiler will make sure that such a temporary lives as long as the reference it initializes, which seems like a safe procedure. Let's look at a simple function:



const string &
select( bool cond, const string &a, const string &b ) {
   if( cond )
       return a;
   else
       return b;
}
// . . .
string first, second;
bool useFirst = false;
// . . .
const string &name = select( useFirst, first, second ); // OK


At first glance, this function seems innocuous. After all, it simply returns one of its arguments. The problem is in that return. Let's have a look at another function that's more obviously problematic:



const string &crashAndBurn() {
   string temp( "Fred" );
   return temp;
}
// . . .
const string &fred = crashAndBurn();
cout << fred; // oops!


Here, we're explicitly returning a reference to a local variable. On return, the local variable will be destroyed, and a user of the function will be left with a handle to the destroyed object. Luckily, most compilers will warn about this situation. But they won't warn about the following one, because, in general, they can't:



const string &name = select( useFirst, "Joe", "Besser" );
cout << name; // oops!


The problem is that the second and third arguments to the select function are references to constant, so they'll be initialized with temporary string objects. While these temporaries aren't local to the select function, they'll live only until the end of the largest enclosing expression, which is after the return from select but before the return value is used. A working alternative would be to embed the function call in a larger expression:



cout << select( useFirst, "Joe", "Besser" ); // works, fragile 


This is the kind of code that works when written by an expert and breaks when maintained by a novice.
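The difference between the two uses of select can be made visible without ever touching a dangling reference. The sketch below replaces string with a hypothetical counting class, Name, that is not part of the original text. Its converting constructor plays the role of string's constructor from a character literal, and the helper names are likewise invented for the illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for string that counts its live objects.
struct Name {
    static int alive;
    Name(const char *) { ++alive; }  // implicit conversion from a literal
    Name(const Name &) { ++alive; }
    ~Name() { --alive; }
};
int Name::alive = 0;

// Same shape as the select function under discussion.
const Name &select(bool cond, const Name &a, const Name &b) {
    return cond ? a : b;
}

// How many Name objects still exist once the reference has been initialized?
int aliveAfterBinding() {
    const Name &name = select(false, "Joe", "Besser"); // both temporaries die at the semicolon
    static_cast<void>(sizeof name);  // mention name without evaluating it; reading it would be undefined
    return Name::alive;
}

// How many exist while the result is consumed inside the same expression?
int countAlive(const Name &) { return Name::alive; }
int aliveWithinExpression() {
    return countAlive(select(false, "Joe", "Besser"));
}
```

Under these assumptions, aliveAfterBinding() should report that no Name object survives the initialization, while aliveWithinExpression() should find both temporaries still alive, which is why the embedded call works.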

A safer procedure is to avoid returning a formal argument that's a reference to constant. In the case of our select function, we have at least two reasonable choices. The standard string is not a polymorphic type (that is, it has no virtual functions), and therefore we're allowed to assume that the reference arguments are bound to strings and not to objects of a type derived from string. We can therefore return by value without fear of slicing, but we'll incur some cost in invoking string's copy constructor to initialize the return value:



string
select( bool cond, const string &a, const string &b ) {
   if( cond )
       return a;
   else
       return b;
}
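As a usage sketch of the by-value version, the call that previously left a dangling reference becomes safe. The function pickStooge is a hypothetical wrapper introduced only so the behavior can be checked.

```cpp
#include <cassert>
#include <string>

// By-value variant of select, written with explicit std:: qualification.
std::string select(bool cond, const std::string &a, const std::string &b) {
    return cond ? a : b;  // copies the chosen argument into the return value
}

// Hypothetical caller that passes literals, as in the earlier failing example.
std::string pickStooge(bool useFirst) {
    // The reference now binds to the returned copy, whose lifetime is
    // extended to match name; the argument temporaries no longer matter.
    const std::string &name = select(useFirst, "Joe", "Besser");
    return name;
}
```

The price is the copy made on return, as the text notes, but the result no longer depends on how long the arguments live.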


Another alternative would be to declare the formal arguments to be references to non-const, which would cause a compile-time error if temporaries were required for their initialization. This would simply render our example above illegal:



string &
select( bool cond, string &a, string &b ) {
   if( cond )
       return a;
   else
       return b;
}
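A short usage sketch of the non-const version follows. The function renameSelected is hypothetical and exists only to show that the returned reference is a true alias for one of the caller's own strings.

```cpp
#include <cassert>
#include <string>

// Non-const reference variant of select, with explicit std:: qualification.
std::string &select(bool cond, std::string &a, std::string &b) {
    return cond ? a : b;
}

// Hypothetical caller: modifies whichever string was selected.
std::string renameSelected(bool useFirst) {
    std::string first("Joe"), second("Besser");
    std::string &name = select(useFirst, first, second);
    name += "!";                            // changes first or second, not a copy
    // select(useFirst, "Joe", "Besser");   // would not compile: no temporaries allowed
    return first + " " + second;
}
```

One consequence worth noting is that this version also rejects const string arguments, so callers holding only const strings cannot use it.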


Neither of these options is attractive, but either is better than the alternative of leaving your code open to an insidious bug.
