Gotcha #14: Evaluation Order Indecision

C++'s C roots are nowhere more evident than in the evaluation order traps it lays for the unwary. This item looks at several manifestations of the same problem: the C and C++ languages permit a lot of leeway in how expressions are evaluated. This flexibility can result in highly optimized code, but it also requires careful attention on the part of the programmer to avoid unfounded assumptions about evaluation order.

Function Argument Evaluation Order

int i = 12;
int &ri = i;
int f( int, int );
//  . . .
int result1 = f( i, i *= 2 ); // unportable

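Function arguments are not evaluated in any fixed order. Therefore, the values passed to f could be 12 and 24 or 24 and 24. (Strictly, the situation is worse: before C++17, reading i in one argument while modifying it in another is undefined behavior. C++17 guarantees only that each argument is evaluated completely before or after each other argument, in an unspecified order.) A careful programmer might decide not to modify an argument if it appears more than once in the same argument list, but this isn't safe either:

int result2 = f( i, ri *= 2 ); // unportable
int result3 = f( p(), q() ); // dicey . . .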

In the first case, ri is an alias for i, so the value of result2 is as ambiguous as that of result1. In the second case, we're assuming that the order in which p and q are called doesn't matter. Even if that's true today, it may not be true in the future, and nothing documents this constraint on the implementations of p and q.

It's best to minimize side effects in function arguments:

result1 = f( i, i*2 );
result2 = f( i, ri*2 );
int a = p();
result3 = f( a, q() );
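To make the difference concrete, here is a minimal, self-contained sketch. The bodies of p, q, and f, and the global calls string that records call order, are hypothetical; only the shapes of the calls come from the text. The side-effect-free version always passes 12 and 24. The named temporary a guarantees that p runs before q.

```cpp
#include <string>

std::string calls;  // records the order in which p and q run

// Hypothetical stand-ins for the functions in the text.
int p() { calls += 'p'; return 1; }
int q() { calls += 'q'; return 2; }
int f(int x, int y) { return 10 * x + y; }

// No side effects in the argument list: always f(12, 24).
int noSideEffects() {
    int i = 12;
    int &ri = i;
    return f(i, ri * 2);
}

// p and q may be called in either order.
int dicey() { return f(p(), q()); }

// The named temporary forces p to run before q.
int fixedOrder() {
    int a = p();
    return f(a, q());
}
```

A test can rely on fixedOrder leaving "pq" in calls. For dicey, it can only check that both functions ran.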

Subexpression Evaluation Order

The evaluation order of subexpressions isn't fixed either:



a = p() + q();

The function p may be called before q, or vice versa. Precedence and associativity of operators don't affect evaluation order:

a = p() + q() * r();

The three functions p, q, and r may be evaluated in any of six different orders. The higher precedence of the multiplication operator ensures only that the results of the calls to q and r will be multiplied before being added to the result of the call to p. Likewise, the left associativity of the plus operator doesn't guarantee the order in which p, q, and r are called below; it ensures only that the results of the calls will be added from left to right:



a = p() + q() + r();
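The following sketch shows what precedence and associativity do fix: the value. It uses hypothetical bodies for p, q, and r that return 2, 3, and 4 and log their calls. The order recorded in calls is left to the compiler, so all we can safely check is that each function ran once.

```cpp
#include <string>

std::string calls;  // records the order in which p, q, and r run

// Hypothetical stand-ins for the functions in the text.
int p() { calls += 'p'; return 2; }
int q() { calls += 'q'; return 3; }
int r() { calls += 'r'; return 4; }

// Precedence groups this as p() + (q() * r()), so the value is always 2 + 3 * 4.
// The order in which p, q, and r are called is still unspecified.
int precedence() { return p() + q() * r(); }

// Left associativity groups this as (p() + q()) + r(); again only the grouping is fixed.
int associativity() { return p() + q() + r(); }
```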

Parentheses don't help either:



a = (p() + q()) * r();

The results of p and q will be added first, but r may (or may not) be the first function called. The only reliable way to fix the order of subexpression evaluation is to use explicit, programmer-defined temporaries:

a = p();
int b = q();
a = (a + b) * r();
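Here is the temporary-based version as a runnable sketch. It uses hypothetical stand-ins for p, q, and r that return 2, 3, and 4 and append their names to a log. Because each call now sits in its own full expression, the log is always "pqr".

```cpp
#include <string>

std::string calls;  // records the order in which p, q, and r run

// Hypothetical stand-ins for the functions in the text.
int p() { calls += 'p'; return 2; }
int q() { calls += 'q'; return 3; }
int r() { calls += 'r'; return 4; }

// Each call is its own full expression, so the order is always p, q, r.
int pinned() {
    int a = p();
    int b = q();
    a = (a + b) * r();
    return a;
}
```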

How often does this problem occur? Often enough to ruin a weekend or two every year. Consider Figure 2-1, a fragment of an abstract syntax tree hierarchy used to implement an arithmetic calculator.

Figure 2-1. An abstract syntax tree node hierarchy for a simple calculator (abbreviated). A plus node has left and right subtrees; an assignment node has a single subtree representing the right side of the assignment.

The following implementation is not portable:

gotcha14/e.cpp

int Plus::eval() const
   { return l_->eval() + r_->eval(); }

int Assign::eval() const
   { return id->set( e_->eval() ); }

The problem lies in the implementation of Plus::eval, because the order of evaluation of the left and right subtrees isn't fixed. Does this really matter for addition? After all, addition is supposed to be commutative. Consider evaluation of the following expression:



(a = 12) + a

Depending on the order of evaluation of the left and right subtrees within Plus::eval, the value of the expression will be either 24 or the previous value of a plus 12. If our calculator requires that the assignment be performed before the addition, the implementation of Plus::eval must use an explicit temporary to fix the evaluation order:

gotcha14/e.cpp

int Plus::eval() const {
   int lft = l_->eval();
   return lft + r_->eval();
}
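Because the class hierarchy in Figure 2-1 isn't reproduced here, the following is only a minimal sketch of how the pieces might fit together. Plus, Assign, eval, set, and the members l_, r_, and e_ come from the text. The ExprNode base class, the Const and Id node types, the SymbolTable alias, the id_ member, and evalAssignPlusA are assumptions made for illustration. With the fixed Plus::eval, evaluating (a = 12) + a always yields 24.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical symbol table mapping variable names to values.
using SymbolTable = std::map<std::string, int>;

// Hypothetical abstract base for all syntax tree nodes.
class ExprNode {
public:
    virtual ~ExprNode() = default;
    virtual int eval() const = 0;
};

// Hypothetical leaf holding an integer literal.
class Const : public ExprNode {
public:
    explicit Const(int v) : v_(v) {}
    int eval() const override { return v_; }
private:
    int v_;
};

// Hypothetical leaf naming a variable in a SymbolTable.
class Id : public ExprNode {
public:
    Id(SymbolTable &tab, std::string name) : tab_(&tab), name_(std::move(name)) {}
    int eval() const override { return (*tab_)[name_]; }
    int set(int v) const { return (*tab_)[name_] = v; }  // stores v and returns it
private:
    SymbolTable *tab_;  // not owned
    std::string name_;
};

class Assign : public ExprNode {
public:
    Assign(std::unique_ptr<Id> id, std::unique_ptr<ExprNode> e)
        : id_(std::move(id)), e_(std::move(e)) {}
    int eval() const override { return id_->set(e_->eval()); }
private:
    std::unique_ptr<Id> id_;
    std::unique_ptr<ExprNode> e_;
};

class Plus : public ExprNode {
public:
    Plus(std::unique_ptr<ExprNode> l, std::unique_ptr<ExprNode> r)
        : l_(std::move(l)), r_(std::move(r)) {}
    int eval() const override {
        int lft = l_->eval();  // the left subtree is always evaluated first
        return lft + r_->eval();
    }
private:
    std::unique_ptr<ExprNode> l_;
    std::unique_ptr<ExprNode> r_;
};

// Builds the tree for (a = 12) + a and evaluates it against tab.
int evalAssignPlusA(SymbolTable &tab) {
    Plus expr(std::make_unique<Assign>(std::make_unique<Id>(tab, "a"),
                                       std::make_unique<Const>(12)),
              std::make_unique<Id>(tab, "a"));
    return expr.eval();
}
```

Whatever a held beforehand, the result is 12 + 12, and a is left holding 12.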

Placement `new` Evaluation Order

Admittedly, this one doesn't crop up a lot. The placement syntax for the new operator allows arguments to be passed not only to the initializer (generally a constructor) of the object being allocated but also to the operator new function that performs the allocation.

Thing *pThing =
   new (getHeap(), getConstraint()) Thing( initval() );

The first argument list is passed to an operator new that can accept those arguments, and the second to a constructor for Thing. The general warning about function argument evaluation order applies to each of these argument lists: we don't know whether getHeap or getConstraint will be called first. Additionally, before C++17 we don't know whether the arguments for operator new or those for the Thing constructor will be evaluated first, although we do know that operator new will be called before the constructor (since we need storage for an object before we can initialize it). C++17 settles that second question: the call to operator new is sequenced before evaluation of the constructor's arguments, so getHeap and getConstraint are both called before initval.
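If the order matters, the cure is the same as before: hoist each argument into a named temporary. The sketch below assumes a hypothetical Heap arena type and a placement operator new(std::size_t, Heap &, int). The bodies of getHeap, getConstraint, and initval, and the trace string that logs each step, are also made up for illustration. Under any version of the standard, the log reads "hcint": the three argument functions run first, then operator new, then the Thing constructor.

```cpp
#include <cstddef>
#include <new>
#include <string>

std::string trace;  // logs each step: h, c, i, n (operator new), t (constructor)

// Hypothetical bump-pointer arena standing in for the heap in the text.
struct Heap {
    alignas(std::max_align_t) unsigned char buf[256];
    std::size_t used = 0;

    void *allocate(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;  // keep every block suitably aligned
        if (n > sizeof buf - used) throw std::bad_alloc();
        void *p = buf + used;
        used += n;
        return p;
    }
};

// Hypothetical placement allocation function taking a heap and a constraint.
void *operator new(std::size_t n, Heap &h, int /*constraint*/) {
    trace += 'n';
    return h.allocate(n);
}

// Matching placement deallocation function; called only if Thing's constructor throws.
void operator delete(void *, Heap &, int) noexcept {}

struct Thing {
    explicit Thing(int v) : value(v) { trace += 't'; }
    int value;
};

Heap theHeap;
Heap &getHeap() { trace += 'h'; return theHeap; }
int getConstraint() { trace += 'c'; return 0; }
int initval() { trace += 'i'; return 42; }

// Named temporaries fix the order: getHeap, getConstraint, initval,
// then operator new, then the Thing constructor.
Thing *makeThing() {
    Heap &heap = getHeap();
    int constraint = getConstraint();
    int init = initval();
    return new (heap, constraint) Thing(init);
}
```

An object created this way must not be released with a plain delete. The caller ends its lifetime with an explicit destructor call such as p->~Thing(), and the arena owns the storage.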

Operators That Fix Evaluation Order

Some operators have a more dependable nature than others, if they're left alone. The comma operator does fix the evaluation order of its subexpressions:

result = (expr1, expr2);

This statement evaluates expr1, then evaluates expr2, the result of which is assigned to result. This can be used to write some unusual code:
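A small sketch makes both the sequencing and a precedence trap visible. Parentheses around the comma expression are essential, because assignment binds more tightly than the comma operator. Without them, result = expr1, expr2 parses as (result = expr1), expr2. The functions expr1 and expr2 here are hypothetical stand-ins that log to a string.

```cpp
#include <string>

std::string order;  // records the order in which expr1 and expr2 run

// Hypothetical stand-ins for the two subexpressions.
int expr1() { order += '1'; return 1; }
int expr2() { order += '2'; return 2; }

int withParens() {
    int result = 0;
    result = (expr1(), expr2());  // comma operator: expr1, then expr2; result gets 2
    return result;
}

int withoutParens() {
    int result = 0;
    result = expr1(), expr2();    // parsed as (result = expr1()), expr2(); result gets 1
    return result;
}
```

In both versions expr1 runs before expr2. Only the parenthesized form assigns the value of expr2.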



return f(), g(), h();

The author of this code needs more socialization. Use a more conventional coding style unless you actually want to confuse the maintainers of your code:

f();
g();
return h();

The only common use of the comma operator is in the increment part of a for-statement, when more than one iteration variable is in use:



for( int i = 0, j = MAX; i <= j; ++i, --j ) // . . .

Note that the first comma in the declaration of i and j is not a comma operator. It's part of the declaration of the two integers i and j.
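As a concrete instance of this idiom, the following sketch reverses a vector in place by walking one index up and another down. reverseInPlace is a hypothetical helper; in real code, std::reverse does the same job. The comma in ++i, --j is the comma operator, while the one in the declaration of i and j is not.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: reverses v in place using two indices that move toward each other.
void reverseInPlace(std::vector<int> &v) {
    if (v.empty()) return;  // avoid computing size() - 1 on an empty vector
    // The comma after i = 0 separates declarators; the one in "++i, --j" is the comma operator.
    for (std::size_t i = 0, j = v.size() - 1; i < j; ++i, --j)
        std::swap(v[i], v[j]);
}
```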

The "short-circuiting" logical operators && and || are more useful, in that they allow us to write complex conditions in a compact and idiomatic way:

if( f() && g() ) // . . .
if( p() || q() || r() ) // . . .

The first condition says, "Call f. If the result is false, the condition is false. If the result is true, call g; the value of the condition is the result of g." The second condition says, "Call p, q, and r in that order, but stop as soon as one of them succeeds. If all three calls fail, the condition is false; otherwise, it's true." Given C and C++ programmers' fondness for compact code, it's easy to see why they use these operators so extensively.
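The following sketch confirms the short-circuiting described above. The bodies of f, g, p, q, and r are hypothetical. Each appends its name to a log and returns a fixed result, so the log shows exactly which calls happened.

```cpp
#include <string>

std::string calls;  // each function appends its name when called

// Hypothetical stand-ins with fixed results.
bool f() { calls += 'f'; return false; }
bool g() { calls += 'g'; return true; }
bool p() { calls += 'p'; return false; }
bool q() { calls += 'q'; return true; }
bool r() { calls += 'r'; return true; }

// f() is false, so g is never called.
bool andCondition() { return f() && g(); }

// p() fails and q() succeeds, so r is never called.
bool orCondition() { return p() || q() || r(); }
```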

The ternary conditional operator (pronounced "?:") also fixes the evaluation order of its arguments:



expr1 ? expr2 : expr3

The first expression, or condition, is evaluated first; then either the second or third expression is evaluated. The result of the conditional expression is the result of the expression that was evaluated.



a = f()+g() ? p() : q();

In this case, we have some assurance of evaluation order. We know that f and g will be called before p or q (although we don't know in what order they will be called) and that only one of p or q will be called. It might also be a good idea to add some strictly optional parentheses for readability:



a = (f()+g()) ? p() : q();

Otherwise, it's possible that a maintainer of the code, due to ignorance or haste, may make the erroneous assumption that the addition is performed after the conditional:



a = f()+(g() ? p() : q());
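This sketch shows both the guaranteed ordering and the effect of the misplaced parentheses. It uses hypothetical f, g, p, and q that log their calls and return 1, -1, 10, and 20. Since f()+g() is 0, the original expression selects q. The misread version instead adds the result of f to the conditional and produces 11.

```cpp
#include <string>

std::string calls;  // each function appends its name when called

// Hypothetical stand-ins; f() + g() is 0, so the condition is false.
int f() { calls += 'f'; return 1; }
int g() { calls += 'g'; return -1; }
int p() { calls += 'p'; return 10; }
int q() { calls += 'q'; return 20; }

// Parsed as (f()+g()) ? p() : q(): f and g run first (in some order), then only q.
int asWritten() { return f()+g() ? p() : q(); }

// The misreading is a different expression: f() + (g() ? p() : q()).
// g() is -1, which converts to true, so the result is 1 + 10.
int misread() { return f()+(g() ? p() : q()); }
```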

Improper Operator Overloading

However, as useful as the built-in versions of these operators are, it's not a good idea to overload them. In C++, operator overloading is "syntactic sugar"; we're just providing a more digestible syntax for a function call. For example, we could overload the && operator to accept two Thing arguments:



bool operator &&( const Thing &, const Thing & );

When we use the operator with infix notation, maintainers of our code will probably assume the short-circuiting behavior of the built-in operator, but they won't get it:

Thing &tf1();
Thing &tf2();
// . . .
if( tf1() && tf2() ) // . . .

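This code is essentially a function call:

if( operator &&( tf1(), tf2() ) ) // . . .

As with any function call, both tf1 and tf2 will be called, and before C++17 the order in which they're called is not fixed. (Since C++17, an overloaded operator invoked with operator syntax evaluates its operands in the order the built-in operator would, so tf1 is called first; but tf2 is still called regardless of what tf1 returns.) This problem also occurs when overloading operator || and operator ,. Fortunately, operator ?: can't be overloaded.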
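The following sketch shows the difference in practice. It assumes a hypothetical Thing with a single bool member and a global log of calls. With the overloaded operator, both tf1 and tf2 run even though tf1 yields a false Thing. With the built-in && applied to the bool members, tf2 is skipped.

```cpp
#include <string>

std::string calls;  // records which of tf1 and tf2 ran

// Hypothetical Thing with a single truth value.
struct Thing { bool ok; };

// Overloaded &&: just a function, so both arguments are evaluated before the call.
bool operator&&(const Thing &a, const Thing &b) { return a.ok && b.ok; }

Thing t1{false}, t2{true};
Thing &tf1() { calls += '1'; return t1; }
Thing &tf2() { calls += '2'; return t2; }

// Calls the overloaded operator: tf2 runs even though tf1 yields false.
bool overloaded() { return tf1() && tf2(); }

// Built-in && on bool members: short-circuits, so tf2 is never called.
bool builtIn() { return tf1().ok && tf2().ok; }
```

The sketch only checks that the overloaded version makes both calls. It doesn't check their order, which depends on the language version.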
