Previous section   Next section

Imperfect C++ Practical Solutions for Real-Life Programming
By Matthew Wilson
Table of Contents
Chapter 17.  Syntax


17.2. Conditional Expressions

17.2.1 Make It Boolean

C++ practitioners have inherited from C the propensity for expressing themselves tersely, sometimes extremely so. It's not uncommon to see such things as:



for(int i=23;i;—i)


  if(x) do{ . . . } while(j && p);



Of course there is a certain appeal in writing powerful code in as few lines/characters as possible, but in all but a very few cases [Dewh2003], the attraction is specious at best. There are three problems with this.

First, by relying on the compiler to convert any integer or pointer type to a Boolean expression, practitioners get in the mind-set that non-zero equates to true and zero equates to false for all types. This is correct for the vast majority of cases, but not for all cases. Needless to say, those minor cases cause many difficult-to-spot errors. Consider the Win32 type HANDLE: for most Win32 kernel objects a HANDLE value of 0 or NULL indicates a null handle. However, when dealing with file handles that is not the case. The symbol INVALID_HANDLE_VALUE (which has a value of 0xFFFFFFFF) represents a null handle; all other values represent valid handles. Thus the following code is entirely wrong:



HANDLE hfile = ::CreateFile( . . . );





if(hfile)


{


  . . .


}



The second problem is that using this syntax has inclined authors of user-defined types to provide certain implicit conversion operators (e.g., operator bool() const) in order to support this terse form. The classic horror in this regard is basic_ios::operator void *(), which returns a meaningless non-null pointer if the fail-bit has not been set on the stream. (We see a better way to represent operator bool() in Chapter 24.)

Imperfection: Implicitly interpreting non-Boolean (sub-)expressions[4] leads to failure to be mindful of the values of certain types, with a consequent propensity to overengineer user defined types.


[4] For the purposes of this discussion, I'll attempt to use fewer trees, so where I say expression, it also means subexpression.

The solution is to always make your expressions overtly Boolean. (Note that this does not mean they have to be explicitly coerced to Boolean type; that would be inefficient—see section 1.4.2). So our previous two examples would look like:



for(int i=23;i!=0;--i)


  if(x != 0) do{ . . . } while(j != 0 && p != NULL);



and



HANDLE hfile = ::CreateFile( . . . );





if(hfile != INVALID_HANDLE_VALUE)


{


  . . .


}



The customary exasperated sigh is followed by the cry of "it's more typing!" However, this fails to recognize the reality in software engineering that the vast majority of coding is in maintenance [Glas2003], so although it's more typing now, it's less in the long run. Frankly, such arguments are rarely heard from library writers or engineers that work on large projects with long lifetimes.

So that's accounted for simple types, but what about those user-defined types and their implicit operators? If we deprecate the use of implicit operators in implicit Boolean-interpreted expressions, we not only create more typing (now), which we know we can live with, but aren't we also reducing the flexibility of the code? For example, say we have some algorithms that work with "smart" and regular pointers. Our classes thankfully do not provide implicit conversion operators to their raw pointer equivalent (see Parts Four and Five for discussion of these issues), but they do provide operator bool() const and operator !() const. So if we dictate that our conditional algorithm expressions do not rely on implicit interpretation as Boolean for the raw pointers, and they cannot test against 0/NULL because such implicit pointer conversions are not provided for the smart pointers, we are left with the unpleasant choice of specializing our algorithms for pointer types and nonpointer types. Depending on the modernity of the compiler(s) we use, this will be two or more definitions. This seems like a convincing counterargument to my recommendation of explicit Boolean conditional expressions. The answer comes in the form of attribute shims (we meet the Shims concept in Chapter 20), such that the algorithm would contain shim-based tests, as in:



template <typename T>


size_t calc_something(T p)


{


  if(is_null(p))


  {


    return . . .;


  }


  return . . . ;


}



The is_null shim handles the differences in nature between the raw and smart pointer types automatically (by virtue of the appropriate shim definitions), the Boolean nature of the conditional is enforced by the shim's return type, and the algorithm is self-documenting because the shim does what it says—in this case it tests whether p "is null."

17.2.2 A Dangerous Assignment

This is such a simple item, but one that invites controversy. Many readers may have real problems with it, and accuse me of nannying; "I don't ever make that mistake!" . . . except that they do. We all do.

Consider the following code:



if ((options == (__WCLONE|__WALL)) && (current->uid = 0))


  retval = -EINVAL;



This code was a recent deliberate attempt [http://lwn.net/Articles/57135/] to introduce an exploitable flaw—a back door—into the Linux kernel code base.

If I'd also included several lines of surrounding code, and not given the game away by the portent in the section title, I'd bet that no more than 10 percent of you would have spotted the problem straight away. I'm not picking on you, because I'd be in that 90 percent group; I recently found one of these in some of my five-year-old code. It's just that human beings are pattern formers, and we often see what we expect to see.

The problem stems from the fact that C and C++ are able to interpret scalar expressions—integral, floating-point, and pointer types—as Boolean (C++-98: 6.4;4). Couple this with the ability to place assignment expressions in within conditional statements, and we have a recipe for disaster. All that it requires is an accidental—or deliberate—omission of the second = in the equality operator, and we end up with assignment. In the example, the author of the code has deliberately used the assignment operator rather than the equality operator to nullify the check of options. Usually these things happen by mistake, but it's an easy mistake to make, and it can be a killer.

Imperfection: The implicit interpretation of scalar types as Boolean in conditional statements facilitates erroneous assignment caused by syntax errors.


A simplistic answer to this would be to disallow assignment statements within expressions, as Python does. However, that really would be nannying; it'd never fly with the C/C++ community. But the language could be changed to enhance robustness, without losing any of the usefulness of assignment within conditional expressions. All that would be needed would be to ensure that when an assignment is in a conditional expression, the expression must be explicitly Boolean (as we discussed in the previous item). This would be a trifling imposition. After all, we're used to doing this in many cases anyway:



while((ch = getchar()) != EOF)


{


  . . .



Back in the world of the here and now, we have two options to avoid this problem. First is to set your compiler warnings high, and use more than one compiler on your code. Most good compilers will spot the blunder and warn you, but it's not a complete solution, because some do not, and some are not usable at their maximum warning level due to the volume of extraneous warnings in system and standard library headers.

Second, whenever either operand in the comparison is an rvalue—one which cannot be assigned to—it should be placed on the left hand side of the equality operator. Then mistakenly using = rather than == will result in a compiler error, on all compilers. Note that this is not always an option, because sometimes one will be comparing two lvalues.

If you're not persuaded by this, there's a much more compelling example. In Chapter 1 we looked at assertions and discussed the dangers of assert expression side effects. Consider the following code:



int i = . . .





assert(i = 1);



The assertion should have been written with ==, rather than with =, in which form it would have verified that i did indeed equal 1. Alas, in its current form the expression is an assignment. Since i = 1 evaluates to 1, the assertion will never fire, and the code will always appear to be correct. Because you have an assertion in there, you will have a false sense of security. Ouch! If it had been written assert(1=i);, the compiler would have balked immediately.

As well as being a complete solution for the erroneous assignment, it also makes expressions a lot more readable when testing the results of long function calls. Which one of the following is easier to read?



if(long_function( argument1, argument 2, argument 3, argument 4,


                  argument5, argument6) == RETURN_CODE_3)


{








if(RETURN_CODE_3 == long_function( argument1, argument 2, argument 3,


                                   argument 4, argument5, argument6))


{



Consider what it's like when the return code is off the right-hand side of the visible area of your editor.

It's simple, it's dull, and it's a bit ugly. It also can be argued to be contrary to the way people think—we naturally think "the number of apples I have is 5" rather than "5 is the number of apples I have"—but it takes surprisingly little time to become accustomed to it.

This is not a debate over how many spaces you indent by, or other such fluff. There is a simple mechanism to improve the robustness of your code. It works. Use it.[5]

[5] One reviewer (who's not exactly a fan of the language) suggested that this is an argument for using lint, and that since lint doesn't work with C++, this is an argument why one shouldn't use C++ and should stick to C. Though I understand his point, I disagree, and presumably you do too, or you wouldn't be reading this book!


      Previous section   Next section