
Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 13.  Fundamental Types


13.4. Dangerous Types

We've commented on some of the dangers of integer truncation and sign extension. Now it's time to look at what other dangers the C/C++ type system has for us.

13.4.1 References and Temporaries

In Steve Dewhurst's C++ Gotchas [Dewh2003], Gotcha #44—"References and Temporaries"—highlights the danger of mixed use of built-in and typedef'd integer types, when it comes to references. (Actually, it doesn't really have anything to do with typedefs, just that it's more likely to slip under the programmer's radar when using them.) Consider the following code:



#include <stdio.h>

int main()
{
  long        l   = 2222;
  short const &s  = l;

  l = 0;
  printf("%ld, %d\n", l, s);
  return 0;
}



As Steve points out, because the types are different (perhaps because of their sizes), the compiler will synthesize a temporary whose lifetime will last at least as long as that of the reference (C++98: 8.5.3), and initialize it with the value of its initializer (l in this case). Thus, when l is subsequently set to 0 the temporary is unaffected, and "0, 2222" is printed rather than "0, 0". Steve states that this "change in meaning will occur [, and will do so] silently." Table 13.1 shows the responses to this code from several popular compilers. Only Borland warns about the hazard by default; Comeau, CodeWarrior, Intel, Visual C++, and Watcom all require a higher than default warning level to flag it. Digital Mars and GCC present an interesting case, since they actually provide the intended behavior. However, they are wrong, since the standard states (C++98: 8.5.3) that "a temporary...is created and initialised [, and] the reference is bound to the temporary." This is a good demonstration that no individual compiler can be treated as an ultimate source of language correctness.

Table 13.1. Compiler warnings for mixed-type references

Compiler                   Warn at default warning level?  Required warning level  Output
-------------------------  ------------------------------  ----------------------  ---------
Borland C/C++ 5.6          Yes                             -                       "0, 2222"
CodeWarrior 8.3            No                              -warn implicit          "0, 2222"
Comeau 4.3.0.1             No                              --remarks               "0, 2222"
Digital Mars C/C++ 8.34    No                              -                       "0, 0"
GCC 3.2                    No                              -                       "0, 0"
HP 11.00 aCC 3.39          No                              -                       "0, 2222"
Intel C/C++ 7.0            No                              -W4                     "0, 2222"
Sun Solaris 2.7 Forte 6.0  No                              -                       "0, 2222"
Visual C++ 6.0             No                              -W3                     "0, 2222"
Visual C++ 7.0             No                              -W3                     "0, 2222"
Visual C++ 7.1             No                              -W3                     "0, 2222"
Watcom 12.0                No                              -wx                     "0, 2222"


The solution in this case is straightforward: don't do it. A corollary to making this abstinence solution workable is to (1) set your compiler warnings high, and (2) use multiple compilers. That's a combination of advice that you're going to hear from me many times throughout this book.
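To make the difference concrete, here is a minimal sketch contrasting a reference whose type matches its referent with one whose type does not. The function names are hypothetical and exist only for illustration; the behavior in the comments is what a standard-conforming compiler should produce.

```cpp
// Same-type reference: r is a true alias for l, so the later
// assignment to l is visible through r.
long through_matching_reference()
{
    long        l = 2222;
    long const &r = l;   // binds directly to l
    l = 0;
    return r;            // 0 on a conforming compiler
}

// Mixed-type reference: a short temporary is initialised from l and
// s binds to that temporary, so the later assignment is not seen.
short through_mismatched_reference()
{
    long         l = 2222;
    short const &s = l;  // binds to a temporary copy, not to l
    l = 0;
    return s;            // 2222 on a conforming compiler
}
```

Keeping the reference type identical to the referent type (or simply taking a copy when a different type is genuinely wanted) removes the silent temporary from the picture.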

13.4.2 bool

This may be the single most unpopular item with C++ aficionados in this whole book. My position is that bool is a highly desirable type, and useful in the internal implementation of classes and functions but, as it is currently defined in the standard and implemented by compiler vendors, it is a useless type in the specification of functions and the public interfaces of classes. The size of the bool type is "implementation defined" (see C++98: 5.3.3), although all of our compilers (see Appendix A) implement it as a single byte (see Table 13.2).

Table 13.2. Compiler warnings for bool truncation

Compiler                 sizeof(bool)  Warn at default warning level?  Required warning level
-----------------------  ------------  ------------------------------  ----------------------
Borland C/C++ 5.6        1             No                              -
CodeWarrior 8.3          1             No                              -warn implicit
Digital Mars C/C++ 8.34  1             No                              -
GCC 3.2                  1             No                              -
Intel C/C++ 7.0          1             No                              -
Visual C++ 6.0           1             No                              -W3
Visual C++ 7.0           1             No                              -W3
Visual C++ 7.1           1             No                              -W3
Watcom 12.0              1             No                              -


Although conditional expressions are notionally converted to bool, inspection of the generated code of our compilers reveals that they do not do a conversion to their (1-byte) bool type first; they simply test the "truth" of the expressions as is. For example, in the expression if(p), where p is of pointer type, p is evaluated as not being equal to 0. On a 32-bit architecture, with 32-bit pointers and integers, it would clearly be inefficient to convert to a Boolean with logic such as ((p & 0xFFFFFFFF) != 0) ? true : false or similar. Regrettably, when one wishes to assign the truth of an expression to a variable of type bool, just such a conversion must be performed. Users of CodeWarrior or Visual C++ will be used to receiving warnings (because you all set warnings to max, right?) regarding the inefficiency of this conversion, and perhaps to writing their code so as to avoid provoking them. What is unnerving is that none of the other compilers I use warn about the performance loss at any warning level (see Table 13.2).
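When the truth of an expression does have to be stored in a bool, one common way to keep the conversion visible is to write the comparison out explicitly. The following is a minimal sketch; is_set and has_flag are hypothetical names used only for illustration, and whether a given compiler stays quiet about this form is something to check at your own warning level.

```cpp
// Explicit comparison: the result of != is already of type bool, so no
// implicit integer-to-bool or pointer-to-bool conversion is requested.
bool is_set(void const *p)
{
    return p != 0;
}

// The same idea for a bit test: compare the masked value with 0 rather
// than letting the (possibly wide) integer convert to bool implicitly.
bool has_flag(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}
```

Note that has_flag(0x100u, 0x100u) yields true even though the masked value has a zero low-order byte, because the comparison is performed at full width before any bool is produced.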

We might presume that the C++ standard leaves this decision to compiler implementers because it is prone to do so, which is, in the main, a well-founded strategy. One can further assume that compiler implementers choose to implement bool as a 1-byte type because of space-efficiency concerns.

The number of instances of bools in containers compared with those where it serves as return values or arguments to functions would be low, and if you care about space efficiency in the containment of Boolean values, you should check out Chuck Allison's bitset and bitstring [Alli1993, Alli1994] and their derivative classes.[7]

[7] You could use std::vector<bool>, but it's dishonestly named and tramples over the standard namespace, so I'd suggest you steer well clear of that one.

But the conversion issue does not pertain to correctness, only efficiency, and depending on how you write your code, it is one not encountered especially often anyway. The real problem with bool is its lack of predictable size, especially in terms of interacting with other languages.

Even if you don't ever implement your source in C, you'll no doubt use C as the exchange language between your dynamic libraries; if you don't, you learned some good reasons to do so in Part Two. C compatibility is something that, whether you like it or not, is here to stay in C++, and it is this that is the root of the bool problem. It is hard to conceive of a C/C++ compiler providing different sizes in C and C++ for types that are defined in both languages, so one can mix C and C++ with little concern for these types. But bool has been provided by some (C/C++) compilers for C++ for longer than it has for C (where bool is a #define for the C99 type _Bool), and for some is still not provided in C, so one must synthesize a typedef for it. There is a profusion of such things out there in the real world: BOOL, BOOLEAN, Boolean, boolean_t, Bool, and so on. Frequently these are defined as int, or as enums, and in many cases were done so long before bool entered the C++ vocabulary, when the choice of int was eminently sensible.

The problem is that if one defines data structures or API functions that use Boolean types for operation with both C and C++, it is all too easy to do something like:



#if defined(__cplusplus) || \
    defined(bool) /* for C compilation with C99 bool (macro) */
 typedef bool   bool_t;
#else
 typedef BOOL   bool_t;
#endif /* __cplusplus */



This is an accident waiting to happen, and I've seen this in the software of more than one client. I've also done this myself, once. A fair proportion of the APIs in the Synesis common libraries are implemented in good old C, and so the Synesis Bool type was a commonly used type. The problem was that in one function implemented in C the Bool return was efficiently derived from a called function that returned a 32-bit unsigned integer, which represented a path length. When path lengths stayed within the range 1 to 255, everything worked swimmingly. However, the function eventually got recycled to work with URLs, and once a path length was longer than 255 and an exact multiple of 256, the whole thing came crashing down.

Because the offending part of that function was its return type, and symbol-naming schemes tend to ignore return types if they incorporate function signatures at all (see Chapter 7), there was no detection of the mismatch at compile/link time.

Naturally, I concede that such a mistake doesn't need a genius to spot, and I was seriously embarrassed to have made it. However, I'm someone who thinks about these things a lot, and that mistake had lain dormant in well-used code for several years before an error was precipitated! Once the error occurred it still took me the best part of two days of debugging before the lights went on.
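The failure mode in this anecdote can be reproduced in miniature. The sketch below is an assumption-laden simulation, not the original Synesis code: Bool is taken to be an int-sized typedef, path_exists_c is a hypothetical stand-in for the C function, and the "1-byte caller" is modeled by keeping only the low-order byte of the returned value (assuming 8-bit bytes).

```cpp
typedef int Bool;  // hypothetical int-sized C-style Boolean

// Stand-in for the C function: the path length itself is returned as
// the "truth" value, which is fine while callers read a full int.
Bool path_exists_c(unsigned path_length)
{
    return static_cast<Bool>(path_length);
}

// Models a caller that believes the return type is a 1-byte bool:
// only the low-order byte of the result is examined.
bool seen_by_one_byte_caller(unsigned path_length)
{
    unsigned char low_byte =
        static_cast<unsigned char>(path_exists_c(path_length));
    return low_byte != 0;
}
```

With lengths from 1 to 255 the low-order byte is nonzero and the two views agree. At 256, 512, and every other exact multiple of 256, the int-sized result is still nonzero but its low-order byte is 0, so the 1-byte view reports false.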

There are two seemingly conflicting views here. On the one hand I am berating the lack of fixed size for a bool type, and on the other hand I am arguing for a speed-efficiency based size, which, for me, translates to wanting it to be the same size as int. I contend that if bool was defined by the standard to have the same size as int, the "natural size of the architecture," then both requirements would be satisfied in one go. Thus code using bool could rely on its having a predictable size for any given architecture (actually we must say operating environment, since some operating systems can be virtually implemented on others, sometimes of a different word size), which would only restrict its being run on different architectures. All the other fundamental types prevent any such cross-execution anyway, so this is a nonproblem.

Imperfection: bool should be the same size as int.


The solution to this imperfection is, as it so often is, abstinence. I never use bool for anything that can possibly be accessed across multiple link units—dynamic/static libraries, supplied object files—which basically means not in functions or classes that appear outside of header files. The practical answer, such as it is, is to use a pseudo-Boolean type, which is the size of int. The Synesis Software public library headers define a type Boolean that does just that, and the use of bool is restricted to automatic variables and exclusively C++-specific compilation units.
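As a rough illustration of that approach, the following sketch defines an int-sized pseudo-Boolean for use at C-compatible boundaries and confines bool to the C++ side. The names boolean_sketch_t, to_boolean, from_boolean, and path_is_nonempty are all hypothetical; this is not the actual Synesis Boolean definition, which is not shown in this section.

```cpp
extern "C" {
/* Int-sized pseudo-Boolean shared by C and C++ translation units. */
typedef int boolean_sketch_t;
}

// Compile-time check that works with pre-C++11 compilers: the array
// size is -1 (an error) if the type is not the same size as int.
typedef char boolean_size_check[
    (sizeof(boolean_sketch_t) == sizeof(int)) ? 1 : -1];

// Conversions used only inside C++ code, so that bool never appears
// in a signature that crosses a link-unit boundary.
inline boolean_sketch_t to_boolean(bool b)     { return b ? 1 : 0; }
inline bool from_boolean(boolean_sketch_t b)   { return b != 0; }

// Example boundary function: the comparison is done at full width and
// the result is normalised to 0 or 1 before it leaves the function.
extern "C" boolean_sketch_t path_is_nonempty(unsigned path_length)
{
    return to_boolean(path_length != 0);
}
```

Because the result is normalised to 0 or 1 before being returned, a path length of 256 still reports true to every caller, whatever width that caller assumes for the return value's low-order portion.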

