
Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson

Chapter 13. Fundamental Types


13.2. Fixed-Sized Integer Types

In most C and C++ code, the integer type used is int. There are good reasons for this. For one thing, in the very early days of C, int was the only type. Of more than historical relevance is that int is usually the most efficient integer type to use, because it is defined to be the "natural size [of] the architecture" (C++98: 3.9.1.2). It would be silly to insist on a 32-bit type on current 32-bit machines if, when your code was ported to a 64-bit architecture, it performed suboptimally. By using int, you insulate your code against such issues in the future.

However, the potential variation in size between architectures can also cause problems if your code relies on a particular capacity. A variable providing unique keys that may hold 0-4294967295 in one environment could run into trouble in another, where it may hold only 0-65535. This is especially important in embedded software, where the hardware people need fixed-sized types. The answer is to provide fixed-sized type definitions that correspond to the appropriate fundamental type for the given target environment, using techniques similar to those we saw for bytes.

Imperfection: C and C++ need fixed-sized integer types.


The C99 standard provides just such a set of types, called int8_t, int16_t, uint32_t, and so on, albeit as typedefs rather than built-in types. (It also provides minimum-sized integers, e.g., int_least64_t, and the fastest integers of a minimum size, e.g., int_fast16_t.) Most libraries that need to be portable employ similar techniques: ZLib defines uLong, Byte, and so on; STLSoft has uint16_t, sint8_t, and so on; Boost imports the C99 types into the boost namespace if they are available, otherwise it defines them itself.
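For illustration only, here is a minimal sketch of how a library might discriminate the underlying type for its 32-bit typedefs. The acme_ names and the discrimination macros are hypothetical, and real libraries (STLSoft, Boost) do considerably more work, but note the deliberate preference for long over int, for reasons we'll see shortly:


/* Hypothetical sketch: pick the underlying type for the 32-bit
   typedefs per compiler. long is preferred to int, even where
   both are 32 bits. */
#if defined(_MSC_VER)
 typedef __int32           acme_sint32_t; /* MSVC's proprietary sized type */
 typedef unsigned __int32  acme_uint32_t;
#elif defined(ACME_LONG_IS_32BITS) /* hypothetical discriminator */
 typedef signed long       acme_sint32_t;
 typedef unsigned long     acme_uint32_t;
#else
 # error Compiler/platform not discriminated
#endif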

This seems like the full picture, but unfortunately the C++ language's fascination with int, its favored firstborn type, can lead to problems. The problem relates to overloading and type resolution of functions overloaded on some, but not all, of the integral types. (To be strictly correct, it's not always int: this can occur for any of the fundamental integral types that share a size in a particular environment. It's just that the problems raise their ugly heads most often with int.)

13.2.1 Platform Independence

Let's imagine we're writing a component to provide serialization in a cross-platform communications architecture. We want to be able to write the values of variables at one end, and have them read at the destination in the same types in which they were written. Barring the minor complexities of byte ordering—which we'll assume are taken care of inside our component using ntohl(), htonl(), or similar functions—this is straightforward and meaningful for the sized types, because they always have the same size on whatever machine receives them. However, it is not in the least bit reasonable to write an int to such a stream, since it could be 64 bits at the source and 32 bits at the receiver, resulting in probable loss of information.
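By way of illustration, here is a minimal sketch of how the 32-bit Write() overload might fix the byte order internally, assuming POSIX's <arpa/inet.h>; the send_bytes() helper is hypothetical:


#include <arpa/inet.h> // htonl()

void Serializer::Write(uint32_t i)
{
  uint32_t networkOrder = htonl(i); // host order -> network (big-endian) order

  send_bytes(&networkOrder, sizeof(networkOrder)); // hypothetical transport helper
}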

Consider the Serializer class and some client code shown in Listing 13.2.

Listing 13.2.


class Serializer
{
  . . .
// Operations
public:
  void Write(int8_t i);
  void Write(uint8_t i);
  void Write(int16_t i);
  void Write(uint16_t i);
  void Write(int32_t i);
  void Write(uint32_t i);
  void Write(int64_t i);
  void Write(uint64_t i);
  . . .
};

void fn()
{
  Serializer  s    = . . .;
  int8_t      i8   = 0;
  uint64_t    ui64 = 0;
  int         i    = 0;

  s.Write(i8);   // OK: exact match
  s.Write(ui64); // OK: exact match
  s.Write(i);    // ERROR: Ambiguous call
  s.Write(0);    // ERROR: Ambiguous call
}



In environments where long is 32 bits, we may define the 32-bit types in terms of long. In environments where int is also 32 bits, long should still be used. (This is a deliberate decision, as we'll see.) In such an environment, the above definitions of Serializer's overloaded Write() methods contain no entry matching either the variable i or the literal constant 0, both of which are of type int. The conversions from int to any of the eight specific (non-int) integral types are all equivalent in priority (see section 15.4.1), so the compiler balks at the ambiguity.

This is a sticky situation. In another environment, it may be the case that int is 32 bits and long is 64 bits, in which case there would be no ambiguity and the above code would compile correctly. One way to deal with this inconsistency is to restrict yourself to the specific-sized types wherever you may formerly have used int (or short, or unsigned long, etc.). This can be (very) inconvenient: it means you have to cast literals (which are of type int or long; see section 15.4.1), and you still can't be certain that using short, int, long (, long long) with such overloaded functions will be portable.
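For example, the ambiguous calls from Listing 13.2 can be made to compile by explicitly sizing the int variable and the literal at each call site:


  s.Write(static_cast<int32_t>(i)); // OK: unambiguous, but verbose
  s.Write(static_cast<int32_t>(0)); // OK: literal explicitly sized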

Imperfection (reprise): C++ needs fixed-sized types that are distinct from, and not implicitly interconvertible with, its built-in integral types.


There are a few solutions to this, as we'll see shortly.

13.2.2 Type-Specific Behavior

Consider another case, where we may want to convert integers to (C-style) strings (see Chapter 31), and cater for all the fundamental numeric integral types, including the nonstandard ones, such as long long[4]/__int64. Because signed and unsigned integers will naturally convert differently (i.e., signed conversions will prefix a '-' if the value is less than 0), we need to provide two separate functions to cater for sign. We also need to provide 64-bit versions for (unsigned) long long, in addition to those for (unsigned) int (this assumes a 32-bit architecture, where int is 32 bits). So now we have explicitly catered for int, unsigned int, long long, and unsigned long long.

[4] long long is part of C99, and will undoubtedly feature in the next C++ standard.



char const *integer_to_string(int );
char const *integer_to_string(unsigned int );
char const *integer_to_string(long long );
char const *integer_to_string(unsigned long long );



But there are four more numeric integral types: signed char, unsigned char, short, and unsigned short. Well, that's okay, isn't it? These types will undergo implicit, benign promotion (C++98: 4.5) to (unsigned) int, so they can reuse those two functions. Unfortunately, there are several problems here. By relying on integer promotion, we also allow bool, char, and wchar_t[5] to be converted. What if we want bool to be converted to (in an English locale) "true" or "false" rather than "1" or "0"? What if it's not meaningful/appropriate for 'A' (whether char or wchar_t) to be converted to "65"?

[5] Some older, nonstandard compilers don't define wchar_t as a native type, in which case libraries typedef it from another type, usually unsigned short. Naturally, this flies in the face of this and any other code that needs to discriminate between character and numeric types.

The answer here is to define explicitly only the conversion functions that we want, as follows (using C99 types):



char const *integer_to_string(int32_t );
char const *integer_to_string(uint32_t );
char const *integer_to_string(int64_t );
char const *integer_to_string(uint64_t );
char const *integer_to_string(bool ); // returns "true" or "false"



and these inline ones:



inline char const *integer_to_string(int8_t i)
{
  return integer_to_string(static_cast<int32_t>(i));
}
inline char const *integer_to_string(uint8_t i)
{
  return integer_to_string(static_cast<uint32_t>(i));
}
inline char const *integer_to_string(int16_t i)
{
  return integer_to_string(static_cast<int32_t>(i));
}
inline char const *integer_to_string(uint16_t i)
{
  return integer_to_string(static_cast<uint32_t>(i));
}



Unfortunately, this works only with compilers for which int32_t is defined as a type other than int, and one that is not intrinsically treated as int. Most compilers that use, say, __int8/16/32/64 in their definitions of the C99 fixed-sized types treat these as equivalent to the corresponding standard types, rather than as peer integer types of the same size (on 32-bit platforms, that is). Where the types are not distinguished, the unwanted conversions from char / wchar_t will proceed unhindered. There are two solutions here: the wicked and the long-winded.

The wicked one is to declare the char / wchar_t functions, but to have them return an incompatible type, as in:



void integer_to_string(char );

char ch = 'A';
puts(integer_to_string(ch)); // Compiler error



That works, but can you imagine the error message, as the poor old compiler tries to fulfil puts()'s requirement for a char const* with a void? It'll hardly be very useful. We can make this a bit more palatable by borrowing a neat compile-error naming technique from Andrei Alexandrescu [Alex2001], as follows:



struct wchar_t_cannot_convert_as_integer {};
wchar_t_cannot_convert_as_integer integer_to_string(wchar_t );

wchar_t ch = L'A';
puts(integer_to_string(ch)); // Error, with a hint 'twixt the noise.



Digital Mars gives the error "Error: need explicit cast for function parameter 1 to get from: wchar_t_cannot_convert_as_integer to: char const *."

It's still not really something one could be proud of, though, is it? The long-winded solution is to use a class to proscribe the unwanted conversions via its access control specifiers, as in:



class integer_convert
{
// Conversions
public:
  static char const *to_string(int8_t );
  . . .
  static char const *to_string(uint64_t );
  static char const *to_string(bool );
#if !defined(ACMELIB_INT_USED_FOR_FIXED_SIZED_TYPES)
  static char const *to_string(int );
  static char const *to_string(unsigned int );
#endif /* !ACMELIB_INT_USED_FOR_FIXED_SIZED_TYPES */
// Proscribed conversions
private:
  static char const *to_string(char );
  static char const *to_string(wchar_t );
};

int32_t i  = 0;
char    ch = 'A';

puts(integer_convert::to_string(i));  // Ok
puts(integer_convert::to_string(ch)); // Compiler error



The messages this time are a lot more on the ball, such as (Intel C/C++): "error #308: function "integer_convert::to_string(char)" is inaccessible." Note that because we've specifically hidden the char / wchar_t overloads, we can now safely cater for (unsigned) int, via the preprocessor discrimination, in the case where those types are not used in the definition of the C99 fixed-sized types for the given compiler.
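How ACMELIB_INT_USED_FOR_FIXED_SIZED_TYPES comes to be defined is the library's business. A hypothetical sketch, consistent with the observation in section 13.2.3 that GCC defines the C99 types in terms of int, might be:


/* Hypothetical discrimination: define the macro for compilers whose
   <stdint.h> defines the 32-bit fixed-sized types in terms of int. */
#if defined(__GNUC__)
 # define ACMELIB_INT_USED_FOR_FIXED_SIZED_TYPES
#endif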

The third option is to use True Typedefs, which we look at in detail in Chapter 18, where we will also revisit the Serializer component described in section 13.2.1. Despite being a little inconvenient to use, they represent a full solution to such problems.
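To give a flavor of what's coming, here is a minimal sketch of the True Typedef idea; the class presented in Chapter 18 is considerably more complete. The integer is wrapped in a class templated on a unique tag type, so no implicit conversions to or from the built-in integral types exist:


template <typename T, typename Tag>
class true_typedef
{
public:
  // Construction must be explicit: no silent conversion from T
  explicit true_typedef(T value)
    : m_value(value)
  {}
  // Access to the underlying value is by explicit request only
  T base_type_value() const
  {
    return m_value;
  }
private:
  T m_value; // the underlying integer
};

// Each alias gets its own unique tag type, so sint32 and int
// do not implicitly interconvert.
struct sint32_tag {};
typedef true_typedef<int32_t, sint32_tag> sint32;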

13.2.3 Fixed-Sized Integer Types: Coda

Hopefully, now you can see why we should define sized integral types, where possible, in terms of types other than int. It is interesting to note that of the compilers to which I have access that provide cstdint / stdint.h, all but one (GCC) either define the C99 fixed-sized types in terms of proprietary fixed-sized types, or they use short and long and eschew int. Both approaches fall down if you do not have another type of the same size as (unsigned) int to use in their place, or if your compiler vendor uses (unsigned) int in its definition of the C99 types. In that case, you either live with the potential subtle errors, or you must use True Typedefs (see section 18.4).

For myself, I choose either to proscribe the (unsigned) int types or to go for the True Typedefs solution, since I prefer the hairshirt to these possibilities for implicit conversion: typing more now is fine, if it means I get to type less later.

But none of it's particularly pretty, is it? It's clear that there are at least two problems with the fundamental integer types: variable size between environments, which is pretty obvious, and the preference for integer conversion of literals, which is subtle, surprising, and, perhaps, the more troublesome. It's also clear that at least the first problem has wide recognition, as evidenced by the C99 move to introduce fixed-sized types. We've seen how it is better to avoid the int type in the definition of such types, but also how this may not be achievable in all environments. The answers, then, are:

  • Always use fixed-sized types if you are using the type in a sense that pertains to its capacity.

  • When overloading functions to support multiple integral types, overload for all the types you want, and none of the ones you don't.

  • Specify literals sparingly, and be prepared to cast them when dealing with overloaded functions. That this is necessary is a concession to the problem, and a real, albeit minor, pain.

  • Be aware that these steps are not a complete answer, so you must keep the issues in mind.

  • Compile your components on as many compilers as you can practically manage, and listen to all their warnings.

  • When you absolutely must have full control, use True Typedefs (section 18.4). (This is rare, and inconvenient, hence the stress on "must.")

