
Imperfect C++ Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 1.  Enforcing Design: Constraints, Contracts, and Assertions


1.4. Assertions

I wouldn't call assertions a bona fide error-reporting mechanism, since they usually have profoundly different behavior in debug and release builds of the same software. Notwithstanding, the assertion is one of the most important of the C++ programmer's software quality assurance tools, particularly as it is widely used as the mechanism of enforcement of constraints and invariants. Any chapter on error-reporting mechanisms that did not include it would be decidedly imperfect.

Basically, an assertion is a runtime test that is usually only conducted in debug or testing builds, and tends to take the following form:



#ifdef NDEBUG
# define assert(x)  ((void)(0))
#else /* ? NDEBUG */
extern "C" void assert_function(char const *expression);
# define assert(x)  ((!(x)) ? assert_function(#x) : ((void)0))
#endif /* NDEBUG */



It's used in client code to detect any conditions that you are sure should never happen:



class buffer
{
  . . .
  void method1()
  {
    assert((NULL != m_p) == (0 != m_size));
    . . .
  }
private:
  void    *m_p;
  size_t  m_size;
};



The assertion in this class reflects the class author's design assumption that if m_size is not 0, then m_p is not NULL, and vice versa.

When the condition of an assertion evaluates false, the assertion is said to "fire." This may mean that a message box is invoked if you're using a graphical environment, or it may mean that the program exits, or the process experiences a system-specific breakpoint exception.

However the assertion fires, it's nice to be able to display the text of the expression that failed and, since assertions are primarily aimed at software developers, the file and line in which the failure occurred. Most assertion macros provide this ability:



#ifdef NDEBUG
# define assert(x)  ((void)(0))
#else /* ? NDEBUG */
extern "C" void assert_function( char const *expression
                               , char const *file
                               , int        line);
# define assert(x)  ((!(x))                                      \
                      ? assert_function(#x, __FILE__, __LINE__)  \
                      : ((void)0))
#endif /* NDEBUG */
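As a minimal sketch of what might sit behind such a macro, the following handler builds a message from the expression text, file, and line, writes it to stderr, and terminates. The names sketch_format_assertion(), sketch_assert_function(), and SKETCH_ASSERT() are hypothetical, and reporting to stderr followed by abort() is only one of the ways of firing described above.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: builds the text reported when an assertion fires.
std::string sketch_format_assertion( char const *expression
                                   , char const *file
                                   , int        line)
{
  // Design constraints on the caller, so the standard assert() suits them.
  assert(NULL != expression);
  assert(NULL != file);

  std::ostringstream oss;
  oss << "assertion failed in file " << file
      << ", line " << line << ": " << expression;
  return oss.str();
}

// Hypothetical handler: reports the failure on stderr and terminates the
// process. A graphical environment might show a message box instead.
extern "C" void sketch_assert_function( char const *expression
                                      , char const *file
                                      , int        line)
{
  std::string const message = sketch_format_assertion(expression, file, line);

  std::fprintf(stderr, "%s\n", message.c_str());
  std::abort();
}

// Hypothetical macro wired to the handler above.
#define SKETCH_ASSERT(x)                                      \
  ((!(x)) ? sketch_assert_function(#x, __FILE__, __LINE__)    \
          : ((void)0))
```

Keeping the formatting separate from the act of terminating makes the message easy to check on its own, which is the only reason the sketch is split into two functions.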



Since the expression in an assertion is elided from release builds, it is very important that the expression have no side effects. Failure to adhere to this will lead to the curious and vexing situation whereby your debug builds work and your release builds do not.
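To make the hazard concrete, here is a small sketch in which the asserted expression has a side effect. SKETCH_ASSERT_DEBUG() and SKETCH_ASSERT_RELEASE() are hypothetical stand-ins for the two forms of assert(), defined side by side so that the difference can be seen in a single build; take_item() is likewise invented for the illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-ins for the debug and release forms of assert().
#define SKETCH_ASSERT_DEBUG(x)    ((!(x)) ? std::abort() : ((void)0))
#define SKETCH_ASSERT_RELEASE(x)  ((void)(0))

// Removes one item from a (pretend) queue, returning true on success.
// The decrement is the side effect.
inline bool take_item(int &remaining)
{
  assert(remaining >= 0); // no side effects, so harmless if elided

  if(0 == remaining)
  {
    return false;
  }
  --remaining;
  return true;
}

int remaining_after_debug_assert()
{
  int remaining = 3;
  SKETCH_ASSERT_DEBUG(take_item(remaining));   // evaluated: one item taken
  return remaining;
}

int remaining_after_release_assert()
{
  int remaining = 3;
  SKETCH_ASSERT_RELEASE(take_item(remaining)); // elided: nothing taken
  return remaining;
}
```

The two functions differ only in the macro used, yet they leave the queue in different states, which is exactly the debug-works, release-fails situation.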

1.4.1 Getting the Message

The nature of the action taken by the assertion can vary considerably. However, most assertion implementations make use of the stringized form of the expression. This is good as far as it goes, but it can leave the poor tester (who may be you) in a confused state, since all you'll get is some terse message like:



"assertion failed in file stuff.h, line 293: (NULL != m_p) == (0 != m_size));"



But we can take advantage of this simple mechanism to make our assertions that bit more meaningful. In the case where you might use an assertion in a switch case that you expect never to encounter, you can improve the message considerably by using a named 0 constant, as in:



switch(. . .)
{
  . . .
  case CantHappen:
    {
      const int AcmeApi_experienced_CantHappen_condition = 0;
      assert(AcmeApi_experienced_CantHappen_condition);
      . . .



Now when this assertion fires the message will be a lot more descriptive than



"assertion failed in file acmeapi.cpp, line 101: 0"



There's another way we can provide more information, and lose the unattractive underscores in the process. Because C/C++ can implicitly interpret pointers as Boolean (sub-)expressions (see section 15.3), we can rely on the fact that literal strings are non-zero to combine a readable message with the tested expression.



#define MESSAGE_ASSERT(m, e)  assert((m && e))



You'd use this as follows:




MESSAGE_ASSERT("Inconsistency in internal storage. Pointer should be null when size is 0, "
               "or non-null when size is non-0", (NULL != m_p) == (0 != m_size));



Now we get much richer failure information. And since the string is part of the expression, it is elided in release builds. All that extra information is free.
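The effect can be observed with a sketch in which the failure text is recorded rather than reported. Everything prefixed sketch_ or SKETCH_, along with g_last_failure and check_storage(), is hypothetical; the message macro has the same shape as MESSAGE_ASSERT() but is built on a recording macro so that the program keeps running.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sink: remembers the text of the most recent failure
// instead of stopping the program, so that it can be inspected.
static std::string g_last_failure;

void sketch_record_failure(char const *expression)
{
  assert(NULL != expression);
  g_last_failure = expression;
}

#define SKETCH_ASSERT(x) \
  ((!(x)) ? sketch_record_failure(#x) : ((void)0))

// Same shape as MESSAGE_ASSERT(), built on the recording macro.
#define SKETCH_MESSAGE_ASSERT(m, e)  SKETCH_ASSERT((m && e))

// Fires when the pointer and the size disagree. Returns the recorded
// text, which is empty if the assertion did not fire.
std::string check_storage(void const *p, std::size_t size)
{
  g_last_failure.clear();
  SKETCH_MESSAGE_ASSERT("Pointer and size disagree", (NULL != p) == (0 != size));
  return g_last_failure;
}
```

Because the arguments pass through an outer macro before being stringized, any macros inside them, NULL included, may appear in their expanded form in the recorded text. The message literal survives intact, which is the part the tester needs.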

1.4.2 Inappropriate Assertions

Assertions are useful for debug-build invariant checking. As long as you remember that, you won't go far wrong.

Alas, all too often we see the use of assertions in runtime error checking. The canonical example of this, which one would hope is first-year undergraduate programming gotcha material, is that of using it to check against memory allocation failures:



char *my_strdup(char const *s)
{
  char *s_copy = (char*)malloc(1 + strlen(s));
  assert(NULL != s_copy);
  return strcpy(s_copy, s);
}



You might think that no one would do such a thing. If so, you might want to get busy with grep inside some of your favorite libraries, wherein you'll find such checks against memory, file handling, and other runtime errors.

Unfortunately, there's a kind of halfway house of badness here, which is that plenty of folks tend to use the assert in addition to a correctly written handling of the failure condition:



char *my_strdup(char const *s)
{
  char *s_copy = (char*)malloc(1 + strlen(s));
  assert(NULL != s_copy);
  return (NULL == s_copy) ? NULL : strcpy(s_copy, s);
}



I really can't understand this one. Given that just about everybody develops software on desktop hardware, with virtual memory systems, the only way you're ever likely actually to experience a memory allocation failure during debugging is when you've plugged in a low-stock allocation mechanism or stipulated low-stock behavior to your runtime library's debugging APIs.

But the ramifications are more significant when used with other, more commonly firing, conditions. When used with file handling, for example, this practice simply teaches you to put your test files in the right place, rather than to bulletproof your error-response functionality. This virtually guarantees that the response in deployment will be inadequate.

If the problem is a runtime failure condition, why would you want to catch the failure in an assertion? Wouldn't you want the program to crash if you'd failed to code the handling correctly, thereby being representative of the release mode behavior? Even if you've got a super smart assert [Robb2003], it still sets a bad example for yourself and anyone reviewing your team's code to see such things.

In my opinion, applying assertions to runtime failure conditions, even if accompanied by subsequent release-time handling code, is at best a distraction, and at worst bad practice. Don't do it!

Recommendation: Use assertions to assert truths about the structure of the code, not about runtime behavior.
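One way to read the recommendation is sketched below. The name my_strdup_checked() is hypothetical, and the sketch assumes that passing a null pointer violates the function's contract rather than being an expected input; that is a design decision made for the illustration, not something prescribed above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical variant of my_strdup(). Returns NULL if memory cannot be
// allocated; the caller releases the result with free().
char *my_strdup_checked(char const *s)
{
  // A truth about the structure of the code: callers must pass a string.
  assert(NULL != s);

  char *s_copy = static_cast<char*>(std::malloc(1 + std::strlen(s)));

  // A runtime condition: handled in every build, never asserted.
  if(NULL == s_copy)
  {
    return NULL;
  }

  return std::strcpy(s_copy, s);
}
```

The assertion documents what the caller owes the function, while the allocation failure gets ordinary handling that behaves the same in debug and release builds.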


1.4.3 Syntax and 64-Bit Pointers

Another issue[8] regards the use of pointers in assertions. On environments where an int is 32 bits and a pointer is 64 bits, using plain pointers in assertions can, depending on the definition of the assert() macro, result in truncation warnings:[9]

[8] I know I'm throwing everything but the kitchen sink into this section, but I think it's all worth knowing.

[9] I've experienced this with DEC Alpha in the past, and I've seen newsgroup posts reporting similar experiences with other architectures.



void *p = . . .;
assert(p); // Warning: truncation



Of course, this is just grist to my syntax mill of section 17.2.1, and is in fact the experience that started my obsession with all the bothersome issues of Boolean expressions. The answer is to be explicit in your intentions:



void *p = . . .;
assert(NULL != p); // Peachy now



1.4.4 Avoid verify()

A while back, I was chatting with someone about the definition of their own assertion macro, which they intended to call verify() to avoid conflict with the standard library macro. Alas, there are two problems with this.

First, the VERIFY() macro is a well-known part of the Microsoft Foundation Classes (MFC). It's used for the same thing as assert(), but its condition is not elided; it is executed under all circumstances, as in:



#ifdef NDEBUG
# define verify(x)  ((void)(x)) /* x still "is" */
#else /* ? NDEBUG */
# define verify(x)  assert(x)
#endif /* NDEBUG */



If one were to define a verify() macro with assert() behavior, people accustomed to the established verify behavior would be quite perplexed when well-tested debug mode code failed miserably in release mode. It'd be quite a while before they realized that the verify macro was eliding the expression from release mode builds, but only a few minutes thereafter they'd be chasing you through the parking lot with blunt instruments.
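The established behavior can be seen in the following sketch. SKETCH_VERIFY_DEBUG() and SKETCH_VERIFY_RELEASE() are hypothetical names for the two branches of the definition shown above, placed side by side so that both can be exercised at once; the debug form assumes NDEBUG is not defined when the code is compiled.

```cpp
#include <cassert>

// Hypothetical names for the two forms of an MFC-style verify().
#define SKETCH_VERIFY_DEBUG(x)    assert(x)
#define SKETCH_VERIFY_RELEASE(x)  ((void)(x))

static int g_calls = 0;

// Counts how often it is invoked, and reports the given result.
bool do_step(bool result)
{
  ++g_calls;
  return result;
}

int calls_made_by_debug_form()
{
  g_calls = 0;
  SKETCH_VERIFY_DEBUG(do_step(true));    // evaluated, and the result checked
  return g_calls;
}

int calls_made_by_release_form()
{
  g_calls = 0;
  SKETCH_VERIFY_RELEASE(do_step(false)); // evaluated, but the result ignored
  return g_calls;
}
```

Both forms call do_step() exactly once. Only the check disappears in the release form, which is why code written against this behavior breaks if verify() is quietly redefined to elide the whole expression.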

The second problem is that, by convention, the word "assert" is used only for assertions. This is valuable because you can use grep and similar tools to search for your assertions with almost no ambiguity, and because it stands out to the programmer. Seeing it in code, they are immediately mindful that some invariant (see section 1.3.3) is being tested. To start using assertions with different names will merely muddy this relatively clear picture.

Although in the past I've defined my own share of verify() macros—with the semantics of MFC's VERIFY()—I now consider them dangerous. An assertion expression must have no side effects, and it's not too hard to train yourself to adhere to this: I think it's several years since I made that mistake. But if you use a mix of assertion macros, some of which must have side effects, and others that must not, it's just too easy to get confused, and too hard to form unbreakable habits. I no longer use any form of verify() macros, and I'd advise you to do the same.

1.4.5 Naming Your Assertions

Since the issue of naming was raised in the last section, let's address it now. As I mentioned, an assertion macro should contain within it the word "assert." I've seen and used _ASSERTE(), ASSERT(), ATLASSERT(), AuAssert(), stlsoft_assert(), SyAssert(), and many others.

The standard assert macro for C and C++ is called assert(). In the STLSoft libraries I have stlsoft_assert() and a couple of others, all of which are lowercase. In the Synesis libraries the assert is called SyAssert(). In my opinion, all of these are wrong.

By convention, macros are all uppercase, and this is a really good convention, since they stand out from functions and methods. Although it's perfectly possible to write assert() as a function:



// Assume C++ compilation only
#ifdef ACMELIB_ASSERT_IS_ACTIVE
extern "C" void assert(bool expression);
#else /* ? ACMELIB_ASSERT_IS_ACTIVE */
inline void assert(bool )
{}
#endif /* ACMELIB_ASSERT_IS_ACTIVE */



This is not seen, as it would provide few of the benefits of current assertion macros. First, the compiler would not be able to optimize out the assertion expression. Well, to be strict, it would be able to do so in many cases, but not with all things, even with the best optimizing compilers. Whatever the precise differences between compilers, and projects, in principle there could be large amounts of wasted code involved.

Another problem would be that some types would not be implicitly convertible to bool, or int, or whatever you chose as your expression type. Since the canonical assert() macro incorporates the given expression within the if or while statements or the conditional expression of the for statement or the conditional operator (?:), then all the usual implicit Boolean conversions (see section 13.4.2 and Chapter 24) come into play. This is quite different from passing such conditional expressions to a function taking bool or int.

The final reason is that it would not be possible to display the expression as part of the assertion's error message at run time, since the stringizing is part and parcel of the preprocessor, and not the C++ (or C) language.
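The conversion problem can be illustrated with a type that supports only logical negation. This is a sketch under assumptions: OnlyNegatable, SKETCH_HOLDS(), and sketch_assert_fn() are hypothetical, and the std::is_convertible trait comes from <type_traits>, a C++11 facility that postdates the language as discussed in this chapter.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type that supports logical negation but has no conversion
// to bool.
struct OnlyNegatable
{
  bool valid;

  bool operator !() const
  {
    return !valid;
  }
};

// Macro-style test, modeled on the assert() definitions in this section:
// the expression is fed to operator !, so no conversion to bool is needed.
#define SKETCH_HOLDS(x)  ((!(x)) ? false : true)

bool macro_accepts(OnlyNegatable const &v)
{
  return SKETCH_HOLDS(v);
}

// Function-style assertion taking bool. Passing an OnlyNegatable directly
// would not compile; the caller would have to write sketch_assert_fn(!!v).
inline void sketch_assert_fn(bool expression)
{
  assert(expression);
}

// Reports, via the C++11 trait, whether the type converts to bool.
bool converts_to_bool()
{
  return std::is_convertible<OnlyNegatable, bool>::value;
}
```

The macro form takes the object as it is, whereas the function form pushes the conversion onto every caller.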

Assertions are now, and probably always will be, macros, so they should be uppercase. Not only is that a consistent coding standard, but they're also easier to spot, which makes life simpler all round.

1.4.6 Avoid #ifdef _DEBUG

In section 25.1.4 I mention that the default condition is left out of the switch statement for performance reasons and that an assertion is used in its place. One of my esteemed reviewers, who's got a much more distinguished history than l'il ole me, queried this and suggested that I should have gone for the simpler:



switch(type)
{
  . . .
#ifdef _DEBUG
  default:
    assert(0);
    break;
#endif // _DEBUG
}



This just goes to show how easy it is for us all, even the most experienced, to become victims of the assumptions of our own development environment. There are several small things wrong with this. First, assert(0) can lead to some pretty uninformative error messages, depending on the given compiler's assertion support. This can easily be souped up:



. . .
default:
  { const int unrecognized_switch_case = 0;
  assert(unrecognized_switch_case); }
. . .



but it's still unlikely to be more informative with most compilers than the original, more verbose, form:



assert( type == cstring || type == single ||
        type == concat || type == seed);



The main problem with the use of _DEBUG is that it may not be the definitive symbol instructing the compiler to generate an assertion. For a start, _DEBUG is, to the best of my knowledge, something that is only prevalent on PC compilers. For many compilers, debug builds are the default, and only the definition of the symbol NDEBUG causes the compilation to be release mode and assertions to be elided. Naturally, the correct way is to use a compiler-independent abstraction of the build mode, so you could get away with:



#ifdef ACMELIB_BUILD_IS_DEBUG
  default:
    assert(0);
    break;
#endif // ACMELIB_BUILD_IS_DEBUG



But even that's not the full picture. It's entirely reasonable to keep various subsets of debug functionality in the builds of prerelease versions of your product. You might use your own assertions, which may be active or inactive regardless of the definition of _DEBUG, NDEBUG or even ACMELIB_BUILD_IS_DEBUG.
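A sketch of how such an abstraction might be arranged follows. The rule used to deduce the build mode is an assumption made for the illustration, and a real project would tune it per compiler; the two query functions are hypothetical, while the symbols ACMELIB_BUILD_IS_DEBUG and ACMELIB_ASSERT_IS_ACTIVE are the ones used in this chapter.

```cpp
// Assumed rule: the build is debug if _DEBUG is defined, or if NDEBUG
// is not. A real project would adjust this per compiler.
#if defined(_DEBUG) || !defined(NDEBUG)
# define ACMELIB_BUILD_IS_DEBUG
#endif /* _DEBUG || !NDEBUG */

// Assertions get their own switch. It follows the build mode by default,
// but may also be defined independently (for example on the compiler
// command line) to keep assertions in a prerelease build.
#if !defined(ACMELIB_ASSERT_IS_ACTIVE) && \
    defined(ACMELIB_BUILD_IS_DEBUG)
# define ACMELIB_ASSERT_IS_ACTIVE
#endif /* !ACMELIB_ASSERT_IS_ACTIVE && ACMELIB_BUILD_IS_DEBUG */

// Hypothetical query: lets ordinary code ask about the build mode.
inline bool acmelib_build_is_debug()
{
#ifdef ACMELIB_BUILD_IS_DEBUG
  return true;
#else /* ? ACMELIB_BUILD_IS_DEBUG */
  return false;
#endif /* ACMELIB_BUILD_IS_DEBUG */
}

// Hypothetical query: reports whether the project's assertions are on.
inline bool acmelib_assert_is_active()
{
#ifdef ACMELIB_ASSERT_IS_ACTIVE
  return true;
#else /* ? ACMELIB_ASSERT_IS_ACTIVE */
  return false;
#endif /* ACMELIB_ASSERT_IS_ACTIVE */
}
```

With the compiler-specific symbols confined to one header, the rest of the code tests only the ACMELIB_ symbols, and the assertion switch can be flipped without touching the build mode.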

1.4.7 DebugBreak() vs int 3

Although this is a Win32 + Intel architecture specific point, it's worth noting because it's very useful and surprisingly little known. The Win32 API function DebugBreak() causes the execution of the calling process to fault with a breakpoint exception. This allows a standalone process to be debugged, or it causes the currently debugging process within your IDDE to halt, thereby allowing you to inspect the call stack or whatever other debugging delights take your fancy.

On the Intel architecture, the function simply executes the machine instruction int 3, which causes the breakpoint exception within the Intel processor.

The slight pain is that when control is given to your debugger, the execution point is inside DebugBreak(), rather than nicely with the code that caused the exception. The simple answer to this is to use inline assembler when compiling for the Intel architecture. The Visual C++ C runtime library provides the _CrtDbgBreak() function as part of its debugging infrastructure, which is defined for the Intel architecture as:



#define _CrtDbgBreak() __asm { int 3 }



Using int 3 means that the debugger stops exactly where it's needed, on the offending line of code.

1.4.8 Static/Compile-Time Assertions

So far we've just looked at runtime assertions. But catching bugs at runtime is a poor second best to catching them at compile time. In many parts of the book we've mentioned static assertions, also called compile-time assertions, so now's a good time to look at them in detail.

Basically, a static assertion provides a compile-time validation of an expression. Needless to say, for it to be validated at compile time, it needs to be capable of being evaluated at compile time. This reduces the scope of expressions to which static assertions can be applied. For example, you might use a static assertion to ensure that your expectation of the sizes of int and long, for your compiler, are adhered to:



STATIC_ASSERT(sizeof(int) == sizeof(long));



but note that they cannot be used to evaluate runtime expressions:



. . . Thing::operator [](index_type n)
{
  STATIC_ASSERT(n <= size()); // Compiler error – for real!
  . . .



The firing of a static assertion is an inability to compile. Since static assertions are, like most modern features of C and C++, not a feature of the language, but rather a side effect of a language feature, the error messages can be anything but obvious. We'll see how weird they can be momentarily.

The usual mechanism for a static assertion is to define an array, using the truth of the expression as the array dimension. Since C and C++ accord a true expression the value 1, when converted to an integer, and a false expression the value 0, the expression may be used to either define an array of size 1 or 0. An array dimension of 0 is not legal C or C++, so the compiler will fail to compile. Consider an example:



#define STATIC_ASSERT(x)   int ar[x]
. . .
STATIC_ASSERT(sizeof(int) < sizeof(short));



An int is never smaller than a short (C++-98:3.9.1;2), so the expression sizeof(int) < sizeof(short) evaluates to 0. Hence, the STATIC_ASSERT() line evaluates to:



int ar[0];



which is not legal C or C++.

Clearly there are a couple of problems with this. First, the array ar is declared but not used, which will cause most compilers to give you a warning, and screw up your build.[10] Second, using STATIC_ASSERT() twice or more within the same scope will result in ar being multiply defined.

[10] You do set warnings to "high," and treat them as errors, don't you?

To obviate these concerns I define static assertions as follows:



#define STATIC_ASSERT(ex)  \
          do { typedef int ai[(ex) ? 1 : 0]; } while(0)



This works fine for most compilers. However, some compilers don't balk with an array of dimension 0, so there tends to be some conditional compilation to handle all cases:



#if defined(ACMELIB_COMPILER_IS_GCC) || \
    defined(ACMELIB_COMPILER_IS_INTEL)
# define STATIC_ASSERT(ex)  \
          do { typedef int ai[(ex) ? 1 : -1]; } while(0)
#else /* ? compiler */
# define STATIC_ASSERT(ex)  \
          do { typedef int ai[(ex) ? 1 : 0]; } while(0)
#endif /* compiler */



The invalid array dimension is not the only mechanism for providing static assertions. There are two other interesting mechanisms I'm aware of [Jagg1999], although I've not used either in anger.

The first relies on the requirement for each case clause to have a different value:



#define STATIC_ASSERT(ex) \
    switch(0) { case 0: case ex:; }



The second relies on the fact that bitfields must have a length of one or more:



#define STATIC_ASSERT(ex) \
    struct x { unsigned int v : ex; }



All three forms have similarly inscrutable error messages when they "fire." You'll see something like "case label value has already appeared in this switch" or "the size of an array must be greater than zero", so it can take a while to comprehend the problem when you're in a nest of templates.

In an attempt to ameliorate this confusion, Andrei Alexandrescu, in [Alex2001], describes a technique for providing better error messages, and gets as far as is probably possible within the current limitations of the language.[11]

[11] You should check it out. It's quite nifty.

For my part, I tend to shy away from that level of complexity for three reasons. First, I'm lazy, and like to avoid complexity where possible.[12] Second, I write a lot of C code as well as C++, and I prefer wherever possible to have the same facilities available to me with both languages.

[12] I also think it's good to have your supporting techniques as simple as possible, although I admit that there have been sojourns to the land of a trillion brain cells in several parts of the book, so I can't seriously claim that as a good reason in this case. It's just laziness.

Finally, static assertions fire as a result of a coding time misuse of a component. This means that they're both rare and within the purview of the programmer who caused them to fire. Therefore, I figure it's only going to cost a given developer a couple of minutes to track the error (though not, perhaps, the solution), and they'll suffer this cost rarely.

Before we finish this item, it's worth noting that both the invalid array dimension form (when written as a plain declaration, without the do { } while(0) wrapper) and the bit field form have the advantage that they may be used outside of functions, whereas the switch form (and runtime assertions) may not.
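A sketch of the invalid array dimension form used outside of functions follows. SKETCH_STATIC_ASSERT() is a hypothetical two-argument variant, not the definition given earlier: the caller supplies the typedef name, and the dimension -1 is used for the false case because, as noted above, some compilers accept a dimension of 0.

```cpp
#include <cassert>

// Hypothetical variant of the invalid array dimension form. The caller
// supplies the typedef name, which avoids redefinition clashes and tends
// to show up in the compiler's error message.
#define SKETCH_STATIC_ASSERT(name, ex)  typedef int name[(ex) ? 1 : -1]

// Used at namespace scope, outside of any function.
SKETCH_STATIC_ASSERT(int_is_no_smaller_than_short, sizeof(int) >= sizeof(short));
SKETCH_STATIC_ASSERT(char_is_one_byte, 1 == sizeof(char));

// A runtime assertion, by contrast, has to live inside a function body.
inline int checked_index(int n)
{
  assert(n >= 0);
  return n;
}
```

Naming the typedef borrows the trick from section 1.4.1: when the assertion fires, the diagnostic at least mentions a name that says what was expected.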

1.4.9 Assertions: Coda

This section has described the basics of assertions, but there are many more interesting things that assertions can do that are outside the scope of this book. Two quite different, but equally useful, techniques are SMART_ASSERT [Torj2003] and SUPER_ASSERT [Robb2003], and I'd advise you to read up on both of them.

