Gotcha #27: Overuse of `#if`

`#if` for Debugging

How do we insert debugging code into our programs? Everyone knows we use the preprocessor:



void buggy() {
#ifndef NDEBUG
   // some debugging code . . .
#endif
   // some actual code . . .
#ifndef NDEBUG
   // more debug code . . .
#endif
}

Everyone's wrong. Most veteran programmers have long and tedious stories about how the debug version of a program worked perfectly, but "simply" defining NDEBUG caused the production version to fail mysteriously.

Well, there's nothing mysterious about it. We're actually discussing two unrelated programs that happen to be generated from the same source files. You'd have to compile the same source code twice to see if it's even syntactically correct. The correct way to write the code is to dispense with the idea of a debugging version and write just a single program:



void buggy() {
   if( debug ) {
       // some debugging code . . .
   }
   // some actual code . . .
   if( debug ) {
       // more debugging code . . .
   }
}

What about the problem of all the debug code remaining in the executable of the production version? Isn't that going to waste space? Aren't the unnecessary conditional branches going to cost time? Not if the debug code isn't present in the executable. Compilers are very, very good at identifying and removing unreachable code. They're a whole lot better at this than we are with our pathetic #ifndefs. All we have to do is make things unambiguous:



const bool debug = false;

The expression debug is what the standard calls an integral constant-expression. Every C++ compiler must be able to evaluate constant-expressions like this at compile time to translate array bound expressions, case labels, and bitfield lengths. Every minimally competent compiler can perform elimination of unreachable code of the form



if( false ) {
   // unreachable code . . .
}

Yes, even the compiler you've been complaining about to your management for the last five years can handle this. Even though the compiler removes the unreachable code, it must still perform a full parse and static semantic check. Given the definition of constant-expression in the standard, your compiler can even eliminate unreachable code guarded by more complex expressions:



if( debug && debuglvl > 5 && debugopts&debugmask ) {
   // potentially unreachable code . . .
}
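The fragments above can be assembled into a small sketch. The names debug, debuglvl, debugopts, and debugmask come from the text, but their types and values here are assumptions, and traceLog and work are hypothetical names introduced only for illustration. The static_assert (C++11 and later) is one way to confirm that the compiler treats the whole condition as a compile-time constant.

```cpp
#include <string>
#include <vector>

// Assumed settings: the text names these but gives no types or values.
const bool debug = false;
const int debuglvl = 7;
const unsigned debugopts = 0x5;
const unsigned debugmask = 0x4;

// Hypothetical log sink standing in for real debugging output.
std::vector<std::string> &traceLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical function with guarded debugging code.
int work( int x ) {
    // Every operand is a const integral object initialized with a constant,
    // so the whole condition is an integral constant expression.
    if( debug && debuglvl > 5 && (debugopts & debugmask) ) {
        // Still parsed and type-checked while debug is false:
        // a typo here breaks the build in every configuration.
        traceLog().push_back( "work(" + std::to_string( x ) + ")" );
    }
    return x * 2;
}

// The condition is usable where the language demands a compile-time constant.
static_assert( !(debug && debuglvl > 5 && (debugopts & debugmask)),
               "debug output is disabled in this configuration" );
```

With debug set to false, work never appends to the log, and a compiler is free to drop the guarded block entirely. Changing debug to true makes the static_assert fail, which shows that the same expression is being evaluated during translation.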

Your compiler may even perform the code elimination in more complex cases. For example, we might attempt to involve my favorite inline function in the conditional expression:



typedef unsigned short Bits;
inline Bits repeated( Bits b, Bits m )
   { return b & m & (b & m)-1; }
// . . .
if( debug && repeated( debugopts, debugmask ) ) {
   // potentially unreachable code . . .
   error( "One option only" );
}

However, with the use of a function call (whether inline or not), the expression is no longer a constant-expression, we have no guarantee that the compiler will be able to evaluate it at compile time, and therefore the code elimination may not take place. If you require the code elimination, this approach is not portable. Some programmers who have been coding in C for too long may suggest the following fix:



#define repeated(b, m) ((b) & (m) & ((b) & (m))-1)

Don't do it. (See Gotcha #26.)
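For readers on C++11 or later, a language revision that postdates the discussion above, one possible alternative to the macro is to declare the function constexpr. The sketch below assumes values for debugopts and debugmask, and optionsConflict is a hypothetical helper. It shows only that the call can take part in a constant expression; whether a given compiler then removes a guarded branch remains a quality-of-implementation matter.

```cpp
typedef unsigned short Bits;

// Same logic as the inline version. constexpr (C++11 and later) lets a call
// with constant arguments appear inside a constant expression.
constexpr Bits repeated( Bits b, Bits m )
    { return b & m & ((b & m) - 1); }

const bool debug = false;
const Bits debugopts = 0x6;   // assumed value: two option bits set
const Bits debugmask = 0x7;   // assumed value

// These compile only if the calls are evaluated at compile time.
static_assert( repeated( debugopts, debugmask ) != 0, "more than one option set" );
static_assert( repeated( 0x4, 0x7 ) == 0, "a single option is accepted" );

// Hypothetical check mirroring the guarded block in the text.
bool optionsConflict() {
    if( debug && repeated( debugopts, debugmask ) ) {
        return true;   // would report "One option only"
    }
    return false;
}
```

Unlike the macro, the constexpr function keeps its parameter types, evaluates each argument once, and obeys scope.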

Note that it may be advisable to have some conditionally compiled code in an application, to be able to set the values of constants from the compile line:



const bool debug =
#ifndef NDEBUG
   true
#else
   false
#endif
;

Even the presence of this minimal conditionally compiled code is not necessary, however. A generally better approach would be to select between debug and production versions in a makefile or similar facility.

Using `#if` for Portability

"However," you state with a knowing look, "my code is platform-independent. I have to use #if to handle the different platform requirements." To prove your point, such as it is, you display the following code:



void operation() {
   // some portable code . . .
#ifdef PLATFORM_A
   // do something . . .
   a(); b(); c();
#endif
#ifdef PLATFORM_B
   // do same thing . . .
   d(); e();
#endif
}

This code is not platform-independent. It's multiplatform dependent. Any change to any of the platforms requires not only a recompilation of the source but also a change to the source shared by all platforms. You've achieved maximal coupling among platforms: a remarkable achievement, if somewhat impractical.

But that's a minor annoyance compared to the real problem that lurks inside this implementation of operation. Functions are abstractions. The operation function is an abstraction of an operation that has different implementations on different platforms. When we use high-level languages, we can often use the same source code to implement the same abstraction for different platforms. For example, the expression a = b + c, where a, b, and c are ints, has to be rendered in different ways for different processors, but the meaning of the expression is sufficiently close across processors that we can (generally) use the same source code for all platforms. This isn't always the case, particularly when our operation must be defined in terms of operating-system or library-specific operations.

The implementation of operation indicates that the "same" thing is supposed to happen under both supported platforms, and this may even be the case initially. Under maintenance, however, bugs tend to be reported and repaired on a platform-specific basis. Over a breathtakingly short period of time, the meaning of operation on different platforms will diverge, and you really will be maintaining totally different applications for each platform. Note that these different behaviors are different required behaviors, because users will come to depend on the platform-specific meanings of operation. A correct initial implementation of operation would have accessed platform-dependent code through a platform-independent interface:



void operation() {
   // some portable code . . .
   doSomething(); // portable interface . . .
}

In making the abstraction explicit, it's far more likely that, under maintenance, different platforms will remain in conformance with the meaning of the operation. The declaration of doSomething belongs in the platform-independent portion of the source. The various implementations of doSomething are defined in the various platform-dependent portions of the source (if doSomething is inline, then it will be defined in a platform-specific header file). Selection of platform is handled in the makefile. No #ifs. Note also that adding or removing a particular platform requires no source code changes.
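A single-file sketch of that layout follows. The file names in the comments are hypothetical, and both functions return a string (the originals return nothing) purely so the effect can be observed. In a real build the three parts would live in separate files, with the makefile choosing which platform-specific source to compile and link.

```cpp
#include <string>

// --- platform-independent header (a hypothetical operation.h) ---
// Declaration only: portable code sees nothing but this.
std::string doSomething();

// --- platform-independent source (a hypothetical operation.cpp) ---
std::string operation() {
    std::string result = "portable;";   // some portable code
    result += doSomething();            // portable interface
    return result;
}

// --- platform-specific source (a hypothetical platform_a.cpp) ---
// The build links exactly one such file. A platform_b.cpp would supply
// a different body behind the same declaration.
std::string doSomething() {
    return "a;b;c";                     // stands in for a(); b(); c();
}
```

Adding a platform under this arrangement means adding one source file and one makefile entry. The portable source, including operation, is not edited.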

What About Classes?

Like a function, a class is an abstraction. An abstraction has an implementation that can vary at either compile time or runtime, depending on how the abstraction is expressed. As with a function, use of #if for varying a class's implementation is fraught with peril:



class Doer {
#      if ONSERVER
   ServerData x;
#      else
   ClientData x;
#      endif
   void doit();
   //  . . .
};

void Doer::doit() {
#      if ONSERVER
   // do server things . . .
#      else
   // do client things . . .
#      endif
}

Strictly speaking, this code is not illegal unless the Doer class is defined with the ONSERVER symbol both defined and undefined in different translation units. But sometimes it would be nice if it were illegal. It's common for different versions of Doer to be defined in different translation units and then linked together without error. The runtime errors that appear are unusually arcane and difficult to track down.

Fortunately, this technique for introducing bugs is not now as common as it once was. The most obvious way to express variation of this kind is to use polymorphism:



class Doer { // platform-independent
 public:
   virtual ~Doer();
   virtual void doit() = 0;
};

class ServerDoer : public Doer { // platform-specific
   void doit();
   ServerData x;
};

class ClientDoer : public Doer { // platform-specific
   void doit();
   ClientData x;
};
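One way client code might use this hierarchy is sketched below. The makeDoer factory is a hypothetical addition, the ServerData and ClientData members are omitted, and doit returns a string (rather than nothing) only to make the chosen implementation visible. The sketch uses std::unique_ptr, so it assumes C++11 or later.

```cpp
#include <memory>
#include <string>

class Doer {                         // platform-independent
 public:
    virtual ~Doer() {}
    virtual std::string doit() = 0;
};

class ServerDoer : public Doer {     // platform-specific
 public:
    std::string doit() { return "server"; }
};

class ClientDoer : public Doer {     // platform-specific
 public:
    std::string doit() { return "client"; }
};

// Hypothetical factory: the only place that knows which concrete
// class is in use. Callers hold a Doer and never name the subclass.
std::unique_ptr<Doer> makeDoer( bool onServer ) {
    if( onServer )
        return std::unique_ptr<Doer>( new ServerDoer );
    return std::unique_ptr<Doer>( new ClientDoer );
}
```

Because every translation unit sees the same definition of Doer, the mismatched-definition problem of the #if version cannot arise. The variation is confined to the derived classes and the factory.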

Reality Check

We've looked at some fairly simple manifestations of attempts to make a single source represent different programs. From these simple examples, it looks like a straightforward task to reengineer the source code to be more maintainable through application of the idioms and patterns illustrated above.

Unfortunately, the reality is often far worse and far more complex. Typically, the source is not parameterized by a single symbol (like NDEBUG) but is subject to several symbols, each of which may take on a number of values; these symbols may also be used in combination. As we illustrated above, each combination of symbols and symbol values gives rise to an essentially different application with different required, abstract behaviors. From a practical standpoint, even if it's possible to tease apart the separate applications defined by these symbols, reengineering will unavoidably result in a change in behavior of the application on at least one platform.

However, such reengineering eventually becomes necessary when the abstract meaning of the program can no longer easily be determined and when many hundreds of compilations with different symbol settings are required simply to determine whether the source code is syntactically correct. It's far better to avoid the use of #if for versioning of source.
