Imperfect C++: Practical Solutions for Real-Life Programming, by Matthew Wilson

	Chapter 15. Values

15.1. NULL—The Keyword That Wasn't

In the C programming language, the macro NULL, located in stddef.h, is used to represent a null pointer. By using this specific symbol to denote a null pointer, its meaning is obvious to the reader. Furthermore, because it is defined to be of type void*, it can help to avoid potential problems. Consider the following C API for generating and managing some kind of identified tokens:



struct Token *lookup(char const *tokenId);

void fn()
{
  struct Token *token = lookup(NULL);

  /* use token here */
}

lookup() will return a matching existing token, or will create a new one if tokenId is the null pointer. Because the tokenId parameter is of type char const* (to which void* is convertible in C), the author of the client code (who may or may not be the author of the Token API) passes NULL to create a new Token. Suppose that once the API reached a certain level of maturity, its author decided to speed up the system by changing the lookup mechanism to be based on an integer index.^[1] Naturally, the index would be 0-based, with the sentinel value of -1 used to request a new token. Now we get a compile error (or at least a warning) if we use NULL:

^[1] In the ideal world, such breaking changes are never made; a new function or a new API is born. However, the real world is seldom ideal.



struct Token *lookup(int tokenId);

void fn()
{
  struct Token *token = lookup(NULL); // error

  /* use token here */
}

This is great news. Had the client code author used 0 instead of NULL, we could be heading for all kinds of trouble. The 0 would be happily converted into an int as it was to char const*, and fn would do who-knows-what as token receives an invalid or null pointer rather than a newly created Token instance.

In C++, the picture is very different. Ironically, it is C++'s tightening of the type system over C that causes it to actually lose type strength in such cases. Because in C a type void* can be implicitly converted to any other pointer type, it is feasible to define NULL as ((void*)0) and achieve interconvertibility with any pointer type. However, because C++, for good reasons, does not allow an implicit conversion from void* to any other pointer type without use of a cast (usually static_cast), it means that NULL can no longer be usefully defined as it is in C.

In C++, 0 may be converted to any pointer type, so the C++ standard (C++-98: 18.1;4) stipulates that "The macro NULL is an implementation-defined C++ null pointer constant. . . . Possible definitions include 0 and 0L, but not (void*)0". But 0 can, of course, also be converted to any integral type (including wchar_t and bool) and to any floating-point type, which means that the type checking we saw in the C version would not happen if it were compiled in C++, and we'd be heading down Trouble Boulevard with not even a warning of our impending doom.
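To make the loss of checking concrete, the following sketch places both generations of the lookup API side by side. The namespaces v1 and v2, the function name requests_new_token, and the function bodies are assumptions made purely for illustration; they are not part of the Token API described above. The literal 0 is used rather than NULL because the exact definition of NULL varies between implementations.

```cpp
#include <cassert>

namespace v1
{
  // Original API style: a null tokenId requests a new token.
  inline bool requests_new_token(char const *tokenId)
  {
    return 0 == tokenId;
  }
}

namespace v2
{
  // Revised API style: -1 requests a new token; 0 is now a valid index.
  inline bool requests_new_token(int tokenId)
  {
    return -1 == tokenId;
  }
}

// Client code written against v1 and never updated for v2.
inline void client_demo()
{
  assert(v1::requests_new_token(0));   // 0 converts to a null char const*
  assert(!v2::requests_new_token(0));  // 0 is silently accepted as an index
}
```

Both calls compile without complaint in C++, yet only the first still means "create a new token".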

Though it does confer useful meaning to code maintainers,^[2] it is clear that using NULL instead of 0 in C++ is a bit like putting your head under the sheets to keep the monsters away; hence nearly everywhere you look the advice is to use 0 rather than the "arcane" NULL. But apart from not being lulled into a false sense of (pointer) security, this is not really much better, and you can end up in a similar mess because the literal 0 is always preferentially matched to an int. Imagine that you have a string class like the following:

^[2] Except where it is erroneously used for integers, in which case it is positively harmful



class String
{
public:
  explicit String(char const *s);
};

Unlike the standard library string, you decide to make the constructor have valid semantics when passed a null pointer, so you expect to see expressions such as



String  s(0);

All well so far. However, you then decide to add a constructor to initialize the String's underlying storage based on an estimate of the number of characters that may be used.



class String
{
public:
  explicit String(char const *s);
  explicit String(int cch, char chInit = '\0');
};

Without a change to your client code, or to the function it was originally calling, a recompilation will result in an entirely different constructor being called. That can't be good, surely? Perversely, if you change the type from int to size_t (or short, or long, or any built-in other than int), the compiler will no longer have a preference for either conversion, and you will get an ambiguity error. It seems like C++ has managed to get hobbled with respect to null pointers such that it is less safe than C.
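The silent change of constructor can be observed with a small sketch. The Ctor enumeration, the ctor() accessor, and the two helper functions are hypothetical additions whose only purpose is to record which overload the compiler selected.

```cpp
#include <cassert>

// Hypothetical String that records which constructor was selected.
class String
{
public:
  enum Ctor { from_pointer, from_count };

  explicit String(char const *) : m_ctor(from_pointer) {}
  explicit String(int, char = '\0') : m_ctor(from_count) {}

  Ctor ctor() const { return m_ctor; }

private:
  Ctor m_ctor;
};

inline String::Ctor ctor_chosen_for_literal_zero()
{
  String s(0);  // meant as a null pointer, but 0 is an exact match for int
  return s.ctor();
}

inline String::Ctor ctor_chosen_for_cast_zero()
{
  String s(static_cast<char const *>(0));  // the cast restores the pointer overload
  return s.ctor();
}
```

Only the explicitly cast zero reaches the char const* constructor; the bare literal now selects the (int, char) overload.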

Imperfection: C++ needs a null keyword, which can be assigned to, and equality compared with, any pointer type, and with no nonpointer types.

So what's to be done? In [Dewh2003] Steve Dewhurst states, "[T]here is no way to represent a null pointer directly in C++." He also says that use of NULL marks one as being "hopelessly démodé." Well, I always like a challenge. This being a book about how to get what you want from the language, and make the compiler your best friend, I do of course have a remedy.

We want a fully fledged null pointer keyword, with the semantics described in the imperfection above. The solution, unsurprisingly, relies on templates.

Alas, a good part of the solution was already spelled out, a good five years previously, in Scott Meyers's Effective C++, Second Edition [Meye1998], which I own, but had not read.^[3] I only came across it while checking on something else during the writing of this book.

^[3] I'd read a friend's copy of the first edition, and purchased it on the strength of that. I don't recall the NULL issue being in the first edition, but of course it may have been, making me a witless plagiarist, as well as an incompetent researcher.

Once the solution dawned on me—a member template conversion operator—I was almost there with the first attempt:



struct NULL_v
{
// Conversion
public:
    template <typename T>
    operator T *() const
    {
        return 0;
    }
};

Now we can write:

String    s((NULL_v())); // extra parentheses, lest this be parsed as a function declaration

Because NULL_v is a nontemplate class that has a member-template conversion operator, it can be applied anywhere, without any qualification or restriction. Now it is entirely in the purview of the compiler to resolve the correctness of its application, which is what we're after. (Note that it has to be given an operator T *() const, in order to force conversions only to pointer types. If it was just operator T() const it would convert to numeric types as well, and we'd have a very elegant mechanism for doing precisely nothing.)
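As a rough check of that claim, the sketch below pairs the first-attempt NULL_v with a String that has both constructors. The Ctor enumeration, the ctor() accessor, and the helper functions are hypothetical, and the std::is_convertible checks assume a C++11 (or later) compiler; they are not part of the original technique.

```cpp
#include <cassert>
#include <type_traits>  // std::is_convertible (C++11), used only for the checks

struct NULL_v
{
  template <typename T>
  operator T *() const
  {
    return 0;
  }
};

// Hypothetical String that records which constructor was selected.
class String
{
public:
  enum Ctor { from_pointer, from_count };

  explicit String(char const *) : m_ctor(from_pointer) {}
  explicit String(int, char = '\0') : m_ctor(from_count) {}

  Ctor ctor() const { return m_ctor; }

private:
  Ctor m_ctor;
};

inline String::Ctor ctor_chosen_for_null_v()
{
  String s((NULL_v()));  // extra parentheses: otherwise this declares a function
  return s.ctor();
}

inline bool null_v_converts_to_int()
{
  return std::is_convertible<NULL_v, int>::value;
}

inline bool null_v_converts_to_double_pointer()
{
  return std::is_convertible<NULL_v, double *>::value;
}
```

Because T* cannot be deduced as int, the (int, char) constructor is not even a candidate, so adding it no longer changes the meaning of the client code.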

So we have our "way to represent a null pointer...in C++". The picture is far from complete, though. There are two problems. First, while it is possible to write expressions such as



double *dp;

if(dp == NULL_v())
{}

converse expressions such as the following will fail



if(NULL_v() == dp)
{}

This is where the modest originality in my solution begins. The treatment so far is incomplete, since it only works in statements and on the right-hand side of expressions. We see in section 17.2 that in conditional expressions, we should prefer rvalues on the left-hand side in order to prevent erroneous assignment.

In fact, there is a surprising amount of equivocation between the compilers on which this was tested. (This, in and of itself, suggested more work was needed.) Some work fine with both expressions, some with neither.

To fully support these comparisons for the broad spectrum of compilers, we must expand the definition of NULL_v, to include the equals() method and four free functions. It also includes a pointer-to-member conversion operator.

Listing 15.1.



struct NULL_v
{
// Construction
public:
  NULL_v()
  {}

// Conversion
public:
  template <typename T>
  operator T *() const
  {
    return 0;
  }

  template <typename T2, typename C>
  operator T2 C::*() const
  {
    return 0;
  }

  template <typename T>
  bool equals(T const &rhs) const
  {
    return rhs == 0;
  }

// Not to be implemented
private:
  void operator &() const;
  NULL_v(NULL_v const &);
  NULL_v &operator =(NULL_v const &);
};

template <typename T>
inline bool operator ==(NULL_v const &lhs, T const &rhs)
{
  return lhs.equals(rhs);
}

template <typename T>
inline bool operator ==(T const &lhs, NULL_v const &rhs)
{
  return rhs.equals(lhs);
}

template <typename T>
inline bool operator !=(NULL_v const &lhs, T const &rhs)
{
  return !lhs.equals(rhs);
}

template <typename T>
inline bool operator !=(T const &lhs, NULL_v const &rhs)
{
  return !rhs.equals(lhs);
}

The equals() method compares the rhs to 0, and is called in the two overloaded operator ==() free template functions. These two functions facilitate the two expressions that were troubling us. For completeness, two corresponding operator !=() functions are also provided.
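A short exercise of these operators might look like the following sketch. The NULL_v definition is condensed from Listing 15.1 so that the example stands alone; the struct D and the three helper functions are hypothetical and exist only to drive the comparisons in both operand orders.

```cpp
#include <cassert>

// Condensed from Listing 15.1: conversions, equals(), and the free operators.
struct NULL_v
{
  NULL_v() {}

  template <typename T>
  operator T *() const { return 0; }

  template <typename T2, typename C>
  operator T2 C::*() const { return 0; }

  template <typename T>
  bool equals(T const &rhs) const { return rhs == 0; }

private:  // not to be implemented
  void operator &() const;
  NULL_v(NULL_v const &);
  NULL_v &operator =(NULL_v const &);
};

template <typename T>
inline bool operator ==(NULL_v const &lhs, T const &rhs) { return lhs.equals(rhs); }
template <typename T>
inline bool operator ==(T const &lhs, NULL_v const &rhs) { return rhs.equals(lhs); }
template <typename T>
inline bool operator !=(NULL_v const &lhs, T const &rhs) { return !lhs.equals(rhs); }
template <typename T>
inline bool operator !=(T const &lhs, NULL_v const &rhs) { return !rhs.equals(lhs); }

struct D
{
  void df0() {}
};

inline bool null_pointer_compares_equal_both_ways()
{
  double *dp = NULL_v();
  return (dp == NULL_v()) && (NULL_v() == dp);
}

inline bool live_pointer_compares_unequal_both_ways()
{
  double d = 0.0;
  double *dp = &d;
  return (dp != NULL_v()) && (NULL_v() != dp);
}

inline bool member_pointers_are_supported()
{
  void (D::*unset)() = NULL_v();   // uses operator T2 C::*()
  void (D::*set)() = &D::df0;
  return (unset == NULL_v()) && (NULL_v() != set);
}
```

Each comparison resolves to one of the four free function templates, which forward to equals() regardless of which side NULL_v() appears on.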

NULL_v purports to represent the null pointer value, so the copy constructor and copy assignment operator are hidden: Since we would not assign NULL to NULL it makes no sense to allow NULL_v to be able to assign to itself. Now a default constructor is also required, since we are declaring, but not defining, the copy constructor. (Note that I've borrowed Scott's void operator &() const, since taking the address of something that is purely a value makes no sense. An excellent point!)

So we have our solution, NULL_v with operator T *() const and operator T2 C::*() const, and four (in-)equality free functions. operator T2 C::*() const handles pointers to members.



class D
{
public:
   void df0()
   {}
};

void (D::*pfn0)() = NULL_v();

There are no <, <=, and so forth operators, since ordering comparisons are meaningless for the null pointer. Now we just have to slot in NULL_v() everywhere we'd formerly used NULL. It's still not really a keyword, and unfortunately, human inertia and forgetfulness being what it is, this would likely be nothing but a nice theoretical treatment that gets put on the shelf. On cosmetic grounds alone we would never get this accepted by the C++ community: It's too many characters, and something that looks like a function call will always look inefficient, even if it's not. I know I certainly wouldn't use it!

But we're imperfect practitioners, and we are sworn to never say die. My original thought was to have a new preprocessor symbol, perhaps null. But defining a small preprocessor symbol for oneself that will be exposed in a header is fraught with danger. Without the power of a mega-large software corporation, one cannot hope to introduce anything as indistinguishable as null^[4] into the global preprocessor namespace. So what's to be done?

^[4] If I admit that I once did exactly this (with a different definition) before I'd taken off the training wheels, do you promise to not mention it until we get to the appendixes?

I don't like the idea of having a more unique macro such as null_k, and there's still no guarantee against its being defined by someone somewhere else. We need a preprocessor symbol that no one's ever going to redefine, but how can we guarantee that? Maybe it's staring us in the face.

By now you may have guessed, so I'll just lay it straight out. The STLSoft header file stlsoft_nulldef.h contains the following:^[5]

^[5] There's actually a bit more in there to ensure that #pragma message is not presented to compilers that do not understand it. Those that do include Borland, Digital Mars, Intel, and Visual C++.

Listing 15.2.





#include "stlsoft_null.h" // whence stlsoft::NULL_v
#include <stddef.h>

#ifndef NULL
# pragma message("NULL not defined. This is potentially dangerous. You are advised to include its defining header before stlsoft_nulldef.h")
#endif /* !NULL */

#ifdef __cplusplus
# ifdef NULL
#  undef NULL
# endif /* NULL */
# define NULL   stlsoft_ns_qual(NULL_v)()
#endif /* __cplusplus */

Seems kind of obvious now, doesn't it? We hijack NULL! We can safely do this because no one would ever redefine it, would they? Now we have a plug-and-play null pointer type-safety-enhancer that we can activate with nothing more than a single #include.

This file is never included in any other STLSoft header files, because that would be too arrogant an assumption. It also guards against the naivety in assuming people will respect the inviolability of NULL. But if you wish to use it, you simply include it somewhere high in your application header hierarchy, and you'll get all the null safety you can handle. I don't use it as a matter of course, but I do have it on the checklist of prerelease tests: "—a build with NULL++". Now the advice can be turned on its head, and you should prefer NULL to 0 for null pointers. It will mark you as très moderne.
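The effect of the hijacked NULL can be approximated without the STLSoft headers, as in the sketch below. It substitutes a simplified stand-in for stlsoft::NULL_v (conversion operator only) and redefines NULL by hand; the lookup body and the helper functions are assumptions for illustration, and the type-trait checks assume a C++11 (or later) compiler.

```cpp
#include <cassert>
#include <cstddef>      // defines the standard NULL, which is replaced below
#include <type_traits>  // std::is_convertible (C++11), used only for the checks

// Simplified stand-in for stlsoft::NULL_v: pointer conversion only.
struct NULL_v
{
  template <typename T>
  operator T *() const { return 0; }
};

#ifdef NULL
# undef NULL
#endif
#define NULL NULL_v()

struct Token;

// The index-based API from the motivating example (body assumed).
inline Token *lookup(int /* tokenId */)
{
  return NULL;  // NULL still initializes any pointer type
}

// lookup(NULL);  // would no longer compile: NULL_v has no conversion to int
// int n = NULL;  // likewise rejected

inline bool null_usable_as_int()
{
  return std::is_convertible<decltype(NULL), int>::value;
}

inline bool null_usable_as_pointer()
{
  return std::is_convertible<decltype(NULL), char const *>::value;
}
```

With this in place, client code that passes NULL where an integer is expected stops compiling, which is the check the C version provided.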

But you may still be skeptical, and not buying the motivating example I gave. You may be fortunate to work in a development environment whose procedures ensure that APIs are never changed in the way I described. There're two more reasons to consider NULL. The first, eminently prosaic, one is that it aids the search and replace (to null) when porting algorithms from C++ to one of its (less object-oriented) cousins. The second one is that it makes a great sniffer dog for detecting shoddy implementations. If you don't believe me, try including stlsoft_nulldef.h at the head of your application inclusion hierarchy, and wait for the fireworks. I've run it over several popular libraries—no names, no pack-drill—and I can tell you there's a lot of dodgy code out there. If people are using NULL for integral types, it makes you question what other little oversights are creeping in.

Before I conclude I should address a point Scott Meyers makes in his treatment of the subject. He states that using such a NULL is of limited use because it is something that protects the writer of the calling code, rather than the called code. But I see this as being exactly what is called for, since it is changes to libraries that one needs to protect against in this case.

I would suggest that if library code also needs to be protected in this way, it simply declares short, long, char const * versions, and declares a private int version. Given our established preference (see section 13.1) for avoiding int and going with sized types, this fits nicely. Anyway, I don't think this is a problem for saving library code from client code, rather that it is the other way round. The users of libraries can now protect themselves against changes in the libraries' public interfaces.
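One way that suggestion might look is sketched here. The Ctor enumeration, the ctor() accessor, and the helper functions are hypothetical, and the std::is_constructible check assumes a C++11 (or later) compiler.

```cpp
#include <cassert>
#include <type_traits>  // std::is_constructible (C++11), used only for the checks

// Hypothetical String whose int overload is private, trapping the literal 0.
class String
{
public:
  enum Ctor { from_pointer, from_short, from_long };

  explicit String(char const *) : m_ctor(from_pointer) {}
  explicit String(short, char = '\0') : m_ctor(from_short) {}
  explicit String(long, char = '\0') : m_ctor(from_long) {}

  Ctor ctor() const { return m_ctor; }

private:
  explicit String(int);  // declared but not defined

private:
  Ctor m_ctor;
};

// String s(0);  // would not compile: the best match is the private int overload

inline String::Ctor ctor_chosen_for_long_zero()
{
  String s(0L);  // exact match for long
  return s.ctor();
}

inline String::Ctor ctor_chosen_for_cast_zero()
{
  String s(static_cast<char const *>(0));  // exact match for char const*
  return s.ctor();
}

inline bool constructible_from_int()
{
  return std::is_constructible<String, int>::value;
}
```

A bare 0 still selects the int overload during overload resolution, but because that overload is inaccessible the call is rejected instead of silently changing meaning.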