Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 14. Arrays and Pointers

14.1. Don't Repeat Yourself

In The Pragmatic Programmer, Andrew Hunt and David Thomas describe their DRY, or Don't Repeat Yourself, principle. In essence, this stipulates that you should only define anything once, since more than one definition will inevitably lead to inconsistencies when one definition is updated without the other(s). This is a basic code quality measure, and can be readily seen in the definition and manipulation of arrays:



char  ar[23];

strnset(ar, ' ', 23);

If we change the dimension of ar without making exactly the same change to the third argument in the call to strnset(), then the program could fail to fill the entire array or, possibly worse, overwrite objects other than the array. The usual advice in such situations is to declare a constant (via #define in C, or const in C++) from which both points are derived, so that only a single change is needed,^[1] as in:

^[1] Actually, there are two if you are, like the example shows, providing C and C++ compatible forms, as is often the case in library headers. Notwithstanding this, the principle of one change is well founded.



#ifdef __cplusplus
 const size_t DIM_A = 23;
#else
# define DIM_A        (23)
#endif /* __cplusplus */

char  ar[DIM_A];

strnset(ar, ' ', DIM_A);

Now a change to the size of ar is effected by a change to DIM_A, which will be reflected in the argument passed to strnset() and in any other places where the constant is used. This is a good solution, but it is still vulnerable to erroneous changes in the code. Perhaps the maintainer wants to add space for a null terminator, and mistakenly changes the wrong line:



char  ar[DIM_A + 1];

strnset(ar, ' ', DIM_A); // What about ar[DIM_A] ?

Naturally code reviews can pick this up, but as we all know, reviews are conducted far less regularly than they should be.^[2] It would be really nice if arrays in C++ had, as they do in other languages, a length property, as in:

^[2] In any case, if you let ten reviewers review the same piece of code, they will come up with ten different sets of problems. So the problem is not just frequency, but also the "findability" of problems like these.



char  ar[DIM_A];

strnset(ar, ' ', ar.length);

C and C++ have the sizeof() operator, but this returns only the size in bytes. In this case we could use sizeof() because sizeof(char) == sizeof(byte) (see section 13.1). However, if the example used wchar_t and wcsnset(), then sizeof(ar) would be two or four times the number of elements in ar (depending on the number of bytes in a wchar_t), so wcsnset() would be told to act on that many elements and could write well past the end of the array. What we need is an operator that will yield the number of elements in an array, rather than the number of bytes.
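The difference between a byte count and an element count can be seen in a small sketch. It is an illustration only: DIM_W and fill_wide() are hypothetical names, and the standard std::wmemset() is used as a stand-in for the non-standard wcsnset().

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

const std::size_t DIM_W = 23;   // hypothetical dimension, mirroring DIM_A

// Hypothetical helper: fills a DIM_W-element wide-character array with
// spaces and returns the number of elements written.
std::size_t fill_wide()
{
    wchar_t war[DIM_W];

    // sizeof yields the size in bytes, not the number of elements.
    assert(sizeof(war) == DIM_W * sizeof(wchar_t));

    // Dividing by the element size recovers the element count.
    const std::size_t count = sizeof(war) / sizeof(wchar_t);

    // std::wmemset() is a standard stand-in for wcsnset() here.
    std::wmemset(war, L' ', count);
    assert(war[0] == L' ' && war[count - 1] == L' ');

    return count;
}
```

Note that the element type wchar_t is still written out by hand in the division, so this form repeats information that the declaration of war already carries.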

Imperfection: C and C++ do not provide a dimensionof() operator (for array types).

This is a well-trodden path, and the classic^[3] solution (that works for C and C++) takes a macro form, as with the NUM_ELEMENTS() macro,^[4] which is defined as:

^[3] Until recently many compilers were unable to provide the newer form, which we meet in section 14.3.

^[4] This is the only bit of code that survives from my postgraduate research days in the early nineties. Everything else has been jettisoned in embarrassment long ago. I'm given to understand that the Solaris kernel development uses pretty much the same thing.



#define NUM_ELEMENTS(x)   (sizeof((x)) / sizeof((x)[0]))

The number of elements is calculated by dividing the total number of bytes by the number of bytes in a single element. Our example becomes:



char  ar[DIM_A];

strnset(ar, ' ', NUM_ELEMENTS(ar));

This is a workable general solution, but it still has a flaw, which we deal with in section 14.3, after a brief detour.
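As a hedged illustration of how the macro follows a change to the declaration, the sketch below revisits the earlier scenario in which the maintainer adds room for a null terminator. The function names fill_with_spaces() and count_wide() are hypothetical, and the standard std::memset() is assumed as a stand-in for the non-standard strnset().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#define NUM_ELEMENTS(x)   (sizeof((x)) / sizeof((x)[0]))

const std::size_t DIM_A = 23;

// Hypothetical helper: fills a local array with spaces and returns the
// number of elements that were filled.
std::size_t fill_with_spaces()
{
    char ar[DIM_A + 1];   // the maintainer's change: room for a terminator

    // The count is derived from ar itself, so it follows the declaration
    // without a second edit.
    const std::size_t n = NUM_ELEMENTS(ar);

    std::memset(ar, ' ', n);   // std::memset() stands in for strnset()
    assert(ar[0] == ' ' && ar[n - 1] == ' ');

    return n;
}

// The macro counts elements, not bytes, whatever the element type.
std::size_t count_wide()
{
    wchar_t war[DIM_A];

    return NUM_ELEMENTS(war);
}
```

Here fill_with_spaces() covers all DIM_A + 1 elements, and count_wide() yields DIM_A regardless of the size of a wchar_t.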