Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 14. Arrays and Pointers

14.2. Arrays Decay into Pointers

For reasons of efficiency, convenience, and, perhaps, historical accident [Lind1994], C and C++ have a somewhat blurred distinction between pointers and arrays. A pointer is a single storage location, whose value refers to some point in the addressable memory space. An array is a contiguous block of one or more instances of a particular type. A pointer of the same type as an array's element type may be set to point to any element in the array, and dereferencing the pointer yields the same value as would indexing the array. (Actually, there's a little too much flexibility in the interconvertibility between pointers and arrays, which leads us to another problem. See section 14.6.) Hence:



int ar[5];
int *p = ar;

ar is an array of int, which has five elements. p is a pointer to int. Since an array is convertible to a pointer (C++-98: 4.2), it is perfectly legal to assign ar to p, but what that actually means is that p holds the address of the first element of ar; that is, it points to ar[0].



int *q = &ar[0];
assert(p == q);

It is also legal, and quite common, to apply the indexing operator to p, so the following two expressions are semantically equivalent:



int v1 = ar[3];
int v2 = p[3];
assert(v1 == v2);

14.2.1 Subscript Operator Commutativity

The reason that a pointer can be indexed just as well as an array has to do with the way C and C++ interpret indexing expressions. A compiler interprets the expression ar[n], at compile time, as *(ar + n) [Lind1994]. Since pointers can take part in arithmetic, p can substitute for ar to form *(p + n), and thus p[n] is valid. An interesting quirk [Dewh2003, Lind1994] is that the built-in subscript operator is commutative: since ar + n is the same as n + ar, both array and pointer subscript expressions can be written in reverse, as in n[ar] and n[p]. It's sometimes remarked [Lind1994] that this is nothing more than an amusing piece of arcana with which to confuse novices or win an obfuscated C code contest, but it does have a practical benefit in one of the most modern of C++ activities, generic programming. Indeed, as mentioned in [Dewh2003], because the reversed form works only with the built-in subscript operator, it can be used to constrain a piece of code to work only with array and pointer types (including class types that convert implicitly to a pointer), rejecting class types that overload the subscript operator, as in:



template <typename T>
void reject_subscript_operator(T const &t)
{
  sizeof(t[0]); // Compiler will balk here if T is not subscriptable
  sizeof(0[t]); // Compiler will balk here if T only has user-defined subscript operator
}

void reject_subscript_operator(void const * const)
{}

void reject_subscript_operator(void * )
{}

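The overloads for void pointers are there because it's illegal to subscript (and hence to dereference) a void pointer, const or otherwise, so the template would otherwise fail to compile for them. These functions are used to reject a user-defined type that provides its own subscript operator. Consider the code in Listing 14.1.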

Listing 14.1.



struct Pointer
{
  operator short *() const;
};

struct Subscript
{
  int operator [](size_t offset) const;
};

  void        *pv = &pv;
  void const  *pcv = pv;
  int         ai[100];
  int         *pi = ai;
  Pointer     ptr;
  Subscript   subscr;

  reject_subscript_operator(pv);
  reject_subscript_operator(pcv);
  reject_subscript_operator(ai);
  reject_subscript_operator(pi);
  reject_subscript_operator(ptr);
  reject_subscript_operator(subscr); // This one fails to compile!

You're probably wondering why we'd want to detect, and reject, a user-defined type with a subscript operator. Well, it's always nice to discover new ways to detect and enforce characteristics (see Chapter 12); generic programming's here now, and it's not going away. Looking back at our definition of NUM_ELEMENTS(), we see that by reversing the operands of its subscript expression we can reject its application to user-defined types, something the previous definition could not do.



#define NUM_ELEMENTS(x)   (sizeof((x)) / sizeof(0[(x)]))

template <typename T>
struct vect
{
  T &operator [](size_t index);
  . . .
};

vect<int>  vi;
int        ai[NUM_ELEMENTS(vi)]; // Compiler rejects when using new NUM_ELEMENTS

14.2.2 Preventing Decay

I don't wish to say that this decay from arrays into pointers is a full-fledged imperfection, as without it code would be a great deal less terse, and a lot of flexibility would be lost. Nonetheless, it can be irksome, as we note throughout this chapter and in Chapters 27 and 33.

For built-in arrays, there is nothing that can be done to prevent the assignment of an array to a pointer. When passing them to functions, one can declare the functions to take a pointer or a reference to an array, rather than a pointer to the element type, but there's still nothing preventing the code inside the function from converting the array to a pointer.

You're probably wondering why you should care. Well, the very fact that arrays do decay into pointers has led to the ubiquitous use of the decay form p = ar, rather than the more precise p = &ar[0]. While this is convenient, it can have a negative effect on coding for genericity. If you define a class type that acts, in some respects, as an array, you would usually provide subscript operators, as in



class IntArray
{
  . . .
public:
  int const &operator [](size_t offset) const;
  int       &operator [](size_t offset);
  . . .
};

This is, in general, preferable to providing an implicit conversion operator. (We talk about this more in Chapters 32 and 33.) Although this class now supports arraylike subscript syntax, it does not provide implicit conversion to a pointer. If you attempt to use such a type with code that relies on the decay form, that code will fail to compile.

There's a similar issue with random access iterators [Aust1999, Muss2001] provided by sequence containers. Since it is legal for iterators to be implemented as class types, you can get into scrapes by relying on decay form syntax. For example, the pod_vector container (see section 32.2.8) implements insertion operations via memmove(), as in:



memmove(&first[0], &last[0], . . .

If you assume, as I did myself when writing it, that the iterators are pointers, rather than types that act like pointers only to the degree prescribed by the STL Iterator concept [Aust1999, Muss2001], then you might write the following:



memmove(first, last, . . . // Iterators won't convert to void *

which does not compile when used with those standard libraries that define class-type iterators. Thus, it's worth your while to refrain from using the incorrect array-to-pointer syntax. Even when you don't run foul of such things, it's still a good way to remind yourself what's happening behind the syntax.