
Imperfect C++ Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 19.  Casts


19.7. union_cast

In [Stro1997], Bjarne Stroustrup observes that it is possible to use a union to cast between unrelated types, even where such a conversion would not be supported by any of the C++ cast operators.

Listing 19.10.


template< typename TO
        , typename FROM
        >
union union_cast
{
  union_cast(FROM from)
    : m_from(from)
  {}
  operator TO () const
  {
    return m_to;
  }
private:
  FROM  m_from;
  TO    m_to;
};



Bjarne's message is that such things are evil hackery and mark their users as naïve and/or dangerously cavalier. I couldn't agree more . . . except that sometimes it's helpful.

The two specific problems Bjarne describes are size and alignment. On some architectures the sizes of a pointer and an int differ, so conversion could result in dangerous truncation. Also, several architectures require that pointers have specific alignments, and attempting to dereference a misaligned pointer will result in something nasty, for example, a hardware exception.



long    l   = 3; // Not a likely address for a string . . .
string  *ps = union_cast<string*, long>(l); // Eek!



Using union_cast offers no more guarantees than does reinterpret_cast, and in some aspects fewer, since it will perform some casts that reinterpret_cast will reject.

So if the picture's so bad, why are we even considering using union casts? There are two reasons, both eminently prosaic. First, as was noted in section 19.3, some conversions can require more than one cast, which leads to long-winded code that is hard to read and hard to maintain. Second, some compilers warn on the use of any casts, not just implicit casts and C-style casts, which can leave the adherent to the zero-warning philosophy tearing at his (or her!) beard.

There are widely used system architectures that simply force the use of casts. The most obvious one that springs to mind is the Windows API. Each message in the system is associated with two opaque data values of type WPARAM and LPARAM, which are, on Win32, uint32_t and sint32_t, respectively. These are used to pass all manner of things around, including system object handles, C-string pointers, and pointers to C++ objects. On Win32, therefore, it's useful to define constructs that help to enhance readability and maintainability, as well as to introduce as much type safety as is possible in such circumstances. Therefore, use of specific parameterizations of union_cast can be appropriate, as in:



typedef union_cast<LPARAM, wchar_t const*>  StrW2LPARAM;
typedef union_cast<HDROP, WPARAM>           WPARAM2HDROP;



Because union_cast is such a dangerous tool, we need to take some serious measures to curb its power before we can use it in good conscience. The first one is that it is never used in its raw form. If you were to grep my source control system, you would not find a single instance of union_cast in implementation code for any products or components. The only place it will be used is in the definition of specific typedefs in the header files of technology/operating system-specific libraries, such as the two shown above. Such typedefs may reasonably be presumed to have had more consideration than a single specific instantiation lurking deep within an implementation file.
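The typedef-only discipline can be sketched portably as follows. This is an illustration rather than the library code: bitwise_cast_sketch is a hypothetical stand-in for union_cast that copies the bytes with std::memcpy instead of reading back through a union member, and the typedef names Str2UIntPtr and UIntPtr2Str are invented for the example. It assumes a C++11 compiler and a platform on which std::uintptr_t is the same size as a data pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for union_cast: same shape (explicit constructor plus
// conversion operator), but the bytes are copied with std::memcpy.
template <typename TO, typename FROM>
class bitwise_cast_sketch
{
public:
  explicit bitwise_cast_sketch(FROM from)
  {
    static_assert(sizeof(FROM) == sizeof(TO), "sizes must match");
    std::memcpy(&m_to, &from, sizeof(TO));
  }
  operator TO() const
  {
    return m_to;
  }
private:
  TO m_to;
};

// The only places the raw template is named: typedefs in a library header.
typedef bitwise_cast_sketch<std::uintptr_t, char const*>  Str2UIntPtr;
typedef bitwise_cast_sketch<char const*, std::uintptr_t>  UIntPtr2Str;

// Client code sees only the named conversions, never the raw template.
inline char const* round_trip(char const* s)
{
  std::uintptr_t const packed   = Str2UIntPtr(s);
  char const* const    unpacked = UIntPtr2Str(packed);
  assert(unpacked == s);  // the bit pattern is unchanged
  return unpacked;
}
```

A search of the client code for the raw template name should then come up empty; every conversion in use has a name that can be reviewed in one header.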

The second measure used to minimize the risk of misuse is that there are several constraints built into the cast class, as can be seen in Listing 19.11. The first of these is that the conversion types must be the same size. This removes the danger of truncation. The second constraint we would want is that the types must be POD, but since this is one of the things we use a union for, it's a done deal. For pedagogical purposes, the constraint constraint_must_be_pod() (see section 1.2.4) is applied to the two types, even though it is unnecessary: the constraint works by attempting to define a union containing the given type.

Listing 19.11.


template< typename TO
        , typename FROM
        >
union union_cast
{
  explicit union_cast(FROM from)
    : m_from(from)
  {
    // 1. Sizes must be the same
    STATIC_ASSERT(sizeof(FROM) == sizeof(TO));
    // 2. Both must be of POD type
    constraint_must_be_pod(FROM);
    constraint_must_be_pod(TO);
# if defined(ACMELIB_TEMPLATE_PARTIAL_SPECIALIZATION_SUPPORT)
    // 3. Both must be non-pointers, or must point to POD types
    typedef typename base_type_traits<FROM>::base_type
                                                 from_base_type;
    typedef typename base_type_traits<TO>::base_type
                                                 to_base_type;
    constraint_must_be_pod_or_void(from_base_type);
    constraint_must_be_pod_or_void(to_base_type);
# endif /* ACMELIB_TEMPLATE_PARTIAL_SPECIALIZATION_SUPPORT */
  }
  . . .
};



The third concern is that a union cast should not allow casts to or from pointers to class type, because this would facilitate nonchecked down or cross casts that should properly be dealt with by dynamic_cast. This is policed by applying another constraint to the base types of the cast's types. In other words, in addition to ensuring that the manipulated types are POD, we also ensure that any types that are pointers (pointers are also POD, remember; see Prologue) only point to POD (or to void). This is done by determining the base type of each and applying constraint_must_be_pod_or_void() (see section 1.2.4) to it.
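For readers with a C++11 compiler, the three constraints can be approximated with the standard <type_traits> header. The sketch below is not the book's implementation: is_pod_or_void, base_type_of, union_cast_allowed, and Widget are hypothetical names introduced for illustration, and "POD" is taken to mean trivial and standard-layout.

```cpp
#include <type_traits>

// Hypothetical helper: true for void, or for types that are both trivial and
// standard-layout (the C++11 description of a POD type).
template <typename T>
struct is_pod_or_void
{
  static const bool value =
       std::is_void<T>::value
    || (std::is_trivial<T>::value && std::is_standard_layout<T>::value);
};

// Hypothetical helper: strips one level of pointer and any cv-qualifiers,
// playing the role of base_type_traits<T>::base_type.
template <typename T>
struct base_type_of
{
  typedef typename std::remove_cv<
            typename std::remove_pointer<T>::type>::type  type;
};

// Combines the three constraints into one compile-time answer.
template <typename TO, typename FROM>
struct union_cast_allowed
{
  static const bool value =
       sizeof(FROM) == sizeof(TO)                                   // 1. same size
    && is_pod_or_void<FROM>::value && is_pod_or_void<TO>::value     // 2. both POD
    && is_pod_or_void<typename base_type_of<FROM>::type>::value     // 3. pointees are
    && is_pod_or_void<typename base_type_of<TO>::type>::value;      //    POD or void
};

struct Widget { virtual ~Widget() {} };  // hypothetical non-POD class type

static_assert( union_cast_allowed<void*, char const*>::value,
              "pointers to POD or void are accepted");
static_assert(!union_cast_allowed<Widget*, void*>::value,
              "pointers to class type are rejected");
```

A real cast class would place a static_assert on union_cast_allowed<TO, FROM>::value inside its constructor, much as Listing 19.11 places its constraints there.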

For compilers that support partial specialization, the base_type_traits template is able to deduce the base type from any type. It is a simple template with appropriate (partial) specializations, as follows:[7]

[7] The full implementations of this and the other constraints used in the book are provided on the CD.

Listing 19.12.


template <typename T>
struct base_type_traits
{
  enum { is_pointer       =   0 };
  enum { is_reference     =   0 };
  enum { is_const         =   0 };
  enum { is_volatile      =   0 };
  typedef T  base_type;
  typedef T  cv_type;
};

. . .  // Various cv & ptr/ref specialisations

template <typename T>
struct base_type_traits<T const volatile *>
{
  enum { is_pointer       =   1 };
  enum { is_reference     =   0 };
  enum { is_const         =   1 };
  enum { is_volatile      =   1 };
  typedef T  base_type;
  typedef T const volatile  cv_type;
};



We've now dealt with most of the problems raised by the union_cast, and the misuses covered so far are all caught at compile time. The last issue is that the cast can be used to produce a misaligned pointer (which reinterpret_cast can also do). Naturally this cannot be checked at compile time, but we can use the base_type_traits to help us here as well. In my implementation there's an assertion that is tested when the from-type is nonpointer and the to-type is pointer, which ensures that the from-value is aligned on the size of the to-type's base type. In other words, if the to-type is uint64_t (const)(volatile)*, then the from-value must be an integral multiple of eight. You may choose to have an exception thrown in your own implementation.
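A minimal sketch of such a runtime check follows. The names is_multiple_of_size and pointer_from_integer are hypothetical, the integer is assumed to arrive as a std::uintptr_t, and the sketch does not handle a void base type, for which sizeof is not available.

```cpp
#include <cassert>
#include <cstdint>

// True when value is an integral multiple of the size of the pointee type,
// which is the alignment rule described in the text.
template <typename Pointee>
bool is_multiple_of_size(std::uintptr_t value)
{
  return 0 == (value % sizeof(Pointee));
}

// Integer-to-pointer conversion guarded by the check. The assertion fires in
// debug builds only; an implementation preferring exceptions would throw here.
template <typename Pointee>
Pointee* pointer_from_integer(std::uintptr_t value)
{
  assert(is_multiple_of_size<Pointee>(value));
  return reinterpret_cast<Pointee*>(value);
}
```

With Pointee as std::uint64_t, a value of 16 passes the check and a value of 3 does not.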

With all four of these measures thrown in, it's arguable that union_cast is safer than reinterpret_cast, and the only cost in usability is that pointers to or from class types cannot be used. Since doing so represents the greatest danger to C++ programmers, I think this is a boon: the cast class handles (and validates) the benign conversions, leaving the scary ones to be done explicitly by the programmer.

Note that it does not prevent you casting from, say, char const* to wchar_t*, since it's the use of more than one cast we're trying to encapsulate. This represents a danger, hence my strict rule to only use it via typedefs, for example, WPARAM2HDROP, since creating and using such typedefs can be reasonably assumed to have been done in a thoughtful manner.[8]

[8] There's nothing much else that we can do to guard against potential Machiavellian behavior of programmers. At some point we have to rely on professionalism.

