Imperfect C++: Practical Solutions for Real-Life Programming
By Matthew Wilson
Chapter 20.  Shims


20.6. Composite Shim Concepts

Definition: Composite Shims

Composite shims are a combination of two or more fundamental shim concepts.

The names of composite shims do not have a fixed convention, but rather take a name indicative of their purpose.

Composite shims obey the most restrictive rule, or combination of rules, from their constituent shims.


We'll illustrate the composite shim concept with the first, and most widely used, composite shim: the access shim.

20.6.1 Access Shims

Definition: Access Shims

Access shims are a combination of Attribute and Conversion shims, which are used to access the values of the instances of the types for which they are defined.

The values may have to be synthesized via conversion shims.

The values returned from access shims may be provided by intermediate temporary objects, so must only ever be used from within the expression containing the shim.


In order to illustrate the access shims concept, we'll need to revisit our current_directory_scope class. To answer the criticisms we had earlier, we now add another constructor with the following form:

Listing 20.5.


template<typename C, . . .>
class current_directory_scope
{
public:
  explicit current_directory_scope(C const *dir)
  {
    init_(c_str_ptr(dir));
  }
  template <typename S>
  explicit current_directory_scope(S const &dir)
  {
    init_(c_str_ptr(dir));
  }
private:
  void init_(C const *dir);
  . . .



Both constructors[10] use the c_str_ptr[11] shim to convert their argument to C const*, which is passed to the init_() method, so the class can be used with any string type for which a c_str_ptr shim is defined and accessible. Given the following c_str_ptr shim definitions:

[10] You might think that only the second, template, constructor should be required. In an ideal world that is so, but some compilers get confused. In practice, it is necessary to provide both for some compilers, and only the second one for others, by resorting to the preprocessor.

[11] The c_str_ptr shim was originally called just c_str, but it had to be changed since users found it too confusing, and it was difficult to search for it and/or the string class method of the same name. Hindsight suggests there could still be a better and yet similarly succinct name, perhaps c_string, but we're stuck with it now.

Listing 20.6.


inline char const *c_str_ptr(char const *s)
{
  return s;
}
inline wchar_t const *c_str_ptr(wchar_t const *s)
{
  return s;
}
template <typename T>
inline T const *c_str_ptr(std::basic_string<T> const &s)
{
  return s.c_str();
}
template <typename T>
inline T const
    *c_str_ptr(stlsoft::basic_frame_string<T> const &s)
{
  return s.c_str();
}



we are able to use the current_directory_scope class with any of the following types:



char const                        *dir1 = "/";
std::basic_string<char>           dir2("/");
stlsoft::basic_frame_string<char> dir3("/");

current_directory_scope<char>     scope1(dir1); // ok
current_directory_scope<char>     scope2(dir2); // ok
current_directory_scope<char>     scope3(dir3); // ok
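
As a minimal, self-contained sketch of the same idea, the fragment below passes the result of c_str_ptr straight into a function expecting a C-string, much as the constructors above pass it to init_(). The names length_of and path_length are hypothetical helpers introduced only for illustration; only the two shims mirror Listing 20.6.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Attribute-style shims: each simply exposes an existing C-string.
inline char const *c_str_ptr(char const *s)
{
  return s;
}
inline char const *c_str_ptr(std::string const &s)
{
  return s.c_str();
}

// Hypothetical consumer that, like init_(), only understands C-strings.
inline std::size_t length_of(char const *s)
{
  assert(0 != s);
  return std::strlen(s);
}

// Hypothetical generic helper: accepts any type for which a c_str_ptr
// shim is visible, and uses the shim result within a single expression.
template <typename S>
std::size_t path_length(S const &path)
{
  return length_of(c_str_ptr(path));
}
```

With these definitions, path_length("/usr") and path_length(std::string("/usr")) go through the same generic code; supporting another string type would only require one more c_str_ptr overload.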



20.6.2 Return Value Lifetime

The keen-eyed among you may have spotted that the four c_str_ptr shims we just looked at are all attribute shims. Why can't c_str_ptr simply be defined as an attribute shim? To understand this we need to look at a shim with a nontrivial implementation, whose proper use relies on adherence to the rule regarding return values.

One of the reasons that the standard library's String model (C++-98: 21.3) does not stipulate a conversion operator is to facilitate storage of the contained sequence without a null terminator. This is a well-known idea that can be found in a variety of libraries. One such is the Win32 Security API's LSA_UNICODE_STRING type, which is defined as follows:



typedef struct _LSA_UNICODE_STRING
{
  unsigned short  Length;
  unsigned short  MaximumLength;
  wchar_t         *Buffer;
} LSA_UNICODE_STRING;



The extent of the string described by the structure is defined not by a null-terminating character, but by the Length member. In fact, Buffer may not contain a null terminator at all. Given this, how do we access a C-style string in a generic fashion (i.e., via c_str_ptr)?[12]

[12] We can't simply write a terminating null character into the buffer because we have no idea what other code might be doing with it. Even if we did not care about that, sometimes we'll have to operate with const instances. And even if we were reckless enough to const_cast that away, sometimes there will be no spare space to accommodate any more characters (i.e., Length == MaximumLength); that's the whole point of this type.

The answer lies in the use of a proxy class, an instance of which is returned by the c_str_ptr shim function for this type, as in:



inline c_str_ptr_LSA_UNICODE_STRING_proxy
    c_str_ptr(LSA_UNICODE_STRING const &s)
{
  return c_str_ptr_LSA_UNICODE_STRING_proxy(s);
}



To yield a C-style string, the proxy class must allocate and fill an appropriate character buffer and, in order to support the same syntax as the other c_str_ptr shims, provide an implicit conversion operator. A simplified definition of this class is shown in Listing 20.7.

Listing 20.7.


class c_str_ptr_LSA_UNICODE_STRING_proxy
{
public:
  typedef c_str_ptr_LSA_UNICODE_STRING_proxy  class_type;
public:
  explicit c_str_ptr_LSA_UNICODE_STRING_proxy(
                                    LSA_UNICODE_STRING const &s)
    : m_buffer(new WCHAR[1 + s.Length])
  {
    wcsncpy(m_buffer, s.Buffer, s.Length);
    m_buffer[s.Length] = L'\0';
  }
  ~c_str_ptr_LSA_UNICODE_STRING_proxy()
  {
    delete [] m_buffer;
  }
  operator LPCWSTR () const
  {
    return m_buffer;
  }
private:
  LPWSTR  m_buffer;
// Not to be implemented
private:
  void operator =(class_type const &rhs);
};



When c_str_ptr is applied to an instance of LSA_UNICODE_STRING, the string contents are copied into a null-terminated buffer. The shim's value is retrieved via the implicit conversion operator. In this way, this c_str_ptr and the proxy class work together to synthesize and provide access to a C-style string, which is the expected type obtained from the c_str_ptr shim.

Now we can see the reason for the restriction regarding the use of its return value. The proxy object exists only for the lifetime of the expression within which the shim is used. If the return value of the shim were to be saved and used outside of this expression, then undefined behavior would ensue.



LSA_UNICODE_STRING  lsa = . . .;
wchar_t const       *s  = c_str_ptr(lsa);
wputs(s);                                 // Danger, Will Robinson!



We look at the issue of Return Value Lifetime in detail in Chapter 31.
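
One safe pattern, sketched here under stated assumptions, is to consume the shim's result entirely within a single expression, for example by copying it into a std::wstring. The counted_wstring type is a hypothetical, portable stand-in for LSA_UNICODE_STRING (its length member counts characters), and counted_wstring_proxy and to_wstring_copy are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical counted string: the extent is given by length, and the
// buffer is not required to be null-terminated.
struct counted_wstring
{
  std::size_t     length;  // number of characters in buffer
  wchar_t const  *buffer;
};

// Proxy owning a null-terminated copy. Holding a std::wstring (rather
// than a raw pointer) keeps the proxy safely copyable on return.
class counted_wstring_proxy
{
public:
  explicit counted_wstring_proxy(counted_wstring const &s)
    : m_copy((assert(0 != s.buffer), s.buffer), s.length)
  {}
  operator wchar_t const *() const
  {
    return m_copy.c_str();
  }
private:
  std::wstring m_copy;
};

// Conversion-style shim: synthesizes the C-string via the proxy.
inline counted_wstring_proxy c_str_ptr(counted_wstring const &s)
{
  return counted_wstring_proxy(s);
}

// Safe use: the temporary proxy lives until the end of the full
// expression, by which time its characters have been copied.
inline std::wstring to_wstring_copy(counted_wstring const &s)
{
  return std::wstring(c_str_ptr(s));
}
```

The pointer obtained from the proxy never outlives the expression that created it, so the rule on return values is respected; the price is one extra copy of the characters.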

20.6.3 Generalized Type Manipulation

By now you hopefully see the power of shims, and how easy they usually are to implement. We're now able to work with any kind of string type in a consistent and generic fashion. This is extremely useful stuff. But it's possible to extend this a lot further. Let's look at a different scenario.

UNIX provides two common ways of inspecting file-system contents. The opendir() API opens a directory for enumeration via its readdir() function. This function returns a pointer to a dirent structure, which is required to provide a single member d_name, a character buffer containing the name of the current entry. The other API is based on the very powerful glob() function, which returns to its caller an array of pointers to entries that match a given search pattern.

UNIXSTL[13] provides wrapper sequence classes for these two APIs in the form of the readdir_sequence and glob_sequence classes. The value_types for them are, respectively, struct dirent const* and char const*. Writing code for one or the other, including functors and algorithms, is straightforward. But because of the difference in their value type, writing code that works with both can be a real pain.

[13] The STLSoft subproject that maps UNIX APIs to STL concepts.

Let's imagine that we want to write an algorithm, sub_dir_count, that can count the number of subdirectories of a given directory in the file system and also record their names in a container. It might look something like this:

Listing 20.8.


bool is_dir(char const *entry); // Deduces entry type

template< typename S
        , typename C
        >
size_t sub_dir_count(S const &s, C &c)
{
  typedef typename S::const_iterator  const_it_t;
  const_it_t  begin   =   s.begin();
  const_it_t  end     =   s.end();
  size_t      cDirs   =   0;
  for(; begin != end; ++begin)
  {
    if(is_dir(*begin))
    {
      c.push_back(*begin);
      ++cDirs;
    }
  }
  return cDirs;
}



It could be used like this:



void process_entry(string const &s);

glob_sequence     entries("/");
vector<string>    directories;

size_t cDirs = sub_dir_count(entries, directories);

// size_t may be wider than unsigned, so convert explicitly for %u
printf("Number of dirs = %u\n", static_cast<unsigned>(cDirs));
for_each(directories.begin(), directories.end(), process_entry);



This compiles without any complaint for glob_sequence because its value type is char const*. However, it does not compile for readdir_sequence. What can we do about it?

We could create a separate version of the algorithm for readdir_sequence, but this is Nightmare on Maintenance Street. Furthermore, it requires programming effort in linear proportion to the number of mutually incompatible types we require such an algorithm to support.

Wouldn't it be far nicer to centralize the generality and rewrite it only once? By using the c_str_ptr access shim, we can do just that and can create a version that will work for any type:

Listing 20.9.


template< typename S
        , typename C
        >
size_t sub_dir_count(S const &s, C &c)
{
  . . .
  for(; begin != end; ++begin)
  {
    if(is_dir(c_str_ptr(*begin)))
    {
      c.push_back(c_str_ptr(*begin));
      ++cDirs;
    }
  }
  . . .



It can now work with both glob_sequence and readdir_sequence, by virtue of the c_str_ptr shim for struct dirent.



inline char const *c_str_ptr(struct dirent const *d)
{
  return (NULL != d) ? d->d_name : "";
}
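
To see the two value types handled by one algorithm without needing a UNIX file system, the following sketch substitutes hypothetical stand-ins: fake_dirent plays the role of struct dirent, and is_dir merely checks for a trailing '/' rather than querying the file system. Ordinary std::vector containers stand in for glob_sequence and readdir_sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for struct dirent, so the sketch is portable.
struct fake_dirent
{
  char d_name[32];
};

// Access shims for the two value types.
inline char const *c_str_ptr(char const *s)
{
  return s;
}
inline char const *c_str_ptr(fake_dirent const *d)
{
  return (0 != d) ? d->d_name : "";
}

// Hypothetical predicate: a name ending in '/' counts as a directory.
inline bool is_dir(char const *entry)
{
  assert(0 != entry);
  std::size_t const len = std::strlen(entry);
  return 0 != len && '/' == entry[len - 1];
}

// One algorithm for every sequence whose value type has a c_str_ptr shim.
template <typename S, typename C>
std::size_t sub_dir_count(S const &s, C &c)
{
  std::size_t cDirs = 0;
  for(typename S::const_iterator it = s.begin(); it != s.end(); ++it)
  {
    if(is_dir(c_str_ptr(*it)))
    {
      c.push_back(c_str_ptr(*it));
      ++cDirs;
    }
  }
  return cDirs;
}
```

The same sub_dir_count instantiates for a std::vector<char const*> and for a std::vector<fake_dirent const*>; the only type-specific code is the pair of shims.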



20.6.4 Efficiency Concerns

In the sub_dir_count function, there were actually two calls to the c_str_ptr shim for each entry that turns out to be a directory. For types such as struct dirent, where the shim acts as an attribute shim, the cost of obtaining the C-string pointer is very low, and the call will be optimized away by most compilers. However, where the shim acts as a conversion shim, as with LSA_UNICODE_STRING, each call involves a nontrivial amount of processing. Although any single call costs about the same as converting the string by hand, repeating the conversion hardly represents efficient coding. What we can do is rewrite the algorithm.

Listing 20.10.


template< typename CH
        , typename C
        >
size_t record_if_dir(CH const *entry, C &c)
{
  return is_dir(entry) ? (c.push_back(entry), 1) : 0;
}

template< typename S
        , typename C
        >
size_t sub_dir_count(S const &s, C &c)
{
  . . .
  for(; begin != end; ++begin)
  {
    cDirs += record_if_dir(c_str_ptr(*begin), c);
  }
  return cDirs;
}



Now the shim is invoked only once per entry, in the call to record_if_dir(), within which the raw C-string is used twice, with maximal efficiency.
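
The saving can be made visible with a small instrumented sketch. Everything here beyond the shape of sub_dir_count and record_if_dir is an assumption made for illustration: counted_string is a hypothetical type whose shim must synthesize a C-string, g_conversions counts how often that happens, and is_dir just tests for a trailing '/'. Note that record_if_dir takes char const* directly in this sketch, because template argument deduction does not consider a proxy's conversion operator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical counted string; buffer is not null-terminated.
struct counted_string
{
  std::size_t  length;
  char const  *buffer;
};

static int g_conversions = 0;  // instrumentation for this sketch only

// Proxy that synthesizes a null-terminated copy (the costly step).
class counted_string_proxy
{
public:
  explicit counted_string_proxy(counted_string const &s)
    : m_copy(s.buffer, s.length)
  {
    ++g_conversions;
  }
  operator char const *() const { return m_copy.c_str(); }
private:
  std::string m_copy;
};

inline counted_string_proxy c_str_ptr(counted_string const &s)
{
  return counted_string_proxy(s);
}

// Hypothetical predicate: a name ending in '/' counts as a directory.
inline bool is_dir(char const *entry)
{
  assert(0 != entry);
  std::size_t const len = std::strlen(entry);
  return 0 != len && '/' == entry[len - 1];
}

// Uses the raw C-string twice, but it was synthesized only once.
template <typename C>
std::size_t record_if_dir(char const *entry, C &c)
{
  return is_dir(entry) ? (c.push_back(entry), 1u) : 0u;
}

template <typename S, typename C>
std::size_t sub_dir_count(S const &s, C &c)
{
  std::size_t cDirs = 0;
  for(typename S::const_iterator it = s.begin(); it != s.end(); ++it)
  {
    // The temporary proxy outlives the call to record_if_dir().
    cDirs += record_if_dir(c_str_ptr(*it), c);
  }
  return cDirs;
}
```

Running sub_dir_count over a sequence of counted_string values should leave g_conversions equal to the number of entries, whereas the two-call form of the loop would convert each directory entry twice.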

