Imperfect C++: Practical Solutions for Real-Life Programming, by Matthew Wilson
Chapter 20. Shims
20.6. Composite Shim Concepts
We'll illustrate the composite shim concept with the first, and most widely used, composite shim concept: the access shim.

20.6.1 Access Shims
In order to illustrate the access shim concept, we'll need to revisit our current_directory_scope class. To answer the criticisms we had earlier, we now add another constructor with the following form:

Listing 20.5.

    template <typename C, . . .>
    class current_directory_scope
    {
    public:
        explicit current_directory_scope(C const *dir)
        {
            init_(c_str_ptr(dir));
        }
        template <typename S>
        explicit current_directory_scope(S const &dir)
        {
            init_(c_str_ptr(dir));
        }
    private:
        void init_(C const *dir);
        . . .

Both constructors[10] use the c_str_ptr[11] shim to convert their argument to C const*, which is passed to the init_() method, so the class can be used with any string type for which the c_str_ptr shim is defined and accessible. Given the following c_str_ptr shim definitions:
Listing 20.6.

    inline char const *c_str_ptr(char const *s)
    {
        return s;
    }
    inline wchar_t const *c_str_ptr(wchar_t const *s)
    {
        return s;
    }
    template <typename T>
    inline T const *c_str_ptr(std::basic_string<T> const &s)
    {
        return s.c_str();
    }
    template <typename T>
    inline T const *c_str_ptr(stlsoft::basic_frame_string<T> const &s)
    {
        return s.c_str();
    }

we are able to use the current_directory_scope class with any of the following types:

    char const                        *dir1 = "/";
    std::basic_string<char>           dir2("/");
    stlsoft::basic_frame_string<char> dir3("/");

    current_directory_scope<char> scope1(dir1); // ok
    current_directory_scope<char> scope2(dir2); // ok
    current_directory_scope<char> scope3(dir3); // ok

20.6.2 Return Value Lifetime

The keen-eyed among you may have spotted that the four c_str_ptr shims we just looked at are all attribute shims. Why, then, can't c_str_ptr simply be defined as an attribute shim? To understand this we need to look at a shim with a nontrivial implementation, whose proper use relies on adherence to the rule regarding return values.

One of the reasons that the standard library's String model (C++-98: 21.3) does not stipulate a conversion operator is to allow the contained sequence to be stored without a null terminator. This is a well-known idea that can be found in a variety of libraries. One example is the Win32 Security API's LSA_UNICODE_STRING type, which is defined as follows:

    typedef struct _LSA_UNICODE_STRING
    {
        unsigned short Length;
        unsigned short MaximumLength;
        wchar_t        *Buffer;
    } LSA_UNICODE_STRING;

The extent of the string described by the structure is defined not by a null-terminating character, but by the Length member, which gives the size of the string in bytes rather than in characters. In fact, Buffer may not contain a null terminator at all. Given this, how do we access a C-style string in a generic (i.e., via c_str_ptr) fashion?[12]
The answer lies in the use of a proxy class, an instance of which is returned by the c_str_ptr shim function for this type:

    inline c_str_ptr_LSA_UNICODE_STRING_proxy c_str_ptr(LSA_UNICODE_STRING const &s)
    {
        return c_str_ptr_LSA_UNICODE_STRING_proxy(s);
    }

To yield a C-style string, the proxy class must allocate and fill an appropriate character buffer and, in order to support the same syntax as the other c_str_ptr shims, provide an implicit conversion operator. A simplified definition of this class is shown in Listing 20.7.

Listing 20.7.

    class c_str_ptr_LSA_UNICODE_STRING_proxy
    {
    public:
        typedef c_str_ptr_LSA_UNICODE_STRING_proxy  class_type;
    public:
        explicit c_str_ptr_LSA_UNICODE_STRING_proxy(LSA_UNICODE_STRING const &s)
            // Length is a byte count, so convert it to a character count
            : m_buffer(new WCHAR[1 + s.Length / sizeof(WCHAR)])
        {
            size_t const cch = s.Length / sizeof(WCHAR);
            wcsncpy(m_buffer, s.Buffer, cch);
            m_buffer[cch] = L'\0'; // Buffer need not be null-terminated
        }
        ~c_str_ptr_LSA_UNICODE_STRING_proxy()
        {
            delete [] m_buffer;
        }
        operator LPCWSTR () const
        {
            return m_buffer;
        }
    private:
        LPWSTR m_buffer;
    // Not to be implemented. (Simplified: the implicitly generated copy
    // constructor would share m_buffer, so this relies on the copy
    // returned by c_str_ptr() being elided.)
    private:
        void operator =(class_type const &rhs);
    };

When c_str_ptr is applied to an instance of LSA_UNICODE_STRING, the string contents are copied into a null-terminated buffer, and the shim's value is retrieved via the implicit conversion operator. In this way, this c_str_ptr overload and the proxy class work together to synthesize and provide access to a C-style string, which is the type expected from the c_str_ptr shim.

Now we can see the reason for the restriction on the use of the shim's return value. The proxy object is a temporary, and exists only until the end of the full-expression in which the shim is used. If the shim's return value were saved and used outside that expression, undefined behavior would ensue:
    LSA_UNICODE_STRING lsa = . . .;
    wchar_t const      *s = c_str_ptr(lsa); // proxy destroyed at the end of this statement
    wputs(s);                              // Danger, Will Robinson!
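To make the mechanism concrete away from Windows, here is a minimal C++11 sketch under stated assumptions: counted_wstring is a hypothetical stand-in for LSA_UNICODE_STRING (keeping its byte-count Length), and c_str_ptr_counted_wstring_proxy, length_of() and copy_of() are illustrative names, not STLSoft components. Unlike the simplified Listing 20.7, it gives the proxy a move constructor, so returning it by value can never leave two objects owning the same buffer. It also shows the two safe ways to use such a shim: consuming the pointer within the same expression, or copying it into an owning string.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical portable stand-in for LSA_UNICODE_STRING (the real type is
// Windows-only). As in the Win32 structure, Length and MaximumLength are
// byte counts, and Buffer need not be null-terminated.
struct counted_wstring
{
    unsigned short Length;
    unsigned short MaximumLength;
    wchar_t       *Buffer;
};

// Proxy returned by the shim: owns a null-terminated copy of the string.
class c_str_ptr_counted_wstring_proxy
{
public:
    explicit c_str_ptr_counted_wstring_proxy(counted_wstring const &s)
        : m_buffer(new wchar_t[1 + s.Length / sizeof(wchar_t)])
    {
        std::size_t const cch = s.Length / sizeof(wchar_t); // bytes -> characters
        std::wmemcpy(m_buffer, s.Buffer, cch);
        m_buffer[cch] = L'\0';
    }
    // Move constructor: the shim can return the proxy by value without two
    // objects ever owning (and deleting) the same buffer.
    c_str_ptr_counted_wstring_proxy(c_str_ptr_counted_wstring_proxy &&rhs) noexcept
        : m_buffer(rhs.m_buffer)
    {
        rhs.m_buffer = nullptr;
    }
    c_str_ptr_counted_wstring_proxy(c_str_ptr_counted_wstring_proxy const &) = delete;
    c_str_ptr_counted_wstring_proxy &operator =(c_str_ptr_counted_wstring_proxy const &) = delete;
    ~c_str_ptr_counted_wstring_proxy()
    {
        delete [] m_buffer; // deleting nullptr (a moved-from proxy) is a no-op
    }

    // Implicit conversion gives the same syntax as the other c_str_ptr shims.
    operator wchar_t const *() const
    {
        return m_buffer;
    }

private:
    wchar_t *m_buffer;
};

// The access shim itself.
inline c_str_ptr_counted_wstring_proxy c_str_ptr(counted_wstring const &s)
{
    return c_str_ptr_counted_wstring_proxy(s);
}

// Safe: the proxy is created, used and destroyed within one full-expression.
inline std::size_t length_of(counted_wstring const &s)
{
    return std::wcslen(c_str_ptr(s));
}

// Safe: to keep the value, copy it into an owning string in the same expression.
inline std::wstring copy_of(counted_wstring const &s)
{
    return std::wstring(static_cast<wchar_t const *>(c_str_ptr(s)));
}
```

Note that the dangerous pattern shown above, storing c_str_ptr(s) in a wchar_t const* and using it in a later statement, compiles just as happily against this sketch; nothing in the type system prevents it, which is why the rule has to be observed by the programmer.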
We look at the issue of Return Value Lifetime in detail in Chapter 31.

20.6.3 Generalized Type Manipulation

By now you can hopefully see the power of shims, and how easy they usually are to implement. We're now able to work with any kind of string type in a consistent and generic fashion. This is extremely useful stuff. But it's possible to extend this a lot further.

Let's look at a different scenario. UNIX provides two common ways of inspecting file-system contents. The opendir() API opens a directory for enumeration via its readdir() function, which returns a pointer to a dirent structure. This structure is required to provide a member d_name, a character buffer containing the name of the current entry. The other API is based on the very powerful glob() function, which provides its caller with an array of pointers to the entries that match a given pattern.

UNIXSTL[13] provides wrapper sequence classes for these two APIs in the form of the readdir_sequence and glob_sequence classes. Their value types are, respectively, struct dirent const* and char const*. Writing code, including functors and algorithms, for one or the other is straightforward. But because of the difference in their value types, writing code that works with both can be a real pain.
Let's imagine that we want to write an algorithm, sub_dir_count, that can count the number of subdirectories of a given directory in the file system and also record their names in a container. It might look something like this:

Listing 20.8.

    bool is_dir(char const *entry); // Determines whether entry is a directory

    template< typename S
            , typename C
            >
    size_t sub_dir_count(S const &s, C &c)
    {
        typedef typename S::const_iterator  const_it_t;

        const_it_t  begin = s.begin();
        const_it_t  end   = s.end();
        size_t      cDirs = 0;

        for(; begin != end; ++begin)
        {
            if(is_dir(*begin))
            {
                c.push_back(*begin);
                ++cDirs;
            }
        }

        return cDirs;
    }

It could be used like this:

    void process_entry(string const &s);

    findfile_sequence entries("/");
    vector<string>    directories;
    size_t            cDirs = sub_dir_count(entries, directories);

    printf("Number of dirs = %lu\n", static_cast<unsigned long>(cDirs));
    for_each(directories.begin(), directories.end(), process_entry);

This compiles without complaint for glob_sequence, because its value type is char const*. However, it does not compile for readdir_sequence. What can we do about it? We could create a separate version of the algorithm for readdir_sequence, but this is Nightmare on Maintenance Street. Furthermore, it requires programming effort in linear proportion to the number of mutually incompatible types we require such an algorithm to support. Wouldn't it be far nicer to centralize the generality and write the algorithm only once? By using the c_str_ptr access shim, we can do just that, and create a version that will work for any type:

Listing 20.9.

    template< typename S
            , typename C
            >
    size_t sub_dir_count(S const &s, C &c)
    {
        . . .
        for(; begin != end; ++begin)
        {
            if(is_dir(c_str_ptr(*begin)))
            {
                c.push_back(c_str_ptr(*begin));
                ++cDirs;
            }
        }
        . . .

It now works for both glob_sequence and readdir_sequence, by virtue of the c_str_ptr shim for struct dirent:

    inline char const *c_str_ptr(struct dirent const *d)
    {
        return (NULL != d) ? d->d_name : "";
    }

20.6.4 Efficiency Concerns

In the sub_dir_count function there are actually two calls to the c_str_ptr shim. For types, such as struct dirent, where the shim acts as an attribute shim, the cost of obtaining the C-string pointer is very low, and will be optimized out by most compilers. However, where the shim acts as a conversion shim, as with LSA_UNICODE_STRING, there will be a nontrivial amount of processing. Although each such call costs virtually the same as the equivalent manual string access, performing it twice per entry hardly represents efficient coding. Instead, we can rewrite the algorithm:

Listing 20.10.

    template< typename CH
            , typename C
            >
    size_t record_if_dir(CH const *entry, C &c)
    {
        return is_dir(entry) ? (c.push_back(entry), 1) : 0;
    }

    template< typename S
            , typename C
            >
    size_t sub_dir_count(S const &s, C &c)
    {
        . . .
        for(; begin != end; ++begin)
        {
            cDirs += record_if_dir(c_str_ptr(*begin), c);
        }
        return cDirs;
    }

Now there is only one shim call per entry, in the call to record_if_dir(), within which the C-string is used twice in the raw, with maximal efficiency.
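The pieces of this section can be combined into a small, self-contained C++11 sketch that compiles without any UNIX headers. It rests on several assumptions: std::vector<char const*> and std::vector<fake_dirent const*> stand in for glob_sequence and readdir_sequence, fake_dirent is a hypothetical stand-in for struct dirent, and the hypothetical is_dir() simply treats names ending in '/' as directories rather than querying the file system. What it illustrates is that a single sub_dir_count(), applying c_str_ptr once per entry via record_if_dir(), handles both value types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for struct dirent: only d_name matters here.
struct fake_dirent
{
    char d_name[256];
};

// Access shims: each yields a char const* whatever the entry type.
inline char const *c_str_ptr(char const *s)
{
    return s;
}
inline char const *c_str_ptr(fake_dirent const *d)
{
    return (nullptr != d) ? d->d_name : "";
}

// Hypothetical predicate: a name ending in '/' counts as a directory.
// A real is_dir() would query the file system instead.
inline bool is_dir(char const *entry)
{
    std::size_t const len = std::strlen(entry);
    return 0 != len && '/' == entry[len - 1];
}

// As in Listing 20.10: the shim has already been applied, so the raw
// C-string is used twice without a second conversion.
template <typename CH, typename C>
std::size_t record_if_dir(CH const *entry, C &c)
{
    return is_dir(entry) ? (c.push_back(entry), 1) : 0;
}

template <typename S, typename C>
std::size_t sub_dir_count(S const &s, C &c)
{
    typedef typename S::const_iterator const_it_t;

    std::size_t cDirs = 0;
    for (const_it_t begin = s.begin(), end = s.end(); begin != end; ++begin)
    {
        cDirs += record_if_dir(c_str_ptr(*begin), c); // one shim call per entry
    }
    return cDirs;
}
```

One design consequence is worth noting: because CH is deduced from record_if_dir()'s argument, this form works only with shims that return a raw pointer. A shim that returns a proxy object, such as the LSA_UNICODE_STRING one, would not deduce, because template argument deduction does not consider user-defined conversions; CH would have to be specified explicitly.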