Imperfect C++ Practical Solutions for Real-Life Programming By Matthew Wilson

	Chapter 34. Functors and Ranges

34.3. Local Functors

So far our concerns have been largely syntactic. But there's a second, more troubling aspect of using STL algorithms, and it represents a more serious imperfection. When you have a suitable function or functor available, you can get code that is very succinct, as well as powerful:



std::for_each(c.begin(), c.end(), fn());

However, as soon as you need to do something more sophisticated or specific to the elements in a range, you have two choices, neither of which is particularly attractive. Either you unroll the loop and provide the functionality yourself, or you wrap up that functionality in a custom function or functor.

34.3.1 Unrolled Loops

The normal option is to unroll the algorithm and do it longhand, as in the following code from an early version of the Arturius compiler multiplexer (see Appendix C):

Listing 34.1.



void CoalesceOptions(. . .)
{
  . . .
  { OptsUsed_map_t::const_iterator  b = usedOptions.begin();
    OptsUsed_map_t::const_iterator  e = usedOptions.end();
    for(; b != e; ++b)
    {
      OptsUsed_map_t::value_type const &v = *b;
      if( !v.second &&
          v.first->bUseByDefault)
      {
        arcc_option option;
        option.name             = v.first->fullName;
        option.value            = v.first->defaultValue;
        option.bCompilerOption  = v.first->type == compiler;
        arguments.push_back(option);
      }
    }
  }
  . . .
}

Naturally, such things can get quite verbose; I could have grabbed another from the same file that was considerably longer.
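To make the unrolled form concrete, here is a minimal compilable sketch. The Arturius types are not shown in the text, so option_def, option_type, and the simplified arcc_option, OptsUsed_map_t, and ArgumentsList below are hypothetical stand-ins, chosen only so that the loop from Listing 34.1 can be compiled and run.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Arturius types; the real ones are not shown.
enum option_type { compiler, tool };

struct option_def
{
    std::string fullName;
    std::string defaultValue;
    bool        bUseByDefault;
    option_type type;
};

struct arcc_option
{
    std::string name;
    std::string value;
    bool        bCompilerOption;
};

// Maps each known option to whether it was explicitly used.
typedef std::map<option_def const*, bool> OptsUsed_map_t;
typedef std::vector<arcc_option>          ArgumentsList;

// Unrolled loop: append every default option that was not explicitly used.
void CoalesceOptions(OptsUsed_map_t const &usedOptions, ArgumentsList &arguments)
{
    OptsUsed_map_t::const_iterator b = usedOptions.begin();
    OptsUsed_map_t::const_iterator e = usedOptions.end();

    for(; b != e; ++b)
    {
        OptsUsed_map_t::value_type const &v = *b;

        if( !v.second &&
            v.first->bUseByDefault)
        {
            arcc_option option;

            option.name            = v.first->fullName;
            option.value           = v.first->defaultValue;
            option.bCompilerOption = v.first->type == compiler;
            arguments.push_back(option);
        }
    }
}
```

Even in this stripped-down form, the selection logic and the bookkeeping of the loop are interleaved in one block of code.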

34.3.2 Custom Functors

The alternative to unrolling loops is to write custom functors to provide the behavior you need. If the behavior can be used in a variety of places, that's great, but oftentimes you're writing a new class for just one case, or a small number of cases.

Because it's a separate class, it will be physically separate from where it is being used, making the code harder to comprehend and maintain. The best you can do is to define the functor class in the same compilation unit as the code that uses it, preferably just before the function where it is used.

Listing 34.2.



struct argument_saver
{
public:
  argument_saver(ArgumentsList &args)
    : m_args(args)
  {}
  void operator ()(OptsUsed_map_t::value_type const &o) const
  {
    if( !o.second &&
        o.first->bUseByDefault)
    {
      arcc_option option;
      . . .
      m_args.push_back(option);
    }
  }
private:
  ArgumentsList &m_args;
};

void CoalesceOptions(. . .)
{
  . . .
  std::for_each( usedOptions.begin(), usedOptions.end()
               , argument_saver(arguments));
  . . .
}

But the domain-specific code encapsulated within the functor is physically separate from the only place(s) where it is used and meaningful, which is not ideal. The separation reduces maintainability and, worse, encourages overeager engineers to try to reuse or refactor it.
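A compilable sketch of the functor approach follows, using the same kind of hypothetical stand-in types (option_def, option_type, and simplified arcc_option, OptsUsed_map_t, and ArgumentsList), which are assumptions rather than the real Arturius definitions. Note how argument_saver has to sit at namespace scope, away from CoalesceOptions(), the only function that uses it.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Arturius types; the real ones are not shown.
enum option_type { compiler, tool };

struct option_def
{
    std::string fullName;
    std::string defaultValue;
    bool        bUseByDefault;
    option_type type;
};

struct arcc_option
{
    std::string name;
    std::string value;
    bool        bCompilerOption;
};

typedef std::map<option_def const*, bool> OptsUsed_map_t;
typedef std::vector<arcc_option>          ArgumentsList;

// Namespace-scope functor: a legal template argument, but defined away
// from the only function that uses it.
struct argument_saver
{
public:
    explicit argument_saver(ArgumentsList &args)
        : m_args(args)
    {}

    void operator ()(OptsUsed_map_t::value_type const &o) const
    {
        if( !o.second &&
            o.first->bUseByDefault)
        {
            arcc_option option;

            option.name            = o.first->fullName;
            option.value           = o.first->defaultValue;
            option.bCompilerOption = o.first->type == compiler;
            m_args.push_back(option); // the referenced list is not const
        }
    }

private:
    ArgumentsList &m_args;
};

void CoalesceOptions(OptsUsed_map_t const &usedOptions, ArgumentsList &arguments)
{
    std::for_each( usedOptions.begin(), usedOptions.end()
                 , argument_saver(arguments));
}
```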

34.3.3 Nested Functors

One way to reduce the separation of specific functors from their point of use might be to allow them to be defined within the function in which they're used. For example, we might want argument_saver to be defined within CoalesceOptions(), as in:



void CoalesceOptions(. . .)
{
  . . .
  struct argument_saver
  {
    . . .
  };
  std::for_each( usedOptions.begin(), usedOptions.end()
               , argument_saver(arguments));
  . . .
}

Alas, this is not legal C++. A local class has no linkage, and a local type may not be used as a template type argument (C++-98: 14.3.1), so argument_saver cannot be used to instantiate std::for_each().^[4]

^[4] This applies to all templates, not just template algorithms and functors.

Imperfection: C++ does not support local functor classes (to be used with template algorithms).

Notwithstanding the illegality, some compilers do allow them: CodeWarrior, Digital Mars, and Watcom. Furthermore, Borland and Visual C++ can be tricked into supporting them by the simple ruse of nesting the functor class within another local class, as in:

Listing 34.3.



void CoalesceOptions(. . .)
{
  . . .
  struct X
  {
    struct argument_saver
    {
      . . .
    };
  };

  std::for_each( usedOptions.begin(), usedOptions.end()
               , X::argument_saver(arguments));
  . . .
}

Table 34.1 summarizes the support for both forms for several popular compilers. Comeau, GCC, and Intel do not support local functor classes in either guise.^[5]

^[5] If those three tell you your code is wrong, the odds are good that you're doing something wrong.

Table 34.1.
Compiler        Local class   Nested local class
Borland         No            Yes
CodeWarrior     Yes           Yes
Comeau          No            No
Digital Mars    Yes           Yes
GCC             No            No
Intel           No            No
Visual C++      No            Yes
Watcom          Yes           Yes

If you use only Borland, CodeWarrior, Digital Mars, Visual C++, and/or Watcom, you might choose this approach. But it's not legal, and your code's portability will be compromised if you do so.

34.3.4 Legally Bland

The only legal way I know of to make such things work is turgid beyond bearing. Since the template's parameterizing type must be a non-local type, we define the functor type outside the function. Given that we want to specialize the behavior in a local class, we must relate the internal and external classes. Since we cannot use templates, we can fall back on the old C++ workhorse: polymorphism. The local class argument_saver inherits from the external class argument_processor, and overrides its operator ()() const, as in:

Listing 34.4.



struct argument_processor
{
public:
  virtual void operator ()(OptsUsed_map_t::value_type const &o) const = 0;
};

void CoalesceOptions(. . .)
{
  . . .
  struct argument_saver
    : argument_processor
  {
    virtual void
          operator ()(OptsUsed_map_t::value_type const &o) const
    {
      . . .
    }
  };

That doesn't seem so bad, I suppose. However, using it in a template algorithm requires that the parameterization be done in terms of the external (parent) class. And since the parent is an abstract class, the functor must be passed as a (const) reference, rather than by value.

Further, std::for_each() takes the functor type as the second template parameter, so it is necessary to explicitly stipulate the iterator type as well. Thus, using the "convenient" for_each() isn't very succinct, or general, and it certainly isn't pretty:



std::for_each< OptsUsed_map_t::const_iterator
             , argument_processor const &>( usedOptions.begin(), usedOptions.end()
                                          , argument_saver());

You'd have to agree that manual enumeration would be preferable. There's one last gasp at making this prettier, which is to define a for_each() equivalent that takes its template parameters in the reverse order, so that the functor type can be stipulated explicitly while the iterator type is deduced implicitly:



template< typename F
        , typename I
        >
inline F for_each_l(I first, I last, F fn)
{
  return std::for_each<I, F>(first, last, fn);
}

which brings us to the final barely chewable form:



for_each_l<argument_processor const &>( usedOptions.begin(), usedOptions.end()
                                      , argument_saver());

But for my money this is still nowhere near good enough. Imagine the poor maintenance programmer trying to follow this stuff!^[6]

^[6] There were a few weeks between starting this chapter and doing the final version. In that short time I forgot how this worked, and I wrote it!
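To check that the polymorphic workaround really keeps the local type out of the template argument list, here is a small self-contained sketch. The names int_processor, even_saver, and collect_evens are hypothetical analogues of argument_processor and argument_saver, working on a std::vector<int> instead of the option map.

```cpp
#include <algorithm>
#include <vector>

// Namespace-scope abstract base class: the only functor type that will
// appear as a template argument.
struct int_processor
{
    virtual ~int_processor() {}
    virtual void operator ()(int const &i) const = 0;
};

// Functor type first, so it can be stipulated while the iterator type
// is deduced (the for_each_l() shape from the text).
template< typename F
        , typename I
        >
inline F for_each_l(I first, I last, F fn)
{
    return std::for_each<I, F>(first, last, fn);
}

// Hypothetical example: append the even values of 'in' to 'out'.
void collect_evens(std::vector<int> const &in, std::vector<int> &out)
{
    // Local class: only ever used through its non-local base.
    struct even_saver
        : int_processor
    {
        explicit even_saver(std::vector<int> &v)
            : m_out(v)
        {}

        virtual void operator ()(int const &i) const
        {
            if(0 == i % 2)
            {
                m_out.push_back(i);
            }
        }

        std::vector<int> &m_out;
    };

    // F is int_processor const &; the even_saver temporary binds to it.
    for_each_l<int_processor const &>(in.begin(), in.end(), even_saver(out));
}
```

The key point is that F is given explicitly as int_processor const &, and the iterator type is deduced from in.begin(), so even_saver itself never becomes a template argument. That is what keeps the code legal under the C++-98 rules, and also why it reads so poorly.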

34.3.5 Generalized Functors: Type Tunneling

If we can't make functors more local, maybe we can improve the situation by making them more general? Let's look at an example where we can expand the generality of the is_large functor (see section 34.1). We can use it with a sequence whose value_type is, or may be implicitly converted to, char const*, such as glob_sequence. Unfortunately, it can be used only with such types. If we want to use the same functor with a sequence that uses Unicode encoding and the wchar_t type, it won't work.

One answer to this is to make is_large a template, parameterizable via its character type, as in:



template <typename C>
struct is_large
  : public std::unary_function<C const *, bool>
{
  bool operator ()(C const *file) const;
};

Now this will work with sequences using either char or wchar_t (using the fictional Unicode globw_sequence), so long as we stipulate the appropriate instantiation:



glob_sequence  gs("/usr/include/", "impcpp*");
n = std::count_if(gs.begin(), gs.end(), is_large<char>());

globw_sequence gsw(L"/usr/include/", L"impcpp*");
n = std::count_if(gsw.begin(), gsw.end(), is_large<wchar_t>());
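A runnable sketch of the character-type template follows. Since it cannot assume any particular files exist, the hypothetical is_long_name predicate tests the length of the name rather than the size of the file, and vectors of char const* and wchar_t const* stand in for glob_sequence and the fictional globw_sequence. It declares argument_type and result_type directly rather than deriving from std::unary_function, which was deprecated in C++11 and removed in C++17.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical predicate: tests name length instead of file size.
template <typename C>
struct is_long_name
{
    typedef C const *argument_type; // what adaptors look for
    typedef bool     result_type;

    bool operator ()(C const *name) const
    {
        // char_traits<C>::length works for both char and wchar_t strings
        return std::char_traits<C>::length(name) > 8;
    }
};

// Counts long names; C is deduced from the element type and then
// stipulated as the predicate's instantiation.
template <typename C>
std::size_t count_long_names(std::vector<C const*> const &names)
{
    return static_cast<std::size_t>(
        std::count_if(names.begin(), names.end(), is_long_name<C>()));
}
```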

That's a lot more useful, but it's still not the full picture. In section 20.6.3 we looked at another file system enumeration sequence, readdir_sequence, whose value_type, struct dirent const*, is not implicitly convertible to char const*. The solution there was to use Access Shims (see section 20.6.1), and we can apply them here to the same end. However, it's a little more complex now, because we've got templates involved, as shown in Listing 34.5.

Listing 34.5.



template< typename C
        , typename A = C const *
        >
struct is_large
       : public std::unary_function<A, bool>
{
  template <typename S>
  bool operator ()(S const &file) const
  {
    return is_large_(c_str_ptr(file)); // apply c_str_ptr shim
  }
private:
  static bool is_large_(C const *file)
  {
    . . . // determines whether large or not
  }
};

The function-call operator—operator ()() const—is now a template member function, which attempts to convert whatever type is applied to it via the c_str_ptr() shim to C const*, which is then passed to the static implementation method is_large_(). Now we can use the functor with any type for which a suitable c_str_ptr() definition exists and is visible, hence:



readdir_sequence  rs("/usr/include/");
n = std::count_if(rs.begin(), rs.end(), is_large<char>());

I call this mechanism Type Tunneling.

Definition: Type Tunneling is a mechanism whereby two logically related but physically unrelated types can be made to interoperate via the use of Access Shims. The shim allows an external type to be tunneled through an interface and presented to the internal type in a recognized and compatible form.
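The following sketch shows type tunneling end to end under stated assumptions: fake_dirent is a hypothetical stand-in for struct dirent, the three c_str_ptr() overloads are illustrative shims (not the library's own definitions), and is_long_name again tests name length instead of file size. The template function-call operator tunnels each element type through c_str_ptr() to the char const* form that the private implementation understands.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for struct dirent: only the name matters here.
struct fake_dirent
{
    char const *d_name;
};

// Illustrative access shims: each presents its argument as a C string.
inline char const *c_str_ptr(char const *s)        { return s; }
inline char const *c_str_ptr(std::string const &s) { return s.c_str(); }
inline char const *c_str_ptr(fake_dirent const *d) { return d->d_name; }

// Hypothetical predicate in the shape of Listing 34.5, testing name length.
template< typename C
        , typename A = C const *
        >
struct is_long_name
{
    typedef A    argument_type;
    typedef bool result_type;

    template <typename S>
    bool operator ()(S const &file) const
    {
        return is_long_name_(c_str_ptr(file)); // tunnel S through the shim
    }

private:
    static bool is_long_name_(C const *name)
    {
        return std::char_traits<C>::length(name) > 8;
    }
};

// Works with any sequence whose elements have a visible c_str_ptr() shim.
template <typename Seq>
std::size_t count_long_names(Seq const &seq)
{
    return static_cast<std::size_t>(
        std::count_if(seq.begin(), seq.end(), is_long_name<char>()));
}
```

The same is_long_name<char> instantiation is applied to char const*, std::string, and fake_dirent const* elements; none of those types knows anything about the predicate, and the predicate knows only about C strings.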

I've used this mechanism to great effect throughout my work over the last few years. As well as facilitating the decoupled interoperation of a large spectrum of physically unrelated types via C-string forms, it supports the generalized manipulation of handles, pointers, and even synchronization objects. Type tunneling (and shims in general) goes to town on the principle of making the compiler one's batman. We saw another example of type tunneling in section 21.2, whereby virtually any COM-compatible type can be tunneled into the logging API through the combination of generic template constructors and InitialiseVariant() overloads, which combine to act as an access shim.

34.3.6 A Step too Far, Followed by a Sure Step Beyond

You may be wondering whether we can take this one step further and remove the need to stipulate the character type. The answer is that we can, and with ease, as shown in Listing 34.6.

Listing 34.6.



struct is_large
  : public std::unary_function<. . ., bool>
{
  template <typename S>
  bool operator ()(S const &file) const
  {
    return is_large_(c_str_ptr(file));
  }
private:
  static bool is_large_(char const *file);
  static bool is_large_(wchar_t const *file);
};

It is now simpler to use, as in:



n = std::count_if(rs.begin(), rs.end(), is_large());
n = std::count_if(gs.begin(), gs.end(), is_large());
n = std::count_if(gsw.begin(), gsw.end(), is_large());

However, there's a good reason why we don't do this. This functor is a predicate: a functor whose function-call operator returns a Boolean result reflecting some aspect of its argument(s). One important aspect of predicates is that they may be combined with adaptors [Muss2001], as in the following statement that counts the number of small files:



n = std::count_if( gs.begin(), gs.end()
                 , std::not1(is_large<char>()));

In order for adaptors to work with predicates, they must be able to elicit the member types argument_type and result_type from the predicate class. This is normally done by deriving from std::unary_function. Now we can see why the final refinement shown in Listing 34.6 cannot be done: there's no way to specify the argument type other than to define the predicate class as a template with a single template parameter for the argument type. But that parameter would have to be provided in every use, as there's no sensible default, which would be onerous to use and confusing to read.

This is why the actual functor definition is a two-parameter template, where the first parameter C represents the character type and the second parameter A, which defaults to C const*, represents the argument_type of the predicate.



template< typename C
        , typename A = C const *
        >
struct is_large
       : public std::unary_function<A, bool>
{
  . . .

Now when we want to use this with an adaptor and a sequence whose value_type is not C const*, we do something like the following:



n = std::count_if( rs.begin(), rs.end() // rs: readdir_sequence
                 , std::not1(is_large<char, struct dirent const*>()));

It doesn't have the kind of beauty that will stop a man's heart, but it's bearable, considering that the combination of sequence and adaptor that necessitates it is rare, and the advantages of the generality and consequent reuse are high. It also facilitates a high degree of generality, since we can write a template algorithm that is compatible with any sequence and still maintains the type tunneling, as in:



template< typename C // character type
        , typename S // sequence type
        >
void do_stuff(. . .)
{
  S s = . . .;
  size_t n = std::count_if( s.begin(), s.end()
                          , std::not1(is_large<C, typename S::value_type>()));
  . . .
}

Okay, before you think I've gone completely cracked, I confess that this is hardly something that gambols into the brain through the merest wisp of rapidly clearing mental fog. It's something that you definitely have to think about. But there are times when we simply have to have complex bits of code; check out some of your friendly neighborhood open-source C++ libraries if you don't believe me.

The point is that we have a mechanism for writing highly generalized—reusable, in other words—components, which are very digestible—in that they follow accepted idiomatic forms—in most cases where they are used. This generality is bought for the acceptable, in my opinion, cost of requiring the specification of the given sequence's value_type when used with adaptors.
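The dependency of adaptors on argument_type can be seen with a hand-rolled negator. In this sketch, negate_pred and make_negate are hypothetical stand-ins for std::not1 (which was deprecated in C++17 and removed in C++20), entry stands in for struct dirent, and is_long_name tests name length instead of file size. negate_pred must name P::argument_type in its own operator() signature, which is exactly why the two-parameter is_long_name<C, A> form is needed; count_short_names mirrors do_stuff() by supplying S::value_type as A.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for std::not1: its operator() signature is built
// from P::argument_type, so P must publish that member type.
template <typename P>
struct negate_pred
{
    typedef typename P::argument_type argument_type;
    typedef bool                      result_type;

    explicit negate_pred(P const &p)
        : m_p(p)
    {}

    bool operator ()(argument_type const &a) const
    {
        return !m_p(a);
    }

private:
    P m_p;
};

template <typename P>
inline negate_pred<P> make_negate(P const &p)
{
    return negate_pred<P>(p);
}

struct entry { char const *d_name; }; // hypothetical dirent stand-in

inline char const *c_str_ptr(char const *s)  { return s; }
inline char const *c_str_ptr(entry const *e) { return e->d_name; }

// Two-parameter form: C is the character type, A the advertised argument type.
template< typename C
        , typename A = C const *
        >
struct is_long_name
{
    typedef A    argument_type;
    typedef bool result_type;

    template <typename S>
    bool operator ()(S const &file) const
    {
        return is_long_name_(c_str_ptr(file));
    }

private:
    static bool is_long_name_(C const *name)
    {
        return std::char_traits<C>::length(name) > 8;
    }
};

// Generic algorithm in the style of do_stuff(): A is taken from the sequence.
template< typename C
        , typename S
        >
std::size_t count_short_names(S const &s)
{
    return static_cast<std::size_t>(
        std::count_if( s.begin(), s.end()
                     , make_negate(is_long_name<C, typename S::value_type>())));
}
```

With a predicate that lacked argument_type, negate_pred<P> would fail to compile, which is the problem the default template argument A solves.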

34.3.7 Local Functors and Callback APIs

Just because local functors are not allowed for STL algorithms does not mean that they cannot find use in enumeration. In fact, when dealing with callback enumeration APIs, local classes are eminently suitable. Consider the implementation of the function FindChildById() (see Listing 34.7), which provides a deep-descendant equivalent to the well-known Win32 function GetDlgItem(). GetDlgItem() returns a handle to an immediate child window bearing the given id. FindChildById() provides the same functionality, but is able to locate the id in any descendant window, not just immediate children.

Listing 34.7.



HWND FindChildById(HWND hwndParent, int id)
{
  if(::GetDlgCtrlID(hwndParent) == id)
  {
    return hwndParent; // Searching for self
  }
  else
  {
    struct ChildFind
    {
      ChildFind(HWND hwndParent, int id)
        : m_hwndChild(NULL)
        , m_id(id)
      {
        // Enumerate, passing "this" as identity structure
        ::EnumChildWindows( hwndParent,
                            FindProc,
                            reinterpret_cast<LPARAM>(this));
      }

      static BOOL CALLBACK FindProc(HWND hwnd, LPARAM lParam)
      {
        ChildFind &find = *reinterpret_cast<ChildFind*>(lParam);

        return (::GetDlgCtrlID(hwnd) == find.m_id)
                  ? (find.m_hwndChild = hwnd, FALSE)
                  : TRUE;
      }

      HWND      m_hwndChild;
      int const m_id;

    } find(hwndParent, id);

    return find.m_hwndChild;
  }
}

The class ChildFind is declared within the function, maximizing encapsulation. The instance find is passed the parent window handle and the id to find. The constructor records the id in the m_id member, and sets the search result member m_hwndChild to NULL. It then calls the Win32 callback enumeration function EnumChildWindows(), which takes the parent window handle, a callback function, and a caller-supplied parameter. The instance passes the static method FindProc() and itself as the parameter. Because EnumChildWindows() also enumerates the children of child windows, the search covers all descendants, not just immediate children. FindProc() responds to each callback by determining whether the desired id has been located, and, if so, it records the handle and terminates the search.

When the construction of find is complete, it will either contain the requested handle, or NULL, in the m_hwndChild member. In either case this is returned to the caller of FindChildById(). The entire enumeration has been carried out in the constructor of the local class, whose definition is not accessible to any outside context. FindChildById() perfectly encapsulates the ChildFind class.
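The same technique works for any C-style callback API that passes a caller-supplied context pointer through to the callback, because no template is involved. The sketch below is a portable analogue of Listing 34.7; enum_items(), item_callback, and FindItemIndex() are hypothetical, with enum_items() playing the role of EnumChildWindows() and a void* parameter playing the role of LPARAM.

```cpp
#include <cstddef>

// Hypothetical C-style callback API standing in for EnumChildWindows():
// calls cb(item, param) for each item until cb returns 0.
typedef int (*item_callback)(int item, void *param);

inline void enum_items(int const *items, std::size_t n, item_callback cb, void *param)
{
    for(std::size_t i = 0; i != n; ++i)
    {
        if(0 == cb(items[i], param))
        {
            break; // callback asked to stop
        }
    }
}

// Returns the index of the first item equal to 'id', or -1 if none.
inline int FindItemIndex(int const *items, std::size_t n, int id)
{
    struct ItemFind
    {
        ItemFind(int const *first, std::size_t count, int target)
            : m_index(-1)
            , m_count(0)
            , m_id(target)
        {
            // Pass "this" through the context pointer, as ChildFind does with LPARAM.
            enum_items(first, count, FindProc, this);
        }

        static int FindProc(int item, void *param)
        {
            ItemFind &find = *static_cast<ItemFind*>(param);

            if(item == find.m_id)
            {
                find.m_index = find.m_count;
                return 0; // found: stop enumerating
            }
            ++find.m_count;
            return 1;     // keep going
        }

        int       m_index;
        int       m_count;
        int const m_id;
    } find(items, n, id);

    return find.m_index;
}
```

As in FindChildById(), the entire search runs inside the constructor of the local class, and the static member function recovers the object from the context pointer. A static member function of a local class is an ordinary function as far as the callback's function-pointer type is concerned, so the template-argument restriction never comes into play.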