Section 10.6. General-Purpose DSELs

One of the nicest features of DSELs is that we can apply them in the domain of general-purpose programming idioms. In other words, a DSEL can function as a kind of extension to the general-purpose host language. Although it may seem contradictory to use the terms "general-purpose" and "domain-specific" when discussing the same library, it makes sense when you consider the domain to be the specific programming idiom enabled by the DSEL.

10.6.1. Named Parameters

Named parameters is a feature of many languages that allows arguments to be passed by name rather than by position. We'd love to see it supported directly in C++. For example, in an imaginary C++ that supports named parameters, given the declaration:

Note that the role of each actual argument is now perfectly clear at the call site, and defaults can be used for any parameter without regard to its position or the other parameters being passed. A similar principle can of course be applied to template parameters. As you can imagine, named parameters really begin to pay off in interfaces that accept several independent arguments, each of which has a nontrivial default. Many such functions can be found in the algorithms of the Boost Graph library.

The Graph library's original named parameter DSL used a technique called "member function chaining" to aggregate parameter values into a single function argument, essentially forming a tuple of tagged values. The usage in our example would look something like:

Here, the expression slew(.799) would build an instance of class named_params<slew_tag, float, nil_t> having the empty class nil_t as its only base class, and containing the value .799 as a single data member. Then, its name member function would be called with "z" to produce an instance of:

having a copy of the instance just described as its only base, and containing a reference to "z" as its only data member. We could go into detail about how each tagged value can be extracted from such a structure, but at this point in the book we're sure your brain is already working that out for itself, so we leave it as an exercise. Instead, we'd like to focus on the chosen syntax of the DSL, and what's required to make it work.

If you think about it for a moment, you'll see that not only do we need a top-level function for each parameter name (to generate the initial named_params instance in a chain), but named_params must also contain a member function for each of the parameter names we might want to follow it with. After all, we might just as well have written:

Since the named parameter interface pays off best when there are many optional parameters, and because there will probably be some overlap in the parameter names used by various functions in a given library, we're going to end up with a lot of coupling in the design. There will be a single, central named_params definition used for all functions in the library that use named parameter interfaces. Adding a new parameter name to a function declared in one header will mean going back and modifying the definition of named_params, which in turn will cause the recompilation of every translation unit that uses our named parameter interface.

While writing this book, we reconsidered the interface used for named function parameter support. With a little experimentation we discovered that it's possible to provide the ideal syntax by using keyword objects with overloaded assignment operators:

Not only is this syntax nicer for users, but adding a new parameter name is easy for the writer of the library containing f, and it doesn't cause any coupling. We're not going to get into the implementation details of this named parameter library here; it's straightforward enough that we suggest you try implementing it yourself as an exercise.

Before moving on, we should also mention that it's possible to introduce similar support for named class template parameters [AS01a, AS01b], though we don't know of a way to create such nice syntax. The best usage we've been able to come up with looks like this:

10.6.2. Building Anonymous Functions

For another example of "library-based language extension," consider the problem of building function objects for STL algorithms. We looked briefly at runtime lambda expressions in Chapter 6. Many computer languages have incorporated features for generating function objects on-the-fly, the lack of which in C++ is often cited as a weakness. As of this writing, there have been no fewer than four major DSL efforts targeted at function object construction.

10.6.2.1. The Boost Bind Library

The simplest one of these, the Boost Bind library [Dimov02], is limited in scope to three features, a couple of which should be familiar to you from your experience with MPL's lambda expressions. To understand the analogy you'll need to know that, just as MPL has placeholder types that can be passed as template arguments, the Bind library has placeholder objects that can be passed as function arguments.

The first feature of Boost.Bind is partial function (object) application, that is, binding argument values to a function (object), yielding a new function object with fewer parameters. For example, to produce a function object that prepends "hello, " to a string, we could write:

Note that it's not very realistic to see the outer function argument ("world") right next to the bind invocation. In real code we'll usually pass the result of calling bind to some algorithm that will proceed to invoke it multiple times.

The second feature of Boost.Bind is function composition. For example, the following expression produces a function object that computes y = x(x - 0.5):

To us, it seems so natural that bind should operate this way that we have to think hard to imagine the alternative: If the inner bind expression were not given special treatment by the library, the function object it produces would be passed as the first argument to the std::multiplies<float> instance, causing an error.

Lastly, Boost.Bind allows us to invoke member functions with ordinary function call syntax. The basic idea, that member functions can be seen as free functions accepting an initial class argument, is supported by languages such as Dylan, but once again, not by native C++. This is more than an aesthetic concern, though: The different syntax for invoking free and member functions can be a serious problem for generic code that may need to work with both.

One of the most popular ways to use bind is to partially apply a member function to a class instance. For example, the following calls v.visit(x) on each element x in [first, last):

This limited use of partial application is so important in event-based software that Borland implemented a C++ language extension, closures, to support it directly in their compiler.

Before moving on, let's briefly compare the syntax of the bind expressions used above with what we'd get using the STL binders and composers:^[8]

We think there's a good argument that even the small amount of syntactic sugar provided by Boost.Bind begins to look like a domain-specific language by comparison.

10.6.2.2. The Boost Lambda Library

The Boost Lambda library, by Jaakko Järvi and Gary Powell, was the original inspiration for Boost.Bind and for the design of MPL's compile-time lambda expressions. The Lambda library extends the basic facilities of Boost.Bind with syntax so sweet that some of the examples we've covered become almost transparent. For example:

What's interesting about this code is that operator* doesn't multiply, and operator+ doesn't add or even concatenate! Instead, the operators construct function objects that can be called later. The result of "hello, " + _1 is a function object accepting one argument (call it x) and returning the result of "hello, " + x. If this is beginning to sound familiar, that's good: Function objects built on-the-fly are just another example of the "expression templates" idiom first introduced by Blitz++.

The goals of the Lambda library are much more ambitious than those of Boost.Bind. Even if you found it hard to see the syntax of Boost.Bind as a DSL, we think it's clear that Boost.Lambda syntax is a little language unto itself. Its features go well beyond overloaded operators: It implements control structures and even exception handling! Here are just a few examples.

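Halve each element of a two-dimensional array.

    // needs <algorithm>, <boost/lambda/lambda.hpp>, <boost/lambda/loops.hpp>,
    // and using namespace boost::lambda;
    float a[5][10];
    int i;
    std::for_each(a, a+5,
      for_loop(var(i)=0, var(i)<10, ++var(i),
         _1[var(i)] /= 2
      )
    );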

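Print a sequence, replacing even elements with periods.

    // needs <algorithm>, <iostream>, <boost/lambda/lambda.hpp>,
    // <boost/lambda/if.hpp>, and using namespace boost::lambda;
    // a is a container of integers
    std::for_each(a.begin(), a.end(),
        if_then_else(_1 % 2 != 0,
             std::cout << _1
           , std::cout << constant('.')
        )
    );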

Print "zero", "one", or "other: n" for each element n of v, one per line.

    // needs <algorithm>, <iostream>, <boost/lambda/lambda.hpp>,
    // <boost/lambda/switch.hpp>, and using namespace boost::lambda;
    std::for_each(v.begin(), v.end(),
      (
        switch_statement(
          _1,
          case_statement<0>(std::cout << constant("zero")),
          case_statement<1>(std::cout << constant("one")),
          default_statement(std::cout << constant("other: ") << _1)
        ),
        std::cout << constant("\n")
      )
    );

In the examples above, var and constant each wrap their argument in a special class template that prevents it from being evaluated greedily. For example, if we had written std::cout << "\n" in the last example, it would have been evaluated once, outside the for_each invocation. That's just how C++ works. The result of constant("\n"), however, is a nullary function object that returns "\n". The standard library doesn't provide a stream inserter (operator<<(ostream&, T)) for T, the type of that function object, but the Lambda library does provide an overloaded operator<< that works on T. Rather than performing stream insertion, the Lambda library's operator<< just produces another nullary function object: This one evaluates std::cout << "\n" when it's called.

The need for var and constant, and the need to use such functions as for_loop in place of C++'s built-in for, are compromises forced on us by the limitations of the C++ language. Still, the expressivity of Boost Lambda, combined with the fact that the function objects it builds are typically about as efficient as hand-coded functions, is impressive.

10.6.2.3. The Phoenix Library

Never satisfied, C++ library designers continue to search for more expressive ways to program. Before moving on to other domains, we'd like to touch on some of the innovations of two other functional programming libraries. The first is Phoenix, which was developed as part of the Boost.Spirit parser framework [Guz04], discussed later in this chapter. Besides adding some valuable new functionality, the authors of Phoenix invented new syntax for some of the same control structures supported by Boost.Lambda. For example, in Phoenix, the if_then_else example above might be written as follows (note that in Phoenix, placeholders are called arg1, arg2, ...):

The authors of the Boost Lambda library found this syntax so attractive that they have incorporated it as an alternative to if_then_else. As you can see, there is a great deal of cross-pollination between these designs.

10.6.2.4. The FC++ Library

FC++ [MS00b], short for "Functional C++," enables C++ programmers to use the idioms of hardcore functional programming languages like Haskell, including lazy sequences, partial function application, and higher-order polymorphic functions.^[9] These paradigms are so general-purpose, and so different from those most C++ programmers are used to, that using FC++ almost amounts to using a whole new programming language. We don't have space here to do justice to FC++, but we can present a few examples to give you a sense of it.

First, a look at FC++ lambda expressions. As in most traditional functional programming languages, but unlike the C++ lambda expressions you've seen so far, FC++ supports named parameters to improve the readability of lambda expressions. For example:^[10]

Now, this is really mind-bending! The names Fun and X have a meaning both at the level of the C++ program and in the program (function object) generated by the lambda expression. In fact, it's not very different from what Boost's Bind and Lambda libraries do with their placeholders. Placeholders implement a mapping from input argument positions to the positions of arguments passed to the function being "bound." You could almost think of X as _1 and Fun as _2. All lambda(Fun,X)[ ... ] does is add another layer of indirection that exchanges the positions represented by the placeholders.

FC++ doesn't stop with named lambda arguments, though. The next example shows a lambda expression with what are essentially named local constants:

The example above shows a few other features of the FC++ DSL. First, you can see partial application at work in the expression multiplies[2], which yields a unary function object that computes multiplies[2,x] for its argument x. Next, the % operator is overloaded to make the expression x %f% y equivalent to f[x,y], so any FC++ binary function object (e.g., plus) can act as a kind of "named infix operator."

The (domain-specific) language designers of FC++ made another interesting choice as well: They decided they didn't like the way that, in certain contexts, libraries like Boost.Lambda demand the use of constant(...) or var(...) to prevent greedy evaluation of any expression that doesn't involve a placeholder. They reasoned that having to remember that only one of the two expressions below will work as expected is too error-prone:

Instead, they chose a simple rule: Function invocations using round parentheses are evaluated immediately, and those using square brackets are evaluated lazily:

As a result, the syntax used to delay evaluation is at once terser than what the Lambda and Phoenix libraries use, and more explicit.

It may seem odd to see %plus% used to name the good old infix + operator. In fact, it has some clear drawbacks, as we can see by comparing these two roughly equivalent expressions:

The first one is shorter, simpler, and for anyone working in a problem domain that normally uses operator notation, clearer. Within the context of the FC++ language design, though, there are good reasons to use plus instead of +. To understand them, we have to consider the kind of C++ entity that plus refers to. What will allow us to write both plus[2,X] and plus(2,x)? Not a function, or a function pointer, or an array. Only a class instance can support that: plus must be a global class instance in the FC++ library.

Now, recalling that FC++ is all about higher-order functional programming, it becomes clear that + isn't a name for addition that can be used in all contexts. How do you pass + to a function? If you mean the + operator that adds two ints, well, you can't even name it. If you try to pass the address of operator+, and it's overloaded, your C++ compiler will ask you which one you mean. If you mean a particular templated operator+, once again, there's no way to pass a function template as a runtime function argument. Further recalling that FC++ supports higher-order polymorphic functions, it's easy to see that if we want to pass an entity that actually represents the abstract + operation, it has to be a class instance, something like

In fact, just about every special feature of FC++, from implicit partial application to explicit lazy notation, is only possible in C++ with function objects. To meet the goals of its designers, it was much more important for FC++ to use function objects than for mathematical expressions to use operator notation. The point of all this is not to say that one of these domain-specific languages is better than another, but to illustrate the wide range of syntactic and semantic choices available to you, the DSEL designer.