10.3. DSLs, Inside Out

The original Make program contained a very weak programming language of its own, adequate only for the basic software construction jobs to which it was first applied. Since then, Make variants have extended that language, but they all remain somewhat crippled by their origins, and none approaches the expressivity of what we'd call a general-purpose language. Typical large-scale systems using Make dispatch some of the processing work to Perl scripts or other homebrew add-ons, resulting in a system that's often hard to understand and modify.

The designers of YACC, on the other hand, recognized that the challenge of providing a powerful language for expressing semantic actions was better left to other tools. In some sense, YACC's input language actually contains all the capability of whichever language you use to process its output. You're writing a compiler and you need a symbol table? Great, add #include <map> to your initial %{...%} block, and you can happily use the STL in your semantic actions. You're parsing XML and you want to send it to a SAX (Simple API for XML) interpreter on-the-fly? It's no problem, because the YACC input language embeds C/C++.

However, the YACC approach is not without its shortcomings. First of all, there is the cost of implementing and maintaining a new compiler: in this case, the YACC program itself. Also, a C++ programmer who doesn't already know YACC has to learn the new language's rules. In the case of YACC it mostly amounts to syntax, but in general there may be new rules for all sorts of things: variable binding, scoping, and name lookup, to name a few. If you want to see how bad it can get, consider all the different kinds of rules in C++. Without an additional investment in tools development, there are no pre-existing facilities for testing or debugging the programs written in the DSL at their own level of abstraction, so problems often have to be investigated at the low level of the target language, in machine-generated code.

Lastly, traditional DSLs impose serious constraints on language interoperability. YACC, for example, has little or no access to the structure of the C/C++ program fragments it processes. It simply finds nonquoted $ symbols (which are illegal in real C++) and replaces them with the names of corresponding C++ objects: a purely textual substitution. This simple approach works fine for YACC, because it doesn't need the ability to make deductions about such things as C++ types, values, or control flow. In a DSL where general-purpose language constructs themselves are part of the domain abstraction, trivial text manipulations usually don't cut the mustard.

These interoperability problems also prevent DSLs from working together. Imagine that you're unhappy with Make's syntax and limited built-in language, and you want to write a new high-level software construction language. It seems natural to use YACC to express the new language's grammar. Make is still quite useful for expressing and interpreting the low-level build system concepts (targets, dependencies, and build commands), so it would be equally natural to express the language's semantics using Make. YACC actions, however, are written in C or C++. The best we can do is to write C++ program fragments that write Makefiles, adding yet another compilation phase to the process: First YACC code is compiled into C++, then the C++ is compiled and executed to generate a Makefile, and finally Make is invoked to interpret it. Whew! It begins to look as though you'll need our high-level software construction language just to integrate the various phases involved in building and using the language itself!

One way to address all of these weaknesses is to turn the YACC approach inside out: Instead of embedding the general-purpose language in the DSL, embed the domain-specific language in a general-purpose host language. The idea of doing that in C++ may seem a little strange to you, since you're probably aware that C++ doesn't allow us to add arbitrary syntax extensions. How can we embed another language inside C++? Sure, we could write an interpreter in C++ and interpret programs at runtime, but that wouldn't solve the interoperability problems we've been hinting at.

Well, it's not that mysterious, and we hope you'll forgive us for making it seem like it is. After all, every "traditional" library targeting a particular well-defined domain, be it geometry, graphics, or matrix multiplication, can be thought of as a little language: its interface defines the syntax, and its implementation, the semantics. There's a bit more to it, but that's the basic principle. We can already hear you asking, "If this is just about libraries, why have we wasted the whole chapter discussing YACC and Make?" Well, it's not just about libraries. Consider the following quote from "Domain-Specific Languages for Software Engineering" by Jan Heering and Marjan Mernik [Heer02]:

In combination with an application library, any general purpose programming language can act as a DSL, so why were DSLs developed in the first place? Simply because they can offer domain-specificity in better ways:

Appropriate or established domain-specific notations are usually beyond the limited user-definable operator notation offered by general purpose languages. A DSL offers domain-specific notations from the start. Their importance cannot be overestimated as they are directly related to the suitability for end user programming and, more generally, the programmer productivity improvement associated with the use of DSLs.
Appropriate domain-specific constructs and abstractions cannot always be mapped in a straightforward way on functions or objects that can be put in a library. This means a general purpose language using an application library can only express these constructs indirectly. Again, a DSL would incorporate domain-specific constructs from the start.

In short:

Definition

A true DSL incorporates domain-specific notation, constructs, and abstractions as fundamental design considerations. A domain-specific embedded language (DSEL) is simply a library that meets the same criteria.

This inside-out approach addresses many of the problems of translators like YACC and interpreters like Make. The job of designing, implementing, and maintaining the DSL itself is reduced to that of producing a library. However, implementation cost isn't the most important factor, since both DSLs and traditional library implementations are long-term investments that we hope will pay off over the many times the code is used. The real payoff lies in the complete elimination of the costs usually associated with crossing a language boundary.

The DSEL's core language rules are dictated by the host language, so the learning curve for an embedded language is considerably flatter than that of its standalone counterpart. All of the programmer's familiar tools for editing, testing, and debugging the host language can be applied to the DSEL. By definition, the host language compiler itself is also used, so extra translation phases are eliminated, dramatically reducing the complexity of software construction. Finally, while library interoperability presents occasional issues in any software system, when compared with the problems of composing ordinary DSLs, integrating multiple DSELs is almost effortless. A programmer can make seamless transitions between the general-purpose host language and any of several domain-specific embedded languages without giving it a second thought.