
10.1. A Little Language ...

By now you may be wondering, "What is a domain-specific language, anyway?" Let's start with an example (we'll get to the "embedded" part later).

Consider searching some text for the first occurrence of any hyphenated word, such as "domain-specific." If you've ever used regular expressions,[1] we're pretty sure you're not considering writing your own character-by-character search. In fact, we'd be a little surprised if you aren't thinking of using a regular expression like this one:

[1] For an introduction to regular expressions, you might want to take a half-hour break from this book and grab some fine manual on the topic, for instance Mastering Regular Expressions, 2nd Edition, by Jeffrey E. F. Friedl. If you'd like a little theoretical grounding, you might look at The Theory of Computation, by Bernard Moret. It also covers finite state machines, which we're going to discuss in the next chapter.



    \w+(-\w+)+



If you're not familiar with regular expressions, the incantation above may look rather cryptic, but if you are, you probably find it concise and expressive. The breakdown is as follows:

  • \w means "any character that can be part of a word"

  • + (positive closure) means "one or more repetitions"

  • - simply represents itself, the hyphen character

  • Parentheses group subexpressions as in arithmetic, so the final + modifies the whole subexpression -\w+

So the whole pattern matches any sequence of two or more words joined by single hyphens.
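
To see the pattern at work, here is a minimal sketch that applies it with C++11's std::regex (an assumption on our part; any engine accepting this syntax would do, and the sample text and names are ours):

    #include <iostream>
    #include <regex>
    #include <string>

    int main()
    {
        std::string text = "What is a domain-specific language, anyway?";

        // The pattern from above: a word followed by one or more
        // hyphen-word groups.
        std::regex hyphenated(R"(\w+(-\w+)+)");

        std::smatch what;
        if (std::regex_search(text, what, hyphenated))
            std::cout << "first match: " << what[0] << '\n';  // prints "first match: domain-specific"
    }
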

The syntax of regular expressions was specifically designed to allow a short and effective representation of textual patterns. Once you've learned it, you have in your arsenal a little tool: a language, in fact, with its own alphabet, rules, and semantics. Regular expressions are so effective in their particular problem domain that learning to use them is well worth the effort, and we always think twice before abandoning them for an ad hoc solution. It shouldn't be hard to figure out where we are going here: regular expressions are a classic example of a domain-specific language, or DSL for short.

There are a couple of distinguishing properties here that allow us to characterize something as a DSL. First, of course, it has to be a language. Perhaps surprisingly, though, that property is easy to satisfy: just about anything that has the following features constitutes a formal language.

  1. An alphabet (a set of symbols).

  2. A well-defined set of rules saying how the alphabet may be used to build well-formed compositions.

  3. A well-defined subset of all well-formed compositions that are assigned specific meanings.

Note that the alphabet doesn't even have to be textual. Morse code and UML are well-known languages that use graphical alphabets. Both are not only examples of somewhat unusual yet perfectly valid formal languages, but also happen to be lovely DSLs.

Now, the domain-specific part of the characterization is more interesting, and it gives DSLs their second distinguishing property.

Perhaps the simplest way to interpret "domain-specific" would be "anything that isn't general-purpose." Although admittedly that would make it easy to classify languages ("Is HTML a general-purpose language? No? Then it's domain-specific!"), that interpretation fails to capture the properties of these little languages that make them so compelling. For instance, it is obvious that the language of regular expressions can't be called "general-purpose"; in fact, you might have been justifiably reluctant to call it a language at all, at least until we presented our definition of the word. Still, regular expressions give us something beyond a lack of familiar programming constructs that makes them worthy of being called a DSL.

In particular, by using regular expressions, we trade general-purposeness for a significantly higher level of abstraction and expressiveness. The specialized alphabet and notations allow us to express pattern-matching at a level of abstraction that matches our mental model. The elements of regular expressions (characters, repetitions, optionals, subpatterns, and so on) all map directly onto concepts that we'd use if asked to describe a pattern in words.

Making it possible to write code in terms close to the abstractions of the problem domain is the characteristic property of, and motivation behind, all DSLs. In the best-case scenario, the abstractions in code are identical to those in the mental model: You simply use the language's domain-specific notation to write down a statement of the problem itself, and the language's semantics take care of generating a solution.

That may sound unrealistic, but in practice it's not as rare as you might think. When the FORTRAN programming language was created, it seemed to some people to herald the end of programming. The original IBM memo [IBM54] about the language said:

Since FORTRAN should virtually eliminate coding and debugging, it should be possible to solve problems for less than half the cost that would be required without such a system.

By the standards of the day, that was true: FORTRAN did "virtually" eliminate coding and debugging. Since the major problems of most programmers at the time were at the level of how to write correct loops and subroutine calls, programming in FORTRAN may have seemed to be nothing more than writing down a description of the problem. Clearly, the emergence of high-level general-purpose languages has raised the bar on what we consider "coding."

The most successful DSLs are often declarative languages, providing us with notations to describe what rather than how. As you will see further on, this declarative nature plays a significant role in their attractiveness and power.
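
For contrast, here is a rough sketch (ours, not taken from any library) of the imperative "how": the kind of character-by-character search that the one-line pattern above spares us from writing and maintaining:

    #include <cctype>
    #include <cstddef>
    #include <string>

    namespace {
        // True for the characters that \w would match.
        bool is_word_char(char c)
        {
            return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
        }
    }

    // Find the first hyphenated word the hard way: locate a '-' with word
    // characters on both sides, then expand left and right over word
    // characters and further single hyphens.
    std::string find_hyphenated(std::string const& text)
    {
        for (std::size_t i = 1; i + 1 < text.size(); ++i)
        {
            if (text[i] != '-' || !is_word_char(text[i - 1]) || !is_word_char(text[i + 1]))
                continue;

            std::size_t begin = i;
            while (begin > 0 && is_word_char(text[begin - 1]))
                --begin;

            std::size_t end = i + 1;
            while (end < text.size()
                   && (is_word_char(text[end])
                       || (text[end] == '-'
                           && end + 1 < text.size()
                           && is_word_char(text[end + 1]))))
                ++end;

            return text.substr(begin, end - begin);
        }
        return std::string();   // no hyphenated word found
    }

Even this simplified version has to spell out every scanning decision by hand, and each new requirement (optional parts, alternative separators) would grow it further; the declarative pattern simply states the shape of the text we're after.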
