Building Domain Specific Languages in C#
At the JAOO conference in Aarhus, Denmark this year, domain specific languages came up in virtually every conversation. Every keynote mentioned them, a lot of sessions discussed them (including a pre-conference workshop by Martin Fowler and myself), and you could hear “DSL” in most of the hallway conversations. Why this, and why now?
To hear some people tell it, DSLs solve world hunger, cure cancer, and make software write itself. Perhaps this is a bit of an exaggeration. DSLs are really nothing more than an abstraction mechanism. The current interest lies in the dawning realization that some abstractions resist easy representation in modern languages like C#. For the last 20 years or so, developers have used objects as their primary abstraction mechanism. Objects work really well because it turns out that much of the world is hierarchical. But edge cases still pop up. For example, what about querying relational data in a way that fits the object paradigm nicely? Of course, LINQ provides an elegant solution to that problem. And it’s a DSL, one for building queries for structured data in a way that fits in nicely with C#.
A more formal definition appears later, but, for now, a working definition for a Domain Specific Language is a computer language limited to a very specific problem domain. In essence, a DSL is an abstraction mechanism that allows very concise representations of complex data. This article covers some definitions of what constitutes a DSL, what kinds of DSLs exist, and how to build a particular type of DSL known as a fluent interface. First, though, let me define some terms.
Does Starbucks Use a DSL?
DSLs use language as an abstraction mechanism the way that objects use hierarchy. One of the challenges when you talk about an abstraction mechanism as flexible as language lies with defining it. The thing that makes a DSL such a compelling abstraction is the parallel with a common human communication technique, jargon. Here’s a pop quiz. What “languages” are these?
- Venti, half-caf no foam latte with whip
- Scattered, smothered, and covered
- Just before the Tea Interval, the batsman was out LBW.
The first is easy: Starbucks. The second you probably only know if you’ve ever eaten at a Waffle House: it’s the way you order hash browns (the Waffle House hash brown language consists of eight keywords, all transitive verbs: scattered, smothered, covered, chunked, topped, diced, peppered, and capped). I hear something like the third example all the time because I have lots of colleagues who play cricket, but it makes no sense to me. And that’s really the point: people have created jargon as a short-hand way to convey lots of information. Consider the Waffle House example. Here’s an alternative way to order hash browns:
There is a plant called a potato, which is a tuber, a root plant. Take the potato, harvest it, wash it off, and chop it into little cubes. Put those in a pan with oil and fry them until they are just turning brown, and then drain the grease away and put them on a plate. OK, next I want cheese. There’s an animal called a cow...
Don’t ever try to order hash browns like this in a Waffle House because the person who’s next in line will kill you. All these examples represent jargon: common, abbreviated ways that people talk. You could consider this a domain specific language (after all, it is a language specific to a domain), but doing so leads to the slippery slope where everything is a DSL. Thus, I’m going to lean on Martin Fowler’s definition of a DSL, extracted from the book on DSL patterns he’s writing.
A domain specific language is a limited form of computer language designed for a specific class of problems.
He adds another related definition to this:
Language-oriented programming is a general style of development which operates about the idea of building software around a set of domain specific languages.
With this definition in hand, I’ll limit the scope of DSLs as a software terminology to keep it inside reasonable bounds.
Why use language as an abstraction mechanism? It allows you to leverage one of the key features of jargon. Consider the elaborate Waffle House example above. You don’t talk like that to people because people already understand context, which is one of the important aspects of DSLs. When you think about writing code as speech, it is more like the context-free version above. Consider ordering coffee using C#:
Latte coffee = new Latte();
coffee.Size = Size.VENTI;
coffee.Whip = true;
coffee.Decaf = DecafLimit.HALF;
coffee.Foam = false;
Compare that to the more human way of ordering in the example above. DSLs allow us to leverage implicit context in our code. Think about all the LINQ examples you’ve seen. One of the nicest things about LINQ is its concise syntax, which eliminates the interfering noise of the underlying APIs that it calls.
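To make the contrast concrete, here is a minimal sketch of how the same order might read as a fluent interface, where each call returns the object so the context (this particular latte) carries forward implicitly. Everything in it is an assumption made for illustration: the Latte class, the CupSize and DecafLimit enums, and the method names Venti, HalfCaf, NoFoam, and WithWhip do not come from any real library, and the original snippet does not show how Latte is defined.

```csharp
using System;

// Hypothetical enums; the names are invented for this sketch.
public enum CupSize { Tall, Grande, Venti }
public enum DecafLimit { None, Half, Full }

// Hypothetical Latte type written as a fluent interface.
public class Latte
{
    public CupSize Size { get; private set; }
    public DecafLimit Decaf { get; private set; } = DecafLimit.None;
    public bool Foam { get; private set; } = true;
    public bool Whip { get; private set; } = false;

    private Latte(CupSize size)
    {
        Size = size;
    }

    // Entry point of the chain: the size starts the "sentence".
    public static Latte Venti()
    {
        return new Latte(CupSize.Venti);
    }

    // Each modifier mutates the order and returns it, so calls can be chained.
    public Latte HalfCaf() { Decaf = DecafLimit.Half; return this; }
    public Latte NoFoam() { Foam = false; return this; }
    public Latte WithWhip() { Whip = true; return this; }

    public override string ToString()
    {
        return $"{Size} latte (decaf: {Decaf}, foam: {Foam}, whip: {Whip})";
    }
}

public static class CoffeeOrderExample
{
    public static void Main()
    {
        // Reads close to "Venti, half-caf, no foam latte with whip".
        Latte coffee = Latte.Venti().HalfCaf().NoFoam().WithWhip();
        Console.WriteLine(coffee);
    }
}
```

The chained version never repeats the variable name, and the order of the calls mirrors the spoken order. The trade-off is that the methods mutate and return the same instance, which is a common but not universal design choice for fluent interfaces.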
Now let me offer some additional definitions before I delve into code examples. Two types of DSLs exist (again borrowing Martin’s terms): internal and external. Internal DSLs are little languages built on top of another underlying language. LINQ is a good example of an internal DSL because the LINQ syntax you use is legal C# syntax, but an extended (domain specific) syntax. An external DSL describes a language created with a lexer and parser, where you create your own syntax. SQL is a good example of an external DSL: someone had to write a grammar for SQL, and a way to interpret that grammar into some other executable code. Lexers and parsers make people flee in fear so I’m not going to talk about external DSLs here, but focus instead on the surprisingly rich environment of internal DSLs.
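As a small illustration of why LINQ counts as an internal DSL, the sketch below writes the same query twice: once in query syntax and once as the ordinary method calls that the compiler translates it into. Both forms are legal C#. The Drinks array and the class and method names are hypothetical sample material invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqAsInternalDsl
{
    // Hypothetical sample data, invented for illustration.
    private static readonly string[] Drinks =
        { "latte", "mocha", "espresso", "tea", "cappuccino" };

    // Query syntax: the domain specific "little language" for queries.
    public static List<string> WithQuerySyntax()
    {
        return (from d in Drinks
                where d.Length > 4
                orderby d
                select d.ToUpperInvariant()).ToList();
    }

    // The equivalent chain of standard query operator calls.
    public static List<string> WithMethodCalls()
    {
        return Drinks.Where(d => d.Length > 4)
                     .OrderBy(d => d)
                     .Select(d => d.ToUpperInvariant())
                     .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", WithQuerySyntax()));
        // Both forms produce the same sequence.
        Console.WriteLine(WithQuerySyntax().SequenceEqual(WithMethodCalls()));
    }
}
```

No separate lexer or parser is involved here: the query rides on the host language's compiler, which is the distinguishing trait of an internal DSL as opposed to an external one such as SQL.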
By: Neal Ford
Neal Ford is Software Architect and Meme Wrangler at ThoughtWorks, a global IT consultancy with an exclusive focus on end-to-end software development and delivery. He is also the designer and developer of applications, instructional materials, magazine articles, courseware, and video/DVD presentations, and the author and/or editor of five books spanning a variety of technologies. He focuses on designing and building large-scale enterprise applications. He is also an internationally acclaimed speaker, having delivered more than 600 talks at over 100 developer conferences worldwide. Check out his web site at http://www.nealford.com. He welcomes feedback and can be reached at firstname.lastname@example.org.
Domain specific languages have been around since Lisp, and abound in the Unix world of “little languages”. Recently, a convergence of research has brought them to the forefront of both language and API design.