Experiment 08

24 minute read

The final post explains how to use an external domain specific language to load records after compile time!

Introduction

In the previous two posts we explored several ways to model a simple stock trading domain using syntax that is available (or extendable) within F#. This can be described as an internal domain specific language; the biggest limitation with this approach is that all data must be entered before the program is compiled - not very realistic if you want to trade stocks regularly. This post explores how to create an external domain specific language that can be used to load records after compile time.

Parser combinators

The most common way to load text data is a delimted text loader, but this only works if your input is structured (i.e. every column of data is aligned). For this post, I want to use a more powerful tool - parser combinators. Parser combinators implement a formal grammar over the input, which we need to interpret a domain specific language. But this post is not a full parser combinator tutorial. In fact, we will use the parse computation expression from the FParsec library instead of the more tradional (and recommended) parser combinator symbols (e.g <|>, .<<., … ). For a deeper dive, see these tutorials 1 2. To demonstrate the power of this approach, I will use a parser that can interpret the syntax from either data model 2a or 2b from post #2 in this series. Here are a few examples of valid inputs:

Buy 4 SharesOf MSFT At 258.32 AllOrNone
Sell 3 SharesOf META At 158.71 
Sell 6 GOOGL 106.08

Parser combinator workflow overview

I find it easiest to follow this code by starting at the result and working backwards. Here is an overview of how the pieces fit together before we dig into the code (arrows are labeled with output type).

Parser combinator program flow

Step 3: runParserOnFile

We begin with the final step - to run the complete parser on our input file. Here is the code:

let result: ParserResult<Trade list,unit> = runParserOnFile (many parseTrade) () "input.txt" System.Text.Encoding.ASCII

let trades: Trade list = 
  match result with
  | Success (x: Trade list,_,_) -> x
  | _ -> []

The code uses the a built-in FParsec function called runParserOnFile, which takes the following arguments:

  • parser (many parseTrade)
  • initial user state ()
  • input file input.txt
  • file encoding System.Text.Encoding.ASCII

The parser argument combines two functions: the built-in FParsec function many3 and our custom parser parseTrade, which we describe in the next section. If the file is parsed successfully then we can extract our list of trades from the ParseResult type.

Step 2: parseTrade

The basic idea with this parser is that we are composing4 many simple parsers into a more complex parser that captures the grammar/rules of our domain specific language. Here is the finished parser:

let parseTrade =
  parse {
    let! buyOrSell = parseTransaction
    let! numShares = parseNumShares         
    do! optionalIgnore "SharesOf"    
    let! ticker = parseTicker
    do! optionalIgnore "At"
    let! price = parsePrice
    let! allOrNone = parsePortion 

    return {buyOrSell = buyOrSell; numShares = numShares; ticker = ticker; price = price; allOrNone = allOrNone}
  }

The parse computation expression does a lot for us here. Under the hood, it threads together the sequence of assignments so that consecutive parsers are linked. The basic idea is shown in the diagram below.

Parse computation expression

Each call parses a different part of the transaction. We will examine a few of these parsers more closely in the following section, but for now just try to understand how this works at a high level.

parseTrade proceeds by parsing (or ignoring) the necessary values to create a Trade type and return it5.

Step 1: individual token parsers

Each of snippet of code below is a parser that detects a specific token from our domain specific language. We will look at a few of these smaller parsers to highlight some fundamental parsing elements. The full code listing can be found here. The first snippet detects the token Buy or Sell.

do!, let!, pstring, choice

1
2
3
4
5
6
7
8
9
10
11
let convertTransaction inputString = 
  match inputString with
  | "Sell" -> Sell
  | _ -> Buy

let parseTransaction = 
  parse {
    do! spaces
    let! buyOrSellString = choice [(pstring "Buy");(pstring "Sell")]
    return (convertTransaction buyOrSellString)
  } 

Notice that parseTransaction is itself another parse computation expression. First I will describe the purpose of each line, then we will discuss syntax.

  • Line 08 Skip any whitespace
  • Line 09 Check if the input begins with “Buy” or “Sell”
  • Line 10 If successful, convert the string to its corresponding type (e.g. Buy type) and return

Now let’s revist any new syntax. The do keyword in F# requires the following expression to return unit. Similarly, the do! notation on Line 08 is used in a computation expression when the following expression returns a “unit-like” value6.

Line 09 uses two new commands. pstring creates a parser that succeeds if it encounters its argument (i.e. “Buy” or “Sell”) and fails otherwise. choice composes these two parsers in a way such that it returns the value of the first successful parser. If both fail then the choice parser fails.

The difference between let and let! is analogous to do and do! - let! binds a name to an value that is within a computation expression context 7.

pint32, pfloat

1
2
3
4
5
6
let parseNumShares = 
  parse {
    do! spaces
    let! numShares = pint32
    return numShares
  } 

Stepping through this snippet:

  • Line 03 Skip any whitespace
  • Line 04 Read an integer
  • Line 05 Return the integer

You will recognize most of the syntax here, with the exception of pint32, which parses 1 or more digits as an integer. Although not shown here parsePrice uses the related function pfloat.

skipMany, return!

1
2
3
4
5
let optionalIgnore str = 
  parse {
    do! spaces
    return! skipMany (pstring str)    
  }

This code snippet:

  • Line 03 Skips any whitespace
  • Line 04 Creates a parser for the function argument str and skips it if found and returns without wrapping in a parser context

This function is used to create parsers for our placeholder types (i.e. SharesOf and At). skipMany will apply the parser 0 or more times and throw away any tokens found. The careful reader will also notice the use of return! instead of return. The simple rule is use return if you need to wrap a value in the context of the computation expression (i.e. a parser) and use return! if the value already has the correct context8.

many1, asciiUpper

1
2
3
4
5
6
7
8
9
10
11
12
13
14
let convertTicker inputString = 
  match inputString with
  | "GOOGL" -> GOOGL
  | "META" -> META
  | _ -> MSFT

let parseTicker = 
  parse{
    do! spaces
    let! tickerCharList = (many1 asciiUpper) 
    let tickerString =  tickerCharList |> List.map string |> List.reduce (+)
    let ticker = convertTicker tickerString
    return ticker
  } 

This code snippet:

  • Line 09 Skips any whitespace
  • Line 10 Creates a parser that accepts one or more capitalized characters in ‘A’ - ‘Z’
  • Line 11 Converts the character list into a single string
  • Line 12 Maps the string to the corresponding Stock type

This parser demonstrates two more primitive functions - many1 and asciiUpper. Combined they create a parser that accepts 1 or more upper case ASCII characters. The other notable feature about parseTicker is that it mixes the use of let! and let. The let! on Line 10 unwraps the parser context from the expresson on the right to bind ticketCharList to a list of characters. The following two lines perform operations on regular F# types so they use the let keyword.

preturn

1
2
3
4
5
6
let parsePortion = 
  parse {
    do! spaces
    let! portion = choice [(pstring "AllOrNone"); (pstring "Partial"); (preturn "AllOrNone")]
    return portion = "AllOrNone"
  } 

Stepping through this snippet:

  • Line 03 Skip any whitespace
  • Line 04 Check if the input begins with “AllOrNone” or “Partial”; if neither return “AllOrNone”
  • Line 05 Return true if previous assignment was “AllOrNone”

This parser is very similar to parseTransaction but it demonstrates use of the preturn primitive. preturn always succeeds with the provided value; here I use it as a default value choice parser by providing it as a final value (only used if all other choices fail).

At this point it is probably worthwhile to revist Step 2 for a better understanding of the composition of parseTrade. The code listing for this series can be found here.

Conclusion

“A complex system that works is invariably found to have evolved from a simple system that worked.” - Gall’s law (John Gall)

My favorite three things about parser combinators are:

  1. There are many built-in primitive parsers
  2. Simple parsers are easy to create and test
  3. Complex parsers are easy to create by composing simple parsers

I hope this post has demonstrated a useful application of parser combinators. But it may not have been as successful in convincing you the value of external domain specific languages - that’s probably because I can’t honestly make a good argument for them. If your program reads input from another machine then it will certainly be of a structured form (e.g. JSON). If your program reads input from a human then I doubt a domain specific language is the most natural way for the human to input data. I can’t think of a realistic example where a domain specific language would be better than a graphical user interface.

Stock App

On the other hand, I have benefited from using embedded domain modeling. Using natural notions about the world within my code has made it easier to write, reason about, and revisit. If I have piqued your interest in domain modeling, then I would recommend browsing the related topics below. I believe each topic has something different to offer (like the parable of the blind men and the elephant).

  Beginner resources Author
TDD Type Driven Development Mark Seemann
DDD-light Domain Modeling Made Functional Scott Wlascin
APIs How to Design a Good API & Why it Matters Josh Bloch
  Advanced resources Author
DDD Domain Driven Design Eric Evans
DSL Domain Specific Languages Martin Fowler
MDD Model-driven development: The good, the bad, and the ugly Hailpern/Tarr
LOP Language Oriented Programming Ward

Footnotes

  1. FParsec Tutorial 

  2. fsharpforfunandprofit  

  3. many indicates that we expect to run parseTrade zero or more times (depending on the number of lines in our input file). 

  4. Instead of using the more traditional parser combinator functions (e.g. <|>, .>>., …) I elected to use a more familiar syntax with the parse computation expression from FParsec. 

  5. Within a computation expression, return performs an operation that is the opposite of let! - it wraps the value within a context specified by the computation expression. In this case the value has a Trade type, so parseTrade actually returns a value of type Parser<Trade, unit>

  6. The function spaces has type Parser<unit,'u>, which is unit-like within this context. 

  7. In this case the context is ParserResult<T>; let! binds the type T. This pattern is commonly used with the async {} computation expression. 

  8. Note that parseTrade calls this function with do! since it returns a unit-like parser. 

Tags: , ,

Updated: