Experiment 08
The final post explains how to use an external domain specific language to load records after compile time!
Introduction
In the previous two posts we explored several ways to model a simple stock trading domain using syntax that is available (or extendable) within F#. This can be described as an internal domain specific language; the biggest limitation with this approach is that all data must be entered before the program is compiled - not very realistic if you want to trade stocks regularly. This post explores how to create an external domain specific language that can be used to load records after compile time.
Parser combinators
The most common way to load text data is a delimted text loader, but this only works if your input is structured (i.e. every column of data is aligned). For this post, I want to use a more powerful tool - parser combinators. Parser combinators implement a formal grammar over the input, which we need to interpret a domain specific language. But this post is not a full parser combinator tutorial. In fact, we will use the parse
computation expression from the FParsec library instead of the more tradional (and recommended) parser combinator symbols (e.g <|>
, .<<.
, … ). For a deeper dive, see these tutorials 1 2. To demonstrate the power of this approach, I will use a parser that can interpret the syntax from either data model 2a or 2b from post #2 in this series. Here are a few examples of valid inputs:
Parser combinator workflow overview
I find it easiest to follow this code by starting at the result and working backwards. Here is an overview of how the pieces fit together before we dig into the code (arrows are labeled with output type).
Step 3: runParserOnFile
We begin with the final step - to run the complete parser on our input file. Here is the code:
The code uses the a built-in FParsec
function called runParserOnFile
, which takes the following arguments:
- parser
(many parseTrade)
- initial user state
()
- input file
input.txt
- file encoding
System.Text.Encoding.ASCII
The parser argument combines two functions: the built-in FParsec
function many
3 and our custom parser parseTrade
, which we describe in the next section. If the file is parsed successfully then we can extract our list of trades from the ParseResult
type.
Step 2: parseTrade
The basic idea with this parser is that we are composing4 many simple parsers into a more complex parser that captures the grammar/rules of our domain specific language. Here is the finished parser:
The parse
computation expression does a lot for us here. Under the hood, it threads together the sequence of assignments so that consecutive parsers are linked. The basic idea is shown in the diagram below.
Each call parses a different part of the transaction. We will examine a few of these parsers more closely in the following section, but for now just try to understand how this works at a high level.
parseTrade
proceeds by parsing (or ignoring) the necessary values to create a Trade
type and return it5.
Step 1: individual token parsers
Each of snippet of code below is a parser that detects a specific token from our domain specific language. We will look at a few of these smaller parsers to highlight some fundamental parsing elements. The full code listing can be found here. The first snippet detects the token Buy or Sell.
do!
, let!
, pstring
, choice
1
2
3
4
5
6
7
8
9
10
11
let convertTransaction inputString =
match inputString with
| "Sell" -> Sell
| _ -> Buy
let parseTransaction =
parse {
do! spaces
let! buyOrSellString = choice [(pstring "Buy");(pstring "Sell")]
return (convertTransaction buyOrSellString)
}
Notice that parseTransaction
is itself another parse
computation expression. First I will describe the purpose of each line, then we will discuss syntax.
Line 08
Skip any whitespaceLine 09
Check if the input begins with “Buy” or “Sell”Line 10
If successful, convert the string to its corresponding type (e.g.Buy
type) and return
Now let’s revist any new syntax. The do
keyword in F# requires the following expression to return unit
. Similarly, the do!
notation on Line 08
is used in a computation expression when the following expression returns a “unit-like” value6.
Line 09
uses two new commands. pstring
creates a parser that succeeds if it encounters its argument (i.e. “Buy” or “Sell”) and fails otherwise. choice
composes these two parsers in a way such that it returns the value of the first successful parser. If both fail then the choice
parser fails.
The difference between let
and let!
is analogous to do
and do!
- let!
binds a name to an value that is within a computation expression context 7.
pint32
, pfloat
1
2
3
4
5
6
let parseNumShares =
parse {
do! spaces
let! numShares = pint32
return numShares
}
Stepping through this snippet:
Line 03
Skip any whitespaceLine 04
Read an integerLine 05
Return the integer
You will recognize most of the syntax here, with the exception of pint32
, which parses 1 or more digits as an integer. Although not shown here parsePrice
uses the related function pfloat
.
skipMany
, return!
1
2
3
4
5
let optionalIgnore str =
parse {
do! spaces
return! skipMany (pstring str)
}
This code snippet:
Line 03
Skips any whitespaceLine 04
Creates a parser for the function argumentstr
and skips it if found and returns without wrapping in a parser context
This function is used to create parsers for our placeholder types (i.e. SharesOf
and At
). skipMany
will apply the parser 0 or more times and throw away any tokens found. The careful reader will also notice the use of return!
instead of return
. The simple rule is use return
if you need to wrap a value in the context of the computation expression (i.e. a parser) and use return!
if the value already has the correct context8.
many1
, asciiUpper
1
2
3
4
5
6
7
8
9
10
11
12
13
14
let convertTicker inputString =
match inputString with
| "GOOGL" -> GOOGL
| "META" -> META
| _ -> MSFT
let parseTicker =
parse{
do! spaces
let! tickerCharList = (many1 asciiUpper)
let tickerString = tickerCharList |> List.map string |> List.reduce (+)
let ticker = convertTicker tickerString
return ticker
}
This code snippet:
Line 09
Skips any whitespaceLine 10
Creates a parser that accepts one or more capitalized characters in ‘A’ - ‘Z’Line 11
Converts the character list into a single stringLine 12
Maps the string to the correspondingStock
type
This parser demonstrates two more primitive functions - many1
and asciiUpper
. Combined they create a parser that accepts 1 or more upper case ASCII characters. The other notable feature about parseTicker
is that it mixes the use of let!
and let
. The let!
on Line 10
unwraps the parser context from the expresson on the right to bind ticketCharList
to a list of characters. The following two lines perform operations on regular F# types so they use the let
keyword.
preturn
1
2
3
4
5
6
let parsePortion =
parse {
do! spaces
let! portion = choice [(pstring "AllOrNone"); (pstring "Partial"); (preturn "AllOrNone")]
return portion = "AllOrNone"
}
Stepping through this snippet:
Line 03
Skip any whitespaceLine 04
Check if the input begins with “AllOrNone” or “Partial”; if neither return “AllOrNone”Line 05
Return true if previous assignment was “AllOrNone”
This parser is very similar to parseTransaction
but it demonstrates use of the preturn
primitive. preturn
always succeeds with the provided value; here I use it as a default value choice
parser by providing it as a final value (only used if all other choices fail).
At this point it is probably worthwhile to revist Step 2 for a better understanding of the composition of parseTrade
. The code listing for this series can be found here.
Conclusion
“A complex system that works is invariably found to have evolved from a simple system that worked.” - Gall’s law (John Gall)
My favorite three things about parser combinators are:
- There are many built-in primitive parsers
- Simple parsers are easy to create and test
- Complex parsers are easy to create by composing simple parsers
I hope this post has demonstrated a useful application of parser combinators. But it may not have been as successful in convincing you the value of external domain specific languages - that’s probably because I can’t honestly make a good argument for them. If your program reads input from another machine then it will certainly be of a structured form (e.g. JSON). If your program reads input from a human then I doubt a domain specific language is the most natural way for the human to input data. I can’t think of a realistic example where a domain specific language would be better than a graphical user interface.
On the other hand, I have benefited from using embedded domain modeling. Using natural notions about the world within my code has made it easier to write, reason about, and revisit. If I have piqued your interest in domain modeling, then I would recommend browsing the related topics below. I believe each topic has something different to offer (like the parable of the blind men and the elephant).
Beginner resources | Author | |
---|---|---|
TDD | Type Driven Development | Mark Seemann |
DDD-light | Domain Modeling Made Functional | Scott Wlascin |
APIs | How to Design a Good API & Why it Matters | Josh Bloch |
Advanced resources | Author | |
---|---|---|
DDD | Domain Driven Design | Eric Evans |
DSL | Domain Specific Languages | Martin Fowler |
MDD | Model-driven development: The good, the bad, and the ugly | Hailpern/Tarr |
LOP | Language Oriented Programming | Ward |
Footnotes
-
many
indicates that we expect to runparseTrade
zero or more times (depending on the number of lines in our input file). ↩ -
Instead of using the more traditional parser combinator functions (e.g.
<|>
,.>>.
, …) I elected to use a more familiar syntax with theparse
computation expression from FParsec. ↩ -
Within a computation expression,
return
performs an operation that is the opposite oflet!
- it wraps the value within a context specified by the computation expression. In this case the value has aTrade
type, soparseTrade
actually returns a value of typeParser<Trade, unit>
. ↩ -
The function
spaces
has typeParser<unit,'u>
, which is unit-like within this context. ↩ -
In this case the context is
ParserResult<T>
;let!
binds the typeT
. This pattern is commonly used with theasync {}
computation expression. ↩ -
Note that
parseTrade
calls this function withdo!
since it returns a unit-like parser. ↩