Slide 1

Slide 1 text

Understanding Parser Combinators @ScottWlaschin fsharpforfunandprofit.com/parser

Slide 2

Slide 2 text

Typical code using parser combinators

let digit = satisfy (fun ch -> Char.IsDigit ch) "digit"
let point = pchar '.'
let e = pchar 'e' <|> pchar 'E'
let optPlusMinus = opt (pchar '-' <|> pchar '+')
let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest
let intPart = zero <|> nonZeroInt
let fractionPart = point >>. manyChars1 digit
let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit

Slide 4

Slide 4 text

Overview
1. What is a parser combinator library?
2. The foundation: a simple parser
3. Three basic parser combinators
4. Building combinators from other combinators
5. Improving the error messages
6. Building a JSON parser

Slide 5

Slide 5 text

Part 1: What is a parser combinator library?

Slide 6

Slide 6 text

Creating a parsing recipe
[Diagram: something to match -> a "Parser-making" function (the create step in the parsing recipe) -> Parser]
This is a recipe to make something, not the thing itself

Slide 7

Slide 7 text

Combining parsing recipes
[Diagram: Parser "combined with" Parser -> Parser; the "combined with" step is a "combinator"]
A recipe to make a more complicated thing

Slide 8

Slide 8 text

Running a parsing recipe
[Diagram: Parser + input -> Run -> Success or Failure]

Slide 9

Slide 9 text

Why parser combinators?
• Written in your favorite programming language
• No preprocessing needed
  – Lexing, parsing, AST transform all in one.
  – REPL-friendly
• Easy to create little DSLs
  – Google "fogcreek fparsec"
• Fun way of understanding functional composition

Slide 10

Slide 10 text

Part 2: A simple parser

Slide 11

Slide 11 text

Version 1 – parse the character 'A'
[Diagram: input -> pcharA -> (true/false, remaining input)]

Slide 13

Slide 13 text

let pcharA input =
    if String.IsNullOrEmpty(input) then
        (false,"")
    else if input.[0] = 'A' then
        let remaining = input.[1..]
        (true,remaining)
    else
        (false,input)

Slide 14

Slide 14 text

Version 2 – parse any character
[Diagram: (charToMatch, input) -> pchar -> (matched char, remaining input) or a failure message]

Slide 15

Slide 15 text

let pchar (charToMatch,input) =
    if String.IsNullOrEmpty(input) then
        "No more input"              // a string here...
    else
        let first = input.[0]
        if first = charToMatch then
            let remaining = input.[1..]
            (charToMatch,remaining)  // ...but a tuple here, so the branch types do not match
        else
            sprintf "Expecting '%c'. Got '%c'" charToMatch first

Slide 16

Slide 16 text

Fix – create a choice type to capture either case
[Diagram: (charToMatch, input) -> pchar -> Success: (matched char, remaining input) or Failure: message]

type Result<'a> =
    | Success of 'a
    | Failure of string

Slide 19

Slide 19 text

let pchar (charToMatch,input) =
    if String.IsNullOrEmpty(input) then
        Failure "No more input"
    else
        let first = input.[0]
        if first = charToMatch then
            let remaining = input.[1..]
            Success (charToMatch,remaining)
        else
            let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
            Failure msg

Slide 20

Slide 20 text

Version 3 – returning a function
[Diagram: (charToMatch, input) -> pchar -> Success: (matched char, remaining input) or Failure: message]

Slide 22

Slide 22 text

Version 3 – returning a function input pchar charToMatch

Slide 23

Slide 23 text

Version 3 – returning a function charToMatch pchar

Slide 25

Slide 25 text

Version 4 – wrapping the function in a type
[Diagram: charToMatch -> pchar -> Parser]

Slide 26

Slide 26 text

Version 4 – wrapping the function in a type
[Diagram: charToMatch -> pchar -> Parser]

type Parser<'a> = Parser of (string -> Result<'a * string>)

The wrapped value is a function that takes a string and returns a Result

Slide 27

Slide 27 text

Version 4 – wrapping the function in a type
[Diagram: charToMatch -> pchar -> Parser]

type Parser<'a> = Parser of (string -> Result<'a * string>)

The outer "Parser" case is the wrapper

Slide 28

Slide 28 text

Creating parsing recipes

Slide 29

Slide 29 text

charToMatch input Parser A parsing recipe for a char

Slide 30

Slide 30 text

Parser Run Running a parsing recipe input Success, or Failure

Slide 31

Slide 31 text

Running a parsing recipe input Parser Parser Run input Success, or Failure

Slide 32

Slide 32 text

let run parser input =
    // unwrap parser to get inner function
    let (Parser innerFn) = parser
    // call inner function with input
    innerFn input
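
Putting versions 3 and 4 together: the slides only show the tuple-taking pchar, so the curried, wrapped version below is an assumption about its final shape. It is a minimal sketch built from the Result and Parser types on the earlier slides, followed by run in use.

```fsharp
open System

// The choice type from the "Fix" slide
type Result<'a> =
    | Success of 'a
    | Failure of string

// Version 4: the parsing function wrapped in a type
type Parser<'a> = Parser of (string -> Result<'a * string>)

// Assumed shape of the final pchar: it takes only charToMatch
// and returns a Parser that receives the input later
let pchar charToMatch =
    let innerFn (input: string) =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                let remaining = input.[1..]
                Success (charToMatch, remaining)
            else
                let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
                Failure msg
    Parser innerFn

let run parser input =
    // unwrap parser to get inner function
    let (Parser innerFn) = parser
    // call inner function with input
    innerFn input

let parseA = pchar 'A'        // a recipe; nothing has been parsed yet
let good = run parseA "ABC"   // Success ('A', "BC")
let bad = run parseA "ZBC"    // Failure "Expecting 'A'. Got 'Z'"
```

Creating parseA does no work; parsing only happens when run supplies the input.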

Slide 33

Slide 33 text

Enough talk, show me some code

Slide 34

Slide 34 text

Part 3: Three basic combinators

Slide 35

Slide 35 text

What is a combinator?
• A “combinator” library is a library designed around combining things to get more complex values of the same type.
• integer + integer = integer
• list @ list = list // @ is list concat
• Parser ?? Parser = Parser

Slide 36

Slide 36 text

Basic parser combinators
• Parser andThen Parser => Parser
• Parser orElse Parser => Parser
• Parser map (transformer) => Parser

Slide 37

Slide 37 text

AndThen parser combinator
• Run the first parser.
  – If there is a failure, return it.
• Otherwise, run the second parser with the remaining input.
  – If there is a failure, return it.
• If both parsers succeed, return a pair (tuple) that contains both parsed values.

Slide 38

Slide 38 text

let andThen parser1 parser2 =
    let innerFn input =
        // run parser1 with the input
        let result1 = run parser1 input

        // test the 1st parse result for Failure/Success
        match result1 with
        | Failure err ->
            Failure err // return error from parser1
        | Success (value1,remaining1) ->
            // run parser2 with the remaining input
            (continued on next slide..)

Slide 39

Slide 39 text

let andThen parser1 parser2 =
    [...snip...]
            let result2 = run parser2 remaining1

            // test the 2nd parse result for Failure/Success
            match result2 with
            | Failure err ->
                Failure err // return error from parser2
            | Success (value2,remaining2) ->
                let combinedValue = (value1,value2)
                Success (combinedValue,remaining2)

    // return the inner function
    Parser innerFn

Slide 40

Slide 40 text

OrElse parser combinator
• Run the first parser.
• On success, return the parsed value, along with the remaining input.
• Otherwise, on failure, run the second parser with the original input...
• ...and in this case, return the result (success or failure) from the second parser.

Slide 41

Slide 41 text

let orElse parser1 parser2 =
    let innerFn input =
        // run parser1 with the input
        let result1 = run parser1 input

        // test the result for Failure/Success
        match result1 with
        | Success result ->
            // if success, return the original result
            result1
        | Failure err ->
            // if failed, run parser2 with the input
            (continued on next slide..)

Slide 42

Slide 42 text

let orElse parser1 parser2 =
    [...snip...]
        | Failure err ->
            // if failed, run parser2 with the input
            let result2 = run parser2 input

            // return parser2's result
            result2

    // return the inner function
    Parser innerFn

Slide 43

Slide 43 text

Map parser combinator
• Run the parser.
• On success, transform the parsed value using the provided function.
• Otherwise, return the failure.

Slide 44

Slide 44 text

let mapP f parser =
    let innerFn input =
        // run parser with the input
        let result = run parser input

        // test the result for Failure/Success
        match result with
        | Success (value,remaining) ->
            // if success, return the value transformed by f
            let newValue = f value
            Success (newValue, remaining)
        (continued on next slide..)

Slide 45

Slide 45 text

let mapP f parser =
    [...snip...]
        | Failure err ->
            // if failed, return the error
            Failure err

    // return the inner function
    Parser innerFn

Slide 46

Slide 46 text

Parser combinator operators

pcharA .>>. pcharB // 'A' andThen 'B'
pcharA <|> pcharB  // 'A' orElse 'B'
pcharA |>> (...)   // map ch to something
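
The slides use these operators without showing how they are defined. One plausible way to define them, assuming they are thin wrappers over andThen, orElse and mapP, is sketched below; the compact prelude and the curried pchar are assumptions added so the block runs on its own.

```fsharp
open System

type Result<'a> = Success of 'a | Failure of string
type Parser<'a> = Parser of (string -> Result<'a * string>)

let run (Parser innerFn) input = innerFn input

// Assumed curried pchar, wrapped in the Parser type
let pchar charToMatch =
    Parser (fun (input: string) ->
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then Success (charToMatch, input.[1..])
            else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first))

// Condensed versions of the three basic combinators
let andThen parser1 parser2 =
    Parser (fun input ->
        match run parser1 input with
        | Failure err -> Failure err
        | Success (value1, remaining1) ->
            match run parser2 remaining1 with
            | Failure err -> Failure err
            | Success (value2, remaining2) -> Success ((value1, value2), remaining2))

let orElse parser1 parser2 =
    Parser (fun input ->
        match run parser1 input with
        | Success _ as result1 -> result1
        | Failure _ -> run parser2 input)   // retry with the original input

let mapP f parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, remaining) -> Success (f value, remaining)
        | Failure err -> Failure err)

// Infix versions (assumed definitions)
let ( .>>. ) p1 p2 = andThen p1 p2
let ( <|> ) p1 p2 = orElse p1 p2
let ( |>> ) p f = mapP f p   // parser on the left, like the |> pipe

let pcharA = pchar 'A'
let pcharB = pchar 'B'

let aThenB = run (pcharA .>>. pcharB) "ABC"                   // Success (('A', 'B'), "C")
let aOrB = run (pcharA <|> pcharB) "BCD"                      // Success ('B', "CD")
let lowerA = run (pcharA |>> fun ch -> Char.ToLower ch) "ABC" // Success ('a', "BC")
```

Note that `|>>` flips the argument order of mapP so that the parser reads left to right, in the same way as the ordinary pipe operator.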

Slide 47

Slide 47 text

Demo

Slide 48

Slide 48 text

Part 4: Building complex combinators from these basic ones

Slide 49

Slide 49 text

Using reduce to combine parsers

[ 1; 2; 3] |> List.reduce (+)
// 1 + 2 + 3

[ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )
// pcharA .>>. pcharB .>>. pcharC

[ pcharA; pcharB; pcharC] |> List.reduce ( <|> )
// pcharA <|> pcharB <|> pcharC

Slide 50

Slide 50 text

Using reduce to combine parsers

let choice listOfParsers =
    listOfParsers |> List.reduce ( <|> )

let anyOf listOfChars =
    listOfChars
    |> List.map pchar // convert char into Parser
    |> choice         // combine them all

let parseLowercase = anyOf ['a'..'z']
let parseDigit = anyOf ['0'..'9']

Slide 51

Slide 51 text

Using reduce to combine parsers

/// Convert a list of parsers into a Parser of list
let sequence listOfParsers =
    let concatResults p1 p2 = // helper
        p1 .>>. p2
        |>> (fun (list1,list2) -> list1 @ list2)

    listOfParsers
    // map each parser result to a list
    |> Seq.map (fun parser -> parser |>> List.singleton)
    // reduce by concatting the results of AndThen
    |> Seq.reduce concatResults

Slide 52

Slide 52 text

Using reduce to combine parsers

/// match a specific string
let pstring str =
    str
    // map each char to a pchar
    |> Seq.map pchar
    // convert to Parser<char list>
    |> sequence
    // convert Parser<char list> to Parser<char array>
    |>> List.toArray
    // convert Parser<char array> to Parser<string>
    |>> String

Slide 53

Slide 53 text

Demo

Slide 54

Slide 54 text

Yet more combinators

Slide 55

Slide 55 text

“More than one” combinators

let many p = ...  // zero or more
let many1 p = ... // one or more
let opt p = ...   // zero or one

// example
let whitespaceChar = anyOf [' '; '\t'; '\n']
let whitespace = many1 whitespaceChar
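
The bodies of many, many1 and opt are elided on the slide. The following is one possible way to write them, not necessarily the author's: the helper parseZeroOrMore is a hypothetical name, and the compact prelude is an assumption added so the block runs on its own. This simple version would loop forever if given a parser that succeeds without consuming input.

```fsharp
open System

type Result<'a> = Success of 'a | Failure of string
type Parser<'a> = Parser of (string -> Result<'a * string>)

let run (Parser innerFn) input = innerFn input

// Assumed curried pchar, wrapped in the Parser type
let pchar charToMatch =
    Parser (fun (input: string) ->
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then Success (charToMatch, input.[1..])
            else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first))

// Hypothetical helper: apply the parser until it fails,
// collecting the values; this step itself never fails
let rec parseZeroOrMore parser input =
    match run parser input with
    | Failure _ -> ([], input)
    | Success (firstValue, afterFirst) ->
        let (otherValues, remaining) = parseZeroOrMore parser afterFirst
        (firstValue :: otherValues, remaining)

// zero or more: always succeeds, possibly with an empty list
let many parser =
    Parser (fun input -> Success (parseZeroOrMore parser input))

// one or more: the first match is required
let many1 parser =
    Parser (fun input ->
        match run parser input with
        | Failure err -> Failure err
        | Success (firstValue, afterFirst) ->
            let (otherValues, remaining) = parseZeroOrMore parser afterFirst
            Success (firstValue :: otherValues, remaining))

// zero or one: wrap the value in an option, never fail
let opt parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, remaining) -> Success (Some value, remaining)
        | Failure _ -> Success (None, input))

let manyA = run (many (pchar 'A')) "AAB"    // Success (['A'; 'A'], "B")
let noneA = run (many (pchar 'A')) "B"      // Success ([], "B")
let many1A = run (many1 (pchar 'A')) "B"    // Failure "Expecting 'A'. Got 'B'"
let optMinus = run (opt (pchar '-')) "1"    // Success (None, "1")
```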

Slide 56

Slide 56 text

“Throwing away” combinators

p1 .>> p2 // throw away right side
p1 >>. p2 // throw away left side

// keep only the inside value
let between p1 p2 p3 = p1 >>. p2 .>> p3

// example
let pdoublequote = pchar '"'
let quotedInt = between pdoublequote pint pdoublequote
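
The slide does not show how `.>>` and `>>.` are defined. A minimal sketch, assuming they are built from andThen followed by a map that keeps one half of the pair, is given below; the prelude and curried pchar are assumptions added so the block runs on its own, and a single-character parser stands in for pint.

```fsharp
open System

type Result<'a> = Success of 'a | Failure of string
type Parser<'a> = Parser of (string -> Result<'a * string>)

let run (Parser innerFn) input = innerFn input

// Assumed curried pchar, wrapped in the Parser type
let pchar charToMatch =
    Parser (fun (input: string) ->
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then Success (charToMatch, input.[1..])
            else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first))

let andThen parser1 parser2 =
    Parser (fun input ->
        match run parser1 input with
        | Failure err -> Failure err
        | Success (value1, remaining1) ->
            match run parser2 remaining1 with
            | Failure err -> Failure err
            | Success (value2, remaining2) -> Success ((value1, value2), remaining2))

let mapP f parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, remaining) -> Success (f value, remaining)
        | Failure err -> Failure err)

// Both parsers must match, but only one value is kept
let ( .>> ) p1 p2 = andThen p1 p2 |> mapP fst   // keep left, throw away right
let ( >>. ) p1 p2 = andThen p1 p2 |> mapP snd   // keep right, throw away left

// keep only the inside value
let between p1 p2 p3 = p1 >>. p2 .>> p3

let pdoublequote = pchar '"'
let quotedA = between pdoublequote (pchar 'A') pdoublequote

let inside = run quotedA "\"A\"!"     // Success ('A', "!")
let unclosed = run quotedA "\"A!"     // Failure "Expecting '"'. Got '!'"
```

The dot in each operator sits on the side whose value is kept, which matches `.>>.` keeping both.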

Slide 57

Slide 57 text

“Separator” combinators

let sepBy1 p sep = ... /// one or more p separated by sep
let sepBy p sep = ...  /// zero or more p separated by sep

// example
let comma = pchar ','
let digit = anyOf ['0'..'9']
let oneOrMoreDigitList = sepBy1 digit comma

Slide 58

Slide 58 text

Demo

Slide 59

Slide 59 text

Part 5: Improving the error messages

Slide 60

Slide 60 text

input Parser Named parsers Name: “Digit” Parsing Function:

Slide 61

Slide 61 text

Named parsers

let ( <?> ) = setLabel // infix version

run parseDigit "ABC" // without the label
// Error parsing "9" : Unexpected 'A'

let parseDigit_WithLabel = anyOf ['0'..'9'] <?> "digit"
run parseDigit_WithLabel "ABC" // with the label
// Error parsing "digit" : Unexpected 'A'

Slide 62

Slide 62 text

input Parser Extra input context Input: * Stream of characters * Line, Column

Slide 63

Slide 63 text

Extra input context

run pint "-Z123"
// Line:0 Col:1 Error parsing integer
// -Z123
//  ^Unexpected 'Z'

run pfloat "-123Z45"
// Line:0 Col:4 Error parsing float
// -123Z45
//     ^Unexpected 'Z'

Slide 64

Slide 64 text

Part 6: Building a JSON Parser

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

// A type that represents the previous diagram
type JValue =
    | JString of string
    | JNumber of float
    | JObject of Map<string, JValue>
    | JArray of JValue list
    | JBool of bool
    | JNull

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

Parsing JSON Null

Slide 69

Slide 69 text

// new helper operator.
let (>>%) p x =
    p |>> (fun _ -> x) // runs parser p, but ignores the result

// Parse a "null"
let jNull =
    pstring "null"
    >>% JNull   // map to JNull
    <?> "null"  // give it a label

Slide 70

Slide 70 text

Parsing JSON Bool

Slide 71

Slide 71 text

// Parse a boolean
let jBool =
    let jtrue =
        pstring "true"
        >>% JBool true  // map to JBool
    let jfalse =
        pstring "false"
        >>% JBool false // map to JBool

    // choose between true and false
    jtrue <|> jfalse
    <?> "bool" // give it a label

Slide 72

Slide 72 text

Parsing a JSON String

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

Call this "unescaped char"

Slide 75

Slide 75 text

/// Parse an unescaped char
let jUnescapedChar =
    let label = "char"
    satisfy (fun ch -> (ch <> '\\') && (ch <> '\"') ) label

Slide 76

Slide 76 text

Call this "escaped char"

Slide 77

Slide 77 text

let jEscapedChar =
    [
    // each item is (stringToMatch, resultChar)
    ("\\\"",'\"') // quote
    ("\\\\",'\\') // reverse solidus
    ("\\/",'/')   // solidus
    ("\\b",'\b')  // backspace
    ("\\f",'\f')  // formfeed
    ("\\n",'\n')  // newline
    ("\\r",'\r')  // cr
    ("\\t",'\t')  // tab
    ]
    // convert each pair into a parser
    |> List.map (fun (toMatch,result) -> pstring toMatch >>% result)
    // and combine them into one
    |> choice
    <?> "escaped char" // set label

Slide 78

Slide 78 text

Call this "unicode char"

Slide 79

Slide 79 text

"unescaped char" or "escaped char" or "unicode char"

Slide 80

Slide 80 text

let quotedString =
    let quote = pchar '\"' <?> "quote"
    let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar

    // set up the main parser
    quote >>. manyChars jchar .>> quote

let jString =
    // wrap the string in a JString
    quotedString
    |>> JString          // convert to JString
    <?> "quoted string"  // add label

Slide 81

Slide 81 text

Parsing a JSON Number

Slide 82

Slide 82 text

No content

Slide 83

Slide 83 text

"int part" "sign part"

Slide 84

Slide 84 text

let optSign = opt (pchar '-')

let zero = pstring "0"

let digitOneNine =
    satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"

let digit = satisfy (fun ch -> Char.IsDigit ch) "digit"

let nonZeroInt =
    digitOneNine .>>. manyChars digit
    |>> fun (first,rest) -> string first + rest

// set up the integer part
let intPart = zero <|> nonZeroInt

Slide 85

Slide 85 text

"fraction part"

Slide 86

Slide 86 text

// set up the fraction part
let point = pchar '.'

let fractionPart =
    point >>. manyChars1 digit

Slide 87

Slide 87 text

"exponent part"

Slide 88

Slide 88 text

// set up the exponent part
let e = pchar 'e' <|> pchar 'E'

let optPlusMinus = opt (pchar '-' <|> pchar '+')

let exponentPart =
    e >>. optPlusMinus .>>. manyChars1 digit

Slide 89

Slide 89 text

"exponent part" "int part" "fraction part" "sign part"

Slide 90

Slide 90 text

// set up the main JNumber parser
optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
|>> convertToJNumber // not shown
<?> "number"         // add label
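
The slides leave convertToJNumber out. Purely as an illustration, a hypothetical version is sketched below. It assumes the nested tuple shape that chaining `.>>.` four times would produce (sign, int part, optional fraction, optional exponent) and simply rebuilds the number text before handing it to the built-in float conversion; the helper signToString is an invented name.

```fsharp
type JValue =
    | JString of string
    | JNumber of float
    | JObject of Map<string, JValue>
    | JArray of JValue list
    | JBool of bool
    | JNull

// Hypothetical helper: the slides mark convertToJNumber as "not shown"
let convertToJNumber (((optSign, intPart), fractionPart), expPart) =
    // an optional sign char becomes "" or a one-char string
    let signToString (sign: char option) =
        match sign with
        | Some ch -> string ch
        | None -> ""

    let fractionStr =
        match fractionPart with
        | Some digits -> "." + digits
        | None -> ""

    let expStr =
        match expPart with
        | Some (expSign, digits) -> "e" + signToString expSign + digits
        | None -> ""

    // rebuild the number text and let the float conversion parse it
    (signToString optSign + intPart + fractionStr + expStr)
    |> float
    |> JNumber

// the value the number parser would hand over for "-12.5e+2"
let example = convertToJNumber (((Some '-', "12"), Some "5"), Some (Some '+', "2"))
// JNumber -1250.0
```

Going through a string keeps the sketch short; a real implementation might compute the value directly or report out-of-range numbers differently.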

Slide 91

Slide 91 text

Parsing JSON Arrays and Objects

Slide 92

Slide 92 text

Completing the JSON Parser

Slide 93

Slide 93 text

No content

Slide 94

Slide 94 text

// the final parser combines the others together
let jValue =
    choice [
        jNull
        jBool
        jNumber
        jString
        jArray
        jObject
    ]

Slide 95

Slide 95 text

Demo: the JSON parser in action

Slide 96

Slide 96 text

Summary
• Treating a function like an object
  – Returning a function from a function
  – Wrapping a function in a type
• Working with a "recipe" (aka "effect")
  – Combining recipes before running them.
• The power of combinators
  – A few basic combinators: "andThen", "orElse", etc.
  – Complex parsers are built from smaller components.
• Combinator libraries are small but powerful
  – Less than 500 lines for combinator library
  – Less than 300 lines for JSON parser itself

Slide 97

Slide 97 text

Want more?
• For a production-ready library for F#, search for "fparsec"
• There are similar libraries for other languages

Slide 98

Slide 98 text

Thanks!
Contact me: @ScottWlaschin
Slides and video here: fsharpforfunandprofit.com/parser
Let us know if you need help with F#