Understanding Parser Combinators

Traditionally, writing parsers has been hard, involving arcane tools like Lex and Yacc. An alternative approach is to write a parser in your favourite programming language, using a "parser combinator" library and concepts no more complicated than regular expressions.

In this talk, we'll do a deep dive into parser combinators. We'll build a parser combinator library from scratch in F# using functional programming techniques, and then use it to implement a full-featured JSON parser.

Code and video at https://fsharpforfunandprofit.com/parser/

Scott Wlaschin

June 15, 2016

Transcript

  1. Typical code using parser combinators

        let digit = satisfy (fun ch -> Char.IsDigit ch ) "digit"
        let point = pchar '.'
        let e = pchar 'e' <|> pchar 'E'
        let optPlusMinus = opt (pchar '-' <|> pchar '+')
        let nonZeroInt =
            digitOneNine .>>. manyChars digit
            |>> fun (first,rest) -> string first + rest
        let intPart = zero <|> nonZeroInt
        let fractionPart = point >>. manyChars1 digit
        let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
  3. Overview
    1. What is a parser combinator library?
    2. The foundation: a simple parser
    3. Three basic parser combinators
    4. Building combinators from other combinators
    5. Improving the error messages
    6. Building a JSON parser
  4. Creating a parsing recipe
    [Diagram: a "parser-making" function takes something to match and creates a Parser<something>, a step in a parsing recipe.]
    This is a recipe to make something, not the thing itself.
  5. Combining parsing recipes
    [Diagram: Parser<thingA> combined with Parser<thingB> by a "combinator" gives Parser<thingC>, a recipe to make a more complicated thing.]
  6. Why parser combinators?
    • Written in your favorite programming language
    • No preprocessing needed
      – Lexing, parsing, AST transform all in one.
      – REPL-friendly
    • Easy to create little DSLs
      – Google "fogcreek fparsec"
    • Fun way of understanding functional composition
  7. Version 1: parse the character 'A'

        let pcharA input =
            if String.IsNullOrEmpty(input) then
                (false,"")
            else if input.[0] = 'A' then
                let remaining = input.[1..]
                (true,remaining)
            else
                (false,input)
  8. Version 2 – parse any character
    [Diagram: pchar takes charToMatch and input, and returns either the matched char with the remaining input, or a failure message.]
  9. Version 2 code (the two branches return different types, which the next slide fixes)

        let pchar (charToMatch,input) =
            if String.IsNullOrEmpty(input) then
                "No more input"
            else
                let first = input.[0]
                if first = charToMatch then
                    let remaining = input.[1..]
                    (charToMatch,remaining)
                else
                    sprintf "Expecting '%c'. Got '%c'" charToMatch first
  10. Fix – create a choice type to capture either case
    [Diagram: pchar takes charToMatch and input, and returns either Success (matched char, remaining input) or Failure (message).]

        type Result<'a> =
            | Success of 'a
            | Failure of string
  13. Version 2 code, returning a Result

        let pchar (charToMatch,input) =
            if String.IsNullOrEmpty(input) then
                Failure "No more input"
            else
                let first = input.[0]
                if first = charToMatch then
                    let remaining = input.[1..]
                    Success (charToMatch,remaining)
                else
                    let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
                    Failure msg
  14. Version 3 – returning a function
    [Diagram: pchar takes only charToMatch and returns a function; that function takes the input and returns either Success (matched char, remaining input) or Failure (message).]
  16. Version 4 – wrapping the function in a type
    [Diagram: pchar takes charToMatch and returns a Parser<char>.]

        type Parser<'a> = Parser of (string -> Result<'a * string>)

    The wrapped value is a function that takes a string and returns a Result.
  17. Version 4 – wrapping the function in a type
    [Diagram: pchar takes charToMatch and returns a Parser<char>.]

        type Parser<'a> = Parser of (string -> Result<'a * string>)

    The "Parser of" case is the wrapper.
  18. Running a parser

        let run parser input =
            // unwrap parser to get inner function
            let (Parser innerFn) = parser
            // call inner function with input
            innerFn input
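
Versions 3 and 4 of pchar appear on the slides only as diagrams, so the following is a minimal sketch of how the pieces might fit together, not the talk's exact code. One assumption to note: the slides' Result<'a> is renamed ParseResult<'a> here so it cannot be confused with the built-in Result<'T,'TError> in current F#.

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

// A parser wraps a function from input to (parsed value, remaining input).
type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

// pchar takes only the character to match and returns a wrapped function.
let pchar charToMatch =
    let innerFn input =
        if String.IsNullOrEmpty(input) then
            Failure "No more input"
        else
            let first = input.[0]
            if first = charToMatch then
                let remaining = input.[1..]
                Success (charToMatch, remaining)
            else
                Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch first)
    Parser innerFn

// Unwrap the parser and apply the inner function to the input.
let run parser input =
    let (Parser innerFn) = parser
    innerFn input

let parseA = pchar 'A'
let good = run parseA "ABC"   // Success ('A', "BC")
let bad = run parseA "ZBC"    // Failure "Expecting 'A'. Got 'Z'"
```

Nothing is parsed when parseA is created; the work happens only when run supplies an input, which is what makes the parser a "recipe".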
  19. What is a combinator?
    • A “combinator” library is a library designed around combining things to get more complex values of the same type.
      • integer + integer = integer
      • list @ list = list // @ is list concat
      • Parser ?? Parser = Parser
  20. Basic parser combinators
    • Parser andThen Parser => Parser
    • Parser orElse Parser => Parser
    • Parser map (transformer) => Parser
  21. AndThen parser combinator
    • Run the first parser.
      – If there is a failure, return.
    • Otherwise, run the second parser with the remaining input.
      – If there is a failure, return.
    • If both parsers succeed, return a pair (tuple) that contains both parsed values.
  22. andThen code (part 1)

        let andThen parser1 parser2 =
            let innerFn input =
                // run parser1 with the input
                let result1 = run parser1 input
                // test the 1st parse result for Failure/Success
                match result1 with
                | Failure err ->
                    Failure err // return error from parser1
                | Success (value1,remaining1) ->
                    // run parser2 with the remaining input
                    // (continued on next slide..)
  23. andThen code (part 2)

        let andThen parser1 parser2 =
            [...snip...]
                    let result2 = run parser2 remaining1
                    // test the 2nd parse result for Failure/Success
                    match result2 with
                    | Failure err ->
                        Failure err // return error from parser2
                    | Success (value2,remaining2) ->
                        let combinedValue = (value1,value2)
                        Success (combinedValue,remaining2)
            // return the inner function
            Parser innerFn
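
Because andThen is split across two slides, here is a self-contained sketch that joins the halves and runs them. The supporting definitions are compact restatements for illustration; ParseResult is the slides' Result<'a> under a different name (an assumption, to avoid the built-in Result type), and the operator is defined with explicit arguments rather than as an alias.

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

// Run parser1, then parser2 on what is left; pair the two values on success.
let andThen parser1 parser2 =
    let innerFn input =
        match run parser1 input with
        | Failure err -> Failure err            // error from parser1
        | Success (value1, remaining1) ->
            match run parser2 remaining1 with
            | Failure err -> Failure err        // error from parser2
            | Success (value2, remaining2) ->
                Success ((value1, value2), remaining2)
    Parser innerFn

let ( .>>. ) p1 p2 = andThen p1 p2

let parseAThenB = pchar 'A' .>>. pchar 'B'
let both = run parseAThenB "ABC"     // Success (('A', 'B'), "C")
let second = run parseAThenB "AZC"   // Failure "Expecting 'B'. Got 'Z'"
```

Note that the second parser sees only the input left over by the first, so a failure in the second position reports the character at that position.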
  24. OrElse parser combinator
    • Run the first parser.
    • On success, return the parsed value, along with the remaining input.
    • Otherwise, on failure, run the second parser with the original input...
    • ...and in this case, return the result (success or failure) from the second parser.
  25. orElse code (part 1)

        let orElse parser1 parser2 =
            let innerFn input =
                // run parser1 with the input
                let result1 = run parser1 input
                // test the result for Failure/Success
                match result1 with
                | Success result ->
                    // if success, return the original result
                    result1
                | Failure err ->
                    // if failed, run parser2 with the input
                    // (continued on next slide..)
  26. orElse code (part 2)

        let orElse parser1 parser2 =
            [...snip...]
                | Failure err ->
                    // if failed, run parser2 with the input
                    let result2 = run parser2 input
                    // return parser2's result
                    result2
            // return the inner function
            Parser innerFn
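
The two halves of orElse can be joined into one runnable sketch as follows. As before, the supporting definitions are compact restatements for illustration, and ParseResult stands in for the slides' Result<'a> (a renaming assumption).

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

// Try parser1; if it fails, try parser2 on the ORIGINAL input.
let orElse parser1 parser2 =
    let innerFn input =
        match run parser1 input with
        | Success _ as result1 -> result1
        | Failure _ -> run parser2 input
    Parser innerFn

let ( <|> ) p1 p2 = orElse p1 p2

let parseAOrB = pchar 'A' <|> pchar 'B'
let first = run parseAOrB "AZZ"    // Success ('A', "ZZ")
let second = run parseAOrB "BZZ"   // Success ('B', "ZZ")
let neither = run parseAOrB "CZZ"  // Failure "Expecting 'B'. Got 'C'"
```

When both alternatives fail, only the second parser's message survives, which is one reason the later slides add labels to parsers.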
  27. Map parser combinator
    • Run the parser.
    • On success, transform the parsed value using the provided function.
    • Otherwise, return the failure.
  28. mapP code (part 1)

        let mapP f parser =
            let innerFn input =
                // run parser with the input
                let result = run parser input
                // test the result for Failure/Success
                match result with
                | Success (value,remaining) ->
                    // if success, return the value transformed by f
                    let newValue = f value
                    Success (newValue, remaining)
                // (continued on next slide..)
  29. mapP code (part 2)

        let mapP f parser =
            [...snip...]
                | Failure err ->
                    // if failed, return the error
                    Failure err
            // return the inner function
            Parser innerFn
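
Here is mapP assembled into one runnable sketch, with a small usage example that turns a digit character into an int. The supporting definitions are restated compactly for illustration, ParseResult stands in for the slides' Result<'a> (a renaming assumption), and digitSeven is a hypothetical name.

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

// Transform the parsed value with f; pass failures through unchanged.
let mapP f parser =
    let innerFn input =
        match run parser input with
        | Success (value, remaining) -> Success (f value, remaining)
        | Failure err -> Failure err
    Parser innerFn

// Infix form with the parser on the left, as used on the slides.
let ( |>> ) parser f = mapP f parser

// Hypothetical example: parse the character '7' and convert it to the int 7.
let digitSeven = pchar '7' |>> (fun ch -> int ch - int '0')
let seven = run digitSeven "7up"   // Success (7, "up")
```

The parser's type changes from Parser<char> to Parser<int>, but the remaining input is untouched: mapP affects only the value inside a successful result.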
  30. Parser combinator operators

        pcharA .>>. pcharB   // 'A' andThen 'B'
        pcharA <|> pcharB    // 'A' orElse 'B'
        pcharA |>> (...)     // map ch to something
  31. Using reduce to combine parsers

        [ 1; 2; 3] |> List.reduce (+)
        // 1 + 2 + 3

        [ pcharA; pcharB; pcharC] |> List.reduce ( .>>. )
        // pcharA .>>. pcharB .>>. pcharC

        [ pcharA; pcharB; pcharC] |> List.reduce ( <|> )
        // pcharA <|> pcharB <|> pcharC
  32. Using reduce to combine parsers

        let choice listOfParsers =
            listOfParsers |> List.reduce ( <|> )

        let anyOf listOfChars =
            listOfChars
            |> List.map pchar // convert char into Parser<char>
            |> choice // combine them all

        let parseLowercase = anyOf ['a'..'z']
        let parseDigit = anyOf ['0'..'9']
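
To see choice and anyOf run, the sketch below restates the minimum they depend on. The supporting definitions are compact versions written for illustration, and ParseResult stands in for the slides' Result<'a> (a renaming assumption).

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

let orElse parser1 parser2 =
    Parser (fun input ->
        match run parser1 input with
        | Success _ as result1 -> result1
        | Failure _ -> run parser2 input)

// Fold a non-empty list of parsers into one with orElse.
// List.reduce throws on an empty list, so callers must pass at least one parser.
let choice listOfParsers = List.reduce orElse listOfParsers

// Match any one of the given characters.
let anyOf listOfChars =
    listOfChars
    |> List.map pchar
    |> choice

let parseLowercase = anyOf ['a'..'z']
let parseDigit = anyOf ['0'..'9']

let letter = run parseLowercase "hello"   // Success ('h', "ello")
let digit = run parseDigit "42"           // Success ('4', "2")
let noDigit = run parseDigit "ABC"        // Failure "Expecting '9'. Got 'A'"
```

The failure message mentions only '9' because it comes from the last parser tried, which is rarely what a user wants to read.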
  33. Using reduce to combine parsers

        /// Convert a list of parsers into a Parser of list
        let sequence listOfParsers =
            let concatResults p1 p2 = // helper
                p1 .>>. p2
                |>> (fun (list1,list2) -> list1 @ list2)

            listOfParsers
            // map each parser result to a list
            |> Seq.map (fun parser -> parser |>> List.singleton)
            // reduce by concatting the results of AndThen
            |> Seq.reduce concatResults
  34. Using reduce to combine parsers

        /// match a specific string
        let pstring str =
            str
            // map each char to a pchar
            |> Seq.map pchar
            // convert to Parser<char list>
            |> sequence
            // convert Parser<char list> to Parser<char array>
            |>> List.toArray
            // convert Parser<char array> to Parser<string>
            |>> String
  35. “More than one” combinators

        let many p = ... // zero or more
        let many1 p = ... // one or more
        let opt p = ... // zero or one

        // example
        let whitespaceChar = anyOf [' '; '\t'; '\n']
        let whitespace = many1 whitespaceChar
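
The slide elides the bodies of many, many1 and opt. The sketch below shows one possible way to write them; it is an illustration under stated assumptions, not necessarily the talk's implementation. The helper parseZeroOrMore is a hypothetical name, and ParseResult stands in for the slides' Result<'a>.

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

// Hypothetical helper: apply the parser until it fails, collecting the values.
// It never fails itself; zero matches yields an empty list.
// Caveat: it loops forever if the parser can succeed without consuming input.
let rec parseZeroOrMore parser input =
    match run parser input with
    | Failure _ -> ([], input)
    | Success (first, afterFirst) ->
        let (rest, remaining) = parseZeroOrMore parser afterFirst
        (first :: rest, remaining)

// Zero or more: always succeeds.
let many parser =
    Parser (fun input -> Success (parseZeroOrMore parser input))

// One or more: the first match is required.
let many1 parser =
    Parser (fun input ->
        match run parser input with
        | Failure err -> Failure err
        | Success (first, afterFirst) ->
            let (rest, remaining) = parseZeroOrMore parser afterFirst
            Success (first :: rest, remaining))

// Zero or one: wrap the value in an option and never fail.
let opt parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, remaining) -> Success (Some value, remaining)
        | Failure _ -> Success (None, input))

let manyA = run (many (pchar 'A')) "AAB"    // Success (['A'; 'A'], "B")
let noneA = run (many (pchar 'A')) "B"      // Success ([], "B")
let sign = run (opt (pchar '-')) "-1"       // Success (Some '-', "1")
```

The recursion here is not tail-recursive, which is fine for short inputs but would be a consideration for very long repetitions.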
  36. “Throwing away” combinators

        p1 .>> p2 // throw away right side
        p1 >>. p2 // throw away left side

        // keep only the inside value
        let between p1 p2 p3 = p1 >>. p2 .>> p3

        // example
        let pdoublequote = pchar '"'
        let quotedInt = between pdoublequote pint pdoublequote
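
One way to build the "throwing away" operators is to combine andThen with mapP and keep only one half of the pair. The sketch below assumes that approach; the supporting definitions are compact restatements for illustration, ParseResult stands in for the slides' Result<'a>, and quotedA is a hypothetical example used in place of quotedInt because pint is not shown on the slides.

```fsharp
open System

// ParseResult is the slides' Result<'a>, renamed for this sketch (assumption).
type ParseResult<'a> =
    | Success of 'a
    | Failure of string

type Parser<'a> = Parser of (string -> ParseResult<'a * string>)

let run (Parser innerFn) input = innerFn input

let pchar charToMatch =
    Parser (fun input ->
        if String.IsNullOrEmpty(input) then Failure "No more input"
        elif input.[0] = charToMatch then Success (charToMatch, input.[1..])
        else Failure (sprintf "Expecting '%c'. Got '%c'" charToMatch input.[0]))

let andThen parser1 parser2 =
    Parser (fun input ->
        match run parser1 input with
        | Failure err -> Failure err
        | Success (value1, remaining1) ->
            match run parser2 remaining1 with
            | Failure err -> Failure err
            | Success (value2, remaining2) ->
                Success ((value1, value2), remaining2))

let mapP f parser =
    Parser (fun input ->
        match run parser input with
        | Success (value, remaining) -> Success (f value, remaining)
        | Failure err -> Failure err)

// Both parsers must match; only one of the two values is kept.
let ( .>> ) p1 p2 = andThen p1 p2 |> mapP fst   // keep left, drop right
let ( >>. ) p1 p2 = andThen p1 p2 |> mapP snd   // keep right, drop left

// Keep only the value parsed by the middle parser.
let between p1 p2 p3 = p1 >>. p2 .>> p3

// Hypothetical example: the letter A inside double quotes.
let pdoublequote = pchar '"'
let quotedA = between pdoublequote (pchar 'A') pdoublequote
let inner = run quotedA "\"A\"!"   // Success ('A', "!")
```

The discarded parsers still consume their input, so the quotes are removed from the remaining string even though they do not appear in the result.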
  37. “Separator” combinators

        let sepBy1 p sep = ... /// one or more p separated by sep
        let sepBy p sep = ... /// zero or more p separated by sep

        // example
        let comma = pchar ','
        let digit = anyOf ['0'..'9']
        let oneOrMoreDigitList = sepBy1 digit comma
  38. Named parsers

        let ( <?> ) = setLabel // infix version

        run parseDigit "ABC" // without the label
        // Error parsing "9" : Unexpected 'A'

        let parseDigit_WithLabel = anyOf ['0'..'9'] <?> "digit"
        run parseDigit_WithLabel "ABC" // with the label
        // Error parsing "digit" : Unexpected 'A'
  39. Extra input context

        run pint "-Z123"
        // Line:0 Col:1 Error parsing integer
        // -Z123
        //  ^Unexpected 'Z'

        run pfloat "-123Z45"
        // Line:0 Col:4 Error parsing float
        // -123Z45
        //     ^Unexpected 'Z'
  40. The JSON value type

        // A type that represents the previous diagram
        type JValue =
            | JString of string
            | JNumber of float
            | JObject of Map<string, JValue>
            | JArray of JValue list
            | JBool of bool
            | JNull
  41. Parsing null

        // new helper operator.
        let (>>%) p x =
            p |>> (fun _ -> x) // runs parser p, but ignores the result

        // Parse a "null"
        let jNull =
            pstring "null"
            >>% JNull // map to JNull
            <?> "null" // give it a label
  42. Parsing a boolean

        // Parse a boolean
        let jBool =
            let jtrue =
                pstring "true"
                >>% JBool true // map to JBool
            let jfalse =
                pstring "false"
                >>% JBool false // map to JBool

            // choose between true and false
            jtrue <|> jfalse
            <?> "bool" // give it a label
  43. Parsing an unescaped char

        /// Parse an unescaped char
        let jUnescapedChar =
            let label = "char"
            satisfy (fun ch -> (ch <> '\\') && (ch <> '\"') ) label
  44. Parsing an escaped char

        let jEscapedChar =
            [
            // each item is (stringToMatch, resultChar)
            ("\\\"",'\"') // quote
            ("\\\\",'\\') // reverse solidus
            ("\\/",'/') // solidus
            ("\\b",'\b') // backspace
            ("\\f",'\f') // formfeed
            ("\\n",'\n') // newline
            ("\\r",'\r') // cr
            ("\\t",'\t') // tab
            ]
            // convert each pair into a parser
            |> List.map (fun (toMatch,result) -> pstring toMatch >>% result)
            // and combine them into one
            |> choice
            <?> "escaped char" // set label
  45. Parsing a quoted string

        let quotedString =
            let quote = pchar '\"' <?> "quote"
            let jchar = jUnescapedChar <|> jEscapedChar <|> jUnicodeChar

            // set up the main parser
            quote >>. manyChars jchar .>> quote

        let jString =
            // wrap the string in a JString
            quotedString
            |>> JString // convert to JString
            <?> "quoted string" // add label
  46. Parsing a number: the integer part

        let optSign = opt (pchar '-')
        let zero = pstring "0"
        let digitOneNine =
            satisfy (fun ch -> Char.IsDigit ch && ch <> '0') "1-9"
        let digit =
            satisfy (fun ch -> Char.IsDigit ch ) "digit"
        let nonZeroInt =
            digitOneNine .>>. manyChars digit
            |>> fun (first,rest) -> string first + rest

        // set up the integer part
        let intPart = zero <|> nonZeroInt
  47. Parsing a number: the fraction part

        // set up the fraction part
        let point = pchar '.'
        let fractionPart = point >>. manyChars1 digit
  48. Parsing a number: the exponent part

        // set up the exponent part
        let e = pchar 'e' <|> pchar 'E'
        let optPlusMinus = opt (pchar '-' <|> pchar '+')
        let exponentPart = e >>. optPlusMinus .>>. manyChars1 digit
  49. Parsing a number: the main parser

        // set up the main JNumber parser
        optSign .>>. intPart .>>. opt fractionPart .>>. opt exponentPart
        |>> convertToJNumber // not shown
        <?> "number" // add label
  50. The final parser

        // the final parser combines the others together
        let jValue = choice
            [
            jNull
            jBool
            jNumber
            jString
            jArray
            jObject
            ]
  51. Summary
    • Treating a function like an object
      – Returning a function from a function
      – Wrapping a function in a type
    • Working with a "recipe" (aka "effect")
      – Combining recipes before running them.
    • The power of combinators
      – A few basic combinators: "andThen", "orElse", etc.
      – Complex parsers are built from smaller components.
    • Combinator libraries are small but powerful
      – Less than 500 lines for combinator library
      – Less than 300 lines for JSON parser itself
  52. Want more?
    • For a production-ready library for F#, search for "fparsec"
    • There are similar libraries for other languages