Combinator Parsing

A dive into and a Haskell implementation of Graham Hutton's "Higher Order Functions for Parsing" from the Journal of Functional Programming, 1992.

Swanand Pagnis

January 20, 2018

Transcript

  1. • Abstract & Introduction
     • Build a parser, one fn at a time
     • Moving beyond toy parsers
  2. In combinator parsing, the text of parsers resembles BNF notation. We present the basic method, and a number of extensions. We address the special problems presented by whitespace, and parsers with separate lexical and syntactic phases. In particular, a combining form for handling the “offside rule” is given. Other extensions to the basic method include an “into” combining form with many useful applications, and a simple means by which combinator parsers can produce more informative error messages.
  3. • Combinators that resemble BNF notation
     • Whitespace handling through "Offside Rule"
     • "Into" combining form for advanced parsing
     • Strategy for better error messages
  4. Lexical analysis and syntax
     • Combine the combinators
     • Define lexical elements
     • Return results and unused input
  5. Simple when we stick to fundamental FP
     • Higher order functions
     • Immutability
     • Recursive problem solving
     • Algebraic types
  6. Types help with abstraction
     • We'll be dealing with parsers and combinators
     • Parsers are functions, they accept input and return results
     • Combinators accept parsers and return parsers
  7. A parser is a function that accepts an input and returns parsed results and the unused input for each result
  8. Parser is a function type that accepts a list of type a and returns all possible results as a list of tuples of type (b, [a])
  9. (Parser Char Number)
     input: "42 it is!"         -- the input [a] is a [Char]
     output: [(42, " it is!")]  -- b is a Number
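
The parser type described on the slides above can be written down directly. The following is a minimal sketch: the `Parser` synonym matches the one given later in the deck, while `succeed` and `failure` are used on later slides but never defined in this transcript, so their bodies here are assumptions inferred from how they are called.

```haskell
-- A parser takes a list of input symbols and returns every possible
-- result, each paired with the input it left unconsumed.
type Parser a b = [a] -> [(b, [a])]

-- Assumed definition: always succeeds with value v and consumes nothing.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Assumed definition: always fails, i.e. returns an empty list of results.
failure :: Parser a b
failure _ = []
```

Under these assumptions, an empty result list means "no parse", and a list with several entries means the input was ambiguous.
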
  10. satisfy :: (a -> Bool) -> Parser a a
      satisfy p [] = failure []
      satisfy p (x:xs)
        | p x = succeed x xs -- if p(x) is true
        | otherwise = failure []
  11. satisfy :: (a -> Bool) -> Parser a a
      satisfy p [] = failure []
      satisfy p (x:xs)
        | p x = succeed x xs -- if p(x) is true
        | otherwise = failure []
      Guard Clauses, if you want to Google
  12. literal :: Eq a => a -> Parser a a
      literal x = satisfy (== x)
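
To see `satisfy` and `literal` run, here is a small self-contained sketch. `succeed` and `failure` are assumed definitions (they are not shown in the transcript), and `digitResult` and `minusResult` are hypothetical names introduced only for illustration.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b       -- assumed, not shown on the slides
succeed v inp = [(v, inp)]

failure :: Parser a b            -- assumed, not shown on the slides
failure _ = []

-- Consume one symbol if it passes the predicate, otherwise fail.
satisfy :: (a -> Bool) -> Parser a a
satisfy _ [] = failure []
satisfy p (x:xs)
  | p x       = succeed x xs
  | otherwise = failure []

-- Consume exactly the given symbol.
literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

-- Hypothetical example: the first character is a digit, so it is consumed.
digitResult :: [(Char, String)]
digitResult = satisfy isDigit "42 it is!"   -- [('4', "2 it is!")]

-- Hypothetical example: the input does not start with '-', so no result.
minusResult :: [(Char, String)]
minusResult = literal '-' "42"              -- []
```
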
  13. alt :: Parser a b -> Parser a b -> Parser a b
      (p1 `alt` p2) inp = p1 inp ++ p2 inp
  14. and_then :: Parser a b -> Parser a c -> Parser a (b, c)
      (p1 `and_then` p2) inp =
        [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]
  15. and_then :: Parser a b -> Parser a c -> Parser a (b, c)
      (p1 `and_then` p2) inp =
        [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]
      List comprehensions
  16. [Diagram: results of p1 and p2]
      p1: (v11, out11) (v12, out12) (v13, out13) …
      p2: (v21, out21) (v22, out22) … (v31, out31) (v32, out32) …
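
The two combining forms above can be tried out in a few lines. This is a sketch under the same assumptions as the slides (`satisfy` is written here without `succeed`/`failure` to keep it short); `aOrB` and `ab` are hypothetical example parsers.

```haskell
type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

-- Alternation: collect the results of both parsers on the same input.
alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

-- Sequencing: run p2 on every leftover input produced by p1.
and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

-- Hypothetical examples.
aOrB :: Parser Char Char
aOrB = literal 'a' `alt` literal 'b'

ab :: Parser Char (Char, Char)
ab = literal 'a' `and_then` literal 'b'
```

With these definitions, `aOrB "bcd"` gives `[('b', "cd")]`, `ab "abc"` gives `[(('a', 'b'), "c")]`, and `ab "acb"` gives `[]` because the second parser finds no match in the input left over by the first.
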
  17. using :: Parser a b -> (b -> c) -> Parser a c
      (p `using` f) inp = [(f v, out) | (v, out) <- p inp ]
  18. many :: Parser a b -> Parser a [b]
      many p = ((p `and_then` many p) `using` cons) `alt` (succeed [])
  19. some :: Parser a b -> Parser a [b]
      some p = ((p `and_then` many p) `using` cons)
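
The repetition combinators rely on a helper `cons` that the transcript never defines. The sketch below assumes it is the uncurried list constructor, and `succeed` is likewise an assumed definition; everything else follows the slides.

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b               -- assumed definition
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

-- Assumed helper: turn a (head, tail) pair into a list.
cons :: (a, [a]) -> [a]
cons (x, xs) = x : xs

-- Zero or more repetitions; the longest match comes first in the result list.
many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` (succeed [])

-- One or more repetitions.
some :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` cons
```

Because results are collected in a list, `many (literal 'a') "aab"` returns every prefix it could have consumed: `[("aa", "b"), ("a", "ab"), ("", "aab")]`. `some` drops the empty match.
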
  20. positive_integer = some (satisfy Data.Char.isDigit)
      negative_integer = ((literal '-') `and_then` positive_integer) `using` cons
      positive_decimal = (positive_integer `and_then` (((literal '.') `and_then` positive_integer) `using` cons)) `using` join
      negative_decimal = ((literal '-') `and_then` positive_decimal) `using` cons
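
The number parsers above use `cons` and `join`, neither of which is defined in the transcript. The sketch below assumes `cons` prepends the first component of a pair to the second and `join` concatenates the two halves of a pair; with those assumptions the four definitions type-check and run as written on the slide.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

cons :: (a, [a]) -> [a]          -- assumed helper
cons (x, xs) = x : xs

join :: ([a], [a]) -> [a]        -- assumed helper
join (xs, ys) = xs ++ ys

many, some :: Parser a b -> Parser a [b]
many p = some p `alt` (\inp -> [([], inp)])
some p = (p `and_then` many p) `using` cons

positive_integer, negative_integer :: Parser Char String
positive_integer = some (satisfy isDigit)
negative_integer = (literal '-' `and_then` positive_integer) `using` cons

positive_decimal, negative_decimal :: Parser Char String
positive_decimal =
  (positive_integer `and_then`
    ((literal '.' `and_then` positive_integer) `using` cons)) `using` join
negative_decimal = (literal '-' `and_then` positive_decimal) `using` cons
```

For example, `positive_decimal "3.14"` yields `[("3.14", ""), ("3.1", "4")]`: the longest match comes first, followed by the shorter match that `some` also allows.
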
  21. string :: (Eq a) => [a] -> Parser a [a]
      string [] = succeed []
      string (x:xs) = (literal x `and_then` string xs) `using` cons
  22. xthen :: Parser a b -> Parser a c -> Parser a c
      p1 `xthen` p2 = (p1 `and_then` p2) `using` snd
  23. thenx :: Parser a b -> Parser a c -> Parser a b
      p1 `thenx` p2 = (p1 `and_then` p2) `using` fst
  24. ret :: Parser a b -> c -> Parser a c
      p `ret` v = p `using` (const v)
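
A short sketch of how `xthen`, `thenx` and `ret` discard parts of a result. The combinator bodies follow the slides; `digits`, `parens` and `yes` are hypothetical example parsers added for illustration.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

many, some :: Parser a b -> Parser a [b]
many p = some p `alt` (\inp -> [([], inp)])
some p = (p `and_then` many p) `using` (\(x, xs) -> x : xs)

-- Keep only the second result.
xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

-- Keep only the first result.
thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

-- Replace the result with a fixed value.
ret :: Parser a b -> c -> Parser a c
p `ret` v = p `using` const v

-- Hypothetical examples.
digits :: Parser Char String
digits = some (satisfy isDigit)

parens :: Parser Char String
parens = literal '(' `xthen` (digits `thenx` literal ')')

yes :: Parser Char Bool
yes = literal 'y' `ret` True
```

Here `parens "(12)x"` evaluates to `[("12", "x")]`: both brackets are consumed but only the digits survive in the result.
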
  25. data Expr = Const Double
                | Expr `Add` Expr
                | Expr `Sub` Expr
                | Expr `Mul` Expr
                | Expr `Div` Expr
  26. BNF Notation
      expn ::= expn + expn
             | expn − expn
             | expn ∗ expn
             | expn / expn
             | digit+
             | (expn)
  27. Improving a little:
      expn ::= term + term | term − term | term
      term ::= factor ∗ factor | factor / factor | factor
      factor ::= digit+ | (expn)
  28. value xs = Const (numval xs)
      plus (x,y) = x `Add` y
      minus (x,y) = x `Sub` y
      times (x,y) = x `Mul` y
      divide (x,y) = x `Div` y
  29. expn "12*(5+(7-2))"
      # => [ (Const 12.0 `Mul` (Const 5.0 `Add` (Const 7.0 `Sub` Const 2.0)),""), … ]
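
The transcript shows the grammar, the semantic functions and the output, but not the `expn` parser itself. The following is one possible sketch that transcribes the improved grammar rule by rule. The definitions of `expn`, `term`, `factor` and `number` are assumptions, and `numval` is taken to be `read`.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

many, some :: Parser a b -> Parser a [b]
many p = some p `alt` (\inp -> [([], inp)])
some p = (p `and_then` many p) `using` (\(x, xs) -> x : xs)

data Expr = Const Double
          | Expr `Add` Expr | Expr `Sub` Expr
          | Expr `Mul` Expr | Expr `Div` Expr
  deriving (Show, Eq)

value :: String -> Expr
value xs = Const (read xs)      -- numval assumed to be `read`

plus, minus, times, divide :: (Expr, Expr) -> Expr
plus   (x, y) = x `Add` y
minus  (x, y) = x `Sub` y
times  (x, y) = x `Mul` y
divide (x, y) = x `Div` y

number :: Parser Char String    -- digit+
number = some (satisfy isDigit)

-- Each definition mirrors one line of the improved grammar.
expn, term, factor :: Parser Char Expr
expn = ((term `and_then` (literal '+' `xthen` term)) `using` plus)
   `alt` ((term `and_then` (literal '-' `xthen` term)) `using` minus)
   `alt` term
term = ((factor `and_then` (literal '*' `xthen` factor)) `using` times)
   `alt` ((factor `and_then` (literal '/' `xthen` factor)) `using` divide)
   `alt` factor
factor = (number `using` value)
   `alt` (literal '(' `xthen` (expn `thenx` literal ')'))
```

The parser returns partial parses as well, so a caller interested only in complete parses can filter on empty leftover input, for example `[v | (v, "") <- expn "12*(5+(7-2))"]`.
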
  30. value xs = Const (numval xs)
      plus (x,y) = x `Add` y
      minus (x,y) = x `Sub` y
      times (x,y) = x `Mul` y
      divide (x,y) = x `Div` y
  31. value = numval
      plus (x,y) = x + y
      minus (x,y) = x - y
      times (x,y) = x * y
      divide (x,y) = x / y
  32. The parser (nibble p) has the same behaviour as parser p, except that it eats up any whitespace in the input string before or afterwards
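
The transcript describes `nibble` but does not show its body. The sketch below is one way to write it with the combinators already introduced. The definitions of `nibble`, `white` and `any_of` are assumptions (`any_of` appears later in the deck, also without a body).

```haskell
type Parser a b = [a] -> [(b, [a])]

failure :: Parser a b
failure _ = []

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _            = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [ (f v, out) | (v, out) <- p inp ]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

many, some :: Parser a b -> Parser a [b]
many p = some p `alt` (\inp -> [([], inp)])
some p = (p `and_then` many p) `using` (\(x, xs) -> x : xs)

-- Assumed definition: try the parser built from each list element in turn.
any_of :: (x -> Parser a b) -> [x] -> Parser a b
any_of p = foldr (alt . p) failure

-- Assumed definition: skip whitespace on both sides of p.
nibble :: Parser Char b -> Parser Char b
nibble p = white `xthen` (p `thenx` white)
  where white = many (any_of literal " \t\n")
```

Since `many` also returns shorter matches, `nibble (literal '+') "  +  1"` yields `[('+', "1"), ('+', " 1"), ('+', "  1")]`; the first entry is the one with all surrounding whitespace removed.
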
  33. w = x + y
        where x = 10
              y = 15 - 5
      z = w * 2
  35. When obeying the offside rule, every token must lie either directly below, or to the right of its first token
  36. prelex "3 + \n 2 * (4 + 5)"
      # => [('3',(0,0)), ('+',(0,2)), ('2',(1,2)), ('*',(1,4)), … ]
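
The body of `prelex` is not part of the transcript. A minimal sketch, assuming it pairs every character (whitespace included) with a zero-based (row, column) position, might look like this; the tab rule (advance to the next multiple of 8) is also an assumption.

```haskell
type Pos a = (a, (Integer, Integer))

-- Tag each character with its (row, column), both counted from zero.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x:xs) = (x, (r, c)) : go next xs
      where
        next
          | x == '\n' = (r + 1, 0)                 -- new line, column resets
          | x == '\t' = (r, (c `div` 8 + 1) * 8)   -- assumed tab stop of 8
          | otherwise = (r, c + 1)
```

For instance, `prelex "a b\nc"` gives `[('a',(0,0)), (' ',(0,1)), ('b',(0,2)), ('\n',(0,3)), ('c',(1,0))]`.
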
  37. satisfy :: (a -> Bool) -> Parser a a
      satisfy p [] = failure []
      satisfy p (x:xs)
        | p x = succeed x xs -- if p(x) is true
        | otherwise = failure []
  38. satisfy :: (a -> Bool) -> Parser (Pos a) a
      satisfy p [] = failure []
      satisfy p (x:xs)
        | p a = succeed a xs -- if p(a) is true
        | otherwise = failure []
        where (a, (r, c)) = x
  40. offside :: Parser (Pos a) b -> Parser (Pos a) b
      offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
        where
          inpON = takeWhile (onside (head inp)) inp
          inpOFF = drop (length inpON) inp
          onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c
  41. offside :: Parser (Pos a) b -> Parser (Pos a) b
      offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  43. offside :: Parser (Pos a) b -> Parser (Pos a) b
      offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
        where
          inpON = takeWhile (onside (head inp)) inp
  44. offside :: Parser (Pos a) b -> Parser (Pos a) b
      offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
        where
          inpON = takeWhile (onside (head inp)) inp
          inpOFF = drop (length inpON) inp
  45. offside :: Parser (Pos a) b -> Parser (Pos a) b
      offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
        where
          inpON = takeWhile (onside (head inp)) inp
          inpOFF = drop (length inpON) inp
          onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c
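
To make the behaviour of `offside` concrete, here is a small runnable sketch. `offside` and the position-aware `satisfy` follow the slides; `many` is written as a plain list comprehension, and `tokens` and `result` are hypothetical values chosen so that the last token falls to the left of the first one.

```haskell
type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

-- Position-aware satisfy: test the symbol, ignore its position.
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _) : xs) | p a = [(a, xs)]
satisfy _ _                   = []

many :: Parser a b -> Parser a [b]
many p inp =
  [ (v : vs, out2) | (v, out1) <- p inp, (vs, out2) <- many p out1 ]
    ++ [([], inp)]

-- Run p on the onside tokens only, and require p to consume all of them.
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [ (v, inpOFF) | (v, []) <- p inpON ]
  where
    inpON  = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (_, (r, c)) (_, (r', c')) = r' >= r && c' >= c

-- Hypothetical input: 'd' sits in column 0, left of the first token.
tokens :: [Pos Char]
tokens = [('a', (0, 2)), ('b', (0, 3)), ('c', (1, 3)), ('d', (2, 0))]

result :: [(String, [Pos Char])]
result = offside (many (satisfy (const True))) tokens
-- [("abc", [('d', (2, 0))])]
```

The pattern `(v, [])` in the comprehension is what discards the shorter matches of `many`: only a parse that uses up every onside token is kept, and the offside token `'d'` is handed back as unused input.
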
  46. (3 + 2 * (4 + 5)) + (8 * 10)
      (3 + 2 * (4 + 5)) + (8 * 10)
  47. ∅
      |> succeed, fail
      |> satisfy, literal
      |> alt, and_then, using
      |> many, some
      |> string, thenx, xthen, return
      |> expression parser & evaluator
      |> any, nibble, symbol
      |> prelex, offside
  48. type Parser a b = [a] -> [(b, [a])]
      type Pos a = (a, (Integer, Integer))
  49. data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
      type Token = (Tag, [Char])
  50. (p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp]
        where (x, (r,c)) = head inp
  51. (p `tok` t) inp = [ ((<token>,<pos>),<unused input>) | (xs, out) <- p inp]
        where (x, (r,c)) = head inp
  52. (p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp]
        where (x, (r,c)) = head inp
  53. lexer = lex [ ((some (any_of literal " \n\t")), Junk),
                    ((string "where"), Symbol),
                    (word, Ident),
                    (number, Number),
                    ((any_of string ["(", ")", "="]), Symbol)]
  56. head (lexer (prelex "where x = 10"))
      # => ([((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10)) ],[])
  57. In this case, "where" is a source of conflict. It can be a symbol or an identifier.
  58. lexer = lex [ {- 1 -} ((some (any_of literal " \n\t")), Junk),
                    {- 2 -} ((string "where"), Symbol),
                    {- 3 -} (word, Ident),
                    {- 4 -} (number, Number),
                    {- 5 -} ((any_of string ["(",")","="]), Symbol)]
  59. (fst.head.lexer.prelex) "where x = 10"
      # => [((Symbol,"where"),(0,0)), ((Junk," "),(0,5)), ((Ident,"x"),(0,6)), ((Junk," "),(0,7)), ((Symbol,"="),(0,8)), ((Junk," "),(0,9)), ((Number,"10"),(0,10))]
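
The lexer slides use `lex`, `any_of`, `word`, `number` and `prelex` without showing their bodies. The sketch below fills them in with assumed definitions so the priority ordering can be observed: because `alt` concatenates results in table order, the first result for `"where"` comes from entry 2 (Symbol) rather than entry 3 (Ident). Prelude's own `lex` is hidden to keep the slide's name.

```haskell
import Prelude hiding (lex)
import Data.Char (isAlpha, isDigit)

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))
data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
type Token = (Tag, [Char])

failure :: Parser a b
failure _ = []

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _) : xs) | p a = [(a, xs)]
satisfy _ _                   = []

literal :: Eq a => a -> Parser (Pos a) a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

many, some :: Parser a b -> Parser a [b]
many p = some p `alt` (\inp -> [([], inp)])
some p inp = [ (v : vs, o2) | (v, o1) <- p inp, (vs, o2) <- many p o1 ]

string :: Eq a => [a] -> Parser (Pos a) [a]
string [] inp = [([], inp)]
string (x:xs) inp =
  [ (v : vs, o2) | (v, o1) <- literal x inp, (vs, o2) <- string xs o1 ]

any_of :: (x -> Parser a b) -> [x] -> Parser a b   -- assumed definition
any_of p = foldr (alt . p) failure

tok :: Parser (Pos Char) [Char] -> Tag -> Parser (Pos Char) (Pos Token)
(p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp ]
  where (_, (r, c)) = head inp

-- Assumed definition: try each (parser, tag) pair in order, repeatedly.
lex :: [(Parser (Pos Char) [Char], Tag)] -> Parser (Pos Char) [Pos Token]
lex = many . foldr op failure
  where (p, t) `op` rest = (p `tok` t) `alt` rest

prelex :: String -> [Pos Char]                      -- assumed definition
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x:xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs

word, number :: Parser (Pos Char) [Char]            -- assumed definitions
word   = some (satisfy isAlpha)
number = some (satisfy isDigit)

lexer :: Parser (Pos Char) [Pos Token]
lexer = lex [ (some (any_of literal " \n\t"), Junk)
            , (string "where", Symbol)
            , (word, Ident)
            , (number, Number)
            , (any_of string ["(", ")", "="], Symbol) ]
```

Under these assumptions, `(fst . head . lexer . prelex) "where x = 10"` produces the seven tokens shown on the slide, Junk entries included.
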
  60. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
  61. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Script
  62. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Definition
  63. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Body
  64. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Expression
  65. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Definition
  66. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
      Primitives
  67. data Script = Script [Def]
      data Def = Def Var [Var] Expn
      data Expn = Var Var | Num Double | Expn `Apply` Expn | Expn `Where` [Def]
      type Var = [Char]
  68. prim = ((kind Ident) `using` Var)
             `alt` ((kind Number) `using` numFN)
             `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))
  69. -- only allow a kind of tag
      kind :: Tag -> Parser (Pos Token) [Char]
      kind t = (satisfy ((== t).fst)) `using` snd

      -- only allow a given symbol
      lit :: [Char] -> Parser (Pos Token) [Char]
      lit xs = (literal (Symbol, xs)) `using` snd
  70. prim = ((kind Ident) `using` Var)
             `alt` ((kind Number) `using` numFN)
             `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))
  71. data Script = Script [Def]
      data Def = Def Var [Var] Expn
      data Expn = Var Var | Num Double | Expn `Apply` Expn | Expn `Where` [Def]
      type Var = [Char]
  72. f x y = add a b
        where a = 25
              b = sub x y
      answer = mult (f 3 7) 5
  73. Script [
        Def "f" ["x","y"] (
          ((Var "add" `Apply` Var "a") `Apply` Var "b")
          `Where` [
            Def "a" [] (Num 25.0),
            Def "b" [] ((Var "sub" `Apply` Var "x") `Apply` Var "y")]),
        Def "answer" [] (
          (Var "mult" `Apply` ((Var "f" `Apply` Num 3.0) `Apply` Num 7.0))
          `Apply` Num 5.0)]
  74. lexer = lex [ ((some (any_of literal " \n\t")), Junk),
                    ((string "where"), Symbol),
                    (word, Ident),
                    (number, Number),
                    ((any_of string ["(", ")", "="]), Symbol)]
  75. defn = ((some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN
      body = (expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN
      expr = (some prim) `using` (foldl1 Apply)
      prim = ((kind Ident) `using` Var)
             `alt` ((kind Number) `using` numFN)
             `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))
  76. Haskell: Parsec, MegaParsec. ✨
      OCaml: Angstrom. ✨
      Ruby: rparsec, or roll your own
      Elixir: Combine, ExParsec
      Python: Parsec. ✨