
# Combinator Parsing

A deep dive into Graham Hutton's "Higher-Order Functions for Parsing" (Journal of Functional Programming, 1992), with a Haskell implementation. January 20, 2018

## Transcript

3. ### • Abstract & Introduction • Build a parser, one function at a time • Moving beyond toy parsers

5. ### In combinator parsing, the text of parsers resembles BNF notation.

We present the basic method, and a number of extensions. We address the special problems presented by whitespace, and parsers with separate lexical and syntactic phases. In particular, a combining form for handling the “offside rule” is given. Other extensions to the basic method include an “into” combining form with many useful applications, and a simple means by which combinator parsers can produce more informative error messages.
6. ### • Combinators that resemble BNF notation • Whitespace handling through the "Offside Rule" • "Into" combining form for advanced parsing • Strategy for better error messages

8. ### Primitive Parsers • Take input • Process one character • Return results and unused input
9. ### Combinators • Combine primitives • Define building blocks • Return results and unused input
10. ### Lexical analysis and syntax • Combine the combinators • Define lexical elements • Return results and unused input

23. ### Simple when you stick to fundamental FP • Higher-order functions • Immutability • Recursive problem solving • Algebraic types

26. ### Types help with abstraction • We'll be dealing with parsers and combinators • Parsers are functions: they accept input and return results • Combinators accept parsers and return parsers
27. ### A parser is a function that accepts an input and returns parsed results and the unused input for each result
28. ### Parser is a function type that accepts a list of type a and returns all possible results as a list of tuples of type (b, [a])
29. ### (Parser Char Number)

input: "42 it is!" -- the input [a] is a [Char]
output: [(42, " it is!")] -- b is a Number

= [(v, inp)]
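A minimal sketch of the parser type and the two simplest parsers, assuming the definitions in Hutton's paper. The paper calls the second one `fail`; `failure` is the name the later slides use. The rest of the talk builds on these, so it may help to see them spelled out:

```haskell
-- A parser takes a list of input symbols and returns every possible
-- (result, unused input) pair; an empty list means the parse failed.
type Parser a b = [a] -> [(b, [a])]

-- Always succeeds with the given value and consumes no input.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Always fails, whatever the input is.
failure :: Parser a b
failure _ = []
```

Because a parser is just a function, "running" it is ordinary function application, for example `succeed 42 "abc"`.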

36. ### satisfy :: (a -> Bool) -> Parser a a

satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []

Guard clauses, if you want to Google
38. ### literal :: Eq a => a -> Parser a a

literal x = satisfy (== x)
39. ### match_3 = (literal '3')

match_3 "345" -- => [('3',"45")]
match_3 "456" -- => []

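The `satisfy`, `literal` and `match_3` slides can be put together into one runnable file. This is a sketch that assumes the `Parser`, `succeed` and `failure` definitions from Hutton's paper, which are repeated here so the block stands alone:

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

failure :: Parser a b
failure _ = []

-- Consume one symbol if it satisfies the predicate, otherwise fail.
satisfy :: (a -> Bool) -> Parser a a
satisfy _ [] = failure []
satisfy p (x:xs)
  | p x       = succeed x xs
  | otherwise = failure xs

-- Match exactly one given symbol.
literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

match_3 :: Parser Char Char
match_3 = literal '3'
```

In GHCi, `match_3 "345"` gives `[('3',"45")]` and `match_3 "456"` gives `[]`.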
42. ### match_3_or_4 = match_3 `alt` match_4

match_3_or_4 "345" -- => [('3',"45")]
match_3_or_4 "456" -- => [('4',"56")]
43. ### alt :: Parser a b -> Parser a b -> Parser a b

(p1 `alt` p2) inp = p1 inp ++ p2 inp
44. ### (p1 `alt` p2) inp = p1 inp ++ p2 inp

List concatenation
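Since `alt` simply concatenates result lists, it keeps every successful alternative rather than picking one. A small sketch follows; the compact `literal` is a stand-in written for this example, and `ambiguous` is a hypothetical parser added to show the effect:

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Compact stand-in for `literal`: match exactly one given symbol.
literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

-- Try both parsers on the same input and keep all results.
alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

match_3_or_4 :: Parser Char Char
match_3_or_4 = literal '3' `alt` literal '4'

-- Hypothetical example: both alternatives match, so both results are kept.
ambiguous :: Parser Char Char
ambiguous = literal '3' `alt` literal '3'
```

`ambiguous "3"` returns two identical results, which is why the result type is a list and not a `Maybe`.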

47. ### and_then :: Parser a b -> Parser a c -> Parser a (b, c)

(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

List comprehensions
49. ### [Diagram of `and_then` for parsers p1 and p2, with result pairs (v11, out11), (v12, out12), (v13, out13), … (v21, out21), (v22, out22), … (v31, out31), (v32, out32), …]
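A short sketch of sequencing in action: the second parser runs on whatever input the first one left over, and the two values are paired. The compact `literal` is a stand-in written for this example, and `three_then_four` is a hypothetical name:

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Compact stand-in for `literal`: match exactly one given symbol.
literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

-- Run p1, then run p2 on each leftover input, pairing the two values.
and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

three_then_four :: Parser Char (Char, Char)
three_then_four = literal '3' `and_then` literal '4'
```

`three_then_four "345"` gives `[(('3','4'),"5")]`, while `three_then_four "355"` gives `[]` because the second parser finds no match.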


55. ### (keyword "for") "for i in 1..42" -- => [(:for, " i in 1..42")]
56. ### using :: Parser a b -> (b -> c) -> Parser a c

(p `using` f) inp = [(f v, out) | (v, out) <- p inp]
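`using` is what turns raw matched text into useful values. As a sketch, here is a hypothetical `digit` parser that converts the matched character to an `Int` with `Data.Char.digitToInt`; the compact `satisfy` is a stand-in written for this example:

```haskell
import Data.Char (digitToInt, isDigit)

type Parser a b = [a] -> [(b, [a])]

-- Compact stand-in for `satisfy`: accept one symbol that passes the test.
satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _ = []

-- Apply a function to every result value, leaving the unused input alone.
using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

-- Hypothetical example: parse one digit and return it as a number.
digit :: Parser Char Int
digit = satisfy isDigit `using` digitToInt
```

`digit "7up"` gives `[(7,"up")]`: the result is the number 7, not the character `'7'`.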

59. ### many :: Parser a b -> Parser a [b]

many p = ((p `and_then` many p) `using` cons) `alt` (succeed [])

63. ### some :: Parser a b -> Parser a [b]

some p = ((p `and_then` many p) `using` cons)
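Both definitions rely on a `cons` helper that the slides do not show. The sketch below assumes it is the uncurried list constructor, which is what the types require, and uses a compact stand-in for `literal`:

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Compact stand-in for `literal`: match exactly one given symbol.
literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

-- Assumed helper: glue a (head, tail) pair back into a list.
cons :: (a, [a]) -> [a]
cons = uncurry (:)

-- Zero or more repetitions of p.
many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` succeed []

-- One or more repetitions of p.
some :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` cons
```

`many (literal 'a') "aab"` gives `[("aa","b"),("a","ab"),("","aab")]`: the longest match comes first, followed by every shorter one. `some` drops the empty match, so `some (literal 'a') "b"` fails with `[]`.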

67. ### positive_integer = some (satisfy Data.Char.isDigit)

negative_integer = ((literal '-') `and_then` positive_integer) `using` cons
positive_decimal = (positive_integer `and_then` (((literal '.') `and_then` positive_integer) `using` cons)) `using` join
negative_decimal = ((literal '-') `and_then` positive_decimal) `using` cons
68. ### number :: Parser Char [Char]

number = negative_decimal `alt` positive_decimal `alt` negative_integer `alt` positive_integer

70. ### string :: (Eq a) => [a] -> Parser a [a]

string [] = succeed []
string (x:xs) = (literal x `and_then` string xs) `using` cons

72. ### xthen :: Parser a b -> Parser a c -> Parser a c

p1 `xthen` p2 = (p1 `and_then` p2) `using` snd
73. ### thenx :: Parser a b -> Parser a c -> Parser a b

p1 `thenx` p2 = (p1 `and_then` p2) `using` fst
74. ### ret :: Parser a b -> c -> Parser a c

p `ret` v = p `using` (const v)
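These three helpers discard parts of a result that only matter syntactically. A sketch with two hypothetical parsers: `bracketed` keeps only what sits between the parentheses, and `plus_word` replaces the matched symbol with a fixed value. The compact `literal` is a stand-in written for this example:

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Compact stand-in for `literal`: match exactly one given symbol.
literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

-- Sequence two parsers, keeping only the second result.
xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

-- Sequence two parsers, keeping only the first result.
thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

-- Ignore the parsed value and return a constant instead.
ret :: Parser a b -> c -> Parser a c
p `ret` v = p `using` const v

-- Hypothetical examples.
bracketed :: Parser Char Char
bracketed = literal '(' `xthen` (literal 'x' `thenx` literal ')')

plus_word :: Parser Char String
plus_word = literal '+' `ret` "plus"
```

`bracketed "(x)!"` gives `[('x',"!")]` and `plus_word "+1"` gives `[("plus","1")]`.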
75. ### succeed, failure, satisfy, literal, alt, and_then, using, many, some, string, word, number, xthen, thenx, ret

77. ### data Expr = Const Double | Expr `Add` Expr | Expr `Sub` Expr | Expr `Mul` Expr | Expr `Div` Expr

"3*(6+1)"

(Const 1)))

21
81. ### BNF Notation

expn ::= expn + expn | expn − expn | expn ∗ expn | expn / expn | digit+ | (expn)
82. ### Improving a little:

expn ::= term + term | term − term | term
term ::= factor ∗ factor | factor / factor | factor
factor ::= digit+ | (expn)
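The improved grammar translates almost line for line into combinators. The sketch below follows the shape of the parser in Hutton's paper and is not necessarily the exact code from the talk. `number` is simplified here to unsigned integers, `value` uses `read` in place of the `numval` helper, and the compact `satisfy` is a stand-in:

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

data Expr = Const Double
          | Expr `Add` Expr
          | Expr `Sub` Expr
          | Expr `Mul` Expr
          | Expr `Div` Expr
          deriving (Show, Eq)

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x:xs) | p x = [(x, xs)]
satisfy _ _ = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

many, some :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` uncurry (:)) `alt` succeed []
some p = (p `and_then` many p) `using` uncurry (:)

-- Simplified: unsigned integers only (digit+ in the grammar).
number :: Parser Char String
number = some (satisfy isDigit)

-- One definition per grammar rule.
expn, term, factor :: Parser Char Expr
expn = ((term `and_then` (literal '+' `xthen` term)) `using` plus)
  `alt` ((term `and_then` (literal '-' `xthen` term)) `using` minus)
  `alt` term
term = ((factor `and_then` (literal '*' `xthen` factor)) `using` times)
  `alt` ((factor `and_then` (literal '/' `xthen` factor)) `using` divide)
  `alt` factor
factor = (number `using` value)
  `alt` (literal '(' `xthen` (expn `thenx` literal ')'))

-- `read` stands in for the talk's numval helper.
value :: String -> Expr
value xs = Const (read xs)

plus, minus, times, divide :: (Expr, Expr) -> Expr
plus   (x, y) = x `Add` y
minus  (x, y) = x `Sub` y
times  (x, y) = x `Mul` y
divide (x, y) = x `Div` y
```

The parser returns partial parses as well, so a caller would typically keep only results whose unused input is empty, for example `[v | (v, "") <- expn "12*(5+(7-2))"]`.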

')'))))
89. ### value xs = Const (numval xs)

plus (x,y) = x `Add` y
minus (x,y) = x `Sub` y
times (x,y) = x `Mul` y
divide (x,y) = x `Div` y

93. ### expn "12*(5+(7-2))"

-- => [ (Const 12.0 `Mul` (Const 5.0 `Add` (Const 7.0 `Sub` Const 2.0)),""), … ]
95. ### value = numval

plus (x,y) = x + y
minus (x,y) = x - y
times (x,y) = x * y
divide (x,y) = x / y

"\n")

104. ### any p [x1,x2,...,xn] = (p x1) `alt` (p x2) `alt` ... `alt` (p xn)

107. ### The parser (nibble p) has the same behaviour as parser p, except that it eats up any whitespace in the input string before or afterwards
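The definition of `nibble` is not in the transcript. A sketch, assuming the definition from Hutton's paper: wrap `p` between two runs of whitespace and keep only `p`'s result. The `any` combinator is written as `any_of` here (the name the lexer slides use) to avoid clashing with `Prelude.any`, and the compact `literal` is a stand-in:

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

failure :: Parser a b
failure _ = []

literal :: Eq a => a -> Parser a a
literal x (y:ys) | x == y = [(y, ys)]
literal _ _ = []

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` uncurry (:)) `alt` succeed []

-- (p x1) `alt` (p x2) `alt` ... `alt` (p xn), as a fold.
any_of :: (a -> Parser b c) -> [a] -> Parser b c
any_of p = foldr (alt . p) failure

-- Same as p, but skips whitespace before and after it.
nibble :: Parser Char b -> Parser Char b
nibble p = white `xthen` (p `thenx` white)
  where white = many (any_of literal " \t\n")
```

Because `many` also returns shorter matches, `nibble (literal 'x') "  x  y"` yields three results; the first one, `('x',"y")`, is the one that consumed all the surrounding whitespace.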

")]

" ")]

112. ### w = x + y

  where
    x = 10
    y = 15 - 5
z = w * 2
114. ### When obeying the offside rule, every token must lie either directly below, or to the right of its first token

118. ### prelex "3 + \n 2 * (4 + 5)"

-- => [('3',(0,0)), ('+',(0,2)), ('2',(1,2)), ('*',(1,4)), … ]
119. ### satisfy :: (a -> Bool) -> Parser a a

satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []
120. ### satisfy :: (a -> Bool) -> Parser (Pos a) a

satisfy p [] = failure []
satisfy p (x:xs)
  | p a = succeed a xs -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x
122. ### offside :: Parser (Pos a) b -> Parser (Pos a) b

offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c
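To see `offside` split an input, here is a self-contained sketch. `prelex` is a simplified assumption (it tracks rows and columns but ignores tab stops), `item` and `example` are hypothetical names, and whitespace is filtered out by hand, since a space in column 0 would itself count as offside:

```haskell
import Data.Char (isSpace)

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

-- Simplified prelex: tag every character with its (row, column).
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x:xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs

-- Position-aware satisfy: test the symbol, ignore its position.
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _):xs) | p a = [(a, xs)]
satisfy _ _ = []

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

many, some :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` uncurry (:)) `alt` succeed []
some p = (p `and_then` many p) `using` uncurry (:)

-- Run p on the onside prefix only; p must consume all of that prefix.
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- p inpON]
  where
    inpON  = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (_, (r, c)) (_, (r', c')) = r' >= r && c' >= c

-- Hypothetical helpers for the example.
item :: Parser (Pos a) a
item = satisfy (const True)

example :: [Pos Char]
example = filter (not . isSpace . fst) (prelex "  ab\n  cd\nef")
```

`offside (some item) example` gives `[("abcd",[('e',(2,0)),('f',(2,1))])]`: `a`, `b`, `c` and `d` all sit at column 2 or later, while `e` starts in column 0 and is left for whatever parser comes next.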

130. ### (3 + 2 * (4 + 5)) + (8 * 10)

(3 + 2 * (4 + 5)) + (8 * 10)
131. ### (offside expn) (prelex inp_1)

-- => [(21.0,[('+',(2,0)),('(',(2,2)),('8',(2,3)),('*',(2,5)),('1',(2,7)),('0',(2,8)),(')',(2,9))])]
(offside expn) (prelex inp_2)
-- => [(101.0,[])]

133. ### ∅ |> succeed, fail |> satisfy, literal |> alt, and_then, using |> many, some |> string, thenx, xthen, return |> expression parser & evaluator |> any, nibble, symbol |> prelex, offside

136. ### type Parser a b = [a] -> [(b, [a])]

type Pos a = (a, (Integer, Integer))
137. ### data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)

type Token = (Tag, [Char])

139. ### Parse the string with parser p, and apply tag t to the result
140. ### (p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp ]

  where (x, (r,c)) = head inp
141. ### (p `tok` t) inp = [ ((<token>,<pos>),<unused input>) | (xs, out) <- p inp ]

  where (x, (r,c)) = head inp

144. ### many ((p1 `tok` t1) `alt` (p2 `tok` t2) `alt` ... `alt` (pn `tok` tn))

146. ### lex = many.(foldr op failure)

  where (p, t) `op` xs = (p `tok` t) `alt` xs
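A runnable sketch of `tok` and `lex` with a deliberately tiny lexer. The type signatures are inferred from the slides rather than quoted from them, `oneLine` is a hypothetical replacement for `prelex` that only handles single-line input, and `Prelude.lex` has to be hidden because the name clashes:

```haskell
import Prelude hiding (lex)
import Data.Char (isAlpha, isDigit)

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
type Token = (Tag, [Char])

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

failure :: Parser a b
failure _ = []

-- Position-aware satisfy: test the symbol, ignore its position.
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _):xs) | p a = [(a, xs)]
satisfy _ _ = []

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

many, some :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` uncurry (:)) `alt` succeed []
some p = (p `and_then` many p) `using` uncurry (:)

-- Tag the parsed string and attach the position of its first character.
tok :: Parser (Pos Char) [Char] -> Tag -> Parser (Pos Char) (Pos Token)
(p `tok` t) inp = [(((t, xs), (r, c)), out) | (xs, out) <- p inp]
  where (_, (r, c)) = head inp

-- Combine (parser, tag) pairs with alt, then repeat with many.
lex :: [(Parser (Pos Char) [Char], Tag)] -> Parser (Pos Char) [Pos Token]
lex = many . foldr op failure
  where (p, t) `op` xs = (p `tok` t) `alt` xs

-- A tiny lexer for illustration: spaces, words and integers.
lexer :: Parser (Pos Char) [Pos Token]
lexer = lex [ (some (satisfy (== ' ')), Junk)
            , (some (satisfy isAlpha), Ident)
            , (some (satisfy isDigit), Number) ]

-- Hypothetical stand-in for prelex on single-line input.
oneLine :: String -> [Pos Char]
oneLine s = zip s [(0, c) | c <- [0 ..]]
```

`fst (head (lexer (oneLine "x 10")))` gives `[((Ident,"x"),(0,0)),((Junk," "),(0,1)),((Number,"10"),(0,2))]`. The first result is the greedy one because `many` and `some` list the longest match first.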

151. ### many ((p1 `tok` t1) `alt` (p2 `tok` t2) `alt` ... `alt` (pn `tok` tn))
152. ### lexer = lex [ ((some (any_of literal " \n\t")), Junk),

((string "where"), Symbol),
(word, Ident),
(number, Number),
((any_of string ["(", ")", "="]), Symbol)]
155. ### head (lexer (prelex "where x = 10"))

-- => ([((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10)) ],[])
156. ### (head.lexer.prelex) "where x = 10"

-- => ([((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10)) ],[])

Function composition

160. ### In this case, "where" is a source of conflict. It can be a symbol or an identifier.
161. ### lexer = lex [ {- 1 -} ((some (any_of literal " \n\t")), Junk),

{- 2 -} ((string "where"), Symbol),
{- 3 -} (word, Ident),
{- 4 -} (number, Number),
{- 5 -} ((any_of string ["(",")","="]), Symbol)]

164. ### strip :: [(Pos Token)] -> [(Pos Token)]

strip = filter ((/= Junk).fst.fst)

=> False
166. ### (fst.head.lexer.prelex) "where x = 10"

-- => [((Symbol,"where"),(0,0)), ((Junk," "),(0,5)), ((Ident,"x"),(0,6)), ((Junk," "),(0,7)), ((Symbol,"="),(0,8)), ((Junk," "),(0,9)), ((Number,"10"),(0,10))]
167. ### (strip.fst.head.lexer.prelex) "where x = 10"

-- => [((Symbol,"where"),(0,0)), ((Ident,"x"),(0,6)), ((Symbol,"="),(0,8)), ((Number,"10"),(0,10))]

171. ### f x y = add a b

  where
    a = 25
    b = sub x y
answer = mult (f 3 7) 5
172. ### The same script, labelled part by part: Script • Definition • Body • Expression • Definition • Primitives
178. ### data Script = Script [Def]

data Def = Def Var [Var] Expn
data Expn = Var Var | Num Double | Expn `Apply` Expn | Expn `Where` [Def]
type Var = [Char]

180. ### defn = ( (some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN
181. ### body = ( expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN

183. ### prim = ((kind Ident) `using` Var) `alt` ((kind Number) `using` numFN) `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))
184. ### kind :: Tag -> Parser (Pos Token) [Char]

-- only allow a kind of tag
kind t = (satisfy ((== t).fst)) `using` snd

-- only allow a given symbol
lit :: [Char] -> Parser (Pos Token) [Char]
lit xs = (literal (Symbol, xs)) `using` snd


194. ### f x y = add a b

  where
    a = 25
    b = sub x y
answer = mult (f 3 7) 5
195. ### Script [

Def "f" ["x","y"] ( ((Var "add" `Apply` Var "a") `Apply` Var "b") `Where` [ Def "a" [] (Num 25.0), Def "b" [] ((Var "sub" `Apply` Var "x") `Apply` Var "y")]),
Def "answer" [] ( (Var "mult" `Apply` ( (Var "f" `Apply` Num 3.0) `Apply` Num 7.0)) `Apply` Num 5.0)]


200. ### defn = ((some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN

body = (expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN
expr = (some prim) `using` (foldl1 Apply)
prim = ((kind Ident) `using` Var) `alt` ((kind Number) `using` numFN) `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))

208. ### Haskell: Parsec, Megaparsec. ✨ • OCaml: Angstrom. ✨ • Ruby: rparsec, or roll your own • Elixir: Combine, ExParsec • Python: Parsec. ✨