Higher-Order Functions for
Parsing
By Graham Hutton
Slide 3
Slide 3 text
• Abstract & Introduction
• Build a parser, one fn at a time
• Moving beyond toy parsers
Slide 4
Slide 4 text
Abstract
Slide 5
Slide 5 text
In combinator parsing, the text of parsers resembles BNF
notation. We present the basic method, and a number of
extensions. We address the special problems presented by
whitespace, and parsers with separate lexical and syntactic
phases. In particular, a combining form for handling the
“offside rule” is given. Other extensions to the basic
method include an “into” combining form with many useful
applications, and a simple means by which combinator
parsers can produce more informative error messages.
Slide 6
Slide 6 text
• Combinators that resemble BNF notation
• Whitespace handling through "Offside
Rule"
• "Into" combining form for advanced parsing
• Strategy for better error messages
Slide 7
Slide 7 text
Introduction
Slide 8
Slide 8 text
Primitive Parsers
• Take input
• Process one character
• Return results and unused input
Slide 9
Slide 9 text
Combinators
• Combine primitives
• Define building blocks
• Return results and unused input
Slide 10
Slide 10 text
Lexical analysis and syntax
• Combine the combinators
• Define lexical elements
• Return results and unused input
rule: 'a' followed by 'b'
input: "abcdef"
output: [(('a','b'),"cdef")]
Slide 14
Slide 14 text
rule: 'a' followed by 'b'
input: "abcdef"
output: [(('a','b'),"cdef")]
Combinator
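The rule above can be sketched as a sequencing combinator. This is a minimal illustration, not the talk's code: the names `char`, `andThen`, and `ab` are made up for this sketch, and the `Parser` type is the one the deck introduces later.

```haskell
-- A parser maps an input list to a list of (result, unused input) pairs.
type Parser a b = [a] -> [(b, [a])]

-- Hypothetical helper: match one specific character.
char :: Char -> Parser Char Char
char c (x:xs) | x == c = [(x, xs)]
char _ _               = []

-- Sequencing: run p1, then run p2 on whatever p1 left unused,
-- and pair up the two results.
andThen :: Parser a b -> Parser a c -> Parser a (b, c)
andThen p1 p2 inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

-- The rule from the slide: 'a' followed by 'b'.
ab :: Parser Char (Char, Char)
ab = char 'a' `andThen` char 'b'
```

Evaluating `ab "abcdef"` in GHCi gives `[(('a','b'),"cdef")]`, matching the slide. An input that does not start with "ab" gives the empty list.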
Slide 15
Slide 15 text
Language choice
Slide 16
Slide 16 text
Suggested:
Lazy Functional Languages
Slide 17
Slide 17 text
Miranda:
Author's choice
Slide 18
Slide 18 text
Haskell:
An obvious choice.
Slide 19
Slide 19 text
Racket:
Another obvious choice.
Slide 20
Slide 20 text
Ruby:
For learning
Slide 21
Slide 21 text
OCaml:
Functional, but not lazy.
Slide 22
Slide 22 text
Haskell
Slide 23
Slide 23 text
Simple when you stick to fundamental FP
• Higher order functions
• Immutability
• Recursive problem solving
• Algebraic types
Slide 24
Slide 24 text
Let's build a parser,
one fn at a time
Slide 25
Slide 25 text
type Parser a b = [a] -> [(b, [a])]
Slide 26
Slide 26 text
Types help with abstraction
• We'll be dealing with parsers and combinators
• Parsers are functions, they accept input and
return results
• Combinators accept parsers and return
parsers
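To make "combinators accept parsers and return parsers" concrete, here is a small sketch of a choice combinator. The names `item`, `alt`, and `aOrB` are assumptions chosen for illustration, not names taken from the slides.

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Hypothetical primitive: consume one item if it equals the expected one.
item :: Eq a => a -> Parser a a
item c (x:xs) | x == c = [(x, xs)]
item _ _               = []

-- A combinator: takes two parsers and returns a new parser
-- that collects the results of both on the same input.
alt :: Parser a b -> Parser a b -> Parser a b
alt p1 p2 inp = p1 inp ++ p2 inp

-- Built only from other parsers: accepts 'a' or 'b'.
aOrB :: Parser Char Char
aOrB = item 'a' `alt` item 'b'
```

Here `aOrB "bcd"` yields `[('b',"cd")]` and `aOrB "xyz"` yields `[]`. Note that `alt` never inspects the input itself; it only wires parsers together.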
Slide 27
Slide 27 text
A parser is a function that
accepts an input and returns
parsed results and the
unused input for each result
Slide 28
Slide 28 text
Parser is a function type that
accepts a list of type a
and returns all possible results
as a list of tuples of type (b, [a])
Slide 29
Slide 29 text
(Parser Char Number)
input: "42 it is!" -- [a] is [Char], so a is Char
output: [(42, " it is!")] -- b is a Number
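One way such a parser could be written, as a sketch only: it assumes `Int` stands in for the slide's `Number`, and the name `number` is hypothetical.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

-- Read a run of leading digits as an Int.
-- Returns the empty list (failure) when the input has no leading digit.
number :: Parser Char Int
number inp = case span isDigit inp of
  ([], _)        -> []
  (digits, rest) -> [(read digits, rest)]
```

With this definition, `number "42 it is!"` evaluates to `[(42," it is!")]`, and `number "it is 42"` evaluates to `[]`.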
Slide 30
Slide 30 text
type Parser a b = [a] -> [(b, [a])]
Slide 31
Slide 31 text
Primitive Parsers
Slide 32
Slide 32 text
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]
Slide 33
Slide 33 text
Always succeeds
Returns "v" for all inputs
Slide 34
Slide 34 text
failure :: Parser a b
failure inp = []
Slide 35
Slide 35 text
Always fails
Returns "[]" for all inputs
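A quick way to see both primitives in action. The definitions follow the slides; the `demoSucceed` and `demoFailure` names are added here only for illustration.

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Always succeeds with the given value, consuming no input.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Always fails: no results, whatever the input.
failure :: Parser a b
failure _ = []

demoSucceed :: [(Int, String)]
demoSucceed = succeed 42 "abc"   -- [(42,"abc")]: input is left untouched

demoFailure :: [(Int, String)]
demoFailure = failure "abc"      -- []: the empty list signals failure
```

Representing failure as an empty list and success as a list of results lets the same type cover parsers that return several alternative parses.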
Slide 36
Slide 36 text
satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []
Slide 37
Slide 37 text
satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []
These are guard clauses, if you want to Google the term
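As a usage sketch, `satisfy` can be specialised with any predicate. The `digit` parser below is an assumed example, not one from the slides, and `satisfy` is written out with its results inlined so the block stands alone.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy _ []     = []          -- no input left: fail
satisfy p (x:xs)
  | p x       = [(x, xs)]      -- predicate holds: consume x
  | otherwise = []             -- predicate fails: no result

-- Hypothetical example: accept a single decimal digit.
digit :: Parser Char Char
digit = satisfy isDigit
```

Here `digit "7up"` gives `[('7',"up")]`, while `digit "up"` and `digit ""` both give `[]`.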
Slide 38
Slide 38 text
literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)
satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []
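Because `literal` only needs `Eq`, it works on any input type, not just characters. The sketch below shows this with two made-up demo values (`demoChar`, `demoInt`).

```haskell
type Parser a b = [a] -> [(b, [a])]

satisfy :: (a -> Bool) -> Parser a a
satisfy _ []     = []
satisfy p (x:xs)
  | p x       = [(x, xs)]
  | otherwise = []

-- Accept exactly one item equal to x.
literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

demoChar :: [(Char, String)]
demoChar = literal 'a' "abc"    -- [('a',"bc")]

demoInt :: [(Int, [Int])]
demoInt = literal 3 [3, 4, 5]   -- [(3,[4,5])]
```

The second demo parses a list of numbers with the same function, which is what the type variable `a` in `Parser a a` buys.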
Slide 120
Slide 120 text
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p [] = failure []
satisfy p (x:xs)
  | p a = succeed a xs -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x
Slide 121
Slide 121 text
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p [] = failure []
satisfy p (x:xs)
  | p a = succeed a xs -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x
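The position-aware `satisfy` needs input that already carries row and column information. The sketch below assumes `Pos a` is a pair of an item and its `(row, column)`, as the `where` clause suggests, and adds a hypothetical `prelex` helper that tags each character. Tabs are ignored here for simplicity.

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Assumed layout: an item paired with its (row, column).
type Pos a = (a, (Int, Int))

-- Hypothetical helper: tag every character with its position,
-- starting at row 0, column 0. A newline starts the next row.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x:xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs

-- Same as the slide: test the item, drop its position from the result.
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy _ [] = []
satisfy p ((a, _) : xs)
  | p a       = [(a, xs)]
  | otherwise = []
```

For example, `prelex "ab\nc"` produces `[('a',(0,0)),('b',(0,1)),('\n',(0,2)),('c',(1,0))]`, and `satisfy (== 'a') (prelex "ab")` produces `[('a',[('b',(0,1))])]`.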
Slide 122
Slide 122 text
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where inpON = takeWhile (onside (head inp)) inp
        inpOFF = drop (length inpON) inp
        onside (a, (r, c)) (b, (r', c')) =
          r' >= r && c' >= c
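A small worked example of `offside`, under some assumptions: `Pos a` is an item paired with `(row, column)`, `takeAll` is a hypothetical parser that consumes everything it is handed, and `sample` is hand-written input. A `case` replaces `head` so the sketch avoids a partial function; the behaviour on non-empty input is meant to be the same as the slide's.

```haskell
type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Int, Int))   -- assumed: (item, (row, column))

-- An item is onside if it sits at or after the first item's row and column.
onside :: Pos a -> Pos a -> Bool
onside (_, (r, c)) (_, (r', c')) = r' >= r && c' >= c

-- Run p on the onside prefix only; p must consume all of that prefix.
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- p inpON]
  where
    inpON = case inp of
      []          -> []
      (first : _) -> takeWhile (onside first) inp
    inpOFF = drop (length inpON) inp

-- Hypothetical parser: succeed with every remaining item.
takeAll :: Parser (Pos a) [a]
takeAll inp = [(map fst inp, [])]

-- 'a' starts at column 2; 'd' on row 2 is back at column 0, so it is offside.
sample :: [Pos Char]
sample = [('a',(0,2)), ('b',(0,3)), ('c',(1,4)), ('d',(2,0)), ('e',(2,1))]
```

Evaluating `offside takeAll sample` gives `[("abc",[('d',(2,0)),('e',(2,1))])]`: the parser sees only the three onside items, and the offside remainder is handed back as unused input.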
Slide 123
Slide 123 text
offside :: Parser (Pos a) b -> Parser (Pos a) b
Slide 124
Slide 124 text
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
-- only allow a kind of tag
kind :: Tag -> Parser (Pos Token) [Char]
kind t = (satisfy ((== t) . fst)) `using` snd
-- only allow a given symbol
lit :: [Char] -> Parser (Pos Token) [Char]
lit xs = (literal (Symbol, xs)) `using` snd
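To try `kind` and `lit`, the supporting definitions have to be filled in. The sketch below guesses at them: the `Tag` constructors, the `Token` pair, the `Pos` layout, and the `using` combinator are all assumptions based on how the slide code uses them, and `sample` is hand-written token input.

```haskell
type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Int, Int))          -- assumed: (item, (row, column))

-- Assumed token representation: a tag plus the matched text.
data Tag = Ident | Number | Symbol | Junk deriving (Eq, Show)
type Token = (Tag, String)

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy _ [] = []
satisfy p ((a, _) : xs)
  | p a       = [(a, xs)]
  | otherwise = []

literal :: Eq a => a -> Parser (Pos a) a
literal x = satisfy (== x)

-- Assumed definition: apply f to every result of p.
using :: Parser a b -> (b -> c) -> Parser a c
using p f inp = [(f v, out) | (v, out) <- p inp]

-- Accept any token with the given tag; return its text.
kind :: Tag -> Parser (Pos Token) String
kind t = satisfy ((== t) . fst) `using` snd

-- Accept only the given symbol; return its text.
lit :: String -> Parser (Pos Token) String
lit xs = literal (Symbol, xs) `using` snd

sample :: [Pos Token]
sample = [((Ident, "x"), (0, 0)), ((Symbol, "="), (0, 2))]
```

With these definitions, `kind Ident sample` returns `[("x",[((Symbol,"="),(0,2))])]`, `lit "=" sample` returns `[]` because the first token is an identifier, and `lit "=" (drop 1 sample)` returns `[("=",[])]`.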