Monadic Parsing in Python
Alexey Kachayev, 2014
About me
• CTO at Attendify.com
• Erlang, Clojure, Go, Haskell
• Fn.py library author
• CPython & Storm contributor
Find me
• @kachayev
• github.com/kachayev
• kachayev <$> gmail.com
Topic
Will talk
• What "parsing" and "parsers" are
• Approaches
• Monadic parsing from scratch
• More…
Will talk
• Less about theory
• Much more about practice
Won’t talk
• What a "monad" is
• Why FP is cool (*)
(*) you’ll figure it out by yourself
Parsing
Definition
• Takes grammar
• Takes input string (?)
• Returns tree (??) or an error
For PL creators only?
Tasks
• Processing information from logs
• Source code analysis
• DSLs
• Protocols & data formats
• … and more
Approaches
Production rule
S → SS | (S) | ()
Grammar
block = ["const" ident "=" number {"," ident "=" number} ";"]
        ["var" ident {"," ident} ";"]
        {"procedure" ident ";" block ";"} statement
expression = ["+"|"-"] term {("+"|"-") term}
term = factor {("*"|"/") factor}
factor = ident | number | "(" expression ")"
. . . .
In theory
• Top-down / bottom-up
• Predictive / backtracking
• LL(k), LALR, LR, CYK and others
Manually!
@ wikipedia
Manually
• Simple to understand
• Hard to maintain
• Really boring
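As an illustration of the manual approach, here is a minimal hand-written recogniser for the production rule S → SS | (S) | () shown earlier. It is only a sketch: the rule is left-recursive, so the code assumes the equivalent form S → group group*, group → "(" [S] ")". The names `parse_s`, `parse_group` and `is_balanced` are made up for this example.

```python
def parse_group(text, pos):
    """Match one group "(" [S] ")" at pos; return the end index or None."""
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    inner = parse_s(text, pos)          # optional nested S
    if inner is not None:
        pos = inner
    if pos < len(text) and text[pos] == ")":
        return pos + 1
    return None


def parse_s(text, pos=0):
    """Match one or more consecutive groups; return the end index or None."""
    end = parse_group(text, pos)
    if end is None:
        return None
    while True:
        nxt = parse_group(text, end)
        if nxt is None:
            return end
        end = nxt


def is_balanced(text):
    """True when the whole input is derivable from S."""
    return parse_s(text) == len(text)
```

Even for a one-line grammar the index bookkeeping is already noticeable, which is the "hard to maintain, really boring" part.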
Can we do better?
What we have
• Context-free grammars
• Formal theory
• Well-defined algorithms
• Standard grammar notation(s)
So…
Parser generator
• 1. Parse DSL notation
• 2. Generate parser code ("any" language)
Parser generator
• *PEG*
• *Yacc*
• ANTLR
• … and tens more
Parser generator
• Pros
• many target languages
• formalism
• performance & optimisations
Parser generator
• Cons
• another language to learn
• limited in features
• mostly "compile-time"
Monadic parsers & combinators
Functional Pearls
Monadic Parsing in Haskell
@Graham Hutton, @Erik Meijer
Parsec
MPC library for Haskell
Parsec
• Monadic parser combinator(s)
• Works even with context-sensitive, infinite-lookahead (LA) grammars
• Tens of ports to other langs
The Big Idea
Simple
type Parser = String → Tree
Compose?
type Parser = String → (Tree, String)
Generalize?
type Parser a = String → (a, String)
Errors?
type Parser a = String → Maybe (a, String)
Or better…
type Parser a = String → [(a, String)]
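In Python the last type can be read as "any callable from `str` to a list of `(value, remaining_input)` pairs". The sketch below shows three primitive parsers of that shape. The names `item`, `unit` and `zero` follow the Hutton and Meijer paper; whether the talk's own snippets use the same names is an assumption.

```python
# Parser a  ~  callable: str -> list of (value, remaining_input) pairs

def item(s):
    """Consume exactly one character; fail (empty list) on empty input."""
    return [(s[0], s[1:])] if s else []


def unit(value):
    """Build a parser that succeeds with `value` and consumes nothing."""
    return lambda s: [(value, s)]


def zero(s):
    """A parser that always fails."""
    return []
```

For example, `item("abc")` gives `[("a", "bc")]`, while `item("")` and `zero("abc")` both give `[]`.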
Let’s try…
Snippets: http://goo.gl/leQIEE
… and so?
Expressiveness
• [] for error
• [s1] for single (predictive)
• [s1..sN] for backtracking
First-class citizen
Skip anything…
Recognise digit
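A digit recogniser can be sketched on top of a generic "satisfies a predicate" parser. This is an illustration rather than the slide's actual snippet: `sat` is borrowed from the Haskell literature, and a parser is assumed to be a function from a string to a list of `(value, rest)` pairs.

```python
def sat(predicate):
    """Parser that consumes one character if it satisfies `predicate`."""
    def parser(s):
        if s and predicate(s[0]):
            return [(s[0], s[1:])]
        return []          # empty list signals failure
    return parser


digit = sat(str.isdigit)
```

Here `digit("42")` yields `[("4", "2")]` and `digit("x")` yields `[]`.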
Combinators
RegExp
• and: "abc"
• or: "a | b | c"
• Kleene star: "a*"
Derives
• a? = "" | a
• a+ = aa*
• a{2,3} = aa | aaa
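The three regexp building blocks and two of the derived forms can be sketched as plain functions over list-returning parsers. All names here (`char`, `seq`, `alt`, `many`, `optional`, `many1`) are chosen for this example and are not taken from the talk's code. Alternatives are listed longest match first, so the order in `optional` is the reverse of the slide's `"" | a`.

```python
def char(c):
    """Parser for the single character `c`."""
    return lambda s: [(c, s[1:])] if s[:1] == c else []


def seq(p, q):
    """and: run p, then q on what is left; pair up the two values."""
    return lambda s: [((a, b), rest2)
                      for a, rest1 in p(s)
                      for b, rest2 in q(rest1)]


def alt(p, q):
    """or: every result of p, followed by every result of q."""
    return lambda s: p(s) + q(s)


def many(p):
    """Kleene star: zero or more p, longest match first."""
    def parser(s):
        results = []
        for value, rest in p(s):
            if rest == s:
                continue   # skip non-consuming matches to avoid looping
            for values, final in parser(rest):
                results.append(([value] + values, final))
        results.append(([], s))   # the "zero repetitions" alternative
        return results
    return parser


def optional(p):
    """a? : one p or nothing; both branches return a list of values."""
    return lambda s: [([v], rest) for v, rest in p(s)] + [([], s)]


def many1(p):
    """a+ = a a*"""
    return lambda s: [([v] + vs, rest)
                      for (v, vs), rest in seq(p, many(p))(s)]
```

Every alternative is kept in the result list, so a later parser in a sequence can still pick a shorter match: that is the backtracking.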
laziness is cool for this
do you need backtracking?
How to use it?
Cool! but..
ugly, ugly, not readable
Enhancements
• use generators for "laziness"
• "combine" function
• Scala-style methods
• "delay" method
fn.py Stream
[1,2,3,4,5]
expr → "[" digit ("," digit)* "]"
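The rule above can be written out directly with a handful of combinators. This is a self-contained sketch using eager lists instead of the fn.py Stream; the helper names (`sat`, `char`, `seq`, `many`, `fmap`) are assumptions made for the example, and `seq` here is a variadic variant that collects values into a tuple.

```python
def sat(pred):
    """Consume one character if it satisfies `pred`."""
    return lambda s: [(s[0], s[1:])] if s and pred(s[0]) else []


def char(c):
    return sat(lambda x: x == c)


def seq(*parsers):
    """Run parsers left to right, collecting their values into a tuple."""
    def parser(s):
        results = [((), s)]
        for p in parsers:
            results = [(vals + (v,), rest2)
                       for vals, rest1 in results
                       for v, rest2 in p(rest1)]
        return results
    return parser


def many(p):
    """Zero or more p, longest match first."""
    def parser(s):
        out = []
        for v, rest in p(s):
            if rest != s:   # ignore non-consuming matches
                out.extend(([v] + vs, final) for vs, final in parser(rest))
        out.append(([], s))
        return out
    return parser


def fmap(f, p):
    """Apply `f` to every value produced by parser `p`."""
    return lambda s: [(f(v), rest) for v, rest in p(s)]


digit = fmap(int, sat(str.isdigit))
# ("," digit)* keeping only the digit of each pair
tail = many(fmap(lambda pair: pair[1], seq(char(","), digit)))
# "[" digit ("," digit)* "]"
expr = fmap(lambda t: [t[1]] + t[2],
            seq(char("["), digit, tail, char("]")))
```

`expr("[1,2,3,4,5]")` returns `[([1, 2, 3, 4, 5], "")]`. The shorter alternatives produced by `many` are discarded because the closing `"]"` does not match after them.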
Interesting! but..
Is it enough?
In Haskell
Can I do this in Python?
… hm
Challenge accepted!
In Python
How?
Desugaring…
What?
even more like… WAT???
unit
a → Parser a
bind
Parser a → (a → Parser b) → Parser b
lift
(a → b) → (a → Parser b)
lifted
Parser a → (a → b) → Parser b
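The four signatures translate almost word for word into Python once a parser is taken to be a function from a string to a list of `(value, rest)` pairs. The bodies below are one possible reading of the types, not necessarily the talk's code; `sat`, `digit` and `two_digits` exist only for the usage example.

```python
def unit(value):
    """a -> Parser a: succeed with `value`, consume nothing."""
    return lambda s: [(value, s)]


def bind(p, f):
    """Parser a -> (a -> Parser b) -> Parser b.

    Run p, feed each value to f to get the next parser, and run that
    parser on the remaining input.
    """
    return lambda s: [result
                      for a, rest in p(s)
                      for result in f(a)(rest)]


def lift(f):
    """(a -> b) -> (a -> Parser b): wrap a plain function for use in bind."""
    return lambda a: unit(f(a))


def lifted(p, f):
    """Parser a -> (a -> b) -> Parser b: map a plain function over p."""
    return bind(p, lift(f))


# Usage example (hypothetical helpers)
def sat(pred):
    return lambda s: [(s[0], s[1:])] if s and pred(s[0]) else []


digit = sat(str.isdigit)
two_digits = bind(digit, lambda a:
             bind(digit, lambda b:
             unit(int(a + b))))
```

`two_digits("42!")` gives `[(42, "!")]`, and `lifted(digit, int)("7x")` gives `[(7, "x")]`. The nested lambdas are what Haskell's do-notation hides.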
ok, looks cool, but… WAT???
How to use
And even more..
Haskell-style
Do-notation
(define R 2)
(define diameter (lambda (r) (* 2 r)))
Looks nice!
Mutability kills backtracking :(
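One way to imitate do-notation in Python is a decorator that drives a generator: each `yield` hands a parser to the driver, which sends the parsed value back in. The sketch below is an assumption about how this can be done, not the talk's implementation, and `do`, `sat`, `digit` and `two_digits` are illustrative names. It also shows the problem named above: a running generator is mutable state that cannot be copied or rewound, so the driver commits to the first alternative at every step and never backtracks.

```python
def do(gen_fn):
    """Turn a generator function that yields parsers into a parser."""
    def parser(s):
        gen = gen_fn()
        try:
            p = next(gen)                 # first parser requested
            while True:
                results = p(s)
                if not results:
                    return []             # failure: abandon this run
                # Commit to the first alternative only: the generator
                # cannot be resumed a second time from this point.
                value, s = results[0]
                p = gen.send(value)       # hand the value back, get next
        except StopIteration as stop:
            return [(stop.value, s)]      # generator's return value
    return parser


def sat(pred):
    return lambda s: [(s[0], s[1:])] if s and pred(s[0]) else []


digit = sat(str.isdigit)


@do
def two_digits():
    a = yield digit
    b = yield digit
    return int(a + b)
```

`two_digits("42!")` gives `[(42, "!")]` and `two_digits("4x")` gives `[]`. Restoring full backtracking would require re-running the generator from the start for each alternative.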
And more
• error handling
• backtracking control
• performance
Links
• "funcparserlib" http://goo.gl/daidQY
• "Monadic parsing in Haskell" http://goo.gl/gygNlM
• "Higher-Order functions for Parsing" http://goo.gl/c8VOIZ
• "Parsec" http://goo.gl/bdnDZQ
• "Parcon" http://goo.gl/CT06S5
• "Pyparsing" http://goo.gl/gmr2lQ
• "You Could Have Invented Monadic Parsing" http://goo.gl/h0rnOQ
Learn Haskell For Great Good
Q/A
thanks for your attention