Let’s write a parser!
DENIS DEFREYNE / SOUNDCLOUD, BERLIN / MAY 17TH, 2016

1. Language

I am Denis.

But how do you know that I am Denis?
I told you. I wrote it down. You’ve probably seen me before. Etc.
You understand English.

Computers are stupid.

$ git commit --message="Fix bugs"

def greet(name)
  puts "Hello, #{name}"
end

def greet(name: String): Unit = {
  println(s"Hello, $name!")
}

Text forms a language, but computers don’t know that.

2. Parsing

Basic idea:
Parser objects that are small, composable, and purely functional.

def read(input, pos)
  Success.new(pos + 1)
end

def read(input, pos)
  Failure.new(pos)
end

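The two `read` variants above return result objects that carry a position. A minimal sketch of what those objects and an `apply` entry point might look like follows. The `Parser` base class and the `AlwaysAdvance` and `AlwaysFail` names are hypothetical and only serve the illustration.

```ruby
# Result objects: both record the position the parser ended up at.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: apply simply starts reading at position 0.
class Parser
  def apply(input)
    read(input, 0)
  end
end

# Always succeeds and moves one position forward (first variant above).
class AlwaysAdvance < Parser
  def read(_input, pos)
    Success.new(pos + 1)
  end
end

# Always fails and stays where it is (second variant above).
class AlwaysFail < Parser
  def read(_input, pos)
    Failure.new(pos)
  end
end

p AlwaysAdvance.new.apply("Hello")
p AlwaysFail.new.apply("Hello")
```

Neither parser keeps state between calls: the result depends only on `input` and `pos`, which is what makes them easy to compose.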
char("H")
Succeeds if the next character is the given one.

char("H").apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 1)

char("H").apply("Adiós")
A d i ó s
0 1 2 3 4
Failure(pos = 0)

if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end

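Put into a complete class, the conditional above could look like the sketch below. The class name `CharParser` is an assumption; `char`, `apply`, `Success` and `Failure` follow the slides.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical class name; the slides only show the body of read.
class CharParser
  def initialize(char)
    @char = char
  end

  # input[pos] is nil past the end of the string, so this fails at EOF.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  CharParser.new(c)
end

p char("H").apply("Hello") # succeeds at position 1
p char("H").apply("Adiós") # fails at position 0
```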
seq(a, b)
Succeeds if both given parsers succeed in sequence.

seq(char("H"), char("e")).apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 2)

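One way `seq` might be implemented is to thread the position through each parser and stop at the first failure. To keep the sketch short, a hypothetical `Parser` class wraps the read logic in a block; this is an assumption, not necessarily how the talk's code is structured.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Runs each parser where the previous one stopped; the first failure wins.
def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos)) do |res, parser|
      res.is_a?(Failure) ? res : parser.read(input, res.pos)
    end
  end
end

p seq(char("H"), char("e")).apply("Hello")
p seq(char("H"), char("e")).apply("Hallo")
```

In this sketch a failing `seq` reports the position where the inner parser failed, not where the sequence started.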
seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)

string(s)
Succeeds if all characters in the given string can be read in sequence.

string("Hello").apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 5)

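Since `string` is described as reading every character in sequence, it can plausibly be built from `seq` and `char` alone. The sketch below assumes the same hypothetical block-based `Parser` helper; a real library may well compare substrings directly instead.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos)) do |res, parser|
      res.is_a?(Failure) ? res : parser.read(input, res.pos)
    end
  end
end

# One char parser per character, applied in order.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

p string("Hello").apply("Hello")
p string("Hello").apply("Help!")
```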
eof()
Succeeds at the end of input; fails otherwise.

seq(string("Hello"), eof).apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 5)

seq(string("Hello"), eof).apply("Hello!")
H e l l o !
0 1 2 3 4 5
Failure(pos = 5)

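`eof` consumes nothing; it only checks where the position is. A minimal sketch, again using a hypothetical block-based `Parser` helper, reproduces both results above:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos)) do |res, parser|
      res.is_a?(Failure) ? res : parser.read(input, res.pos)
    end
  end
end

def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Succeeds without advancing when no input is left.
def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

p seq(string("Hello"), eof).apply("Hello")
p seq(string("Hello"), eof).apply("Hello!")
```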
alt(a, b)
Succeeds if either of the given parsers succeeds.

alt(char("H"), char("A")).apply("Adiós")
A d i ó s
0 1 2 3 4
Success(pos = 1)

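A sketch of `alt` that tries each alternative at the same starting position and returns the first success. Trying the alternatives in order (ordered choice) is an assumption here; the slides only say that either parser must succeed. The block-based `Parser` helper is hypothetical.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Every alternative starts from the same pos; the first success is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      res = parser.read(input, pos)
      if res.is_a?(Success)
        result = res
        break
      end
    end
    result
  end
end

p alt(char("H"), char("A")).apply("Adiós")
p alt(char("H"), char("A")).apply("Ciao")
```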
whitespace_char =
  alt(
    char(" "),
    char("\t"),
    char("\r"),
    char("\n"),
  )

opt(p)
Succeeds always, but only advances if p succeeds.

repeat(p)
Succeeds always, and attempts to apply p as often as possible.

repeat(whitespace_char)

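Both `opt` and `repeat` turn a failure of the inner parser into a success at the last good position. The sketch below is one possible reading of the two descriptions; the block-based `Parser` helper and the no-progress guard in `repeat` are assumptions.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Always succeeds; advances only if the inner parser did.
def opt(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    res.is_a?(Success) ? res : Success.new(pos)
  end
end

# Applies the inner parser until it fails, then succeeds where it stopped.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      res = parser.read(input, pos)
      # Also stop if nothing was consumed, to avoid looping forever.
      break if res.is_a?(Failure) || res.pos == pos
      pos = res.pos
    end
    Success.new(pos)
  end
end

p opt(char("H")).apply("Adiós")
p repeat(char(" ")).apply("   x")
```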
intersperse(a, b)
Alternates between a and b, always ending with a.

intersperse(char("a"), char(",")).apply("a,a,b")
a , a , b
0 1 2 3 4
Success(pos = 3)

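The result above stops at position 3 because the second comma is not followed by another `a`, so that comma is not consumed. A sketch that behaves this way follows; it assumes at least one `a` is required and uses the hypothetical block-based `Parser` helper.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Reads a, then as many (b, a) pairs as possible. pos only moves forward
# after a complete pair, so a trailing b is left unconsumed.
def intersperse(a, b)
  Parser.new do |input, pos|
    res = a.read(input, pos)
    if res.is_a?(Success)
      pos = res.pos
      loop do
        sep = b.read(input, pos)
        break if sep.is_a?(Failure)
        item = a.read(input, sep.pos)
        break if item.is_a?(Failure) || item.pos == pos
        pos = item.pos
      end
      Success.new(pos)
    else
      res
    end
  end
end

p intersperse(char("a"), char(",")).apply("a,a,b")
```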
etc.

3. Examples

720629530

digit =
  alt(*('0'..'9').map { |c| char(c) })

digit = char_in('0'..'9')

nat_number =
  seq(digit, repeat(digit))

nat_number =
  repeat1(digit)

nat_number =
  repeat1(digit).capture

Success(pos = 3, data = "720")

def read(input, pos)
  Success.new(pos + 1)
end

def read(input, pos)
  Success.new(pos + 1, "blahblah")
end

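With a second `data` field on `Success`, `capture` can be sketched as a wrapper that stores the substring the inner parser consumed. The names `char_in`, `repeat1` and `capture` come from the slides; their implementations below, and the block-based `Parser` helper, are assumptions.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, data becomes the text between the start and end positions.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end
end

# Succeeds if the next character lies in the given range.
def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    (c && range.include?(c)) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Like repeat, but the inner parser must succeed at least once.
def repeat1(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    while res.is_a?(Success)
      nxt = parser.read(input, res.pos)
      break if nxt.is_a?(Failure) || nxt.pos == res.pos
      res = nxt
    end
    res
  end
end

NAT_NUMBER = repeat1(char_in('0'..'9')).capture

p NAT_NUMBER.apply("720")
```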
dec_number =
  seq(
    nat_number,
    char('.'),
    nat_number,
  )

Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93

field = repeat(char_not_in(',', "\n"))
line = intersperse(field, char(','))
file =
  seq(
    line.intersperse(char("\n")),
    eof(),
  )

Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93

[
  ["Horan", "Niall", 93],
  ["Payne", "Liam", 93],
  ["Tomlinson", "Louis", 91],
  ["Styles", "Harry", 94],
  ["Malik", "Zayn", 93],
]

add(1, mul(2, 3))
sub(5, 4)

lparen = char('(')
rparen = char(')')
comma = char(',')

expr =
  alt(lazy { funcall }, nat_number)

funcall =
  seq(
    identifier,
    lparen,
    arg_list,
    rparen,
  )

letter = char_in('a'..'z')
identifier = repeat1(letter)

arg_list =
  intersperse(
    expr,
    seq(comma, whitespace),
  )

arg_list =
  opt(
    intersperse(
      expr,
      seq(comma, whitespace),
    )
  )

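The grammar above is recursive: `expr` refers to `funcall`, which refers back to `expr` through `arg_list`. `lazy` breaks the cycle by postponing the lookup until parse time. The sketch below shows the idea on a smaller recursive grammar (nested parentheses); the `NESTED` name and the block-based `Parser` helper are hypothetical.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos)) do |res, parser|
      res.is_a?(Failure) ? res : parser.read(input, res.pos)
    end
  end
end

def opt(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    res.is_a?(Success) ? res : Success.new(pos)
  end
end

# The block is only called while parsing, so it may refer to a parser
# that is not yet defined when lazy itself is called.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

# Zero or more levels of balanced parentheses: "", "()", "(())", ...
NESTED = opt(seq(char("("), lazy { NESTED }, char(")")))

p NESTED.apply("(())")
```

Without `lazy`, the right-hand side would try to read `NESTED` before the constant exists.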
expr_list = intersperse(expr, char("\n"))
program = seq(expr_list, eof)

add(1, mul(2, 3))
sub(5, 4)

Success(pos = 27)

Where’s the data!!!

funcall =
  seq(
    identifier,
    lparen,
    arg_list,
    rparen,
  )

funcall =
  seq(
    identifier.capture,
    lparen,
    arg_list,
    rparen,
  )

funcall =
  seq(
    identifier.capture,
    lparen,
    arg_list,
    rparen,
  ).map do |data|
    # stuff here
  end

funcall =
  seq(
    identifier.capture,
    lparen,
    arg_list,
    rparen,
  ).map do |data|
    FunCall.new(data[0], data[2])
  end

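For `data[0]` and `data[2]` to make sense, `seq` has to collect the data of its children into an array, and `map` has to transform the data of a successful result. The sketch below assumes exactly that, and simplifies the grammar to a call with a single numeric argument. The `FunCall` struct, the conversion with `to_i`, and the block-based `Parser` helper are assumptions.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)
FunCall = Struct.new(:name, :args) # hypothetical AST node

# Hypothetical helper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, data becomes the consumed substring.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  # On success, data is replaced by the result of the given block.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    (c && range.include?(c)) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def repeat1(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    while res.is_a?(Success)
      nxt = parser.read(input, res.pos)
      break if nxt.is_a?(Failure) || nxt.pos == res.pos
      res = nxt
    end
    res
  end
end

# Collects each child's data into an array, in order.
def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, parser|
      next acc if acc.is_a?(Failure)
      res = parser.read(input, acc.pos)
      res.is_a?(Success) ? Success.new(res.pos, acc.data + [res.data]) : res
    end
  end
end

IDENTIFIER = repeat1(char_in('a'..'z'))
NAT_NUMBER = repeat1(char_in('0'..'9')).capture.map { |s| s.to_i }

# Simplified: exactly one numeric argument instead of a full arg_list.
FUNCALL = seq(IDENTIFIER.capture, char('('), NAT_NUMBER, char(')')).map do |data|
  FunCall.new(data[0], [data[2]])
end

p FUNCALL.apply("add(42)")
```

Parsers that are not captured, such as the parentheses, contribute `nil` to the array, which is why the interesting values sit at indices 0 and 2.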
add(1, mul(2, 3))
sub(5, 4)

[
  FunCall.new("add", [
    1,
    FunCall.new("mul", [2, 3]),
  ]),
  FunCall.new("sub", [5, 4]),
]

And that is how you can write a parser using parser combinators.

ḌPARSE
A GOOD PARSER LIBRARY FOR RUBY

github.com/ddfreyne/d-parse

require 'd-parse'

module JSONGrammar
  extend DParse::DSL

  DIGIT = char_in('0'..'9')
  NUMBER = repeat1(DIGIT)
end

res = JSONGrammar::NUMBER.apply('8700')

case res
when DParse::Success
  puts(res.data.inspect)
when DParse::Failure
  $stderr.puts res.pretty_message
  exit(1)
end

expected identifier at line 1, column 36

def reticulate(splines, threshold, ) {
                                   ↑

github.com/ddfreyne/d-parse
PRE-ALPHA! BE AN EARLY ADOPTER!

My name is Denis.
Ready to parse your questions.
Find me at [email protected], or @denis on Slack.