Let’s write a parser! [SoundCloud HQ edition]
Denis Defreyne
May 17, 2016
Transcript
Let’s write a parser!
DENIS DEFREYNE / SOUNDCLOUD, BERLIN / MAY 17TH, 2016
1. Language
I am Denis.
But how do you know that I am Denis?
I told you. I wrote it down. You’ve probably seen me before. Etc.
You understand English.
Computers are stupid.
$ git commit --message="Fix bugs"

def greet(name)
  puts "Hello, #{name}"
end

def greet(name: String): Unit = {
  println(s"Hello, $name!")
}

Text forms a language, but computers don’t know that.
2. Parsing
Basic idea: parser objects that are small, composable, and purely functional.
def read(input, pos)
  Success.new(pos + 1)
end

def read(input, pos)
  Failure.new(pos)
end
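The slides show only the read method. A minimal sketch of the objects around it might look like the following. The Parser base class and the AnyChar parser are hypothetical names introduced for illustration; only Success, Failure, read, and apply appear in the talk.

```ruby
# Result objects: both carry the position at which the parser stopped.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class (an assumption): subclasses implement read(input, pos).
class Parser
  # Runs the parser from the start of the input.
  def apply(input)
    read(input, 0)
  end

  def read(_input, _pos)
    raise NotImplementedError
  end
end

# Hypothetical example parser: consumes any one character, fails at end of input.
class AnyChar < Parser
  def read(input, pos)
    pos < input.length ? Success.new(pos + 1) : Failure.new(pos)
  end
end

p AnyChar.new.apply("Hello") # a Success with pos = 1
p AnyChar.new.apply("")      # a Failure with pos = 0
```

Because read never mutates the input or the parser, the same parser object can be reused at any position, which is what makes composition possible.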
char("H")
Succeeds if the next character is the given one.

H e l l o
0 1 2 3 4
char("H").apply("Hello")
Success(pos = 1)

A d i ó s
0 1 2 3 4
char("H").apply("Adiós")
Failure(pos = 0)

if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end
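Put into a class, the conditional above could become a complete char parser. This is a sketch: the class name CharParser and the top-level char helper are assumptions, not necessarily how the talk or the library structures it.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical class name; the slides only show the body of read.
class CharParser
  def initialize(char)
    @char = char
  end

  # Succeeds, advancing by one, if the character at pos is the expected one.
  # Past the end of the input, input[pos] is nil, so the parser fails.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  def apply(input)
    read(input, 0)
  end
end

# Convenience constructor matching the char("H") notation from the slides.
def char(c)
  CharParser.new(c)
end

p char("H").apply("Hello") # a Success with pos = 1
p char("H").apply("Adiós") # a Failure with pos = 0
```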
seq(a, b)
Succeeds if both given parsers succeed in sequence.

H e l l o
0 1 2 3 4
seq(char("H"), char("e")).apply("Hello")
Success(pos = 2)

seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)
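One way seq could be implemented is to thread the position through each parser in turn and stop at the first failure. In this sketch, the block-wrapping Parser class is a hypothetical helper used to keep the example short.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Applies each parser where the previous one stopped; the first failure wins.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

p seq(char("H"), char("e")).apply("Hello") # a Success with pos = 2
p seq(char("H"), char("e")).apply("Hallo") # a Failure with pos = 1
```

Returning the failing parser's own Failure keeps the position where parsing actually broke, which is useful for error messages.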
string(s)
Succeeds if all characters in the given string can be read in sequence.

H e l l o
0 1 2 3 4
string("Hello").apply("Hello")
Success(pos = 5)
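Since string(s) is described as reading characters in sequence, it can be sketched as a seq of char parsers. The Parser wrapper is again a hypothetical helper, and a real library might compare substrings directly instead.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# string("Hello") is shorthand for seq(char("H"), char("e"), ...).
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

p string("Hello").apply("Hello") # a Success with pos = 5
p string("Hello").apply("Help!") # a Failure with pos = 3
```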
eof()
Succeeds at the end of input; fails otherwise.

H e l l o
0 1 2 3 4
seq(string("Hello"), eof).apply("Hello")
Success(pos = 5)

H e l l o !
0 1 2 3 4 5
seq(string("Hello"), eof).apply("Hello!")
Failure(pos = 5)
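A sketch of eof only has to compare the position with the input length. Here string is implemented with a substring comparison purely to keep the example short; that choice, and the Parser wrapper, are assumptions.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

# Simplified string parser: compares the substring starting at pos.
def string(s)
  Parser.new do |input, pos|
    input[pos, s.length] == s ? Success.new(pos + s.length) : Failure.new(pos)
  end
end

# Succeeds without consuming anything when pos is at (or past) the end.
def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

p seq(string("Hello"), eof).apply("Hello")  # a Success with pos = 5
p seq(string("Hello"), eof).apply("Hello!") # a Failure with pos = 5
```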
alt(a, b)
Succeeds if either of the given parsers succeeds.

A d i ó s
0 1 2 3 4
alt(char("H"), char("A")).apply("Adiós")
Success(pos = 1)

whitespace_char = alt(
  char(" "),
  char("\t"),
  char("\r"),
  char("\n"),
)
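A possible implementation of alt tries each alternative at the same starting position and returns the first success. Trying alternatives in order (first match wins) is an assumption here, as is the Parser wrapper.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Every alternative starts from the same pos; the first success is returned.
# If all alternatives fail, the last failure is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))

p alt(char("H"), char("A")).apply("Adiós") # a Success with pos = 1
p whitespace_char.apply("\tx")             # a Success with pos = 1
p whitespace_char.apply("x")               # a Failure with pos = 0
```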
opt(p)
Succeeds always, but only advances if p succeeds.

repeat(p)
Succeeds always, and attempts to apply p as often as possible.

repeat(whitespace_char)
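Both combinators turn a failure of p into a success at the last good position. The sketch below assumes the hypothetical Parser wrapper; the guard against parsers that succeed without advancing is also an addition, not something the slides mention.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Always succeeds; advances only if the wrapped parser succeeds.
def opt(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

# Always succeeds; applies the wrapped parser until it fails.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      # Stop on failure, or when nothing was consumed (avoids looping forever).
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

p opt(char("-")).apply("-1")      # a Success with pos = 1
p opt(char("-")).apply("1")       # a Success with pos = 0
p repeat(char(" ")).apply("   x") # a Success with pos = 3
```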
intersperse(a, b)
Alternates between a and b, always ending with a.

a , a , b
0 1 2 3 4
intersperse(char("a"), char(",")).apply("a,a,b")
Success(pos = 3)
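The example above stops at position 3 because the trailing "," is not followed by another a, so the separator is given back. A sketch of that backtracking, assuming the hypothetical Parser wrapper and assuming that at least one a is required:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Reads a, then as many (b, a) pairs as possible. A separator that is not
# followed by another a is not consumed, so the result always ends with a.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    if result.is_a?(Success)
      loop do
        sep = b.read(input, result.pos)
        break if sep.is_a?(Failure)
        item = a.read(input, sep.pos)
        # Stop on failure, or when nothing was consumed (avoids looping forever).
        break if item.is_a?(Failure) || item.pos == result.pos
        result = item
      end
    end
    result
  end
end

p intersperse(char("a"), char(",")).apply("a,a,b") # a Success with pos = 3
p intersperse(char("a"), char(",")).apply("b")     # a Failure with pos = 0
```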
etc.
3. Examples
720 6 29530
digit = alt(
  *('0'..'9').map { |c| char(c) }
)

digit = char_in('0'..'9')

nat_number = seq(digit, repeat(digit))

nat_number = repeat1(digit)

nat_number = repeat1(digit).capture
Success(pos = 3, data = "720")
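To make capture work, Success needs a second field for data. The sketch below shows one way char_in, repeat1, and capture could fit together; the Parser wrapper and the exact shape of the data field are assumptions.

```ruby
# Success now carries optional data alongside the position.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, sets data to the slice of input that this parser consumed.
  def capture
    inner = self
    Parser.new do |input, pos|
      result = inner.read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end
end

# Succeeds if the next character is a member of the given range.
def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Like repeat, but requires at least one match.
def repeat1(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    while result.is_a?(Success)
      nxt = parser.read(input, result.pos)
      break if nxt.is_a?(Failure) || nxt.pos == result.pos
      result = nxt
    end
    result
  end
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture

p nat_number.apply("720") # a Success with pos = 3 and data = "720"
p nat_number.apply("abc") # a Failure with pos = 0
```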
def read(input, pos)
  Success.new(pos + 1)
end

def read(input, pos)
  Success.new(pos + 1, "blahblah")
end

dec_number = seq(
  nat_number,
  char('.'),
  nat_number,
)
Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93

field = repeat(char_not_in(',', "\n"))
line = intersperse(field, char(','))
file = seq(
  line.intersperse(char("\n")),
  eof(),
)

[
  ["Horan", "Niall", 93],
  ["Payne", "Liam", 93],
  ["Tomlinson", "Louis", 91],
  ["Styles", "Harry", 94],
  ["Malik", "Zayn", 93],
]
add(1, mul(2, 3))
sub(5, 4)

lparen = char('(')
rparen = char(')')
comma = char(',')

expr = alt(lazy { funcall }, nat_number)

funcall = seq(
  identifier,
  lparen,
  arg_list,
  rparen,
)

letter = char_in('a'..'z')
identifier = repeat1(letter)

arg_list = intersperse(
  expr,
  seq(comma, whitespace),
)

arg_list = opt(
  intersperse(
    expr,
    seq(comma, whitespace),
  )
)

expr_list = intersperse(expr, char("\n"))
program = seq(expr_list, eof)

add(1, mul(2, 3))
sub(5, 4)
Success(pos = 27)
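The grammar is recursive: expr refers to funcall, whose arguments are again expr. lazy breaks that cycle by deferring the lookup until parse time. The following sketch assembles the whole grammar under several assumptions: the Parser wrapper is hypothetical, rules are stored in constants so the lazy block can refer to a rule defined further down, whitespace (not defined on the slides) is taken to mean zero or more spaces, and repeat1, opt, and intersperse are expressed in terms of seq, alt, and repeat.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

# Succeeds if the next character is in the given set (a Range or an Array).
def char_in(set)
  Parser.new do |input, pos|
    c = input[pos]
    c && set.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def char(c)
  char_in([c])
end

def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

def opt(parser)
  alt(parser, seq()) # an empty seq succeeds without advancing
end

def repeat1(parser)
  seq(parser, repeat(parser))
end

def intersperse(a, b)
  seq(a, repeat(seq(b, a)))
end

# Defers evaluating the block until parse time, so that a rule can refer
# to another rule that is defined later.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

NAT_NUMBER = repeat1(char_in('0'..'9'))
IDENTIFIER = repeat1(char_in('a'..'z'))
WHITESPACE = repeat(char(' ')) # assumption: zero or more spaces
EXPR       = alt(lazy { FUNCALL }, NAT_NUMBER)
ARG_LIST   = opt(intersperse(EXPR, seq(char(','), WHITESPACE)))
FUNCALL    = seq(IDENTIFIER, char('('), ARG_LIST, char(')'))
PROGRAM    = seq(intersperse(EXPR, char("\n")), eof)

p PROGRAM.apply("add(1, mul(2, 3))\nsub(5, 4)") # a Success with pos = 27
```

Without lazy, the line defining EXPR would fail immediately, because FUNCALL does not exist yet at that point.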
Where’s the data!!!

funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
)

funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
).map do |data|
  FunCall.new(data[0], data[2])
end

add(1, mul(2, 3))
sub(5, 4)

[
  FunCall.new("add", [
    1,
    FunCall.new("mul", [2, 3]),
  ]),
  FunCall.new("sub", [5, 4]),
]
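For data[0] and data[2] to mean anything, seq has to collect the data of its sub-parsers into an array, and map has to transform the data of a successful result. The sketch below shows that mechanism on a simplified grammar in which arguments are plain numbers (no nesting). The Parser wrapper, the FunCall struct, and the choice that intersperse keeps only the data of its items are all assumptions.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)
FunCall = Struct.new(:name, :args) # hypothetical AST node

# Hypothetical helper: wraps a block taking (input, pos) as a parser object.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, replaces the data with the result of the given block.
  def map(&fn)
    inner = self
    Parser.new do |input, pos|
      result = inner.read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, fn.call(result.data)) : result
    end
  end

  # On success, replaces the data with the text that was consumed.
  def capture
    inner = self
    Parser.new do |input, pos|
      result = inner.read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end
end

def char_in(set)
  Parser.new do |input, pos|
    c = input[pos]
    c && set.include?(c) ? Success.new(pos + 1, c) : Failure.new(pos)
  end
end

def char(c)
  char_in([c])
end

# Collects the data of every sub-parser into an array, in order.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    failure = nil
    parsers.each do |parser|
      result = parser.read(input, pos)
      if result.is_a?(Failure)
        failure = result
        break
      end
      pos = result.pos
      data << result.data
    end
    failure || Success.new(pos, data)
  end
end

def repeat1(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    while result.is_a?(Success)
      nxt = parser.read(input, result.pos)
      break if nxt.is_a?(Failure) || nxt.pos == result.pos
      result = nxt
    end
    result
  end
end

# Keeps the data of the items only; separators are discarded.
def intersperse(a, b)
  Parser.new do |input, pos|
    first = a.read(input, pos)
    next first if first.is_a?(Failure)
    items = [first.data]
    pos = first.pos
    loop do
      sep = b.read(input, pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      break if item.is_a?(Failure) || item.pos == pos
      items << item.data
      pos = item.pos
    end
    Success.new(pos, items)
  end
end

NAT_NUMBER = repeat1(char_in('0'..'9')).capture.map { |s| Integer(s) }
IDENTIFIER = repeat1(char_in('a'..'z'))
ARG_LIST   = intersperse(NAT_NUMBER, seq(char(','), char(' ')))
FUNCALL    = seq(IDENTIFIER.capture, char('('), ARG_LIST, char(')')).map do |data|
  FunCall.new(data[0], data[2]) # data[1] and data[3] are the parentheses
end

p FUNCALL.apply("add(1, 2)").data # a FunCall with name "add" and args [1, 2]
```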
And that is how you can write a parser using parser combinators.
ḌPARSE
A GOOD PARSER LIBRARY FOR RUBY
github.com/ddfreyne/d-parse
require 'd-parse'

module JSONGrammar
  extend DParse::DSL

  DIGIT = char_in('0'..'9')
  NUMBER = repeat1(DIGIT)
end

res = JSONGrammar::NUMBER.apply('8700')

case res
when DParse::Success
  puts(res.data.inspect)
when DParse::Failure
  $stderr.puts res.pretty_message
  exit(1)
end
expected identifier at line 1, column 36
def reticulate(splines, threshold, ) {
                                   ↑
github.com/ddfreyne/d-parse
PRE-ALPHA! BE AN EARLY ADOPTER!
My name is Denis. Ready to parse your questions.
Find me at denis@soundcloud.com, or @denis on Slack.