Slide 1
Slide 1 text
Let’s write a parser! DENIS DEFREYNE / SOUNDCLOUD, BERLIN / MAY 17TH, 2016
Slide 2
Slide 2 text
1. Language
Slide 3
Slide 3 text
I am Denis.
Slide 4
Slide 4 text
But how do you know that I am Denis?
Slide 5
Slide 5 text
But how do you know that I am Denis? I told you. I wrote it down. You’ve probably seen me before. Etc.
Slide 6
Slide 6 text
But how do you know that I am Denis? You understand English.
Slide 7
Slide 7 text
Computers are stupid.
Slide 8
Slide 8 text
$ git commit --message="Fix bugs"
Slide 9
Slide 9 text
def greet(name)
  puts "Hello, #{name}"
end
Slide 10
Slide 10 text
def greet(name: String): Unit = {
  println(s"Hello, $name!")
}
Slide 11
Slide 11 text
Text forms a language, but computers don’t know that.
Slide 12
Slide 12 text
2. Parsing
Slide 13
Slide 13 text
Basic idea: parser objects that are small, composable, and purely functional.
Slide 14
Slide 14 text
def read(input, pos)
Slide 15
Slide 15 text
def read(input, pos)
  Success.new(pos + 1)
end
Slide 16
Slide 16 text
def read(input, pos)
  Failure.new(pos)
end
Slide 17
Slide 17 text
char("H")
Succeeds if the next character is the given one.
Slide 18
Slide 18 text
H e l l o
0 1 2 3 4
char("H").apply("Hello")
Success(pos = 1)
Slide 24
Slide 24 text
A d i ó s
0 1 2 3 4
char("H").apply("Adiós")
Failure(pos = 0)
Slide 29
Slide 29 text
if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end
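A minimal sketch of how the conditional above might sit inside a complete parser object. The class name CharParser and the success? helper are assumptions made for illustration; the slides only show the body of read, and this is not necessarily how any real library implements it.

```ruby
# Result objects named on the slides; `pos` is where reading stopped.
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical class name: a parser that accepts exactly one given character.
class CharParser
  def initialize(char)
    @char = char
  end

  # Entry point used on the slides: start reading at position 0.
  def apply(input)
    read(input, 0)
  end

  # Purely functional: returns a new result and never mutates the input.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end
end

def char(c)
  CharParser.new(c)
end

puts char("H").apply("Hello").pos  # prints 1
puts char("H").apply("Adiós").pos  # prints 0
```

Reading past the end of the input is handled without a special case here, because input[pos] is nil there and nil never equals the expected character.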
Slide 30
Slide 30 text
seq(a, b)
Succeeds if both given parsers succeed in sequence.
Slide 31
Slide 31 text
H e l l o
0 1 2 3 4
seq(char("H"), char("e")).apply("Hello")
Success(pos = 2)
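One way seq could be written, as a sketch rather than the talk's actual implementation. The Parser wrapper class and the success? helper are hypothetical; each parser starts reading where the previous one stopped, and the first failure is returned as is.

```ruby
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Run the parsers one after another; stop at the first failure.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break unless result.success?
    end
    result
  end
end

puts seq(char("H"), char("e")).apply("Hello").pos  # prints 2
```

In this sketch a failing seq reports the position of the sub-parser that failed, not the position where the sequence started.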
Slide 37
Slide 37 text
seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)
Slide 38
Slide 38 text
string(s)
Succeeds if all characters in the given string can be read in sequence.
Slide 39
Slide 39 text
H e l l o
0 1 2 3 4
string("Hello").apply("Hello")
Success(pos = 5)
Slide 43
Slide 43 text
eof()
Succeeds at the end of input; fails otherwise.
Slide 44
Slide 44 text
H e l l o
0 1 2 3 4
seq(string("Hello"), eof).apply("Hello")
Success(pos = 5)
Slide 49
Slide 49 text
H e l l o !
0 1 2 3 4 5
seq(string("Hello"), eof).apply("Hello!")
Failure(pos = 5)
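The two traces above can be reproduced with a small sketch in which string is built from seq and char, and eof only compares the position with the input length. The Parser wrapper and the success? helper are assumptions for illustration, not the talk's code.

```ruby
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break unless result.success?
    end
    result
  end
end

# A string is a sequence of single-character parsers.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Succeeds without consuming anything, but only at the end of the input.
def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

puts seq(string("Hello"), eof).apply("Hello").success?   # prints true
puts seq(string("Hello"), eof).apply("Hello!").success?  # prints false
```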
Slide 54
Slide 54 text
alt(a, b)
Succeeds if either of the given parsers succeeds.
Slide 55
Slide 55 text
A d i ó s
0 1 2 3 4
alt(char("H"), char("A")).apply("Adiós")
Success(pos = 1)
Slide 59
Slide 59 text
whitespace_char = alt(
  char(" "),
  char("\t"),
  char("\r"),
  char("\n"),
)
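A sketch of alt under the same assumptions as before (hypothetical Parser wrapper and success? helper). Every alternative is tried from the same starting position, and the first success wins; this ordered choice is one possible reading of "either succeeds".

```ruby
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Try each parser at the same position; return the first success.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.success?
    end
    result
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))

puts alt(char("H"), char("A")).apply("Adiós").pos  # prints 1
puts whitespace_char.apply("\tx").success?         # prints true
```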
Slide 60
Slide 60 text
opt(p)
Succeeds always, but only advances if p succeeds.
Slide 61
Slide 61 text
repeat(p)
Succeeds always, and attempts to apply p as often as possible.
Slide 62
Slide 62 text
repeat(whitespace_char)
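Both combinators fit in a few lines. The following is a sketch under assumptions: the Parser wrapper and success? helper are hypothetical, and the "no progress" guard in repeat is an addition that the slides do not mention.

```ruby
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Try the parser; on failure, succeed anyway without moving.
def opt(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.success? ? result : Success.new(pos)
  end
end

# Apply the parser until it fails, then succeed at the last good position.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      # Also stop when nothing was consumed, so repeat(opt(...)) cannot loop forever.
      break unless result.success? && result.pos > pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

puts repeat(char(" ")).apply("   x").pos  # prints 3
puts opt(char("-")).apply("5").pos        # prints 0
```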
Slide 63
Slide 63 text
intersperse(a, b)
Alternates between a and b, always ending with a.
Slide 64
Slide 64 text
a , a , b
0 1 2 3 4
intersperse(char("a"), char(",")).apply("a,a,b")
Success(pos = 3)
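The trace above stops at position 3 because the trailing "," is only accepted when another a follows it. One way to get that behaviour, shown here as a sketch with a hypothetical Parser wrapper, is to define intersperse(a, b) as a followed by any number of "b then a" pairs.

```ruby
Success = Struct.new(:pos) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break unless result.success?
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break unless result.success? && result.pos > pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# a, then zero or more (b, a) pairs; a dangling b is not consumed.
def intersperse(a, b)
  seq(a, repeat(seq(b, a)))
end

puts intersperse(char("a"), char(",")).apply("a,a,b").pos  # prints 3
```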
Slide 70
Slide 70 text
etc.
Slide 71
Slide 71 text
3. Examples
Slide 72
Slide 72 text
720
6
29530
Slide 73
Slide 73 text
digit = alt(
  *('0'..'9').map { |c| char(c) }
)
Slide 74
Slide 74 text
digit = char_in('0'..'9')
Slide 75
Slide 75 text
digit = char_in('0'..'9')
nat_number = seq(digit, repeat(digit))
Slide 76
Slide 76 text
digit = char_in('0'..'9')
nat_number = repeat1(digit)
Slide 77
Slide 77 text
digit = char_in('0'..'9')
nat_number = repeat1(digit).capture
Success(pos = 3, data = "720")
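A sketch of how capture could produce the data shown above: it wraps a parser and, on success, stores the slice of input that the parser consumed. The Parser wrapper, the success? helper, and the two-field Success struct are assumptions for illustration.

```ruby
# Success now optionally carries data, as on the slide.
Success = Struct.new(:pos, :data) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # New parser whose data is the text consumed by this parser.
  def capture
    Parser.new do |input, pos|
      result = read(input, pos)
      result.success? ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end
end

def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break unless result.success?
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break unless result.success? && result.pos > pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# One or more repetitions, written as on the earlier slide.
def repeat1(parser)
  seq(parser, repeat(parser))
end

digit = char_in("0".."9")
nat_number = repeat1(digit).capture
result = nat_number.apply("720")
puts result.pos   # prints 3
puts result.data  # prints 720
```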
Slide 79
Slide 79 text
def read(input, pos)
Slide 80
Slide 80 text
def read(input, pos)
  Success.new(pos + 1)
end
Slide 81
Slide 81 text
def read(input, pos)
  Success.new(pos + 1, "blahblah")
end
Slide 82
Slide 82 text
dec_number = seq(
  nat_number,
  char('.'),
  nat_number,
)
Slide 83
Slide 83 text
Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93
Slide 84
Slide 84 text
field = repeat(char_not_in(',', "\n"))
line = intersperse(field, char(','))
file = seq(
  line.intersperse(char("\n")),
  eof(),
)
Slide 88
Slide 88 text
Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93
Slide 89
Slide 89 text
[
  ["Horan", "Niall", 93],
  ["Payne", "Liam", 93],
  ["Tomlinson", "Louis", 91],
  ["Styles", "Harry", 94],
  ["Malik", "Zayn", 93],
]
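Getting from the grammar to the nested arrays above requires the combinators to carry data. The sketch below is one possible arrangement, not the talk's implementation: seq and repeat collect their sub-results into arrays, intersperse is assumed to keep only the items and drop the separators, and the Parser wrapper, map, and csv_file are hypothetical names.

```ruby
Success = Struct.new(:pos, :data) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }

# Hypothetical helper (not from the slides): wraps a block
# `(input, pos) -> result` as a parser object.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Transform the data of a successful result; failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.success? ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end

  # Use the consumed text as the data.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.success? ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_not_in(*chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && !chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def eof
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

# Collects the data of every sub-parser into an array.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    outcome = parsers.each do |parser|
      res = parser.read(input, pos)
      break res unless res.success? # stop at the first failure
      data << res.data
      pos = res.pos
    end
    outcome.is_a?(Failure) ? outcome : Success.new(pos, data)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    data = []
    loop do
      res = parser.read(input, pos)
      break unless res.success? && res.pos > pos
      data << res.data
      pos = res.pos
    end
    Success.new(pos, data)
  end
end

# Assumption: keep only the results of `a`, dropping the separators.
def intersperse(a, b)
  seq(a, repeat(seq(b, a))).map { |first, rest| [first] + rest.map(&:last) }
end

def csv_file
  field = repeat(char_not_in(",", "\n")).capture
  line = intersperse(field, char(",")).map { |last, first, year| [last, first, year.to_i] }
  seq(intersperse(line, char("\n")), eof).map(&:first)
end

p csv_file.apply("Horan,Niall,93\nPayne,Liam,93").data
```

The sketch expects input without a trailing newline; a final empty line would be parsed as a record with a single empty field.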
Slide 90
Slide 90 text
add(1, mul(2, 3))
sub(5, 4)
Slide 91
Slide 91 text
lparen = char('(')
rparen = char(')')
comma = char(',')
Slide 92
Slide 92 text
expr = alt(lazy { funcall }, nat_number)
Slide 93
Slide 93 text
funcall = seq(
  identifier,
  lparen,
  arg_list,
  rparen,
)
Slide 94
Slide 94 text
letter = char_in('a'..'z')
identifier = repeat1(letter)
Slide 95
Slide 95 text
arg_list = intersperse(
  expr,
  seq(comma, whitespace),
)
Slide 96
Slide 96 text
arg_list = opt(
  intersperse(
    expr,
    seq(comma, whitespace),
  )
)
Slide 97
Slide 97 text
expr_list = intersperse(expr, char("\n"))
program = seq(expr_list, eof)
Slide 100
Slide 100 text
add(1, mul(2, 3))
sub(5, 4)
Slide 101
Slide 101 text
Success(pos = 27)
Slide 102
Slide 102 text
Where’s the data!!!
Slide 103
Slide 103 text
funcall = seq(
  identifier,
  lparen,
  arg_list,
  rparen,
)
Slide 104
Slide 104 text
funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
)
Slide 105
Slide 105 text
funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
).map do |data|
  # stuff here
end
Slide 106
Slide 106 text
funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
).map do |data|
  FunCall.new(data[0], data[2])
end
Slide 107
Slide 107 text
add(1, mul(2, 3))
sub(5, 4)
Slide 108
Slide 108 text
[
  FunCall.new("add", [
    1,
    FunCall.new("mul", [2, 3]),
  ]),
  FunCall.new("sub", [5, 4]),
]
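To see capture, map, and lazy work together, here is a compact sketch that parses a single expression into a FunCall tree. It rests on several assumptions: the Parser wrapper is hypothetical, seq and repeat collect data into arrays, intersperse drops separators, the grammar uses constants so that lazy { FUNCALL } can refer to a parser defined later, and empty argument lists are left out for brevity.

```ruby
Success = Struct.new(:pos, :data) { def success?; true; end }
Failure = Struct.new(:pos) { def success?; false; end }
FunCall = Struct.new(:name, :args)

# Hypothetical helper (not from the slides): a block `(input, pos) -> result`.
class Parser
  def initialize(&reader)
    @reader = reader
  end

  def read(input, pos)
    @reader.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Transform the data of a successful result; failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.success? ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end

  # Use the consumed text as the data.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.success? ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    outcome = parsers.each do |parser|
      res = parser.read(input, pos)
      break res unless res.success? # stop at the first failure
      data << res.data
      pos = res.pos
    end
    outcome.is_a?(Failure) ? outcome : Success.new(pos, data)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    data = []
    loop do
      res = parser.read(input, pos)
      break unless res.success? && res.pos > pos
      data << res.data
      pos = res.pos
    end
    Success.new(pos, data)
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    res = nil
    found = parsers.find { |parser| (res = parser.read(input, pos)).success? }
    found ? res : Failure.new(pos)
  end
end

# Defers looking up the parser until read time, which allows recursion.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

def repeat1(parser)
  seq(parser, repeat(parser))
end

def intersperse(a, b)
  seq(a, repeat(seq(b, a))).map { |first, rest| [first] + rest.map(&:last) }
end

NAT_NUMBER = repeat1(char_in("0".."9")).capture.map(&:to_i)
IDENTIFIER = repeat1(char_in("a".."z")).capture
EXPR = alt(lazy { FUNCALL }, NAT_NUMBER)
ARG_LIST = intersperse(EXPR, seq(char(","), repeat(char(" "))))
FUNCALL = seq(IDENTIFIER, char("("), ARG_LIST, char(")")).map { |data| FunCall.new(data[0], data[2]) }

p EXPR.apply("add(1, mul(2, 3))").data
```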
Slide 109
Slide 109 text
And that is how you can write a parser.
Slide 110
Slide 110 text
And that is how you can write a parser using parser combinators.
Slide 111
Slide 111 text
ḌPARSE
A GOOD PARSER LIBRARY FOR RUBY
Slide 113
Slide 113 text
github.com/ddfreyne/d-parse
Slide 116
Slide 116 text
require 'd-parse'

module JSONGrammar
  extend DParse::DSL

  DIGIT = char_in('0'..'9')
  NUMBER = repeat1(DIGIT)
end

res = JSONGrammar::NUMBER.apply('8700')
Slide 122
Slide 122 text
case res
when DParse::Success
  puts(res.data.inspect)
when DParse::Failure
  $stderr.puts res.pretty_message
  exit(1)
end
Slide 126
Slide 126 text
expected identifier at line 1, column 36
def reticulate(splines, threshold, ) {
                                   ↑
Slide 127
Slide 127 text
github.com/ddfreyne/d-parse
PRE-ALPHA! BE AN EARLY ADOPTER!
Slide 129
Slide 129 text
My name is Denis. Ready to parse your questions. Find me at [email protected], or @denis on Slack.