Slide 1
Slide 1 text
Let’s write a parser! DENIS DEFREYNE / RUG::B / MAY 12TH, 2016
Slide 2
Slide 2 text
1. Language
Slide 3
Slide 3 text
I am Denis.
Slide 4
Slide 4 text
But how do you know that I am Denis?
Slide 5
Slide 5 text
But how do you know that I am Denis? I told you. I wrote it down. Tobi introduced me. You might have seen me before. Etc.
Slide 6
Slide 6 text
But how do you know that I am Denis? You understand English.
Slide 7
Slide 7 text
Computers are stupid.
Slide 8
Slide 8 text
$ git commit --message="Fix bugs"
Slide 9
Slide 9 text
def greet(name)
  puts "Hello, #{name}"
end
Slide 10
Slide 10 text
Text forms a language, but computers don’t know that.
Slide 11
Slide 11 text
2. Parsing
Slide 12
Slide 12 text
Basic idea: parser objects that are small, composable, and purely functional.
Slide 13
Slide 13 text
def read(input, pos)
Slide 14
Slide 14 text
def read(input, pos)
  Success.new(pos + 1)
end
Slide 15
Slide 15 text
def read(input, pos)
  Failure.new(pos)
end
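The slides do not show `Success` and `Failure` themselves. A minimal sketch, assuming both are plain value objects that only carry a position. The `Struct` definitions and the `AnyChar` class are assumptions made for illustration and are not taken from d-parse.

```ruby
# Assumed result types: each one only records a position in the input.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical parser that accepts any single character.
class AnyChar
  # A pure function of (input, pos): nothing is mutated.
  def read(input, pos)
    if pos < input.length
      Success.new(pos + 1) # consumed one character
    else
      Failure.new(pos)     # nothing left to read
    end
  end

  # Convenience entry point: start reading at position 0.
  def apply(input)
    read(input, 0)
  end
end

p AnyChar.new.apply("Hello") # a Success at position 1
p AnyChar.new.apply("")      # a Failure at position 0
```

Because `read` returns a new result instead of changing state, the same parser object can be reused at any position.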
Slide 16
Slide 16 text
char("H")
Succeeds if the next character is the given one.
Slide 17
Slide 17 text
char("H").apply("Hello")
Slide 22
Slide 22 text
char("H").apply("Hello")
H e l l o
0 1 2 3 4
Success(position = 1)
Slide 23
Slide 23 text
char("H").apply("Adiós")
Slide 27
Slide 27 text
char("H").apply("Adiós")
A d i ó s
0 1 2 3 4
Failure(position = 0)
Slide 28
Slide 28 text
if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end
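The fragment above fits into a complete parser roughly as follows. This is a sketch: the `CharParser` class name and the `Struct` result types are assumptions, and only the body of `read` comes from the slide.

```ruby
# Assumed result types carrying a position.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical class behind char(...).
class CharParser
  def initialize(char)
    @char = char
  end

  # Succeeds, advancing by one, if the character at pos is the expected one.
  # Past the end of the input, input[pos] is nil, so this fails.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  def apply(input)
    read(input, 0)
  end
end

# Factory method matching the char("H") spelling used on the slides.
def char(c)
  CharParser.new(c)
end

p char("H").apply("Hello") # a Success at position 1
p char("H").apply("Adiós") # a Failure at position 0
```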
Slide 29
Slide 29 text
seq(a, b)
Succeeds if both given parsers succeed in sequence.
Slide 30
Slide 30 text
seq(char("H"), char("e")).apply("Hello")
Slide 35
Slide 35 text
seq(char("H"), char("e")).apply("Hello")
H e l l o
0 1 2 3 4
Success(position = 2)
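A possible shape for `seq`, assuming each parser is handed the position where the previous one stopped. The `Parser` wrapper class, which turns a block into an object with `read` and `apply`, is a convenience assumed for this sketch and is not necessarily how d-parse is built.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Runs the parsers one after another, threading the position through.
# Stops at the first failure and returns it.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

p seq(char("H"), char("e")).apply("Hello") # a Success at position 2
```

In this sketch a failure reports the position at which the failing parser was tried.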
Slide 36
Slide 36 text
seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)
Slide 37
Slide 37 text
string(s)
Succeeds if all characters in the given string can be read in sequence.
Slide 41
Slide 41 text
string("Hello").apply("Hello")
H e l l o
0 1 2 3 4
Success(position = 5)
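The previous two slides suggest that `string` is `seq` applied to one `char` per character. A sketch under that assumption, with the same assumed `Parser` wrapper and `Struct` result types as a stand-in for the real library:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# string is built purely by composition: one char parser per character.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

p string("Hello").apply("Hello") # a Success at position 5
```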
Slide 42
Slide 42 text
eof()
Succeeds at the end of input; fails otherwise.
Slide 47
Slide 47 text
seq(string("Hello"), eof).apply("Hello")
H e l l o
0 1 2 3 4
Success(position = 5)
Slide 52
Slide 52 text
seq(string("Hello"), eof).apply("Hello!")
H e l l o !
0 1 2 3 4 5
Failure(position = 5)
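`eof` consumes nothing; it only checks the position. A sketch that reproduces both results above, again using an assumed `Parser` wrapper and `Struct` result types:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Succeeds without advancing when pos is at (or past) the end of the input.
def eof
  Parser.new do |input, pos|
    pos >= input.length ? Success.new(pos) : Failure.new(pos)
  end
end

p seq(string("Hello"), eof).apply("Hello")  # a Success at position 5
p seq(string("Hello"), eof).apply("Hello!") # a Failure at position 5
```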
Slide 53
Slide 53 text
alt(a, b)
Succeeds if either of the given parsers succeeds.
Slide 57
Slide 57 text
alt(char("H"), char("A")).apply("Adiós")
A d i ó s
0 1 2 3 4
Success(position = 1)
Slide 58
Slide 58 text
whitespace_char = alt(char(" "), char("\t"))
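A sketch of `alt`, assuming every alternative is tried from the same starting position and the first success wins. The slides do not say which result is returned when all alternatives fail; returning the last failure is an assumption, as are the `Parser` wrapper and the `Struct` result types.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Tries each parser at the same position; the first success is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

whitespace_char = alt(char(" "), char("\t"))
p whitespace_char.apply("\tindented")             # a Success at position 1
p alt(char("H"), char("A")).apply("Adiós")        # a Success at position 1
```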
Slide 59
Slide 59 text
optional(p)
Succeeds always, but only advances if p succeeds.
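One way to express `optional` is to turn a failure of `p` into a success that stays at the starting position. A sketch, with the `Parser` wrapper and `Struct` result types assumed for illustration:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Passes a success through; replaces a failure with a zero-width success.
def optional(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

p optional(char("-")).apply("-5") # a Success at position 1
p optional(char("-")).apply("5")  # a Success at position 0
```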
Slide 60
Slide 60 text
repeat(p)
Succeeds always, and attempts to apply p as often as possible.
Slide 61
Slide 61 text
repeat(whitespace_char)
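A sketch of `repeat` as a loop that keeps applying `p` until it fails. The extra stop condition for a success that does not advance is an assumption added here to avoid looping forever on zero-width parsers; the slides do not mention it. The `Parser` wrapper and `Struct` result types are also assumed.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def alt(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    result.is_a?(Success) ? result : b.read(input, pos)
  end
end

# Applies parser zero or more times and always succeeds.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      # Stop on failure, or on a success that made no progress.
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

whitespace_char = alt(char(" "), char("\t"))
p repeat(whitespace_char).apply(" \t x") # a Success at position 3
```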
Slide 62
Slide 62 text
intersperse(a, b)
Alternates between a and b, always ending with a.
Slide 68
Slide 68 text
intersperse(char("a"), char(",")).apply("a,a,b")
a , a , b
0 1 2 3 4
Success(position = 3)
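The result above (position 3, not 4) is consistent with reading `intersperse(a, b)` as `a` followed by zero or more `b a` pairs: the trailing `,` is not consumed because no `a` follows it. A sketch under that reading, composed from `seq` and `repeat`; the `Parser` wrapper and `Struct` result types are assumptions.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# a (b a)*: a separator only counts when another a follows it.
def intersperse(a, b)
  seq(a, repeat(seq(b, a)))
end

p intersperse(char("a"), char(",")).apply("a,a,b") # a Success at position 3
```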
Slide 69
Slide 69 text
etc.
Slide 70
Slide 70 text
720
6
29530
Slide 71
Slide 71 text
digit = alt(
  *('0'..'9').map { |c| char(c) }
)
Slide 72
Slide 72 text
digit = char_in('0'..'9')
Slide 73
Slide 73 text
nat_number = seq(digit, repeat(digit))
Slide 74
Slide 74 text
nat_number = repeat1(digit)
Slide 76
Slide 76 text
nat_number =
  repeat1(digit)
    .capture
Success(position = 3, data = "123")
Slide 78
Slide 78 text
nat_number =
  repeat1(digit)
    .capture
    .map(&:to_i)
Success(position = 3, data = 123)
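A sketch of how `capture` and `map` could produce these results, assuming `Success` gains a second `data` field: `capture` stores the slice of input that was consumed, and `map` transforms whatever data is present. The `Parser` wrapper and the implementations of `char_in` and `repeat1` are assumptions for illustration.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Assumed wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, the data becomes the slice of input that was consumed.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  # On success, the data is passed through the given block.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end
end

# One character from a range such as '0'..'9'.
def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Like repeat, but the parser must succeed at least once.
def repeat1(parser)
  Parser.new do |input, pos|
    first = parser.read(input, pos)
    next first if first.is_a?(Failure)
    pos = first.pos
    loop do
      res = parser.read(input, pos)
      break if res.is_a?(Failure) || res.pos == pos
      pos = res.pos
    end
    Success.new(pos)
  end
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture.map(&:to_i)
p nat_number.apply("123") # a Success at position 3 with data 123
```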
Slide 79
Slide 79 text
def read(input, pos)
Slide 80
Slide 80 text
def read(input, pos)
  Success.new(pos + 1)
end
Slide 81
Slide 81 text
def read(input, pos)
  Success.new(pos + 1, "blahblah")
end
Slide 82
Slide 82 text
first,last,age
Denis,Defreyne,29
Slide 86
Slide 86 text
field = repeat(char_not(',', "\n")).capture
line = field.intersperse(char(','))
file = seq(
  line.intersperse(char("\n")),
  end_of_input,
)
Slide 87
Slide 87 text
[
  ["first", "last", "age"],
  ["Denis", "Defreyne", "29"],
]
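The slides do not show how the data of the individual parsers is combined into nested arrays. The sketch below makes these assumptions: `seq` and `repeat` collect the data of their children into arrays, `intersperse` keeps the items and drops the separators, and a final `.map(&:first)` (not on the slides) discards the `nil` produced by `end_of_input`. The `Parser` wrapper and the `csv_parser` method name are likewise hypothetical.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Replace the data of a success with fn(data); failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end

  # Use the consumed slice of the input as the data.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  # self (sep self)*, keeping each self's data and dropping the separators.
  def intersperse(sep)
    seq(self, repeat(seq(sep, self)))
      .map { |first, rest| [first, *rest.map(&:last)] }
  end
end

def char(c)
  Parser.new do |input, pos|
    input[pos] == c ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Any single character except the listed ones.
def char_not(*excluded)
  Parser.new do |input, pos|
    c = input[pos]
    c && !excluded.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# Zero or more repetitions; the data is the array of each repetition's data.
def repeat(parser)
  Parser.new do |input, pos|
    items = []
    while (res = parser.read(input, pos)).is_a?(Success) && res.pos > pos
      items << res.data
      pos = res.pos
    end
    Success.new(pos, items)
  end
end

# Parsers in sequence; the data is the array of their data.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    res = nil
    parsers.each do |parser|
      res = parser.read(input, pos)
      break if res.is_a?(Failure)
      data << res.data
      pos = res.pos
    end
    res.is_a?(Failure) ? res : Success.new(pos, data)
  end
end

def end_of_input
  Parser.new { |input, pos| pos >= input.length ? Success.new(pos) : Failure.new(pos) }
end

# The grammar from the slide, wrapped in a method (hypothetical name).
def csv_parser
  field = repeat(char_not(',', "\n")).capture
  line = field.intersperse(char(','))
  seq(line.intersperse(char("\n")), end_of_input).map(&:first)
end

p csv_parser.apply("first,last,age\nDenis,Defreyne,29").data
```

Every field stays a string here, including the age; converting it would take a further `map`.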
Slide 88
Slide 88 text
add(1, mul(2, 3))
mul(2, 3)
Slide 89
Slide 89 text
lparen = char('(')
rparen = char(')')
comma = char(',')
Slide 90
Slide 90 text
expr = alt(lazy { funcall }, nat_number)
Slide 91
Slide 91 text
funcall = seq(
  identifier,
  lparen,
  arglist,
  rparen,
)
Slide 92
Slide 92 text
letter = char_in('a'..'z')
identifier = repeat1(letter).capture
Slide 95
Slide 95 text
arglist = seq(expr, arglist_tail)
arglist_tail = repeat(seq(comma, whitespace, expr))
Slide 98
Slide 98 text
expr_list = expr.intersperse(char("\n"))
program = seq(expr_list, eof)
Slide 99
Slide 99 text
[
  ["add", 1, ["mul", 2, 3]],
  ["mul", 2, 3],
]
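The grammar is recursive: `expr` refers to `funcall`, and `funcall` refers back to `expr` through its argument list. `lazy` breaks that cycle by looking up `funcall` only when parsing actually happens. The sketch below covers a single expression and makes several simplifications that are not on the slides: `repeat(p, 1)` stands in for `repeat1`, the argument list is written with `intersperse` instead of `arglist_tail`, a `map` reshapes the `funcall` data into `[name, *args]`, and `expr_parser` is a hypothetical wrapper method.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos = 0)
    @block.call(input, pos)
  end
  alias apply read

  # Replace the data of a success with fn(data); failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end

  # Use the consumed slice of the input as the data.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  # self (sep self)*, keeping each self's data and dropping the separators.
  def intersperse(sep)
    seq(self, repeat(seq(sep, self))).map { |first, rest| [first, *rest.map(&:last)] }
  end
end

def char_in(chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def char(c)
  char_in([c])
end

# Apply parser as often as possible (at least min times), collecting its data.
def repeat(parser, min = 0)
  Parser.new do |input, pos|
    start = pos
    items = []
    while (res = parser.read(input, pos)).is_a?(Success) && res.pos > pos
      items << res.data
      pos = res.pos
    end
    items.length >= min ? Success.new(pos, items) : Failure.new(start)
  end
end

# Run parsers one after another; the data is the array of their data.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    res = nil
    parsers.each do |parser|
      res = parser.read(input, pos)
      break if res.is_a?(Failure)
      data << res.data
      pos = res.pos
    end
    res.is_a?(Failure) ? res : Success.new(pos, data)
  end
end

def alt(a, b)
  Parser.new do |input, pos|
    res = a.read(input, pos)
    res.is_a?(Success) ? res : b.read(input, pos)
  end
end

# Look up the wrapped parser only when reading, so grammars can be recursive.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

# The expression grammar, wrapped in a method (hypothetical name).
def expr_parser
  nat_number = repeat(char_in('0'..'9'), 1).capture.map(&:to_i)
  identifier = repeat(char_in('a'..'z'), 1).capture
  separator = seq(char(','), repeat(char(' ')))
  funcall = nil # assigned below; lazy looks it up at parse time
  expr = alt(lazy { funcall }, nat_number)
  arglist = expr.intersperse(separator)
  funcall = seq(identifier, char('('), arglist, char(')')).map { |name, _, args, _| [name, *args] }
  expr
end
```

With these definitions, `expr_parser.apply("add(1, mul(2, 3))").data` evaluates to `["add", 1, ["mul", 2, 3]]`. The `funcall = nil` line is needed in plain Ruby so that the block passed to `lazy` closes over a local variable that is assigned afterwards.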
Slide 100
Slide 100 text
And that is how you can write a parser.
Slide 101
Slide 101 text
github.com/ddfreyne/d-parse
Slide 104
Slide 104 text
My name is Denis Defreyne. Ready to parse your questions. Find me at [email protected], or @ddfreyne on Twitter.