Slide 1

Slide 1 text

Let’s write a parser!
Denis Defreyne / SoundCloud, Berlin / May 17th, 2016

Slide 2

Slide 2 text

1. Language

Slide 3

Slide 3 text

I am Denis.

Slide 4

Slide 4 text

But how do you know that I am Denis?

Slide 5

Slide 5 text

But how do you know that I am Denis?
I told you. I wrote it down. You’ve probably seen me before. Etc.

Slide 6

Slide 6 text

But how do you know that I am Denis?
You understand English.

Slide 7

Slide 7 text

Computers are stupid.

Slide 8

Slide 8 text

$ git commit --message="Fix bugs"

Slide 9

Slide 9 text

def greet(name)
  puts "Hello, #{name}"
end

Slide 10

Slide 10 text

def greet(name: String): Unit = {
  println(s"Hello, $name!")
}

Slide 11

Slide 11 text

Text forms a language, but computers don’t know that.

Slide 12

Slide 12 text

2. Parsing

Slide 13

Slide 13 text

Basic idea: parser objects that are small, composable, and purely functional.

Slide 14

Slide 14 text

def read(input, pos)

Slide 15

Slide 15 text

def read(input, pos)
  Success.new(pos + 1)
end

Slide 16

Slide 16 text

def read(input, pos)
  Failure.new(pos)
end
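The `read` slides are fragments in this copy. As a rough, self-contained Ruby sketch of the interface they suggest, assume a parser is any object with `read(input, pos)` that returns a `Success` or a `Failure` carrying a position. `Parser`, the `Struct`-based result types, `any_char` and `always_fail` are hypothetical names chosen for this sketch, not d-parse's API.

```ruby
# Hypothetical result types for this sketch (not d-parse's classes).
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# A parser wraps a read(input, pos) function; apply starts reading at 0.
class Parser
  def initialize(&read)
    @read = read
  end

  def read(input, pos)
    @read.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

# Succeeds and advances by one character. Unlike the bare "pos + 1" slide,
# it fails at the end of input instead of running past it.
any_char = Parser.new do |input, pos|
  pos < input.length ? Success.new(pos + 1) : Failure.new(pos)
end

# Never advances; always fails at the current position.
always_fail = Parser.new { |_input, pos| Failure.new(pos) }

any_char.apply("Hello")    # Success with pos 1
always_fail.apply("Hello") # Failure with pos 0
```

Nothing here mutates the input or the parser: each call just returns a new position, which is what makes these objects safe to reuse and combine.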

Slide 17

Slide 17 text

char("H")
Succeeds if the next character is the given one.

Slide 18

Slide 18 text

char("H").apply("Hello")

Slide 19

Slide 19 text

char("H").apply("Hello")
H e l l o

Slide 20

Slide 20 text

char("H").apply("Hello")
H e l l o
0 1 2 3 4

Slide 21

Slide 21 text


Slide 22

Slide 22 text


Slide 23

Slide 23 text

char("H").apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 1)

Slide 24

Slide 24 text

char("H").apply("Adiós")

Slide 25

Slide 25 text

char("H").apply("Adiós")
A d i ó s
0 1 2 3 4

Slide 26

Slide 26 text


Slide 27

Slide 27 text


Slide 28

Slide 28 text

char("H").apply("Adiós")
A d i ó s
0 1 2 3 4
Failure(pos = 0)

Slide 29

Slide 29 text

if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end
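A runnable sketch of `char` under the same kind of assumptions: hypothetical `Success`/`Failure` structs and a `Parser` wrapper, redefined here so the block stands alone. A closure over `c` plays the role of the slide's `@char` instance variable. Ruby indexes strings by character, so the `ó` in `"Adiós"` occupies a single position.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

# char(c): succeeds and advances by one if the character at pos is c;
# otherwise fails without advancing.
def char(c)
  Parser.new do |input, pos|
    if input[pos] == c
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end
end

char("H").apply("Hello") # Success with pos 1
char("H").apply("Adiós") # Failure with pos 0
```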

Slide 30

Slide 30 text

seq(a, b)
Succeeds if both given parsers succeed in sequence.

Slide 31

Slide 31 text

seq(char("H"), char("e")).apply("Hello")

Slide 32

Slide 32 text

seq(char("H"), char("e")).apply("Hello")
H e l l o
0 1 2 3 4

Slide 33

Slide 33 text


Slide 34

Slide 34 text


Slide 35

Slide 35 text


Slide 36

Slide 36 text

seq(char("H"), char("e")).apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 2)

Slide 37

Slide 37 text

seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)
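A sketch of a variadic `seq`, with hypothetical helpers redefined so it runs on its own. Each parser starts where the previous one stopped; the first failure is returned unchanged, so its `pos` points at the character where the sequence broke.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# seq(*parsers): runs the parsers one after another, each starting where the
# previous one stopped; returns the first failure, or a success at the end.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

hello = seq(char("H"), char("e"), char("l"), char("l"), char("o"))

seq(char("H"), char("e")).apply("Hello") # Success with pos 2
hello.apply("Hello")                      # Success with pos 5
seq(char("H"), char("a")).apply("Hello") # Failure with pos 1
```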

Slide 38

Slide 38 text

string(s)
Succeeds if all characters in the given string can be read in sequence.

Slide 39

Slide 39 text

string("Hello").apply("Hello")
H e l l o
0 1 2 3 4

Slide 40

Slide 40 text


Slide 41

Slide 41 text


Slide 42

Slide 42 text

string("Hello").apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 5)
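Since `string` is just a `seq` of `char` parsers, a sketch needs only one line on top of those two. The supporting definitions are hypothetical and repeated so the block is self-contained. Note that `string` by itself happily matches a prefix of a longer input.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# string(s): one char parser per character, read in sequence.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

string("Hello").apply("Hello")  # Success with pos 5
string("Help").apply("Hello")   # Failure with pos 3
string("Hello").apply("Hello!") # Success with pos 5 (the "!" is not read)
```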

Slide 43

Slide 43 text

eof()
Succeeds at the end of input; fails otherwise.

Slide 44

Slide 44 text

seq(string("Hello"), eof).apply("Hello")
H e l l o
0 1 2 3 4

Slide 45

Slide 45 text


Slide 46

Slide 46 text


Slide 47

Slide 47 text


Slide 48

Slide 48 text

seq(string("Hello"), eof).apply("Hello")
H e l l o
0 1 2 3 4
Success(pos = 5)

Slide 49

Slide 49 text

seq(string("Hello"), eof).apply("Hello!")
H e l l o !
0 1 2 3 4 5

Slide 50

Slide 50 text


Slide 51

Slide 51 text


Slide 52

Slide 52 text


Slide 53

Slide 53 text

seq(string("Hello"), eof).apply("Hello!")
H e l l o !
0 1 2 3 4 5
Failure(pos = 5)
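In this sketch, `eof` consumes nothing and succeeds only when `pos` equals the input length; placed at the end of a `seq`, it turns "starts with Hello" into "is exactly Hello". All helper names are hypothetical and redefined so the block runs alone.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# eof: succeeds without advancing if nothing is left to read.
def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

seq(string("Hello"), eof).apply("Hello")  # Success with pos 5
seq(string("Hello"), eof).apply("Hello!") # Failure with pos 5
```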

Slide 54

Slide 54 text

alt(a, b)
Succeeds if either of the given parsers succeeds.

Slide 55

Slide 55 text

alt(char("H"), char("A")).apply("Adiós")
A d i ó s
0 1 2 3 4

Slide 56

Slide 56 text


Slide 57

Slide 57 text


Slide 58

Slide 58 text

alt(char("H"), char("A")).apply("Adiós")
A d i ó s
0 1 2 3 4
Success(pos = 1)

Slide 59

Slide 59 text

whitespace_char = alt(
  char(" "),
  char("\t"),
  char("\r"),
  char("\n"),
)
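A sketch of a variadic `alt`, with hypothetical helpers redefined here. Every alternative starts from the same position and the first success wins, so the order of alternatives matters. When all of them fail, this version simply returns the last failure; that is a design choice of the sketch, not necessarily what a real library reports.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# alt(*parsers): tries each parser at the same pos; the first success wins.
def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))

alt(char("H"), char("A")).apply("Adiós") # Success with pos 1
whitespace_char.apply("\tx")             # Success with pos 1
```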

Slide 60

Slide 60 text

opt(p)
Succeeds always, but only advances if p succeeds.
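`opt` can be sketched as a wrapper that replaces a failure with a success at the original position. Names are hypothetical, and the helpers are redefined so the block is self-contained; `sign` is just an illustrative use.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# opt(parser): returns parser's success as-is; on failure, succeeds anyway
# without consuming anything.
def opt(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

sign = opt(char("-"))
sign.apply("-5") # Success with pos 1
sign.apply("5")  # Success with pos 0
```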

Slide 61

Slide 61 text

repeat(p)
Succeeds always, and attempts to apply p as often as possible.

Slide 62

Slide 62 text

repeat(whitespace_char)
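A sketch of `repeat` with hypothetical helpers, including `alt` to rebuild `whitespace_char`. One detail the slide does not cover: if `p` can succeed without consuming anything, a naive loop never ends, so this version also stops as soon as `p` makes no progress.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

# repeat(parser): applies parser as often as possible; always succeeds.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      # Stop at the first failure, or when parser succeeds without advancing
      # (otherwise the loop would never end).
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))

repeat(whitespace_char).apply("  \n x") # Success with pos 4
repeat(whitespace_char).apply("abc")    # Success with pos 0
```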

Slide 63

Slide 63 text

intersperse(a, b)
Alternates between a and b, always ending with a.

Slide 64

Slide 64 text

intersperse(char("a"), char(",")).apply("a,a,b")

Slide 65

Slide 65 text

intersperse(char("a"), char(",")).apply("a,a,b")
a , a , b
0 1 2 3 4

Slide 66

Slide 66 text


Slide 67

Slide 67 text


Slide 68

Slide 68 text


Slide 69

Slide 69 text

intersperse(char("a"), char(",")).apply("a,a,b")
a , a , b
0 1 2 3 4
Success(pos = 3)
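A sketch of `intersperse` that reproduces the trace above: after each separator it tries another `a`, and if that fails it gives the separator back, which is why `"a,a,b"` stops at position 3 rather than 4. This version assumes at least one `a` is required; the slides do not say what should happen otherwise. Helper names are hypothetical.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# intersperse(a, b): reads a (b a)*, never ending on a dangling b.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure) # at least one a is required
    loop do
      sep = b.read(input, result.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      break if item.is_a?(Failure) # backtrack to before the separator
      result = item
    end
    result
  end
end

list = intersperse(char("a"), char(","))
list.apply("a,a,b") # Success with pos 3
list.apply("a,a,a") # Success with pos 5
```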

Slide 70

Slide 70 text

etc.

Slide 71

Slide 71 text

3. Examples

Slide 72

Slide 72 text

720
6
29530

Slide 73

Slide 73 text

digit = alt(
  *('0'..'9').map { |c| char(c) }
)

Slide 74

Slide 74 text

digit = char_in('0'..'9')

Slide 75

Slide 75 text

digit = char_in('0'..'9')
nat_number = seq(digit, repeat(digit))

Slide 76

Slide 76 text

digit = char_in('0'..'9')
nat_number = repeat1(digit)
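A sketch of `char_in` and of `repeat1`, defined as on the earlier slide via `seq(p, repeat(p))`, with hypothetical helpers redefined so it runs alone. Unlike `repeat`, `repeat1` fails when there is not even one match.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

# char_in(range): succeeds if the character at pos is in the range.
def char_in(range)
  Parser.new do |input, pos|
    input[pos] && range.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# repeat1(parser): one match, then as many more as possible.
def repeat1(parser)
  seq(parser, repeat(parser))
end

digit = char_in('0'..'9')
nat_number = repeat1(digit)

nat_number.apply("720")  # Success with pos 3
nat_number.apply("12ab") # Success with pos 2
nat_number.apply("x1")   # Failure with pos 0
```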

Slide 77

Slide 77 text

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture

Slide 78

Slide 78 text

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture
Success(pos = 3, data = "720")

Slide 79

Slide 79 text

def read(input, pos)

Slide 80

Slide 80 text

def read(input, pos)
  Success.new(pos + 1)
end

Slide 81

Slide 81 text

def read(input, pos)
  Success.new(pos + 1, "blahblah")
end
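One possible implementation of `.capture`, assuming the result struct gains an optional `data` field: the wrapped parser runs unchanged and, on success, the consumed slice `input[pos...result.pos]` becomes the data. This is a hypothetical sketch, not d-parse's code; the other helpers are redefined so it runs on its own.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end

  # Wraps this parser; on success, the consumed text becomes the data.
  def capture
    inner = self
    Parser.new do |input, pos|
      result = inner.read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end
end

def char_in(range)
  Parser.new do |input, pos|
    input[pos] && range.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

def repeat1(parser)
  seq(parser, repeat(parser))
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture

nat_number.apply("720") # Success with pos 3 and data "720"
```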

Slide 82

Slide 82 text

dec_number = seq(
  nat_number,
  char('.'),
  nat_number,
)

Slide 83

Slide 83 text

Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93

Slide 84

Slide 84 text

field =
  repeat(char_not_in(',', "\n"))
line =
  intersperse(field, char(','))
file =

Slide 85

Slide 85 text


Slide 86

Slide 86 text


Slide 87

Slide 87 text


Slide 88

Slide 88 text

Horan,Niall,93
Payne,Liam,93
Tomlinson,Louis,91
Styles,Harry,94
Malik,Zayn,93

Slide 89

Slide 89 text

[
  ["Horan", "Niall", 93],
  ["Payne", "Liam", 93],
  ["Tomlinson", "Louis", 91],
  ["Styles", "Harry", 94],
  ["Malik", "Zayn", 93],
]
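A sketch of the CSV grammar producing the rows above. The slide's definition of `file` is not shown, so `file = intersperse(line, char("\n"))` is an assumption, as are `field` using `.capture` and `intersperse` collecting the data of each `a` while dropping the separators. The age column is converted with `Integer` after parsing. With this grammar, a trailing newline would add one extra row containing a single empty field.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end

  # Wraps this parser; on success, the consumed text becomes the data.
  def capture
    inner = self
    Parser.new do |input, pos|
      result = inner.read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# char_not_in(*excluded): any single character except the excluded ones.
def char_not_in(*excluded)
  Parser.new do |input, pos|
    input[pos] && !excluded.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# intersperse keeps only the data of the a items, not of the separators.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure)
    items = [result.data]
    loop do
      sep = b.read(input, result.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      break if item.is_a?(Failure)
      items << item.data
      result = item
    end
    Success.new(result.pos, items)
  end
end

field = repeat(char_not_in(",", "\n")).capture
line = intersperse(field, char(","))
file = intersperse(line, char("\n")) # assumption: not shown on the slide

input = "Horan,Niall,93\nPayne,Liam,93\nTomlinson,Louis,91\nStyles,Harry,94\nMalik,Zayn,93"
result = file.apply(input)
rows = result.data.map { |last, first, age| [last, first, Integer(age)] }
```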

Slide 90

Slide 90 text

add(1, mul(2, 3))
sub(5, 4)

Slide 91

Slide 91 text

lparen = char('(')
rparen = char(')')
comma = char(',')

Slide 92

Slide 92 text

expr = alt(lazy { funcall }, nat_number)

Slide 93

Slide 93 text

funcall = seq(
  identifier,
  lparen,
  arg_list,
  rparen,
)

Slide 94

Slide 94 text

letter =
identifier =

Slide 95

Slide 95 text

arg_list =
  intersperse(
    expr,
    seq(comma, whitespace),
  )

Slide 96

Slide 96 text


Slide 97

Slide 97 text


Slide 98

Slide 98 text

expr_list =
  intersperse(expr, char("\n"))

Slide 99

Slide 99 text

expr_list =
  intersperse(expr, char("\n"))
program =
  seq(expr_list, eof)

Slide 100

Slide 100 text

add(1, mul(2, 3))
sub(5, 4)

Slide 101

Slide 101 text

Success(pos = 27)
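A self-contained sketch of the recognizer. Some pieces are assumptions because the slides leave them blank or undefined: `identifier` is taken to be one or more of `a`..`z`, and `whitespace` is `repeat(char(" "))`. Because `expr` mentions `funcall` before it is assigned, `funcall` is first declared as `nil`; the `lazy` block closes over that variable and sees the later assignment when it finally runs. The input is 27 characters long, matching the slide's result.

```ruby
# Hypothetical result types and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_in(range)
  Parser.new do |input, pos|
    input[pos] && range.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure)
    loop do
      sep = b.read(input, result.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      break if item.is_a?(Failure) # give the dangling separator back
      result = item
    end
    result
  end
end

def repeat1(parser); seq(parser, repeat(parser)); end
def eof; Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }; end
# lazy defers looking up the inner parser until read time (for recursion).
def lazy(&block); Parser.new { |input, pos| block.call.read(input, pos) }; end

lparen, rparen, comma = char("("), char(")"), char(",")
nat_number = repeat1(char_in("0".."9"))
identifier = repeat1(char_in("a".."z")) # assumption: not shown on the slides
whitespace = repeat(char(" "))          # assumption: not defined on the slides

funcall = nil # declared early so the lazy block closes over this variable
expr = alt(lazy { funcall }, nat_number)
arg_list = intersperse(expr, seq(comma, whitespace))
funcall = seq(identifier, lparen, arg_list, rparen)
expr_list = intersperse(expr, char("\n"))
program = seq(expr_list, eof)

result = program.apply("add(1, mul(2, 3))\nsub(5, 4)") # Success with pos 27
```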

Slide 102

Slide 102 text

Where’s the data!!!

Slide 103

Slide 103 text

funcall = seq(
  identifier,
  lparen,
  arg_list,
  rparen,
)

Slide 104

Slide 104 text

funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
)

Slide 105

Slide 105 text

funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
).map do |data|
  # stuff here
end

Slide 106

Slide 106 text

funcall = seq(
  identifier.capture,
  lparen,
  arg_list,
  rparen,
).map do |data|
  …(data[0], data[2])
end

Slide 107

Slide 107 text

add(1, mul(2, 3))
sub(5, 4)

Slide 108

Slide 108 text

[
  …("add", [
    1,
    …("mul", [2, 3]),
  ]),
  …("sub", [5, 4]),
]
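A sketch of the data-producing version, assuming that `seq` returns an array of its children's data (hence `data[0]` for the name and `data[2]` for the arguments), that `intersperse` returns only the data of its items, and that `map` transforms the data of a success. The class name on the slide is not legible in this copy, so a hypothetical `Call` struct stands in for it; `identifier` (lowercase letters) and `whitespace` (spaces) are assumptions too.

```ruby
# Hypothetical result types, AST node and parser wrapper for this sketch.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)
Call = Struct.new(:name, :args) # stands in for the slide's illegible class

class Parser
  def initialize(&read); @read = read; end
  def read(input, pos); @read.call(input, pos); end
  def apply(input); read(input, 0); end
  # Runs this parser; on success, computes new data with the block.
  def on_success(&f)
    inner = self
    Parser.new do |input, pos|
      r = inner.read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, f.call(r, input, pos)) : r
    end
  end
  def capture; on_success { |r, input, pos| input[pos...r.pos] }; end
  def map(&f); on_success { |r, _input, _pos| f.call(r.data) }; end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end
def char_in(range)
  Parser.new do |input, pos|
    input[pos] && range.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# seq collects the data of its children into an array.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    result = nil
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Failure)
      pos = result.pos
      data << result.data
    end
    result.is_a?(Failure) ? result : Success.new(pos, data)
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    result = Failure.new(pos)
    parsers.each do |parser|
      result = parser.read(input, pos)
      break if result.is_a?(Success)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# intersperse keeps only the data of the a items, not of the separators.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure)
    items = [result.data]
    loop do
      sep = b.read(input, result.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      break if item.is_a?(Failure)
      items << item.data
      result = item
    end
    Success.new(result.pos, items)
  end
end

def repeat1(parser); seq(parser, repeat(parser)); end
def eof; Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }; end
def lazy(&block); Parser.new { |input, pos| block.call.read(input, pos) }; end
lparen, rparen, comma = char("("), char(")"), char(",")
whitespace = repeat(char(" "))          # assumption: not defined on the slides
identifier = repeat1(char_in("a".."z")) # assumption: not shown on the slides
nat_number = repeat1(char_in("0".."9")).capture.map { |s| Integer(s) }
funcall = nil # declared early so the lazy block closes over this variable
expr = alt(lazy { funcall }, nat_number)
arg_list = intersperse(expr, seq(comma, whitespace))
funcall = seq(identifier.capture, lparen, arg_list, rparen).map { |data| Call.new(data[0], data[2]) }
program = seq(intersperse(expr, char("\n")), eof).map { |data| data[0] }
tree = program.apply("add(1, mul(2, 3))\nsub(5, 4)").data
```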

Slide 109

Slide 109 text

And that is how you can write a parser.

Slide 110

Slide 110 text

And that is how you can write a parser using parser combinators.

Slide 111

Slide 111 text


Slide 112

Slide 112 text


Slide 113

Slide 113 text

Slide 114

Slide 114 text

Slide 115

Slide 115 text

Slide 116

Slide 116 text


Slide 117

Slide 117 text

require 'd-parse'

Slide 118

Slide 118 text

require 'd-parse'
module JSONGrammar

Slide 119

Slide 119 text

require 'd-parse'
module JSONGrammar
  extend DParse::DSL

Slide 120

Slide 120 text

require 'd-parse'
module JSONGrammar
  extend DParse::DSL
  DIGIT = char_in('0'..'9')
  NUMBER = repeat1(DIGIT)
end

Slide 121

Slide 121 text

require 'd-parse'
module JSONGrammar
  extend DParse::DSL
  DIGIT = char_in('0'..'9')
  NUMBER = repeat1(DIGIT)
end
res = JSONGrammar::NUMBER.apply('8700')

Slide 122

Slide 122 text


Slide 123

Slide 123 text

case res

Slide 124

Slide 124 text

case res
when DParse::Success

Slide 125

Slide 125 text

case res
when DParse::Success
  puts(…)
when DParse::Failure
  $stderr.puts res.pretty_message
end

Slide 126

Slide 126 text

expected identifier at line 1, column 36
def reticulate(splines, threshold, ) {
                                   ↑
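A message like this can be derived from the failure position alone. A hypothetical sketch, not d-parse's `pretty_message`: count the newlines before `pos` to get the line, measure the distance from the last newline to get the column, then print the offending line with a caret under that column. In this example the failure position is the `)` at index 35, which is line 1, column 36.

```ruby
# Converts a 0-based character offset into 1-based [line, column].
def line_and_column(input, pos)
  before = input[0...pos]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  column = last_newline ? pos - last_newline : pos + 1
  [line, column]
end

# Builds a message with the offending line and a caret under the column.
def render_failure(input, pos, expected)
  line, column = line_and_column(input, pos)
  source_line = (input.lines[line - 1] || "").chomp
  [
    "expected #{expected} at line #{line}, column #{column}",
    "",
    source_line,
    " " * (column - 1) + "↑"
  ].join("\n")
end

src = "def reticulate(splines, threshold, ) {"
puts render_failure(src, src.index(")"), "identifier")
```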

Slide 127

Slide 127 text


Slide 128

Slide 128 text


Slide 129

Slide 129 text

My name is Denis. Ready to parse your questions.
Find me at [email protected], or @denis on Slack.