Slide 1

Let’s write a parser!
DENIS DEFREYNE / RUG::B / MAY 12TH, 2016

Slide 2

1. Language

Slide 3

I am Denis.

Slide 4

But how do you know that I am Denis?

Slide 5

But how do you know that I am Denis?

I told you.
I wrote it down.
Tobi introduced me.
You might have seen me before.
Etc.

Slide 6

But how do you know that I am Denis?

You understand English.

Slide 7

Computers are stupid.

Slide 8

$ git commit --message="Fix bugs"

Slide 9

def greet(name)
  puts "Hello, #{name}"
end

Slide 10

Text forms a language, but computers don’t know that.

Slide 11

2. Parsing

Slide 12

Basic idea:

Parser objects that are small, composable, and purely functional.

Slide 13

def read(input, pos)
  Success.new(pos + 1)
end

Slide 15

def read(input, pos)
  Failure.new(pos)
end

Slide 16

char("H")

Succeeds if the next character is the given one.

Slide 17

char("H").apply("Hello")

H e l l o
0 1 2 3 4

Success(position = 1)

Slide 23

char("H").apply("Adiós")

A d i ó s
0 1 2 3 4

Failure(position = 0)

Slide 28

if input[pos] == @char
  Success.new(pos + 1)
else
  Failure.new(pos)
end
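To see the `read` contract and the `char` parser working together, here is a minimal sketch. The `Success` and `Failure` structs, the `CharParser` class name, and the `apply` helper are assumptions made for illustration; only the body of `read` is taken from the slide, and the actual d-parse implementation may look different.

```ruby
# Result types; the names follow the slides, the Struct definitions are assumed.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical class name; only the body of #read appears in the talk.
class CharParser
  def initialize(char)
    @char = char
  end

  # Depends only on (input, pos) and mutates nothing.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  # Assumed convenience: start reading at position 0.
  def apply(input)
    read(input, 0)
  end
end

def char(c)
  CharParser.new(c)
end

p char("H").apply("Hello") # success, position 1
p char("H").apply("Adiós") # failure, position 0
```

Because `input[pos]` returns `nil` past the end of the string, reading beyond the input simply fails instead of raising.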

Slide 29

seq(a, b)

Succeeds if both given parsers succeed in sequence.

Slide 30

seq(char("H"), char("e")).apply("Hello")

H e l l o
0 1 2 3 4

Success(position = 2)
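One way `seq` could be built is to thread the position from one parser to the next and stop at the first failure. This is a sketch under assumptions: the block-based `Parser` class is a hypothetical stand-in for whatever base class the real library uses, and the variadic signature is inferred from the five-argument `seq` call later in the talk.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Runs each parser where the previous one stopped; the first failure wins.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

p seq(char("H"), char("e")).apply("Hello") # success, position 2
p seq(char("H"), char("a")).apply("Hello") # failure, position 1
```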

Slide 36

seq(
  char("H"),
  char("e"),
  char("l"),
  char("l"),
  char("o"),
)

Slide 37

string(s)

Succeeds if all characters in the given string can be read in sequence.

Slide 38

string("Hello").apply("Hello")

H e l l o
0 1 2 3 4

Success(position = 5)

Slide 42

eof()

Succeeds at the end of input; fails otherwise.

Slide 43

seq(string("Hello"), eof).apply("Hello")

H e l l o
0 1 2 3 4

Success(position = 5)

Slide 48

seq(string("Hello"), eof).apply("Hello!")

H e l l o !
0 1 2 3 4 5

Failure(position = 5)
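The two traces above can be reproduced with a small sketch in which `string` is nothing more than a `seq` of `char` parsers and `eof` compares the position with the input length. The `Parser` class and the struct definitions are assumptions for illustration, not the library's actual code.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# A string is just its characters read in sequence.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Succeeds without advancing when nothing is left to read.
def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

p seq(string("Hello"), eof).apply("Hello")  # success, position 5
p seq(string("Hello"), eof).apply("Hello!") # failure, position 5
```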

Slide 53

alt(a, b)

Succeeds if either of the given parsers succeeds.

Slide 54

alt(char("H"), char("A")).apply("Adiós")

A d i ó s
0 1 2 3 4

Success(position = 1)

Slide 58

whitespace_char = alt(char(" "), char("\t"))
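A possible shape for `alt`: try every alternative at the same starting position and return the first success. The sketch assumes ordered choice (the first matching alternative wins) and a hypothetical block-based `Parser` class; neither detail is confirmed by the slides.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Every alternative starts from the same position; the first success is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    hit = nil
    parsers.each do |parser|
      result = parser.read(input, pos)
      if result.is_a?(Success)
        hit = result
        break
      end
    end
    hit || Failure.new(pos)
  end
end

whitespace_char = alt(char(" "), char("\t"))

p alt(char("H"), char("A")).apply("Adiós") # success, position 1
p whitespace_char.apply("\tindented")      # success, position 1
```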

Slide 59

optional(p)

Succeeds always, but only advances if p succeeds.

Slide 60

repeat(p)

Succeeds always, and attempts to apply p as often as possible.

Slide 61

repeat(whitespace_char)
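Both `optional` and `repeat` turn a failure of the inner parser into a success that does not advance. A sketch, assuming a hypothetical block-based `Parser` class; the guard against parsers that succeed without consuming input is an addition that the slides do not mention.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def alt(*parsers)
  Parser.new do |input, pos|
    results = parsers.map { |parser| parser.read(input, pos) }
    results.find { |result| result.is_a?(Success) } || Failure.new(pos)
  end
end

# Never fails: falls back to a success at the starting position.
def optional(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

# Never fails: applies the parser until it fails or stops making progress.
def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

whitespace_char = alt(char(" "), char("\t"))

p repeat(whitespace_char).apply("  \tx") # success, position 3
p repeat(whitespace_char).apply("x")     # success, position 0
p optional(char("-")).apply("-1")        # success, position 1
```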

Slide 62

intersperse(a, b)

Alternates between a and b, always ending with a.

Slide 63

intersperse(char("a"), char(",")).apply("a,a,b")

a , a , b
0 1 2 3 4

Success(position = 3)
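The result at position 3 follows if `intersperse(a, b)` is read as "one `a`, then as many `(b, a)` pairs as possible": the final `,b` pair fails as a whole, so the parser stops after the second `a`. The sketch below expresses it that way using `seq` and `repeat`. This decomposition and the `Parser` class are assumptions; the library may implement it directly.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

# One `a`, followed by zero or more (`b`, `a`) pairs.
def intersperse(a, b)
  seq(a, repeat(seq(b, a)))
end

p intersperse(char("a"), char(",")).apply("a,a,b") # success, position 3
```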

Slide 69

etc.

Slide 70

720
6
29530

Slide 71

digit = alt(*('0'..'9').map { |c| char(c) })

Slide 72

digit = char_in('0'..'9')

Slide 73

nat_number = seq(digit, repeat(digit))

Slide 74

nat_number = repeat1(digit)

Slide 75

nat_number = repeat1(digit).capture

Success(position = 3, data = "123")

Slide 77

nat_number = repeat1(digit).capture.map(&:to_i)

Success(position = 3, data = 123)
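To make `capture` and `map` concrete, here is a sketch in which `Success` carries an extra `data` field: `capture` stores the slice of input that was consumed, and `map` transforms that data. The `Parser` class, the struct layout, and the way `repeat1` and `char_in` are written are assumptions for illustration only.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Replaces the data with the text consumed between start and end position.
  def capture
    Parser.new do |input, pos|
      result = read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, input[pos...result.pos]) : result
    end
  end

  # Transforms the data of a successful result; failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      result = read(input, pos)
      result.is_a?(Success) ? Success.new(result.pos, fn.call(result.data)) : result
    end
  end
end

def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# One or more repetitions: fails only if the first attempt fails.
def repeat1(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    next result if result.is_a?(Failure)
    loop do
      pos = result.pos
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
    end
    Success.new(pos)
  end
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture.map(&:to_i)

p repeat1(digit).capture.apply("123") # success, position 3, data "123"
p nat_number.apply("123")             # success, position 3, data 123
```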

Slide 79

def read(input, pos)
  Success.new(pos + 1)
end

Slide 81

def read(input, pos)
  Success.new(pos + 1, "blahblah")
end

Slide 82

first,last,age
Denis,Defreyne,29

Slide 83

field =
  repeat(char_not(',', "\n")).capture

line =
  field.intersperse(char(','))

file =
  seq(
    line.intersperse(char("\n")),
    end_of_input,
  )

Slide 87

[
  ["first_name", "last_name", "age"],
  ["Denis", "Defreyne", "29"],
]
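The CSV grammar can be run end to end with a sketch like the one below. Several details are assumptions that the slides do not spell out: `intersperse` is taken to collect only the data of the items (not the separators), `seq` to collect the data of its parts into an array, and the final `.map(&:first)` is an addition here that drops the empty result of `end_of_input`. The input is the two-line sample shown earlier.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  def map(&fn)
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end

  # Reads self, then (separator, self) pairs; keeps only the data of self.
  def intersperse(separator)
    Parser.new do |input, pos|
      res = read(input, pos)
      next res if res.is_a?(Failure)
      items = [res.data]
      loop do
        pos = res.pos
        sep = separator.read(input, pos)
        break if sep.is_a?(Failure)
        nxt = read(input, sep.pos)
        break if nxt.is_a?(Failure)
        items << nxt.data
        res = nxt
      end
      Success.new(pos, items)
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Succeeds on any single character that is not in the excluded list.
def char_not(*excluded)
  Parser.new do |input, pos|
    c = input[pos]
    c && !excluded.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      res = parser.read(input, pos)
      break if res.is_a?(Failure) || res.pos == pos
      pos = res.pos
    end
    Success.new(pos)
  end
end

# Collects the data of every part into an array.
def seq(*parsers)
  Parser.new do |input, pos|
    data = []
    failure = nil
    parsers.each do |parser|
      res = parser.read(input, pos)
      if res.is_a?(Failure)
        failure = res
        break
      end
      data << res.data
      pos = res.pos
    end
    failure || Success.new(pos, data)
  end
end

def end_of_input
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

field = repeat(char_not(',', "\n")).capture
line  = field.intersperse(char(','))
file  = seq(line.intersperse(char("\n")), end_of_input).map(&:first) # .map is an addition

p file.apply("first,last,age\nDenis,Defreyne,29").data
```

Note that a trailing newline in the input would produce one extra row containing a single empty field, because `field` is allowed to match zero characters.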

Slide 88

add(1, mul(2, 3))
mul(2, 3)

Slide 89

lparen = char('(')
rparen = char(')')
comma = char(',')

Slide 90

expr = alt(lazy { funcall }, nat_number)

Slide 91

funcall =
  seq(
    identifier,
    lparen,
    arglist,
    rparen,
  )

Slide 92

letter =
  char_in('a'..'z')

identifier =
  repeat1(letter).capture

Slide 93

arglist =
  seq(expr, arglist_tail)

arglist_tail =
  repeat(seq(comma, whitespace, expr))

Slide 96

expr_list =
  expr.intersperse(char("\n"))

program =
  seq(expr_list, eof)
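The grammar is recursive: an `expr` may be a `funcall`, whose arguments are again `expr`s, which is why `lazy` is needed. The sketch below is a recognizer only (it reports positions and builds no tree). The `Parser` class, the definition of `whitespace`, and the `build_program` wrapper are assumptions. In plain Ruby, a block can only close over a local variable that has already been assigned, so `funcall` is declared as `nil` before the `lazy` block and filled in later; the slides do not show how the library itself deals with this.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical base class: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    results = parsers.map { |parser| parser.read(input, pos) }
    results.find { |result| result.is_a?(Success) } || Failure.new(pos)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    loop do
      result = parser.read(input, pos)
      break if result.is_a?(Failure) || result.pos == pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

def repeat1(parser)
  seq(parser, repeat(parser))
end

# Defers evaluating the block until the parser is actually used.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

# Hypothetical wrapper that assembles the grammar from the slides.
def build_program
  funcall = nil # declared first so the lazy block can refer to it
  whitespace = repeat(alt(char(" "), char("\t"))) # assumed definition
  nat_number = repeat1(char_in('0'..'9'))
  identifier = repeat1(char_in('a'..'z'))
  expr = alt(lazy { funcall }, nat_number)
  arglist_tail = repeat(seq(char(','), whitespace, expr))
  arglist = seq(expr, arglist_tail)
  funcall = seq(identifier, char('('), arglist, char(')'))
  expr_list = seq(expr, repeat(seq(char("\n"), expr)))
  seq(expr_list, eof)
end

p build_program.apply("add(1, mul(2, 3))\nmul(2, 3)") # success at end of input
```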

Slide 99

[
  ["add", 1, ["mul", 2, 3]],
  ["mul", 2, 3],
]

Slide 100

And that is how you can write a parser.

Slide 101

github.com/ddfreyne/d-parse

Slide 104

My name is Denis Defreyne. Ready to parse your questions.

Find me at [email protected], or @ddfreyne on Twitter.