Let’s write a parser! [RUG::B edition]

Denis Defreyne

May 12, 2016

Transcript

  1. Let’s write a parser! Denis Defreyne / RUG::B / May 12th, 2016

  2. 1. Language

  3. I am Denis.

  4. But how do you know that I am Denis?

  5. But how do you know that I am Denis? I told you. I wrote it down. Tobi introduced me. You might have seen me before. Etc.

  6. But how do you know that I am Denis? You understand English.

  7. Computers are stupid.

  8. $ git commit --message="Fix bugs"

  9. def greet(name)
       puts "Hello, #{name}"
     end

  10. Text forms a language, but computers don’t know that.

  11. 2. Parsing

  12. Basic idea: parser objects that are small, composable, and purely functional.

  13. def read(input, pos)

  14. def read(input, pos)
        Success.new(pos + 1)
      end

  15. def read(input, pos)
        Failure.new(pos)
      end
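
The `read(input, pos)` contract from the slides can be sketched as a plain Ruby object. This is a minimal illustration, not d-parse's actual code: the struct layout of `Success` and `Failure`, the `AnyChar` class, and the `apply` helper that starts reading at position 0 are all assumptions made for this example.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical parser that consumes exactly one character, whatever it is.
class AnyChar
  # Pure function of (input, pos): returns a new result, mutates nothing.
  def read(input, pos)
    if pos < input.size
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  # Convenience entry point: start reading at position 0.
  def apply(input)
    read(input, 0)
  end
end

p AnyChar.new.apply("Hello").pos # prints 1
p AnyChar.new.apply("").class    # prints Failure
```

Because `read` only returns a new position, the same parser object can be reused at any offset in any input.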

  16. char("H")
      Succeeds if the next character is the given one.

  17. char("H").apply("Hello")

      H e l l o
      0 1 2 3 4

      Success(position = 1)
  23. char("H").apply("Adiós")

      A d i ó s
      0 1 2 3 4

      Failure(position = 0)
  28. if input[pos] == @char
        Success.new(pos + 1)
      else
        Failure.new(pos)
      end
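
The `if`/`else` on this slide is the body of the `char` parser's `read` method. A minimal sketch of the surrounding class follows; the class name `CharParser`, the `apply` helper, and the struct-based result types are assumptions for illustration, not d-parse's real definitions.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical class wrapping the slide's if/else.
class CharParser
  def initialize(char)
    @char = char
  end

  # Succeeds, advancing by one, only if the character at pos matches.
  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end

  def apply(input)
    read(input, 0)
  end
end

# Factory helper so call sites read like the slides: char("H")
def char(c)
  CharParser.new(c)
end

p char("H").apply("Hello").pos   # prints 1
p char("H").apply("Adiós").class # prints Failure
```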
  29. seq(a, b)
      Succeeds if both given parsers succeed in sequence.

  30. seq(char("H"), char("e")).apply("Hello")

      H e l l o
      0 1 2 3 4

      Success(position = 2)

  36. seq(
        char("H"),
        char("e"),
        char("l"),
        char("l"),
        char("o"),
      )
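
One way `seq` could be written is to thread the position through each parser in turn and stop at the first failure. This is a sketch under assumptions: the block-based `Parser` wrapper and the struct result types are hypothetical conveniences, and reporting the failure at the position where the failing parser was tried follows the `Failure(position = 5)` example later in the talk.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Runs each parser where the previous one stopped; the first failure wins.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

hello = seq(char("H"), char("e"), char("l"), char("l"), char("o"))
p hello.apply("Hello").pos # prints 5
p hello.apply("Help!").pos # prints 3 (the failure position)
```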

  37. string(s)
      Succeeds if all characters in the given string can be read in sequence.

  38. string("Hello").apply("Hello")

      H e l l o
      0 1 2 3 4

      Success(position = 5)
  42. eof()
      Succeeds at the end of input; fails otherwise.

  43. seq(string("Hello"), eof).apply("Hello")

      H e l l o
      0 1 2 3 4

      Success(position = 5)

  48. seq(string("Hello"), eof).apply("Hello!")

      H e l l o !
      0 1 2 3 4 5

      Failure(position = 5)
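
Both `string` and `eof` fall out of the pieces already introduced: `string` is a `seq` of `char` parsers, and `eof` only compares the position against the input length. The sketch below assumes a hypothetical block-based `Parser` wrapper and struct result types; it is not taken from d-parse.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# A string is just its characters read in sequence.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Consumes nothing; succeeds only when pos is at the end of the input.
def eof
  Parser.new { |input, pos| pos == input.size ? Success.new(pos) : Failure.new(pos) }
end

p seq(string("Hello"), eof).apply("Hello").pos    # prints 5
p seq(string("Hello"), eof).apply("Hello!").class # prints Failure
```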
  53. alt(a, b)
      Succeeds if either of the given parsers succeeds.

  54. alt(char("H"), char("A")).apply("Adiós")

      A d i ó s
      0 1 2 3 4

      Success(position = 1)

  58. whitespace_char = alt(char(" "), char("\t"))
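
A possible `alt` tries each alternative at the same starting position and keeps the first success. The sketch assumes ordered choice (earlier alternatives win), which the slides do not state explicitly, and uses a hypothetical block-based `Parser` wrapper with struct result types.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Tries every parser at the same pos; once one succeeds, the rest are skipped.
def alt(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) do |result, parser|
      result.is_a?(Success) ? result : parser.read(input, pos)
    end
  end
end

whitespace_char = alt(char(" "), char("\t"))
p whitespace_char.apply("\tindented").pos          # prints 1
p alt(char("H"), char("A")).apply("Adiós").pos     # prints 1
```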

  59. optional(p)
      Always succeeds, but only advances if p succeeds.

  60. repeat(p)
      Always succeeds, and applies p as often as possible.

  61. repeat(whitespace_char)
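
Neither `optional` nor `repeat` can fail, so both can be sketched as wrappers that turn an inner failure into a success at the last good position. The code assumes a hypothetical block-based `Parser` wrapper and struct result types; the no-progress guard in `repeat` is an added safety measure, not something shown in the talk. For brevity the usage lines repeat `char(" ")` rather than the slides' `whitespace_char`.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Zero or one: a failure of the inner parser becomes a success that consumes nothing.
def optional(parser)
  Parser.new do |input, pos|
    result = parser.read(input, pos)
    result.is_a?(Success) ? result : Success.new(pos)
  end
end

# Zero or more: keep applying the parser while it succeeds and makes progress.
def repeat(parser)
  Parser.new do |input, pos|
    while (result = parser.read(input, pos)).is_a?(Success) && result.pos > pos
      pos = result.pos
    end
    Success.new(pos)
  end
end

p repeat(char(" ")).apply("   x").pos  # prints 3
p repeat(char(" ")).apply("x").pos     # prints 0
p optional(char("-")).apply("-1").pos  # prints 1
```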

  62. intersperse(a, b)
      Alternates between a and b, always ending with a.

  63. intersperse(char("a"), char(",")).apply("a,a,b")

      a , a , b
      0 1 2 3 4

      Success(position = 3)
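
The result `Success(position = 3)` implies that a separator is only consumed when another `a` follows it; otherwise the parser backs up to just after the last `a`. A sketch of that behaviour follows. It assumes at least one `a` is required, and uses a hypothetical block-based `Parser` wrapper with struct result types rather than d-parse's own classes.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# a (b a)*: pos only moves forward once a complete "b a" pair has been read.
def intersperse(a, b)
  Parser.new do |input, pos|
    result = a.read(input, pos)
    next result if result.is_a?(Failure) # assumption: at least one a is required
    loop do
      pos = result.pos
      sep = b.read(input, pos)
      break if sep.is_a?(Failure)
      result = a.read(input, sep.pos)
      break if result.is_a?(Failure) # dangling separator: stay at the last a
    end
    Success.new(pos)
  end
end

p intersperse(char("a"), char(",")).apply("a,a,b").pos # prints 3
```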
  69. etc.

  70. 720 6 29530

  71. digit = alt(*('0'..'9').map { |c| char(c) })

  72. digit = char_in('0'..'9')

  73. nat_number = seq(digit, repeat(digit))

  74. nat_number = repeat1(digit)

  75. nat_number = repeat1(digit).capture

      Success(position = 3, data = "123")

  77. nat_number = repeat1(digit).capture.map(&:to_i)

      Success(position = 3, data = 123)
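
To carry data, a success needs a second field, and `capture` and `map` become parser-to-parser transformations. The following is a minimal sketch, assuming a hypothetical block-based `Parser` wrapper and a `Success` struct with a `data` member; the method names follow the slides, but the implementation is illustrative only.

```ruby
# Success now carries data; the struct layout is an assumption.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Replaces the data with the substring that was consumed.
  def capture
    Parser.new do |input, pos|
      r = read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, input[pos...r.pos]) : r
    end
  end

  # Transforms the data of a successful result; failures pass through.
  def map(&fn)
    Parser.new do |input, pos|
      r = read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, fn.call(r.data)) : r
    end
  end
end

# Succeeds on one character contained in the given range.
def char_in(range)
  Parser.new do |input, pos|
    c = input[pos]
    c && range.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# One or more; assumes the inner parser consumes input whenever it succeeds.
def repeat1(parser)
  Parser.new do |input, pos|
    r = parser.read(input, pos)
    next r if r.is_a?(Failure)
    while r.is_a?(Success)
      pos = r.pos
      r = parser.read(input, pos)
    end
    Success.new(pos)
  end
end

digit = char_in('0'..'9')
nat_number = repeat1(digit).capture.map(&:to_i)
p nat_number.apply("123").data # prints 123
```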
  79. def read(input, pos)

  80. def read(input, pos)
        Success.new(pos + 1)
      end

  81. def read(input, pos)
        Success.new(pos + 1, "blahblah")
      end

  82. first,last,age
      Denis,Defreyne,29

  84. field = repeat(char_not(',', "\n")).capture
      line = field.intersperse(char(','))
      file = seq(
        line.intersperse(char("\n")),
        end_of_input,
      )
  87. [
        ["first", "last", "age"],
        ["Denis", "Defreyne", "29"],
      ]
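
The CSV grammar can be run end to end with a small amount of supporting code. Everything below other than the three grammar lines is an assumption made for illustration: the block-based `Parser` wrapper, the way `intersperse` collects only the items' data, the way `seq` collects one data entry per parser, and the final `.first` that drops the empty entry produced by `end_of_input`.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Replaces the data with the substring that was consumed.
  def capture
    Parser.new do |input, pos|
      r = read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, input[pos...r.pos]) : r
    end
  end

  # One or more of self separated by sep; keeps only self's data.
  def intersperse(sep)
    Parser.new do |input, pos|
      r = read(input, pos)
      next r if r.is_a?(Failure)
      items = []
      loop do
        items << r.data
        pos = r.pos
        s = sep.read(input, pos)
        break if s.is_a?(Failure)
        r = read(input, s.pos)
        break if r.is_a?(Failure)
      end
      Success.new(pos, items)
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Matches any single character except the given ones.
def char_not(*chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && !chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    while (r = parser.read(input, pos)).is_a?(Success) && r.pos > pos
      pos = r.pos
    end
    Success.new(pos)
  end
end

# Collects one data entry per parser; the first failure wins.
def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, parser|
      break acc if acc.is_a?(Failure)
      r = parser.read(input, acc.pos)
      r.is_a?(Failure) ? r : Success.new(r.pos, acc.data + [r.data])
    end
  end
end

def end_of_input
  Parser.new { |input, pos| pos == input.size ? Success.new(pos) : Failure.new(pos) }
end

field = repeat(char_not(',', "\n")).capture
line = field.intersperse(char(','))
file = seq(line.intersperse(char("\n")), end_of_input)

result = file.apply("first,last,age\nDenis,Defreyne,29")
rows = result.data.first # seq collects [rows, nil]; keep only the rows
p rows # prints [["first", "last", "age"], ["Denis", "Defreyne", "29"]]
```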

  88. add(1, mul(2, 3))
      mul(2, 3)

  89. lparen = char('(')
      rparen = char(')')
      comma = char(',')

  90. expr = alt(lazy { funcall }, nat_number)

  91. funcall = seq(
        identifier,
        lparen,
        arglist,
        rparen,
      )

  92. letter = char_in('a'..'z')
      identifier = repeat1(letter).capture

  94. arglist = seq(expr, arglist_tail)
      arglist_tail = repeat(seq(comma, whitespace, expr))

  97. expr_list = expr.intersperse(char("\n"))
      program = seq(expr_list, eof)

  99. [
        ["add", 1, ["mul", 2, 3]],
        ["mul", 2, 3],
      ]
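
The recursive part of the grammar hinges on `lazy`: `expr` refers to `funcall` before `funcall` exists, so the lookup has to be deferred until parse time. The sketch below parses a single expression into the nested-array form shown on the slide. It rests on several assumptions that are not in the talk: the block-based `Parser` wrapper, the `.map` calls that shape the data, `whitespace` defined as `repeat(char(' '))`, and the `funcall = nil` pre-declaration that Ruby needs when the rules are plain local variables.

```ruby
# Result types named after the slides; the struct layout is an assumption.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Hypothetical wrapper: a parser is a block taking (input, pos).
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # Replaces the data with the substring that was consumed.
  def capture
    Parser.new do |input, pos|
      r = read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, input[pos...r.pos]) : r
    end
  end

  # Transforms the data of a successful result.
  def map(&fn)
    Parser.new do |input, pos|
      r = read(input, pos)
      r.is_a?(Success) ? Success.new(r.pos, fn.call(r.data)) : r
    end
  end
end

def char_in(chars)
  Parser.new do |input, pos|
    c = input[pos]
    c && chars.include?(c) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def char(c)
  char_in([c])
end

# Resolves the wrapped parser at parse time, which permits recursive rules.
def lazy(&block)
  Parser.new { |input, pos| block.call.read(input, pos) }
end

def seq(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, parser|
      break acc if acc.is_a?(Failure)
      r = parser.read(input, acc.pos)
      r.is_a?(Failure) ? r : Success.new(r.pos, acc.data + [r.data])
    end
  end
end

def alt(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) do |acc, parser|
      acc.is_a?(Success) ? acc : parser.read(input, pos)
    end
  end
end

def repeat(parser)
  Parser.new do |input, pos|
    data = []
    while (r = parser.read(input, pos)).is_a?(Success) && r.pos > pos
      data << r.data
      pos = r.pos
    end
    Success.new(pos, data)
  end
end

def repeat1(parser)
  seq(parser, repeat(parser)).map { |first, rest| [first, *rest] }
end

nat_number = repeat1(char_in('0'..'9')).capture.map(&:to_i)
identifier = repeat1(char_in('a'..'z')).capture
lparen, rparen, comma = char('('), char(')'), char(',')
whitespace = repeat(char(' '))

funcall = nil # pre-declared so the lazy block can see the local variable
expr = alt(lazy { funcall }, nat_number)
arglist_tail = repeat(seq(comma, whitespace, expr).map(&:last))
arglist = seq(expr, arglist_tail).map { |first, rest| [first, *rest] }
funcall = seq(identifier, lparen, arglist, rparen)
  .map { |name, _lparen, args, _rparen| [name, *args] }

p expr.apply("add(1, mul(2, 3))").data # prints ["add", 1, ["mul", 2, 3]]
```

Without `lazy`, `alt(funcall, nat_number)` would capture `nil`, because `funcall` has not been assigned yet when `expr` is built.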
  100. And that is how you can write a parser.

  101. github.com/ddfreyne/d-parse

  104. My name is Denis Defreyne. Ready to parse your questions. Find me at denis@stoneship.org, or @ddfreyne on Twitter.