Let’s write a parser! [SoundCloud HQ edition]

Denis Defreyne

May 17, 2016

Transcript

  1. Let’s write a parser! Denis Defreyne / SoundCloud, Berlin / May 17th, 2016
  2. Part 1: Language

  3. I am Denis.

  4. But how do you know that I am Denis?

  5. But how do you know that I am Denis? I told you. I wrote it down. You’ve probably seen me before. Etc.
  6. But how do you know that I am Denis? You understand English.
  7. Computers are stupid.

  8. $ git commit --message="Fix bugs"

  9. def greet(name)
       puts "Hello, #{name}"
     end

  10. def greet(name: String): Unit = {
        println(s"Hello, $name!")
      }

  11. Text forms a language, but computers don’t know that.

  12. Part 2: Parsing

  13. Basic idea: parser objects that are small, composable, and purely functional.
  14. def read(input, pos)

  15. def read(input, pos)
        Success.new(pos + 1)
      end

  16. def read(input, pos)
        Failure.new(pos)
      end

  17. char("H"): succeeds if the next character is the given one.
  18. char("H").apply("Hello")

  23. char("H").apply("Hello")

        H e l l o
        0 1 2 3 4

      Result: Success(pos = 1)
  24. char("H").apply("Adiós")

  28. char("H").apply("Adiós")

        A d i ó s
        0 1 2 3 4

      Result: Failure(pos = 0)
  29. if input[pos] == @char
        Success.new(pos + 1)
      else
        Failure.new(pos)
      end
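A minimal plain-Ruby sketch of the read interface and the char parser follows. It is an illustration under assumptions, not ḌPARSE's code: the Success and Failure structs, the Parser base class and its apply helper (which simply calls read at position 0) are invented for this example.

```ruby
# Both result types simply record a position in the input.
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed base class: subclasses implement read(input, pos);
# apply starts reading at the beginning of the input.
class Parser
  def apply(input)
    read(input, 0)
  end
end

# Succeeds (advancing by one) if the character at pos is @char.
class Char < Parser
  def initialize(char)
    @char = char
  end

  def read(input, pos)
    if input[pos] == @char
      Success.new(pos + 1)
    else
      Failure.new(pos)
    end
  end
end

# Convenience constructor matching the char("H") notation of the slides.
def char(c)
  Char.new(c)
end

p char("H").apply("Hello") # => #<struct Success pos=1>
p char("H").apply("Adiós") # => #<struct Failure pos=0>
```

Ruby indexes strings by character, so input[pos] also works on non-ASCII input such as "Adiós"; past the end of the input it returns nil, so char fails there as well.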
  30. seq(a, b): succeeds if both given parsers succeed in sequence.
  31. seq(char("H"), char("e")).apply("Hello")

  36. seq(char("H"), char("e")).apply("Hello")

        H e l l o
        0 1 2 3 4

      Result: Success(pos = 2)
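seq can be sketched by threading the position from one parser to the next and stopping at the first failure. To keep the sketch short, Parser here is an assumed class that wraps a block taking (input, pos) instead of one subclass per parser, and this seq accepts any number of parsers rather than exactly two.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  # Start reading at the beginning of the input.
  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Each parser starts where the previous one stopped; the first failure
# becomes the result of the whole sequence.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

p seq(char("H"), char("e")).apply("Hello") # => #<struct Success pos=2>
p seq(char("H"), char("a")).apply("Hello") # => #<struct Failure pos=1>
```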
  37. seq(
        char("H"),
        char("e"),
        char("l"),
        char("l"),
        char("o"),
      )

  38. string(s): succeeds if all characters in the given string can be read in sequence.
  42. string("Hello").apply("Hello")

        H e l l o
        0 1 2 3 4

      Result: Success(pos = 5)
  43. eof(): succeeds at the end of the input; fails otherwise.

  48. seq(string("Hello"), eof).apply("Hello")

        H e l l o
        0 1 2 3 4

      Result: Success(pos = 5)
  53. seq(string("Hello"), eof).apply("Hello!")

        H e l l o !
        0 1 2 3 4 5

      Result: Failure(pos = 5)
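Under the same assumptions (a hypothetical block-wrapping Parser class and plain Success/Failure structs, neither taken from the talk), string can be built out of seq and char, while eof only checks whether the position has reached the end of the input:

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Runs parsers one after another; the first failure ends the sequence.
def seq(*parsers)
  Parser.new do |input, pos|
    result = Success.new(pos)
    parsers.each do |parser|
      result = parser.read(input, result.pos)
      break if result.is_a?(Failure)
    end
    result
  end
end

# string(s) is a sequence of char parsers, one per character of s.
def string(s)
  seq(*s.chars.map { |c| char(c) })
end

# Succeeds, without advancing, only when pos is at the end of the input.
def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

p seq(string("Hello"), eof).apply("Hello")  # => #<struct Success pos=5>
p seq(string("Hello"), eof).apply("Hello!") # => #<struct Failure pos=5>
```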
  54. alt(a, b): succeeds if either of the given parsers succeeds.
  58. alt(char("H"), char("A")).apply("Adiós")

        A d i ó s
        0 1 2 3 4

      Result: Success(pos = 1)
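A sketch of alt under the same assumed block-wrapping Parser class: every alternative starts at the same position, and the first success wins. Which failure to report when all alternatives fail is a design choice; this sketch simply returns the last one.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Tries each parser at the same position and returns the first success;
# once a success is found, the remaining parsers are not run.
# If every alternative fails, the last failure is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) do |acc, parser|
      acc.is_a?(Success) ? acc : parser.read(input, pos)
    end
  end
end

p alt(char("H"), char("A")).apply("Adiós")   # => #<struct Success pos=1>
p alt(char("H"), char("A")).apply("Bonjour") # => #<struct Failure pos=0>
```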
  59. whitespace_char = alt(
        char(" "),
        char("\t"),
        char("\r"),
        char("\n"),
      )

  60. opt(p): always succeeds, but only advances if p succeeds.

  61. repeat(p): always succeeds, and applies p as often as possible.
  62. repeat(whitespace_char)
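opt and repeat, sketched with the same hypothetical block-wrapping Parser class. The progress check in repeat is an addition of this sketch: without it, repeating a parser that can succeed without consuming input (such as an opt parser) would loop forever.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# First successful alternative wins; if all fail, the last failure is returned.
def alt(*parsers)
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) { |acc, parser| acc.is_a?(Success) ? acc : parser.read(input, pos) }
  end
end

# Always succeeds; advances only if parser succeeds.
def opt(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    res.is_a?(Success) ? res : Success.new(pos)
  end
end

# Always succeeds; applies parser as long as it succeeds and makes progress.
def repeat(parser)
  Parser.new do |input, pos|
    while (res = parser.read(input, pos)).is_a?(Success) && res.pos > pos
      pos = res.pos
    end
    Success.new(pos)
  end
end

whitespace_char = alt(char(" "), char("\t"), char("\r"), char("\n"))
p repeat(whitespace_char).apply(" \t\nx") # => #<struct Success pos=3>
p opt(char("H")).apply("Adiós")          # => #<struct Success pos=0>
```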

  63. intersperse(a, b): alternates between a and b, always ending with a.
  64. intersperse(char("a"), char(",")).apply("a,a,b")

  69. intersperse(char("a"), char(",")).apply("a,a,b")

        a , a , b
        0 1 2 3 4

      Result: Success(pos = 3)
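A possible implementation of intersperse, again using an assumed block-wrapping Parser class rather than ḌPARSE's code. It stops before a trailing b that is not followed by another a, which is why "a,a,b" yields Success(pos = 3) rather than a later position.

```ruby
Success = Struct.new(:pos)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

# Reads a, then any number of (b, a) pairs. A b that is not followed by
# another a is not consumed, so a successful result always ends after an a.
def intersperse(a, b)
  Parser.new do |input, pos|
    res = a.read(input, pos)
    next res if res.is_a?(Failure)
    loop do
      sep = b.read(input, res.pos)
      break if sep.is_a?(Failure)
      item = a.read(input, sep.pos)
      # Also stop if a whole (b, a) round consumed nothing, to avoid looping forever.
      break if item.is_a?(Failure) || item.pos == res.pos
      res = item
    end
    res
  end
end

p intersperse(char("a"), char(",")).apply("a,a,b") # => #<struct Success pos=3>
```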
  70. etc.

  71. Part 3: Examples

  72. 720, 6, 29530

  73. digit = alt(
        *('0'..'9').map { |c| char(c) }
      )

  74. digit = char_in('0'..'9')

  75. digit = char_in('0'..'9')
      nat_number = seq(digit, repeat(digit))

  76. digit = char_in('0'..'9')
      nat_number = repeat1(digit)

  78. digit = char_in('0'..'9')
      nat_number = repeat1(digit).capture

      Result: Success(pos = 3, data = "720")
  79. def read(input, pos)

  80. def read(input, pos)
        Success.new(pos + 1)
      end

  81. def read(input, pos)
        Success.new(pos + 1, "blahblah")
      end
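Once Success carries data, capture can be sketched as a wrapper that, on success, replaces the data with the slice of input the parser consumed. The block-wrapping Parser class, this implementation of capture and this char_in are assumptions for illustration; repeat1 is written as a loop here but behaves like seq(p, repeat(p)).

```ruby
# Success now carries a data field as well as a position.
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  # On success, replace the data with the text that was consumed.
  def capture
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end
end

# Succeeds on one character that falls within range, e.g. '0'..'9'.
def char_in(range)
  Parser.new do |input, pos|
    pos < input.length && range.cover?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

# One or more repetitions (same behaviour as seq(p, repeat(p))).
def repeat1(parser)
  Parser.new do |input, pos|
    res = parser.read(input, pos)
    next res if res.is_a?(Failure)
    while (more = parser.read(input, res.pos)).is_a?(Success) && more.pos > res.pos
      res = more
    end
    res
  end
end

digit = char_in("0".."9")
nat_number = repeat1(digit).capture

p nat_number.apply("720") # => #<struct Success pos=3, data="720">
```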

  82. dec_number = seq(
        nat_number,
        char('.'),
        nat_number,
      )

  83. Horan,Niall,93
      Payne,Liam,93
      Tomlinson,Louis,91
      Styles,Harry,94
      Malik,Zayn,93

  84. field = repeat(char_not_in(',', "\n"))
      line = intersperse(field, char(','))
      file = seq(
        intersperse(line, char("\n")),
        eof(),
      )
  88. Horan,Niall,93
      Payne,Liam,93
      Tomlinson,Louis,91
      Styles,Harry,94
      Malik,Zayn,93

  89. [
        ["Horan", "Niall", 93],
        ["Payne", "Liam", 93],
        ["Tomlinson", "Louis", 91],
        ["Styles", "Harry", 94],
        ["Malik", "Zayn", 93],
      ]
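The talk does not show how the CSV grammar produces these nested arrays, so the following is only one hypothetical way to do it: seq and repeat collect their children's data into arrays, capture turns each field into a string, and map (an assumed wrapper that transforms data with a block) converts the last field with Integer. intersperse is composed here as seq(a, repeat(seq(b, a))). The grammar assumes exactly three fields per line with a numeric last field, and no trailing newline (an empty final line would make Integer raise).

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)

# Assumed minimal Parser: wraps a block that maps (input, pos) to a result.
class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  def capture # on success, data becomes the consumed text
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  def map(&fn) # on success, data is transformed by fn
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end

def char_not_in(*chars) # any single character except the given ones
  Parser.new do |input, pos|
    pos < input.length && !chars.include?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos)
  end
end

def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end

def seq(*parsers) # data is the array of each parser's data
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, parser|
      next acc if acc.is_a?(Failure)
      res = parser.read(input, acc.pos)
      res.is_a?(Success) ? Success.new(res.pos, acc.data + [res.data]) : res
    end
  end
end

def repeat(parser) # zero or more; data is the array of each match's data
  Parser.new do |input, pos|
    items = []
    while (res = parser.read(input, pos)).is_a?(Success) && res.pos > pos
      items << res.data
      pos = res.pos
    end
    Success.new(pos, items)
  end
end

def intersperse(a, b) # data is the array of each a's data
  seq(a, repeat(seq(b, a))).map { |(first, rest)| [first] + rest.map { |(_sep, item)| item } }
end

module CSVGrammar
  FIELD = repeat(char_not_in(",", "\n")).capture
  LINE = intersperse(FIELD, char(",")).map { |(last, first, age)| [last, first, Integer(age)] }
  FILE = seq(intersperse(LINE, char("\n")), eof).map { |data| data[0] }
end

input = "Horan,Niall,93\nPayne,Liam,93\nTomlinson,Louis,91\nStyles,Harry,94\nMalik,Zayn,93"
p CSVGrammar::FILE.apply(input).data
# => [["Horan", "Niall", 93], ["Payne", "Liam", 93], ["Tomlinson", "Louis", 91],
#     ["Styles", "Harry", 94], ["Malik", "Zayn", 93]]
```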
  90. add(1, mul(2, 3))
      sub(5, 4)

  91. lparen = char('(')
      rparen = char(')')
      comma = char(',')

  92. expr = alt(lazy { funcall }, nat_number)

  93. funcall = seq(
        identifier,
        lparen,
        arg_list,
        rparen,
      )

  94. letter = char_in('a'..'z')
      identifier = repeat1(letter)

  95. arg_list = intersperse(
        expr,
        seq(comma, whitespace),
      )


  96. arg_list = opt(
        intersperse(
          expr,
          seq(comma, whitespace),
        )
      )

  99. expr_list = intersperse(expr, char("\n"))
      program = seq(expr_list, eof)

  100. add(1, mul(2, 3))
       sub(5, 4)

  101. Success(pos = 27)

  102. Where’s the data!!!

  104. funcall = seq(
         identifier.capture,
         lparen,
         arg_list,
         rparen,
       )


  106. funcall = seq(
         identifier.capture,
         lparen,
         arg_list,
         rparen,
       ).map do |data|
         FunCall.new(data[0], data[2])
       end
  107. add(1, mul(2, 3))
       sub(5, 4)

  108. [
         FunCall.new("add", [
           1,
           FunCall.new("mul", [2, 3]),
         ]),
         FunCall.new("sub", [5, 4]),
       ]
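A self-contained sketch of the whole function-call grammar, including the data mapping, might look like the following. As before, Parser, Success, Failure and the combinator implementations are assumptions rather than ḌPARSE's API, and FunCall is a hypothetical struct. The grammar lives in a module so that lazy { FUNCALL } can refer to a constant defined further down; Ruby resolves the constant only when the block runs. Because opt yields no data when a call has no arguments, the map falls back to an empty array.

```ruby
Success = Struct.new(:pos, :data)
Failure = Struct.new(:pos)
FunCall = Struct.new(:name, :args) # hypothetical AST node

class Parser
  def initialize(&block)
    @block = block
  end

  def read(input, pos)
    @block.call(input, pos)
  end

  def apply(input)
    read(input, 0)
  end

  def capture # on success, data becomes the consumed text
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, input[pos...res.pos]) : res
    end
  end

  def map(&fn) # on success, data is transformed by fn
    Parser.new do |input, pos|
      res = read(input, pos)
      res.is_a?(Success) ? Success.new(res.pos, fn.call(res.data)) : res
    end
  end
end

def char(c)
  Parser.new { |input, pos| input[pos] == c ? Success.new(pos + 1) : Failure.new(pos) }
end
def char_in(range)
  Parser.new { |input, pos| pos < input.length && range.cover?(input[pos]) ? Success.new(pos + 1) : Failure.new(pos) }
end
def eof
  Parser.new { |input, pos| pos == input.length ? Success.new(pos) : Failure.new(pos) }
end
def lazy(&build) # builds the parser only when reading, allowing recursion
  Parser.new { |input, pos| build.call.read(input, pos) }
end

def seq(*parsers) # data is the array of each parser's data
  Parser.new do |input, pos|
    parsers.reduce(Success.new(pos, [])) do |acc, parser|
      next acc if acc.is_a?(Failure)
      res = parser.read(input, acc.pos)
      res.is_a?(Success) ? Success.new(res.pos, acc.data + [res.data]) : res
    end
  end
end

def alt(*parsers) # first success wins; otherwise the last failure
  Parser.new do |input, pos|
    parsers.reduce(Failure.new(pos)) { |acc, parser| acc.is_a?(Success) ? acc : parser.read(input, pos) }
  end
end

def repeat(parser) # zero or more; data is the array of each match's data
  Parser.new do |input, pos|
    items = []
    while (res = parser.read(input, pos)).is_a?(Success) && res.pos > pos
      items << res.data
      pos = res.pos
    end
    Success.new(pos, items)
  end
end

def opt(parser)
  alt(parser, Parser.new { |_input, pos| Success.new(pos) })
end
def repeat1(parser)
  seq(parser, repeat(parser))
end
def intersperse(a, b) # data is the array of each a's data
  seq(a, repeat(seq(b, a))).map { |(first, rest)| [first] + rest.map { |(_sep, item)| item } }
end

module FunCallGrammar
  LPAREN = char("(")
  RPAREN = char(")")
  COMMA = char(",")
  WHITESPACE = repeat(alt(char(" "), char("\t"), char("\r"), char("\n")))
  NAT_NUMBER = repeat1(char_in("0".."9")).capture.map { |s| Integer(s) }
  IDENTIFIER = repeat1(char_in("a".."z"))
  EXPR = alt(lazy { FUNCALL }, NAT_NUMBER)
  ARG_LIST = opt(intersperse(EXPR, seq(COMMA, WHITESPACE)))
  FUNCALL = seq(IDENTIFIER.capture, LPAREN, ARG_LIST, RPAREN).map { |data| FunCall.new(data[0], data[2] || []) }
  PROGRAM = seq(intersperse(EXPR, char("\n")), eof).map { |data| data[0] }
end

p FunCallGrammar::PROGRAM.apply("add(1, mul(2, 3))\nsub(5, 4)").data
```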
  109. And that is how you can write a parser.

  110. And that is how you can write a parser using parser combinators.
  112. ḌPARSE: a good parser library for Ruby

  113. github.com/ddfreyne/d-parse

  121. require 'd-parse'

       module JSONGrammar
         extend DParse::DSL

         DIGIT = char_in('0'..'9')
         NUMBER = repeat1(DIGIT)
       end

       res = JSONGrammar::NUMBER.apply('8700')
  125. case res
       when DParse::Success
         puts(res.data.inspect)
       when DParse::Failure
         $stderr.puts res.pretty_message
         exit(1)
       end
  126. expected identifier at line 1, column 36

       def reticulate(splines, threshold, ) {
                                          ↑
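Messages like the one above report a line and column rather than a raw offset. The helper below is a hypothetical sketch (it is not ḌPARSE's pretty_message, whose implementation the talk does not show) that converts a 0-based failure position into a 1-based line and column and draws a caret under the failing character.

```ruby
# Hypothetical helper: builds an error message from a failure position
# (a 0-based character offset), followed by the offending source line and
# a caret under the character where parsing failed.
def describe_failure(input, pos, expected)
  before = input[0, pos]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  line_start = last_newline ? last_newline + 1 : 0
  column = pos - line_start + 1
  source_line = input[line_start..-1].lines.first.to_s.chomp
  caret = " " * (column - 1) + "↑"
  "expected #{expected} at line #{line}, column #{column}\n#{source_line}\n#{caret}"
end

source = "def reticulate(splines, threshold, ) {"
puts describe_failure(source, 35, "identifier")
```

For this input, position 35 is the ")" that follows the trailing comma, so the first line printed is "expected identifier at line 1, column 36", followed by the source line and a caret under the ")".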
  128. github.com/ddfreyne/d-parse: pre-alpha! Be an early adopter!


  129. My name is Denis. Ready to parse your questions. Find me at denis@soundcloud.com, or @denis on Slack.