Parsing Ruby (RubyConf)

Kevin Newton
November 09, 2021
Since Ruby's inception, there have been many different projects that parse Ruby code. This includes everything from development tools to Ruby implementations themselves. This talk dives into the technical details and tradeoffs of how each of these tools parses and subsequently understands your applications. After, we'll discuss how you can do the same with your own projects using the Ripper standard library. You'll see just how far we can take this library toward building useful development tools.

Transcript

  1. Parsing Ruby Kevin Newton

  2. Kevin Newton

  3. Kevin Newton

  4. Kevin Newton

  5. None
  6. • Build a grammar for a simple language

  7. • Build a grammar for a simple language • Build

    a parser for a simple language
  8. • Build a grammar for a simple language • Build

    a parser for a simple language • Look at the history of the Ruby parser
  9. • Build a grammar for a simple language • Build

    a parser for a simple language • Look at the history of the Ruby parser • Look at how Ripper works
  10. Building a grammar

  11. twitter.com/kddnewton Parsing Ruby

  12. twitter.com/kddnewton Parsing Ruby program: | number number: | NUMBER

  13. twitter.com/kddnewton Parsing Ruby program: | number number: | NUMBER 1

  14. twitter.com/kddnewton Parsing Ruby program: | number number: | NUMBER 2

  15. twitter.com/kddnewton Parsing Ruby program: | number number: | NUMBER 7

  16. twitter.com/kddnewton Parsing Ruby program: | number number: | NUMBER

  17. twitter.com/kddnewton Parsing Ruby program: | expression expression: | number "+"

    number | number number: | NUMBER
  18. twitter.com/kddnewton Parsing Ruby 1 program: | expression expression: | number

    "+" number | number number: | NUMBER
  19. twitter.com/kddnewton Parsing Ruby 1 + 2 program: | expression expression:

    | number "+" number | number number: | NUMBER
  20. twitter.com/kddnewton Parsing Ruby program: | expression expression: | number "+"

    number | number number: | NUMBER
  21. twitter.com/kddnewton Parsing Ruby program: | expression expression: | number "+"

    number | number "-" number | number number: | NUMBER
  22. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    number | expression "-" number | number number: | NUMBER
  23. twitter.com/kddnewton Parsing Ruby 1 program: | expression expression: | expression

    "+" number | expression "-" number | number number: | NUMBER
  24. twitter.com/kddnewton Parsing Ruby 1 + 2 program: | expression expression:

    | expression "+" number | expression "-" number | number number: | NUMBER
  25. twitter.com/kddnewton Parsing Ruby 1 + 2 + 3 program: |

    expression expression: | expression "+" number | expression "-" number | number number: | NUMBER
  26. twitter.com/kddnewton Parsing Ruby 1 + 2 + 3 - 4

    + 5 - 6 program: | expression expression: | expression "+" number | expression "-" number | number number: | NUMBER
  27. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    number | expression "-" number | number number: | NUMBER
  28. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    number | expression "-" number | number term: | term "*" number | term "/" number | number number: | NUMBER
  29. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" number | term "/" number | number number: | NUMBER
  30. twitter.com/kddnewton Parsing Ruby 1 program: | expression expression: | expression

    "+" term | expression "-" term | term term: | term "*" number | term "/" number | number number: | NUMBER
  31. twitter.com/kddnewton Parsing Ruby 1 * 2 program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" number | term "/" number | number number: | NUMBER
  32. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" number | term "/" number | number number: | NUMBER
  33. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" number | term "/" number | number number: | NUMBER | "(" expression ")"
  34. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  35. Building a parser

  36. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

  38. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end
  46. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER
  47. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+”
  48. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(”
  49. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER
  50. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER “-”
  51. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER “-” NUMBER
  52. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER “-” NUMBER “)”
  53. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER “-” NUMBER “)” “*”
  54. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end NUMBER “+” “(” NUMBER “-” NUMBER “)” “*” NUMBER
  55. twitter.com/kddnewton Parsing Ruby 1 + (4 - 2) * 3

    NUMBER “+” “(” NUMBER “-” NUMBER “)” “*” NUMBER
  56. twitter.com/kddnewton Parsing Ruby NUMBER “+” “(” NUMBER “-” NUMBER “)”

    “*” NUMBER
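The tokenizing loop from the slides above can be tried on its own. This is a minimal sketch rather than the speaker's exact code: the method name `tokenize` is an assumption, tokens are collected into an array instead of being yielded, each pattern is anchored with `\A` so matching always starts at the front of the remaining input, and the error message uses `input[0, 10]`, which works in plain Ruby.

```ruby
# Hypothetical helper (not from the talk): split an arithmetic expression
# into [type, value] pairs, mirroring the loop shown on the slides.
def tokenize(input)
  tokens = []

  until input.empty?
    case input
    when /\A\s+/
      # Whitespace carries no meaning in this language, so it is skipped.
    when /\A\d+/
      tokens << [:NUMBER, $&.to_i]
    when /\A[-+*\/()]/
      # Operators and parentheses use their own text as the token type.
      tokens << [$&, $&]
    else
      raise ArgumentError, "Failed to parse: #{input[0, 10]}"
    end

    # $' holds everything after the last successful match.
    input = $'
  end

  tokens
end

p tokenize("1 + (4 - 2) * 3").map(&:first)
```

The printed token types should line up with the stream the slides build up one token at a time.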
  57. Accepting the input

  58. twitter.com/kddnewton Parsing Ruby NUMBER “+” “(” NUMBER “-” NUMBER “)”

    “*” NUMBER
  59. twitter.com/kddnewton Parsing Ruby NUMBER “+” “(” NUMBER “-” NUMBER “)”

    “*” NUMBER program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  61. twitter.com/kddnewton Parsing Ruby “+” “(” NUMBER “-” NUMBER “)” “*”

    NUMBER program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" NUMBER
  63. twitter.com/kddnewton Parsing Ruby “+” “(” NUMBER “-” NUMBER “)” “*”

    NUMBER program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" factor
  64. twitter.com/kddnewton Parsing Ruby “(” NUMBER “-” NUMBER “)” “*” NUMBER

    program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" “+” factor
  65. twitter.com/kddnewton Parsing Ruby NUMBER “-” NUMBER “)” “*” NUMBER program:

    | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" “(” “+” factor
  66. twitter.com/kddnewton Parsing Ruby “-” NUMBER “)” “*” NUMBER program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" NUMBER “(” “+” factor
  68. twitter.com/kddnewton Parsing Ruby “-” NUMBER “)” “*” NUMBER program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" factor “(” “+” factor
  69. twitter.com/kddnewton Parsing Ruby NUMBER “)” “*” NUMBER program: | expression

    expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" “-” factor “(” “+” factor
  70. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" NUMBER “-” factor “(” “+” factor
  71. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" factor “-” factor “(” “+” factor
  72. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" factor “+” “(” factor “-” factor
  73. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" factor “-” factor “(” “+” factor
  75. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER term “-” term “(”

    “+” factor program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  77. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER expression “-” term “(”

    “+” factor program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  79. twitter.com/kddnewton Parsing Ruby “)” “*” NUMBER expression “(” “+” factor

    program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  80. twitter.com/kddnewton Parsing Ruby “*” NUMBER “)” expression “(” “+” factor

    program: | expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  82. twitter.com/kddnewton Parsing Ruby “*” NUMBER factor “+” factor program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  83. twitter.com/kddnewton Parsing Ruby NUMBER “*” factor “+” factor program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  85. twitter.com/kddnewton Parsing Ruby factor “*” factor “+” factor program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  86. twitter.com/kddnewton Parsing Ruby term “*” factor “+” factor program: |

    expression expression: | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  88. twitter.com/kddnewton Parsing Ruby term “+” factor program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  89. twitter.com/kddnewton Parsing Ruby expression “+” term program: | expression expression:

    | expression "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  91. twitter.com/kddnewton Parsing Ruby program program: | expression expression: | expression

    "+" term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
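The walk above, where tokens are shifted onto a stack and then reduced through factor, term and expression until only the start symbol remains, can be approximated in a few lines. This is a hand-rolled sketch under stated assumptions, not what a generated parser actually does: the names `reduce_once` and `accepts?` are hypothetical, the input is a list of token types only, and one token of lookahead stands in for the tables a real LR parser generator would build. Empty input is treated as rejected here.

```ruby
# Try a single reduction on top of the stack. Each branch mirrors one
# production of the grammar. Returns true if the stack was changed.
def reduce_once(stack, lookahead)
  # A term must not become an expression while "*" or "/" is still coming.
  binds_tighter = ["*", "/"].include?(lookahead)

  if stack.last == :NUMBER
    stack.pop
    stack.push(:factor)                      # factor: NUMBER
  elsif stack.last(3) == ["(", :expression, ")"]
    stack.pop(3)
    stack.push(:factor)                      # factor: "(" expression ")"
  elsif stack[-3] == :term && ["*", "/"].include?(stack[-2]) && stack[-1] == :factor
    stack.pop(3)
    stack.push(:term)                        # term: term "*" factor | term "/" factor
  elsif stack.last == :factor
    stack.pop
    stack.push(:term)                        # term: factor
  elsif !binds_tighter && stack[-3] == :expression && ["+", "-"].include?(stack[-2]) && stack[-1] == :term
    stack.pop(3)
    stack.push(:expression)                  # expression: expression "+" term | expression "-" term
  elsif !binds_tighter && stack.last == :term
    stack.pop
    stack.push(:expression)                  # expression: term
  else
    return false
  end

  true
end

# Shift-reduce loop: reduce while possible, otherwise shift the next token.
def accepts?(token_types)
  input = token_types.dup
  stack = []

  loop do
    next if reduce_once(stack, input.first)
    break if input.empty?
    stack.push(input.shift)
  end

  # The input is accepted when everything collapsed into one expression.
  stack == [:expression]
end

p accepts?([:NUMBER, "+", "(", :NUMBER, "-", :NUMBER, ")", "*", :NUMBER])
p accepts?([:NUMBER, "+"])
```

Because every reduction corresponds to a production in the grammar, a stack that ends as a single `:expression` means the token stream can be derived from `program`.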
  92. Parser generators

  93. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")"
  94. twitter.com/kddnewton Parsing Ruby program: | expression expression: | expression "+"

    term | expression "-" term | term term: | term "*" factor | term "/" factor | factor factor: | NUMBER | "(" expression ")" class Parser prechigh left "*" "/" left "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression | expression "-" expression | expression "*" expression | expression "/" expression | "(" expression ")" | NUMBER end
  95. twitter.com/kddnewton Parsing Ruby class Parser prechigh left "*" "/" left

    "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression | expression "-" expression | expression "*" expression | expression "/" expression | "(" expression ")" | NUMBER end
  96. twitter.com/kddnewton Parsing Ruby class Parser prechigh left "*" "/" left

    "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression { val[0] + val[2] } | expression "-" expression { val[0] - val[2] } | expression "*" expression { val[0] * val[2] } | expression "/" expression { val[0] / val[2] } | "(" expression ")" { val[1] } | NUMBER end
  97. twitter.com/kddnewton Parsing Ruby class Parser prechigh left "*" "/" left

    "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression { [:add, val[0], val[2]] } | expression "-" expression { [:sub, val[0], val[2]] } | expression "*" expression { [:mul, val[0], val[2]] } | expression "/" expression { [:div, val[0], val[2]] } | "(" expression ")" { [:prn, val[1]] } | NUMBER end
  98. twitter.com/kddnewton Parsing Ruby $

  99. twitter.com/kddnewton Parsing Ruby $ racc parser.y -o parser.rb $

  100. twitter.com/kddnewton Parsing Ruby $ racc parser.y -o parser.rb $ irb

    irb(main):001:0>
  101. twitter.com/kddnewton Parsing Ruby $ racc parser.y -o parser.rb $ irb

    irb(main):001:0> require_relative "parser" => true irb(main):002:0>
  102. twitter.com/kddnewton Parsing Ruby $ racc parser.y -o parser.rb $ irb

    irb(main):001:0> require_relative "parser" => true irb(main):002:0> Parser.new.parse("1 + (4 - 2) * 3") => [:add, 1, [:mul, [:prn, [:sub, 4, 2]], 3]] irb(main):003:0>
  103. twitter.com/kddnewton Parsing Ruby [:add, 1, [:mul, [:prn, [:sub, 4, 2]],

    3]]
  104. twitter.com/kddnewton Parsing Ruby add 1 mul prn sub 3 4

    2
  105. twitter.com/kddnewton Parsing Ruby add 1 mul prn sub 3 4

    2 1 + (4 - 2) * 3
  106. twitter.com/kddnewton Parsing Ruby class Parser prechigh left "*" "/" left

    "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression { [:add, val[0], val[2]] } | expression "-" expression { [:sub, val[0], val[2]] } | expression "*" expression { [:mul, val[0], val[2]] } | expression "/" expression { [:div, val[0], val[2]] } | "(" expression ")" { [:prn, val[1]] } | NUMBER end
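Once the actions build a tree instead of computing a value, the arithmetic can be done in a separate pass over that tree. The sketch below assumes the node shapes shown on the slides (`:add`, `:sub`, `:mul`, `:div`, `:prn`, and bare integers for numbers); the `evaluate` method is hypothetical and uses `case`/`in` pattern matching, so it assumes Ruby 3.0 or later.

```ruby
# Hypothetical tree walker: computes the value of a tree made of the
# array nodes produced by the grammar actions above.
def evaluate(node)
  case node
  in Integer then node
  in [:add, left, right] then evaluate(left) + evaluate(right)
  in [:sub, left, right] then evaluate(left) - evaluate(right)
  in [:mul, left, right] then evaluate(left) * evaluate(right)
  in [:div, left, right] then evaluate(left) / evaluate(right) # integer division
  in [:prn, inner] then evaluate(inner)
  end
end

tree = [:add, 1, [:mul, [:prn, [:sub, 4, 2]], 3]]
p evaluate(tree) # 1 + (4 - 2) * 3
```

Keeping the tree around, rather than evaluating inside the grammar actions, is what lets other tools (formatters, linters, type checkers) reuse the same parse result.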
  107. History of the Ruby parser

  108. The Early Days

  109. twitter.com/kddnewton Parsing Ruby Ruby 0.06 1994-01-07 Fri Jan 7 15:23:20

    1994 Yukihiro Matsumoto (matz at nws119) * baseline - version 0.06.
  110. twitter.com/kddnewton Parsing Ruby Ruby 0.76 1995-05-19 Thu Jul 14 11:18:07

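    1994 Yukihiro Matsumoto (matz@ix-02) * parse.y: Added syntax for creating a Dict, written as {..}. * parse.y: Changed the syntax for creating arrays to [..]. This breaks compatibility with existing Ruby scripts, but with the Dict syntax being introduced, and with an eye toward matching perl5, I decided that now was the only time to make this change. *BACKWARD INCOMPATIBILITY*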
  111. twitter.com/kddnewton Parsing Ruby Ruby 0.95 1995-12-21 Thu Nov 9 23:26:01

    1995 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y (f_arglist): Method definition arguments no longer need to be wrapped in parentheses. Mon Aug 7 12:47:41 1995 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y: resque -> rescue. Embarrassing, but I can't leave a typo in place. Why didn't I notice it until now….
  112. Ruby 1.x

  113. twitter.com/kddnewton Parsing Ruby Ruby 1.0.961225 1996-12-25 Wed May 22 19:48:42

    1996 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y (superclass): Changed the superclass specifier from `:' to `<'. Wed Mar 27 10:02:44 1996 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y: Changed a reserved word: continue -> next
  114. twitter.com/kddnewton Parsing Ruby Ruby 1.0.971225 1997-12-25 Mon Apr 7 11:36:16

    1997 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y (primary): syntax to access singleton class. Thu Apr 3 02:12:31 1997 Yukihiro Matsumoto <matz@caelum.co.jp> * parse.y (parse_regx): new option //[nes] to specify character code for regexp literals. Last specified code option is valid.
  115. twitter.com/kddnewton Parsing Ruby Ruby 1.3.0 1998-12-24 • begin..rescue..else..end clauses • <<- indentable heredocs • :: method calls
  116. twitter.com/kddnewton Parsing Ruby Ruby 1.2.0 1998-12-25 • heredocs • =begin to =end • true and false • BEGIN and END • %w • Top-level constant access • ||= and &&=
  117. twitter.com/kddnewton Parsing Ruby Ruby 1.4.0 1999-08-13 • binary number literals • anonymous * in method definitions • nested string interpolation • multibyte character identifiers
  118. twitter.com/kddnewton Parsing Ruby Ruby 1.5.0 1999-12-07 • Compile-time string concatenation

  119. twitter.com/kddnewton Parsing Ruby Ruby 1.6.0 2000-09-19 • rescue modifier
  120. twitter.com/kddnewton Parsing Ruby nodeDump 0.1.0 2000-10-01 • C extension • Human-readable format
  121. twitter.com/kddnewton Parsing Ruby Ruby 1.7.1 2001-06-01 • break and next now accept values • %w can escape spaces • rescue in singleton method bodies
  122. twitter.com/kddnewton Parsing Ruby JRuby 2001-09-10 • Java port of Ruby 1.6 • Rewrite actions in parse.y into Java • Rewrite standard library in Ruby
  123. twitter.com/kddnewton Parsing Ruby class Parser prechigh left "*" "/" left

    "+" "-" preclow options no_result_var rule program : expression expression : expression "+" expression { [:add, val[0], val[2]] } | expression "-" expression { [:sub, val[0], val[2]] } | expression "*" expression { [:mul, val[0], val[2]] } | expression "/" expression { [:div, val[0], val[2]] } | "(" expression ")" { [:prn, val[1]] } | NUMBER end
  125. twitter.com/kddnewton Parsing Ruby ripper 0.0.1 2001-10-20 • Rewrite parse.y to dispatch parser events • “Ripper is still early-alpha version”
  126. twitter.com/kddnewton Parsing Ruby Ruby 1.8.0 2003-08-04 • %W word lists • Dynamic symbols • Nested constant assignment
  127. twitter.com/kddnewton Parsing Ruby ParseTree 1.0.0 2004-11-10 • C extension • s-expressions from NODE structs
  128. twitter.com/kddnewton Parsing Ruby Rubinius 2006-07-12 • sydney • Rewrote parse.y in Ruby • Rewrote standard library in Ruby
  129. twitter.com/kddnewton Parsing Ruby Cardinal 2006-07-16 • Parrot VM • From scratch PGE grammar from Ruby EBNF
  130. twitter.com/kddnewton Parsing Ruby IronRuby 2007-04-30 • Microsoft .NET port of Ruby • Rewrote parse.y, reused parts
  131. twitter.com/kddnewton Parsing Ruby ruby_parser 1.0.0 2007-11-14 • Rewrite parse.y in Ruby, use racc • dawnscanner, debride, fasterer, flay, flog, railroader, roodi
  132. YARV

  133. twitter.com/kddnewton Parsing Ruby Ruby 1.9.0 2007-12-25 • Bison • YARV • Ripper merged • Lambda literals • Symbol hash keys
  134. twitter.com/kddnewton Parsing Ruby Ruby 1.9.1 2009-01-30 • encoding pragma • call shorthand • positional arguments after splat
  135. twitter.com/kddnewton Parsing Ruby Ruby Intermediate Language 2009-10-26 • Intermediate representation for powering semantic analysis • Used to implement type systems in OCaml • druby, rtc, rubydust
  136. twitter.com/kddnewton Parsing Ruby Ruby 1.9.3 2011-10-31 • JIS X 3017 • ISO/IEC 30170:2012
  137. Ruby 2.x

  138. twitter.com/kddnewton Parsing Ruby Ruby 2.0.0 2013-02-24 • Refinements • %i symbol lists • Keyword arguments
  139. twitter.com/kddnewton Parsing Ruby parser 2013-04-15 • Published gem, parser API • Well-documented • covered, deep-cover, erb-lint, fast, opal, packwerk, querly, rdl, reek, rubocop, rubrowser, ruby-lint, ruby-next, ruby_detective, rubycritic, seeing_is_believing, standard, steep, unparser, vernacular, yoda
  140. twitter.com/kddnewton Parsing Ruby TruffleRuby 2013-10-26 • Originally branched off JRuby • Graal dynamic compiler, Truffle AST interpreter
  141. twitter.com/kddnewton Parsing Ruby Ruby 2.1.0 2013-12-25 • Required keyword arguments • Rational and complex literals • Frozen string literal suffix
  142. twitter.com/kddnewton Parsing Ruby Ruby 2.2.0 2014-12-25 • Dynamic symbol hash keys
  143. twitter.com/kddnewton Parsing Ruby Ruby 2.3.0 2015-12-25 • Frozen string literal pragma • <<~ heredocs • &. lonely operator
  144. twitter.com/kddnewton Parsing Ruby Ruby 2.4.0 2016-12-25 • Symbol#to_proc refinements • Top-level return • Multiple assignment in conditional
  145. twitter.com/kddnewton Parsing Ruby tree-sitter 2017-02-02 • Parser-generator library • vscode-ruby
  146. twitter.com/kddnewton Parsing Ruby typedruby 2017-02-26 • Type system in Rust, parser in C++ • Grammar from Ruby 2.4, lexer from ruby_parser • Vendored in Sorbet
  147. twitter.com/kddnewton Parsing Ruby Ruby 2.5.0 2017-12-25 • String interpolation refinements • rescue and ensure at the block level
  148. twitter.com/kddnewton Parsing Ruby Ruby 2.6.0 2018-12-25 • RubyVM::AbstractSyntaxTree • Flip-flop deprecated • Endless ranges • Non-ASCII constant names
  149. twitter.com/kddnewton Parsing Ruby Ruby 2.7.0 2019-12-25 • Flip-flop undeprecated • Method reference operator • Keyword argument warning • No other keywords syntax • Beginless range • Pattern matching • Numbered parameters • Rightward assignment • Argument forwarding
  150. Ruby 3.x

  151. twitter.com/kddnewton Parsing Ruby Ruby 3.0.0 2020-12-25 • Keyword arguments • Single-line “endless” methods • “Find pattern” pattern matching • shareable_constant_value pragma • in keyword pattern matching
  152. twitter.com/kddnewton Parsing Ruby Ruby 3.1.0 Preview 1 2021-11-09 • Hash literal shorthand { x:, y: } == { x: x, y: y } • Pinned expressions
  153. How Ripper works

  154. twitter.com/kddnewton Parsing Ruby class Parser expression : expression "+" expression

    { [:add, val[0], val[2]] } | expression "-" expression { [:sub, val[0], val[2]] } | expression "*" expression { [:mul, val[0], val[2]] } | expression "/" expression { [:div, val[0], val[2]] } | "(" expression ")" { [:prn, val[1]] } | NUMBER def parse(input) until input.empty? case input when /\A\s+/ when /\A\d+/ yield [:NUMBER, $&.to_i] when /\A[-+*\/()]/ yield [$&, $&] else raise "Failed to parse: #{input[0, 10]}" end input = $' end end
  157. twitter.com/kddnewton Parsing Ruby static enum yytokentype parser_yylex(struct parser_params *p) {

    ... case '#': /* it's a comment */ p->token_seen = token_seen; lex_goto_eol(p); dispatch_scan_event(p, tCOMMENT); fallthru = TRUE; /* fall through */ case '\n': p->token_seen = token_seen; ... }
  159. twitter.com/kddnewton Parsing Ruby require "ripper" class Parser < Ripper attr_reader

    :comments def initialize(*) super @comments = [] end def on_comment(value) @comments << value end end p Parser.new(<<~RUBY).tap(&:parse).comments # this is a comment # this is another comment # this is a third comment RUBY
  166.

    require "ripper"

    class Parser < Ripper
      attr_reader :comments

      def initialize(*)
        super
        @comments = []
      end

      def on_comment(value)
        @comments << value
      end
    end

    p Parser.new(<<~RUBY).tap(&:parse).comments
      # this is a comment
      # this is another comment
      # this is a third comment
    RUBY

    [
      "# this is a comment\n",
      "# this is another comment\n",
      "# this is a third comment\n"
    ]
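A related sketch, not from the talk: the same comment scanner events can be read without subclassing by using `Ripper.lex`, which returns one array per token in the form `[[line, column], event, text, state]`. The sample source and the `comments` variable below are made up for illustration.

```ruby
require "ripper"

source = <<~RUBY
  # this is a comment
  foo = 1 # trailing comment
RUBY

# Keep only the comment tokens, pairing each with its [line, column] position.
comments =
  Ripper.lex(source)
    .select { |token| token[1] == :on_comment }
    .map { |token| [token[0], token[2]] }

comments.each do |(line, column), text|
  puts "#{line}:#{column} #{text}"
end
```

This form is convenient when only token-level information is needed; the subclass approach on the slide is what lets scanner events be combined with parser events.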
  167.

    class Parser
      expression
        : expression "+" expression
          { [:add, val[0], val[2]] }
        | expression "-" expression
          { [:sub, val[0], val[2]] }
        | expression "*" expression
          { [:mul, val[0], val[2]] }
        | expression "/" expression
          { [:div, val[0], val[2]] }
        | "(" expression ")"
          { [:prn, val[1]] }
        | NUMBER

    def parse(input)
      until input.empty?
        case input
        when /\A\s+/
          # skip whitespace
        when /\A\d+/
          yield [:NUMBER, $&.to_i]
        when /\A[-+*\/()]/
          yield [$&, $&]
        else
          # String#first is ActiveSupport-only; slice works in plain Ruby
          raise "Failed to parse: #{input[0, 10]}"
        end

        # continue with everything after the last match
        input = $'
      end
    end
  168.

    method_call :
        | keyword_super paren_args
          {
          /*%%%*/
              $$ = NEW_SUPER($2, &@$);
          /*% %*/
          /*% ripper: super!($2) %*/
          }
        | keyword_super
          {
          /*%%%*/
              $$ = NEW_ZSUPER(&@$);
          /*% %*/
          /*% ripper: zsuper! %*/
          }
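As a small illustration of how the `ripper:` annotations surface in Ruby (a sketch, not from the talk): `super!($2)` passes one value to the handler and `zsuper!` passes none, and that arity can be read from `Ripper::PARSER_EVENT_TABLE`. The `arities` hash is only a local name for this example.

```ruby
require "ripper"

# PARSER_EVENT_TABLE maps each parser event name to the number of
# arguments that its on_* handler receives.
arities =
  %i[super zsuper yield yield0].to_h do |event|
    [event, Ripper::PARSER_EVENT_TABLE.fetch(event)]
  end

arities.each do |event, arity|
  puts "on_#{event} takes #{arity} argument(s)"
end
```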
  173.

    require "ripper"

    class Parser < Ripper
      def on_super(arguments)
        puts "super called with arguments: #{arguments}"
      end

      def on_zsuper
        puts "super called without arguments"
      end
    end

    Parser.new(<<~RUBY).parse
      super
      super(1, 2, 3)
    RUBY

    super called without arguments
    super called with arguments:
  179.

    require "ripper"

    class Parser < Ripper::SexpBuilderPP
      def on_super(arguments)
        puts "super called with arguments: #{arguments}"
      end

      def on_zsuper
        puts "super called without arguments"
      end
    end

    Parser.new(<<~RUBY).parse
      super
      super(1, 2, 3)
    RUBY

    super called without arguments
    super called with arguments: [:arg_paren, [:args_add_block, [
      [:@int, "1", [2, 6]],
      [:@int, "2", [2, 9]],
      [:@int, "3", [2, 12]]
    ], false]]
  183.

    • Implement every method handler yourself
    • Inherit from Ripper::SexpBuilder or Ripper::SexpBuilderPP
    • Some combination of both
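The third option can be sketched as follows, assuming a hypothetical `SuperFinder` class (the class name and its `super_calls` attribute are illustrative and not from the talk). It inherits the default s-expression handlers from `Ripper::SexpBuilderPP` and overrides only the two events it cares about, delegating back to the inherited handler so the tree is still built.

```ruby
require "ripper"

class SuperFinder < Ripper::SexpBuilderPP
  attr_reader :super_calls

  def initialize(*)
    super
    @super_calls = []
  end

  # Record the arguments node, then let SexpBuilderPP build [:super, ...].
  def on_super(arguments)
    @super_calls << arguments
    super
  end

  # A bare `super` has no arguments node, so record nil.
  def on_zsuper
    @super_calls << nil
    super
  end
end

finder = SuperFinder.new("super\nsuper(1, 2, 3)\n")
tree = finder.parse

p tree.first                # the root node type
p finder.super_calls.length # how many super calls were seen
```

Only the overridden handlers need maintenance; every other event keeps the inherited behavior.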
  184.

    # frozen_string_literal: true

    require 'ripper'

    class Prettier::Parser < Ripper
      # Represents a line in the source. If this class is being used, it means that
      # every character in the string is 1 byte in length, so we can just return the
      # start of the line + the index.
      class SingleByteString
        def initialize(start)
          @start = start
        end

        def [](byteindex)
          @start + byteindex
        end
      end

      # Represents a line in the source. If this class is being used, it means that
      # there are characters in the string that are multi-byte, so we will build up
      # an array of indices, such that array[byteindex] will be equal to the index
      # of the character within the string.
      class MultiByteString
        def initialize(start, line)
          @indices = []

          line
            .each_char
            .with_index(start) do |char, index|
              char.bytesize.times { @indices << index }
            end
        end

        def [](byteindex)
          @indices[byteindex]
        end
      end

      class Location
        attr_reader :start_line, :start_char, :end_line, :end_char

        def initialize(start_line:, start_char:, end_line:, end_char:)
          @start_line = start_line
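To show why the byte-to-character mapping above is needed, here is a minimal standalone sketch (the `char_indices` helper is hypothetical, written for this example only). Ripper reports columns as byte offsets, so on a line containing a multi-byte character the reported column no longer matches the character index.

```ruby
require "ripper"

# Build an array such that indices[byte_offset] == character index,
# mirroring the idea behind MultiByteString above.
def char_indices(line, start = 0)
  indices = []
  line.each_char.with_index(start) do |char, index|
    char.bytesize.times { indices << index }
  end
  indices
end

line = "\u00E9 = 1" # the first character is two bytes long in UTF-8
indices = char_indices(line)

# Find the "=" operator token and read its byte column from Ripper.
operator = Ripper.lex(line).find { |token| token[1] == :on_op }
byte_column = operator[0][1]
char_column = indices[byte_column]

puts "byte column: #{byte_column}, character column: #{char_column}"
```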
  185.

          ending = find_scanner_event(:@tstring_end)

          {
            type: :xstring_literal,
            body: xstring[:body],
            loc: xstring[:loc].to(ending[:loc])
          }
        end
      end

      # yield is a parser event that represents using the yield keyword with
      # arguments. It accepts as an argument an args_add_block event that
      # contains all of the arguments being passed.
      def on_yield(args_add_block)
        event = find_scanner_event(:@kw, 'yield')

        {
          type: :yield,
          body: [args_add_block],
          loc: event[:loc].to(args_add_block[:loc])
        }
      end

      # yield0 is a parser event that represents the bare yield keyword. It has
      # no body as it accepts no arguments. This is as opposed to the yield
      # parser event, which is the version where you're yielding one or more
      # values.
      def on_yield0
        event = find_scanner_event(:@kw, 'yield')

        { type: :yield0, body: event[:body], loc: event[:loc] }
      end

      # zsuper is a parser event that represents the bare super keyword. It has
      # no body as it accepts no arguments. This is as opposed to the super
      # parser event, which is the version where you're calling super with one
      # or more values.
      def on_zsuper
        event = find_scanner_event(:@kw, 'super')

        { type: :zsuper, body: event[:body], loc: event[:loc] }
      end
    end
  186. Syntax tree options

  187. ruby_parser

    • Some community adoption
    • Not 100% compatible, new stuff may break
    • Doesn’t ship with/test with core
  188. parser

    • Tons of community adoption
    • Well-documented
    • Not 100% compatible, new stuff may break
    • Doesn’t ship with/test with core
  189. RubyVM::AbstractSyntaxTree

    • Still too early to tell
    • Not implemented anywhere else
  190. ripper

    • Built into the parser generator
    • Well-tested in core
    • Ships with Ruby
    • No documentation
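Given the lack of documentation, one way to explore Ripper is to record which parser events fire for a snippet. The sketch below assumes a hypothetical `EventTracer` class (not from the talk) and uses the `Ripper::PARSER_EVENTS` constant to define a handler for every parser event.

```ruby
require "ripper"

class EventTracer < Ripper
  attr_reader :seen

  def initialize(*)
    super
    @seen = []
  end

  # Define one handler per parser event; each records the event name.
  Ripper::PARSER_EVENTS.each do |event|
    define_method(:"on_#{event}") do |*_args|
      @seen << event
      nil
    end
  end
end

tracer = EventTracer.new("super(1)")
tracer.parse

puts tracer.seen.join(", ")
```

Running this against small snippets is a quick way to learn which handlers a given piece of syntax will reach.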
  193. Parsing Ruby Kevin Newton https://kddnewton.com/parsing-ruby https://kddnewton.com/ripper-docs