Slide 1

Slide 1 text

Rearchitect Ripper May 16, 2024 in RubyKaigi 2024 @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

May 15th - 17th, 2024 NAHA CULTURAL ARTS THEATER NAHArt, Okinawa, Japan RubyKaigi 2023 was “Great Parser Era”

Slide 3

Slide 3 text

RubyKaigi 2023

Slide 4

Slide 4 text

7 sessions for Parser Day 1 Day 2 Day 3 LT

Slide 5

Slide 5 text

“Parser Renaissance”

Slide 6

Slide 6 text

RubyKaigi 2023 LT: Introduced “parse.y” with 3 courses & 15 themes. You must have completely understood “parse.y”.

Slide 7

Slide 7 text

Ripper was not covered !!! Ripper parts were ignored as comments !

Slide 8

Slide 8 text

EXTRA STAGE “Ripper” CHALLENGE 19

Slide 9

Slide 9 text

About me: Yuichiro Kaneko. yui-knk (GitHub) / spikeolaf (Twitter). Treasure Data, Engineering Manager of Applications Backend. CRuby committer, mainly developing the parser generator and parser. Lrama: LALR(1) parser generator (2023, Ruby 3.3). Love LR parser.

Slide 10

Slide 10 text

The Bison Slayer The parser monster The world is now in the great age of parsers. People are setting sail into the vast sea of parsers. - RubyKaigi 2023 LT- Yuichiro Kaneko https://twitter.com/kakutani/status/1657762294431105025/ NEW !!!

Slide 11

Slide 11 text

I was the weakest of the Big Four

Slide 12

Slide 12 text

“parse.y” Who's Who in 2024 3rd contributor for parse.y But I’m still on light side The Creator of Ruby The patch monster Me The Organizer of TRICK

Slide 13

Slide 13 text

What is Ripper?

Slide 14

Slide 14 text

S-expression: Ripper provides an easy interface for parsing your program into a symbolic expression tree (or S-expression). https://github.com/ruby/ruby/blob/v3_3_0/ext/ripper/lib/ripper.rb
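
As a minimal illustration of the S-expression interface described on this slide, the sketch below calls `Ripper.sexp` from the standard library on the expression "1 + 2". The variable name `sexp` is only for illustration.

```ruby
require "ripper"
require "pp"

# Parse a tiny program into a nested-array S-expression.
sexp = Ripper.sexp("1 + 2")

# The outermost element is :program; the first statement is a :binary node
# whose operator is the symbol :+.
pp sexp
```

The nested arrays mirror the grammar: each parser event becomes an array whose first element is the event name.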

Slide 15

Slide 15 text

Ripper is … A Ruby script parser. You can get information from the parser with event-based style: abstract syntax trees, simple lexical analysis. https://github.com/ruby/ruby/blob/v3_3_0/ext/ripper/lib/ripper.rb
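
The "simple lexical analysis" part can be tried with `Ripper.tokenize` and `Ripper.lex`, both from the standard library. This is a small sketch; the variable names are illustrative.

```ruby
require "ripper"

src = "1 + 2"

# Token strings only, in source order.
tokens = Ripper.tokenize(src)

# Ripper.lex also reports the scanner event for each token.
# Each element is an array of position, event name, token text (and lexer state).
events = Ripper.lex(src).map { |_pos, event, text, _state| [event, text] }

p tokens
p events
```

The event names returned by `Ripper.lex` are the same `on_XXX` names that a Ripper subclass can override.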

Slide 16

Slide 16 text

Low level interface: Ripper provides “on_XXX” methods: on_int, on_op, on_binary, on_stmts_new, … The example here simply counts the number of method calls.

Slide 17

Slide 17 text

How low level interface works: For example, “1 + 2” is provided. on_int(“1”) is called when “1” is scanned, and “count 1” is returned. [Figure: the stack holds 1 (“count 1”); + and 2 are not yet scanned]

Slide 18

Slide 18 text

How low level interface works: on_int(“2”) is called when “2” is scanned, and “count 5” is returned. [Figure: the stack holds 1 (“count 1”), +, 2 (“count 5”)]

Slide 19

Slide 19 text

How low level interface works: on_binary is called when 1, +, 2 are reduced to arg. Arguments are “count 1”, :+ and “count 5”. Returned values are passed to another method call. on_binary("count 1", :+, "count 5") is called and “count 6” is returned. [Figure: 1 (“count 1”), +, 2 (“count 5”) are reduced to arg (“count 6”)]
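
The walkthrough on the last three slides can be reproduced with a small Ripper subclass. This is a sketch, not the code used in the talk: the class name `CountingRipper` and the `log` accessor are hypothetical, and it assumes every scanner and parser event handler is overridden so that each call increments one shared counter.

```ruby
require "ripper"

# Hypothetical subclass: every on_XXX handler returns "count N",
# where N is the number of handler calls made so far.
class CountingRipper < Ripper
  attr_reader :log

  def initialize(src)
    super
    @count = 0
    @log = []
  end

  # Ripper::EVENTS lists all scanner and parser event names (without "on_").
  Ripper::EVENTS.each do |event|
    define_method(:"on_#{event}") do |*args|
      @count += 1
      result = "count #{@count}"
      @log << [event, args, result]
      result
    end
  end
end

ripper = CountingRipper.new("1 + 2")
ripper.parse

# Print each event with its arguments and return value.
ripper.log.each { |event, args, result| puts "on_#{event}(#{args.inspect}) -> #{result}" }
```

In the printed log, the first and third arguments of on_binary are the strings returned earlier by the two on_int calls, while the operator arrives as the symbol :+.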

Slide 20

Slide 20 text

How is Ripper implemented?

Slide 21

Slide 21 text

How parse.y is used: parse.y is a source of ripper.y. [Figure: parse.y -> Lrama -> parse.c, parse.h; parse.y -> tool/id2token.rb, tools/preproc.rb -> ripper.y -> Lrama -> ripper.c]

Slide 22

Slide 22 text

Comments in parse.y are transformed into C code in ripper.y. Comments are not comments !! parse.y is two-faced. [Figure: the same rule shown in parse.y and in ripper.y]
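
To make "comments are not comments" concrete, here is a toy sketch of the idea. It is not the real tools/preproc.rb or dsl.rb: the helper name `expand_ripper_comment`, the regular expression, the sample comment text, and the simplified one-line output are all assumptions for illustration.

```ruby
# Toy illustration only. The real preprocessing is done by
# ext/ripper/tools/preproc.rb and dsl.rb and generates different C code.
RIPPER_COMMENT = %r{/\*% ripper: (\w+)!\((.*?)\) %\*/}

# Turn a specially marked comment into a C statement that dispatches
# the named Ripper event with the listed arguments.
def expand_ripper_comment(line)
  line.sub(RIPPER_COMMENT) do
    event = Regexp.last_match(1)
    args  = Regexp.last_match(2).split(/,\s*/)
    "$$ = dispatch#{args.size}(#{[event, *args].join(', ')});"
  end
end

# A comment in the general shape of those in parse.y (exact text assumed).
sample = '/*% ripper: binary!($1, ID2VAL(idPlus), $3) %*/'
puts expand_ripper_comment(sample)
```

A C compiler would skip the comment when building the parser, while the preprocessing step turns it into the action that calls the Ripper event handler.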

Slide 23

Slide 23 text

“dsl.rb” is fantastic “ext/ripper/tools/dsl.rb” “tools/preproc.rb” requires “dsl.rb” This is my favorite script !! Less than 100 lines but very hacky

Slide 24

Slide 24 text

https://twitter.com/yhara/status/1747252622636339207

Slide 25

Slide 25 text

Longstanding bugs

Slide 26

Slide 26 text

Bug #10436: Ruby parser reports Syntax Error but Ripper doesn’t. A 9-year-old bug.
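
The mismatch can be observed with a short script. This sketch assumes the bug concerns input such as `m(&nil) {}`, where a call receives both a block argument and a literal block; the input string is an assumption, not a quote from the bug report. What Ripper reports depends on the Ruby version, so no output is shown.

```ruby
require "ripper"

# Assumed example input: a block argument and a literal block on one call.
src = "m(&nil) {}"

# Ask CRuby's parser (via the compiler) whether it accepts the source.
parser_rejects =
  begin
    RubyVM::InstructionSequence.compile(src)
    false
  rescue SyntaxError
    true
  end

# Ask Ripper about the same source.
ripper = Ripper.new(src)
ripper.parse

puts "parser rejects the source: #{parser_rejects}"
puts "Ripper reports an error:   #{ripper.error?}"
```

On a Ruby where the two lines disagree, the parser rejects the program while Ripper accepts it, which is the inconsistency named on this slide.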

Slide 27

Slide 27 text

Jeremy’s challenge “This isn't a very clean way to fix it, but I was not able to figure out a way to fix it by modifying parse.y.”

Slide 28

Slide 28 text

Nobu’s challenge: Make the semantic value stack manage both the callback value and the parser value by using a union. But it was not a finisher.

Slide 29

Slide 29 text

Can we solve this issue if Nobu and Jeremy could not solve it?

Slide 30

Slide 30 text

Parser and Stack: CRuby’s parser is an LR parser. An LR parser is implemented as a pushdown automaton. An LR parser manages semantic values with a stack.

Slide 31

Slide 31 text

Parser semantic value stack: Parser manages Nodes on the stack. [Figure: stack to manage Node. NODE_INTEGER (1), +, NODE_INTEGER (2) are reduced to arg, a NODE_OPCALL holding the two NODE_INTEGERs]
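
The stack behavior on this slide can be sketched in a few lines. This is a simplified model, not CRuby's parser: the structs `IntegerNode` and `OpCallNode` are hypothetical stand-ins for NODE_INTEGER and NODE_OPCALL, and intermediate reductions are omitted.

```ruby
# Hypothetical stand-ins for NODE_INTEGER and NODE_OPCALL.
IntegerNode = Struct.new(:value)
OpCallNode  = Struct.new(:receiver, :op, :argument)

stack = []

# Shift: each scanned token pushes its semantic value.
stack.push(IntegerNode.new(1))
stack.push(:+)
stack.push(IntegerNode.new(2))

# Reduce "arg: arg '+' arg": pop the three values, push the new node.
left, op, right = stack.pop(3)
stack.push(OpCallNode.new(left, op, right))

p stack
```

After the reduce, the stack holds a single node that owns the two integer nodes.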

Slide 32

Slide 32 text

Ripper semantic value stack: Ripper manages Ruby Objects on the stack. [Figure: stack to manage Ruby Object. “count 1”, +, “count 5” are reduced to arg by calling the #on_binary method; its return value “count 6” goes on the stack]

Slide 33

Slide 33 text

How to do semantic analysis: block_dup_check function checks the existence of NODE_BLOCK_PASS and NODE_ITER. Ripper can’t do the check because it doesn’t have nodes. [Figure: stack to manage Node. For primary: method_call brace_block, block_dup_check inspects NODE_FCALL (with NODE_BLOCK_PASS) and NODE_ITER]
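
A rough Ruby analogue shows why the check needs nodes. The real block_dup_check is a C function in parse.y; the structs `BlockPass` and `Iter` and the return convention below are hypothetical simplifications.

```ruby
# Hypothetical stand-ins for NODE_BLOCK_PASS and NODE_ITER.
BlockPass = Struct.new(:args)
Iter      = Struct.new(:body)

# Return an error message when a call has both a block argument and a
# literal block; return nil when the check passes or cannot be made.
def block_dup_check(call_args, block)
  return nil unless block.is_a?(Iter) && call_args.is_a?(BlockPass)
  "both block arg and actual block given"
end

# With nodes on the stack, the conflict is detected.
with_nodes = block_dup_check(BlockPass.new([:nil]), Iter.new([]))

# With arbitrary handler return values (as in the old Ripper), the check
# has nothing to inspect.
with_objects = block_dup_check("count 3", "count 7")

p with_nodes
p with_objects
```

Because a Ripper handler may return any Ruby object, a stack that holds only those objects carries no node types for the check to look at.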

Slide 34

Slide 34 text

Think simple

Slide 35

Slide 35 text

If a single stack is not enough, then let's use two stacks.

Slide 36

Slide 36 text

If two stacks exist, Ripper can manage Nodes and Ruby Objects. [Figure: stack to manage Node (NODE_INTEGER, NODE_INTEGER reduced to NODE_OPCALL) next to stack to manage Ruby Object (“count 1”, “count 5” reduced to “count 6”, the return value of the #on_binary call)]

Slide 37

Slide 37 text

Bison provides only one stack

Slide 38

Slide 38 text

We use Lrama

Slide 39

Slide 39 text

Lrama’s new features: 5 callbacks (%after-shift, %before-reduce, %after-reduce, %after-shift-error-token, %after-pop-stack) and 1 new syntax ($:n). https://github.com/ruby/lrama/pull/367

Slide 40

Slide 40 text

How to use callbacks: New Ripper uses Ruby’s Array as its stack. In %after-shift, “rb_ary_push” the object to the stack. [Figure: NODE_INTEGER, NODE_INTEGER on the Node stack; “count 1”, “count 2” pushed onto the Array with rb_ary_push]

Slide 41

Slide 41 text

How to use callbacks: In %after-reduce, “rb_ary_pop” objects then “rb_ary_push” the object to the stack. [Figure: NODE_INTEGER, NODE_INTEGER on the Node stack; “count 1”, “count 2” popped from the Array and the result pushed with rb_ary_push]
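
The two callbacks can be modeled in plain Ruby. This is a sketch of the idea only: the class `TwoStackParser`, its methods, and the `Node` struct are hypothetical, and the real implementation lives in C inside the generated parser, using rb_ary_push and rb_ary_pop.

```ruby
# Hypothetical node type standing in for CRuby's NODE.
Node = Struct.new(:type, :children)

# Toy model: the parser's own stack holds nodes, while a second stack
# (a plain Ruby Array) is maintained from the callbacks.
class TwoStackParser
  attr_reader :nodes, :objects

  def initialize
    @nodes = []
    @objects = []
  end

  def shift(node, object)
    @nodes.push(node)
    after_shift(object)
  end

  # The block plays the role of the grammar action: it sees the
  # right-hand-side values of both stacks and returns the new pair.
  def reduce(len)
    new_node, new_object = yield(@nodes.last(len), @objects.last(len))
    @nodes.pop(len)
    @nodes.push(new_node)
    after_reduce(len, new_object)
  end

  private

  # Models %after-shift: push the object for the shifted token.
  def after_shift(object)
    @objects.push(object)
  end

  # Models %after-reduce: pop the consumed objects, push the result.
  def after_reduce(len, object)
    @objects.pop(len)
    @objects.push(object)
  end
end

parser = TwoStackParser.new
parser.shift(Node.new(:NODE_INTEGER, [1]), "count 1")
parser.shift(:+, :+)
parser.shift(Node.new(:NODE_INTEGER, [2]), "count 5")
parser.reduce(3) do |nodes, _objects|
  [Node.new(:NODE_OPCALL, nodes), "count 6"]
end

p parser.nodes
p parser.objects
```

Both stacks stay the same height, so a semantic check can read the nodes while the handlers still receive ordinary Ruby objects.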

Slide 42

Slide 42 text

How to use $:n variable: We need to access the Object on the stack, like $1. $:n is expanded to a negative index integer: ary[$:1] => ary[-3], ary[$:3] => ary[-1]. [Figure: $1 and $3 point at NODE_INTEGER, NODE_INTEGER on the Node stack; $:1 and $:3 point at “count 1”, “count 2” on the Array]
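
The index arithmetic can be checked quickly. The slide gives the mapping for a rule with three right-hand-side symbols; the general formula `n - rhs_length - 1` and the helper name `stack_index` below are assumptions that match those two data points, not a statement of how Lrama computes it.

```ruby
# Assumed generalization: for a rule with rhs_length symbols,
# $:n maps to the negative index n - rhs_length - 1.
def stack_index(n, rhs_length)
  n - rhs_length - 1
end

# Object stack just before reducing "arg: arg '+' arg".
objects = ["count 1", :+, "count 5"]

left  = objects[stack_index(1, 3)]  # same as objects[-3]
right = objects[stack_index(3, 3)]  # same as objects[-1]

p [left, right]
```

Negative indexes count from the top of the stack, so the lookup works regardless of how many objects sit below the current rule's values.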

Slide 43

Slide 43 text

Bug #10436 is fixed 🎉 https://github.com/ruby/ruby/pull/9923

Slide 44

Slide 44 text

Day 0: Night Cruise at RubyKaigi 2024 by ESM What do you think of recent parse.y ? Best parse.y in the last 10 years

Slide 45

Slide 45 text

Fix other bugs Bug #18988 Bug #20055 Unreported bugs, e.g. omitted warning for “if 1 then end”

Slide 46

Slide 46 text

Fix other problems: Different functions were defined for parser and ripper, because the type of semantic value was different. It was too difficult… https://github.com/ruby/ruby/blob/v3_3_0/parse.y [Figure: Parser (NODE, NODE) vs. Ripper (VALUE, VALUE)]

Slide 47

Slide 47 text

Single name, two implementations: Parser's call_bin_op was different from ripper’s call_bin_op. [Figure: the Parser and Ripper definitions side by side]

Slide 48

Slide 48 text

Only one call_bin_op is needed: Type of $1 is the same in both parser and ripper. VALUE, a Ruby Object, is managed by $:1. [Figure: Parser / Ripper share NODE; only Ripper handles VALUE]

Slide 49

Slide 49 text

Benefits of the re-architecture: Current ripper is a superset of the parser. It’s easy to follow up parser changes, like new syntax. We can maintain Ripper. [Figure: before, Parser and Ripper overlap only partly; after, Ripper contains Parser]

Slide 50

Slide 50 text

Conclusions: If a single stack is not enough, then let's use two stacks.

Slide 51

Slide 51 text

Conclusions It’s fun to hack parser generator !!

Slide 52

Slide 52 text

Conclusions We can change the parser. Only on the Lrama.

Slide 53

Slide 53 text

Conclusions Best parse.y in the last 10 years

Slide 54

Slide 54 text

See you next time at Lrama

Slide 55

Slide 55 text

Thank you !!!

Slide 56

Slide 56 text

“dsl.rb” is fantastic “ext/ripper/tools/dsl.rb” “tools/preproc.rb” requires “dsl.rb” This is my favorite script !! Less than 100 lines but very hacky

Slide 57

Slide 57 text

“dsl.rb” is fantastic !

Slide 58

Slide 58 text

“dsl.rb” is fantastic !!

Slide 59

Slide 59 text

“dsl.rb” is fantastic !!!

Slide 60

Slide 60 text

References “Ruby Parser։ൃ೔ࢽ (18) - Rearchitect Ripper”, February 2024. https://yui- knk.hatenablog.com/entry/2024/02/24/104944

Slide 61

Slide 61 text

Thank you !!!