Slide 1

Slide 1 text

The grand strategy of Ruby Parser May 15, 2024 in RubyKaigi 2024 @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

About me Yuichiro Kaneko yui-knk (GitHub) / spikeolaf (Twitter) Treasure Data Engineering Manager of Applications Backend

Slide 3

Slide 3 text

PR: We are Gold sponsor!

Slide 4

Slide 4 text

TD and Ruby committers: Twitter @nalsh / GitHub @nurse; Twitter @k_tsj / GitHub @k-tsj; Twitter @spikeolaf / GitHub @yui-knk; Twitter @mineroaoki / GitHub @aamine; Twitter @nahi / GitHub @nahi. Applications Backend

Slide 5

Slide 5 text

Attendees from TD: @spikeolaf, @nalsh, @k_tsj, @makimoto, @citystar (GH), @ybiquitous, @a_ksi19, @tomog105

Slide 6

Slide 6 text

About me: Yuichiro Kaneko, yui-knk (GitHub) / spikeolaf (Twitter). CRuby committer, mainly developing the parser generator and the parser. Lrama, LALR(1) parser generator (2023, Ruby 3.3), "The Bison Slayer". Ripper Rearchitecture (2024, Ruby 3.4). Code positions to RNode (2018, Ruby 2.6). RubyVM::AbstractSyntaxTree (2018, Ruby 2.6).

Slide 7

Slide 7 text

May 15th - 17th, 2024 NAHA CULTURAL ARTS THEATER NAHArt, Okinawa, Japan Introduction “What you need for a long journey is not a big bag, but a single song you can hum.” - Snufkin -

Slide 8

Slide 8 text

Parser in Ruby: Converts the input script into an Abstract Syntax Tree. CRuby’s parser is an LALR parser. CRuby uses Lrama (previously GNU Bison) to generate the parser code.
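As a minimal sketch of what "converting the input script into an Abstract Syntax Tree" looks like from Ruby itself, the snippet below uses RubyVM::AbstractSyntaxTree, which is listed on the "About me" slide. It assumes CRuby 2.6 or later; the helper name node_types is made up for this example, and the exact node types printed vary between Ruby versions, so no output is shown.

```ruby
# Parse a small script with CRuby's own parser and inspect the resulting tree.
# Assumes CRuby 2.6+ (RubyVM::AbstractSyntaxTree is CRuby-specific).
source = "1 + 2"
root = RubyVM::AbstractSyntaxTree.parse(source)

# Hypothetical helper: walk the tree depth-first and collect every node type.
def node_types(node, acc = [])
  return acc unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  acc << node.type
  node.children.each { |child| node_types(child, acc) }
  acc
end

puts node_types(root).inspect
puts "lines #{root.first_lineno}..#{root.last_lineno}"
```

Children that are not nodes (symbols, integers, nil) are skipped, which keeps the walk independent of version-specific node layouts.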

Slide 9

Slide 9 text

How to create a parser? Use a parser generator: Lrama (CRuby), Bison (Perl, PHP, PostgreSQL), ANTLR (Hive, Trino). Or write the parser by hand: Go, Rust, C#, Prism.

Slide 10

Slide 10 text

What is a parser generator? A tool that generates a parser from a grammar file, “parse.y”. Diagram: parse.y → Lrama → parse.c

Slide 11

Slide 11 text

“parse.y” Write grammar rule with BNF (Backus–Naur form)

Slide 12

Slide 12 text

The grand strategy of Ruby Parser “Living is not a peaceful thing, you know.” - Snufkin -

Slide 13

Slide 13 text

Grand Strategy > Grand strategy or high strategy is a state's strategy of how means (military and nonmilitary) can be used to advance and achieve national interests in the long-term. https://en.wikipedia.org/wiki/Grand_strategy

Slide 14

Slide 14 text

Interests in the long-term Provide platform for LSP and other tools Provide Universal parser Keep both Ruby grammar and parser to be maintainable

Slide 15

Slide 15 text

Why LR parser is the best? LR parser: Can handle a large range of languages. Major parser algorithm. To be precise, LR-attributed grammar. I believe a grammar that is easy for humans is close to an LR grammar. LL parser: Has less power than an LR parser. PEG: It’s difficult to create an error-tolerant parser, because a rule failure doesn’t imply a parsing failure as it does in context-free grammars.
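The PEG point can be illustrated with a small sketch of ordered choice. The combinators below (lit, seq, choice) are hypothetical helpers written for this example, not part of any library. The first alternative fails halfway through, the parser silently backtracks, and the second alternative succeeds, so the inner failure says nothing about whether the input is wrong.

```ruby
# Minimal PEG-style combinators (hypothetical, for illustration only).
# Each parser takes (input, pos) and returns the new position or nil.
def lit(str)
  ->(input, pos) { input[pos, str.length] == str ? pos + str.length : nil }
end

# Sequence: every part must match, one after another.
def seq(*parsers)
  ->(input, pos) { parsers.reduce(pos) { |p, parser| p && parser.call(input, p) } }
end

# Ordered choice: try alternatives in order, keep the first that matches.
def choice(*parsers)
  lambda do |input, pos|
    result = nil
    parsers.each do |parser|
      result = parser.call(input, pos)
      break if result
    end
    result
  end
end

rule = choice(seq(lit("a"), lit("b")), lit("a"))

# "ab" matches the first alternative; "ac" fails inside it, then falls back.
puts rule.call("ab", 0).inspect
puts rule.call("ac", 0).inspect
puts rule.call("c", 0).inspect
```

Because a failed alternative is a normal event, the parser has no obvious place to report "the error is here", which is one reading of why error tolerance is harder in this style.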

Slide 16

Slide 16 text

Why LR parser generator is the best? LR parser generator gives accurate feedback for grammar BNF is very declarative No gap between grammar and parser implementation LR parser is based on theory of computer science

Slide 17

Slide 17 text

1. Accurate feedback for grammar https://bugs.ruby-lang.org/issues/18080

Slide 18

Slide 18 text

It’s possible to implement > but nobu said it's hard to support because of parse.y limitation. No, it’s possible!! https://github.com/yui-knk/ruby/tree/bugs_18080

Slide 19

Slide 19 text

Need to consider these patterns: There is an argument or not. The arguments are surrounded by parentheses or not. There is a block or not. The symbol of pattern matching, `in` or `=>`.
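A short sketch of why these patterns add up quickly: each of the four choices doubles the number of cases. The hash keys below are labels invented for this example.

```ruby
# Four independent yes/no style choices from the list above
# (key names are hypothetical labels).
axes = {
  argument:    [true, false],
  parentheses: [true, false],
  block:       [true, false],
  pattern_op:  ["in", "=>"],
}

names = axes.keys
first, *rest = axes.values

# Cartesian product of all choices: 2 * 2 * 2 * 2 combinations.
combos = first.product(*rest).map { |values| names.zip(values).to_h }

puts combos.length
combos.each { |combo| puts combo.inspect }
```

A ticket usually shows one or two of these cases, while a grammar change has to be correct for all of them.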

Slide 20

Slide 20 text

Need to consider these patterns: There is one combination which is suspicious.

Slide 21

Slide 21 text

Conflicts with existing grammar: There is no block. The arguments are not surrounded by parentheses. The symbol of pattern matching is `=>`.

Slide 22

Slide 22 text

LR parser generator knows this issue S/R or R/R conflict detection is a friend for programming language designer

Slide 23

Slide 23 text

Why is this issue difficult to detect? Need to check all combinations of grammar rules. Discussion of grammar and implementation of parser are localized.

Slide 24

Slide 24 text

Combination of grammar rules: A lot of rules are optional. Argument is optional. Parentheses around arguments are optional. Block is optional. (The symbol of pattern matching, `in` or `=>`.) Need to discuss grammar rules as a group. E.g. “a == b”, “1 + 2” and “1..2” are in the same “arg” group. If the “arg” rules change, need to consider the impact on “expr” and “stmt” too.

Slide 25

Slide 25 text

Localized discussion & implementation: Examples in a ticket are simple. Parser implementation is a combination of parts. Parser generator: combination of rules. Recursive Descent Parser: combination of functions, e.g. “parse_pattern_matching”, “parse_arguments”.

Slide 26

Slide 26 text

Localized discussion & implementation Localized discussion and implementation are good practice Divide the difficulties However it requires mechanism to integrate these parts LR parser generator has the mechanism, conflict detection Hand written parser doesn’t have such mechanism

Slide 27

Slide 27 text

Ruby grammar evolves: Ruby grammar will change. New syntax will be added. Existing syntax will change based on feedback. Parser generator works as a checker/linter for grammar. We can not keep the soundness of the grammar without help from computer science.

Slide 28

Slide 28 text

2. BNF is very declarative parse_conditional in prism.c

Slide 29

Slide 29 text

BNF is very declarative parse.y: 28 lines Logic for parsing and others are separated prism.c: 127 lines Logic are mixed up Commit 96710a3 (Sat May 4 18:03:52 2024 +0900)

Slide 30

Slide 30 text

3. Gap between grammar and parser With parser generator, parser follows grammar With hand written parser, grammar follows parser implementation

Slide 31

Slide 31 text

Precedence difference The result of this code is different between parse.y and prism How modifier rescue & if is combined is different Commit 68b6fe7 (Sat May 11 02:19:38 2024 +0900)

Slide 32

Slide 32 text

Is this bug or intentional? With parser generator, let’s discuss about grammar With hand written parser, it’s impossible to distinguish parser’s bug from grammar problem

Slide 33

Slide 33 text

4. Based on theory of computer science Many books are published for LR parser, formal language and automaton

Slide 34

Slide 34 text

We can share the knowledge: Joel Denny. “PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020. https://arxiv.org/pdf/1804.07133.pdf Joe Zimmerman. “Practical LR Parser Generation”, Sep 2022. https://arxiv.org/pdf/2209.08383.pdf

Slide 35

Slide 35 text

Guidebooks for beginners: Yuichiro Kaneko. “Ruby Parser開発日誌 (14) - LR parser完全に理解した” [Ruby Parser Development Diary (14): I Completely Understand LR Parsers], December 2023. https://yui-knk.hatenablog.com/entry/2023/12/06/082203 shioimm/coe401_. “たのしいRubyの構文解析ツアー” [A Fun Tour of Ruby's Syntax Analysis], March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua aamine. “Rubyソースコード完全解説” 第2部「構文解析」 [Ruby Hacking Guide, Part 2: Syntax Analysis], July 2004. https://i.loveruby.net/ja/rhg/book/ [JA] https://ruby-hacking-guide.github.io/ [EN]

Slide 36

Slide 36 text

History of LR parser: 1965: Donald E. Knuth invents LR parsing, “On the translation of languages from left to right”. 1975: Yacc is published. 1985: GNU Bison initial release. 1989: Berkeley Yacc initial release. 2006: GCC migrates its parser from Bison to hand-written recursive-descent parsers (C++ was 2004). 2015: Go migrates its parser from Bison to hand-written recursive-descent parsers.

Slide 37

Slide 37 text

History of LR parser 2020: “Don't Panic! Better, Fewer, Syntax Errors for LR Parsers” 2022: “Practical LR Parser Generation” 2023: “The future vision of Ruby Parser” in RubyKaigi 2023 2023: Lrama replaces Bison in CRuby 2023: New era of LR parser !! 2024: “The grand strategy of Ruby Parser” in RubyKaigi 2024

Slide 38

Slide 38 text

What is Lrama? Lrama is an LR parser generator written in Ruby. Lrama (Llama) inherits its name from Yacc (Yak) and Bison. It’s an LR parser generator, so it is not “LL”ama but “LR”ama. Ruby has used Lrama to generate its parser since 3.3.

Slide 39

Slide 39 text

Why Lrama is needed? Bison is not perfect, therefore we hack parse.y. We need more and more features, but it is not easy to add new features to Bison. The Ruby build system depends on the Bison installed on your machine; Lrama is installed into the ruby/ruby tool directory, so we can use the latest features. Bison is difficult to manage: it was broken even though we didn't do anything when we released Ruby 2.7.7. Installing Bison on Windows in particular is not an easy task.

Slide 40

Slide 40 text

Win “16th Fukuoka Ruby Award”

Slide 41

Slide 41 text

Summary The grand strategy of Ruby Parser Long term goals Provide platform for LSP and other tools Provide Universal parser Keep both Ruby grammar and parser to be maintainable Solution LR parser and parser generator are the best friends for Ruby Lrama is new foundation for Ruby parser instead of Bison

Slide 42

Slide 42 text

What’s the challenge? LSP. parse.y for Undergraduate. Universal Parser.

Slide 43

Slide 43 text

LSP “The world really is an interesting place. Everyone believed there was only one way to use a silver tray, and yet there was a completely different, much better way to use it.” - Moominmamma -

Slide 44

Slide 44 text

Many tools need Syntax Tree: Many tools use the Abstract Syntax Tree: LSP, TypeProf, RuboCop … Diagram: LSP, TypeProf, RuboCop → Abstract Syntax Tree

Slide 45

Slide 45 text

User friendly node structure Current CRuby node structure was designed as Object. Therefore some types of node, e.g. NODE_OP_ASGN2, has difficult structure. Need to re-design each node structure to be more easily understandable. More information is needed More location information Comments But there is a blocker of this goal…

Slide 46

Slide 46 text

Delete parser level optimization Before VM and InstructionSequence were introduced, CRuby interpreted AST nodes directly. Therefore some optimizations happen on parser even today. Need to delete such optimizations so that the result of parse keeps the structure of input text. But there is a blocker of this goal too…

Slide 47

Slide 47 text

Union to Struct (Node) Any kinds of nodes share the single struct definition There is no flexibility to add new field to specific type of node It’s not straightforward to cast each field based on node type Need to change data structure from union base struct to dedicated struct for each node
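The difference can be sketched in Ruby with Struct, even though the real change is in C. All names below (GenericNode, u1..u3, IfNode, TrueNode) are hypothetical stand-ins chosen for this illustration: one shared shape whose slots mean different things per node type, versus one dedicated shape per node type.

```ruby
# Before: one generic shape shared by every node type (hypothetical names).
# What u1/u2/u3 mean depends on the value of `type`.
GenericNode = Struct.new(:type, :u1, :u2, :u3)

# After: a dedicated shape per node type (hypothetical names).
IfNode   = Struct.new(:cond, :then_body, :else_body)
TrueNode = Struct.new(:loc)

generic_if = GenericNode.new(:if, :cond_expr, :then_expr, nil)
# The reader must know that, for :if, u1 is the condition.
generic_cond = generic_if.type == :if ? generic_if.u1 : nil

dedicated_if = IfNode.new(:cond_expr, :then_expr, nil)
# The field name documents itself and needs no per-type interpretation.
dedicated_cond = dedicated_if.cond

puts generic_cond == dedicated_cond
# A node with no operands still carries three unused slots in the generic shape.
puts GenericNode.members.length - 1
puts TrueNode.members.length
```

With dedicated shapes, adding a field to one node type does not affect the others, which is the flexibility the slide asks for.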

Slide 48

Slide 48 text

Union to Struct (Node) ✅ This milestone is already achieved on master branch

Slide 49

Slide 49 text

Why is an error-tolerant parser needed? LSP (Language Server Protocol) requires the parser to parse invalid code as far as possible. Just raising a syntax error is not enough in this case.

Slide 50

Slide 50 text

Error tolerance parser CPCT+ algorithm solves the problem Insert/Delete/Shift operations based error recovery Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020 https://arxiv.org/pdf/1804.07133.pdf

Slide 51

Slide 51 text

How CPCT+ works? Insert (or delete) tokens to make the input script syntactically valid. Leverage the fact that an LR parser has a clear automaton structure. Diagram: input “IF … END”; grammar rule “IF expr_value THEN compstmt if_tail END”; repaired “IF expr_value THEN … END” (labels: true, then).
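The idea of repairing input by inserting or deleting tokens can be sketched with a toy search. This is not the CPCT+ algorithm from the paper, which works on the LR automaton state; it is a brute-force breadth-first search over edits, using a tiny made-up grammar and a hand-written recognizer purely to have something to check candidates against. The input `if expr end` is one reading of the diagram, where THEN is missing.

```ruby
# Toy grammar (an assumption for illustration, far smaller than parse.y):
#   stmts : stmt*
#   stmt  : :expr | :if :expr :then stmts :end
TOKENS = %i[if expr then end].freeze

# Returns the position after the longest run of statements starting at pos.
def parse_stmts(tokens, pos)
  loop do
    nxt = parse_stmt(tokens, pos)
    return pos if nxt.nil?
    pos = nxt
  end
end

# Returns the position after one statement, or nil if none starts at pos.
def parse_stmt(tokens, pos)
  case tokens[pos]
  when :expr
    pos + 1
  when :if
    return nil unless tokens[pos + 1] == :expr && tokens[pos + 2] == :then
    body_end = parse_stmts(tokens, pos + 3)
    tokens[body_end] == :end ? body_end + 1 : nil
  end
end

def valid?(tokens)
  parse_stmts(tokens, 0) == tokens.length
end

# Breadth-first search over insert/delete edits; returns every repair
# that uses the smallest number of edits (up to max_edits).
def repairs(tokens, max_edits: 2)
  return [[tokens, []]] if valid?(tokens)
  frontier = [[tokens, []]]
  seen = { tokens => true }
  max_edits.times do
    next_frontier = []
    frontier.each do |toks, edits|
      candidates = []
      (0..toks.length).each do |i|
        TOKENS.each { |t| candidates << [toks.dup.insert(i, t), [:insert, i, t]] }
      end
      toks.each_index do |i|
        candidates << [toks[0...i] + toks[(i + 1)..-1], [:delete, i, toks[i]]]
      end
      candidates.each do |cand, edit|
        next if seen[cand]
        seen[cand] = true
        next_frontier << [cand, edits + [edit]]
      end
    end
    found = next_frontier.select { |toks, _edits| valid?(toks) }
    return found unless found.empty?
    frontier = next_frontier
  end
  []
end

broken = %i[if expr end] # "if true end": the THEN token is missing
repairs(broken).each { |toks, edits| puts "#{edits.inspect} -> #{toks.inspect}" }
```

The brute-force search tries every token at every position. An LR-based approach can instead ask the automaton which tokens are acceptable in the current state, which is the structural advantage the slide refers to.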

Slide 52

Slide 52 text

Error tolerance parser Integration to parse.y Delete operation support Efficient data structure (Cactuses) More accurate recovery

Slide 53

Slide 53 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser.

Slide 54

Slide 54 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Delete parser level optimization; Union to Struct (Node); User friendly node structure; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery. Status markers: ✅ 💪

Slide 55

Slide 55 text

parse.y for Undergraduate “If it doesn't exist anywhere, why not make it yourself? You won't know whether you can do it until you try, will you?” - Snufkin -

Slide 56

Slide 56 text

Rumors about parse.y Monstrous lex_state (2017) Demon Castle parse.y (2017) parse.y is “hell” (2019) The current parse.y is a hell (2021)

Slide 57

Slide 57 text

Gap between theory and practice Practice Theory vs.

Slide 58

Slide 58 text

Gap between theory and practice: It is a one-sided match because the theory only covers one of The Big Five parse.y calamities. Diagram: Theory: LR parser. Practice: LR parser, Lex State, Semantic Analysis, Primitive BNF, Ripper.

Slide 59

Slide 59 text

parse.y for Undergraduate: Developers who have textbook-level knowledge of parser theory can understand parse.y. More declarative parser. More expressive grammar.

Slide 60

Slide 60 text

More declarative parser: “Scanner state update syntax” vs. Semantic Analysis. “Scannerless parser” vs. Lex State. Diagram (Practice): LR parser, Lex State, Semantic Analysis, Primitive BNF, Ripper.

Slide 61

Slide 61 text

Is parse.y declarative? If parse.y is like below, it’s very declarative and easy to understand Grammar rule and logic for class node creation

Slide 62

Slide 62 text

Life is hard Manage local variables tables and contexts (lex_context)

Slide 63

Slide 63 text

Where return can be written: “return” can not be in the class context but can be in method context. Example labels: Invalid return in class/module body (SyntaxError); OK.

Slide 64

Slide 64 text

What is lex_context? The parser manages the current context with lex_context. This logic is written in C in the actions. Example states: in_class: 1, in_def: 0; in_class: 1, in_def: 1.

Slide 65

Slide 65 text

Scanner state update syntax These contexts are managed by push/pop operations They are effective in a rule, e.g. class or method definition Embed state management into grammar rule POC is implemented Want to introduce this feature to Ruby 3.4 https://github.com/ruby/lrama/pull/231
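The push/pop handling of contexts can be modelled with a small stack. This is a sketch under assumptions: LexContext and ContextStack are names invented here, and the real logic lives in C actions inside parse.y. It shows a context that is effective only inside one rule and the check for where “return” is accepted.

```ruby
# Hypothetical model of the context flags shown above (in_class / in_def).
LexContext = Struct.new(:in_class, :in_def)

class ContextStack
  def initialize
    @stack = [LexContext.new(false, false)]
  end

  def current
    @stack.last
  end

  # Push a modified copy of the current context for the duration of the block,
  # then pop it, so the change is only effective inside one rule.
  def with(changes)
    ctx = current.dup
    changes.each { |key, value| ctx[key] = value }
    @stack.push(ctx)
    begin
      yield
    ensure
      @stack.pop
    end
  end

  # "return" is rejected directly in a class body but accepted inside a method.
  def return_allowed?
    current.in_def || !current.in_class
  end
end

contexts = ContextStack.new
results = {}
contexts.with(in_class: true) do
  results[:class_body] = contexts.return_allowed?
  contexts.with(in_def: true) do
    results[:method_in_class] = contexts.return_allowed?
  end
  # After the inner rule ends, the class-body context is restored.
  results[:class_body_again] = contexts.return_allowed?
end
puts results.inspect
```

Embedding this push/pop pairing into the grammar rule itself removes the need to write the matching save and restore by hand in every action.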

Slide 66

Slide 66 text

Monstrous lex_state: Usually a lexer is stateless. Ruby’s lexer has 13 state bits!

Slide 67

Slide 67 text

Why lex_state is needed: In general, a lexer checks the input text in a longest-match manner, otherwise the longer token would never match. E.g. check “||” first, then check “|”.

Slide 68

Slide 68 text

Why lex_state is needed: However, in some cases the shorter token should be returned. “||” in block parameters is two “|” tokens.

Slide 69

Slide 69 text

EXPR_BEG or not: If the lex state is EXPR_BEG then “|” is returned, otherwise “||” is returned. There are a lot of conditional branches based on lex state. Too complicated. Diagram: check lex state, then return “|” or “||”.
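The branch described above can be sketched as a single function. The name lex_pipe and the boolean expr_beg parameter are simplifications made up for this example; the real lexer tracks many more state bits.

```ruby
# Returns [token, next_position] for a pipe at src[pos], or nil if there is none.
# expr_beg stands in for "the lex state is EXPR_BEG" (a simplification).
def lex_pipe(src, pos, expr_beg)
  return nil unless src[pos] == "|"
  if !expr_beg && src[pos, 2] == "||"
    # Longest match: treat "||" as one operator token.
    ["||", pos + 2]
  else
    # In EXPR_BEG, return the shorter token so "||" becomes two "|" tokens.
    ["|", pos + 1]
  end
end

puts lex_pipe("||", 0, true).inspect
puts lex_pipe("||", 0, false).inspect
puts lex_pipe("|x|", 0, false).inspect
```

The same two characters yield different tokens depending on a flag that the parser has to keep in sync with the lexer, which is the coupling the following slides want to remove.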

Slide 70

Slide 70 text

Scannerless parser Is this explicit communication between parser and lexer really needed? Parser knows current situation Parser knows “||” is not accepted after “do”

Slide 71

Slide 71 text

PSLR(1): It seems a good idea to integrate the parser and lexer, then manage states on the parser side. Joel E. Denny. “PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations

Slide 72

Slide 72 text

PSLR (1) > Nevertheless, traditional scanner and parser generators attempt to generate loosely coupled scanners and parsers, so the user must maintain these tightly coupled scanner and parser specifications separately but consistently. > Scanner and parser specifications would be significantly more maintainable if all sub-language transitions were instead computed from a grammar by a parser generator and recognized automatically by the scanner using the parser’s stack.

Slide 73

Slide 73 text

IELR PSLR is an extension of IELR Both PSLR and IELR are invented by Joel E. Denny IELR is more powerful than LALR

Slide 74

Slide 74 text

Day 3: 14:10 - 14:40

Slide 75

Slide 75 text

More expressive grammar: “Parameterizing rules” vs. Primitive BNF. Diagram (Practice): LR parser, Lex State, Semantic Analysis, Primitive BNF, Ripper.

Slide 76

Slide 76 text

Primitive BNF In parse.y, we need to write all grammar rules by hand There is “common pattern” in grammar rules e.g. Optional part, list No way to abstract it

Slide 77

Slide 77 text

Parameterizing rules If you are familiar with parser generator, Menhir in OCaml is a good example
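A sketch of the idea behind parameterizing rules: a common pattern such as "optional X" or "list of X" is written once as a template and expanded into primitive BNF. The template names, the generated rule names and the expand/to_bnf helpers below are all hypothetical; this is not Lrama's or Menhir's actual syntax or implementation.

```ruby
# Hypothetical rule templates: each takes a symbol name and returns the
# primitive rules it stands for, as { lhs => [alternative, ...] }.
TEMPLATES = {
  option: ->(x) { { "option_#{x}" => [[], [x]] } },
  list:   ->(x) { { "list_#{x}" => [[], ["list_#{x}", x]] } },
}.freeze

def expand(template, symbol)
  TEMPLATES.fetch(template).call(symbol)
end

# Render rules as BNF text; an empty alternative is written as %empty.
def to_bnf(rules)
  rules.map do |lhs, alternatives|
    rhs = alternatives.map { |alt| alt.empty? ? "%empty" : alt.join(" ") }
    "#{lhs}: #{rhs.join(' | ')}"
  end
end

puts to_bnf(expand(:option, "block"))
puts to_bnf(expand(:list, "stmt"))
```

Without such templates, every optional part and every list in the grammar needs its own hand-written pair of alternatives.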

Slide 78

Slide 78 text

Day 2: 11:30 - 12:00

Slide 79

Slide 79 text

Refactoring Ripper ✅: “User define stack” vs. Ripper. Diagram (Practice): LR parser, Lex State, Semantic Analysis, Primitive BNF, Ripper.

Slide 80

Slide 80 text

Day 2: 17:20 - 18:20 (LT)

Slide 81

Slide 81 text

Replace hand written parser with Racc ✅: Need to add new features to Lrama. Need to expand Lrama grammar. Parser generator is the best friend. https://github.com/ruby/lrama/pull/62

Slide 82

Slide 82 text

Summary: Defeat calamities by more powerful theory, abstraction and refactoring. Diagram (Practice): LR parser, Lex State, Semantic Analysis, Primitive BNF, Ripper. Countermeasures: Scanner state update syntax, PSLR / IELR, Parameterizing rules, Refactoring Ripper.

Slide 83

Slide 83 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Delete parser level optimization; Union to Struct (Node); User friendly node structure; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery. Status markers: ✅ 💪

Slide 84

Slide 84 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Refactoring Ripper; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR. Status markers: ✅ ✅ ✅ ✅ 💪 💪 💪 💪

Slide 85

Slide 85 text

Deep forest - parse.y - yui-knk Lv. 5 HP 122 MP 16 ydah Lv. 5 HP 114 MP 20 junk0612 Lv. 5 HP 140 MP 19 ydah & junk0612 join to the party 🎉

Slide 86

Slide 86 text

Universal Parser “Starting things in the proper order is as important as a book starting from its first line. Everything is decided by that.” - Moominpappa -

Slide 87

Slide 87 text

Universal Parser: Everyone wants to use the CRuby parser. mruby, PicoRuby: other Ruby implementations in C. JRuby, TruffleRuby, ruruby: other Ruby implementations not in C. Implementing a 100% compatible Ruby parser is a bit difficult. Managing a parser for each version is difficult.

Slide 88

Slide 88 text

What’s the challenge? The CRuby parser depends on other CRuby functionalities!!! Diagram: lexer & parser depend on GC, RString, RArray, RHash, …, rb_mRubyVMFrozenCore, struct rb_iseq_struct * (Ruby).

Slide 89

Slide 89 text

Decouple AST from imemo Ruby AST structure is managed by GC as imemo object imemo is “Internal memo object” managed by GC Ruby’s GC is useful. It frees memory which is not used anymore Before this goal, it needs to remove objects from nodes Objects on nodes are GC marked via AST structure

Slide 90

Slide 90 text

Dependencies of GC mark “str” :sym

Slide 91

Slide 91 text

Remove Object from Node Many kinds of nodes refer to Ruby Object MATCH, LIT, STR, DSTR, XSTR, DXSTR, DREGX, DSYM This challenge depends on “Refactoring Ripper” challenge Node RIPPER and RIPPER_VALUES also refer to Ruby Object

Slide 92

Slide 92 text

Both of them are resolved

Slide 93

Slide 93 text

Day 2: 14:10 - 14:40

Slide 94

Slide 94 text

Day 2: 17:20 - 18:20 (LT)

Slide 95

Slide 95 text

The rest of the work: ID (an ID is a number having a one-to-one association with a string). Warnings, error messages (rb_str_vcatf).
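The one-to-one association between a string and a number can be sketched as a small interning table. IdTable is a hypothetical class written for this illustration; it is not CRuby's implementation of ID.

```ruby
# Hypothetical interning table: each distinct string gets exactly one number,
# and each number maps back to exactly one string.
class IdTable
  def initialize
    @ids = {}    # name => id
    @names = []  # id   => name
  end

  # Returns the existing id for name, or assigns the next free one.
  def intern(name)
    @ids[name] ||= begin
      @names << name.dup.freeze
      @names.length - 1
    end
  end

  def name(id)
    @names.fetch(id)
  end
end

table = IdTable.new
foo = table.intern("foo")
bar = table.intern("bar")

puts foo == table.intern("foo") # same string, same number
puts foo == bar                 # different strings, different numbers
puts table.name(bar)
```

A parser that should run outside CRuby needs its own table of this kind, or a hook that lets the host provide one.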

Slide 96

Slide 96 text

Roadmap diagram. Areas: LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Refactoring Ripper; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR. Status markers: ✅ ✅ ✅ ✅ 💪 💪 💪 💪

Slide 97

Slide 97 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪

Slide 98

Slide 98 text

Dark cave - Universal Parser - yui-knk Lv. 23 HP 394 MP 52 ydah Lv. 23 HP 361 MP 51 junk0612 Lv. 23 HP 340 MP 56 hasumikin Lv. 22 HP 355 MP 54 S-H-GAMELINKS Lv. 22 HP 425 MP 46 hasumikin & S-H-GAMELINKS join to the party 🎉

Slide 99

Slide 99 text

Other challenges “Whenever you try something out, danger inevitably comes with it.” - Snufkin -

Slide 100

Slide 100 text

Optimize Node memory management ✅: In the past all types of nodes used a single struct. Some nodes, e.g. NODE_TRUE, allocated more memory than needed. This issue was solved when “Union to Struct” was achieved.

Slide 101

Slide 101 text

RBS Adding RBS files to Lrama parser generator so that the implementation is more clear Led by Yla Aioi (Little-Rubyist) https://github.com/ruby/lrama/pull/417

Slide 102

Slide 102 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪

Slide 103

Slide 103 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Optimize Node memory management; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR; RBS. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪

Slide 104

Slide 104 text

The End Of Time yui-knk Lv. 31 HP 562 MP 68 ydah Lv. 30 HP 514 MP 67 junk0612 Lv. 31 HP 578 MP 64 hasumikin Lv. 29 HP 448 MP 68 S-H-GAMELINKS Lv. 28 HP 565 MP 60 Little-Rubyist Lv. 28 HP 442 MP 66 Little-Rubyist joins to the party 🎉

Slide 106

Slide 106 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Optimize Node memory management; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR; RBS. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪

Slide 107

Slide 107 text

Is it really a “Universal” Parser? The current Universal Parser is implemented in C. Each programming language needs to integrate it. Diagram: parser implemented in C; consumers: C, Java, JavaScript, other languages; integration paths: dynamic link, JNI/JNA, FFI, wasm.

Slide 108

Slide 108 text

True Universal Parser: A higher-order Universal Parser will provide a parser written in each programming language. Diagram: Grammar → Parser Generator → generates Parser (C), Parser (Java), Parser (JavaScript), Parser (other languages).

Slide 109

Slide 109 text

Ruby parser by Ruby: We can use the latest Ruby syntax in *.rbinc source files if parse.y can be transformed to a Ruby parser. Diagram (current): array.rb → mk_builtin_loader.rb (ripper) + baseruby → array.rbinc (C file). Diagram (proposed): parse.y → Lrama + baseruby → parse.rb; array.rb → parse.rb + baseruby → array.rbinc (C file).

Slide 110

Slide 110 text

Programming language for action: Remove C code from actions. Replace it with a simple new language.

Slide 111

Slide 111 text

Type inference for action: It is obvious that “$$” in the action has the same type as “tail”. The type of “tail” is the same as “block_args_tail”. The type of “block_args_tail” is “node_args”.
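The chain of reasoning above can be written as a lookup that follows "same type as" links until it reaches a declared type. The two tables and the infer_type helper are hypothetical and only encode the three facts stated on the slide.

```ruby
# Declared types of grammar symbols (only the fact stated on the slide).
DECLARED_TYPES = { "block_args_tail" => "node_args" }.freeze

# "X has the same type as Y" links (also taken from the slide).
SAME_TYPE_AS = { "$$" => "tail", "tail" => "block_args_tail" }.freeze

# Follow the links until a declared type is found; nil if none is reachable.
# `seen` guards against cycles in the links.
def infer_type(name, seen = [])
  return DECLARED_TYPES[name] if DECLARED_TYPES.key?(name)
  return nil if seen.include?(name) || !SAME_TYPE_AS.key?(name)
  infer_type(SAME_TYPE_AS[name], seen + [name])
end

puts infer_type("$$")
puts infer_type("tail")
puts infer_type("unknown").inspect
```

With such inference, the type of “$$” would not need to be spelled out in the grammar file for rules like this one.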

Slide 112

Slide 112 text

Conclusion “Friends whose hearts are connected are a ruby more beautiful than any ruby.” - Snufkin -

Slide 113

Slide 113 text

The grand strategy of Ruby Parser The grand strategy of Ruby Parser Long term goals Provide platform for LSP and other tools Provide Universal parser Keep both Ruby grammar and parser to be maintainable Solution LR parser and parser generator are the best approach for Ruby

Slide 114

Slide 114 text

Development cycle with parser generator: Designer can focus on grammar. Parser generator gives correct feedback. Parser generator evolves independently from grammar. Diagram: the Programming Language Designer designs the Grammar; the Grammar is input to the Parser Generator, which generates the Parser and gives feedback; new parser generator features are developed: Cactuses data structures, Compact data structures, Panic Mode, CPCT+, LALR, IELR, PSLR.

Slide 115

Slide 115 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser.

Slide 116

Slide 116 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Optimize Node memory management; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR; RBS.

Slide 117

Slide 117 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Optimize Node memory management; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR; RBS. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪 💪

Slide 118

Slide 118 text

Roadmap diagram. Areas: Universal Parser; LSP; parse.y for Undergraduate; Error tolerance; Parser Generator (Lrama); Parser. Tasks: Decouple AST from imemo; Remove Object from Node; Refactoring Ripper; Optimize Node memory management; Delete parser level optimization; Union to Struct (Node); User friendly node structure; More declarative parser; Efficient data structure (Cactuses); Delete operation support; Integration to parse.y; More accurate recovery; Parameterizing rules; Replace hand written parser with Racc; User defined stack; Scanner state update syntax; Scannerless parser; IELR; RBS. Status markers: ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪 💪

Slide 119

Slide 119 text

Ruby 3.4 Universal Parser Fine-grained Error Tolerance parser Compatibility layer for Prism interfaces

Slide 120

Slide 120 text

Need your help ! LSP milestones Compatibility layer for Prism interfaces

Slide 121

Slide 121 text

Acknowledgements @nobu, @nurse and other committers ESM, Inc. Parser Club Dragon Book study group

Slide 122

Slide 122 text

Acknowledgements Contributors and supporters for Lrama and parse.y Junichi Kobayashi (@junk0612) Yudai Takada (@ydah_) Hitoshi HASUMI (@hasumikin) S.H. (@S-H-GAMELINKS) Yla Aioi (@Little_Rubyist)

Slide 123

Slide 123 text

References: Lrama, LALR(1) parser generator. https://github.com/ruby/lrama Yuichiro Kaneko. “Ruby Parser Roadmap”. https://docs.google.com/presentation/d/1E4v9WPHBLjtvkN7QqulHPGJzKkwIweVfcaMsIQ984_Q Yuichiro Kaneko. “Ruby Parser開発日誌 (12) - LR parser generatorの提供する文法の健全性” [Ruby Parser Development Diary (12): The Soundness of Grammar Provided by LR Parser Generators], September 2023. https://yui-knk.hatenablog.com/entry/2023/09/19/191135 Yuichiro Kaneko. “Ruby Parser開発日誌 (14) - LR parser完全に理解した” [Ruby Parser Development Diary (14): I Completely Understand LR Parsers], December 2023. https://yui-knk.hatenablog.com/entry/2023/12/06/082203 Lukas Diekmann and Laurence Tratt. “Don’t Panic! Better, Fewer, Syntax Errors for LR Parsers”, July 2020. https://arxiv.org/pdf/1804.07133.pdf Joel E. Denny. “PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages”, May 2010. https://tigerprints.clemson.edu/cgi/viewcontent.cgi?article=1519&context=all_dissertations

Slide 124

Slide 124 text

Thank you !!!