Slide 1

Slide 1 text

Ruby Parser progress report 2024 August 31, 2024 in RubyKaigi 2024 follow up @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

About me Yuichiro Kaneko yui-knk (GitHub) / spikeolaf (Twitter) Treasure Data Engineering Manager of Applications Backend CRuby committer, mainly develop parser generator and parser Lrama LALR (1) parser generator (2023, Ruby 3.3) Love LR parser

Slide 3

Slide 3 text

May 15th - 17th, 2024 NAHA CULTURAL ARTS THEATER NAHArt, Okinawa, Japan The Bison Slayer The parser monster Lord of Dawn of the parser world (Parser界の黎明卿) "The world is now in the great age of parsers. People are setting sail into the vast sea of parsers." - RubyKaigi 2023 LT - Yuichiro Kaneko https://twitter.com/kakutani/status/1657762294431105025/ NEW !!! NEW !!!

Slide 4

Slide 4 text

May 15th - 17th, 2024 NAHA CULTURAL ARTS THEATER NAHArt, Okinawa, Japan

Slide 5

Slide 5 text

The grand strategy of Ruby Parser
Long term goals: Provide a platform for LSP and other tools. Provide a Universal Parser. Keep both the Ruby grammar and the parser maintainable.
Solution: An LR parser and a parser generator are the best approach for Ruby.

Slide 6

Slide 6 text

Universal Parser Decouple AST from imemo Remove Object from Node Refactoring Ripper LSP Optimize Node memory management Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR RBS Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪 💪

Slide 7

Slide 7 text

Universal Parser Decouple AST from imemo Remove Object from Node Refactoring Ripper LSP Optimize Node memory management Delete parser level optimization Union to Struct (Node) User friendly node structure parse.y for Undergraduate More declarative parser Efficient data structure (Cactuses) Delete operation support Integration to parse.y More accurate recovery Parameterizing rules Replace hand written parser with Racc User defined stack Scanner state update syntax Scannerless parser IELR RBS Error tolerance Parser Generator (Lrama) Parser ✅ ✅ ✅ ✅ ✅ ✅ ✅ 💪 💪 💪 💪 💪 💪 💪

Slide 8

Slide 8 text

Ruby 3.4 Universal Parser Fine-grained Error Tolerance parser Compatibility layer for Prism interfaces

Slide 9

Slide 9 text

yui-knk/ast_to_prism https://github.com/yui-knk/ast_to_prism

Slide 10

Slide 10 text

What is a syntax tree!?
How does it differ from the existing syntax tree?
How did the parser gem handle this??
Is there no such thing as a "truth" of syntax trees???

Slide 11

Slide 11 text

In search of the truth of syntax trees
Collect use cases.
Think through a design.
Evaluate how well the design covers the use cases.

Slide 12

Slide 12 text

https://yui-knk.hatenablog.com/entry/2024/08/23/113543 "Over 30,000 characters. Couldn't this be made into a book?" https://x.com/inao/status/1827130119665938491

Slide 13

Slide 13 text

https://yui-knk.hatenablog.com/entry/2024/08/23/113543

Slide 14

Slide 14 text

Syntax tree use cases and design in 10 minutes

Slide 15

Slide 15 text

Use cases for syntax trees
Running code: compile.c, type systems
Analyzing code: LSP (ruby-lsp), Linter & Code Formatter (RuboCop)

Slide 16

Slide 16 text

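Running code
Nodes with the same meaning should be unified into one: if, modifier if, and the ternary operator's if => the same Node.
The syntax tree is traversed from top to bottom.
Elements unrelated to the meaning of the code are unnecessary: whitespace, newlines, and parentheses are not needed in the syntax tree.
That said, the tree has been extended as needed: UNLESS_NODE was added (since Ruby 2.5), and detailed location information was added (since Ruby 2.5).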

Slide 17

Slide 17 text

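Analyzing code
Token information is needed (Syntax Highlight).
Comments need to be analyzed (LSP DocumentLink).
The syntax tree needs to be traversed from child to parent (LSP SelectionRange).
Code needs to be rewritten (LSP & Code Formatter).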

Slide 18

Slide 18 text

Traversing from child to parent
LSP SelectionRange: a feature that changes the selected range of code.

Slide 19

Slide 19 text

Traversing from child to parent
Diagram labels: NODE_CLASS, NODE_DEF, obj.m1

Slide 20

Slide 20 text

Rewriting code
There is a Cop called Style::IfInsideElse. It turns an if nested inside an else into an elsif of the outer if.
https://github.com/rubocop/rubocop/blob/v1.65.1/lib/rubocop/cop/style/if_inside_else.rb#L10-L29

Slide 21

Slide 21 text

RuboCop's rewrite

Original code:
    if condition_a
      action_a
    else
      if condition_b
        action_b
      else
        action_c
      end
    end

1. Change else to elsif:
    if condition_a
      action_a
    elsif condition_b
      if condition_b
        action_b
      else
        action_c
      end
    end

2. Delete if condition_b:
    if condition_a
      action_a
    elsif condition_b
      action_b
      action_b
    else
      action_c
    end
    end

3. Delete the extra end:
    if condition_a
      action_a
    elsif condition_b
      action_b
      action_b
    else
      action_c
    end

4. Delete the duplicated action_b:
    if condition_a
      action_a
    elsif condition_b
      action_b
    else
      action_c
    end

Slide 22

Slide 22 text

Problems with TreeRewriter #1
The implementation is complex.
TreeRewriter does not rewrite the string directly. It creates TreeRewriter::Action instances and applies all the changes in one go at the end.
    Action :replace (2, 0)-(2, 4) "elsif condition_b"
    Action :replace (3, 2)-(3, 16) "action_b"
    Action :replace (7, 0)-(7, 6) ""
    Action :replace (4, 0)-(4, 13) ""
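The batching idea on this slide can be sketched in plain Ruby. This is a minimal sketch, not the parser gem's actual TreeRewriter implementation: `Action`, `offset_of`, and `apply_actions` are hypothetical names, and the four (line, column) ranges are the ones from the slide, read here as 0-based lines whose end column may include the newline.

```ruby
# Hypothetical stand-in for a rewrite action:
# replace source[begin_pos...end_pos] with `replacement`.
Action = Struct.new(:begin_pos, :end_pos, :replacement)

# Convert a 0-based (line, column) pair into a character offset.
def offset_of(source, line, column)
  source.lines.first(line).sum(&:length) + column
end

# Apply every action at once, from the back of the buffer to the front,
# so that a replacement never shifts the offsets of actions still pending.
def apply_actions(source, actions)
  result = source.dup
  actions.sort_by { |a| -a.begin_pos }.each do |a|
    result[a.begin_pos...a.end_pos] = a.replacement
  end
  result
end

source = <<~RUBY
  if condition_a
    action_a
  else
    if condition_b
      action_b
    else
      action_c
    end
  end
RUBY

actions = [
  [[2, 0], [2, 4],  "elsif condition_b"],
  [[3, 2], [3, 16], "action_b"],
  [[7, 0], [7, 6],  ""],
  [[4, 0], [4, 13], ""],
].map do |(bl, bc), (el, ec), text|
  Action.new(offset_of(source, bl, bc), offset_of(source, el, ec), text)
end

rewritten = apply_actions(source, actions)
puts rewritten
```

In this sketch the `else` branch keeps its old indentation, because none of the four actions touches it; fixing the layout would need further actions.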

Slide 23

Slide 23 text

Why Actions are used #1
Rewriting the string on every change is expensive. In both cases below, the string after else has to be moved (copied).

Deleting else:
    if condition_a
      action_a
    else
      action_b
    end
becomes
    if condition_a
      action_a
      action_b
    end

Replacing else with elsif:
    if condition_a
      action_a
    else
      action_b
    end
becomes
    if condition_a
      action_a
    elsif
      action_b
    end

Slide 24

Slide 24 text

Why Actions are used #2
Rewriting the string directly affects other nodes.

Parser::Source::Buffer before:
    if condition_a
      action_a
    else
      action_b
    end

Parser::Source::Buffer after deleting else:
    if condition_a
      action_a
      action_b
    end

NODE_VCALL action_b still holds Range (3, 2)-(3, 10).
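The stale-range problem on this slide can be reproduced with a few lines of Ruby. This is an illustrative sketch only: `NodeRange` and `text_at` are hypothetical helpers, not part of the parser gem, and the range is the (3, 2)-(3, 10) from the slide, read as a 0-based line with a half-open column range.

```ruby
# A node remembers where its text lives: 0-based line, half-open column range.
NodeRange = Struct.new(:line, :begin_col, :end_col)

# Look up the text a range currently points at in the given buffer.
def text_at(source, range)
  source.lines(chomp: true)[range.line][range.begin_col...range.end_col]
end

source = "if condition_a\n  action_a\nelse\n  action_b\nend\n"
action_b_range = NodeRange.new(3, 2, 10) # Range (3, 2)-(3, 10) from the slide

text_before = text_at(source, action_b_range)

# Rewrite the buffer directly: delete the `else` line.
edited = source.sub("else\n", "")

# The node's range was not updated, so it now points at the wrong text.
text_after = text_at(edited, action_b_range)

puts text_before.inspect
puts text_after.inspect
```

Deferring the edits as Actions avoids this, since the buffer that the ranges refer to stays unchanged until all rewrites have been collected.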

Slide 25

Slide 25 text

Problems with TreeRewriter #2
The operations needed for a rewrite are cumbersome. At each operation you have to understand what state the code is currently in.
The slide repeats the rewrite sequence from Slide 21:
1. Change else to elsif
2. Delete if condition_b
3. Delete the extra end
4. Delete the duplicated action_b

Slide 26

Slide 26 text

Let's rewrite the syntax tree!
We want to take advantage of the structure of the syntax tree: turn NODE_IF into NODE_ELSIF and delete NODE_ELSE.

Before:
    NODE_IF
      condition_a
      action_a
      NODE_ELSE
        NODE_IF
          condition_b
          action_b
          NODE_ELSE
            action_c

After:
    NODE_IF
      condition_a
      action_a
      NODE_ELSIF
        condition_b
        action_b
        action_c

Slide 27

Slide 27 text

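From syntax tree to source code
Once the syntax tree has been rewritten, all that is left is to generate source code.
However, an AST lacks some information, such as whitespace and newlines.
    NODE_IF
      condition_a
      action_a
      NODE_ELSIF
        condition_b
        action_b
        action_c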

Slide 28

Slide 28 text

Did you think it would be easy?

Slide 29

Slide 29 text

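Question 1. Provide location information
A Range has to be passed when a Node instance is created.
Compute the location information of the new code (shown in gray on the slide).

Before:
    if condition_a
      action_a
    else
      if condition_b
        action_b
      else
        action_c
      end
    end

After:
    if condition_a
      action_a
    elsif condition_b
      action_b
    else
      action_c
    end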

Slide 30

Slide 30 text

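Question 1. Provide location information
This is cumbersome because you have to work out the location information while picturing the code that will be generated. And that after going to the trouble of switching to syntax tree rewriting!
(The slide shows the same before/after code as the previous slide.)
The start position is the position of else.
The end position (line) is "the line of action_c - 1".
The end position (column) is "the end of action_c - 2".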

Slide 31

Slide 31 text

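Question 2. Fix the locations of the other Nodes
The location information of every node after the modified subtree changes.
Sibling elements of the IF NODE of interest are affected as well.
Diagram labels: NODE_CLASS, NODE_DEF, NODE_IF, NODE_DEF, expr1, expr2, expr2, expr2, condition_a, action_a, NODE_ELSIF, condition_b, action_b, action_c. Updated!!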

Slide 32

Slide 32 text

Migration Tool
Rewriting code is not only a job for LSP and Code Formatters. Migration tools such as Transpec also need to rewrite code.
http://yujinakayama.me/transpec/

Slide 33

Slide 33 text

Design and implementation of syntax trees

Slide 34

Slide 34 text

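Concrete syntax tree: restoring the code
Concrete Syntax Tree (CST)
A syntax tree that also keeps information an AST loses, such as parentheses.
Whereas an AST focuses on meaning (Semantics), a CST focuses on syntax (Syntax).
Characteristics:
Introduce a data structure that represents a token.
Attach the information that the lexer would otherwise drop to tokens.
Have nodes hold tokens.
Make it possible to restore the original code completely from the syntax tree.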

Slide 35

Slide 35 text

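Concrete syntax tree: restoring the code
Concrete Syntax Tree (CST)
A syntax tree that also keeps information an AST loses, such as parentheses.
Whereas an AST focuses on meaning (Semantics), a CST focuses on syntax (Syntax).
Characteristics:
Introduce a data structure that represents a token.
Attach the information that the lexer would otherwise drop to tokens.
Have nodes hold tokens.
Make it possible to restore the original code completely from the syntax tree.
In short: a syntax tree that holds complete information about the source code.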

Slide 36

Slide 36 text

Trivia
The information that the Lexer drops is called Trivia.
Whitespace, newlines, comments and the like are Trivia.
Diagram labels: Trivia (comment), Trivia (spaces), Trivia (new line)

Slide 37

Slide 37 text

Node, Token and Trivia
A token has Trivia before and after it.
A node has nodes / tokens.
Legend: Token, NODE, Trivia
Diagram: NODE_IF with children IF, cond, action_a, END; Trivia: space (1), NL (1) + space (2), NL (1)

Slide 38

Slide 38 text

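From syntax tree to code
Dumping the strings while doing a depth-first traversal yields the original code.
Legend: Token, NODE, Trivia
Diagram: NODE_IF with children IF, cond, action_a, END; Trivia: space (1), NL (1) + space (2), NL (1)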
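The depth-first dump can be sketched as follows. This is a minimal sketch under stated assumptions: `Token` and `Node` are hypothetical structs, `cond` and `action_a` are modeled as plain tokens for brevity, and which side of a token each piece of trivia attaches to is a choice made here, not something the slide specifies.

```ruby
# A token keeps the trivia (whitespace, newlines, comments) around its text.
Token = Struct.new(:text, :leading, :trailing) do
  def to_source
    "#{leading}#{text}#{trailing}"
  end
end

# A node only holds child nodes and tokens, in source order.
Node = Struct.new(:type, :children) do
  # Depth-first traversal: concatenate what every child dumps.
  def to_source
    children.map(&:to_source).join
  end
end

tree = Node.new(:NODE_IF, [
  Token.new("if", "", " "),          # trailing trivia: space (1)
  Token.new("cond", "", ""),
  Token.new("action_a", "\n  ", ""), # leading trivia: NL (1) + space (2)
  Token.new("end", "\n", ""),        # leading trivia: NL (1)
])

puts tree.to_source
```

Because every character of the input lives in exactly one token or trivia string, no separate pretty-printer is needed to get the source back.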

Slide 39

Slide 39 text

Red Green Tree: a tree that is easy to edit
Invented for C# (Roslyn).
Also used in Swift (SwiftSyntax) and rust-analyzer (LSP).
The syntax tree is represented with two data structures, Red Nodes and Green Nodes.
Let's read swift-syntax! https://github.com/swiftlang/swift-syntax

Slide 40

Slide 40 text

Red Green Tree
Green Node: holds references to its children; holds its own width.
Red Node: holds a reference to its parent; holds location information (offset).
Legend: Token, Green NODE, Red NODE
Green: NODE_IF width: 90, IF width: 3, condition_a width: 11, action_a width: 11, NODE_ELSE width: 61, ELSE width: 5, NODE_IF width: 56, END width: 4
Red: NODE_IF offset: 0, NODE_ELSE offset: 25, NODE_IF offset: 30
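How red nodes can derive offsets from green widths is sketched below. This is a simplified illustration, not the Roslyn or swift-syntax implementation: `GreenToken`, `GreenNode`, `RedNode`, and `each_red` are hypothetical names, and the widths are the ones shown on the slides (the inner NODE_IF's token widths are taken from the later width slide).

```ruby
# Green elements know only their width and their children, never their position.
GreenToken = Struct.new(:kind, :width)
GreenNode = Struct.new(:kind, :children) do
  def width
    children.sum(&:width)
  end
end

# A red node wraps a green element and adds a parent link and an absolute offset.
class RedNode
  attr_reader :green, :parent, :offset

  def initialize(green, parent = nil, offset = 0)
    @green = green
    @parent = parent
    @offset = offset
  end

  # Red children are created on demand; each offset is the parent's offset
  # plus the widths of the preceding green siblings.
  def children
    return [] unless green.respond_to?(:children)
    pos = offset
    green.children.map do |child|
      red = RedNode.new(child, self, pos)
      pos += child.width
      red
    end
  end
end

# Visit a red node and all of its descendants depth-first.
def each_red(red, &block)
  block.call(red)
  red.children.each { |child| each_red(child, &block) }
end

inner_else = GreenNode.new(:NODE_ELSE, [GreenToken.new(:ELSE, 7), GreenToken.new(:action_c, 13)])
inner_if = GreenNode.new(:NODE_IF, [
  GreenToken.new(:IF, 5), GreenToken.new(:condition_b, 11),
  GreenToken.new(:action_b, 13), inner_else, GreenToken.new(:END, 7),
])
outer_else = GreenNode.new(:NODE_ELSE, [GreenToken.new(:ELSE, 5), inner_if])
root = GreenNode.new(:NODE_IF, [
  GreenToken.new(:IF, 3), GreenToken.new(:condition_a, 11),
  GreenToken.new(:action_a, 11), outer_else, GreenToken.new(:END, 4),
])

offsets = []
each_red(RedNode.new(root)) { |red| offsets << [red.green.kind, red.offset] }
offsets.each { |kind, offset| puts "#{kind} offset: #{offset}" }
```

Since offsets are computed while walking down from the root, nothing positional has to be stored in the green tree itself.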

Slide 41

Slide 41 text

When elements store offsets
A change to a node/token affects every element that follows it.
Legend: affected elements
NODE_IF offset: 0, IF offset: 0, condition_a offset: 3, action_a offset: 14, NODE_ELSE offset: 25, ELSE offset: 25, NODE_IF offset: 30, IF offset: 30, condition_b offset: 35, action_b offset: 46 -> 47, NODE_ELSE offset: 59, ELSE offset: 59, action_c offset: 66, END offset: 79, END offset: 86. Updated!!

Slide 42

Slide 42 text

When elements store widths
The impact of a change to a node/token is limited to its parent elements, and their widths can be computed from their children.
Legend: affected elements
NODE_IF width: 90 -> 91, IF width: 3, condition_a width: 11, action_a width: 11, NODE_ELSE width: 61 -> 62, ELSE width: 5, NODE_IF width: 56 -> 57, IF width: 5, condition_b width: 11, action_b width: 13 -> 14, NODE_ELSE width: 20, ELSE width: 7, action_c width: 13, END width: 7, END width: 4. Updated!!
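The width-only update can be sketched with an immutable green tree. This is a minimal sketch under stated assumptions: `GreenToken`, `GreenNode`, and `replace_child` are hypothetical names, and the widths are the ones on the slide. Replacing `action_b` (13 -> 14) rebuilds only the nodes on the path to the root, and every untouched subtree is shared between the old and new trees.

```ruby
GreenToken = Struct.new(:kind, :width)

# An immutable green node: its width is computed once from its children.
class GreenNode
  attr_reader :kind, :children, :width

  def initialize(kind, children)
    @kind = kind
    @children = children.dup.freeze
    @width = @children.sum(&:width)
    freeze
  end

  # Return a new node with one child swapped; all other children are reused.
  def replace_child(index, new_child)
    new_children = children.dup
    new_children[index] = new_child
    GreenNode.new(kind, new_children)
  end
end

inner_else = GreenNode.new(:NODE_ELSE, [GreenToken.new(:ELSE, 7), GreenToken.new(:action_c, 13)])
inner_if = GreenNode.new(:NODE_IF, [
  GreenToken.new(:IF, 5), GreenToken.new(:condition_b, 11),
  GreenToken.new(:action_b, 13), inner_else, GreenToken.new(:END, 7),
])
outer_else = GreenNode.new(:NODE_ELSE, [GreenToken.new(:ELSE, 5), inner_if])
root = GreenNode.new(:NODE_IF, [
  GreenToken.new(:IF, 3), GreenToken.new(:condition_a, 11),
  GreenToken.new(:action_a, 11), outer_else, GreenToken.new(:END, 4),
])

# Widen action_b by one character and rebuild only its ancestors.
new_inner_if = inner_if.replace_child(2, GreenToken.new(:action_b, 14))
new_outer_else = outer_else.replace_child(1, new_inner_if)
new_root = root.replace_child(3, new_outer_else)

puts "NODE_IF width: #{root.width} -> #{new_root.width}"
puts "NODE_ELSE width: #{outer_else.width} -> #{new_outer_else.width}"
puts "NODE_IF width: #{inner_if.width} -> #{new_inner_if.width}"
```

The old `root` is still intact after the edit, which is what makes it possible to keep both versions of the tree and compare them.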

Slide 43

Slide 43 text

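Summary
Running code: compile.c
Analyzing code: LSP, Linter & Code Formatter
Text-based code rewriting is hard, so move to syntax-tree-based code rewriting.
Restoring code from the syntax tree: concrete syntax tree!!
Narrowing the scope affected by a syntax tree rewrite: Red Green Tree!!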

Slide 44

Slide 44 text

As a result of making progress, the to-do list grew!!

Slide 45

Slide 45 text

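There is still much more to talk about
A Red Green Tree is an immutable, persistent data structure.
By keeping the syntax trees from before and after a change, a diff can be shown.
To rewrite syntax trees more efficiently, it would be good to provide a general-purpose Rewriter.
Open questions
Are semicolons, newlines and the like trivia?
Can heredocs be handled well?
Should the interface between the Lexer and the parser remain separated in the future?
Is it possible to check whether the syntax tree is valid after a rewrite?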