Slide 1

Slide 1 text

How to Build an LALR Parser Generator January 20, 2024 BuriKaigi 2024 @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

About me • Yuichiro Kaneko • yui-knk (GitHub) / spikeolaf (Twitter) • Treasure Data • Engineering Manager of Applications Backend • Author of ruby/lrama, an LALR parser generator • CRuby committer, mainly developing parser-related features

Slide 3

Slide 3 text

PR: We are hiring!! • https://www.treasuredata.com/company/jobs/

Slide 4

Slide 4 text

What Is a Parser? Let's go, Hokuriku! Let's go, Toyama!

Slide 5

Slide 5 text

The Role of the Parser • Gives structure to the input string (Figure: syntax tree with the nodes Class, Method, Method, Assignment, @name, Call, Name, capitalize)

Slide 6

Slide 6 text

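The Role of the Lexer • To the language implementation (ruby), the source is just a sequence of bytes • The string first has to be split into appropriate units • Ruby supports many different encodings 636c61737320477265657465720a2020646566 20696e697469616c697a65286e616d65290a20 202020406e616d65203d206e616d652e636170 6974616c697a650a2020656e640a0a2020646 5662073616c7574650a2020202070757473202 248656c6c6f20237b406e616d657d21220a202 0656e640a656e640a (Figure: the bytes above next to the source text they encode, beginning class Greeter def)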
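
To make the "just a sequence of bytes" point concrete, here is a minimal Ruby sketch that decodes the first row of the hex dump on this slide. Treating the bytes as UTF-8 is an assumption made for illustration; a real lexer has to honor whatever encoding the source file declares.

```ruby
# First row of the hex dump shown on the slide.
hex = "636c61737320477265657465720a2020646566"

# pack("H*") turns hex digits into raw bytes; the result is a binary String.
bytes = [hex].pack("H*")

# Assumption: the source is UTF-8. Only after choosing an encoding
# can the bytes be treated as characters and cut into tokens.
source = bytes.dup.force_encoding(Encoding::UTF_8)

p bytes.encoding          # the raw bytes carry no character meaning yet
p source.valid_encoding?  # whether the bytes are well-formed UTF-8
puts source
```

The same bytes could be labeled with a different encoding, which is why the byte-to-character step comes before tokenization.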

Slide 7

Slide 7 text

Parser and Lexer • The lexer cuts out tokens and the parser gives them structure (Figure: class Greeter def → Lexer → Parser → syntax tree with the nodes Class, Method, Method, Assignm, @name, Call, Name, capitaliz)
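
The lexer half of this split can be sketched with Ruby's standard StringScanner. This is a toy under stated assumptions: the token kinds (:keyword, :constant, :ivar, :identifier, :punct) and the three-word keyword list are made up for illustration and are far simpler than what Ruby's real lexer does.

```ruby
require "strscan"

# Hypothetical keyword list, just large enough for the sample input.
KEYWORDS = %w[class def end].freeze

# Cut a source string into [kind, text] pairs, skipping whitespace.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (text = scanner.scan(/@[A-Za-z_]\w*/))
      tokens << [:ivar, text]
    elsif (text = scanner.scan(/[A-Z]\w*/))
      tokens << [:constant, text]
    elsif (text = scanner.scan(/[a-z_]\w*/))
      tokens << [KEYWORDS.include?(text) ? :keyword : :identifier, text]
    else
      # Anything else is emitted one character at a time.
      tokens << [:punct, scanner.getch]
    end
  end
  tokens
end

tokens = tokenize("class Greeter\n  def initialize(name)\n    @name = name.capitalize\n  end\nend\n")
tokens.each { |kind, text| puts "#{kind}\t#{text}" }
```

The parser then consumes this flat token list and builds the tree; the lexer itself knows nothing about nesting.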

Slide 8

Slide 8 text

Parsers Are Hot in Ruby Right Now https://twitter.com/OssVision/status/1735433960191299602

Slide 9

Slide 9 text

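Parsers Are Hot in Ruby Right Now • Bison is painful • The Bison version differs from environment to environment • New features cannot be used because multiple Bison versions must be supported • Ruby's parser is complex, and its maintainability has long been seen as a problem • The rise of the Language Server Protocol • In Ruby, tools such as RBS (types) and RuboCop (static analyzer) are being actively developed • As a result, it has become necessary to parse incomplete input, namely programs the user is still typing (an error-tolerant parser) • To solve these problems, large-scale improvements are being made to the parser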

Slide 10

Slide 10 text

How to Build a Parser • Write the parser by hand • Generate it with a parser generator • Ruby takes this approach • Yacc, Bison, ANTLR, etc. • Lrama is also a parser generator

Slide 11

Slide 11 text

Parser Generator • A tool that generates a parser from a configuration file • Ruby has used GNU Bison until now • In Ruby 3.3, Bison was replaced with Lrama • https://github.com/ruby/lrama (Figure: configuration file → Bison / Lrama → parse.c)

Slide 12

Slide 12 text

Example of a Configuration File • The grammar is written in BNF • Very easy to read

Slide 13

Slide 13 text

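Advantages of a Parser Generator • The grammar is easy to understand • There is no gap between the grammar definition and the parser implementation • You get feedback when the grammar changes • It is grounded in computer science theory • An error-tolerant parser can be generated automatically from the grammar definition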

Slide 14

Slide 14 text

LR Parsers Are Great • Bison lacks a number of features, so • I implemented Lrama, an LR parser generator, in Ruby • and I am enough of a fan that I made Ruby use Lrama

Slide 15

Slide 15 text

How to Build a Parser Generator Arrived in Toyama!

Slide 16

Slide 16 text

Architecture • Consists of a Frontend, a Backend, and a Code Generator (Figure: configuration file → Parser Generator [Frontend → Backend → Code Generator] → Parser)

Slide 17

Slide 17 text

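Frontend • Uses a lexer and a parser to convert the configuration file into an internal data structure • The central data structure is the Rule (Figure: configuration file → internal representation)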
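
A rough idea of what such an internal representation might look like, as a Ruby sketch. GrammarRule, its fields, and the sample grammar are all hypothetical names chosen for illustration; they are not Lrama's actual classes, which carry more information (precedence, line numbers, and so on).

```ruby
# Hypothetical, simplified stand-in for a frontend's Rule record:
# one left-hand side, an ordered right-hand side, and an optional action.
GrammarRule = Struct.new(:id, :lhs, :rhs, :action, keyword_init: true)

# Invented sample grammar, loosely shaped like "class ... end".
RULES = [
  GrammarRule.new(id: 0, lhs: :program,   rhs: [:class_def], action: nil),
  GrammarRule.new(id: 1, lhs: :class_def, rhs: [:k_class, :cpath, :body, :k_end],
                  action: "$$ = new_class($2, $3);"),
  GrammarRule.new(id: 2, lhs: :body, rhs: [], action: nil),
  GrammarRule.new(id: 3, lhs: :body, rhs: [:body, :method_def], action: nil),
].freeze

# Later stages mostly ask "which rules define this symbol?".
rules_by_lhs = RULES.group_by(&:lhs)
nonterminals = rules_by_lhs.keys

# Symbols that appear on a right-hand side but have no rule in this sample.
symbols_without_rules = RULES.flat_map(&:rhs).uniq - nonterminals

p nonterminals
p symbols_without_rules
```

Once the grammar is a plain list of such records, the rest of the generator no longer needs to know what the grammar file syntax looked like.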

Slide 18

Slide 18 text

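Action • The action part also has to be analyzed • Special variables such as $$ give access to token values and location information • It is best to provide a lexer separate from the one for the grammar file • At code generation time, the variables are replaced with parser-internal variables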
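
The replacement step can be sketched as follows. The output names (yyval, yyvsp, yyloc, yylsp) and the negative-offset indexing are modeled on what Bison-generated C code typically looks like; treat them as an assumption for illustration rather than the exact text any generator emits. The sketch also uses a plain regular expression, which would wrongly rewrite a "$1" inside a C string literal; that is exactly why a dedicated lexer for actions is the better choice.

```ruby
# Rewrite $$, $n, @$ and @n in an action for a rule whose
# right-hand side has rhs_length symbols.
def translate_action(code, rhs_length)
  code.gsub(/([@$])(\$|\d+)/) do
    sigil = Regexp.last_match(1)
    ref = Regexp.last_match(2)
    # "$" refers to semantic values, "@" to locations (assumed names).
    result_var, stack_var = sigil == "$" ? %w[yyval yyvsp] : %w[yyloc yylsp]
    if ref == "$"
      "(#{result_var})"
    else
      # The last RHS symbol sits at the top of the stack (offset 0).
      "(#{stack_var}[#{ref.to_i - rhs_length}])"
    end
  end
end

puts translate_action("$$ = $1 + $3; @$ = @1;", 3)
```

For a three-symbol rule, $3 maps to offset 0 and $1 to offset -2, because the values of the right-hand side are the topmost entries of the parser's value stack when the action runs.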

Slide 19

Slide 19 text

Backend • Generates a state machine from the Rules • A parsing table is, in essence, an automaton (Figure: parsing table)

Slide 20

Slide 20 text

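An LR Parser Is a DFA with a Stack • Convert each Rule into an automaton • The parsing table is what you get by composing all of the automata (Figure: automata for class A body end, def m1 body end, and class B body end)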

Slide 21

Slide 21 text

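Algorithms Other Than LALR • There are many ways to build an automaton for parsing • Algorithms include LR(0), SLR(1), LALR(1), LR(1), and IELR(1) • They differ in the languages they can parse and the memory they need • Because the automaton is built from the Rules, the choice of algorithm is independent of the grammar file syntax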

Slide 22

Slide 22 text

Fast Computation of Look-Ahead Sets • The hard part of implementing LALR(1) is computing the look-ahead sets efficiently • The algorithm from the paper "Efficient Computation of LALR(1) Look-Ahead Sets" is a good choice • https://dl.acm.org/doi/pdf/10.1145/69622.357187
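
The core of that paper is a graph traversal usually called Digraph. Given a relation R over nodes and an initial set for each node, it computes F(x) = base(x) ∪ ⋃ { F(y) | x R y } in a single pass, handling cycles by giving every node in a strongly connected component the same set. Below is a minimal Ruby sketch of that traversal on a toy relation. In the paper the nodes are nonterminal transitions and the traversal is applied to the "reads" and "includes" relations; the node names and sets here are invented for illustration.

```ruby
# Digraph traversal: relation maps node => successor nodes,
# base maps node => initial set (as an Array).
def digraph(nodes, relation, base)
  depth = Hash.new(0) # 0 means "not visited yet"
  result = {}
  stack = []

  traverse = lambda do |x|
    stack.push(x)
    d = stack.size
    depth[x] = d
    result[x] = base.fetch(x, []).dup

    relation.fetch(x, []).each do |y|
      traverse.call(y) if depth[y].zero?
      depth[x] = [depth[x], depth[y]].min
      result[x] |= result[y]
    end

    # x is the root of a strongly connected component:
    # pop the whole component and give each member x's set.
    if depth[x] == d
      loop do
        top = stack.pop
        depth[top] = Float::INFINITY
        result[top] = result[x]
        break if top == x
      end
    end
  end

  nodes.each { |x| traverse.call(x) if depth[x].zero? }
  result
end

# Toy input: a and b form a cycle, and b also points at c.
relation = { a: [:b], b: [:a, :c], c: [] }
base = { a: [:x], b: [:y], c: [:z] }
sets = digraph(relation.keys, relation, base)
sets.each { |node, set| puts "#{node}: #{set.sort.inspect}" }
```

The point of the stack-and-depth bookkeeping is that each edge is followed once, instead of repeating set unions until nothing changes.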

Slide 23

Slide 23 text

Code Generator • Implements the state machine in the target language • Looks up the table with the current state and token to decide what to do next • shift, reduce, accept, error
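
The table lookup loop can be sketched in a few lines of Ruby. The grammar (E → E plus n | n), its hand-built ACTION and GOTO tables, and the token names :n, :plus, and :end are all assumptions made up for this example; a generated parser would emit the equivalent tables and loop in the target language, and would also maintain a value stack for the actions.

```ruby
# Rule number => [left-hand side, length of right-hand side].
RULES = { 1 => [:E, 3], 2 => [:E, 1] }.freeze

# ACTION[state][token] for the grammar  E -> E plus n | n
ACTION = {
  0 => { n: [:shift, 1] },
  1 => { plus: [:reduce, 2], end: [:reduce, 2] },
  2 => { plus: [:shift, 3], end: [:accept] },
  3 => { n: [:shift, 4] },
  4 => { plus: [:reduce, 1], end: [:reduce, 1] },
}.freeze

# GOTO[state][nonterminal], consulted after a reduce.
GOTO = { 0 => { E: 2 } }.freeze

# Returns :accept or :error for a list of token kinds.
def parse(tokens)
  stack = [0]              # stack of states
  input = tokens + [:end]  # :end marks the end of input
  pos = 0
  loop do
    kind, arg = ACTION.fetch(stack.last)[input[pos]] || [:error]
    case kind
    when :shift
      stack.push(arg)
      pos += 1
    when :reduce
      lhs, length = RULES.fetch(arg)
      stack.pop(length)
      stack.push(GOTO.fetch(stack.last).fetch(lhs))
    else
      return kind # :accept or :error
    end
  end
end

p parse(%i[n plus n])
p parse(%i[n n])
```

Everything grammar-specific lives in the tables; the loop itself is the same for every grammar, which is what makes it suitable for a fixed template.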

Slide 24

Slide 24 text

Embedding Values in a Template • In practice, the work is to organize the required variables and embed them in a template • Lrama uses ERB, Bison uses m4 • ERB is great
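
A minimal sketch of this step with Ruby's standard ERB library. The template text and the variable names (states, table, actions) are invented for illustration and are much smaller than a real parser template.

```ruby
require "erb"

# Hypothetical miniature template for a generated C file.
TEMPLATE = <<~'C_TEMPLATE'
  /* Generated file: do not edit. */
  #define YYNSTATES <%= states.size %>
  static const int yytable[] = { <%= table.join(", ") %> };
  <%- actions.each do |id, code| -%>
  case <%= id %>: { <%= code %> } break;
  <%- end -%>
C_TEMPLATE

# Fill the template; trim_mode "-" enables the <%- and -%> markers
# so the loop lines leave no blank lines in the output.
def render(states:, table:, actions:)
  ERB.new(TEMPLATE, trim_mode: "-")
     .result_with_hash(states: states, table: table, actions: actions)
end

puts render(states: [0, 1], table: [1, 3, 0, 2], actions: { 1 => "foo();" })
```

Keeping the C skeleton in a template means the generator's Ruby code only has to compute values, not assemble C source text by hand.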

Slide 25

Slide 25 text

A Sparse Parsing Table • A large table with one row per state and one column per kind of token • In the earlier example only 70 of 238 cells are used (about 29%)

Slide 26

Slide 26 text

A Sparse Parsing Table • Interleave the rows and pack them into a single array • Could compact data structures improve this further?
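
One common way to do this interleaving is row displacement: slide each row to an offset where its used cells fall into gaps left by earlier rows, and keep a parallel "check" array recording which state owns each slot. The Ruby sketch below shows the idea on a tiny invented table; the function names and the first-fit offset search are assumptions for illustration, not the exact layout Bison or Lrama emit.

```ruby
# rows[state][column] is an action or nil for an empty cell.
# Returns [base, table, check].
def pack_rows(rows)
  base = []
  table = []
  check = []
  rows.each_with_index do |row, state|
    used = row.each_index.select { |column| row[column] }
    # First fit: smallest offset where no used cell hits an occupied slot.
    offset = 0
    offset += 1 while used.any? { |column| check[offset + column] }
    used.each do |column|
      table[offset + column] = row[column]
      check[offset + column] = state
    end
    base[state] = offset
  end
  [base, table, check]
end

# A slot only belongs to this state if check says so.
def lookup(base, table, check, state, column)
  index = base[state] + column
  check[index] == state ? table[index] : nil
end

rows = [
  [1,   nil, nil, 2],
  [nil, 3,   nil, nil],
  [nil, nil, 4,   5],
]
base, table, check = pack_rows(rows)
puts "cells before: #{rows.sum(&:size)}, slots after: #{table.size}"
p lookup(base, table, check, 2, 3)
```

The check array is what keeps lookups correct: without it, an empty cell of one state would read a value that belongs to another state's row.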

Slide 27

Slide 27 text

Architecture • Rule and State Machine are the interfaces between the components (Figure: configuration file → Frontend → Rule → Backend → State Machine → Code Generator → parser)

Slide 28

Slide 28 text

Extending the Parser Generator The crab looks delicious…

Slide 29

Slide 29 text

Named References • Token values and the like can be accessed by name, as $cpath and $body, instead of as $1 and $2 • This needs only lexer work, so it was implemented by changing the Frontend alone
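
Conceptually, a named reference is just another spelling of a positional one, which is why the change stays inside the Frontend. A minimal Ruby sketch of that rewriting follows; the function name and the right-hand-side symbol names are hypothetical, and a real implementation would work on the action lexer's tokens rather than on a regular expression.

```ruby
# Rewrite $name into $n, where n is the 1-based position of the
# symbol called name on the rule's right-hand side.
def resolve_named_references(rhs_names, action)
  action.gsub(/\$([A-Za-z_]\w*)/) do
    name = Regexp.last_match(1)
    index = rhs_names.index(name)
    raise ArgumentError, "unknown reference: #{name}" unless index
    "$#{index + 1}"
  end
end

rhs = %w[keyword_class cpath body keyword_end]
puts resolve_named_references(rhs, "$$ = new_class($cpath, $body);")
```

After this pass the rest of the pipeline sees only positional references, so the Backend and the Code Generator need no changes.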

Slide 30

Slide 30 text

Parameterizing Rules • Repetition comes up often in grammar definitions • When the way of writing something follows a fixed pattern, we want to abstract it • Already implemented in Lrama

Slide 31

Slide 31 text

Parameterizing Rules • They are simply expanded when the Rule data structure is built from the configuration file, so they can be implemented by changing the Frontend alone
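
The expansion can be pictured as a function from a pattern name and a symbol to ordinary rules. The Ruby sketch below covers three patterns (option, list, nonempty_list). The generated symbol names such as list_stmt and the ExpandedRule struct are assumptions for illustration, not the names or classes Lrama actually uses.

```ruby
# Hypothetical minimal rule record: left-hand side and right-hand side.
ExpandedRule = Struct.new(:lhs, :rhs)

# Expand one parameterized use, e.g. list(stmt), into plain rules.
def expand(kind, symbol)
  name = :"#{kind}_#{symbol}"
  case kind
  when :option
    # Zero or one occurrence.
    [ExpandedRule.new(name, []), ExpandedRule.new(name, [symbol])]
  when :list
    # Zero or more occurrences, left-recursive.
    [ExpandedRule.new(name, []), ExpandedRule.new(name, [name, symbol])]
  when :nonempty_list
    # One or more occurrences, left-recursive.
    [ExpandedRule.new(name, [symbol]), ExpandedRule.new(name, [name, symbol])]
  else
    raise ArgumentError, "unknown parameterizing rule: #{kind}"
  end
end

expand(:list, :stmt).each do |rule|
  puts "#{rule.lhs}: #{rule.rhs.empty? ? '%empty' : rule.rhs.join(' ')}"
end
```

Because the output is the same Rule shape the Backend already consumes, the state machine construction never learns that a shorthand was used.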

Slide 32

Slide 32 text

%after-shift • To implement interesting features like Ripper, you want to insert callbacks at the moment of a shift or a reduce • This can be done by modifying the Frontend's parser/lexer and the Code Generator's template

Slide 33

Slide 33 text

Summary I'm about ready to eat some buri (yellowtail)…

Slide 34

Slide 34 text

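Summary • LR parsers are great • An LR parser generator consists of three components, and its structure resembles that of a language implementation • Because it is split this way, adding a feature only requires changing the components involved • The Lrama parser generator is under very active development • I summarize the development status of Ruby's parser on my blog, Kaneko Nikki • https://yui-knk.hatenablog.com/ • I am usually in the #lr-parser channel of the ruby-jp Slack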

Slide 35

Slide 35 text

RubyKaigi 2024

Slide 36

Slide 36 text

Thank you!!