Slide 1

Slide 1 text

From LALR to IELR: A Lrama's Next Step Junichi Kobayashi (@junk0612) ESM, Inc. RubyKaigi 2024 Naha Cultural Arts Theater NAHArt 2024/05/17(Fri.)

Slide 2

Slide 2 text

Junichi Kobayashi ● X / GitHub: @junk0612 ● Working at ESM, Inc. ○ Work as a Rails engineer ○ A Member of Parser Club ● Hobbies ○ Parsers ○ Rhythm games, Board games ○ Haiku

Slide 3

Slide 3 text

Contributor of Lrama

Slide 4

Slide 4 text

Contributor of Lrama ● Joined the official party!!! Now a Committer

Slide 5

Slide 5 text

Night Cruise Sponsor

Slide 6

Slide 6 text

Night Cruise Sponsor

Slide 7

Slide 7 text

Attendees from ESM Attendee @kasumi8pon Attendee @mhirata Attendee @sfjwr Karaoke @fugakkbn Speaker @koic Attendee @wai-doi Speaker @junk0612 Attendee @haruguchi Attendee @htkymtks @S.H. LT

Slide 8

Slide 8 text

Overview of Lrama "Wake up. Wake up, my dear boy... Today is a very important day. It is the day you go to the castle for the first time, isn't it?" - The Hero's mother

Slide 9

Slide 9 text

Lrama ● An LALR parser generator written in Ruby ○ https://github.com/ruby/lrama ● Presented at RubyKaigi 2023 by Yuichiro Kaneko ○ https://youtu.be/IhfDsLx784g?si=kO1q6mLpTa1bIRYL ● Used in the CRuby 3.3 build process ○ Runs on BASERUBY when building Ruby

Slide 10

Slide 10 text

Basis of LR Parser https://yui-knk.hatenablog.com/entry/2023/12/06/082203

Slide 11

Slide 11 text

My Contributions to Lrama ● Slides (JP, presented at Osaka RubyKaigi 03): https://speakerdeck.com/junk0612/lrama-henokontoribiyusiyonwotong-sitexue-bu-ruby-nopasazieneretashi-qing ● Slides (EN, presented at RubyConf Taiwan 2023): https://speakerdeck.com/junk0612/understanding-parser-generators-surrounding-ruby-with-contributing-lrama

Slide 12

Slide 12 text

My Contributions to Lrama https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/

Slide 13

Slide 13 text

What We Aim to Solve "Oh, Enix! Descendant of the hero Loto! I have been waiting for you to come." - King Lars XVI

Slide 14

Slide 14 text

What We Aim to Solve ● CRuby's lexer state is hard to manage

Slide 15

Slide 15 text

A Motivating Example $ ruby -e 'p 1.. || 2' #=> ???

Slide 16

Slide 16 text

A Motivating Example
$ ruby -e 'p 1.. || 2'
-e:1: syntax error, unexpected '|'
p 1.. || 2
-e: compile error (SyntaxError)

Slide 18

Slide 18 text

Exec with Parser Info

Slide 19

Slide 19 text

Scanned Tokens
$ ruby -y -e 'p 1.. || 2' | rg 'Next token' | uniq
Next token is token "local variable or method" (1.0-1.1: p)
Next token is token "integer literal" (1.2-1.3: 1)
Next token is token ".." (1.3-1.5: )
Next token is token '|' (1.6-1.7: )
Next token is token '|' (1.7-1.8: )

Slide 20

Slide 20 text

yylex in parse.y
case '|':
  if ((c = nextc(p)) == '|') {
    (...snip...)
    if (IS_lex_state_for(last_state, EXPR_BEG)) {
      c = '|';
      pushback(p, '|');
      return c;
    }
    return tOROP;
  }

Slide 21

Slide 21 text

lex_state
$ ruby -y -e 'p 1.. || 2' | rg 'Next token|lex_state' | uniq
lex_state: NONE -> BEG at line 2195
lex_state: BEG -> CMDARG at line 10384
Next token is token "local variable or method" (1.0-1.1: p)
lex_state: CMDARG -> END at line 9649
lex_state: END -> END at line 8930
Next token is token "integer literal" (1.2-1.3: 1)
lex_state: END -> BEG at line 10872
Next token is token ".." (1.3-1.5: )
lex_state: BEG -> BEG at line 10789
Next token is token '|' (1.6-1.7: )
lex_state: BEG -> BEG|LABEL at line 10808
Next token is token '|' (1.7-1.8: )

Slide 23

Slide 23 text

Problems of Changing Lexer ● Hard to identify the scope of impact of the change ○ parse.y is over 16k LOC ○ lex_state is not the only variable that determines lexer behavior

Slide 24

Slide 24 text

Problems of Changing Lexer ● The cost of future maintenance will almost certainly increase ○ parse.y is at its most readable right now; every change makes it harder to read ○ (Lrama tries to reduce this cost)

Slide 25

Slide 25 text

Parser and Lexer ● In theory, the parser and the lexer are considered separable ● In practice, however, the state of the lexer depends on the state of the parser ○ The same sequence of characters can mean different things depending on where it is written, so the tokens that should be cut out differ too

Slide 26

Slide 26 text

Examples
● ||
  ○ a || b -> '||'
  ○ ary.each {|| do_something } -> '|' '|'
● <<-
  ○ p <<-HEREDOC -> '<<-'
  ○ [] <<-1 -> '<<' '-'
● %s{a}
  ○ p %s{a} -> '%s{' 'a' '}'
  ○ 1 %s{a} -> '%' 's' '{' 'a' '}'
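The `||` case above can be mimicked with a toy lexer that keeps its own state by hand. This is only a sketch under loud assumptions: `toy_lex` and its two states `:beg` / `:end` are hypothetical names invented for illustration, and it models nothing of CRuby's real lexer beyond the idea of the `EXPR_BEG` check in yylex.

```ruby
# Hypothetical toy lexer (NOT CRuby's): only the state-dependent
# handling of "||" is modeled.
def toy_lex(source)
  tokens = []
  state = :beg # :beg = an expression may start here, :end = one just ended
  pos = 0
  while pos < source.size
    rest = source[pos..]
    if (m = rest.match(/\A\s+/))
      pos += m[0].size                 # skip whitespace
    elsif (m = rest.match(/\A[a-z_.]+/))
      tokens << m[0]                   # identifier or method call chain
      state = :end
      pos += m[0].size
    elsif rest.start_with?("||") && state != :beg
      tokens << "||"                   # binary operator after an operand
      state = :beg
      pos += 2
    elsif (m = rest.match(/\A[|{}]/))
      tokens << m[0]                   # in :beg state, "||" is cut as '|' '|'
      state = m[0] == "}" ? :end : :beg
      pos += 1
    else
      raise ArgumentError, "unexpected character: #{rest[0].inspect}"
    end
  end
  tokens
end

BINARY_OR    = toy_lex("a || b")
BLOCK_PARAMS = toy_lex("ary.each {|| do_something }")
```

Even in this tiny sketch, every branch has to remember to update `state` correctly, which is the kind of bookkeeping the talk describes as hard to maintain by hand.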

Slide 27

Slide 27 text

Scannerless Parser "After all, I believe that the power gained through training is meant to be used for the sake of others." - Avan

Slide 28

Slide 28 text

Ruby Parser Roadmap https://docs.google.com/presentation/d/1E4v9WPHBLjtvkN7QqulHPGJzKkwIweVfcaMsIQ984_Q/edit?usp=sharing

Slide 29

Slide 29 text

Scannerless Parser ● Performs tokenization and parsing in a single step ● Infers the acceptable tokens for each state from the grammar file ○ Lexer state management is done by the computer, not the developer [Diagram labels: Scannerless Parser, Lexer, Parser]

Slide 30

Slide 30 text

PSLR ● Pseudo-scannerless Minimal LR ● https://tigerprints.clemson.edu/all_dissertations/519/ ● A pseudo-lexer tokenizes only the tokens that are acceptable in the current parser context
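The idea of tokenizing only what the current context accepts can be sketched as follows. Everything here is an assumption made for illustration: `TOKEN_PATTERNS`, `ACCEPTABLE`, the state names, and `next_token` are hypothetical, and the acceptable-token table is written by hand, whereas a PSLR generator would derive it from the grammar.

```ruby
# Hypothetical token definitions (name => pattern anchored at the input start).
TOKEN_PATTERNS = {
  "||" => /\A\|\|/,
  "|"  => /\A\|/
}.freeze

# Hypothetical table: parser state => tokens the parser can accept there.
# In PSLR this information would come from the generated parser tables.
ACCEPTABLE = {
  after_operand: ["||", "|"], # binary operators may follow an operand
  block_start:   ["|"]        # only a block parameter delimiter may follow "{"
}.freeze

# Try only the tokens acceptable in parser_state and keep the longest match.
# Returns [token_name, matched_text], or nil when nothing acceptable matches.
def next_token(input, parser_state)
  matches = ACCEPTABLE.fetch(parser_state).filter_map do |name|
    m = TOKEN_PATTERNS.fetch(name).match(input)
    [name, m[0]] if m
  end
  matches.max_by { |_name, text| text.size }
end

AFTER_OPERAND = next_token("|| 2", :after_operand)
BLOCK_START   = next_token("|| do_something }", :block_start)
```

The same input prefix `||` yields different tokens without any hand-written state transitions in the scanner itself; the choice is driven entirely by the table lookup.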

Slide 31

Slide 31 text

Ruby Parser Roadmap https://docs.google.com/presentation/d/1E4v9WPHBLjtvkN7QqulHPGJzKkwIweVfcaMsIQ984_Q/edit?usp=sharing

Slide 32

Slide 32 text

PSLR ● LALR is not powerful enough for the parser part of PSLR ● PSLR requires an LR algorithm with the same parsing power as Canonical LR ● The sets of grammars that Canonical LR and LALR can parse are different

Slide 33

Slide 33 text

Canonical LR ● Reduces by a rule when the next token is in that rule's lookahead set ● Handles the widest range of grammars of all the LR methods ● Creates a huge number of states ● Requires large space complexity

Slide 34

Slide 34 text

LALR (LookAhead LR) ● Merges states that have the same core in the Canonical LR automaton ● Parses slightly fewer grammars than Canonical LR ● Compared to Canonical LR, merging the states may introduce new conflicts ○ These are called "Mysterious Conflicts" in the GNU Bison manual
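A small worked example of such a conflict, using the textbook grammar S → a A d | b B d | a B e | b A e, A → c, B → c (this grammar is not from the talk; it is brought in as an illustration). The two Canonical LR states reached after `a c` and after `b c` share a core but carry different lookaheads. The helper names below are hypothetical.

```ruby
require "set"

# Each state maps a completed item to the lookahead set on which to reduce.
STATE_AFTER_A_C = { "A -> c ." => Set["d"], "B -> c ." => Set["e"] }.freeze
STATE_AFTER_B_C = { "A -> c ." => Set["e"], "B -> c ." => Set["d"] }.freeze

# Two states have the same core when they contain the same items,
# ignoring lookaheads.
def same_core?(s1, s2)
  s1.keys.sort == s2.keys.sort
end

# LALR-style merge: union the lookahead sets item by item.
def merge_states(s1, s2)
  s1.merge(s2) { |_item, la1, la2| la1 | la2 }
end

# Tokens on which more than one reduction would apply.
def reduce_reduce_conflicts(state)
  state.values.flat_map(&:to_a).tally.select { |_token, n| n > 1 }.keys.sort
end

MERGED = merge_states(STATE_AFTER_A_C, STATE_AFTER_B_C)
BEFORE = reduce_reduce_conflicts(STATE_AFTER_A_C) + reduce_reduce_conflicts(STATE_AFTER_B_C)
AFTER  = reduce_reduce_conflicts(MERGED)
```

Neither original state has a conflict, but the merged state can reduce by either rule on both `d` and `e`: the conflict exists only because of the merge.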

Slide 35

Slide 35 text

IELR Overview "And, Enix. No matter how far apart we are, we're friends, right?" - Kiefer

Slide 36

Slide 36 text

IELR ● Inadequacy Elimination LR ● https://www.sciencedirect.com/science/article/pii/S0167642309001191 ● Developed as part of the preliminary work for the PSLR dissertation ● Bridges the gap between LALR and Canonical LR while keeping LALR's strength: small space complexity

Slide 37

Slide 37 text

IELR Concepts ● Build the parser table for LALR ● Recompute the lookahead sets for each state, starting from the start state ● Compare the original and the recomputed lookahead sets to verify that merging states did not cause any Mysterious Conflicts
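The "merge only when it is safe" idea can be sketched in a heavily simplified form. This is not the IELR algorithm and not Lrama's `compatible?`: the check below (an assumption for illustration) only asks whether merging lookaheads would create a reduce/reduce conflict, while the real algorithm reasons about inadequacies in more detail. `conflicts?`, `merged`, and `merge_or_split` are hypothetical names.

```ruby
require "set"

# True when some token appears in the lookahead sets of two different items.
def conflicts?(state)
  seen = Set.new
  state.each_value do |lookaheads|
    return true if lookaheads.any? { |token| seen.include?(token) }
    seen.merge(lookaheads)
  end
  false
end

# Union the lookahead sets item by item without modifying the inputs.
def merged(s1, s2)
  s1.merge(s2) { |_item, a, b| a | b }
end

# variants: existing states sharing one core.
# incoming: the same core with freshly computed lookaheads.
# Reuse the first variant that stays conflict-free after merging;
# otherwise keep the incoming state separate (a split).
def merge_or_split(variants, incoming)
  target = variants.find { |v| !conflicts?(merged(v, incoming)) }
  if target
    target.merge!(incoming) { |_item, a, b| a | b }
  else
    variants << incoming
  end
  variants
end

split_case = [{ "A -> c ." => Set["d"], "B -> c ." => Set["e"] }]
merge_or_split(split_case, { "A -> c ." => Set["e"], "B -> c ." => Set["d"] })
SPLIT_COUNT = split_case.size

merge_case = [{ "A -> c ." => Set["d"], "B -> c ." => Set["e"] }]
merge_or_split(merge_case, { "A -> c ." => Set["f"], "B -> c ." => Set["e"] })
MERGE_COUNT = merge_case.size
MERGED_A    = merge_case.first["A -> c ."]
```

In the first case the states stay apart, as in Canonical LR; in the second they collapse into one, as in LALR. That is the sense in which the approach sits between the two.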

Slide 38

Slide 38 text

Implement IELR parser "So I'm counting on you, Hero. Show it to me too, will you? That miracle of a hero who smashes the Demon Lord." - Camus

Slide 39

Slide 39 text

IELR Implementation in Lrama
def split_states
  (...snip...)
  # Walk the automaton breadth-first, starting from the start state's transitions.
  transition_queue = []
  @states.first.transitions.each do |shift, next_state|
    transition_queue << [@states.first, shift, next_state]
  end
  until transition_queue.empty?
    state, shift, next_state = transition_queue.shift
    compute_state(state, shift, next_state)
    next_state.transitions.each do |sh, next_st|
      transition_queue << [next_state, sh, next_st]
    end
  end
end

Slide 40

Slide 40 text

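IELR Implementation in Lrama
def compute_state(state, shift, next_state)
  # Lookaheads propagated from state into next_state.
  k = propagate_lookaheads(state, next_state)
  # Look for an isocore of next_state that is compatible with them.
  s = @ielr_isocores[next_state].find {|st| compatible?(st, k) }
  if s.nil?
    # No compatible isocore: split the state.
    split_state(@ielr_isocores[next_state].last)
  elsif !@lookaheads_recomputed[s]
    # First visit: take k as the recomputed lookahead set.
    @item_lookahead_set[s] = k
    @lookaheads_recomputed[s] = true
  else
    # Already recomputed: point the transition at s and merge the lookaheads.
    state.update_transition(shift, s)
    merge_lookaheads(s, k)
  end
end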

Slide 41

Slide 41 text

https://github.com/ruby/lrama/pull/398 Pull Request

Slide 42

Slide 42 text

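Conclusion "The love and courage of people will never fade away. If I should ever fall into darkness, when that time comes, please, take this sword in your hand..." - The Holy Dragon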

Slide 43

Slide 43 text

Conclusion ● Managing lex_state is difficult ● The plan is to solve this by adding PSLR parser generation to Lrama ● IELR, a prerequisite of PSLR, has been implemented ● IELR bridges the gap in parsing power between LALR and Canonical LR

Slide 44

Slide 44 text

Future Work ● Merge the PR ● Support generating PSLR parsers ● Refactor parse.y and get rid of lex_state

Slide 45

Slide 45 text

Acknowledgements ● @yui-knk, @ydah ○ #lramafriends #LR_parser_gangs ● @koic, @S.H. ○ #esm_parser_club ● My Wife ○ #a_beautiful_life

Slide 46

Slide 46 text

References ● Lrama Repository: https://github.com/ruby/lrama ● Yuichiro Kaneko, "The future vision of Ruby parser", May 2023: https://youtu.be/IhfDsLx784g?si=kO1q6mLpTa1bIRYL ● Yuichiro Kaneko, "Ruby Parser開発日誌(14) - LR parser完全に理解した", Dec 2023: https://yui-knk.hatenablog.com/entry/2023/12/06/082203 ● Junichi Kobayashi, "Lrama へのコントリビューションを通して学ぶ Ruby のパーサジェネレータ事情", Sep 2023: https://speakerdeck.com/junk0612/lrama-henokontoribiyusiyonwotong-sitexue-bu-ruby-nopasazieneretashi-qing

Slide 47

Slide 47 text

References ● Junichi Kobayashi, "Understanding Parser Generator surrounding Ruby with Contributing Lrama", Dec 2023: https://speakerdeck.com/junk0612/understanding-parser-generators-surrounding-ruby-with-contributing-lrama ● Ruby 3.3.0 Release Note: https://www.ruby-lang.org/en/news/2023/12/25/ruby-3-3-0-released/ ● Yuichiro Kaneko, Ruby Parser Roadmap: https://docs.google.com/presentation/d/1E4v9WPHBLjtvkN7QqulHPGJzKkwIweVfcaMsIQ984_Q/edit?usp=sharing

Slide 48

Slide 48 text

References ● Joel E. Denny, "PSLR(1): Pseudo-Scannerless Minimal LR(1) for the Deterministic Parsing of Composite Languages", May 2010: https://tigerprints.clemson.edu/all_dissertations/519/ ● Joel E. Denny and Brian A. Malloy, "The IELR(1) algorithm for generating minimal LR(1) parser tables for non-LR(1) grammars with conflict resolution" https://www.sciencedirect.com/science/article/pii/S0167642309001191

Slide 49

Slide 49 text

Presentations around Parsers

Slide 50

Slide 50 text

Presentations around Parsers

Slide 51

Slide 51 text

That's one small patch for Lrama, one giant leap for Ruby.