Slide 1

Slide 1 text

The Implementations of Advanced LR Parser Algorithm
Junichi Kobayashi (@junk0612), ESM, Inc.
RubyKaigi 2025, Ehime Prefectural Convention Hall, 2025/04/17 (Thu.)

Slide 2

Slide 2 text

Self-Introduction
Junichi Kobayashi
● X / GitHub: @junk0612
● Working at ESM, Inc.
  ○ Works as a Rails engineer
  ○ A member of Parser Club
● Committer of Lrama
● Hobbies
  ○ Parsers
  ○ Rhythm games
  ○ Board games
  ○ Haiku

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

Drinkup Sponsor

Slide 6

Slide 6 text

Ruby on Timeline

Slide 7

Slide 7 text

Attendees from ESM
● Speaker: @koic, @junk0612
● Attendee: @nsgc, @colorbox, @S.H., @wai-doi, @kasumi8pon, @mhirata, @haruguchi, @maimux2x
● Karaoke: @fugakkbn

Slide 8

Slide 8 text

Lrama
✦ An LR parser generator built with Ruby

Slide 9

Slide 9 text

IELR
✦ An LR parser generation algorithm
✦ https://www.sciencedirect.com/science/article/pii/S0167642309001191

Slide 10

Slide 10 text

IELR in Lrama
✦ https://github.com/ruby/lrama/pull/398
✦ Supported since Lrama 0.7, for CRuby 3.5

Slide 11

Slide 11 text

Usage
$ gem install lrama
# from the CLI
$ lrama -D lr.type=ielr grammar.y
# in the grammar file
%define lr.type ielr

Slide 12

Slide 12 text

"After All, What is IELR?"
✦ Theoretical side
  ✦ Uses the tokens read so far to narrow down the set of possible next tokens more accurately than LALR
✦ Practical side
  ✦ Propagates conflict information backward and lookahead sets forward
  ✦ Identifies the cause of each conflict and splits states where necessary

Slide 13

Slide 13 text

Theoretical Side
Estimated: 5/30 min

Slide 14

Slide 14 text

Basis of LR Parser
✦ Shift-Reduce Parsing
  ✦ A form of bottom-up parsing
  ✦ Uses 2 types of actions
    ✦ Shift: read the next token from the lexer
    ✦ Reduce: collapse the recognized symbols into a parent node
  ✦ Chooses one of the actions depending on the situation
    ✦ The most recent symbols match the right-hand side of a grammar rule: Reduce
    ✦ They don't match any grammar rule: Shift

Slide 15

Slide 15 text

Behavior Example of LR Parser
exp: exp + num | exp * num | num
num: 0 | 1
Input: 1 + 0 * 1

Slide 16

Slide 16 text

Behavior Example of LR Parser (same grammar and input as Slide 15)

Slide 17

Slide 17 text

Behavior Example of LR Parser (same grammar and input as Slide 15)

Slide 18

Slide 18 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num

Slide 19

Slide 19 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp

Slide 20

Slide 20 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp

Slide 21

Slide 21 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp

Slide 22

Slide 22 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num

Slide 23

Slide 23 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp

Slide 24

Slide 24 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp

Slide 25

Slide 25 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp

Slide 26

Slide 26 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp num

Slide 27

Slide 27 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp num exp

Slide 28

Slide 28 text

Behavior Example of LR Parser (same grammar and input as Slide 15)
Parse tree nodes so far: num exp num exp num exp
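The walkthrough above can be replayed with a deliberately naive shift-reduce loop: after each Shift, it keeps reducing while the symbols on top of the stack match some rule's right-hand side, trying longer rules first. This is only a sketch of the idea on Slide 14, not an LR parser and not Lrama's code; `RULES` and `naive_parse` are names made up for this example.

```ruby
# Grammar from the slides:
#   exp: exp '+' num | exp '*' num | num
#   num: '0' | '1'
# Each rule is [lhs, rhs]. Longer right-hand sides come first so the naive
# matcher prefers `exp '+' num` over the bare `num`.
RULES = [
  ["exp", ["exp", "+", "num"]],
  ["exp", ["exp", "*", "num"]],
  ["exp", ["num"]],
  ["num", ["0"]],
  ["num", ["1"]],
].freeze

# Naive strategy: Shift one token, then Reduce as long as the stack top
# matches a rule. Returns the final stack and the nonterminals created,
# in order (the parse tree nodes that appear on the slides).
def naive_parse(tokens)
  stack = []
  created = []
  tokens.each do |token|
    stack.push(token) # Shift
    loop do
      rule = RULES.find { |_lhs, rhs| stack.last(rhs.size) == rhs }
      break unless rule

      lhs, rhs = rule
      stack.pop(rhs.size) # Reduce: collapse the matched symbols...
      stack.push(lhs)     # ...into their parent node
      created << lhs
    end
  end
  [stack, created]
end

stack, created = naive_parse(%w[1 + 0 * 1])
p stack   # => ["exp"]
p created # => ["num", "exp", "num", "exp", "num", "exp"]
```

Reducing greedily happens to work for this grammar, and the order of created nodes matches the slides. The next slides show where this strategy breaks down.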

Slide 29

Slide 29 text

Issue #1: Identifying Rule Matches

Slide 30

Slide 30 text

Motivating Examples
method_call: method_name | method_name '(' args ')'
save!(name: 'Junichi')
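To see Issue #1 concretely, here is the same naive "reduce as soon as something matches" loop applied to the method_call rules. The token names stand in for what a lexer would produce for `save!(name: 'Junichi')`; treating `args` as a single token is a simplification made for this sketch, and `naive_parse` is a hypothetical helper.

```ruby
# method_call: method_name | method_name '(' args ')'
RULES = [
  ["method_call", ["method_name", "(", "args", ")"]],
  ["method_call", ["method_name"]],
].freeze

# Shift one token, then reduce greedily while any rule matches the stack top.
def naive_parse(tokens)
  stack = []
  tokens.each do |token|
    stack.push(token)
    while (rule = RULES.find { |_lhs, rhs| stack.last(rhs.size) == rhs })
      lhs, rhs = rule
      stack.pop(rhs.size)
      stack.push(lhs)
    end
  end
  stack
end

# save!(name: 'Junichi') as a token sequence
p naive_parse(["method_name", "(", "args", ")"])
# => ["method_call", "(", "args", ")"]
```

The bare `method_name` is reduced before the parser ever sees `'('`, so the longer rule can no longer match and the stack never collapses to a single `method_call`. Choosing correctly requires looking at the next token.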

Slide 37

Slide 37 text

Issue #2: Collecting Next Tokens

Slide 38

Slide 38 text

Motivating Examples
✦ Modifier conditional statements
statement: method_call 'if' exp | method_call 'unless' exp
method_call: method_name | method_name '(' args ')'

Slide 39

Slide 39 text

statement: method_call 'if' exp | method_call 'unless' exp
method_call: method_name | method_name '(' args ')'
save!(name: 'Junichi')
save! if valid?
save! end valid?

Slide 50

Slide 50 text

Select Actions
✦ If the next token...
  ✦ can continue the current (or a child) rule → Shift
  ✦ can follow the current rule in a parent (or ancestor / sibling) rule → Reduce
  ✦ fits neither of the above → Error
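The decision above can be sketched as a lookup for one state: right after `method_name` in the statement grammar, the parser shifts `'('` (it continues the longer method_call rule) and reduces `method_call: method_name` on `'if'` or `'unless'` (tokens that follow method_call in the parent statement rule). The sets below are written by hand for this single state; they are an illustration, not generated parser output.

```ruby
require "set"

# Hand-written actions for the state reached after reading `method_name`,
# with the grammar:
#   statement:   method_call 'if' exp | method_call 'unless' exp
#   method_call: method_name | method_name '(' args ')'
SHIFT_TOKENS = Set["("]                # continue the same rule
REDUCE_LOOKAHEAD = Set["if", "unless"] # tokens that follow in the parent rule

def select_action(next_token)
  if SHIFT_TOKENS.include?(next_token)
    :shift
  elsif REDUCE_LOOKAHEAD.include?(next_token)
    :reduce # by method_call: method_name
  else
    :error
  end
end

p select_action("(")   # => :shift
p select_action("if")  # => :reduce
p select_action("end") # => :error
```

`save! end valid?` hits the last branch: `'end'` neither continues method_call nor follows it, so the parser reports an error.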

Slide 51

Slide 51 text

Lookahead Set
✦ The set of tokens used to decide whether to Reduce
✦ LALR and IELR differ in how they compute their lookahead sets
  ✦ LALR: all tokens that can appear immediately after a given nonterminal, based on the entire language
  ✦ IELR: all tokens that can appear immediately after a given nonterminal, restricted to the part of the language consistent with the tokens read so far
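The "entire language" view corresponds closely to the classic FOLLOW set: every terminal that can appear right after a nonterminal anywhere in the grammar. The sketch below computes FOLLOW sets by fixpoint iteration for the statement grammar. It assumes that no rule derives the empty string (true here) and treats `exp`, `args`, and `method_name` as terminals for brevity. Real LALR lookaheads are computed per parser state and can be narrower than FOLLOW, so read this as the coarsest version of the idea.

```ruby
require "set"

# statement:   method_call 'if' exp | method_call 'unless' exp
# method_call: method_name | method_name '(' args ')'
GRAMMAR = {
  "statement"   => [["method_call", "if", "exp"], ["method_call", "unless", "exp"]],
  "method_call" => [["method_name"], ["method_name", "(", "args", ")"]],
}.freeze

def nonterminal?(sym)
  GRAMMAR.key?(sym)
end

# FIRST sets, assuming no empty rules: FIRST of a rule is FIRST of its head.
def first_sets
  first = Hash.new { |h, k| h[k] = Set.new }
  changed = true
  while changed
    changed = false
    GRAMMAR.each do |lhs, rhss|
      rhss.each do |rhs|
        head = rhs.first
        before = first[lhs].size
        first[lhs].merge(nonterminal?(head) ? first[head] : Set[head])
        changed ||= first[lhs].size != before
      end
    end
  end
  first
end

# FOLLOW sets: for each `A -> ... B beta`, add FIRST(beta), or FOLLOW(A)
# when B is last. The start symbol is followed by end of input "$".
def follow_sets(start)
  first = first_sets
  follow = Hash.new { |h, k| h[k] = Set.new }
  follow[start] << "$"
  changed = true
  while changed
    changed = false
    GRAMMAR.each do |lhs, rhss|
      rhss.each do |rhs|
        rhs.each_with_index do |sym, i|
          next unless nonterminal?(sym)

          nxt = rhs[i + 1]
          add = if nxt.nil? then follow[lhs]
                elsif nonterminal?(nxt) then first[nxt]
                else Set[nxt]
                end
          before = follow[sym].size
          follow[sym].merge(add)
          changed ||= follow[sym].size != before
        end
      end
    end
  end
  follow
end

p follow_sets("statement")["method_call"].sort # => ["if", "unless"]
```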

Slide 52

Slide 52 text

Motivating Examples
✦ Conditional statements
cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do' body 'end'
method_call: method_name | method_name '(' args ')'

Slide 53

Slide 53 text

cond_stmt: 'if' method_call 'then' body 'end' | 'while' method_call 'do' body 'end'
method_call: method_name | method_name '(' args ')'
if valid? then save! end
while valid? do save! end
if valid? do save! end
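For the cond_stmt grammar, the difference between the two lookahead views fits in a few lines. After `'if' method_name`, the only token that can follow method_call is `'then'`, and after `'while' method_name` it is only `'do'`. Merging both contexts into one state (the LALR view from Slide 51) gives `{'then', 'do'}`, so for `if valid? do` the merged state still performs the reduction, and the error is caught only one step later. The context table is derived by hand from the grammar, not computed from an automaton.

```ruby
require "set"

# Tokens that can follow method_call, keyed by the keyword that opened the
# surrounding cond_stmt (hand-derived from the grammar):
#   cond_stmt: 'if' method_call 'then' body 'end'
#            | 'while' method_call 'do' body 'end'
FOLLOW_BY_CONTEXT = {
  "if"    => Set["then"],
  "while" => Set["do"],
}.freeze

# LALR-style view: one merged lookahead set for every context.
MERGED = FOLLOW_BY_CONTEXT.values.reduce(Set.new, :|)

# Should `method_call: method_name` be reduced when `next_token` is seen?
def reduce?(lookahead, next_token)
  lookahead.include?(next_token)
end

# `if valid? do save! end`: context "if", next token after valid? is "do"
p reduce?(MERGED, "do")                  # => true  (error found later)
p reduce?(FOLLOW_BY_CONTEXT["if"], "do") # => false (error found right here)
```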

Slide 65

Slide 65 text

Motivating Examples
✦ Original example in the paper
S: 'a' A 'a' | 'b' A 'b'
A: 'a' | 'a' 'a'
aaa / aaaa / bab / baab

Slide 66

Slide 66 text

Motivating Examples
✦ Original example
S: 'a' A 'a' | 'b' A 'b'
A: 'a' | 'a' 'a'
baab

Slide 68

Slide 68 text

Conflicts
✦ A conflict occurs when the parser cannot uniquely decide the next action
✦ There are 2 types of conflicts
  ✦ Shift/Reduce conflict
  ✦ Reduce/Reduce conflict
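Both kinds of conflict can be detected mechanically within a single state: collect the tokens the state can shift and the lookahead set of each reduce, then look for any token claimed by more than one action. The first example uses the merged state from the paper's example (items `A: 'a'・` and `A: 'a'・'a'` with lookahead `['a', 'b']`, as shown later in the talk). The data layout and the `conflicts` helper are assumptions made for this sketch.

```ruby
require "set"

# Returns { token => [actions...] } for every token with more than one action.
# `shifts` is the set of tokens the state can shift; `reduces` maps a rule
# label to its lookahead set.
def conflicts(shifts, reduces)
  actions = Hash.new { |h, k| h[k] = [] }
  shifts.each { |tok| actions[tok] << :shift }
  reduces.each { |rule, lookahead| lookahead.each { |tok| actions[tok] << rule } }
  actions.select { |_tok, acts| acts.size > 1 }
end

# Merged state: A: 'a'・ ['a', 'b'] and A: 'a'・'a'
# One Shift/Reduce conflict, on 'a'.
p conflicts(Set["a"], { "A: 'a'" => Set["a", "b"] })

# Two rules sharing the lookahead 'x': one Reduce/Reduce conflict, on 'x'.
p conflicts(Set.new, { "R1" => Set["x"], "R2" => Set["x", "y"] })
```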

Slide 69

Slide 69 text

Practical Side
Estimated: 15/30 min

Slide 70

Slide 70 text

LR Parser Model
[Diagram: Source Code → Lexer → Token Stream → Parser; Grammar File → Parser Generator → Parser. The parser holds states (State 0 to State 6, with transitions labeled NUM, +, exp, *, (, -, …) and a state stack (top first: 8 4 1 0).]

Slide 71

Slide 71 text

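Shift Action
State stack (top first): 8 6 4 0

Slide 72

Slide 72 text

Shift Action
State stack (top first): 13 8 6 4 0

Slide 73

Slide 73 text

Reduce Action
State stack (top first): 13 8 6 4 0

Slide 74

Slide 74 text

Reduce Action
State stack (top first): 8 6 4 0

Slide 75

Slide 75 text

Reduce Action
State stack (top first): 6 4 0

Slide 76

Slide 76 text

Reduce Action
State stack (top first): 1 6 4 0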
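The stack pictures on Slides 71 to 76 can be replayed with a plain array used as a state stack. Shift pushes the destination state; Reduce pops one state per right-hand-side symbol and then pushes the state found through Goto. The state numbers mirror the slides, but the rule length (2) and the Goto entry (state 6 leads to state 1) are read off the pictures rather than taken from a real parse table, so treat them as illustrative.

```ruby
GOTO = { 6 => 1 }.freeze # hypothetical Goto entry: from state 6 on the rule's LHS

stack = [0, 4, 6, 8] # bottom first; the slides list it top first: 8 6 4 0

# Shift: the destination state goes on top.
stack.push(13)
p stack.reverse # => [13, 8, 6, 4, 0]

# Reduce by a rule whose right-hand side has 2 symbols: pop one state per symbol.
2.times do
  stack.pop
  p stack.reverse
end
# prints [8, 6, 4, 0], then [6, 4, 0]

# Goto: push the state reached from the newly exposed top (6).
stack.push(GOTO.fetch(stack.last))
p stack.reverse # => [1, 6, 4, 0]
```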

Slide 77

Slide 77 text

Mysterious Conflicts
✦ The grammar can be parsed by IELR but not by LALR
✦ Caused by differences in the lookahead sets
✦ Can be resolved by splitting states
✦ There are 3 types of mysterious conflicts
  ✦ Mysterious New Conflicts
  ✦ Mysterious Invasive Conflicts
  ✦ Mysterious Mutated Conflicts

Slide 78

Slide 78 text

Mysterious New Conflicts

Slide 79

Slide 79 text

Mysterious Invasive Conflicts

Slide 80

Slide 80 text

Mysterious Mutated Conflicts

Slide 81

Slide 81 text

Lrama's Implementation
[Class diagram]
● States: @states
● State: @id, @items, @transitions, @reduces, @conflicts
● Transition (Shift / Goto): @next_sym, @from_state, @to_state
● Reduce: @look_ahead, @item
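The class diagram can be mirrored with Ruby `Struct`s to show how the pieces point at each other. This is a simplified, hypothetical model that only reuses the instance-variable names from the slide; Lrama's actual classes carry more fields and behavior, so do not read it as Lrama's API.

```ruby
# Hypothetical, simplified mirror of the diagram (not Lrama's real classes).
Shift  = Struct.new(:next_sym, :from_state, :to_state) # transition on a terminal
Goto   = Struct.new(:next_sym, :from_state, :to_state) # transition on a nonterminal
Reduce = Struct.new(:look_ahead, :item)
State  = Struct.new(:id, :items, :transitions, :reduces, :conflicts)
States = Struct.new(:states)

s0 = State.new(0, ["S: ・'a' A 'a'"], [], [], [])
s1 = State.new(1, ["S: 'a'・A 'a'", "A: ・'a'"], [], [], [])
s2 = State.new(2, ["A: 'a'・"], [], [], [])
s0.transitions << Shift.new("'a'", s0, s1)
s1.transitions << Shift.new("'a'", s1, s2)
s2.reduces << Reduce.new(["'a'"], "A: 'a'・")
automaton = States.new([s0, s1, s2])

p automaton.states.size                             # => 3
p automaton.states[0].transitions.first.to_state.id # => 1
```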

Slide 82

Slide 82 text

IELR Basic Idea
✦ Annotate conflicting states and their predecessors
✦ Recompute lookahead sets from scratch
✦ Propagate them to the next state
✦ Split the next state if the lookahead set being propagated is not "compatible" with the ones already propagated there

Slide 83

Slide 83 text

Inadequacy Annotations
✦ Indicate a potential contribution to conflicts
✦ Hold the following conflict information
  ✦ State
  ✦ Token
  ✦ Actions
  ✦ Items

Slide 84

Slide 84 text

Inadequacy Annotation

Slide 85

Slide 85 text

Inadequacy Annotation
State: 22
Token: 'end'
Actions: [S, R1, R2]
Items: {R1 => [...], R2 => [...]}

State: 27
Token: 'do'
Actions: [R5, R6]
Items: {R5 => [...], R6 => [...]}

Slide 92

Slide 92 text

Recompute Lookahead Set
1. Lookahead set of an action: taken from its associated item's lookahead
2. Kernel item lookahead: taken from the same item's lookahead in the predecessor states (where the dot is one symbol to the left)
3. Non-kernel item lookahead: taken from the follow set of the state's goto on the item's LHS (i.e., the tokens that can appear next)
4. Goto follow set, the union of:
  a. the tokens that can begin the remainder of the RHS after the goto's nonterminal
  b. the lookahead sets of the items whose remainder is nullable
original: https://www.sciencedirect.com/science/article/pii/S0167642309001191
summarized by: @junk0612
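Step 4 seeds the non-kernel lookaheads, so here is a small sketch of it for the paper's grammar. For a state and a nonterminal, it looks at every item whose dot sits before that nonterminal, adds the tokens that can begin what remains after it, and, when that remainder is nullable, adds the item's own lookahead. Items are `[lhs, rhs, dot, lookahead]` arrays, a layout chosen for this example. Because no rule here is empty and every remainder starts with a terminal, "nullable" reduces to "empty" and "can begin" reduces to "first symbol"; a general version needs proper FIRST and nullable sets.

```ruby
require "set"

# Items are [lhs, rhs, dot, lookahead]. Hand-written kernel items for the
# paper's grammar:  S: 'a' A 'a' | 'b' A 'b'   A: 'a' | 'a' 'a'
AFTER_A = [["S", %w[a A a], 1, Set["#"]]] # state after reading 'a'
AFTER_B = [["S", %w[b A b], 1, Set["#"]]] # state after reading 'b'

# Follow set of the goto on `nonterminal` from a state (simplified, see above).
def goto_follow(state_items, nonterminal)
  state_items.each_with_object(Set.new) do |(_lhs, rhs, dot, lookahead), follow|
    next unless rhs[dot] == nonterminal

    rest = rhs[(dot + 1)..]
    if rest.empty?
      follow.merge(lookahead) # remainder is (trivially) nullable
    else
      follow << rest.first    # token that begins the remainder
    end
  end
end

p goto_follow(AFTER_A, "A").to_a # => ["a"]
p goto_follow(AFTER_B, "A").to_a # => ["b"]
```

These match the lookaheads of the non-kernel items `A:・'a'` and `A:・'a' 'a'` shown on Slide 96.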

Slide 93

Slide 93 text

Check Compatibilities
S: 'a' A 'a' | 'b' A 'b'
A: 'a' | 'a' 'a'

Slide 94

Slide 94 text

Check Compatibilities
● A: 'a'・, ['a', 'b']
● A: 'a'・'a', ['a', 'b']

Slide 96

Slide 96 text

Check Compatibilities
Target state (reached on 'a' from both states below):
● A: 'a'・, ['a', 'b']
● A: 'a'・'a', ['a', 'b']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']

Slide 98

Slide 98 text

Check Compatibilities
Target state (reached on 'a' from both states below):
● A: 'a'・, ['a', 'b']
● A: 'a'・'a', ['a', 'b']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']
Annotation: Token: 'a', Actions: [S, R], Items: {R => [A: 'a'・]}

Slide 99

Slide 99 text

Check Compatibilities
Target state (reached on 'a' from both states below):
● A: 'a'・, ['a']
● A: 'a'・'a', ['a']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']
Annotation: Token: 'a', Actions: [S, R], Items: {R => [A: 'a'・]}

Slide 101

Slide 101 text

Check Compatibilities
Target state (reached on 'a' from both states below):
● A: 'a'・, ['b']
● A: 'a'・'a', ['b']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']
Annotation: Token: 'a', Actions: [S, R], Items: {R => [A: 'a'・]}

Slide 103

Slide 103 text

Check Compatibilities
Target state:
● A: 'a'・, ['a']
● A: 'a'・'a', ['a']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']
Split copy of the target state:
● A: 'a'・, ['b']
● A: 'a'・'a', ['b']

Slide 104

Slide 104 text

Check Compatibilities
Target state:
● A: 'a'・, ['a']
● A: 'a'・'a', ['a']
State after 'b':
● S: 'b'・A 'b', ['#']
● A:・'a', ['b']
● A:・'a' 'a', ['b']
State after 'a':
● S: 'a'・A 'a', ['#']
● A:・'a', ['a']
● A:・'a' 'a', ['a']
Split copy of the target state:
● A: 'a'・, ['b']
● A: 'a'・'a', ['b']
No Conflicts!
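The compatibility check on these slides can be approximated by one question per annotation: which annotated actions would fire on the annotation's token, given the lookahead being propagated? The sketch uses the annotation from Slide 98 (token `'a'`, actions Shift and Reduce, reduce item `A: 'a'・`). It is a simplified reading that treats two lookahead sets as compatible when they make the same actions contribute; the paper's definition is more detailed, and `contributions` and `compatible?` are names made up for this example.

```ruby
require "set"

# Annotation from the slides: on token 'a' the state has Shift and Reduce,
# and the Reduce comes from the item  A: 'a'・
Annotation = Struct.new(:token, :actions, :items)
ANNOTATION = Annotation.new("a", [:shift, :reduce], { reduce: "A: 'a'・" })

# Which annotated actions would fire on the annotation's token?
# Shift always does; Reduce only if the propagated lookahead of its item
# contains the token.
def contributions(annotation, lookahead)
  annotation.actions.select do |action|
    action == :shift || lookahead.include?(annotation.token)
  end
end

# Simplified compatibility: same contributing actions => may share a state.
def compatible?(annotation, la1, la2)
  contributions(annotation, la1) == contributions(annotation, la2)
end

from_a = Set["a"] # lookahead propagated via S: 'a'・A 'a'
from_b = Set["b"] # lookahead propagated via S: 'b'・A 'b'

p contributions(ANNOTATION, from_a)       # => [:shift, :reduce]
p contributions(ANNOTATION, from_b)       # => [:shift]
p compatible?(ANNOTATION, from_a, from_b) # => false
```

Since the two propagated lookaheads disagree on whether Reduce contributes, the target state is split, and the copy reached after `'b'` no longer inherits a Reduce on `'a'`.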

Slide 105

Slide 105 text

$ bundle exec lrama -D lr.type=ielr parse.y

Slide 106

Slide 106 text

$ bundle exec lrama -D lr.type=ielr parse.y
def compute_ielr
  (...snip...)
rescue Interrupt
end

Slide 107

Slide 107 text

Why It Got Stuck
✦ The automaton has loops
✦ Lookahead sets are computed slowly
✦ Many "useless" annotations are created

Slide 108

Slide 108 text

Looped Automatons

Slide 110

Slide 110 text

Faster Computation
✦ Introduce some caches
✦ Use a strongly connected components algorithm
✦ Follow the approach of the LALR lookahead set computation
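A strongly connected components pass is what makes loops cheap: when lookahead sets flow along the edges of a graph with cycles, every member of a cycle ends up with the same set, so each component can be solved once instead of iterating until nothing changes. The sketch uses Ruby's standard `tsort` library and follows the spirit of the classic digraph approach used for LALR lookaheads (DeRemer and Pennello). The graph, the initial sets, and the `LookaheadGraph` class are invented for illustration; this is not Lrama's code.

```ruby
require "set"
require "tsort"

# Solves F(x) = INITIAL(x) ∪ ⋃ F(y) over every edge x -> y.
class LookaheadGraph
  include TSort

  def initialize(edges, initial)
    @edges = edges
    @initial = initial
  end

  def tsort_each_node(&block)
    @edges.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @edges.fetch(node, []).each(&block)
  end

  def solve
    result = {}
    # Components arrive children first, so every edge leaving the current
    # component points at a node that is already solved.
    each_strongly_connected_component do |component|
      set = Set.new
      component.each do |node|
        set.merge(@initial.fetch(node, Set.new))
        @edges.fetch(node, []).each do |child|
          set.merge(result[child]) if result.key?(child) # outside this component
        end
      end
      component.each { |node| result[node] = set } # a cycle shares one set
    end
    result
  end
end

# Hypothetical graph with a loop: p -> q -> r -> q, and p -> s
edges = { "p" => ["q", "s"], "q" => ["r"], "r" => ["q"], "s" => [] }
initial = { "q" => Set["then"], "r" => Set["do"], "s" => Set["end"] }
sets = LookaheadGraph.new(edges, initial).solve
p sets["q"].sort # => ["do", "then"]
p sets["p"].sort # => ["do", "end", "then"]
```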

Slide 111

Slide 111 text

Useless Annotations
Actions: [R1, R2]
Items: {R1 => [...], R2 => [...]}

Slide 112

Slide 112 text

Useless Annotations
Actions: [R1, R2]
Items: {R1 => [...], R2 => [...]}
[Diagram labels: R1, R2]

Slide 114

Slide 114 text

$ time bundle exec lrama -D lr.type=ielr parse.y
69.02s user 0.29s system 99% cpu 1:09.38 total

Slide 115

Slide 115 text

Using the IELR Parser in Ruby
Estimated: 25/30 min

Slide 116

Slide 116 text

Generate an IELR parser, simply
$ lrama --report=states parse.y
$ mv parse.output lalr.output
$ lrama --report=states -D lr.type=ielr parse.y
$ mv parse.output ielr.output
$ diff ielr.output lalr.output #=> no diffs!

Slide 117

Slide 117 text

Identify the 4 'do's Smartly
✦ There are 4 'do's in Ruby
  ✦ keyword_do: obj.m(arg) do ... end
  ✦ keyword_do_cond: while(true) do ... end
  ✦ keyword_do_block: obj.m arg do ... end
  ✦ keyword_do_LAMBDA: -> (arg) do ... end
✦ The tokens read so far provide enough information to tell them apart

Slide 118

Slide 118 text

while obj.m do             #=> keyword_do_cond
while(obj.m do             #=> keyword_do
while(obj.m arg do         #=> keyword_do_block
while -> do                #=> keyword_do_LAMBDA
while obj.m -> do          #=> keyword_do_LAMBDA
while(obj.m -> do end do   #=> keyword_do_block

Slide 119

Slide 119 text

Future Work
✦ Optimize the IELR computation
✦ Introduce IELR to CRuby and improve parse.y

Slide 120

Slide 120 text

Acknowledgements
✦ @yui-knk, @ydah
✦ #LR_parser_gangs
✦ @koic, @S.H.
✦ #esm_parser_club
✦ My Wife
✦ #a_beautiful_life

Slide 121

Slide 121 text

"After All, What is IELR?"
✦ Theoretical side
  ✦ Uses the tokens read so far to narrow down the set of possible next tokens more accurately than LALR
✦ Practical side
  ✦ Propagates conflict information backward and lookahead sets forward
  ✦ Identifies the cause of each conflict and splits states where necessary