Slide 1

Slide 1 text

Unexplored Region - parse.y - May 11, 2023 in RubyKaigi 2023 @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

RubyKaigi 2023

Slide 3

Slide 3 text

5 sessions for Parser Day 1 Day 2 Day 3 LT

Slide 4

Slide 4 text

This is the time known as the “Great Parser Era”!

Slide 5

Slide 5 text

Everyone is interested in parsers

Slide 6

Slide 6 text

Rumors about parse.y

Slide 7

Slide 7 text

Demon Castle parse.y (2017) Rumors about parse.y

Slide 8

Slide 8 text

Monstrous lex_state (2017) Demon Castle parse.y (2017) Rumors about parse.y

Slide 9

Slide 9 text

Monstrous lex_state (2017) Demon Castle parse.y (2017) parse.y is “hell” (2019) Rumors about parse.y

Slide 10

Slide 10 text

Monstrous lex_state (2017) Demon Castle parse.y (2017) parse.y is “hell” (2019) The current parse.y is a hell (2021) Rumors about parse.y

Slide 11

Slide 11 text

Rumors about parse.y

Slide 12

Slide 12 text

Rumors about parse.y This is NOT a parser

Slide 13

Slide 13 text

Rumors about parse.y This is NOT a parser This is a parser

Slide 14

Slide 14 text

This is a quick & safe tour of the unexplored region, “parse.y”

Slide 15

Slide 15 text

In 5 minutes

Slide 16

Slide 16 text

About me • Yuichiro Kaneko • yui-knk (GitHub) / spikeolaf (Twitter) • Treasure Data • Engineering Manager of Applications Backend • CRuby committer • Mainly develop parser related features other than new syntax • RubyVM::AbstractSyntaxTree (2018, Ruby 2.6) • keep_tokens option (2022, Ruby 3.2) • error_tolerant option (2022, Ruby 3.2)

Slide 17

Slide 17 text

“parse.y” Who's Who in 2023

Slide 18

Slide 18 text

“parse.y” Who's Who in 2023 The patch monster

Slide 19

Slide 19 text

“parse.y” Who's Who in 2023 The patch monster The Creator of Ruby

Slide 20

Slide 20 text

“parse.y” Who's Who in 2023 The patch monster The Creator of Ruby The Organizer of TRICK

Slide 21

Slide 21 text

“parse.y” Who's Who in 2023 The patch monster The Creator of Ruby The Organizer of TRICK Me

Slide 22

Slide 22 text

I’m the weakest of the Big Four

Slide 23

Slide 23 text

Beginner Course Let's understand the outline of “parse.y”

Slide 24

Slide 24 text

#1: The location of parse.y • Go to https://github.com/ruby/ruby • “parse.y” is at the top level

Slide 25

Slide 25 text

#2: Size of “parse.y” • Less than 15,000 lines • It’s not the largest one [Bar chart: sizes of common.mk, io.c, gc.c, parse.y, compile.c, string.c on an axis from 0 to 20,000, tag: v3_2_0]

Slide 26

Slide 26 text

#3: Structure of “parse.y” C codes Declarations Grammar rules C codes

Slide 27

Slide 27 text

#3: Structure of “parse.y” Declarations Grammar rules • From L.1329 to L.6118 • About 5,000 lines

Slide 28

Slide 28 text

#4: Grammar rules

Slide 29

Slide 29 text

#4: Grammar rules Almost same as BNF

Slide 30

Slide 30 text

#4: Grammar rules Almost same as BNF

Slide 31

Slide 31 text

#4: Grammar rules Almost same as BNF Action

Slide 32

Slide 32 text

#4: Grammar rules Almost same as BNF Action Comment

Slide 33

Slide 33 text

#4: Grammar rules Almost same as BNF Action Comment NODE_IF

Slide 34

Slide 34 text

#4: Grammar rules Almost same as BNF Action Comment NODE_IF condition “i == 1”

Slide 35

Slide 35 text

#4: Grammar rules Almost same as BNF Action Comment NODE_IF condition “i == 1” body “1”

Slide 36

Slide 36 text

#4: Grammar rules Almost same as BNF Action NODE_IF Comment condition “i == 1” body “1” else “0”

Slide 37

Slide 37 text

#4: Grammar rules Almost same as BNF Action NODE_IF Comment condition “i == 1” body “1” else “0” At sign means location
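The same NODE_IF structure can be inspected from Ruby. This is a minimal sketch using RubyVM::AbstractSyntaxTree (mentioned on the "About me" slide). The source string is an assumption reconstructed from the slide labels (condition “i == 1”, body “1”, else “0”), and find_node, node_text, and if_parts are hypothetical helper names. The line and column numbers on each node are presumably what the at-sign locations in the actions feed into.

```ruby
# Depth-first search for the first AST node of the given type.
def find_node(node, type)
  return nil unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  return node if node.type == type

  node.children.each do |child|
    found = find_node(child, type)
    return found if found
  end
  nil
end

# Source text covered by a node; valid here because the input is one line.
def node_text(source, node)
  source[node.first_column...node.last_column]
end

# Split an if expression into the three children of its IF node.
def if_parts(source)
  if_node = find_node(RubyVM::AbstractSyntaxTree.parse(source), :IF)
  condition, body, else_body = if_node.children
  {
    type: if_node.type,
    condition: node_text(source, condition),
    body: node_text(source, body),
    else_body: node_text(source, else_body),
    location: [if_node.first_lineno, if_node.first_column,
               if_node.last_lineno, if_node.last_column]
  }
end

p if_parts("if i == 1 then 1 else 0 end")
```

The node types of the children (for example how the literal `1` is represented) differ between Ruby versions, so the sketch reports source text instead of child node types.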

Slide 38

Slide 38 text

How's this? It is not so difficult, right?

Slide 39

Slide 39 text

Guidebooks • shioimm/coe401_. “A Fun Tour of Ruby Parsing” (たのしいRubyの構文解析ツアー), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua • aamine. “Ruby Hacking Guide” (Rubyソースコード完全解説), Part 2 “Syntax analysis”, July 2004. • https://i.loveruby.net/ja/rhg/book/ [JA] • https://ruby-hacking-guide.github.io/ [EN]

Slide 40

Slide 40 text

#5: dump=y option

Slide 41

Slide 41 text

Do I understand all of them?

Slide 42

Slide 42 text

Almost, unfortunately

Slide 43

Slide 43 text

#6: YFLAGS • make YFLAGS=" --report=states,itemsets,lookaheads,solved" miniruby • “parse.tmp.output” is generated

Slide 44

Slide 44 text

Now you’re a parser beginner

Slide 45

Slide 45 text

Intermediate Course Let's understand parser_params and lexer state

Slide 46

Slide 46 text

“If you find both data and code, you should first investigate the data structure.” Introduction “Understanding data structure” https://ruby-hacking-guide.github.io/intro.html Iron Rule #1

Slide 47

Slide 47 text

What one should do is think toward specific goals: “This part is needed to solve this task” “This code is for overcoming this problem” Chapter 11 Finite-state scanner “Understanding data structure” https://ruby-hacking-guide.github.io/contextual.html Iron Rule #2

Slide 48

Slide 48 text

#7: struct parser_params • Not small …

Slide 49

Slide 49 text

#7: parser_params 2004 • Subscribe now! https://github.com/ruby/ruby/commit/e77ddaf0d1d421da2f655832a45f237558e23115

Slide 50

Slide 50 text

#8: Lexer Buffer • Hospitality !!! • (But this is the only diagram…)

Slide 51

Slide 51 text

#9: lastline / nextline • There are two lines, lastline and nextline • Why?

Slide 52

Slide 52 text

For here document?

Slide 53

Slide 53 text

NO

Slide 54

Slide 54 text

#9: lastline / nextline • Ruby sometimes ignores NL (internally this is called “tIGNORED_NL”) • The lexer needs to know which token appears next to decide whether to ignore NL [Figure label: NL is ignored]
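The ignored newline is visible through Ripper, which reports it as an `on_ignored_nl` event. The following is a small sketch; the sample sources and the helper name lex_events are assumptions chosen for illustration, not the code shown on the slide.

```ruby
require "ripper"

# Return only the scanner event names that Ripper reports for a source string.
def lex_events(source)
  Ripper.lex(source).map { |_pos, event, _token, *_rest| event }
end

# After a binary operator the expression cannot end, so the newline is ignored.
p lex_events("1 +\n2")

# With nothing pending, the newline terminates the expression.
p lex_events("1\n+ 2")

# A leading dot on the following line continues the method chain, which the
# lexer can only know by looking at the next line first.
p lex_events("[3, 1, 2]\n  .sort")
p eval("[3, 1, 2]\n  .sort")
```

The third case is the one that needs lookahead: nothing on the first line says that the expression continues.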

Slide 55

Slide 55 text

#9: lastline / nextline 1. A pointer is on the end of the line

Slide 56

Slide 56 text

#9: lastline / nextline 1. A pointer is on the end of the line 2. Get next line

Slide 57

Slide 57 text

#9: lastline / nextline 1. A pointer is on the end of the line 2. Get next line 3. Check the next token to decide whether to generate NL

Slide 58

Slide 58 text

#9: lastline / nextline 1. A pointer is on the end of the line 2. Get next line 3. Check the next token to decide whether to generate NL 4. Go back to previous line and hold next line

Slide 59

Slide 59 text

#9: lastline / nextline 1. A pointer is on the end of the line 2. Get next line 3. Check the next token to decide whether to generate NL 4. Go back to previous line and hold next line 5. “nextline” is set to “lastline”
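The five steps can be sketched as a toy model in Ruby. This is not how parse.y is written (it is C and works on a character pointer). Only the names lastline and nextline come from the slides. ToyLineBuffer, read_line, hold_line, and newline_kinds are hypothetical, and the leading-dot check stands in for the real "check the next token" logic.

```ruby
# Toy model of the lastline / nextline lookahead; an illustration only.
class ToyLineBuffer
  def initialize(source)
    @input = source.lines
    @lastline = nil # the line most recently fetched
    @nextline = nil # a line fetched early and held for re-reading
  end

  # Fetch a line, preferring a held line over fresh input.
  def read_line
    line = @nextline || @input.shift
    @nextline = nil
    @lastline = line
  end

  # Steps 4 and 5: go back, keeping the peeked line for the next read.
  def hold_line
    @nextline = @lastline
  end

  # Decide, for each newline that is followed by another line, whether it
  # would be a real NL or an ignored one.
  def newline_kinds
    kinds = []
    while read_line                # step 1: scanning reaches the end of a line
      break unless read_line       # step 2: get the next line (stop at EOF)
      peeked = @lastline.lstrip
      leading_dot = peeked.start_with?(".") && !peeked.start_with?("..")
      kinds << (leading_dot ? :ignored_nl : :nl) # step 3: look at what follows
      hold_line                    # steps 4-5: nextline is set to lastline
    end
    kinds
  end
end

p ToyLineBuffer.new("a\n  .b\nc\n").newline_kinds
```

The point of the model is only that the peeked line must not be lost: it is held in nextline and handed back on the following read.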

Slide 60

Slide 60 text

#10: parent_iseq • It’s weird because ISeq is a data structure generated from AST

Slide 61

Slide 61 text

#10: parent_iseq https://www.slideshare.net/mametter/trick-2022-results • This is a trick used in TRICK 2022

Slide 62

Slide 62 text

#10: parent_iseq • #eval binds the context around it • “Context” is represented as “parent_iseq”
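A short Ruby sketch of why the parser needs that context: the same string given to eval means different things depending on which local variables exist around the call. The method names with_local and without_local are hypothetical, chosen for illustration.

```ruby
# x is a local variable in the surrounding scope, so the evaluated string
# sees it and "x + 1" is a local variable reference plus one.
def with_local
  x = 10
  eval("x + 1")
end

# No local x here, so the same string treats x as a method call and fails.
def without_local
  eval("x + 1")
rescue NameError => e
  e.class
end

p with_local
p without_local
```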

Slide 63

Slide 63 text

#eval is evil

Slide 64

Slide 64 text

#11: Growing lex_state v3_2_0

Slide 65

Slide 65 text

#11: Growing lex_state v0_49 v3_2_0 https://github.com/ruby/ruby/blob/v0_49/parse.y • I love v0_49

Slide 66

Slide 66 text

Is this actually needed? • They are needed to distinguish these pieces of code

Slide 67

Slide 67 text

Is this actually needed? • They are needed to distinguish these pieces of code

Slide 68

Slide 68 text

Is this actually needed? • They are needed to distinguish these pieces of code

Slide 69

Slide 69 text

Is this actually needed? • They are needed to distinguish these pieces of code

Slide 70

Slide 70 text

Is this actually needed? • They are needed to distinguish these pieces of code

Slide 71

Slide 71 text

Is this actually needed? • They are needed to distinguish these pieces of code
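The code samples from these slides are not reproduced in this transcript. One well-known case of this kind, picked here as an illustration and not necessarily one of the slides' examples, is a minus sign after an identifier: whether it is a unary minus on an argument or a binary subtraction depends on spacing and on whether the identifier is a local variable. The names amount, minus_examples, and token_states are hypothetical.

```ruby
require "ripper"

# Returns 10 when called without an argument, otherwise echoes the argument.
def amount(arg = nil)
  arg.nil? ? 10 : [:called_with, arg]
end

def minus_examples
  as_argument = amount -1       # amount is a method: parsed as amount(-1)
  as_subtraction = amount - 1   # spaces on both sides: amount() - 1
  total = 10
  local_subtraction = total -1  # total is a local variable: 10 - 1
  [as_argument, as_subtraction, local_subtraction]
end

# Pair each token with the lexer state that Ripper reports alongside it.
def token_states(source)
  Ripper.lex(source).map { |_pos, _event, token, state| [token, state.to_s] }
end

p minus_examples
p token_states("amount -1")
p token_states("total = 10; total -1")
```

Comparing the state printed after `amount` with the one printed after the second `total` shows the lexer keeping track of what kind of identifier it has just seen.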

Slide 72

Slide 72 text

How's this? It is not so difficult, right?

Slide 73

Slide 73 text

Wrong, it’s complex.

Slide 74

Slide 74 text

Open Question: How to sort out lex_state

Slide 75

Slide 75 text

Advanced Course Let's understand actions

Slide 76

Slide 76 text

#4: Grammar rules Action

Slide 77

Slide 77 text

#12: Action is a local variable • Some rules have a lot of actions in the middle of their right-hand sides

Slide 78

Slide 78 text

#12: Action is a local variable “compstmt” is almost all of Ruby’s syntax

Slide 79

Slide 79 text

String interpolation is very useful, right?

Slide 80

Slide 80 text

With great power comes great responsibility

Slide 81

Slide 81 text

#12: Action is a local variable Save current values Restore values
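A small Ruby sketch of why saving and restoring is needed: because an interpolation can contain almost any Ruby code, a string can hold a block that holds another string with its own interpolation. The lexer has to leave the outer string, handle the inner code, and then resume the outer string where it stopped. The method name nested_interpolation is hypothetical, and which values parse.y saves is not shown in this transcript.

```ruby
# A string whose interpolation contains a block, which in turn builds
# another interpolated string containing a string literal argument.
def nested_interpolation(values)
  "items: #{values.map { |v| "<#{v.to_s.rjust(2, "0")}>" }.join(", ")}"
end

puts nested_interpolation([1, 2])
```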

Slide 82

Slide 82 text

#13: Semantic value is reusable

Slide 83

Slide 83 text

#13: Semantic value is reusable Reuse “->” semantic value because nobody uses it Restore the value

Slide 84

Slide 84 text

#14: lex_ctxt nonterminal symbol • This does not consume any tokens. It exists just to get the current value of “struct lex_context ctxt”!

Slide 85

Slide 85 text

Conclusions It's fun to read parse.y

Slide 86

Slide 86 text

See you next time at “parse.y”

Slide 87

Slide 87 text

Guidebooks • shioimm/coe401_. “A Fun Tour of Ruby Parsing” (たのしいRubyの構文解析ツアー), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua • aamine. “Ruby Hacking Guide” (Rubyソースコード完全解説), Part 2 “Syntax analysis”, July 2004. • https://i.loveruby.net/ja/rhg/book/ [JA] • https://ruby-hacking-guide.github.io/ [EN] • Atsushi Ohori. “Principles of LR Parsing” (LR構文解析の原理), Feb 2014. https://www.jstage.jst.go.jp/article/jssst/31/1/31_1_30/_pdf/-char/ja • A.V. Aho et al. “Compilers: Principles, Techniques, and Tools, 2nd ed.” (コンパイラ[第2版] 原理・技法・ツール), Saiensu-sha