Unexplored Region - parse.y -

yui-knk

May 13, 2023

Transcript

  1. Unexplored Region


    - parse.y -
    May 11, 2023 in RubyKaigi 2023


    @yui-knk


    Yuichiro Kaneko


  2. RubyKaigi 2023


  3. 5 sessions for Parser
    Day 1
    Day 2
    Day 3
    LT


  4. This is the time known as


    the “Great Parser Era” !


  5. Everyone is
    interested in parsers


  6. Rumors about parse.y


  7. Demon Castle
    parse.y (2017)
    Rumors about parse.y


  8. Monstrous
    lex_state (2017)
    Demon Castle
    parse.y (2017)
    Rumors about parse.y


  9. Monstrous
    lex_state (2017)
    Demon Castle
    parse.y (2017) parse.y is “hell” (2019)
    Rumors about parse.y


  10. Monstrous
    lex_state (2017)
    Demon Castle
    parse.y (2017) parse.y is “hell” (2019)
    The current parse.y
    is a hell (2021)
    Rumors about parse.y


  11. Rumors about parse.y


  12. Rumors about parse.y
    This is NOT a parser


  13. Rumors about parse.y
    This is NOT a parser
    This is a parser


  14. This is a quick & safe tour of
    the unexplored region, “parse.y”


  15. In 5 minutes


  16. About me
    • Yuichiro Kaneko


    • yui-knk (GitHub) / spikeolaf (Twitter)


    • Treasure Data


    • Engineering Manager of Applications Backend


    • CRuby committer


    • Mainly develops parser-related features other than new syntax


    • RubyVM::AbstractSyntaxTree (2018, Ruby 2.6)


    • keep_tokens option (2022, Ruby 3.2)


    • error_tolerant option (2022, Ruby 3.2)
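
The three features listed above can be tried from plain Ruby. The following is a minimal sketch, assuming CRuby 3.2 or later; the sample source strings are illustrative and are not taken from the slides.

```ruby
# keep_tokens: true makes the parser retain the token list for each node.
src = "1 + 2"
root = RubyVM::AbstractSyntaxTree.parse(src, keep_tokens: true)

# Each token is [id, type, text, location]; keep only type and text.
tokens = root.tokens.map { |_id, type, text, _loc| [type, text] }
tokens.each { |type, text| puts "#{type.inspect} #{text.inspect}" }

# error_tolerant: true returns a tree for broken input
# instead of raising SyntaxError.
broken_src = "x = 1; p(x; y=2"
broken = RubyVM::AbstractSyntaxTree.parse(broken_src, error_tolerant: true)
puts broken.type
```

Without `error_tolerant: true`, parsing the second string raises `SyntaxError`, which is why the option matters for tools that work on code still being typed.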


  17. “parse.y” Who's Who in 2023


  18. “parse.y” Who's Who in 2023
    The patch monster


  19. “parse.y” Who's Who in 2023
    The patch monster
    The Creator of Ruby


  20. “parse.y” Who's Who in 2023
    The patch monster
    The Creator of Ruby
    The Organizer of TRICK



  21. “parse.y” Who's Who in 2023
    The patch monster
    The Creator of Ruby
    The Organizer of TRICK


    Me


  22. I’m the weakest
    of the Big Four


  23. Beginner


    Course
    Let's understand the outline of “parse.y”


  24. #1: The location of “parse.y”
    • Go to https://github.com/ruby/ruby


    • “parse.y” is on the top level


  25. #2: Size of “parse.y”
    • Fewer than 15,000 lines


    • It’s not the largest one
    [Bar chart: line counts (axis 0 to 20,000) for common.mk, io.c, gc.c, parse.y, compile.c, string.c at tag v3_2_0]


  26. #3: Structure of “parse.y”
    C code
    Declarations


    Grammar rules


    C code


  27. #3: Structure of “parse.y”
    Declarations


    Grammar rules


    • From L.1329 to L.6118


    • About 5,000 lines


  28. #4: Grammar rules


  29. #4: Grammar rules
    Almost same as BNF



  30. #4: Grammar rules
    Almost same as BNF



  31. #4: Grammar rules
    Almost same as BNF


    Action


  32. #4: Grammar rules
    Almost same as BNF


    Action
    Comment


  33. #4: Grammar rules
    Almost same as BNF


    Action
    Comment
    NODE_IF


  34. #4: Grammar rules
    Almost same as BNF


    Action
    Comment
    NODE_IF
    condition “i == 1”



  35. #4: Grammar rules
    Almost same as BNF


    Action
    Comment
    NODE_IF
    condition “i == 1”


    body “1”



  36. #4: Grammar rules
    Almost same as BNF


    Action
    NODE_IF
    Comment
    condition “i == 1”


    body “1”


    else “0”



  37. #4: Grammar rules
    Almost same as BNF


    Action
    NODE_IF
    Comment
    condition “i == 1”


    body “1”


    else “0”


    At sign means
    location
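
The node built by this action can be inspected from Ruby. The sketch below assumes CRuby 2.6 or later and uses an illustrative `if` expression with the same condition, body, and else parts as the labels above; the exact source shown on the slide is not in the transcript.

```ruby
src = "if i == 1 then 1 else 0 end"
root = RubyVM::AbstractSyntaxTree.parse(src)

# The top-level SCOPE node's children are [local table, arguments, body].
if_node = root.children[2]
cond, body, else_body = if_node.children

puts if_node.type              # node type produced for the if expression
puts cond.type                 # node type of the condition "i == 1"
puts body.inspect              # node for the body "1"
puts else_body.inspect         # node for the else part "0"

# The location that the action receives through the at sign is exposed
# on the node as line and column numbers.
start_pos = [if_node.first_lineno, if_node.first_column]
end_pos = [if_node.last_lineno, if_node.last_column]
puts "from #{start_pos.inspect} to #{end_pos.inspect}"
```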


  38. How's this? It is not
    so difficult, right?


  39. Guidebooks
    • shioimm/coe401_. “たのしいRubyの構文解析ツアー” (A Fun Tour of Ruby's Syntax Analysis), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua


    • aamine. “Rubyソースコード完全解説” (Ruby Hacking Guide), Part 2 “Syntactic Analysis”, July 2004.


    • https://i.loveruby.net/ja/rhg/book/ [JA]


    • https://ruby-hacking-guide.github.io/ [EN]


  40. #5: dump=y option


  41. Do I understand
    all of them?


  42. Almost,
    unfortunately


  43. #6: YFLAGS
    • make YFLAGS=" --report=states,itemsets,lookaheads,solved" miniruby


    • “parse.tmp.output” is generated


  44. Now you’re a parser beginner


  45. Intermediate


    Course
    Let's understand parser_params and lexer state


  46. “If you find both data and code,
    you should first investigate the
    data structure.”
    Introduction “Understanding data structure”


    https://ruby-hacking-guide.github.io/intro.html
    Iron Rule #1


  47. What one should do is think toward
    specific goals: “This part is needed to
    solve this task” “This code is for
    overcoming this problem”
    Chapter 11 Finite-state scanner “Understanding data structure”


    https://ruby-hacking-guide.github.io/contextual.html
    Iron Rule #2


  48. #7: struct parser_params
    • Not small …


  49. #7: parser_params 2004
    • Subscribe now!
    https://github.com/ruby/ruby/commit/e77ddaf0d1d421da2f655832a45f237558e23115


  50. #8: Lexer Buffer
    • Hospitality !!!


    • (But this is the only diagram…)


  51. #9: lastline / nextline
    • There are two lines, lastline and
    nextline


    • Why?


  52. For here document?


  53. NO


  54. #9: lastline / nextline
    • Ruby sometimes ignores NL (internally
    this is called “tIGNORED_NL”)


    • The lexer needs to know which token comes
    next to decide whether to ignore the NL
    NL is ignored
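
One way to observe an ignored newline from Ruby is a method chain that continues on the next line with a leading dot. This is a minimal sketch, assuming CRuby 2.6 or later; the sample sources are illustrative and `foo`, `bar` are hypothetical method names.

```ruby
require "ripper"

chained_src = "foo\n  .bar\n"
separate_src = "foo\nbar\n"

# With a leading dot on the next line the newline is ignored,
# so the two lines parse as one method call.
chained = RubyVM::AbstractSyntaxTree.parse(chained_src).children[2]
# Without it the newline ends the statement, giving two statements.
separate = RubyVM::AbstractSyntaxTree.parse(separate_src).children[2]

puts chained.type
puts separate.type

# Ripper exposes the scanner's view. Print its event names to inspect
# how each newline in the chained source is reported.
Ripper.lex(chained_src).each do |_pos, event, text, _state|
  puts "#{event} #{text.inspect}"
end
```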


  55. #9: lastline / nextline
    1. A pointer is on the end of the line


  56. #9: lastline / nextline
    1. A pointer is on the end of the line


    2. Get next line


  57. #9: lastline / nextline
    1. A pointer is on the end of the line


    2. Get next line


    3. Check the next token to decide whether to generate
    NL


  58. #9: lastline / nextline
    1. A pointer is on the end of the line


    2. Get next line


    3. Check the next token to decide whether to generate
    NL


    4. Go back to previous line and hold next
    line


  59. #9: lastline / nextline
    1. A pointer is on the end of the line


    2. Get next line


    3. Check the next token to decide whether to generate
    NL


    4. Go back to previous line and hold next
    line


    5. “nextline” is set to “lastline”
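
The five steps above can be sketched as a toy line buffer. This is not the code in parse.y: the class `ToyLineBuffer` and its methods are hypothetical, only the names `lastline` and `nextline` come from the slides, and the "next token" check is reduced to looking for a leading dot.

```ruby
# Toy model of the two-line buffer. Hypothetical, for illustration only.
class ToyLineBuffer
  attr_reader :lastline, :nextline

  def initialize(source)
    @lines = source.lines
    @lastline = nil   # line currently being scanned
    @nextline = nil   # line read ahead and held back
  end

  # Step 5: when a held-back line exists, it becomes the current line.
  def advance
    if @nextline
      @lastline = @nextline
      @nextline = nil
    else
      @lastline = @lines.shift
    end
    @lastline
  end

  # Step 1: call this when the pointer is at the end of lastline.
  def newline_ignored?
    peeked = @lines.shift              # step 2: get the next line
    return false if peeked.nil?
    head = peeked.lstrip
    # Step 3: a single leading dot means the expression continues.
    ignored = head.start_with?(".") && !head.start_with?("..")
    @nextline = peeked                 # step 4: go back, hold the next line
    ignored
  end
end

buf = ToyLineBuffer.new("foo\n  .bar\nbaz\n")
buf.advance                    # scanning "foo\n"
first = buf.newline_ignored?   # next line starts with a dot
buf.advance                    # the held-back line becomes lastline
second = buf.newline_ignored?  # next line is "baz\n"
puts first
puts second
```

Holding the peeked line in `nextline` is what lets the scanner return to the previous line without reading the input twice.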


  60. #10: parent_iseq
    • It’s weird because ISeq
    is a data structure
    generated from AST


  61. #10: parent_iseq
    https://www.slideshare.net/mametter/trick-2022-results
    • This is a trick used in
    TRICK 2022


  62. #10: parent_iseq
    • #eval binds the context around it


    • “Context” is represented as “parent_iseq”
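
A small experiment shows why the parser needs the surrounding context when it parses a string passed to eval. This sketch assumes CRuby 2.6 or later; `eval_with_local` is a hypothetical helper, and the connection to “parent_iseq” is the slide's statement, not something the code verifies.

```ruby
# Inside the method, x is a local variable, and the eval'd string
# is parsed with that knowledge.
def eval_with_local
  x = 41
  eval("x + 1")
end

result = eval_with_local
puts result

# Parsed on its own, with no surrounding scope, the same x is
# treated as a method call without arguments.
standalone = RubyVM::AbstractSyntaxTree.parse("x").children[2]
puts standalone.type
```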


  63. #eval is evil


  64. #11: Growing lex_state
    v3_2_0


  65. #11: Growing lex_state
    v0_49
    v3_2_0
    https://github.com/ruby/ruby/blob/v0_49/parse.y
    • I love v0_49


  66. Is this actually needed?
    • They are needed to distinguish code like this


  67. Is this actually needed?
    • They are needed to distinguish code like this


  68. Is this actually needed?
    • They are needed to distinguish code like this


  69. Is this actually needed?
    • They are needed to distinguish code like this


  70. Is this actually needed?
    • They are needed to distinguish code like this


  71. Is this actually needed?
    • They are needed to distinguish code like this
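
The code samples on these slides are not in the transcript, so here is one illustrative case of the kind of ambiguity the lexer state resolves: the same “/” character is a division operator in one position and the start of a regexp in another. This is a minimal sketch using Ripper, assuming CRuby 2.5 or later; `slash_events` is a hypothetical helper.

```ruby
require "ripper"

# Collect the scanner event and resulting lexer state for every "/" token.
def slash_events(src)
  Ripper.lex(src)
        .select { |_pos, _event, text, _state| text == "/" }
        .map { |_pos, event, _text, state| [event, state.to_s] }
end

division = slash_events("x = 4 / 2")   # "/" follows a value
regexp = slash_events("puts(/2/)")     # "/" follows an opening parenthesis

puts division.inspect
puts regexp.inspect
```

The scanner sees identical characters in both sources. Only the state left behind by the preceding token tells it which token to produce.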


  72. How's this? It is not
    so difficult, right?


  73. Wrong, it’s
    complex.


  74. Open Question: How to sort
    out lex_state


  75. Advanced


    Course
    Let's understand actions


  76. #4: Grammar rules
    Action


  77. #12: Action is a local variable
    • Some rules have many actions in the
    middle of their right-hand sides


  78. #12: Action is a local variable
    “compstmt” is almost all of Ruby’s syntax


  79. String interpolation
    is very useful, right?
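
Because the interpolated part can hold nearly any Ruby code, it can itself contain strings with their own interpolation, which is why the parser has to keep track of state across the embedded code. A minimal sketch with Ripper, assuming CRuby 2.5 or later; the sample string is illustrative.

```ruby
require "ripper"

# Single quotes keep this as literal source text: a string whose
# interpolation contains an array holding another interpolated string.
src = '"a#{ [1, "b#{2}"].size }c"'

events = Ripper.lex(src).map { |_pos, event, _text, _state| event }
opens = events.count(:on_embexpr_beg)    # each "#{" seen by the scanner
closes = events.count(:on_embexpr_end)   # each matching "}"
value = eval(src)

puts opens
puts closes
puts value
```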


  80. With great power comes
    great responsibility


  81. #12: Action is a local variable
    Save current values
    Restore values


  82. #13: Semantic value is reusable


  83. #13: Semantic value is reusable
    Reuse “->” semantic value


    because nobody uses it
    Restore the value


  84. #14: lex_ctxt nonterminal symbol
    • This does not consume any tokens. It exists only to get the
    current value of “struct lex_context ctxt”!


  85. Conclusions
    It's fun to read
    parse.y


  86. See you next time


    at “parse.y”


  87. Guidebooks
    • shioimm/coe401_. “たのしいRubyの構文解析ツアー” (A Fun Tour of Ruby's Syntax Analysis), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua


    • aamine. “Rubyソースコード完全解説” (Ruby Hacking Guide), Part 2 “Syntactic Analysis”, July 2004.


    • https://i.loveruby.net/ja/rhg/book/ [JA]


    • https://ruby-hacking-guide.github.io/ [EN]


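    • Atsushi Ohori (大堀 淳). “LR構文解析の原理” (Principles of LR Parsing), Feb 2014. https://www.jstage.jst.go.jp/article/jssst/31/1/31_1_30/_pdf/-char/ja


    • A. V. Aho et al. “コンパイラ[第2版] 原理・技法・ツール” (Compilers: Principles, Techniques, and Tools, 2nd ed.), Saiensu-sha.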
