Unexplored Region - parse.y -

yui-knk

May 13, 2023

Transcript

  1. Rumors about parse.y
    • Monstrous lex_state (2017)
    • Demon Castle parse.y (2017)
    • parse.y is “hell” (2019)
    • The current parse.y is a hell (2021)
  2. About me
    • Yuichiro Kaneko
    • yui-knk (GitHub) / spikeolaf (Twitter)
    • Treasure Data
      • Engineering Manager of Applications Backend
    • CRuby committer
      • Mainly develops parser-related features other than new syntax
      • RubyVM::AbstractSyntaxTree (2018, Ruby 2.6)
      • keep_tokens option (2022, Ruby 3.2)
      • error_tolerant option (2022, Ruby 3.2)
  3. #2: Size of “parse.y”
    • Fewer than 15,000 lines
    • It’s not the largest file
    [Chart: line counts of common.mk, io.c, gc.c, parse.y, compile.c, and string.c at tag v3_2_0]
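  4. #4: Grammar rules
    • Rules are written almost the same as BNF
    [Figure: an annotated grammar rule with callouts “Action” (building a NODE_IF) and “Comment”; the example is split into condition “i == 1”, body “1”, and else “0”]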
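  5. #4: Grammar rules
    • Rules are written almost the same as BNF
    • The at sign (@) means location
    [Figure: an annotated grammar rule with callouts “Action” (building a NODE_IF) and “Comment”; the example is split into condition “i == 1”, body “1”, and else “0”]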
  6. Guidebooks
    • shioimm/coe401_. “たのしいRubyの構文解析ツアー” (“A Fun Tour of Ruby Parsing”), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua
    • aamine. “Rubyソースコード完全解説” (English edition: Ruby Hacking Guide), Part 2 “構文解析” (“Syntax Analysis”), July 2004. https://i.loveruby.net/ja/rhg/book/ [JA], https://ruby-hacking-guide.github.io/ [EN]
  7. Iron Rule #1: “If you find both data and code, you should first investigate the data structure.”
    (Ruby Hacking Guide, Introduction, “Understanding data structure”: https://ruby-hacking-guide.github.io/intro.html)
  8. Iron Rule #2: “What one should do is think toward specific goals: ‘This part is needed to solve this task’, ‘This code is for overcoming this problem’.”
    (Ruby Hacking Guide, Chapter 11 “Finite-state scanner”, “Understanding data structure”: https://ruby-hacking-guide.github.io/contextual.html)
  9. NO

  10. #9: lastline / nextline
    • Ruby sometimes ignores an NL (newline) token; internally this is called “tIGNORED_NL”
    • To decide whether to ignore an NL, the lexer needs to know which token appears next
    [Figure label: “NL is ignored”]
  11. #9: lastline / nextline
    1. A pointer is at the end of the line
    2. Get the next line
  12. #9: lastline / nextline
    1. A pointer is at the end of the line
    2. Get the next line
    3. Check the next token to determine whether to generate NL
  13. #9: lastline / nextline
    1. A pointer is at the end of the line
    2. Get the next line
    3. Check the next token to determine whether to generate NL
    4. Go back to the previous line and hold the next line
  14. #9: lastline / nextline
    1. A pointer is at the end of the line
    2. Get the next line
    3. Check the next token to determine whether to generate NL
    4. Go back to the previous line and hold the next line
    5. “nextline” is set to “lastline”
  15. #10: parent_iseq
    • #eval binds the context around it (for example, the caller's local variables)
    • This “context” is represented as “parent_iseq”
  16. #12: Action is a local variable
    • Some rules have a lot of actions in the middle of their right-hand sides
  17. #14: lex_ctxt nonterminal symbol
    • This does not consume any tokens
    • It exists just to get the current value of “struct lex_context ctxt”!
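  18. Guidebooks
    • shioimm/coe401_. “たのしいRubyの構文解析ツアー” (“A Fun Tour of Ruby Parsing”), March 2023. https://speakerdeck.com/coe401_/tanosiirubynogou-wen-jie-xi-tua
    • aamine. “Rubyソースコード完全解説” (English edition: Ruby Hacking Guide), Part 2 “構文解析” (“Syntax Analysis”), July 2004. https://i.loveruby.net/ja/rhg/book/ [JA], https://ruby-hacking-guide.github.io/ [EN]
    • 大堀 淳 (Atsushi Ohori). “LR構文解析の原理” (“Principles of LR Parsing”), Feb 2014. https://www.jstage.jst.go.jp/article/jssst/31/1/31_1_30/_pdf/-char/ja
    • A.V. エイホ 他 (A.V. Aho et al.). “コンパイラ[第2版] 原理・技法・ツール” (Japanese edition of “Compilers: Principles, Techniques, and Tools”, 2nd ed.), サイエンス社 (Saiensu-sha).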
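The #4 “Grammar rules” slides show a rule whose action builds a NODE_IF and note that an at sign refers to a location. The Python sketch below models that idea under stated assumptions: the `Location` fields follow Bison's default `YYLTYPE`, `default_location` imitates Bison's default rule for `@$` on a non-empty right-hand side (start of `@1` to end of `@N`), and `Node` and `if_action` are hypothetical names. CRuby's real parser uses its own location type and node constructors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    # Field names follow Bison's default YYLTYPE (an assumption for this sketch).
    first_line: int
    first_column: int
    last_line: int
    last_column: int


@dataclass(frozen=True)
class Node:
    # Hypothetical AST node: a kind tag, child values, and the rule's location (@$).
    kind: str
    children: tuple
    loc: Location


def default_location(rhs_locs):
    """Like Bison's default location rule for a non-empty right-hand side:
    @$ starts where @1 starts and ends where @N ends."""
    first, last = rhs_locs[0], rhs_locs[-1]
    return Location(first.first_line, first.first_column,
                    last.last_line, last.last_column)


def if_action(cond, body, else_body, rhs_locs):
    """Hypothetical action for an if-rule: build a NODE_IF node located at @$."""
    return Node("NODE_IF", (cond, body, else_body), default_location(rhs_locs))


# Locations of the right-hand-side symbols (@1 ... @N) of a multi-line if/else.
# Child values are plain strings standing in for real sub-nodes.
rhs_locs = [
    Location(1, 0, 1, 2),   # "if"
    Location(1, 3, 1, 9),   # condition "i == 1"
    Location(2, 2, 2, 3),   # body "1"
    Location(3, 0, 4, 3),   # else part "0"
    Location(5, 0, 5, 3),   # "end"
]
node = if_action("i == 1", "1", "0", rhs_locs)
print(node.kind, node.loc)
```

The resulting node spans from the start of `if` to the end of `end`, which is what the default `@$` computation gives when no action overrides it.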
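For the #9 slide on ignored newlines, one familiar case that needs this look-ahead is a method chain continued on the next line, such as `foo` followed by a line starting with `.bar`. A minimal sketch, assuming only a leading `.` or `&.` matters; the hypothetical `newline_is_ignored` skips spaces and tabs only, unlike the real lexer, which also deals with comments and other whitespace.

```python
def newline_is_ignored(next_line: str) -> bool:
    """Decide whether the newline before `next_line` is ignored (tIGNORED_NL).

    Simplified sketch: a leading "." (method call) or "&." (safe navigation)
    continues the previous line, but ".." / "..." starts a range instead.
    """
    rest = next_line.lstrip(" \t")
    if rest.startswith("&."):
        return True
    return rest.startswith(".") and not rest.startswith("..")


print(newline_is_ignored("  .bar"))  # True: `foo` continues as `foo.bar`
print(newline_is_ignored("..10"))    # False: a range literal, so NL is kept
```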
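The five #9 lastline/nextline steps can be modeled as a toy line reader. This is a sketch with hypothetical names (`LineLexer`, `read_line`, `newline_token`); it mirrors only the slides' vocabulary and the idea that the read-ahead line is held in `nextline` instead of being re-read from the input.

```python
class LineLexer:
    """Toy model of the lastline/nextline steps (not CRuby's parser_params)."""

    def __init__(self, lines):
        self._source = iter(lines)
        self.lastline = None   # the line the lexer is currently on
        self.nextline = None   # a read-ahead line held for the next read
        self.sourceline = 0    # current line number

    def read_line(self):
        """Return the held line if there is one; otherwise read a fresh line."""
        if self.nextline is not None:
            line, self.nextline = self.nextline, None
        else:
            line = next(self._source, None)
            if line is None:
                return None
        self.lastline = line
        self.sourceline += 1
        return line

    def newline_token(self):
        """Called with the pointer at the end of the current line (step 1)."""
        ahead = self.read_line()                  # step 2: get the next line
        if ahead is None:
            return "NL"                           # end of input: nothing to hold
        rest = ahead.lstrip(" \t")                # step 3: peek at the next token
        if rest.startswith("&.") or (rest.startswith(".") and not rest.startswith("..")):
            return "tIGNORED_NL"                  # keep lexing the new line
        self.sourceline -= 1                      # step 4: go back to the previous line
        self.nextline = self.lastline             # step 5: hold the read-ahead line
        return "NL"


lexer = LineLexer(["foo\n", "  .bar\n", "baz\n"])
lexer.read_line()             # now on "foo\n"
tok1 = lexer.newline_token()  # next line starts with "." -> ignored
tok2 = lexer.newline_token()  # (after lexing "  .bar") next line is "baz" -> NL
held = lexer.nextline         # "baz\n" is held, not lost
line = lexer.read_line()      # returns the held line without touching the source
```

In this model, holding the line avoids rewinding the input, and decrementing `sourceline` keeps the reported line number on the line that actually ended.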
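For the #10 parent_iseq slide, a familiar symptom: after `x = 1`, `eval("x")` returns 1, so the parser running inside eval must treat `x` as a local variable rather than a method call. The sketch below models that lookup with a hypothetical `Scope` chain standing in for parent_iseq; it is not CRuby's data structure, and chaining scopes through `parent` is a simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scope:
    """Hypothetical stand-in for the local-variable table of an iseq.

    `parent` links model enclosing block scopes.
    """
    local_vars: frozenset
    parent: Optional["Scope"] = None

    def defines(self, name: str) -> bool:
        scope = self
        while scope is not None:
            if name in scope.local_vars:
                return True
            scope = scope.parent
        return False


def classify_identifier(name, eval_locals, parent_iseq):
    """How a bare identifier inside eval'd code is parsed (simplified)."""
    if name in eval_locals:
        return "local variable"
    if parent_iseq is not None and parent_iseq.defines(name):
        return "local variable"  # defined in the context eval captured
    return "method call"         # otherwise a method call with no arguments


method_scope = Scope(frozenset({"x"}))
block_scope = Scope(frozenset({"y"}), parent=method_scope)
print(classify_identifier("x", set(), block_scope))
print(classify_identifier("z", set(), block_scope))
```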
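Assuming the #12 slide refers to Bison's mid-rule actions: Bison turns each mid-rule action into an empty nonterminal, so the action occupies its own `$n` position and its value can be read later by the final action, much like a local variable. A sketch of that numbering with a hypothetical `reduce_rule` helper; the right-hand side is shaped loosely like a class-definition rule and is not copied from parse.y.

```python
def reduce_rule(rhs, final_action):
    """Run a rule whose right-hand side mixes symbols and mid-rule actions.

    rhs items are ("sym", value) or ("mid", fn). A mid-rule action receives
    the values seen so far, and its result takes its own $n slot.
    """
    values = []
    for kind, payload in rhs:
        if kind == "sym":
            values.append(payload)
        elif kind == "mid":
            values.append(payload(tuple(values)))
        else:
            raise ValueError(f"unknown item kind: {kind!r}")
    dollar = {i + 1: v for i, v in enumerate(values)}  # $1, $2, ... are 1-based
    return final_action(dollar)


# Hypothetical shape: k_class cpath superclass { mid-rule action } bodystmt k_end
rhs = [
    ("sym", "class"),                               # $1
    ("sym", "Foo"),                                 # $2 class path
    ("sym", None),                                  # $3 no superclass
    ("mid", lambda seen: {"class_name": seen[1]}),  # $4 mid-rule action
    ("sym", "body"),                                # $5 bodystmt
    ("sym", "end"),                                 # $6
]
node = reduce_rule(rhs, lambda d: ("NODE_CLASS", d[2], d[5], d[3], d[4]["class_name"]))
print(node)
```

Because the mid-rule action takes slot `$4`, the body is `$5` rather than `$4`; forgetting that shift is an easy mistake when a rule has many mid-rule actions.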
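For the #14 lex_ctxt slide, the trick is an empty rule whose action just returns the current `struct lex_context`. A sketch with hypothetical Python classes; the two fields are an illustrative subset, and restoring the snapshot afterwards is an assumed typical use rather than a quote from parse.y.

```python
from dataclasses import dataclass, replace


@dataclass
class LexContext:
    # Illustrative subset of fields; CRuby's struct lex_context has more.
    in_def: bool = False
    in_class: bool = False


class Parser:
    def __init__(self):
        self.ctxt = LexContext()

    def reduce_lex_ctxt(self):
        """Action for an empty rule: consumes no tokens and returns a copy of
        the current context, as C struct assignment ($$ = p->ctxt) would."""
        return replace(self.ctxt)


p = Parser()
saved = p.reduce_lex_ctxt()  # the value of the lex_ctxt symbol ($n)
p.ctxt.in_def = True         # e.g. entering a method body
inside_def = p.ctxt.in_def
p.ctxt = saved               # assumed use: restore the snapshot afterwards
```

Returning a copy matters: later changes to the live context do not leak into the saved value.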