
What is expected?

yui-knk
December 14, 2019


Transcript

  1. Self-introduction

    • Yuichiro Kaneko
    • Arm Treasure Data, Audience team (writing Rails applications)
    • CRuby committer since 2015/12
    • GitHub: yui-knk
  2. How Ruby script is processed

        Step          Input        Output    Debug     Source
        Tokenization  Ruby script  Tokens    --dump=y  parse.y
        Parsing       Tokens       AST       --dump=p  parse.y
        Compile       AST          Bytecode  --dump=i  compile.c

        Ruby script --(Tokenization)--> Tokens --(Parsing)--> AST --(Compile)--> Byte code (insns / ISeq)

  3. Tokenization

        Ruby script --(Tokenization)--> Tokens --(Parsing)--> AST --(Compile)--> Byte code (insns / ISeq)
  4. Tokenization

    • A token carries two pieces of information:
        • a token type (tINTEGER)
        • a semantic value (1)

        1 + 2
        ^ ^ ^^
        | | |+--- '\n' / "end-of-input"
        | | +---- tINTEGER (2)
        | +------ '+'
        +-------- tINTEGER (1)

  5. Tokenization

        $ ruby --dump=y -e '1 + 2' | grep Shifting
        Shifting token "integer literal" (1.0-1.1: 1)
        Shifting token '+' (1.2-1.3: )
        Shifting token "integer literal" (1.4-1.5: 2)
        Shifting token '\n' (1.5-1.5: )
        Shifting token "end-of-input" (1.5-1.5: )

    On Ruby 2.7.0preview3
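The same token stream can also be inspected from Ruby itself. The following is a minimal sketch using the standard `ripper` library; note that Ripper reports its own event names (such as `:on_int`), not the Bison token names (such as tINTEGER) shown on the slide, and it also reports whitespace.

```ruby
require 'ripper'

# Ripper.lex returns one entry per token: [[line, column], event, text, state].
tokens = Ripper.lex("1 + 2")

tokens.each do |(line, col), event, text, _state|
  puts format("%d.%d %-8s %p", line, col, event, text)
end
```

Each printed line gives the position, the token type, and the token text, which roughly corresponds to the "token type plus semantic value" pair described on slide 4.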
  6. Parsing

        Ruby script --(Tokenization)--> Tokens --(Parsing)--> AST --(Compile)--> Byte code (insns / ISeq)

  7. Parsing (parse.y)

        numeric : simple_numeric
                | tUMINUS_NUM simple_numeric %prec tLOWEST
                ;

        simple_numeric : tINTEGER
                       | tFLOAT
                       | tRATIONAL
                       | tIMAGINARY
                       ;

        %%
        program : { } top_compstmt

  8. Parsing: same parse.y excerpt as slide 7, with the entries labeled "Rules".
  9. Parsing: same parse.y excerpt as slide 7.
  10. Parsing: same parse.y excerpt as slide 7, with example inputs for simple_numeric: 1 (tINTEGER), 2.1 (tFLOAT), 3r (tRATIONAL), 4i (tIMAGINARY).
  11. Parsing: same parse.y excerpt as slide 7, with the input 1.
  12. Parsing: same parse.y excerpt as slide 7, with the input 1.
  13. Parsing: same parse.y excerpt as slide 7, with program labeled "Goal".
  14. Parsing

        $ ruby --dump=y -e '1 + 2'
        # Shifting token "integer literal" (1)
        "integer literal"
        simple_numeric
        numeric
        literal
        primary
        arg

  15. Parsing

        # Shifting token '+'
        arg '+'
        # Shifting token "integer literal" (2)
        arg '+' "integer literal"
        arg '+' simple_numeric
        arg '+' numeric
        arg '+' literal
        arg '+' primary
        arg '+' arg
        arg
        expr
        stmt
        top_stmt
        top_stmts

  16. Parsing

        # Shifting token '\n'
        top_stmts '\n'
        top_stmts term
        top_stmts terms
        top_stmts opt_terms
        top_compstmt
        program
        # Completed
        # Shifting token "end-of-input"

  17. Build AST

        $ ruby --dump=p -e '1 + 2'
        NODE_SCOPE
          NODE_OPCALL (:+)
            NODE_LIT (1)
            NODE_LIST
              NODE_LIT (2)
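The tree above can also be walked from Ruby code. This is a small sketch using `RubyVM::AbstractSyntaxTree` (available in CRuby 2.6 and later); `print_tree` is a helper name chosen here for illustration, and node type names for literals differ between Ruby versions.

```ruby
ast = RubyVM::AbstractSyntaxTree.parse("1 + 2")

# Print each node type, indented by its depth in the tree.
# Non-node children (symbols, nil, local tables) are skipped.
def print_tree(node, depth = 0)
  return unless node.is_a?(RubyVM::AbstractSyntaxTree::Node)
  puts "#{'  ' * depth}#{node.type}"
  node.children.each { |child| print_tree(child, depth + 1) }
end

print_tree(ast)
```

The root is a SCOPE node whose body is the OPCALL node for `+`, mirroring the NODE_SCOPE / NODE_OPCALL nesting in the `--dump=p` output.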
  18. Compile

        Ruby script --(Tokenization)--> Tokens --(Parsing)--> AST --(Compile)--> Byte code (insns / ISeq)

  19. Compile

    • Compile the AST into bytecode
    • See "compile.c"

        $ ruby --dump=i -e '1 + 2'
        == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,5)> (catch: FALSE)
        0000 putobject_INT2FIX_1_                  (   1)[Li]
        0001 putobject              2
        0003 opt_plus               <calldata!mid:+, argc:1, ARGS_SIMPLE>
        0005 leave
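The compile step is reachable from Ruby through `RubyVM::InstructionSequence`. The sketch below compiles the same expression, prints the disassembly, and evaluates it; the exact instruction names and operands in the listing depend on the Ruby version.

```ruby
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# Human-readable listing, similar to `ruby --dump=i`.
puts iseq.disasm

# Run the compiled instruction sequence.
result = iseq.eval
puts result
```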
  20. Parser is today's topic

    • Expected tokens are a problem of the parsing stage

        Ruby script --(Tokenization)--> Tokens --(Parsing)--> AST --(Compile)--> Byte code (insns / ISeq)

  21. What is grammar

    • Production rule:
        • e.g. simple_numeric: tINTEGER
    • In the earlier example, everything that can be produced by applying production rules starting from program forms the language defined by that grammar

        simple_numeric : tINTEGER
                       | tFLOAT
                       | tRATIONAL
                       | tIMAGINARY
                       ;

  22. What is grammar

    • Nonterminal: a symbol that appears on the left-hand side of a rule
        • e.g. program, numeric, simple_numeric
        • It may also appear on the right-hand side
    • Terminal: a symbol that appears only on the right-hand side of rules
        • e.g. tINTEGER, tUMINUS_NUM, tLOWEST

        numeric : simple_numeric
                | tUMINUS_NUM simple_numeric %prec tLOWEST
                ;

    In this rule, numeric and simple_numeric are nonterminals; tUMINUS_NUM and tLOWEST are terminals.
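The two definitions above translate directly into set operations. The sketch below encodes the two rules from the slide as a Ruby hash (the name `NUMERIC_RULES` is chosen here for illustration) and derives both sets. The `%prec tLOWEST` annotation is left out of the hash, so tLOWEST does not show up in the result.

```ruby
# Left-hand side => list of alternatives (each an array of symbols).
NUMERIC_RULES = {
  "numeric"        => [%w[simple_numeric], %w[tUMINUS_NUM simple_numeric]],
  "simple_numeric" => [%w[tINTEGER], %w[tFLOAT], %w[tRATIONAL], %w[tIMAGINARY]],
}

# Nonterminals appear on a left-hand side.
nonterminals = NUMERIC_RULES.keys

# Terminals appear only on right-hand sides.
terminals = NUMERIC_RULES.values.flatten.uniq - nonterminals

puts "nonterminals: #{nonterminals.join(', ')}"
puts "terminals:    #{terminals.join(', ')}"
```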
  23. How parser works?

        L : L ';' E        /* Rule 1 */
          | E              /* Rule 2 */
          ;
        E : E ',' P        /* Rule 3 */
          | P              /* Rule 4 */
          ;
        P : 'a'            /* Rule 5 */
          | '(' M ')'      /* Rule 6 */
          ;
        M : /* nothing */  /* Rule 7 */
          | L              /* Rule 8 */
          ;

    Input: a;()

  24. How parser works?

    • Shift: push the next token onto the stack
    • Reduce: replace the rightmost n symbols on the stack using a rule

        Stack               Input tokens            Action
        (empty)             "a" ";" "(" ")" $end    Shift
        "a"                 ";" "(" ")" $end        Reduce by rule 5
        P                   ";" "(" ")" $end        Reduce by rule 4
        E                   ";" "(" ")" $end        Reduce by rule 2
        L                   ";" "(" ")" $end        Shift
        L ";"               "(" ")" $end            Shift
        L ";" "("           ")" $end                Reduce by rule 7
        L ";" "(" M         ")" $end                Shift
        L ";" "(" M ")"     $end                    Reduce by rule 6
        L ";" P             $end                    Reduce by rule 4
        L ";" E             $end                    Reduce by rule 1
        L                   $end                    Shift
        L $end                                      accept

    Example: the reduce by Rule 6, P: '(' M ')', replaces "(" M ")" on the stack with P.
  25. LALR(1) table

    • sN: shift, and push N onto the stack
    • rN: reduce by rule N
    • acc: accept
    • blank: syntax error
    • GOTO: for nonterminals; push N onto the stack

    https://www.cs.uic.edu/~spopuri/cparser.html#dragonbook-tables
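To make the shift and reduce actions concrete, here is a small sketch that replays the action sequence from slide 24 on the grammar from slide 23. It does not decide which action to take (that is the job of the table described above); it only applies a given list of actions and checks that every reduce matches the top of the stack. The names `GRAMMAR`, `TRACE_ACTIONS` and `replay` are illustrative.

```ruby
# Rule number => [left-hand side, right-hand side], as on slide 23.
GRAMMAR = {
  1 => ["L", %w[L ; E]],
  2 => ["L", %w[E]],
  3 => ["E", %w[E , P]],
  4 => ["E", %w[P]],
  5 => ["P", %w[a]],
  6 => ["P", %w[( M )]],
  7 => ["M", []],
  8 => ["M", %w[L]],
}

# Apply :shift or "reduce by rule N" actions and return the final stack.
def replay(input, actions)
  stack = []
  rest = input.dup
  actions.each do |action|
    if action == :shift
      stack.push(rest.shift)
    else
      lhs, rhs = GRAMMAR.fetch(action)
      popped = stack.pop(rhs.size)
      raise "rule #{action} does not match #{popped.inspect}" unless popped == rhs
      stack.push(lhs)
    end
    puts "#{stack.join(' ').ljust(12)} | #{rest.join(' ')}"
  end
  stack
end

# The action column of slide 24, without the final accept.
TRACE_ACTIONS = [:shift, 5, 4, 2, :shift, :shift, 7, :shift, 6, 4, 1, :shift]
final_stack = replay(%w[a ; ( ) $end], TRACE_ACTIONS)
```

After the last shift the stack holds `L $end`, which is the configuration the parser accepts.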
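  26. How to create the table

    • Consider a rule with a '.' inserted into its right-hand side (1 to 4 below)
    • The '.' shows how far into the rule the parser has read
    • In 2, a ';' is expected next, so shift if the next token is ';'
    • In 4, reduce to L
    • These are called LR(0) items

        0 L: L ';' E
        1 L: . L ';' E
        2 L: L . ';' E
        3 L: L ';' . E
        4 L: L ';' E .

  27. How to create the table

    • For the rule 1 item below, the symbol expected next is E
    • E may be produced by reducing with another rule (3, 4)
    • P may be produced by another rule as well (5, 6)
    • 'a' and '(' are terminals, so they are not produced by any other rule
    • Items are divided into several groups

        1 L: L ';' . E
        3 E: . E ',' P
        4  | . P
        5 P: . 'a'
        6  | . '(' M ')'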
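The grouping described above is the usual LR(0) closure. The sketch below represents an item as `[rule number, dot position]` and computes the closure of `L: L ';' . E`. The names `LR_RULES`, `closure` and `show_item` are illustrative, and the code only covers closure, not the full table construction.

```ruby
# Rule number => [left-hand side, right-hand side], as on slide 23.
LR_RULES = {
  1 => ["L", %w[L ; E]], 2 => ["L", %w[E]],
  3 => ["E", %w[E , P]], 4 => ["E", %w[P]],
  5 => ["P", %w[a]],     6 => ["P", %w[( M )]],
  7 => ["M", []],        8 => ["M", %w[L]],
}
NONTERMINALS = LR_RULES.values.map(&:first).uniq

# Symbol right after the dot, or nil when the dot is at the end.
def next_symbol(item)
  rule, dot = item
  LR_RULES.fetch(rule)[1][dot]
end

# Add "dot at the start" items for every nonterminal that follows a dot.
def closure(items)
  result = items.dup
  queue = items.dup
  until queue.empty?
    sym = next_symbol(queue.shift)
    next unless NONTERMINALS.include?(sym)
    LR_RULES.each do |no, (lhs, _rhs)|
      item = [no, 0]
      next unless lhs == sym && !result.include?(item)
      result << item
      queue << item
    end
  end
  result
end

# Render an item in the dotted notation used on the slides.
def show_item(item)
  rule, dot = item
  lhs, rhs = LR_RULES.fetch(rule)
  "#{lhs}: #{rhs.dup.insert(dot, '.').join(' ')}"
end

group = closure([[1, 2]])
group.each { |item| puts show_item(item) }
```

Starting from the rule 1 item, the closure pulls in the items for rules 3, 4, 5 and 6, which is the group listed on slide 27.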
  28. Table is sparse

    • Only 37/78 = 47% of the action table is filled
    • Only 10/52 = 19% of the GOTO table is filled
    • Ruby 2.7.0 preview3 has 1234 states and 411 symbols

    https://www.cs.uic.edu/~spopuri/cparser.html#dragonbook-tables

  29. Compress table (1)

    • Compression by default reductions and default GOTOs
    • The action table is compressed horizontally (per row), the GOTO table vertically (per column)
    • state 5 -> r3
    • GOTO for P -> 9

    https://www.cs.uic.edu/~spopuri/cparser.html#dragonbook-tables

  30. Compress table (2)

    • The table is still sparse even after introducing default reductions
    • Compression by double displacement

    Partially modified from https://www.cs.uic.edu/~spopuri/cparser.html#table-compression
  31. Compress table (2)

    • The table is represented by three arrays: yytable, which holds the actual values; yycheck, the guard table; and yypact, which manages the offsets

         0: [  ,  ,  ,  ,   ,  1,  2,   ]
         2: [  ,  ,  ,  ,   ,  1,  2,   ]
         3: [ 8,  ,  , 9,   ,   ,   ,   ]
         4: [  ,  ,  ,  , 10,   ,   ,   ]
         6: [  ,  ,  , 9,   ,   ,   ,   ]
         7: [  ,  ,  ,  ,   ,   ,   , 11]
         9: [  ,  ,  ,  ,   ,  1,  2,   ]
        10: [  ,  ,  ,  ,   ,  1,  2,   ]
        12: [  ,  ,  ,  , 10,   ,   ,   ]

        yycheck [0, 5, 6, 3, 7, 4, 3, 2, 9, -1, -1, -1, 10]
        yytable [8, 1, 2, 9, 11, 10, 9, 6, 12, 0, 0, 0, 13]
        yypact  [-4, -5, -4, 0, 1, -5, 3, -3, -5, -4, -4, -5, 1, -5]

  32. Compress table (2): same tables as slide 31.

    • Consider the case of state 0
    • yypact[0] = -4

  33. Compress table (2): same tables as slide 31.

    • For index = 5, the lookup position is -4 + 5 = 1, and yycheck[1] = 5 matches the index

  34. Compress table (2): same tables as slide 31.

    • The value for index = 5 is 1 (= yytable[1])
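The lookup walked through on slides 32 to 34 can be written as a short function. The sketch below uses the three arrays exactly as printed on the slides and rebuilds a full row from them. Treating `-5` in yypact as "this state has no explicit entries" is an assumption made here (those states do not appear in the row list); the names `lookup` and `row` are illustrative.

```ruby
YYCHECK = [0, 5, 6, 3, 7, 4, 3, 2, 9, -1, -1, -1, 10]
YYTABLE = [8, 1, 2, 9, 11, 10, 9, 6, 12, 0, 0, 0, 13]
YYPACT  = [-4, -5, -4, 0, 1, -5, 3, -3, -5, -4, -4, -5, 1, -5]
YYPACT_NINF = -5 # assumption: marks states without explicit entries

# Value stored for (state, column), or nil when the cell is blank.
def lookup(state, column)
  offset = YYPACT[state]
  return nil if offset == YYPACT_NINF
  index = offset + column
  return nil if index < 0 || index >= YYTABLE.size
  # yycheck guards against reading a slot that belongs to another row.
  return nil unless YYCHECK[index] == column
  YYTABLE[index]
end

# Rebuild one uncompressed row of the table.
def row(state, width = 8)
  (0...width).map { |column| lookup(state, column) }
end

[0, 3, 4, 6, 7].each { |state| puts "#{state}: #{row(state).inspect}" }
```

For state 0 and column 5 this computes index -4 + 5 = 1, finds yycheck[1] == 5, and returns yytable[1] == 1, as on slide 34. Because every row can be rebuilt this way, double displacement loses no information.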
  35. Characteristics of the compressed table

    • double displacement
        • The original table can be restored, so this causes no problem
    • default reductions
        • Error detection is delayed, so the expected tokens change
        • Computing the expected tokens depends on the state stack at that point

  36. Delayed error detection

    • "in" can also be written here

        $ ruby -wce 'case a; in b; end'
        -e:1: warning: Pattern matching is experimental, and the behavior may change in future versions of Ruby!
        Syntax OK

  37. Delayed error detection

        case a;
               ^

        State 375
          737 opt_terms: terms .  ["`when'", "`in'"]
          ...
          $default  reduce using rule 737 (opt_terms)

        State 587
          ...
          $default  reduce using rule 330 (@18)

        State 717
          331 primary: k_case expr_value opt_terms @18 . case_body k_end
          367 k_when: . "`when'"
          464 case_body: . k_when case_args then compstmt cases

          "`when'"  shift, and go to state 719

          k_when     go to state 720
          case_body  go to state 841

    • Default reductions move the parser all the way to state 717

  38. Dependence on the stack

    • ';' ',' '$end' should really be errors
    • To check whether a token is really an error, the reduction has to be performed
    • The result of this computation depends on the stack at that point
    • This increases the complexity of testing

    Partially modified from https://www.cs.uic.edu/~spopuri/cparser.html#modified-tables
  39. Pitfall of the workaround

    • Emitting tLABEL requires EXPR_LABEL|EXPR_ENDFN

        #define IS_LABEL_POSSIBLE() (\
                (IS_lex_state(EXPR_LABEL|EXPR_ENDFN) && !cmd_state) || \
                IS_ARG())

        static enum yytokentype
        parse_ident(struct parser_params *p, int c, int cmd_state)
        {
            ...
            if (IS_LABEL_POSSIBLE()) {
                if (IS_LABEL_SUFFIX(0)) {
                    SET_LEX_STATE(EXPR_ARG|EXPR_LABELED);
                    nextc(p);
                    set_yylval_name(TOK_INTERN());
                    return tLABEL;
                }
            }

  40. Pitfall of the workaround

    • lex_state is set in the action that follows fname

        | k_def singleton dot_or_colon {SET_LEX_STATE(EXPR_FNAME);} fname
            {
                $<num>4 = p->in_def;
                p->in_def = 1;
                SET_LEX_STATE(EXPR_ENDFN|EXPR_LABEL); /* force for args */
                local_push(p, 0);
                $<id>$ = p->cur_arg;
                p->cur_arg = 0;
            }
          f_arglist

  41. Before

    • The action (@26) is executed unconditionally

        State 858
          346 @26: . %empty
          347 primary: k_def singleton dot_or_colon @25 fname . @26 f_arglist bodystmt k_end

          $default  reduce using rule 346 (@26)

          @26  go to state 961

  42. After

    • tLABEL is now requested before the action is executed

        State 858
          346 @26: . %empty  ["local variable or method", "global variable", "instance variable", "constant", "class variable", tLABEL, "**", "(", "*", "**arg", "&", '&', '*', '(', ';', '\n']
          347 primary: k_def singleton dot_or_colon @25 fname . @26 f_arglist bodystmt k_end

          "local variable or method"  reduce using rule 346 (@26)
          ...
          tLABEL                      reduce using rule 346 (@26)
          ...

          @26  go to state 961
  43. Add state and stack as arguments, and copy the stack

         static enum yytokentype
        -yylex(YYSTYPE *lval, YYLTYPE *yylloc, struct parser_params *p)
        +yylex(YYSTYPE *lval, YYLTYPE *yylloc, struct parser_params *p, int yystate, short *yyss, short *yyssp)
         {
             enum yytokentype t;
        +    VALUE yysstack;

             p->lval = lval;
             lval->val = Qundef;
        +
        +    yysstack = yysstack_new(yyss, yyssp);
        +
        +    if (p->debug) {
        +        VALUE tokens = expected_tokens(yystate, yysstack);
        +        rb_parser_printf(p, "\nexpected_tokens (state = %d): %"PRIsVALUE"\n", yystate, tokens);
        +    }
        +

    https://github.com/ruby/ruby/compare/master...yui-knk:feature/expected_tokens_v2_7_0_preview3_heisei_01?expand=1

  44. Call push_expected_token for every token

        /* See also: yysyntax_error and yybackup */
        static VALUE
        expected_tokens(const int yystate, VALUE yysstack)
        {
            VALUE ary = rb_ary_new();

            for (int yytoken = 0; yytoken < YYNTOKENS; ++yytoken) {
                push_expected_token(ary, yystate, yytoken, rb_ary_dup(yysstack));
            }

            return ary;
        }

  45. On a default reduction, actually reduce and then call itself recursively

        static void
        push_expected_token(VALUE ary, const int yystate, const int yytoken, VALUE yysstack)
        {
            int yyn = yypact[yystate];

            /* See: yydefault label */
            if (yypact_value_is_default(yyn)) {
                int new_state;

                if ((new_state = default_reduce(yystate, yytoken, yysstack)) >= 0) {
                    /* yysstack is changed */
                    push_expected_token(ary, new_state, yytoken, yysstack);
                }
                return;
            }

  46. On a default reduction, actually reduce and then call itself recursively

            yyn += yytoken;
            if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken) {
                int new_state;

                if ((new_state = default_reduce(yystate, yytoken, yysstack)) >= 0) {
                    /* yysstack is changed */
                    push_expected_token(ary, new_state, yytoken, yysstack);
                }
                return;
            }

  47. On an explicit reduction or a shift, add the token to the expected tokens

            yyn = yytable[yyn];
            if (yyn <= 0) {
                if (!yytable_value_is_error(yyn)) {
                    /* reduction */
                    rb_ary_push(ary, rb_str_new2(yytname[yytoken]));
                    return;
                }
            }
            else {
                /* shift */
                rb_ary_push(ary, rb_str_new2(yytname[yytoken]));
                return;
            }
        }
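The control flow of `push_expected_token` can be modeled on a toy automaton. The sketch below is not Ruby's real parser table: the state numbers, rule numbers and all `TOY_*` names are hypothetical, chosen only to show how following default reductions on a copy of the stack makes the answer depend on what is below the top state.

```ruby
# Hypothetical toy tables (NOT taken from parse.y).
TOY_TOKENS   = ["when", "in", ";"]
TOY_RULES    = { 1 => [:opt_terms, 1], 2 => [:marker, 0] } # rule => [lhs, rhs length]
TOY_ACTIONS  = { 3 => { "when" => :shift }, 5 => { ";" => :shift } }
TOY_DEFAULTS = { 1 => 1, 2 => 2 }                          # state => default rule
TOY_GOTO     = { [0, :opt_terms] => 2, [2, :marker] => 3, [4, :opt_terms] => 5 }

# True when `token` is acceptable with this state stack.
def expected?(stack, token)
  state = stack.last
  # An explicit shift or reduce entry: the token is expected.
  return true if TOY_ACTIONS.fetch(state, {}).key?(token)
  rule = TOY_DEFAULTS[state]
  # No entry and no default reduction: the token is an error here.
  return false if rule.nil?
  # Perform the default reduction on a copy of the stack, then retry.
  lhs, size = TOY_RULES.fetch(rule)
  reduced = stack[0, stack.size - size]
  reduced += [TOY_GOTO.fetch([reduced.last, lhs])]
  expected?(reduced, token)
end

def expected_tokens(stack)
  TOY_TOKENS.select { |token| expected?(stack, token) }
end

puts expected_tokens([0, 1]).inspect
puts expected_tokens([4, 1]).inspect
```

Both calls start in the same top state 1, yet they yield different token lists because the default reduction lands in a different state depending on the state underneath. This is the stack dependence noted on slide 38.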
  48.

        $ ./miniruby --dump=y -e 'case a;'
        ...
        Entering state 375
        Reading a token: expected_tokens (state = 375): ["\"`when'\"", "\"`in'\"", "';'"]
        ...
        -e:1: syntax error, unexpected end-of-input, expecting `when'
        ...
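The syntax error message itself can be captured without the patched miniruby. The following sketch parses the same incomplete input with a stock Ruby and prints whatever message the parser produces; the wording of the message, including the "expecting ..." part, varies between Ruby versions.

```ruby
message = nil
begin
  RubyVM::AbstractSyntaxTree.parse("case a;")
rescue SyntaxError => e
  # The parser reports the unexpected token and, depending on the
  # version, the tokens it expected instead.
  message = e.message
end

puts message
```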
  49. • The implementation depends heavily on Bison internals such as yypact and yytable

            yyn += yytoken;
            if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken) {
                int new_state;

                if ((new_state = default_reduce(yystate, yytoken, yysstack)) >= 0) {
                    /* yysstack is changed */
                    push_expected_token(ary, new_state, yytoken, yysstack);
                }
                return;
            }

  50. • Code like tool/ytab.sed also exists in the wild

        #!/bin/sed -f
        # This file is used when generating code for the Ruby parser.
        ...
        s/^yysyntax_error (/&struct parser_params *p, /
        s/ yysyntax_error (/&p, /
        s/\( YYFPRINTF *(\)yyoutput,/\1p,/
        s/\( YYFPRINTF *(\)yyo,/\1p,/
        s/\( YYFPRINTF *(\)stderr,/\1p,/
        s/\( YYDPRINTF *((\)stderr,/\1p,/
        s/^\([ ]*\)\(yyerror[ ]*([ ]*parser,\)/\1parser_\2/
        s!^ *extern char \*getenv();!/* & */!
        s/^\(#.*\)".*\.tab\.c"/\1"parse.c"/
        /^\(#.*\)".*\.y"/s:\\\\:/:g

  51. References

    • Ruby implementation in general
        • http://i.loveruby.net/ja/rhg/book/
        • "Ruby Under a Microscope" (Japanese edition: "Rubyのしくみ")
    • Parser
        • http://i.loveruby.net/ja/rhg/book/
        • "Compilers: Principles, Techniques, and Tools" (Japanese edition: "コンパイラ―原理・技法・ツール", Information & Computing)
    • Bison
        • https://www.cs.uic.edu/~spopuri/cparser.html