RNode with code locations

RNode with code locations Jun 1, 2018 in RubyKaigi 2018
@yui-knk Yuichiro Kaneko

Self-introduction
• Yuichiro Kaneko
• Asakusa.rb
• A CRuby Committer (2015/12~)
• GitHub (yui-knk)
• Twitter (spikeolaf)

I will join Treasure Data next week!!!

Today's topic
• RNode
  • Node of Abstract Syntax Tree
• Code location
  • Location information of RNode

Run the code
• Run the code on …
  • Ruby 2.4
  • Ruby 2.5

    $ ruby --dump=p -e '"str".upcase'

Output on Ruby 2.5:

    $ ruby --dump=p -e '"str".upcase'
    # @ NODE_SCOPE (line: 1, code_range: (1,0)-(1,12))
    # +- nd_tbl: (empty)
    # +- nd_args:
    # |   (null node)
    # +- nd_body:
    #     @ NODE_PRELUDE (line: 1, code_range: (1,0)-(1,12))
    #     +- nd_head:
    #     |   (null node)
    #     +- nd_body:
    #     |   @ NODE_CALL (line: 1, code_range: (1,0)-(1,12))
    #     |   +- nd_mid: :upcase
    #     |   +- nd_recv:
    #     |   |   @ NODE_STR (line: 1, code_range: (1,0)-(1,5))
    #     |   |   +- nd_lit: "str"
    #     |   +- nd_args:
    #     |       (null node)
    #     +- nd_compile_option:
    #         +- coverage_enabled: false

Agenda
• What code locations are
• Why code locations are needed
• Ruby crash course
• How to implement code locations
• The future plan of code locations feature
• Conclusion

What code locations are

Location information in programming
• Location information of a script is used in various situations.

Exception

    Traceback (most recent call last):
            1: from src/exception.rb:5:in `<main>'
    src/exception.rb:2:in `a': undefined method `foo' for "":String (NoMethodError)

Warning

    src/warning.rb:2: warning: instance variable @a not initialized

Location information in programming
• Location information of a script is used in various situations.
  • "An exception is raised from line number XX" (Exception)
  • "Instance variable of line number XX not initialized" (Warning)
  • "No test cases for line number XX" (Coverage)

Is line number enough to represent location?

Location and position
• A location is represented by 2 numbers:
  • Line number (lineno)
  • Distance from the beginning of the line (column)

Location and position
• 4 numbers are needed to represent "begin" and "end".
• "Code position" is a pair of lineno and column.
• "Code location" is a pair of begin position and end position.

    1 + 2
    ^ ^ ^
    | | +- @3 (1.4-1.5)
    | +--- @2 (1.2-1.3)
    +----- @1 (1.0-1.1)

    @3 (1.4-1.5)
        1.4-1.5   Code location
        1.4       Code position (begin): Lineno (1), Column (4)
        1.5       Code position (end)
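The terminology above can be sketched in plain Ruby. This is only an illustrative model: `CodePosition`, `CodeLocation` and `code_location` are hypothetical names chosen for this sketch, not the C structures Ruby uses internally.

```ruby
# A code position is a (lineno, column) pair.
# Hypothetical model for illustration, not Ruby's internal C struct.
CodePosition = Struct.new(:lineno, :column) do
  def to_s
    "#{lineno}.#{column}"
  end
end

# A code location is a (begin position, end position) pair.
CodeLocation = Struct.new(:beg_pos, :end_pos) do
  def to_s
    "#{beg_pos}-#{end_pos}"
  end
end

# Build a location from the four numbers.
def code_location(beg_lineno, beg_column, end_lineno, end_column)
  CodeLocation.new(CodePosition.new(beg_lineno, beg_column),
                   CodePosition.new(end_lineno, end_column))
end

# Locations of the three tokens in `1 + 2`, as drawn on the slide.
TOKEN_LOCATIONS = {
  "1" => code_location(1, 0, 1, 1),
  "+" => code_location(1, 2, 1, 3),
  "2" => code_location(1, 4, 1, 5),
}

TOKEN_LOCATIONS.each { |text, loc| puts "#{text} @ #{loc}" }
```

The printed form (`1.4-1.5`) follows the notation used on the slides.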

Location in Ruby
• Ruby held *only* line numbers up to Ruby 2.4.
• Ruby has held line numbers and columns since Ruby 2.5.
• Today’s main topic is “Column”.

Minor details about column
• 0-based / 1-based
  • 0-based
  • Varies according to programming languages and editors.
• From the beginning of line / file
  • Line

Minor details about column
• Byte length / Character length
  • Byte length
• "Plan to give the syntax tree detailed location information" (in Japanese)
  • https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/Node-position-memo
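The difference between a byte-based and a character-based column only shows up with multibyte text. The sketch below is a small illustration under that assumption; `char_column` and `byte_column` are hypothetical helper names, and the sample line is made up for the example.

```ruby
# Column of the first occurrence of `needle`, counted in characters.
def char_column(line, needle)
  line.index(needle)
end

# Column of the first occurrence of `needle`, counted in bytes
# (the unit the slides say Ruby uses for columns).
def byte_column(line, needle)
  chars_before = line.index(needle)
  chars_before && line[0, chars_before].bytesize
end

# "\u3042" is a hiragana character that takes 3 bytes in UTF-8.
SAMPLE_LINE = "\u3042 = 1"

puts "character column of '=': #{char_column(SAMPLE_LINE, '=')}"
puts "byte column of '=':      #{byte_column(SAMPLE_LINE, '=')}"
```

For ASCII-only lines the two numbers are identical, which is why the distinction is easy to overlook.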

Why code locations are needed

For coverage features
• For branch coverage and method coverage (Ruby 2.5~).
• "An introduction and future of Ruby coverage library"
  • http://rubykaigi.org/2017/presentations/mametter.html (30:50-)

What is branch coverage
• "Branch coverage tells you which branches are executed, and which not." (doc/NEWS-2.5.0)

    (a == 2) ? :t : :f

What is branch coverage
• You may forget to write test code for `then` cases.
• `n/m`
  • `n`: How many times the "then clause" is executed.
  • `m`: How many times the "else clause" is executed.

    0/1: (a == 2) ? :t : :f

Use-case (1)
• Code locations can be used for visualizing branch coverage results.

    0/1: (a == 2) ? :t : :f      YOU SHOULD WRITE TEST !!!

Use-case (2)
• One line can contain more than one branch.
• In these cases, we can't tell which clause is executed from line numbers alone.

    (a == b) ? ((c == d) ? :A : :B) : :C
    obj&.foo? ? "a" : "b"
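As a rough illustration of how these locations surface to users, the standard `coverage` library (Ruby 2.5 or later) can report branches with their line and column ranges. The sketch below is written under that assumption; `branch_coverage_of` is a hypothetical helper, and it writes the sample to a temporary file because coverage is collected for loaded files.

```ruby
require "coverage"
require "tempfile"

# Run `source` under branch coverage and return the raw branch data.
# Hypothetical helper for this sketch.
def branch_coverage_of(source)
  Tempfile.create(["branch_sample", ".rb"]) do |file|
    file.write(source)
    file.flush
    Coverage.start(branches: true)
    load file.path
    result = Coverage.result
    # Match by file name because the reported path may be normalized.
    _, data = result.find { |path, _| File.basename(path) == File.basename(file.path) }
    data[:branches]
  end
end

BRANCH_SOURCE = <<~RUBY
  a = 1
  (a == 2) ? :t : :f
RUBY

BRANCHES = branch_coverage_of(BRANCH_SOURCE)

# Each key holds a type, an id, and begin/end line and column numbers.
BRANCHES.each do |(type, _id, l1, c1, l2, c2), targets|
  puts "#{type} at (#{l1},#{c1})-(#{l2},#{c2})"
  targets.each do |(t_type, _tid, tl1, tc1, tl2, tc2), count|
    puts "  #{t_type} at (#{tl1},#{tc1})-(#{tl2},#{tc2}): executed #{count} time(s)"
  end
end
```

Because `a` is `1`, the `then` clause is never executed while the `else` clause runs once, which is the `0/1` situation from the earlier slide.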

Ruby crash course

How Ruby script is processed

    Step          Input        Output    Debug   Source
    Tokenization  Ruby script  Tokens    dump=y  parse.y
    Parsing       Tokens       AST       dump=p  parse.y
    Compile       AST          Bytecode  dump=i  compile.c

                     Parsing
                   |---------|
    Ruby script -> Tokens -> AST -> Byte code (insns / ISeq)
    |----------------|       |--------------------------------|
       Tokenization                      Compile

Ruby script 1 + 2

Tokenization
• Each token has
  • a token type (tINTEGER)
  • a semantic value (1)

    1 + 2
    ^ ^ ^^
    | | |+--- '\n' / "end-of-input"
    | | +---- tINTEGER (2)
    | +------ '+'
    +-------- tINTEGER (1)

Tokenization (on Ruby 2.5.1)

    $ ruby --dump=y -e '1 + 2' | grep Shifting
    Shifting token tINTEGER (1.0-1.1: )
    Shifting token '+' (1.2-1.3: )
    Shifting token tINTEGER (1.4-1.5: )
    Shifting token '\n' (1.5-1.5: )
    Shifting token "end-of-input" (1.5-1.5: )

Tokenization (on Ruby 2.6.0preview1)

    $ ruby --dump=y -e '1 + 2' | grep Shifting
    Shifting token tINTEGER (1.0-1.1: 1)
    Shifting token '+' (1.2-1.3: )
    Shifting token tINTEGER (1.4-1.5: 2)
    Shifting token '\n' (1.5-1.5: )
    Shifting token "end-of-input" (1.5-1.5: )

r61997 / 46e2fad

Parsing
• Analyzes tokens according to the rules of Ruby syntax.
• Builds AST.

Parsing

    /* parse.y */

    /* Rules */
    numeric         : simple_numeric
                    | tUMINUS_NUM simple_numeric   %prec tLOWEST
                    ;

    simple_numeric  : tINTEGER      /* e.g. 1   */
                    | tFLOAT        /* e.g. 2.1 */
                    | tRATIONAL     /* e.g. 3r  */
                    | tIMAGINARY    /* e.g. 4i  */
                    ;

    %%
    program         : { } top_compstmt      /* Goal */

Parsing

    $ ruby --dump=y -e '1 + 2'

    # Shifting token tINTEGER (1)
    tINTEGER
    simple_numeric
    numeric
    literal
    primary
    arg

    # Shifting token '+'
    arg '+'

    # Shifting token tINTEGER (2)
    arg '+' tINTEGER
    arg '+' simple_numeric
    arg '+' numeric
    arg '+' literal
    arg '+' primary
    arg '+' arg
    arg
    expr
    stmt
    top_stmt
    top_stmts

    # Shifting token '\n'
    top_stmts '\n'
    top_stmts term
    top_stmts terms
    top_stmts opt_terms
    top_compstmt
    program

    # Completed

Build AST

    $ ruby --dump=p -e '1 + 2'

    NODE_SCOPE
      NODE_PRELUDE
        NODE_OPCALL (:+)
          NODE_LIT (1)
          NODE_ARRAY
            NODE_LIT (2)

Build AST

    typedef struct RNode {
        VALUE flags;                  /* Contains node_type */
        union {
            struct RNode *node;
            ...
        } u1;                         /* u1, u2, u3: contain various data */
        union {
            struct RNode *node;
            ...
        } u2;
        union {
            struct RNode *node;
            ...
        } u3;
        rb_code_location_t nd_loc;    /* Contains location information */
    } NODE;

Build AST
• Builds AST in actions.
• $1 stands for the value of the 1st component (`arg`).

    arg     | arg '+' arg
                {
                    /* Action */
                    $$ = call_bin_op(p, $1, '+', $3, &@2, &@$);
                }

    static NODE *
    call_bin_op(struct parser_params *p, NODE *recv, ID id, NODE *arg1,
                const YYLTYPE *op_loc, const YYLTYPE *loc)
    {
        NODE *expr;
        value_expr(recv);
        value_expr(arg1);
        /* Create NODE_OPCALL */
        expr = NEW_OPCALL(recv, id, NEW_LIST(arg1, &arg1->nd_loc), loc);
        nd_set_line(expr, op_loc->beg_pos.lineno);
        return expr;
    }

Compile
• Compiles the AST into byte code.
• See "compile.c".

    $ ruby --dump=i -e '1 + 2'
    == disasm: #<ISeq:<main>@-e:1 (1,0)-(1,5)>==============================
    0000 putobject_OP_INT2FIX_O_1_C_                                 (   1)[Li]
    0001 putobject        2
    0003 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
    0006 leave

    ISeq: 4 insn(s)

References
• "Ruby Hacking Guide" (Part 2: Syntax analysis)
  • https://ruby-hacking-guide.github.io/ [EN]
  • http://i.loveruby.net/ja/rhg/book/ [JA]
• "Ruby Under a Microscope" / "Rubyのしくみ"

How to implement code locations

Goal
• Branch coverage
  • To pass code locations to compile phase.
• Method coverage
  • To store code locations on ISeq.
• What should we implement
  • Embed code locations into each NODE.

Hint
• The original source of location information is the Ruby script.
• If we want to use location information in the "n"th step, we should implement location information in the "n-1"th step.
• In this case, we need to pass location information from "Tokenization" to "Compile" to use it in the compile phase.

                     Parsing
                   |---------|
    Ruby script -> Tokens -> AST -> Byte code (insns / ISeq)
    |----------------|       |--------------------------------|
       Tokenization                      Compile

parser_params crash course
• One of the main data structures of the parser.
• Too Big!!!

    struct parser_params {
        rb_imemo_tmpbuf_t *heap;

        YYSTYPE *lval;

        struct {
            rb_strterm_t *strterm;
            VALUE (*gets)(struct parser_params*,VALUE);
            VALUE input;
            VALUE prevline;
            VALUE lastline;
            VALUE nextline;
            const char *pbeg;
            const char *pcur;
            const char *pend;
            const char *ptok;
            long gets_ptr;
            enum lex_state_e state;
            /* track the nest level of any parens "()[]{}" */
            int paren_nest;
            /* keep p->lex.paren_nest at the beginning of lambda "->"
               to detect tLAMBEG and keyword_do_LAMBDA */
            int lpar_beg;
            /* track the nest level of only braces "{}" */
            int brace_nest;
        } lex;
        stack_type cond_stack;
        stack_type cmdarg_stack;
        int tokidx;
        int toksiz;
        int tokline;
        int heredoc_end;
        int heredoc_indent;
        int heredoc_line_indent;
        char *tokenbuf;
        struct local_vars *lvtbl;
        int line_count;
        int ruby_sourceline;      /* current line no. */
        char *ruby_sourcefile;    /* current source file */
        VALUE ruby_sourcefile_string;
        rb_encoding *enc;
        token_info *token_info;
        VALUE compile_option;

        VALUE debug_buffer;
        VALUE debug_output;

        ID cur_arg;

        rb_ast_t *ast;

        unsigned int command_start:1;
        unsigned int eofp: 1;
        unsigned int ruby__end__seen: 1;
        unsigned int debug: 1;
        unsigned int has_shebang: 1;
        unsigned int in_defined: 1;
        unsigned int in_main: 1;
        unsigned int in_kwarg: 1;
        unsigned int in_def: 1;
        unsigned int in_class: 1;
        unsigned int token_seen: 1;
        unsigned int token_info_enabled: 1;
    # if WARN_PAST_SCOPE
        unsigned int past_scope_enabled: 1;
    # endif
        unsigned int error_p: 1;
        unsigned int cr_seen: 1;

    #ifndef RIPPER
        /* Ruby core only */

        unsigned int do_print: 1;
        unsigned int do_loop: 1;
        unsigned int do_chomp: 1;
        unsigned int do_split: 1;
        unsigned int warn_location: 1;

        NODE *eval_tree_begin;
        NODE *eval_tree;
        VALUE error_buffer;
        VALUE debug_lines;
        VALUE coverage;
        const struct rb_block *base_block;
    #else
        /* Ripper only */

        VALUE delayed;
        int delayed_line;
        int delayed_col;

        VALUE value;
        VALUE result;
        VALUE parsing_thread;
    #endif
    };

parser_params crash course
• It has a struct for the lexer (`lex`).
• The lexer processes input in units of lines.
  • *Basically* processes from top to bottom.

    struct parser_params {
        ...
        struct {
            ...
            VALUE prevline;       /* Lines */
            VALUE lastline;
            VALUE nextline;
            const char *pbeg;     /* Pointers */
            const char *pcur;
            const char *pend;
            const char *ptok;
            ...
        } lex;
        ...
    };

What is column

    /* parse.y */
    /*
      Structure of Lexer Buffer:

     lex.pbeg     lex.ptok     lex.pcur     lex.pend
        |            |            |            |
        |------------+------------+------------|
                     |<---------->|
                         token
    */

What is column
• When token '+' is recognized (first diagram).
• When token tINTEGER (2) is recognized (second diagram).

    1 + 2
    ^ ^^ ^
    | || +--- lex.pend
    | |+----- lex.pcur
    | +------ lex.ptok
    +-------- lex.pbeg

    1 + 2
    ^   ^^
    |   |+--- lex.pcur, lex.pend
    |   +---- lex.ptok
    +-------- lex.pbeg

What is column
• `lex.ptok - lex.pbeg` (begin) and `lex.pcur - lex.pbeg` (end).
• Column is the difference between pointers when a token is recognized.

    |--|       lex.pcur - lex.pbeg (end)
    |-|        lex.ptok - lex.pbeg (begin)
    1 + 2
    ^ ^^ ^
    | || +--- lex.pend
    | |+----- lex.pcur
    | +------ lex.ptok
    +-------- lex.pbeg
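The pointer arithmetic can be mimicked with byte offsets in Ruby. This is a toy single-line lexer, not parse.y's lexer: `tokenize_line` and `LexToken` are hypothetical names, the local variables `pbeg`, `ptok` and `pcur` stand in for the `lex.*` pointers, and only ASCII digits, spaces and one-byte operators are handled.

```ruby
# Toy token: type, source text, and begin/end columns in bytes.
LexToken = Struct.new(:type, :text, :beg_column, :end_column)

SPACE = 32          # " "
DIGITS = (48..57)   # "0".."9"

# Tokenize one line, recording columns as offsets from the line start.
def tokenize_line(line)
  tokens = []
  pbeg = 0               # beginning of the line
  pcur = 0               # current read position
  pend = line.bytesize   # end of the line
  while pcur < pend
    if line.getbyte(pcur) == SPACE
      pcur += 1
      next
    end
    ptok = pcur          # beginning of the token being recognized
    if DIGITS.cover?(line.getbyte(pcur))
      pcur += 1 while pcur < pend && DIGITS.cover?(line.getbyte(pcur))
      type = :tINTEGER
    else
      pcur += 1          # assumption: every other token is one byte long
      type = line.byteslice(ptok, 1)
    end
    # The columns are the pointer differences at recognition time.
    tokens << LexToken.new(type, line.byteslice(ptok, pcur - ptok),
                           ptok - pbeg, pcur - pbeg)
  end
  tokens
end

tokenize_line("1 + 2").each do |t|
  puts "token #{t.type} (1.#{t.beg_column}-1.#{t.end_column}: #{t.text})"
end
```

The columns are captured at the moment each token is pushed, because `ptok` and `pcur` are overwritten as soon as scanning continues.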

What is column
• We must store columns somewhere before the next token is recognized.

    1 + 2
    ^ ^^ ^
    | || +--- lex.pend
    | |+----- lex.pcur
    | +------ lex.ptok
    +-------- lex.pbeg

    1 + 2
    ^   ^^
    |   |+--- lex.pcur, lex.pend
    |   +---- lex.ptok
    +-------- lex.pbeg

From Ruby script to tokens
• Copy location information to `YYLTYPE *yylloc` in `yylex`.
  • The `yylloc` argument is newly added to `yylex`.
  • Call `RUBY_SET_YYLLOC` to set `yylloc`.

    static enum yytokentype
    yylex(YYSTYPE *lval, YYLTYPE *yylloc, struct parser_params *p)  /* yylloc: new argument */
    {
        enum yytokentype t;

        p->lval = lval;
        lval->val = Qundef;
        t = parser_yylex(p);                /* Create token */
        if (has_delayed_token(p))
            dispatch_delayed_token(p, t);
        else if (t != 0)
            dispatch_scan_event(p, t);

        /* Set `yylloc` */
        if (p->lex.strterm && (p->lex.strterm->flags & STRTERM_HEREDOC))
            RUBY_SET_YYLLOC_FROM_STRTERM_HEREDOC(*yylloc);
        else
            RUBY_SET_YYLLOC(*yylloc);

        return t;
    }

From tokens to Nodes
• Now we can use `@n` in each action.
  • `@n` stands for the location of the nth component of the right hand side.
  • `@$` stands for the location of the left hand side grouping (`YYLTYPE yyloc`).
    • Set by `YYLLOC_DEFAULT`.
    • https://www.gnu.org/software/bison/manual/html_node/Tracking-Locations.html#Tracking-Locations
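How `@$` is derived from the `@n` values can be modeled in a few lines. The sketch below follows the default behavior described in the Bison manual (begin of the first component to end of the last, and an empty range after the previous symbol for an empty rule); `default_location` is a hypothetical name, and parse.y's own `YYLLOC_DEFAULT` definition may treat some cases differently.

```ruby
# A location is [[beg_lineno, beg_column], [end_lineno, end_column]].
# Sketch of Bison's documented default for `@$`; hypothetical helper.
def default_location(rhs_locations, previous_location)
  if rhs_locations.empty?
    # Empty rule: an empty range at the end of the previous symbol.
    [previous_location[1], previous_location[1]]
  else
    # Begin of the first component to the end of the last component.
    [rhs_locations.first[0], rhs_locations.last[1]]
  end
end

# Components of `arg : arg '+' arg` for the input `1 + 2`.
ARG_PLUS_ARG = [
  [[1, 0], [1, 1]], # @1: arg `1`
  [[1, 2], [1, 3]], # @2: '+'
  [[1, 4], [1, 5]], # @3: arg `2`
]

p default_location(ARG_PLUS_ARG, nil)
p default_location([], [[1, 4], [1, 5]])
```

For `1 + 2` this yields the range from 1.0 to 1.5, the value an action receives as `&@$`.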

    |---| @$ (1.0-1.5)
    1 + 2
    ^ ^ ^
    | | +- @3 (1.4-1.5)
    | +--- @2 (1.2-1.3)
    +----- @1 (1.0-1.1)

    arg     | arg '+' arg
                {
                    $$ = call_bin_op(p, $1, '+', $3, &@2, &@$);   /* &@$ is (1.0-1.5) */
                }

    static NODE *
    call_bin_op(struct parser_params *p, NODE *recv, ID id, NODE *arg1,
                const YYLTYPE *op_loc, const YYLTYPE *loc)
    {
        ...
        /* loc is (1.0-1.5) */
        expr = NEW_OPCALL(recv, id, NEW_LIST(arg1, &arg1->nd_loc), loc);
        ...
    }

`NODE_OPCALL` is simple
• All the location information `NODE_OPCALL` needs is available when `NODE_OPCALL` is created.
  • NODE_LIT (1)
  • mid (:+)
  • NODE_ARRAY (2)

    NODE_OPCALL   1   +   2
    -----------   ------------
    arg         : arg '+' arg

`NODE_*ASGN` family is not simple
• `NODE_LASGN` (local variable assignment) or `NODE_IASGN` (instance variable assignment).

    @a = 1

    NODE_SCOPE
      NODE_IASGN (:@a)
        NODE_LIT (1)

    NODE_IASGN   @a
    ----------   -------------
    lhs        : user_variable
                   {
                   /*%%%*/
                       /* Create NODE_IASGN; its location is determined here */
                       $$ = assignable(p, $1, 0, &@$);
                   /*%
                   %*/
                   }

    NODE_IASGN   NODE_IASGN  =   1
    ----------   ------------------------
    arg        : lhs        '=' arg_rhs
                   {
                   /*%%%*/
                       /* Update location */
                       $$ = node_assign(p, $1, $3, &@$);
                   /*%
                   %*/
                   /*% ripper: assign!($1, $3) %*/
                   }

`NODE_*ASGN` family is not simple
• `NODE_*ASGN` is created.
• The right hand side is assigned.
• So the location of `NODE_*ASGN` must be updated when the right hand side is assigned.
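The two-step life of an assignment node can be sketched as follows. This is a toy Ruby model: `ToyNode`, `toy_assignable` and `toy_node_assign` are hypothetical names that loosely mirror the roles of `assignable` and `node_assign` in parse.y, not their real implementations.

```ruby
# Toy node; `loc` is [[beg_lineno, beg_column], [end_lineno, end_column]].
ToyNode = Struct.new(:type, :name, :value, :loc)

# Step 1: the node is created when only the left hand side is known.
def toy_assignable(name, lhs_loc)
  ToyNode.new(:NODE_IASGN, name, nil, lhs_loc)
end

# Step 2: the right hand side is attached, so the location is widened
# to cover the whole assignment.
def toy_node_assign(node, rhs, whole_loc)
  node.value = rhs
  node.loc = whole_loc
  node
end

# Build `@a = 1` and report the location before and after step 2.
def build_iasgn_example
  node = toy_assignable(:@a, [[1, 0], [1, 2]])   # covers `@a`
  before = node.loc
  toy_node_assign(node, 1, [[1, 0], [1, 6]])     # covers `@a = 1`
  [before, node.loc]
end

before, after = build_iasgn_example
puts "after creation:   #{before.inspect}"
puts "after assignment: #{after.inspect}"
```

Skipping the second step would leave the node claiming to cover only `@a`, which is the kind of mistake the tests later in the talk are designed to catch.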

`NODE_ITER` is not simple
• `NODE_ITER` (method call with block).

    3.times { foo }

    NODE_ITER
      NODE_CALL (:times)
        NODE_LIT (3)
      NODE_SCOPE
        NODE_VCALL (:foo)

    NODE_CALL     3             .       times
    -----------   --------------------------------------------------
    method_call | primary_value call_op operation2 opt_paren_args
                    {
                    }

    NODE_ITER     {   NODE_ITER  }
    -----------   ---------------------
    brace_block : '{' brace_body '}'
                    {
                    }

    NODE_ITER   NODE_CALL   NODE_ITER
    ---------   -----------------------
    primary   | method_call brace_block         /* `3.times` `{ foo }` */
                  {
                  /*%%%*/
                      block_dup_check(p, $1->nd_args, $2);
                      /* Update location */
                      $$ = method_add_block(p, $1, $2, &@$);
                  /*%
                  %*/
                  }

`NODE_ITER` is not simple
• `NODE_CALL` is created.
• `NODE_ITER` is created.
• `NODE_ITER` is added to `NODE_CALL`.
• So the location of `NODE_ITER` must be updated when it is passed to `NODE_CALL`.

    $ git shortlog -s -n parse.y | head -10
       884  nobu       <- x 10
       362  matz
       133  mame
        88  yui-knk    <- Me
        55  ko1
        38  aamine
        37  akr
        33  naruse
        25  usa
         7  normal

On "v2_6_0_preview1".

@nobu is the lord of Demon Castle "parse.y".

How to test
• Define some rules and check that all Ruby files in the "test" directory follow them.
• Related files:
  • "ext/-test-/ast/ast.c"
  • "test/-ext-/ast/test_ast.rb"

On "v2_6_0_preview1".

"ext/-test-" and "test/-ext-" • "ext/-test-" contains C extensions which are
used in Ruby's tests. • "ext/-test-/ast/ast.c" defines the `AST` module and the `AST::Node` class. On "v2_6_0_preview1".

"ext/-test-" and "test/-ext-" • "test/-ext-" contains test cases which depend on
"ext/-test-". On "v2_6_0_preview1".

Rule 1
• `lineno` is initialized with `0` and `column` with `-1`.
• Validate that all node locations are updated at least once.

    NODE_IF (line: 1, location: (0,-1)-(0,-1))

Rule 2
• Validate that children do not exceed the location of their parent.

    3.times { foo }

    NODE_ITER [1.0-1.15]             -> covers [1.0-1.8] and [1.8-1.15]
      NODE_CALL (:times) [1.0-1.8]   -> covers [1.0-1.1]
        NODE_LIT (3) [1.0-1.1]
      NODE_SCOPE [1.8-1.15]          -> covers [1.10-1.13]
        NODE_VCALL (:foo) [1.10-1.13]
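Rule 2 can be tried out without building Ruby itself. The sketch below is a standalone model: `RangeNode`, `range_node` and `range_violations` are hypothetical names, and the positions are copied from the slide above rather than produced by a parser.

```ruby
# Toy node with [lineno, column] begin/end positions and child nodes.
RangeNode = Struct.new(:type, :beg_pos, :end_pos, :children)

def range_node(type, beg_pos, end_pos, children = [])
  RangeNode.new(type, beg_pos, end_pos, children)
end

# Collect [parent type, child type] pairs where the child sticks out
# of the range of its parent. Positions compare as arrays.
def range_violations(parent, found = [])
  parent.children.each do |child|
    inside = (parent.beg_pos <=> child.beg_pos) <= 0 &&
             (child.end_pos <=> parent.end_pos) <= 0
    found << [parent.type, child.type] unless inside
    range_violations(child, found)
  end
  found
end

# `3.times { foo }` with the locations shown on the slide.
SLIDE_TREE = range_node(:NODE_ITER, [1, 0], [1, 15], [
  range_node(:NODE_CALL, [1, 0], [1, 8], [
    range_node(:NODE_LIT, [1, 0], [1, 1]),
  ]),
  range_node(:NODE_SCOPE, [1, 8], [1, 15], [
    range_node(:NODE_VCALL, [1, 10], [1, 13]),
  ]),
])

# The same shape, but NODE_ITER was never widened to cover its block.
STALE_TREE = range_node(:NODE_ITER, [1, 0], [1, 8], [
  range_node(:NODE_SCOPE, [1, 8], [1, 15]),
])

p range_violations(SLIDE_TREE)
p range_violations(STALE_TREE)
```

The second tree models a forgotten location update: the parent still ends at column 8 while its child runs to column 15, so the pair is reported.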

    # test/-ext-/ast/test_ast.rb

    # Check all ruby files in "test" directory
    Dir.glob("test/**/*.rb", base: SRCDIR).each do |path|
      define_method("test_ranges:#{path}") do
        helper = Helper.new("#{SRCDIR}/#{path}")
        # Validate each file
        helper.validate_range

        assert_equal([], helper.errors)
      end
    end

    # test/-ext-/ast/test_ast.rb

    def validate_range0(node)
      beg_pos, end_pos = node.beg_pos, node.end_pos
      children = node.children.compact

      min = children.map(&:beg_pos).min
      max = children.map(&:end_pos).max

      # Check ranges
      unless beg_pos <= min
        @errors << { type: :min_validation_error, min: min, beg_pos: beg_pos, node: node }
      end

      unless max <= end_pos
        @errors << { type: :max_validation_error, max: max, end_pos: end_pos, node: node }
      end

      # Check children
      children.each do |child|
        validate_range0(child)
      end
    end

    # Generate AST
    ast = AST.parse_file(@path)
    validate_not_cared0(ast)

The future plan of code locations feature

Case 1 (Proc/Method)
• Add new methods to Proc/Method which return their code location.

    def a(&block)
      p block.code_location
    end

    a do
      1 + 2
    end
    # => [[5, 2], [7, 3]]

    p self.class.instance_method(:a).code_location
    # => [[1, 0], [3, 3]]

https://github.com/yui-knk/ruby/tree/feature/rb_iseq_code_location

Case 2 (NoMethodError)
• Give `NoMethodError` a more detailed message.

    class A
      def foo
        nil
      end
    end

    A.new.foo.foo

    Traceback (most recent call last):
    /tmp/test.rb:7:in `<main>': undefined method `foo' for nil:NilClass (NoMethodError)

    A.new.foo.foo
             ^^^^

https://github.com/yui-knk/ruby/tree/feature/node_id

Case 3 (AST module)

    AST.parse("1 + 2")
    # => #<AST::Node(NODE_SCOPE(0) 1:0, 1:5 (4)): >

    AST.parse("1 + 2").children[1]
    # => #<AST::Node(NODE_OPCALL(36) 1:0, 1:5 (3)): >

    AST.parse("1 + 2").children[1].children
    # => [#<AST::Node(NODE_LIT(59) 1:0, 1:1 (0)): >, #<AST::Node(NODE_ARRAY(42) 1:4, 1:5 (2)): >]

• We discussed this topic at Developers Meeting yesterday.

Committed

Conference Driven Development !!!

Case 3 (AST module)
• We can get children nodes.

    RubyVM::AST.parse("1 + 2")
    # => #<RubyVM::AST::Node(NODE_SCOPE(0) 1:0, 1:5): >

    RubyVM::AST.parse("1 + 2").children[1]
    # => #<RubyVM::AST::Node(NODE_OPCALL(36) 1:0, 1:5): >

    RubyVM::AST.parse("1 + 2").children[1].children
    # => [#<RubyVM::AST::Node(NODE_LIT(59) 1:0, 1:1): >, #<RubyVM::AST::Node(NODE_ARRAY(42) 1:4, 1:5): >]

Case 3 (AST module)
• We can get location information.

    [RubyVM::AST.parse("1 + 2").first_lineno, RubyVM::AST.parse("1 + 2").first_column]
    # => [1, 0]

    [RubyVM::AST.parse("1 + 2").last_lineno, RubyVM::AST.parse("1 + 2").last_column]
    # => [1, 5]
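For readers on a released Ruby, a similar query can be written as below. This sketch assumes CRuby 2.6 or later, where the module shown on these slides as `RubyVM::AST` is available under the name `RubyVM::AbstractSyntaxTree`; `location_of` is a hypothetical helper, and node type names vary between Ruby versions, so only the location accessors are used.

```ruby
# Return [first_lineno, first_column, last_lineno, last_column] of the
# root node of `source`. Assumes CRuby 2.6+ (RubyVM::AbstractSyntaxTree).
def location_of(source)
  root = RubyVM::AbstractSyntaxTree.parse(source)
  [root.first_lineno, root.first_column, root.last_lineno, root.last_column]
end

p location_of("1 + 2")
```

The four numbers are the same begin and end positions that the slides obtain through `first_lineno`, `first_column`, `last_lineno` and `last_column`.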

Enjoy programming with Ruby 2.6.0-preview2!

Conclusion

Acknowledgments
• @mametter
• @nobu
• @ko1
• @shyouhei
• @takeshinoda
• @hkdnet
• @HaiTo
• @littlestarling

Conclusion
• AST Node has location information.
• Shared the future plan of the code locations feature.
• If you have any ideas for using location information, please let me know :)
  • https://bugs.ruby-lang.org/
• You now have the map of Demon Castle "parse.y", so let's hack "parse.y" :)

Thank you!!!

Bonus track

How to implement more detailed message of `NoMethodError`

Target code

    class A
      def foo
        nil
      end
    end

    A.new.foo.foo

node_id
• Store node_id of insn on an ISeq.
• We can distinguish between `#foo`s by node_id.

    == disasm: #<ISeq:<main>@src/no_method_error2.rb:1 (1,0)-(7,13)> (catch: FALSE)
    0000 putspecialobject 3                                          (   1)[  0][Li]
    0002 putnil                                                      [  9]
    0003 defineclass      :A, <class:A>, 0
    0007 pop
    0008 getinlinecache   15, <is:0>                                 (   7)[ 10][Li]
    0011 getconstant      :A
    0013 setinlinecache   <is:0>
    0015 opt_send_without_block <callinfo!mid:new, argc:0, ARGS_SIMPLE>, <callcache>[ 11]
    0018 opt_send_without_block <callinfo!mid:foo, argc:0, ARGS_SIMPLE>, <callcache>[ 12]
    0021 opt_send_without_block <callinfo!mid:foo, argc:0, ARGS_SIMPLE>, <callcache>[ 13]
    0024 leave                                                       [ 10]

Exceptions
• Exceptions have an ISeq and a program counter (pc).
• Get node_id from an exception.
  • In the listing above, the insn that raises the exception carries node_id 13.

    A.new.foo.foo

    NODE_CALL (:foo) (line: 7, location: (7,0)-(7,13))*       13
      NODE_CALL (:foo) (line: 7, location: (7,0)-(7,9))       12
        NODE_CALL (:new) (line: 7, location: (7,0)-(7,5))     11
          NODE_CONST (:A) (line: 7, location: (7,0)-(7,1))    10

• Get location of Node.
  • Node 12 covers `A.new.foo`; the rest of node 13 is `.foo`.
• Build an error message.

    A.new.foo.foo
             ^^^^

How to implement more detailed message of `NoMethodError`
• Add a unique id (per file), "node_id", to Node.
• Store node_id of insn on an ISeq.
• Get node_id from an exception.
• Parse the source code file and find Node by node_id.
• Get location of Node.
• Build an error message.
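The last two steps can be sketched with the numbers from the slides. This is an illustration only: `NODE_COLUMNS` is a hand-written table standing in for a lookup by node_id, and `underline_call` is a hypothetical helper, not code from the `feature/node_id` branch.

```ruby
# node_id => [beg_column, end_column] on line 7, copied from the slides.
NODE_COLUMNS = {
  13 => [0, 13], # A.new.foo.foo
  12 => [0, 9],  # A.new.foo
  11 => [0, 5],  # A.new
  10 => [0, 1],  # A
}

# Underline the part of `line` that belongs to the failing call but
# not to its receiver.
def underline_call(line, call_columns, receiver_columns)
  from = receiver_columns[1] # the failing call starts where the receiver ends
  to = call_columns[1]
  "#{line}\n#{' ' * from}#{'^' * (to - from)}"
end

puts underline_call("A.new.foo.foo", NODE_COLUMNS[13], NODE_COLUMNS[12])
```

Because node 13 and node 12 share the same begin column, subtracting the receiver's range is what isolates the second `.foo`.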

Thank you!!!
