Slide 1

Slide 1 text

Dive into middleware with mruby Apr 19, 2019 in RubyKaigi 2019 @yui-knk Yuichiro Kaneko

Slide 2

Slide 2 text

Self Introduction • Arm Treasure Data • CDP team • Developing Rails application • GitHub (yui-knk) • A CRuby Committer 2015/12~ • Introduced code positions to RNode • Introduced RubyVM::AbstractSyntaxTree

Slide 3

Slide 3 text

PR: We are sponsor!

Slide 4

Slide 4 text

Do you read source codes???

Slide 5

Slide 5 text

Recently I read the source code of PostgreSQL

Slide 6

Slide 6 text

Background • I want to know how RDBMS works to create subset of RDBMS • https://github.com/yui-knk/minidb • Use PostgreSQL as reference • PostgreSQL is written by C • https://speakerdeck.com/yui_knk/make-and-learn-rdbms • There are so many good documents which explains how PostgreSQL works

Slide 7

Slide 7 text

Background • But I can not find the document like “Ruby Hacking Guide” • Should read source code of PostgreSQL • https://github.com/postgres/postgres • But code reading is sometimes hard, why? • RDBMS has many components • Today I only focus on the query engine

Slide 8

Slide 8 text

BTW, do you know how PostgreSQL executes queries?

Slide 9

Slide 9 text

Let me explain it in a slide!

Slide 10

Slide 10 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun)

Slide 11

Slide 11 text

A lot of rewriting of trees

Slide 12

Slide 12 text

Does it make sense?

Slide 13

Slide 13 text

Are we ready to read codes?

Slide 14

Slide 14 text

Why code reading is so hard (1) • A lot of Node types • Some nodes are like “abstract classes” Line. 508 Line. 28

Slide 15

Slide 15 text

• JoinExpr is a node • This has Node, List…. Data structures are abstracted typedef struct JoinExpr { NodeTag type; JoinType jointype; /* type of join */ bool isNatural; /* Natural join? Will need to shape table */ Node *larg; /* left subtree */ Node *rarg; /* right subtree */ List *usingClause; /* USING clause, if any (list of String) */ Node *quals; /* qualifiers on join, if any */ Alias *alias; /* user-written alias clause, if any */ int rtindex; /* RT index assigned for join, or 0 */ } JoinExpr;

Slide 16

Slide 16 text

• Has byte codes select * from films where did = 1; Why code reading is so hard (2) typedef struct PlanState { NodeTag type; ... /* * Common structural data for all Plan types. These links to subsidiary * state trees parallel links in the associated plan tree (except for the * subPlan list, which does not exist in the plan tree). */ ExprState *qual; /* boolean qual condition */

Slide 17

Slide 17 text

Why code reading is so hard (2) typedef struct ExprState { Node tag; ... /* * Instructions to compute expression's return value. */ struct ExprEvalStep *steps; typedef struct ExprEvalStep { /* * Instruction to be executed. During instruction preparation this is an * enum ExprEvalOp, but later it can be changed to some other type, e.g. a * pointer for computed goto (that's why it's an intptr_t). */ intptr_t opcode;

Slide 18

Slide 18 text

Why code reading is so hard (2) /* * Discriminator for ExprEvalSteps. * */ typedef enum ExprEvalOp { /* entire expression has been evaluated completely, return */ EEOP_DONE, /* apply slot_getsomeattrs on corresponding tuple slot */ EEOP_INNER_FETCHSOME, EEOP_OUTER_FETCHSOME, EEOP_SCAN_FETCHSOME, /* compute non-system Var value */ EEOP_INNER_VAR, EEOP_OUTER_VAR, EEOP_SCAN_VAR,

Slide 19

Slide 19 text

Sometimes the value is important

Slide 20

Slide 20 text

But I can not read the value from source codes

Slide 21

Slide 21 text

I needed tools to dump internal structures

Slide 22

Slide 22 text

Example 1 $ ruby --dump=p -e '1 + 2' # @ NODE_SCOPE (line: 1, location: (1,0)-(1,5)) # +- nd_tbl: (empty) # +- nd_args: # | (null node) # +- nd_body: # @ NODE_OPCALL (line: 1, location: (1,0)-(1,5))* # +- nd_mid: :+ # +- nd_recv: # | @ NODE_LIT (line: 1, location: (1,0)-(1,1)) # | +- nd_lit: 1 # +- nd_args: # @ NODE_ARRAY (line: 1, location: (1,4)-(1,5)) # +- nd_alen: 1 # +- nd_head: # | @ NODE_LIT (line: 1, location: (1,4)-(1,5)) # | +- nd_lit: 2 # +- nd_next: # (null node)

Slide 23

Slide 23 text

Example 2 $ ruby --dump=i -e '1 + 2' == disasm: #@-e:1 (1,0)-(1,5)> (catch: FALSE) 0000 putobject_INT2FIX_1_ ( 1)[Li] 0001 putobject 2 0003 opt_plus , 0006 leave

Slide 24

Slide 24 text

Example 3 $ ruby --dump=y -e '1 + 2' Starting parse Entering state 0 Reducing stack by rule 1 (line 1101): lex_state: EXPR_NONE -> EXPR_BEG at line 1102 vtable_alloc:11362: 0x00007fe8ddca0dd0 vtable_alloc:11363: 0x00007fe8ddc63f00 cmdarg_stack(push): 0 at line 11373 cond_stack(push): 0 at line 11374 -> $$ = nterm @1 (1.0-1.0: ) Stack now 0 Entering state 2 Reading a token: lex_state: EXPR_BEG -> EXPR_END at line 7711 lex_state: EXPR_END -> EXPR_END at line 7074 Next token is token "integer literal" (1.0-1.1: 1)

Slide 25

Slide 25 text

A PROGRAMMERS’S BEST FRIEND

Slide 26

Slide 26 text

By the way • We can load a shared library file into PostgreSQL • PostgreSQL has a lot of hooks • post_parse_analyze_hook_type, planner_hook_type, ExecutorRun_hook_type, etc… • In PostgreSQL we can set run-time configuration parameters by SET

Slide 27

Slide 27 text

How to tackle the problem • Creating plugin which embeds mruby into PostgreSQL • https://github.com/yui-knk/psql_inspect_plugin

Slide 28

Slide 28 text

How to use it lusiadas=# load '/path/to/psql_inspect_plugin/psql_inspect.so'; lusiadas=# set session "psql_inspect.planner_script" = 'stmt = PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)'; SET lusiadas=# select code from films where code = 'a'; code ------ (0 rows) Load a shared library Set ruby script Execute SQL

Slide 29

Slide 29 text

How to use it • “puts” method print an argument to log file of PostgreSQL 2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where code = 'a'; psql_inspect_planner_hook with mruby! PlannedStmt: command_type: SELECT IndexOnlyScan: targetlist TargetEntry: resno: 1, resname: code, resjunk: false Var: varno: 65002, varattno: 1 qual

Slide 30

Slide 30 text

How it works • Define thin wrapper class on C (plugin/*.c) void psql_inspect_planned_stmt_class_init(mrb_state *mrb, struct RClass *class) { class_stmt = mrb_define_class_under(mrb, class, "PlannedStmt", psql_inspect_node_class); MRB_SET_INSTANCE_TT(class_stmt, MRB_TT_DATA); mrb_define_class_method(mrb, class_stmt, "current_stmt", psql_inspect_c_current_stmt, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "initialize", psql_inspect_planned_stmt_init, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "command_type", psql_inspect_planned_stmt_command_type, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "plan_tree", psql_inspect_planned_stmt_plan_tree, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "rtable", psql_inspect_planned_stmt_rtable, MRB_ARGS_NONE()); }

Slide 31

Slide 31 text

How it works • Define methods on mruby (mrblib/*.rb) class PgInspect class Node def _pretty_print(pp) pp.add_line "Node: #{type}" end end class PlannedStmt < Node def _pretty_print(pp) pp.add_line "PlannedStmt: command_type: #{command_type}" pp.with_indent_inc do plan_tree._pretty_print(pp) end end end

Slide 32

Slide 32 text

How it works • Write plugin of PostgreSQL /* * Module load callback */ void _PG_init(void) { mrb_s = mrb_open(); psql_inspect_class_init(mrb_s); ... /* planner_hook */ prev_planner_hook = planner_hook; planner_hook = psql_inspect_planner_hook; Entrypoint Creates new mrb_state Set callback function to the hook

Slide 33

Slide 33 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) planner_hook

Slide 34

Slide 34 text

static PlannedStmt * psql_inspect_planner_hook(Query *parse, int cursorOptions, ParamListInfo boundParams) { const char *script; PlannedStmt *stmt = standard_planner(parse, cursorOptions, boundParams); script = psql_inspect_get_script(planner_script_guc_name); if (script == NULL) { elog(LOG, "You should set \"%s\"", planner_script_guc_name); return stmt; } psql_inspect_planned_stmt_mruby_env_setup(mrb_s, stmt); psql_inspect_mrb_load_string(mrb_s, script); psql_inspect_planned_stmt_mruby_env_tear_down(mrb_s); return stmt; } Set PgInspect::PlannedStmt.current_stmt Call the original function Get ruby script Execute ruby script

Slide 35

Slide 35 text

2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where code = 'a'; psql_inspect_planner_hook with mruby! PlannedStmt: command_type: SELECT IndexOnlyScan: targetlist TargetEntry: resno: 1, resname: code, resjunk: false Var: varno: 65002, varattno: 1 qual stmt = PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)

Slide 36

Slide 36 text

Write wrapper and wrapper… $ wc -l plugin/psql_inspect*c 293 plugin/psql_inspect.c 77 plugin/psql_inspect_bitmapset.c 81 plugin/psql_inspect_dest.c 1048 plugin/psql_inspect_nodes.c 113 plugin/psql_inspect_parse_state.c 480 plugin/psql_inspect_parsenodes.c 93 plugin/psql_inspect_path.c 246 plugin/psql_inspect_path_key.c 322 plugin/psql_inspect_plan.c 115 plugin/psql_inspect_planned_stmt.c 162 plugin/psql_inspect_planner_info.c 482 plugin/psql_inspect_primnodes.c 320 plugin/psql_inspect_query.c 965 plugin/psql_inspect_query_desc.c 83 plugin/psql_inspect_rel_opt_info.c 4880 total

Slide 37

Slide 37 text

Write callback functions … +-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) planner_hook ExecutorRun_hook post_parse_analyze_hook

Slide 38

Slide 38 text

Why this approach is useful • Can easily try what you want • mruby dynamically evaluate scripts • Can use many functions/classes of mruby • Can implement the plugin step by step • Write codes, read codes, write codes, read codes… • Start from implementing just single wrapper class

Slide 39

Slide 39 text

Dive into PostgreSQL!!!

Slide 40

Slide 40 text

Table example =# \d films Table "public.films" Column | Type | Collation | Nullable | Default -----------+-------------------------+-----------+----------+--------- code | character(5) | | not null | title | character varying(40) | | not null | did | integer | | not null | date_prod | date | | | kind | character varying(10) | | | len | interval hour to minute | | | Indexes: "firstkey" PRIMARY KEY, btree (code) =# select * from films; code | title | did | date_prod | kind | len -------+-------+-----+-----------+------+----- c1 | T1 | 101 | | | c2 | T2 | 102 | | | c3 | T3 | 103 | | | (3 rows)

Slide 41

Slide 41 text

+-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun)

Slide 42

Slide 42 text

Parse • Generate AST from input text • Each node represents user input

Slide 43

Slide 43 text

Parse select * from films where did = 1; ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: catalogname: , schemaname: , relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1

Slide 44

Slide 44 text

Analyze/Rewrite • Generate Query Tree • Replace user input attributes to internal representations • Semantic analysis

Slide 45

Slide 45 text

Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Expanded select * from films where did = 1;

Slide 46

Slide 46 text

Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Internal representations where did = 1 select * from films where did = 1;

Slide 47

Slide 47 text

Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … # select oid, relname from pg_class where oid = 16389 order by oid asc; oid | relname -------+--------- 16389 | films (1 row) films

Slide 48

Slide 48 text

Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION, relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … where did = 1 /* catalog/pg_operator.dat */ { oid => ’96', oid_symbol => ‘Int4EqualOperator' oprname => ‘=', oprleft => ‘int4', oprright => ‘int4', oprresult => 'bool' }, /* catalog/pg_type.dat */ { oid => ’23', descr => '-2 billion to 2 billion integer, 4-byte storage', typname => ‘int4', typlen => '4' }, select * from films where did = 1;

Slide 49

Slide 49 text

Optimize/Generate tree of plan node • We would have many way to calculate result tuples • How to get tuples from disk • Read files in order (SeqScan) / Use index (IndexScan) / Use only index (IndexOnlyScan) • How to aggregate tuples • Use internal hash table / sort tuples before aggregation • How to join tables • Nested Loops Join / (Sort-)Merge Join / Hash Join

Slide 50

Slide 50 text

Optimize/Generate tree of plan node Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … PlannedStmt: command_type: SELECT SeqScan: targetlist TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 ... qual OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1

Slide 51

Slide 51 text

select * from films where did = 1; QueryDesc (SELECT): planstate SeqScanState: qual ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE ss_ScanTupleSlot TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true tts_tupleDescriptor TupleDesc: natts: 6 attrs PgAttribute: attname: code ... Byte codes

Slide 52

Slide 52 text

Write codes to read codes!!!

Slide 53

Slide 53 text

Thank you !!!