Dive into middleware with mruby

B3ba3ccedfbf4d605f00bafd1a732529?s=47 yui-knk
April 19, 2019

Dive into middleware with mruby

B3ba3ccedfbf4d605f00bafd1a732529?s=128

yui-knk

April 19, 2019
Tweet

Transcript

  1. 2.

    Self Introduction • Arm Treasure Data • CDP team •

    Developing Rails application • GitHub (yui-knk) • A CRuby Committer 2015/12~ • Introduced code positions to RNode • Introduced RubyVM::AbstractSyntaxTree
  2. 6.

    Background • I want to know how RDBMS works to

    create subset of RDBMS • https://github.com/yui-knk/minidb • Use PostgreSQL as reference • PostgreSQL is written by C • https://speakerdeck.com/yui_knk/make-and-learn-rdbms • There are so many good documents which explains how PostgreSQL works
  3. 7.

    Background • But I can not find the document like

    “Ruby Hacking Guide” • Should read source code of PostgreSQL • https://github.com/postgres/postgres • But code reading is sometimes hard, why? • RDBMS has many components • Today I only focus on the query engine
  4. 10.

    +-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query)

    | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun)
  5. 14.

    Why code reading is so hard (1) • A lot

    of Node types • Some nodes are like “abstract classes” Line. 508 Line. 28
  6. 15.

    • JoinExpr is a node • This has Node, List….

    Data structures are abstracted typedef struct JoinExpr { NodeTag type; JoinType jointype; /* type of join */ bool isNatural; /* Natural join? Will need to shape table */ Node *larg; /* left subtree */ Node *rarg; /* right subtree */ List *usingClause; /* USING clause, if any (list of String) */ Node *quals; /* qualifiers on join, if any */ Alias *alias; /* user-written alias clause, if any */ int rtindex; /* RT index assigned for join, or 0 */ } JoinExpr;
  7. 16.

    • Has byte codes select * from films where did

    = 1; Why code reading is so hard (2) typedef struct PlanState { NodeTag type; ... /* * Common structural data for all Plan types. These links to subsidiary * state trees parallel links in the associated plan tree (except for the * subPlan list, which does not exist in the plan tree). */ ExprState *qual; /* boolean qual condition */
  8. 17.

    Why code reading is so hard (2) typedef struct ExprState

    { Node tag; ... /* * Instructions to compute expression's return value. */ struct ExprEvalStep *steps; typedef struct ExprEvalStep { /* * Instruction to be executed. During instruction preparation this is an * enum ExprEvalOp, but later it can be changed to some other type, e.g. a * pointer for computed goto (that's why it's an intptr_t). */ intptr_t opcode;
  9. 18.

    Why code reading is so hard (2) /* * Discriminator

    for ExprEvalSteps. * */ typedef enum ExprEvalOp { /* entire expression has been evaluated completely, return */ EEOP_DONE, /* apply slot_getsomeattrs on corresponding tuple slot */ EEOP_INNER_FETCHSOME, EEOP_OUTER_FETCHSOME, EEOP_SCAN_FETCHSOME, /* compute non-system Var value */ EEOP_INNER_VAR, EEOP_OUTER_VAR, EEOP_SCAN_VAR,
  10. 22.

    Example 1 $ ruby --dump=p -e '1 + 2' #

    @ NODE_SCOPE (line: 1, location: (1,0)-(1,5)) # +- nd_tbl: (empty) # +- nd_args: # | (null node) # +- nd_body: # @ NODE_OPCALL (line: 1, location: (1,0)-(1,5))* # +- nd_mid: :+ # +- nd_recv: # | @ NODE_LIT (line: 1, location: (1,0)-(1,1)) # | +- nd_lit: 1 # +- nd_args: # @ NODE_ARRAY (line: 1, location: (1,4)-(1,5)) # +- nd_alen: 1 # +- nd_head: # | @ NODE_LIT (line: 1, location: (1,4)-(1,5)) # | +- nd_lit: 2 # +- nd_next: # (null node)
  11. 23.

    Example 2 $ ruby --dump=i -e '1 + 2' ==

    disasm: #<ISeq:<main>@-e:1 (1,0)-(1,5)> (catch: FALSE) 0000 putobject_INT2FIX_1_ ( 1)[Li] 0001 putobject 2 0003 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache> 0006 leave
  12. 24.

    Example 3 $ ruby --dump=y -e '1 + 2' Starting

    parse Entering state 0 Reducing stack by rule 1 (line 1101): lex_state: EXPR_NONE -> EXPR_BEG at line 1102 vtable_alloc:11362: 0x00007fe8ddca0dd0 vtable_alloc:11363: 0x00007fe8ddc63f00 cmdarg_stack(push): 0 at line 11373 cond_stack(push): 0 at line 11374 -> $$ = nterm @1 (1.0-1.0: ) Stack now 0 Entering state 2 Reading a token: lex_state: EXPR_BEG -> EXPR_END at line 7711 lex_state: EXPR_END -> EXPR_END at line 7074 Next token is token "integer literal" (1.0-1.1: 1)
  13. 26.

    By the way • We can load a shared library

    file into PostgreSQL • PostgreSQL has a lot of hooks • post_parse_analyze_hook_type, planner_hook_type, ExecutorRun_hook_type, etc… • In PostgreSQL we can set run-time configuration parameters by SET
  14. 27.

    How to tackle the problem • Creating plugin which embeds

    mruby into PostgreSQL • https://github.com/yui-knk/psql_inspect_plugin
  15. 28.

    How to use it lusiadas=# load '/path/to/psql_inspect_plugin/psql_inspect.so'; lusiadas=# set session

    "psql_inspect.planner_script" = 'stmt = PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)'; SET lusiadas=# select code from films where code = 'a'; code ------ (0 rows) Load a shared library Set ruby script Execute SQL
  16. 29.

    How to use it • “puts” method print an argument

    to log file of PostgreSQL 2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where code = 'a'; psql_inspect_planner_hook with mruby! PlannedStmt: command_type: SELECT IndexOnlyScan: targetlist TargetEntry: resno: 1, resname: code, resjunk: false Var: varno: 65002, varattno: 1 qual
  17. 30.

    How it works • Define thin wrapper class on C

    (plugin/*.c) void psql_inspect_planned_stmt_class_init(mrb_state *mrb, struct RClass *class) { class_stmt = mrb_define_class_under(mrb, class, "PlannedStmt", psql_inspect_node_class); MRB_SET_INSTANCE_TT(class_stmt, MRB_TT_DATA); mrb_define_class_method(mrb, class_stmt, "current_stmt", psql_inspect_c_current_stmt, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "initialize", psql_inspect_planned_stmt_init, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "command_type", psql_inspect_planned_stmt_command_type, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "plan_tree", psql_inspect_planned_stmt_plan_tree, MRB_ARGS_NONE()); mrb_define_method(mrb, class_stmt, "rtable", psql_inspect_planned_stmt_rtable, MRB_ARGS_NONE()); }
  18. 31.

    How it works • Define methods on mruby (mrblib/*.rb) class

    PgInspect class Node def _pretty_print(pp) pp.add_line "Node: #{type}" end end class PlannedStmt < Node def _pretty_print(pp) pp.add_line "PlannedStmt: command_type: #{command_type}" pp.with_indent_inc do plan_tree._pretty_print(pp) end end end
  19. 32.

    How it works • Write plugin of PostgreSQL /* *

    Module load callback */ void _PG_init(void) { mrb_s = mrb_open(); psql_inspect_class_init(mrb_s); ... /* planner_hook */ prev_planner_hook = planner_hook; planner_hook = psql_inspect_planner_hook; Entrypoint Creates new mrb_state Set callback function to the hook
  20. 33.

    +-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query)

    | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) planner_hook
  21. 34.

    static PlannedStmt * psql_inspect_planner_hook(Query *parse, int cursorOptions, ParamListInfo boundParams) {

    const char *script; PlannedStmt *stmt = standard_planner(parse, cursorOptions, boundParams); script = psql_inspect_get_script(planner_script_guc_name); if (script == NULL) { elog(LOG, "You should set \"%s\"", planner_script_guc_name); return stmt; } psql_inspect_planned_stmt_mruby_env_setup(mrb_s, stmt); psql_inspect_mrb_load_string(mrb_s, script); psql_inspect_planned_stmt_mruby_env_tear_down(mrb_s); return stmt; } Set PgInspect::PlannedStmt.current_stmt Call the original function Get ruby script Execute ruby script
  22. 35.

    2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where

    code = 'a'; psql_inspect_planner_hook with mruby! PlannedStmt: command_type: SELECT IndexOnlyScan: targetlist TargetEntry: resno: 1, resname: code, resjunk: false Var: varno: 65002, varattno: 1 qual stmt = PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)
  23. 36.

    Write wrapper and wrapper… $ wc -l plugin/psql_inspect*c 293 plugin/psql_inspect.c

    77 plugin/psql_inspect_bitmapset.c 81 plugin/psql_inspect_dest.c 1048 plugin/psql_inspect_nodes.c 113 plugin/psql_inspect_parse_state.c 480 plugin/psql_inspect_parsenodes.c 93 plugin/psql_inspect_path.c 246 plugin/psql_inspect_path_key.c 322 plugin/psql_inspect_plan.c 115 plugin/psql_inspect_planned_stmt.c 162 plugin/psql_inspect_planner_info.c 482 plugin/psql_inspect_primnodes.c 320 plugin/psql_inspect_query.c 965 plugin/psql_inspect_query_desc.c 83 plugin/psql_inspect_rel_opt_info.c 4880 total
  24. 37.

    Write callback functions … +-----------------------+ | Query string (char *)

    | +-----------------------+ parse (pg_parse_query) | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun) planner_hook ExecutorRun_hook post_parse_analyze_hook
  25. 38.

    Why this approach is useful • Can easily try what

    you want • mruby dynamically evaluate scripts • Can use many functions/classes of mruby • Can implement the plugin step by step • Write codes, read codes, write codes, read codes… • Start from implementing just single wrapper class
  26. 40.

    Table example =# \d films Table "public.films" Column | Type

    | Collation | Nullable | Default -----------+-------------------------+-----------+----------+--------- code | character(5) | | not null | title | character varying(40) | | not null | did | integer | | not null | date_prod | date | | | kind | character varying(10) | | | len | interval hour to minute | | | Indexes: "firstkey" PRIMARY KEY, btree (code) =# select * from films; code | title | did | date_prod | kind | len -------+-------+-----+-----------+------+----- c1 | T1 | 101 | | | c2 | T2 | 102 | | | c3 | T3 | 103 | | | (3 rows)
  27. 41.

    +-----------------------+ | Query string (char *) | +-----------------------+ parse (pg_parse_query)

    | v +---------------------------------+ | Raw parse tree (struct RawStmt) | +---------------------------------+ analyze/rewrite (pg_analyze_and_rewrite) | v +---------------------------+ | Query Tree (struct Query) | +---------------------------+ optimize/generate tree of plan node (pg_plan_queries) | v +-------------------------+ | Plan tree (struct Plan) | +-------------------------+ setup query execution (ExecInitNode) | v +-----------------------------------+ | PlanState tree (struct PlanState) | +-----------------------------------+ execute query (ExecutorRun)
  28. 43.

    Parse select * from films where did = 1; ParseState:

    parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: catalogname: , schemaname: , relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1
  29. 44.

    Analyze/Rewrite • Generate Query Tree • Replace user input attributes

    to internal representations • Semantic analysis
  30. 45.

    Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION,

    relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Expanded select * from films where did = 1;
  31. 46.

    Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION,

    relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … ParseState: parsetree_list RawStmt: stmt SelectStmt: targetList ResTarget: name: , indirection: val ColumnRef: fields: * fromClause RangeVar: relname: films whereClause A_Expr: kind: OP, name: = lexpr ColumnRef: fields: did rexpr A_Const: val: 1 Internal representations where did = 1 select * from films where did = 1;
  32. 47.

    Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION,

    relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … # select oid, relname from pg_class where oid = 16389 order by oid asc; oid | relname -------+--------- 16389 | films (1 row) films
  33. 48.

    Analyze/Rewrite Query: SELECT(id: 0) rtable (# 1) RangeTblEntry: rtekind: RELATION,

    relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … where did = 1 /* catalog/pg_operator.dat */ { oid => ’96', oid_symbol => ‘Int4EqualOperator' oprname => ‘=', oprleft => ‘int4', oprright => ‘int4', oprresult => 'bool' }, /* catalog/pg_type.dat */ { oid => ’23', descr => '-2 billion to 2 billion integer, 4-byte storage', typname => ‘int4', typlen => '4' }, select * from films where did = 1;
  34. 49.

    Optimize/Generate tree of plan node • We would have many

    way to calculate result tuples • How to get tuples from disk • Read files in order (SeqScan) / Use index (IndexScan) / Use only index (IndexOnlyScan) • How to aggregate tuples • Use internal hash table / sort tuples before aggregation • How to join tables • Nested Loops Join / (Sort-)Merge Join / Hash Join
  35. 50.

    Optimize/Generate tree of plan node Query: SELECT(id: 0) rtable (#

    1) RangeTblEntry: relid: 16389 jointree FromExpr: quals OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1 target_list (# 6) TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 … PlannedStmt: command_type: SELECT SeqScan: targetlist TargetEntry: resno: 1, resname: code Var: varno: 1, varattno: 1 TargetEntry: resno: 2, resname: title Var: varno: 1, varattno: 2 TargetEntry: resno: 3, resname: did Var: varno: 1, varattno: 3 ... qual OpExpr: opno: 96, opfuncid: 65 Var: varno: 1, varattno: 3 Const: consttype: 23, constlen: 4, constvalue: 1
  36. 51.

    select * from films where did = 1; QueryDesc (SELECT):

    planstate SeqScanState: qual ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE ss_ScanTupleSlot TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true tts_tupleDescriptor TupleDesc: natts: 6 attrs PgAttribute: attname: code ... Byte codes