Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Dive into middleware with mruby

yui-knk
April 19, 2019

Dive into middleware with mruby

yui-knk

April 19, 2019
Tweet

More Decks by yui-knk

Other Decks in Programming

Transcript

  1. Dive into middleware with
    mruby
    Apr 19, 2019 in RubyKaigi 2019

    @yui-knk

    Yuichiro Kaneko

    View Slide

  2. Self Introduction
    • Arm Treasure Data

    • CDP team

    • Developing Rails application

    • GitHub (yui-knk)

    • A CRuby Committer 2015/12~

    • Introduced code positions to RNode

    • Introduced RubyVM::AbstractSyntaxTree

    View Slide

  3. PR: We are sponsor!

    View Slide

  4. Do you read source codes???

    View Slide

  5. Recently I read the source code
    of PostgreSQL

    View Slide

  6. Background
    • I want to know how RDBMS works to create subset of RDBMS

    • https://github.com/yui-knk/minidb

    • Use PostgreSQL as reference

    • PostgreSQL is written by C

    • https://speakerdeck.com/yui_knk/make-and-learn-rdbms

    • There are so many good documents which explains how PostgreSQL
    works

    View Slide

  7. Background
    • But I can not find the document like “Ruby Hacking Guide”

    • Should read source code of PostgreSQL

    • https://github.com/postgres/postgres

    • But code reading is sometimes hard, why?

    • RDBMS has many components

    • Today I only focus on the query engine

    View Slide

  8. BTW, do you know how
    PostgreSQL executes queries?

    View Slide

  9. Let me explain it in a slide!

    View Slide

  10. +-----------------------+
    | Query string (char *) |
    +-----------------------+
    parse (pg_parse_query)
    |
    v
    +---------------------------------+
    | Raw parse tree (struct RawStmt) |
    +---------------------------------+
    analyze/rewrite (pg_analyze_and_rewrite)
    |
    v
    +---------------------------+
    | Query Tree (struct Query) |
    +---------------------------+
    optimize/generate tree of plan node
    (pg_plan_queries)
    |
    v
    +-------------------------+
    | Plan tree (struct Plan) |
    +-------------------------+
    setup query execution (ExecInitNode)
    |
    v
    +-----------------------------------+
    | PlanState tree (struct PlanState) |
    +-----------------------------------+
    execute query (ExecutorRun)

    View Slide

  11. A lot of rewriting of trees

    View Slide

  12. Does it make sense?

    View Slide

  13. Are we ready to read codes?

    View Slide

  14. Why code reading is so hard (1)
    • A lot of Node types

    • Some nodes are like “abstract classes”
    Line. 508
    Line. 28

    View Slide

  15. • JoinExpr is a node

    • This has Node, List….
    Data structures are abstracted
    typedef struct JoinExpr
    {
    NodeTag type;
    JoinType jointype; /* type of join */
    bool isNatural; /* Natural join? Will need to shape table */
    Node *larg; /* left subtree */
    Node *rarg; /* right subtree */
    List *usingClause; /* USING clause, if any (list of String) */
    Node *quals; /* qualifiers on join, if any */
    Alias *alias; /* user-written alias clause, if any */
    int rtindex; /* RT index assigned for join, or 0 */
    } JoinExpr;

    View Slide

  16. • Has byte codes
    select * from films where did = 1;
    Why code reading is so hard (2)
    typedef struct PlanState
    {
    NodeTag type;
    ...
    /*
    * Common structural data for all Plan types. These links to subsidiary
    * state trees parallel links in the associated plan tree (except for the
    * subPlan list, which does not exist in the plan tree).
    */
    ExprState *qual; /* boolean qual condition */

    View Slide

  17. Why code reading is so hard (2)
    typedef struct ExprState
    {
    Node tag;
    ...
    /*
    * Instructions to compute expression's return value.
    */
    struct ExprEvalStep *steps;
    typedef struct ExprEvalStep
    {
    /*
    * Instruction to be executed. During instruction preparation this is an
    * enum ExprEvalOp, but later it can be changed to some other type, e.g. a
    * pointer for computed goto (that's why it's an intptr_t).
    */
    intptr_t opcode;

    View Slide

  18. Why code reading is so hard (2)
    /*
    * Discriminator for ExprEvalSteps.
    *
    */
    typedef enum ExprEvalOp
    {
    /* entire expression has been evaluated completely, return */
    EEOP_DONE,
    /* apply slot_getsomeattrs on corresponding tuple slot */
    EEOP_INNER_FETCHSOME,
    EEOP_OUTER_FETCHSOME,
    EEOP_SCAN_FETCHSOME,
    /* compute non-system Var value */
    EEOP_INNER_VAR,
    EEOP_OUTER_VAR,
    EEOP_SCAN_VAR,

    View Slide

  19. Sometimes the value is
    important

    View Slide

  20. But I can not read the value from
    source codes

    View Slide

  21. I needed tools to dump internal
    structures

    View Slide

  22. Example 1
    $ ruby --dump=p -e '1 + 2'
    # @ NODE_SCOPE (line: 1, location: (1,0)-(1,5))
    # +- nd_tbl: (empty)
    # +- nd_args:
    # | (null node)
    # +- nd_body:
    # @ NODE_OPCALL (line: 1, location: (1,0)-(1,5))*
    # +- nd_mid: :+
    # +- nd_recv:
    # | @ NODE_LIT (line: 1, location: (1,0)-(1,1))
    # | +- nd_lit: 1
    # +- nd_args:
    # @ NODE_ARRAY (line: 1, location: (1,4)-(1,5))
    # +- nd_alen: 1
    # +- nd_head:
    # | @ NODE_LIT (line: 1, location: (1,4)-(1,5))
    # | +- nd_lit: 2
    # +- nd_next:
    # (null node)

    View Slide

  23. Example 2
    $ ruby --dump=i -e '1 + 2'
    == disasm: #@-e:1 (1,0)-(1,5)> (catch: FALSE)
    0000 putobject_INT2FIX_1_ ( 1)[Li]
    0001 putobject 2
    0003 opt_plus ,
    0006 leave

    View Slide

  24. Example 3
    $ ruby --dump=y -e '1 + 2'
    Starting parse
    Entering state 0
    Reducing stack by rule 1 (line 1101):
    lex_state: EXPR_NONE -> EXPR_BEG at line 1102
    vtable_alloc:11362: 0x00007fe8ddca0dd0
    vtable_alloc:11363: 0x00007fe8ddc63f00
    cmdarg_stack(push): 0 at line 11373
    cond_stack(push): 0 at line 11374
    -> $$ = nterm @1 (1.0-1.0: )
    Stack now 0
    Entering state 2
    Reading a token:
    lex_state: EXPR_BEG -> EXPR_END at line 7711
    lex_state: EXPR_END -> EXPR_END at line 7074
    Next token is token "integer literal" (1.0-1.1: 1)

    View Slide

  25. A PROGRAMMERS’S BEST
    FRIEND

    View Slide

  26. By the way
    • We can load a shared library file into PostgreSQL

    • PostgreSQL has a lot of hooks

    • post_parse_analyze_hook_type, planner_hook_type,
    ExecutorRun_hook_type, etc…

    • In PostgreSQL we can set run-time configuration parameters by SET

    View Slide

  27. How to tackle the problem
    • Creating plugin which embeds mruby into PostgreSQL

    • https://github.com/yui-knk/psql_inspect_plugin

    View Slide

  28. How to use it
    lusiadas=# load '/path/to/psql_inspect_plugin/psql_inspect.so';
    lusiadas=# set session "psql_inspect.planner_script" = 'stmt =
    PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)';
    SET
    lusiadas=# select code from films where code = 'a';
    code
    ------
    (0 rows)
    Load a shared library
    Set ruby script
    Execute SQL

    View Slide

  29. How to use it
    • “puts” method print an argument to log file of PostgreSQL
    2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where
    code = 'a';
    psql_inspect_planner_hook with mruby!
    PlannedStmt: command_type: SELECT
    IndexOnlyScan:
    targetlist
    TargetEntry: resno: 1, resname: code, resjunk: false
    Var: varno: 65002, varattno: 1
    qual

    View Slide

  30. How it works
    • Define thin wrapper class on C (plugin/*.c)
    void
    psql_inspect_planned_stmt_class_init(mrb_state *mrb, struct RClass *class)
    {
    class_stmt = mrb_define_class_under(mrb, class, "PlannedStmt", psql_inspect_node_class);
    MRB_SET_INSTANCE_TT(class_stmt, MRB_TT_DATA);
    mrb_define_class_method(mrb, class_stmt, "current_stmt", psql_inspect_c_current_stmt,
    MRB_ARGS_NONE());
    mrb_define_method(mrb, class_stmt, "initialize", psql_inspect_planned_stmt_init,
    MRB_ARGS_NONE());
    mrb_define_method(mrb, class_stmt, "command_type", psql_inspect_planned_stmt_command_type,
    MRB_ARGS_NONE());
    mrb_define_method(mrb, class_stmt, "plan_tree", psql_inspect_planned_stmt_plan_tree,
    MRB_ARGS_NONE());
    mrb_define_method(mrb, class_stmt, "rtable", psql_inspect_planned_stmt_rtable,
    MRB_ARGS_NONE());
    }

    View Slide

  31. How it works
    • Define methods on mruby (mrblib/*.rb)
    class PgInspect
    class Node
    def _pretty_print(pp)
    pp.add_line "Node: #{type}"
    end
    end
    class PlannedStmt < Node
    def _pretty_print(pp)
    pp.add_line "PlannedStmt: command_type: #{command_type}"
    pp.with_indent_inc do
    plan_tree._pretty_print(pp)
    end
    end
    end

    View Slide

  32. How it works
    • Write plugin of PostgreSQL
    /*
    * Module load callback
    */
    void
    _PG_init(void)
    {
    mrb_s = mrb_open();
    psql_inspect_class_init(mrb_s);
    ...
    /* planner_hook */
    prev_planner_hook = planner_hook;
    planner_hook = psql_inspect_planner_hook;
    Entrypoint
    Creates new mrb_state
    Set callback function to the hook

    View Slide

  33. +-----------------------+
    | Query string (char *) |
    +-----------------------+
    parse (pg_parse_query)
    |
    v
    +---------------------------------+
    | Raw parse tree (struct RawStmt) |
    +---------------------------------+
    analyze/rewrite (pg_analyze_and_rewrite)
    |
    v
    +---------------------------+
    | Query Tree (struct Query) |
    +---------------------------+
    optimize/generate tree of plan node
    (pg_plan_queries)
    |
    v
    +-------------------------+
    | Plan tree (struct Plan) |
    +-------------------------+
    setup query execution (ExecInitNode)
    |
    v
    +-----------------------------------+
    | PlanState tree (struct PlanState) |
    +-----------------------------------+
    execute query (ExecutorRun)
    planner_hook

    View Slide

  34. static PlannedStmt *
    psql_inspect_planner_hook(Query *parse, int cursorOptions, ParamListInfo
    boundParams)
    {
    const char *script;
    PlannedStmt *stmt = standard_planner(parse, cursorOptions, boundParams);
    script = psql_inspect_get_script(planner_script_guc_name);
    if (script == NULL) {
    elog(LOG, "You should set \"%s\"", planner_script_guc_name);
    return stmt;
    }
    psql_inspect_planned_stmt_mruby_env_setup(mrb_s, stmt);
    psql_inspect_mrb_load_string(mrb_s, script);
    psql_inspect_planned_stmt_mruby_env_tear_down(mrb_s);
    return stmt;
    }
    Set PgInspect::PlannedStmt.current_stmt
    Call the original function
    Get ruby script
    Execute ruby script

    View Slide

  35. 2019-04-17 17:21:21.120 JST [68345] STATEMENT: select code from films where
    code = 'a';
    psql_inspect_planner_hook with mruby!
    PlannedStmt: command_type: SELECT
    IndexOnlyScan:
    targetlist
    TargetEntry: resno: 1, resname: code, resjunk: false
    Var: varno: 65002, varattno: 1
    qual
    stmt = PgInspect::PlannedStmt.current_stmt; puts PgInspect._pp(stmt)

    View Slide

  36. Write wrapper and wrapper…
    $ wc -l plugin/psql_inspect*c
    293 plugin/psql_inspect.c
    77 plugin/psql_inspect_bitmapset.c
    81 plugin/psql_inspect_dest.c
    1048 plugin/psql_inspect_nodes.c
    113 plugin/psql_inspect_parse_state.c
    480 plugin/psql_inspect_parsenodes.c
    93 plugin/psql_inspect_path.c
    246 plugin/psql_inspect_path_key.c
    322 plugin/psql_inspect_plan.c
    115 plugin/psql_inspect_planned_stmt.c
    162 plugin/psql_inspect_planner_info.c
    482 plugin/psql_inspect_primnodes.c
    320 plugin/psql_inspect_query.c
    965 plugin/psql_inspect_query_desc.c
    83 plugin/psql_inspect_rel_opt_info.c
    4880 total

    View Slide

  37. Write callback functions …
    +-----------------------+
    | Query string (char *) |
    +-----------------------+
    parse (pg_parse_query)
    |
    v
    +---------------------------------+
    | Raw parse tree (struct RawStmt) |
    +---------------------------------+
    analyze/rewrite (pg_analyze_and_rewrite)
    |
    v
    +---------------------------+
    | Query Tree (struct Query) |
    +---------------------------+
    optimize/generate tree of plan node
    (pg_plan_queries)
    |
    v
    +-------------------------+
    | Plan tree (struct Plan) |
    +-------------------------+
    setup query execution (ExecInitNode)
    |
    v
    +-----------------------------------+
    | PlanState tree (struct PlanState) |
    +-----------------------------------+
    execute query (ExecutorRun)
    planner_hook
    ExecutorRun_hook
    post_parse_analyze_hook

    View Slide

  38. Why this approach is useful
    • Can easily try what you want

    • mruby dynamically evaluate scripts

    • Can use many functions/classes of mruby

    • Can implement the plugin step by step

    • Write codes, read codes, write codes, read codes…

    • Start from implementing just single wrapper class

    View Slide

  39. Dive into PostgreSQL!!!

    View Slide

  40. Table example
    =# \d films
    Table "public.films"
    Column | Type | Collation | Nullable | Default
    -----------+-------------------------+-----------+----------+---------
    code | character(5) | | not null |
    title | character varying(40) | | not null |
    did | integer | | not null |
    date_prod | date | | |
    kind | character varying(10) | | |
    len | interval hour to minute | | |
    Indexes:
    "firstkey" PRIMARY KEY, btree (code)
    =# select * from films;
    code | title | did | date_prod | kind | len
    -------+-------+-----+-----------+------+-----
    c1 | T1 | 101 | | |
    c2 | T2 | 102 | | |
    c3 | T3 | 103 | | |
    (3 rows)

    View Slide

  41. +-----------------------+
    | Query string (char *) |
    +-----------------------+
    parse (pg_parse_query)
    |
    v
    +---------------------------------+
    | Raw parse tree (struct RawStmt) |
    +---------------------------------+
    analyze/rewrite (pg_analyze_and_rewrite)
    |
    v
    +---------------------------+
    | Query Tree (struct Query) |
    +---------------------------+
    optimize/generate tree of plan node
    (pg_plan_queries)
    |
    v
    +-------------------------+
    | Plan tree (struct Plan) |
    +-------------------------+
    setup query execution (ExecInitNode)
    |
    v
    +-----------------------------------+
    | PlanState tree (struct PlanState) |
    +-----------------------------------+
    execute query (ExecutorRun)

    View Slide

  42. Parse
    • Generate AST from input text

    • Each node represents user input

    View Slide

  43. Parse
    select * from films where did = 1;
    ParseState:
    parsetree_list
    RawStmt:
    stmt
    SelectStmt:
    targetList
    ResTarget: name: , indirection:
    val ColumnRef: fields: *
    fromClause
    RangeVar:
    catalogname: , schemaname: , relname: films
    whereClause
    A_Expr: kind: OP, name: =
    lexpr
    ColumnRef: fields: did
    rexpr
    A_Const: val: 1

    View Slide

  44. Analyze/Rewrite
    • Generate Query Tree

    • Replace user input attributes to internal representations

    • Semantic analysis

    View Slide

  45. Analyze/Rewrite
    Query: SELECT(id: 0)
    rtable (# 1)
    RangeTblEntry: rtekind: RELATION, relid: 16389
    jointree
    FromExpr:
    quals
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23, constlen: 4,
    constvalue: 1
    target_list (# 6)
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3

    ParseState:
    parsetree_list
    RawStmt:
    stmt
    SelectStmt:
    targetList
    ResTarget: name: , indirection:
    val ColumnRef: fields: *
    fromClause
    RangeVar:
    relname: films
    whereClause
    A_Expr: kind: OP, name: =
    lexpr
    ColumnRef: fields: did
    rexpr
    A_Const: val: 1
    Expanded
    select * from films where did = 1;

    View Slide

  46. Analyze/Rewrite
    Query: SELECT(id: 0)
    rtable (# 1)
    RangeTblEntry: rtekind: RELATION, relid: 16389
    jointree
    FromExpr:
    quals
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23, constlen: 4,
    constvalue: 1
    target_list (# 6)
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3

    ParseState:
    parsetree_list
    RawStmt:
    stmt
    SelectStmt:
    targetList
    ResTarget: name: , indirection:
    val ColumnRef: fields: *
    fromClause
    RangeVar:
    relname: films
    whereClause
    A_Expr: kind: OP, name: =
    lexpr
    ColumnRef: fields: did
    rexpr
    A_Const: val: 1
    Internal representations
    where did = 1
    select * from films where did = 1;

    View Slide

  47. Analyze/Rewrite
    Query: SELECT(id: 0)
    rtable (# 1)
    RangeTblEntry: rtekind: RELATION, relid: 16389
    jointree
    FromExpr:
    quals
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23, constlen: 4,
    constvalue: 1
    target_list (# 6)
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3

    # select oid, relname
    from pg_class
    where oid = 16389
    order by oid asc;
    oid | relname
    -------+---------
    16389 | films
    (1 row)
    films

    View Slide

  48. Analyze/Rewrite
    Query: SELECT(id: 0)
    rtable (# 1)
    RangeTblEntry: rtekind: RELATION, relid: 16389
    jointree
    FromExpr:
    quals
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23, constlen: 4,
    constvalue: 1
    target_list (# 6)
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3

    where did = 1
    /* catalog/pg_operator.dat */
    { oid => ’96',
    oid_symbol => ‘Int4EqualOperator'
    oprname => ‘=', oprleft => ‘int4',
    oprright => ‘int4',
    oprresult => 'bool' },
    /* catalog/pg_type.dat */
    { oid => ’23',
    descr => '-2 billion to 2 billion
    integer, 4-byte storage',
    typname => ‘int4',
    typlen => '4' },
    select * from films where did = 1;

    View Slide

  49. Optimize/Generate tree of plan node
    • We would have many way to calculate result tuples

    • How to get tuples from disk

    • Read files in order (SeqScan) / Use index (IndexScan) / Use only index
    (IndexOnlyScan)

    • How to aggregate tuples

    • Use internal hash table / sort tuples before aggregation

    • How to join tables

    • Nested Loops Join / (Sort-)Merge Join / Hash Join

    View Slide

  50. Optimize/Generate tree of plan node
    Query: SELECT(id: 0)
    rtable (# 1)
    RangeTblEntry: relid: 16389
    jointree
    FromExpr:
    quals
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23,
    constlen: 4,
    constvalue: 1
    target_list (# 6)
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3

    PlannedStmt: command_type: SELECT
    SeqScan:
    targetlist
    TargetEntry: resno: 1, resname: code
    Var: varno: 1, varattno: 1
    TargetEntry: resno: 2, resname: title
    Var: varno: 1, varattno: 2
    TargetEntry: resno: 3, resname: did
    Var: varno: 1, varattno: 3
    ...
    qual
    OpExpr: opno: 96, opfuncid: 65
    Var: varno: 1, varattno: 3
    Const: consttype: 23,
    constlen: 4,
    constvalue: 1

    View Slide

  51. select * from films where did = 1;
    QueryDesc (SELECT):
    planstate
    SeqScanState:
    qual
    ExprSate: SCAN_FETCHSOME, SCAN_VAR, FUNCEXPR_STRICT, QUAL, DONE
    ss_ScanTupleSlot
    TupleTableSlot: tts_isempty: true, tts_fixedTupleDescriptor: true
    tts_tupleDescriptor
    TupleDesc: natts: 6
    attrs
    PgAttribute: attname: code
    ...
    Byte codes

    View Slide

  52. Write codes to read codes!!!

    View Slide

  53. Thank you !!!

    View Slide