Virtual Machines

Language virtual machines were designed and implemented with different concepts.

In this talk, we will peek into the implementation of some language VMs (especially CRuby and ZendEngine) to understand the complex internal parts.

#rubyconftw #ruby #cruby

Yo-An Lin

July 28, 2019

Transcript

  1. In computing, a virtual machine (VM) is an emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardware, software, or a combination. (From Wikipedia, the free encyclopedia)
  2. __text: (machine code bytes on the left, assembly code on the right)

    100000f08:  55              pushq %rbp
    100000f09:  48 89 e5        movq %rsp, %rbp
    100000f0c:  48 83 c7 68     addq $104, %rdi
    100000f10:  48 83 c6 68     addq $104, %rsi
    100000f14:  5d              popq %rbp
    100000f15:  e9 62 36 00 00  jmp 13922 <radr://5614542+0xfa9f003a>
    100000f1a:  55              pushq %rbp
    100000f1b:  48 89 e5        movq %rsp, %rbp
    100000f1e:  48 8d 46 68     leaq 104(%rsi), %rax
    100000f22:  48 8d 77 68     leaq 104(%rdi), %rsi
    100000f26:  48 89 c7        movq %rax, %rdi
    100000f29:  5d              popq %rbp
    100000f2a:  e9 4d 36 00 00  jmp 13901 <radr://5614542+0xfa9f003a>
    100000f2f:  55              pushq %rbp
    100000f30:  48 89 e5        movq %rsp, %rbp
    100000f33:  4c 8b 46 60     movq 96(%rsi), %r8
    100000f37:  48 8b 57 60     movq 96(%rdi), %rdx
    100000f3b:  48 8b 4a 30     movq 48(%rdx), %rcx
    100000f3f:  b8 01 00 00 00  movl $1, %eax
    100000f44:  49 39 48 30     cmpq %rcx, 48(%r8)
    100000f48:  7f 1a           jg 26 <__mh_execute_header+0xf64>
    100000f4a:  7d 07           jge 7 <__mh_execute_header+0xf53>
    100000f4c:  b8 ff ff ff ff  movl $4294967295, %eax
    100000f51:  eb 11           jmp 17 <__mh_execute_header+0xf64>
    100000f53:  48 8b 4a 38     movq 56(%rdx), %rcx
    100000f57:  49 39 48 38     cmpq %rcx, 56(%r8)
    100000f5b:  7f 07           jg 7 <__mh_execute_header+0xf64>
    100000f5d:  b8 ff ff ff ff  movl $4294967295, %eax
    100000f62:  7d 02           jge 2 <__mh_execute_header+0xf66>
    100000f64:  5d              popq %rbp
    100000f65:  c3              retq
    100000f66:  48 83 c7 68     addq $104, %rdi
    100000f6a:  48 83 c6 68     addq $104, %rsi
    100000f6e:  5d              popq %rbp
    100000f6f:  e9 08 36 00 00  jmp 13832 <radr://5614542+0xfa9f003a>
  3. O-code, Martin Richards

    • 1960s: Martin Richards invented O-code, emitted by BCPL
    • O-code (Object code)
    • O-code virtual machine
    • BCPL (Basic Combined Programming Language)
    Martin Richards (born 21 July 1940) is a British computer scientist known for his development of the BCPL programming language.
  4. Lexer

    x > 100 ? 'foo' : 'bar'
    X > 0 ? 'foo' : 'bar'
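The lexer step on this slide can be sketched in Ruby with the standard `StringScanner` class. This is a minimal illustration only: the token names and the `TOKEN_RULES` table are assumptions made for this example, not the rules used by CRuby or the Zend Engine.

```ruby
require "strscan"

# Hypothetical token rules, just enough for the ternary expression on the slide.
TOKEN_RULES = [
  [:ident,  /[A-Za-z_]\w*/],
  [:number, /\d+/],
  [:string, /'[^']*'/],
  [:op,     /[>?:]/]
].freeze

# Splits the source text into [type, text] pairs, skipping whitespace.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)
    rule = TOKEN_RULES.find { |_, pattern| scanner.match?(pattern) }
    raise ArgumentError, "unexpected character at #{scanner.pos}" unless rule
    type, pattern = rule
    tokens << [type, scanner.scan(pattern)]
  end
  tokens
end

p tokenize("x > 100 ? 'foo' : 'bar'")
```

The parser in the following slides would consume this flat token list instead of raw characters.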
  5. expression => expression + term | expression - term | term
     term => term * factor | term / factor | factor
     factor => '(' expression ')' | NUMBER
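A hand-written recursive descent evaluator for this grammar might look like the sketch below. Because the rules are left-recursive, which a naive recursive descent parser cannot follow directly, the sketch rewrites them as loops (`term (('+' | '-') term)*`). The `ExprParser` class and its method names are hypothetical, and the simple tokenizer silently ignores characters it does not recognize.

```ruby
# Sketch of a recursive descent evaluator for the expression grammar.
class ExprParser
  def initialize(source)
    @tokens = source.scan(%r{\d+|[-+*/()]})
    @pos = 0
  end

  def parse
    value = expression
    raise ArgumentError, "unexpected token #{peek.inspect}" if peek
    value
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  # expression => term (('+' | '-') term)*
  def expression
    value = term
    while ["+", "-"].include?(peek)
      value = advance == "+" ? value + term : value - term
    end
    value
  end

  # term => factor (('*' | '/') factor)*
  def term
    value = factor
    while ["*", "/"].include?(peek)
      value = advance == "*" ? value * factor : value / factor
    end
    value
  end

  # factor => '(' expression ')' | NUMBER
  def factor
    token = advance
    if token == "("
      value = expression
      raise ArgumentError, "expected ')'" unless advance == ")"
      value
    elsif token&.match?(/\A\d+\z/)
      token.to_i
    else
      raise ArgumentError, "unexpected token #{token.inspect}"
    end
  end
end

puts ExprParser.new("1 + 2 * 3").parse
```

Each grammar rule maps to one method, which is why hand-written parsers of this kind stay close to the grammar text.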
  6. Backtracking Parser

    Grammar: E → X Y; X → a b | a; Y → b. Input: "a b".
    The backtracking parsers, like classical parsers and functional parsers, use a recursive descent algorithm. But: If a stream pattern component does not match the current position of the input stream, the control is given to the next case of the stream pattern component before it. If it is the first stream pattern component, the rule (the stream pattern) is left and the next rule is tested. For example, the following grammar: E -> X Y X -> a b | a Y -> b works, with the backtracking algorithm, for the input "a b". Parsing with the non-terminal "E", the non-terminal "X" first accepts the input "a b" with its first rule. Then when "Y" is called, the parsing fails since nothing …
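The backtracking behavior described on this slide can be sketched in Ruby as follows. The `GRAMMAR` table encodes the slide's rules; `match_sequence` and `accepts?` are hypothetical helper names, and the approach (yielding every position where a sequence can end, so that a later failure falls back to the next alternative) is one simple way to get backtracking, not necessarily how any particular parser library implements it. It does not handle left-recursive grammars.

```ruby
# The slide's grammar: E -> X Y, X -> a b | a, Y -> b.
GRAMMAR = {
  "E" => [%w[X Y]],
  "X" => [%w[a b], %w[a]],
  "Y" => [%w[b]]
}.freeze

# Yields every input position at which `symbols` can finish when started at `pos`.
# Returning from the block without success makes the caller try its next alternative.
def match_sequence(symbols, tokens, pos, &block)
  return yield(pos) if symbols.empty?

  head, *rest = symbols
  if GRAMMAR.key?(head)
    GRAMMAR[head].each do |alternative|
      match_sequence(alternative, tokens, pos) do |after_head|
        match_sequence(rest, tokens, after_head, &block)
      end
    end
  elsif tokens[pos] == head
    match_sequence(rest, tokens, pos + 1, &block)
  end
end

# True when the whole token list can be derived from the start symbol.
def accepts?(start, tokens)
  match_sequence([start], tokens, 0) { |pos| return true if pos == tokens.length }
  false
end

puts accepts?("E", %w[a b])
```

For the input "a b", X first consumes both tokens, Y then fails, and the parser falls back to the second rule of X so that Y can accept "b".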
  7. LL Parser

    Grammar (as far as recoverable): S → E; E → ( E + E ); E → …
    [Figure: leftmost derivation starting from S ⇒ E for the input; the terminal symbols are not recoverable.]
    Generally, there are multiple possibilities when selecting a rule to expand a given (leftmost) non-terminal. In the previous example of the leftmost derivation, in step 2 the parser must choose whether to apply rule 2 or rule 3. To be efficient, the parser must be able to make this choice deterministically when possible, without backtracking. For some grammars, it can do this by peeking on the unread input (without reading). In our example, if the parser …
  8. A = B + C * 2

    assign → var = expr
    expr → term + expr
    expr → term
    term → term ∗ factor
    term → factor
    factor → ( expr )
    factor → const
    factor → var
    const → integer
  9. A = B + C * 2 (same grammar). Parse tree nodes so far: var
  10. A = B + C * 2 (same grammar). Parse tree nodes so far: var, var, term
  11. A = B + C * 2 (same grammar). Parse tree nodes so far: var, var, term, var, factor, term
  12. A = B + C * 2 (same grammar). Parse tree nodes so far: var, var, term, var, factor, const, factor, term
  13. A = B + C * 2 (same grammar). Parse tree nodes so far: var, var, term, var, factor, expr, term, const, factor, term
  14. A = B + C * 2 (same grammar). Parse tree nodes so far: var, var, term, var, factor, expr, term, const, factor, term, expr
  15. A = B + C * 2 (same grammar). Completed parse tree nodes: var, assign, var, term, var, factor, expr, term, const, factor, term, expr
  16. LL vs LR

    • LL parsers are easier to understand and implement. You can hand-write recursive descent parsers with code that closely matches the grammar.
    • LR parsers are strictly more powerful than LL parsers, and in addition, LALR parsers can run in O(n) like LL parsers. So you won't find any functional advantages of LL over LR.
    • Thus, the only advantage of LL is that LR state machines are quite a bit more complex and difficult to understand, and LR parsers themselves are not especially intuitive. On the other hand, LL parser code which is automatically generated can be very easy to understand and debug.
  17. Context-free grammar defined in Bison syntax (parser.y)

    expression : expression '+' term   { $$ = $1 + $3; }
               | expression '-' term   { $$ = $1 - $3; }
               | term                  { $$ = $1; }
               ;
    term       : term '*' factor       { $$ = $1 * $3; }
               | term '/' factor       { $$ = $1 / $3; }
               | factor                { $$ = $1; }
               ;
    factor     : '(' expression ')'    { $$ = $2; }
               | NUMBER                { $$ = $1; }
               | ID                    { $$ = valueof($1); }
  18. Java Bytecode

    iconst_2
    istore_1
    iload_1
    sipush 1000
    if_icmpge 44
    iconst_2
    istore_2
    iload_2
    iload_1
    if_icmpge 31
    iload_1
    iload_2
    irem
    ifne 25
    goto 38
    iinc 2, 1
  19. Ruby YARV

    Stack (SP at the top): 2, self
    Instructions (PC marks the current one):
        putself
        putobject 2
        putobject 2
        opt_plus
        opt_send_simple <!..puts
        leave
  20. Ruby YARV (same instructions). Stack (SP at the top): 2, 2, self
  21. Ruby YARV (same instructions). Stack (SP at the top): 4 (return value), self
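The stack states on these three slides can be reproduced with a toy stack machine. The instruction names follow the slides, but the semantics below are simplified assumptions for illustration; `run_stack_vm` and `PROGRAM` are hypothetical names and this is not how YARV itself is implemented.

```ruby
# Toy stack machine: every instruction pushes to or pops from one operand stack.
def run_stack_vm(instructions, receiver)
  stack = []
  output = []
  instructions.each do |name, operand|
    case name
    when :putself   then stack.push(receiver)
    when :putobject then stack.push(operand)
    when :opt_plus
      right = stack.pop
      left = stack.pop
      stack.push(left + right)
    when :opt_send_simple
      argument = stack.pop
      stack.pop                 # discard the receiver
      output << argument.to_s   # stands in for calling `puts`
      stack.push(nil)           # `puts` returns nil
    when :leave then break
    end
  end
  [stack, output]
end

PROGRAM = [
  [:putself],
  [:putobject, 2],
  [:putobject, 2],
  [:opt_plus],
  [:opt_send_simple, :puts],
  [:leave]
].freeze

p run_stack_vm(PROGRAM, :main)
```

Note that no instruction names its operands: `opt_plus` simply takes whatever the two topmost stack entries are, which is the defining property of a stack-based bytecode.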
  22. Register-based Bytecode

    The PC (program counter) steps through a sequence of instructions, each laid out as: Operator, Operand, Operand, Result (seven instructions shown).
  23. Zend Engine

    Registers: R0: (empty), R1: (empty), TMP_VAR ~4: (empty)
    Opcodes (PC marks the current one):
        ASSIGN !0, 1
        ASSIGN !1, 2
        ADD ~4 !0, !1
        ECHO ~4
        RETURN
  24. Zend Engine (same opcodes). R0: 1, R1: (empty), TMP_VAR ~4: (empty)
  25. Zend Engine (same opcodes). R0: 1, R1: 2, TMP_VAR ~4: (empty)
  26. Zend Engine (same opcodes). R0: 1, R1: 2, TMP_VAR ~4: 3
  27. Zend Engine (same opcodes). R0: 1, R1: 2, TMP_VAR ~4: 3; PC at the end of the opcode list
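The register states on these slides can be reproduced with a toy register machine. Opcode names and operand spellings (`!0`, `!1`, `~4`) follow the slides; the interpreter itself (`run_register_vm`, the two hashes standing in for register files) is a hypothetical sketch and not the Zend Engine executor.

```ruby
# Toy register machine: every opcode names its operands and its result slot.
def run_register_vm(ops)
  compiled_vars = {}   # the slides' !0 and !1 slots
  temporaries = {}     # the slides' ~4 slot
  output = []
  read = lambda do |operand|
    case operand
    when /\A!\d+\z/ then compiled_vars.fetch(operand)
    when /\A~\d+\z/ then temporaries.fetch(operand)
    else operand        # a literal constant
    end
  end

  pc = 0
  loop do
    opcode, a, b, c = ops.fetch(pc)
    pc += 1
    case opcode
    when :ASSIGN then compiled_vars[a] = read.call(b)
    when :ADD    then temporaries[a] = read.call(b) + read.call(c)
    when :ECHO   then output << read.call(a).to_s
    when :RETURN then break
    end
  end
  [compiled_vars, temporaries, output]
end

OPS = [
  [:ASSIGN, "!0", 1],
  [:ASSIGN, "!1", 2],
  [:ADD, "~4", "!0", "!1"],
  [:ECHO, "~4"],
  [:RETURN]
].freeze

p run_register_vm(OPS)
```

Compared with a stack machine, fewer instructions are needed for the same work, at the cost of wider instructions that carry operand and result slots.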
  28. iterator.c (label: iseq method)

    def iterator
      yield 'yield, '
      yield 'blocks,'
      yield 'Ruby'
    end
    iterator {|yielded| print "use #{yielded}"}
  29. iterator.c (same code; label: iseq top)
  30. iseq of an rb_block_t

    0002 trace 1
    0004 putself
    0005 putstring "hello, world"
    0007 send :puts, 1, nil, 8, <ic:0>
    0013 trace 16
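A listing like the one on this slide can be produced from Ruby itself with `RubyVM::InstructionSequence`, which is specific to CRuby. The exact instruction names, offsets, and operands vary between Ruby versions, so the output will not match the slide line for line.

```ruby
# RubyVM is CRuby-only; the listing format depends on the Ruby version in use.
listing = RubyVM::InstructionSequence.compile('puts "hello, world"').disasm
puts listing
```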
  31. iseq

    An iseq_link_anchor (anchor, *last) heads a doubly linked list of elements: iseq_trace_data, iseq_insn_data, iseq_insn_data. Each element embeds an iseq_link_element (*prev, *next) followed by its custom fields.
  32. CRuby

    iseq_compile_each0(rb_iseq_t *iseq, LINK_ANCHOR *const ret, const NODE *node, int popped)
        switch (type) {
          case NODE_BLOCK:{
            while (node && nd_type(node) == NODE_BLOCK) {
                CHECK(COMPILE_(ret, "BLOCK body", node->nd_head,
                               (node->nd_next ? 1 : popped)));
                node = node->nd_next;
            }
            if (node) {
                CHECK(COMPILE_(ret, "BLOCK next", node->nd_next, popped));
            }
            break;
          }
          case NODE_IF:
          case NODE_UNLESS:
            CHECK(compile_if(iseq, ret, node, popped, type));
            break;
          case NODE_CASE:
            CHECK(compile_case(iseq, ret, node, popped));
            break;
          case NODE_CASE2:
            CHECK(compile_case2(iseq, ret, node, popped));
            break;
  33. CRuby iseq

    /* T_IMEMO/iseq */
    /* typedef rb_iseq_t is in method.h */
    struct rb_iseq_struct {
        VALUE flags;   /* 1 */
        VALUE wrapper; /* 2 */
        struct rb_iseq_constant_body *body; /* 3 */
        union { /* 4, 5 words */
            struct iseq_compile_data *compile_data; /* used at compile time */
            struct {
                VALUE obj;
                int index;
            } loader;
            struct {
                struct rb_hook_list_struct *local_hooks;
                rb_event_flag_t global_trace_events;
            } exec;
        } aux;
    };
  34. CRuby Bytecode

    • iseq (instruction sequence) is organized by scopes (NODE_SCOPE), e.g. top, method, class, block
    • iseq is a doubly linked list
    • The rb_iseq_compile_node function iterates over the AST nodes and generates the related rb_iseq_t structure
    • iseqs are Ruby objects; they can be garbage collected
  35. zend_op_array

    A zend_op_array holds zend_op *opcodes, zval *literals, and zend_string **vars.
    Each zend_op has: const void* handler, zend_uchar opcode, znode_op op1, znode_op op2, znode_op result.
    Example literals shown: zval 223, zval abc.
  36. zend_compile_var

    zend_op *zend_compile_var(znode *result, zend_ast *ast, uint32_t type, int by_ref) /* {{{ */
    {
        CG(zend_lineno) = zend_ast_get_lineno(ast);
        switch (ast->kind) {
            case ZEND_AST_VAR:
                return zend_compile_simple_var(result, ast, type, 0);
            case ZEND_AST_DIM:
                return zend_compile_dim(result, ast, type);
            case ZEND_AST_PROP:
                return zend_compile_prop(result, ast, type, by_ref);
            case ZEND_AST_STATIC_PROP:
                return zend_compile_static_prop(result, ast, type, by_ref, 0);
            case ZEND_AST_CALL:
                zend_compile_call(result, ast, type);
                return NULL;
            case ZEND_AST_METHOD_CALL:
                zend_compile_method_call(result, ast, type);
                return NULL;
            case ZEND_AST_STATIC_CALL:
                zend_compile_static_call(result, ast, type);
                return NULL;
            case ZEND_AST_ZNODE:
                *result = *zend_ast_get_znode(ast);
  37. zend_op

    struct _zend_op {
        const void *handler;
        znode_op op1;
        znode_op op2;
        znode_op result;
        uint32_t extended_value;
        uint32_t lineno;
        zend_uchar opcode;
        zend_uchar op1_type;
        zend_uchar op2_type;
        zend_uchar result_type;
    };
  38. RVALUE (accessed as RVALUE.as.basic, RVALUE.as.object, RVALUE.as.klass)

    typedef struct RVALUE {
        union {
            struct {
                VALUE flags; /* always 0 for freed obj */
                struct RVALUE *next;
            } free;
            struct RBasic basic;
            struct RObject object;
            struct RClass klass;
            struct RFloat flonum;
            struct RString string;
            struct RArray array;
            struct RRegexp regexp;
            struct RHash hash;
            struct RData data;
            struct RTypedData typeddata;
            struct RStruct rstruct;
            struct RBignum bignum;
            struct RFile file;
            struct RMatch match;
            struct RRational rational;
            struct RComplex complex;
            union { ... } imemo;
            struct {
                struct RBasic basic;
                VALUE v1;
                VALUE v2;
                VALUE v3;
            } values;
        } as;
    } RVALUE;
  39. typedef unsigned long VALUE;

    32-bit VALUE space
             MSB ------------------------ LSB
    false    00000000000000000000000000000000
    true     00000000000000000000000000000010
    nil      00000000000000000000000000000100
    undef    00000000000000000000000000000110
    symbol   ssssssssssssssssssssssss00001110
    object   oooooooooooooooooooooooooooooo00
    fixnum   fffffffffffffffffffffffffffffff1
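The tagging scheme in this table can be mimicked with plain integers in Ruby. The bit patterns are taken from the slide's 32-bit table; the helper names (`fixnum_to_value`, `value_to_fixnum`, `classify`) are hypothetical, and current CRuby builds use a different 64-bit layout, so this is a sketch of the idea only.

```ruby
# Special constants, using the bit patterns from the slide's 32-bit table.
QFALSE = 0b000
QTRUE  = 0b010
QNIL   = 0b100
QUNDEF = 0b110

# A fixnum is stored shifted left by one bit with the lowest bit set to 1.
def fixnum_to_value(number)
  (number << 1) | 1
end

def value_to_fixnum(value)
  value >> 1
end

# Decides what a VALUE holds by looking only at its low bits.
def classify(value)
  return :fixnum if (value & 0b1) == 1
  return :symbol if (value & 0xff) == 0b00001110

  case value
  when QFALSE then :false
  when QTRUE  then :true
  when QNIL   then :nil
  when QUNDEF then :undef
  else (value & 0b11).zero? ? :object : :unknown
  end
end

value = fixnum_to_value(21)
puts value.to_s(2)
puts classify(value)
```

Because heap objects are aligned, a real pointer always ends in 00, which is what leaves the other low-bit patterns free for immediates such as fixnums.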
  40. The TYPE(obj) macro returns the type of a VALUE

    static inline int rb_type(VALUE obj) {
        if (RB_IMMEDIATE_P(obj)) {
            if (RB_FIXNUM_P(obj)) return RUBY_T_FIXNUM;
            if (RB_FLONUM_P(obj)) return RUBY_T_FLOAT;
            if (obj == RUBY_Qtrue) return RUBY_T_TRUE;
            if (RB_STATIC_SYM_P(obj)) return RUBY_T_SYMBOL;
            if (obj == RUBY_Qundef) return RUBY_T_UNDEF;
        } else if (!RB_TEST(obj)) {
            if (obj == RUBY_Qnil) return RUBY_T_NIL;
            if (obj == RUBY_Qfalse) return RUBY_T_FALSE;
        }
        return RB_BUILTIN_TYPE(obj);
    }
  41. struct RObject (shares the same common struct from the memory)

    struct RObject {
        struct RBasic basic;
        union {
            struct {
                uint32_t numiv;
                VALUE *ivptr;
                void *iv_index_tbl;
            } heap;
            VALUE ary[ROBJECT_EMBED_LEN_MAX];
        } as;
    };
  42. struct RString

    struct RString {
        struct RBasic basic;
        union {
            struct {
                long len;
                char *ptr;
                union {
                    long capa;
                    VALUE shared;
                } aux;
            } heap;
            char ary[RSTRING_EMBED_LEN_MAX + 1];
        } as;
    };
  43. struct RArray

    struct RArray {
        struct RBasic basic;
        union {
            struct {
                long len;
                union {
                    long capa;
                    VALUE shared;
                } aux;
                const VALUE *ptr;
            } heap;
            const VALUE ary[RARRAY_EMBED_LEN_MAX];
        } as;
    };
  44. zend_value

    typedef union _zend_value {
        zend_long lval; /* long value */
        double dval;    /* double value */
        zend_refcounted *counted;
        zend_string *str;
        zend_array *arr;
        zend_object *obj;
        zend_resource *res;
        zend_reference *ref;
        zend_ast_ref *ast;
        zval *zv;
        void *ptr;
        zend_class_entry *ce;
        zend_function *func;
        struct {
            uint32_t w1;
            uint32_t w2;
        } ww;
    } zend_value;
  45. struct _zend_string (shares the same common struct from the memory)

    struct _zend_string {
        zend_refcounted_h gc;
        zend_ulong h; /* hash value */
        size_t len;
        char val[1];
    };
  46. struct _zend_array

    struct _zend_array {
        zend_refcounted_h gc;
        union {
            struct {
                ZEND_ENDIAN_LOHI_4(
                    zend_uchar flags,
                    zend_uchar _unused,
                    zend_uchar nIteratorsCount,
                    zend_uchar _unused2)
            } v;
            uint32_t flags;
        } u;
        uint32_t nTableMask;
        Bucket *arData;
        uint32_t nNumUsed;
        uint32_t nNumOfElements;
        uint32_t nTableSize;
        uint32_t nInternalPointer;
        zend_long nNextFreeElement;
        dtor_func_t pDestructor;
    };
  47. Bytecode Dispatch Loop

    zend_op_array: a sequence of zend_op {opcode, op1, op2, result} entries.
    zend_user_opcode_handlers: ZEND_NOP_SPEC, ZEND_ADD_SPEC_CONST_CONST, ZEND_SUB_SPEC_CONST_CONST, ZEND_MUL_SPEC_CONST_CONST, ZEND_DIV_SPEC_CONST_CONST, …
    Executor loop: iterate over the zend_op entries and look up the handler for each.
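The lookup-and-iterate structure of this diagram can be sketched in Ruby with a handler table. The handler names are copied from the slide; the `Op` struct, the `OPCODE_HANDLERS` hash, and the `execute` method are hypothetical stand-ins, and the real Zend Engine executor is written in C with a different dispatch mechanism.

```ruby
# Handler table keyed by the specialized handler names shown on the slide.
OPCODE_HANDLERS = {
  ZEND_NOP_SPEC: ->(_op1, _op2) { nil },
  ZEND_ADD_SPEC_CONST_CONST: ->(op1, op2) { op1 + op2 },
  ZEND_SUB_SPEC_CONST_CONST: ->(op1, op2) { op1 - op2 },
  ZEND_MUL_SPEC_CONST_CONST: ->(op1, op2) { op1 * op2 },
  ZEND_DIV_SPEC_CONST_CONST: ->(op1, op2) { op1 / op2 }
}.freeze

# Simplified stand-in for zend_op: a handler name plus two constant operands.
Op = Struct.new(:handler_name, :op1, :op2)

# Executor loop: iterate over the ops, look up each handler, and call it.
def execute(op_array)
  op_array.map do |op|
    handler = OPCODE_HANDLERS.fetch(op.handler_name)
    handler.call(op.op1, op.op2)
  end
end

p execute([
  Op.new(:ZEND_ADD_SPEC_CONST_CONST, 1, 2),
  Op.new(:ZEND_MUL_SPEC_CONST_CONST, 3, 4),
  Op.new(:ZEND_NOP_SPEC)
])
```

Keeping one handler per opcode and operand-type combination lets each handler skip runtime type checks on where its operands live.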
  48. x86 Memory Model (from high address to low address)

    Stack: command-line arguments, function calls
    Heap: dynamic memory allocation (dynamic variable)
    Uninitialized data: declared variables (uninitialized)
    Initialized data: initialized variables (predefined data)
    Text: program code
  49. struct _zend_execute_data

    struct _zend_execute_data {
        const zend_op *opline;  /* executed opline */
        zend_execute_data *call; /* current call */
        zval *return_value;
        zend_function *func;     /* executed function */
        zval This;               /* this + call_info + num_args */
        zend_execute_data *prev_execute_data;
        zend_array *symbol_table;
    };
  50. zend_execute_data

    [Figure: call frame layout. Execution information (zend_function, this, symbol table, return value, …; 72 bytes), followed by 16-byte zval slots for function args, compiled vars, and extra args (optional). Other labels: num args; "Grow lower address"; 0x10000000; zend_execute_data; vars.]
    Function/method calls are simulated by the defined C struct.
  51. Stack and call frame

    [Figure: a zend_arena page; a zend_vm_stack (top, end, prev) linked to a previous zend_vm_stack (top, end, prev); a zend_execute_data (symbol table, current call, executed zend function, return value, this, runtime cache, prev_execute_data); vars (zval array) with slots 0 and 1.]
  52. rb_control_frame_t

    typedef struct rb_control_frame_struct {
        const VALUE *pc;        /* program counter */
        VALUE *sp;              /* stack pointer */
        const rb_iseq_t *iseq;  /* instruction sequence */
        VALUE self;             /* self */
        const VALUE *ep;        /* environment pointer */
        const void *block_code; /* iseq or ifunc */
        const VALUE *bp;        /* cfp[6] */
    } rb_control_frame_t;
  53. rb_control_frame

    • There are two kinds of stack in the CRuby VM: the call stack and the operand stack (internal stack).
    • Since Ruby is a stack-based VM, each control frame has one operand stack (internal stack).
    • The environment pointer is used to reference other control frames, e.g. a closure can reference the local variables from the parent scope.
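The closure case from the last bullet looks like this at the Ruby level. The sketch only shows the observable behavior (a lambda still reaching a local variable of the method that created it); how CRuby keeps that environment alive through `ep` is not shown here, and `make_counter` is a hypothetical example method.

```ruby
def make_counter
  count = 0            # local variable that belongs to make_counter's frame
  -> { count += 1 }    # the lambda keeps a reference to that environment
end

counter = make_counter
counter.call
counter.call
puts counter.call
```

Each call to `make_counter` creates a fresh environment, so two counters never share the same `count`.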
  54. Binary Formats

    • ELF/PE are formats used by the OS program loader. The format contains the native code, i.e. the native instructions (e.g. x86 instructions or ARM instructions).
    • Bytecode binary formats are used for virtual machines, e.g. the JVM uses .class, the Dalvik VM uses dx (DEX format), Python uses .pyc (PEP …), HHVM uses HHBC.