Modular Virtual Machine Architecture on a Meta-Circular JVM

Wei Zhang

December 07, 2011

Transcript

  1. Wei Zhang. Department: EECS. Program: Computer Systems & Software.

    Committee: Professor Michael Franz (Chair), Professor Pai Chou, Professor Rainer Doemer, Professor Kwei-Jay Lin, Professor Harry Xu
  2. Motivation: duplicated modules between VMs

    (figure: Dalvik, V8, and the ActionScript VM each contain their own versions of modules such as the parser, interpreter, GC, and JIT)
  3. Motivation

    Objectives:
    + Smaller footprint for multiple language implementations
    + Simpler implementation of new VMs

    (figure: JavaScript, Python, and Ruby running on a shared host VM)
  4. Interpreter

    + Easy to implement and maintain
    + Fast edit-compile-run cycle
    + More portable
    + More memory efficient
  5. Interpreter: performance slowdown[1]

    Inefficient interpreter: 1000X; efficient interpreter: 10X; optimizing compiler: 1X (baseline)

    [1]: M. Anton Ertl and D. Gregg, The Structure and Performance of Efficient Interpreters, Journal of Instruction-Level Parallelism, 2003
  6. Cost of interpretation[1]

    Three costs: instruction dispatch, operand access, and performing the computation.

    [1]: D. Gregg et al., The Case for Virtual Register Machines, IVME 2003
  7. Instruction dispatch: switch-based dispatch

    A dispatching loop reads the instruction stream and branches to the instruction implementations: for (;;) switch (program[ip++]) { /*...*/ case add: sp[1] = sp[0] + sp[1]; sp++; break; /*...*/ }
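  8. Instruction dispatch: direct threading[1]

    The instruction stream becomes threaded code, an array of instruction-implementation addresses, and each implementation jumps straight to the next one (written here with GCC's labels-as-values extension): void *thread[] = {&&add, &&pop, /*...*/}; void **ip = thread; goto **ip++; add: sp[1] = sp[0] + sp[1]; sp++; goto **ip++; /*...*/

    [1]: James R. Bell, Threaded Code, CACM, 1973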
  9. Instruction dispatch: direct vs. indirect threading

    Direct threading: thread: &get &a &get &b &add

    Indirect threading adds an indirection layer between the thread and the operation routines. Thread: &i_get_a &i_get_b &i_add. Indirection: i_get_a: &get &a; i_get_b: &get &b; i_add: &add. Operation routines: get: sp[0]=*(*ip+1); sp++; goto *(*ip++); add: sp[0]=sp[1]+sp[0]; goto *(*ip++);
  10. Instruction dispatch: subroutine threading

    The threaded code is a sequence of native calls to the instruction implementations: thread: call get_a; call get_b; call add
  11. Operand access: performing a = 1+2*4 on a stack machine

    | Bytecode | Stack after | Stack ops |
    |---|---|---|
    | 1: push 1 | 1 | 1 push |
    | 2: push 2 | 1 2 | 1 push |
    | 3: push 4 | 1 2 4 | 1 push |
    | 4: mul | 1 8 | 2 pop, 1 push |
    | 5: add | 9 | 2 pop, 1 push |
    | 6: set a | (empty) | 1 pop |

    Total: 10 stack ops
  12. Operand access with stack caching[1]: keep the top of the stack in registers

    | Bytecode | Stack ops |
    |---|---|
    | 1: push 1 | 0 |
    | 2: push 2 | 0 |
    | 3: push 4 | 1 push |
    | 4: mul | 1 pop |
    | 5: add | 0 |
    | 6: set a | 0 |

    Total: 2 stack ops

    [1]: M. Anton Ertl, Stack Caching for Interpreters, PLDI 1995
  13. Operand access: stack-based vs. register-based architecture

    Stack-based: 1: get a; 2: get b; 3: add; 4: set c. Register-based: 1: add c a b.

    With a register-based architecture: -35% native instruction count[1], +45% code size[1]

    [1]: Yunhe Shi et al., Virtual Machine Showdown: Stack versus Registers, TACO 2008
  14. Runtime overhead of dynamic typing

    Possible meanings of a + b: 1 + 2 -> 3; 'a' + 'b' -> 'ab'; 1 + 'a' -> '1a'; '1' + 2 -> '12'; ...
  15. Runtime overhead of dynamic typing: implementation of add

    if (a is Num && b is Num) { return a + b; } else if (a is String && b is String) { return a.concat(b); } else if (a is Num && b is String) { return a.toString().concat(b); } else if (a is String && b is Num) { return a.concat(b.toString()); } ...
  16. Performing the computation: quickening[1]

    The instruction stream starts with a generic instruction (get a; get b; add). Quickening rewrites it into a specialized instruction (get a; get b; nadd): the generic add becomes a number add protected by a guard, which falls back to the generic version when the guard fails.

    [1]: S. Brunthaler, Inline Caching Meets Quickening, ECOOP 2010
  17. Targeting a host VM

    • A VM offers benefits to the applications it hosts
    • Can we use those benefits when building our own VM?
  18. Targeting a host VM (JVM/CLR)

    + Cross-platform
    + Automatic memory management
    + Libraries
    + Better IDE support
  19. Targeting JVM, option #1: compile to Java bytecode (x.js -> x.class -> JVM)

    - Same complexity as writing a compiler
    - Need to emulate language semantics that do not map well onto the JVM
  20. Targeting JVM, option #2: an interpreter running on the JVM (x.js -> interpreter -> JVM)

    - Lack of low-level machine control
    - Overhead of double interpretation
  21. Meta-circular VM

    • A meta-circular virtual machine is written in the same language it implements
    • Original idea: the meta-circular evaluator in LISP[1]
    • Smalltalk: the blue book reference implementation[2]

    [1]: John McCarthy, LISP 1.5 Programmer's Manual, 1961
    [2]: A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation, 1983
  22. Meta-circular JVM: Maxine VM[1]

    (figure, courtesy Bernd Mathiske: a conventional JVM vs. a meta-circular JVM, each shown with the application, JDK, native library, and OS; the conventional JVM is native code, while the meta-circular JVM is Java code)

    [1]: B. Titzer et al., VEE 2010
  23. Modular VM

    (figure: guest VMs for Ruby, Python, and JavaScript, each with its own parser and execution component, running on a host VM that provides the runtime, JIT, and GC)
  24. Current progress

    • MBS JavaScript VM
    • Parser generated using ANTLR
    • Two interpreters
    • No regex support yet
    • 19 of 26 SunSpider benchmarks run
  25. Managed Bytecode Script (MBS) VM

    (figure components: JavaScript source, ANTLR-generated parser, AST walker, bytecode, baseline interpreter, optimized interpreter, runtime, output)
  26. Comparator: Rhino JavaScript VM

    • Open-source JavaScript VM written in Java, from the Mozilla Foundation
    • Compiles JavaScript source to Java class files
    • Also includes an interpretation mode (AST)

    (figure: MBS on Maxine VM vs. Rhino on Maxine VM)
  27. Performance: MBS vs. Rhino on Maxine

    (figure: relative performance per benchmark, axis 0% to 160%, for 3d_cube.js, 3d_morph.js, 3d_raytrace.js, access_binary_trees.js, access_fannkuch.js, access_nbody.js, access_nsieve.js, bitops_3bit_bits_in_byte.js, bitops_bits_in_byte.js, bitops_bitwise_and.js, bitops_nsieve_bits.js, controlflow_recursive.js, crypto_md5.js, crypto_sha1.js, math_cordic.js, math_partial_sums.js, math_spectral_norm.js, string_base64.js, string_fasta.js)

    Geometric mean: -25%; arithmetic mean: -18%
  28. MBS versus Rhino

    • MBS is 10X smaller than Rhino by jar file size (Rhino .jar: 4 MB; MBS .jar: 400 KB)
    • MBS was written in a very short period
    • MBS's performance is comparable with Rhino's
  29. Future work

    • Threaded code dispatch: direct threading / subroutine threading; Jython / JRuby
    • Quickening: dynamic derivative generation; automation
  30. MBS versus Rhino: lines of code (LOC)

    | Component | MBS | Rhino |
    |---|---|---|
    | Front-end | 20k | 7k |
    | Interpreter | 5.6k | 3k |
    | Runtime | 3.4k | ~21k |
    | Classfile generation | - | ~22k |
    | Other | - | ~20k |
    | Total | 29k | 73k |
    | Total w/o front-end | 9k | 66k |
  31. Modular VM: related work

    • LLVM[1]: a framework for building optimizing compilers, with a persistent low-level intermediate representation
    • Tracing PyPy's interpreter[2]: runs the Python interpreter on a tracing JIT that traces the hot code in the workload

    [1]: Chris Lattner and Vikram Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO 2004
    [2]: C.F. Bolz et al., Tracing the Meta-Level: PyPy's Tracing JIT Compiler, ICOOOLPS
  32. Memory optimization on Maxine

    • Customize object layout for dynamic languages: a customizable scheme for object layout; efficient resizable object layout[1]

    [1]: C. Chambers et al., An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes, OOPSLA 1989