Modular Virtual Machine Architecture on a Meta-Circular JVM

Wei Zhang
December 07, 2011

  1. Modular Virtual Machine Architecture on a Meta-Circular JVM Wei Zhang

  2. 张Җ / Wei Zhang

    Department: EECS
    Program: Computer Systems & Software
    Committee: Professor Michael Franz (Chair), Professor Pai Chou, Professor Rainer Doemer, Professor Kwei-Jay Lin, Professor Harry Xu
  3. Motivation

    Virtual machines on mobile devices: Android Dalvik, V8, ActionScript VM
  4. Motivation

    Duplicated modules between VMs
    [Figure: Dalvik, V8, and ActionScript VM, each drawn with its own copies of modules such as Parser, Interp, GC, and JIT]
  5. Motivation

    Drawbacks:
    - Big footprint for multiple language implementations
    - Expensive investment for new VMs
  6. Motivation

    Objectives:
    + Smaller footprint for multiple language implementations
    + Simplify implementation for new VMs
    [Figure: JavaScript, Python, and Ruby on top of a Host VM]
  7. Outline

    • Interpreter
    • Targeting a Host VM
    • Meta-Circular VM
    • Early Results
  8. Interpreter

    + Easy to implement and maintain
    + Fast edit-compile-run cycle
    + More portable
    + More memory efficient
  9. Interpreter

    - But it’s slow...

  10. Interpreter

    Performance slowdown[1]:
    Inefficient interpreter: 1000X
    Efficient interpreter: 10X
    Optimizing compiler: 1X
    [1]: M. Anton Ertl and D. Gregg, Journal of Instruction-Level Parallelism 2003
  11. Cost of interpretation[1]

    • Instruction dispatch
    • Operand access
    • Performing the computation
    [Figure: overlapped processor pipeline stages IF ID EX ME WB]
    [1]: D. Gregg et al., The Case for Virtual Register Machines, IVME 2003
  12. Instruction Dispatch: Switch-based dispatch

    for (;;)                        /* dispatching loop */
      switch (program[ip++]) {      /* program[] is the instruction stream */
        /*...*/
        case add:                   /* instruction implementation */
          sp[1] = sp[0] + sp[1];
          sp++;
          break;
        /*...*/
      }
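
The switch-based loop above can be sketched in Java, the language the deck later names for its baseline interpreter. This is a minimal illustration under assumed names: the class `SwitchDispatchSketch`, the opcode numbering, and the inline-operand encoding are hypothetical and are not taken from the MBS sources.

```java
// Hypothetical sketch of switch-based dispatch; not the MBS implementation.
final class SwitchDispatchSketch {
    // Assumed opcode numbering, chosen only for this example.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    // Runs a stack program and returns the value on top of the stack at HALT.
    static int run(int[] program) {
        int[] stack = new int[16];
        int sp = 0; // index of the next free stack slot
        int ip = 0; // index of the next instruction
        for (;;) {                       // dispatching loop
            switch (program[ip++]) {     // one dispatch per executed instruction
                case PUSH:
                    stack[sp++] = program[ip++]; // operand is stored inline
                    break;
                case ADD:
                    stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
                    sp--;
                    break;
                case MUL:
                    stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
                    sp--;
                    break;
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode");
            }
        }
    }

    // Evaluates 1 + 2 * 4 with the encoding above.
    static int demo() {
        return run(new int[] {PUSH, 1, PUSH, 2, PUSH, 4, MUL, ADD, HALT});
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Every executed instruction passes through the same `switch`, which is the dispatch cost that the threaded-code slides set out to reduce.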
  13. Instruction Dispatch: Direct Threading[1]

    /* threaded code; &&label is the GNU C labels-as-values extension */
    void *thread[] = { &&add, &&pop /*...*/ };
    void **ip = thread;
    goto *ip++;
    /* instruction implementations */
    add:
      sp[1] = sp[0] + sp[1];
      sp++;
      goto *ip++;
    /*...*/
    [1]: James R. Bell, Threaded Code, CACM, 1973
  14. Instruction Dispatch: Indirect Threading

    Direct threading:
      thread:  &get &a &get &b &add
    Indirect threading:
      thread:  &i_get_a &i_get_b &i_add
      i_get_a: &get &a
      i_get_b: &get &b
      i_add:   &add
    Routines:
      get: sp[0]=*(*ip+1); sp++; goto *(*ip++);
      add: sp[0]=sp[1]+sp[0]; goto *(*ip++);
    [Figure labels: Indirection, Operation, Routine]
  15. Instruction Dispatch: Subroutine Threading

    thread: call get_a
            call get_b
            call add
    [Figure labels: Threaded-code, Instruction implementations]
  16. Operand Access

    To perform: a = 1+2*4
    Bytecode      Stack    Stack ops
    1: push 1     1        1 push
    2: push 2     1 2      1 push
    3: push 4     1 2 4    1 push
    4: mul        1 8      2 pop, 1 push
    5: add        9        2 pop, 1 push
    6: set a               1 pop
    Total: 10 stack ops
  17. Operand Access

    Stack caching[1]: keep top-of-stack in registers
    Bytecode      Stack    Stack ops
    1: push 1     1        0
    2: push 2     1 2      0
    3: push 4     1 2 4    1 push
    4: mul        1 8      1 pop
    5: add        9        0
    6: set a               0
    Total: 2 stack ops
    [1]: M. Anton Ertl, Stack Caching for Interpreters, PLDI 1995
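
The two operand-access counts (10 stack operations without caching, 2 with it) can be reproduced with a small Java sketch. It assumes a cache of exactly two registers that is refilled eagerly after each instruction; that policy is one choice that yields the per-instruction counts on the slide and is not necessarily the scheme from the cited paper. The class name, opcode numbering, and the single variable `a` are hypothetical.

```java
// Hypothetical sketch: counts memory stack operations with and without
// a two-register top-of-stack cache. Not taken from any real VM.
final class StackCachingSketch {
    static final int PUSH = 0, ADD = 1, MUL = 2, SET = 3;

    // Plain stack machine. Returns {value stored by SET, memory stack ops}.
    static int[] runPlain(int[] code) {
        int[] stack = new int[16];
        int sp = 0, ops = 0, a = 0;
        for (int ip = 0; ip < code.length; ) {
            int op = code[ip++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[ip++];
                    ops += 1; // 1 push
                    break;
                case ADD:
                case MUL: {
                    int rhs = stack[--sp];
                    int lhs = stack[--sp];
                    stack[sp++] = (op == ADD) ? lhs + rhs : lhs * rhs;
                    ops += 3; // 2 pops, 1 push
                    break;
                }
                case SET:
                    a = stack[--sp];
                    ops += 1; // 1 pop
                    break;
                default:
                    throw new IllegalStateException("unknown opcode");
            }
        }
        return new int[] {a, ops};
    }

    // Same machine with the top two stack items held in r0 and r1.
    // Only spills to and refills from the memory stack are counted.
    static int[] runCached(int[] code) {
        int[] stack = new int[16]; // items below the cached ones
        int sp = 0, ops = 0, a = 0;
        int r0 = 0, r1 = 0; // r1 = top of stack, r0 = second from top
        int cached = 0;     // number of live registers (0..2)
        for (int ip = 0; ip < code.length; ) {
            int op = code[ip++];
            switch (op) {
                case PUSH:
                    if (cached == 2) { stack[sp++] = r0; ops++; cached = 1; } // spill
                    r0 = r1;
                    r1 = code[ip++];
                    cached++;
                    break;
                case ADD:
                case MUL:
                    if (cached < 2) throw new IllegalStateException("stack underflow");
                    r1 = (op == ADD) ? r0 + r1 : r0 * r1;
                    cached = 1;
                    if (sp > 0) { r0 = stack[--sp]; ops++; cached = 2; } // eager refill
                    break;
                case SET:
                    if (cached < 1) throw new IllegalStateException("stack underflow");
                    a = r1;
                    r1 = r0;
                    cached--;
                    if (cached == 1 && sp > 0) { r0 = stack[--sp]; ops++; cached = 2; }
                    break;
                default:
                    throw new IllegalStateException("unknown opcode");
            }
        }
        return new int[] {a, ops};
    }

    public static void main(String[] args) {
        int[] code = {PUSH, 1, PUSH, 2, PUSH, 4, MUL, ADD, SET}; // a = 1 + 2 * 4
        System.out.println(runPlain(code)[1] + " vs " + runCached(code)[1]);
    }
}
```

Both variants compute the same value for `a`; only the number of accesses to the in-memory stack differs.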
  18. Operand Access

    Stack-based vs. register-based architecture
    Stack-based:     1: get a   2: get b   3: add   4: set c
    Register-based:  1: add c a b
    With register-based architecture[1]:
      -35% native instruction count
      +45% code size
    [1]: Yunhe Shi et al., Virtual Machine Showdown: Stack versus registers, TACO 2008
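
As a rough illustration of why the register form needs fewer dispatches, the Java sketch below runs `c = a + b` in both encodings and counts dispatched instructions. The class name, opcodes, and int-array encoding are assumptions made for this example; the toy encoding says nothing about real code size, so it does not reproduce the percentages quoted from the paper.

```java
// Hypothetical sketch comparing dispatch counts for c = a + b.
final class RegisterVsStackSketch {
    static final int GET = 0, ADD = 1, SET = 2; // assumed stack-machine opcodes
    static final int RADD = 0;                  // assumed register-machine opcode

    // Runs stack code against vars; returns the number of dispatched instructions.
    static int runStack(int[] code, int[] vars) {
        int[] stack = new int[8];
        int sp = 0, dispatched = 0;
        for (int ip = 0; ip < code.length; dispatched++) {
            switch (code[ip++]) {
                case GET: stack[sp++] = vars[code[ip++]]; break;
                case ADD: stack[sp - 2] += stack[sp - 1]; sp--; break;
                case SET: vars[code[ip++]] = stack[--sp]; break;
                default: throw new IllegalStateException("unknown opcode");
            }
        }
        return dispatched;
    }

    // Runs register code where each instruction names its destination and sources.
    static int runRegister(int[] code, int[] vars) {
        int dispatched = 0;
        for (int ip = 0; ip < code.length; dispatched++) {
            switch (code[ip++]) {
                case RADD: {
                    int dst = code[ip++], lhs = code[ip++], rhs = code[ip++];
                    vars[dst] = vars[lhs] + vars[rhs];
                    break;
                }
                default: throw new IllegalStateException("unknown opcode");
            }
        }
        return dispatched;
    }

    public static void main(String[] args) {
        int[] vars = {3, 4, 0}; // slots for a, b, c
        int s = runStack(new int[] {GET, 0, GET, 1, ADD, SET, 2}, vars);
        int r = runRegister(new int[] {RADD, 2, 0, 1}, vars);
        System.out.println(s + " stack dispatches, " + r + " register dispatch");
    }
}
```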
  19. Runtime Overhead of Dynamic Typing

    Possibilities of a + b:
      1 + 2 -> 3
      ‘a’ + ‘b’ -> ‘ab’
      1 + ‘a’ -> ‘1a’
      ‘1’ + 2 -> ‘12’
      ...
  20. Runtime Overhead of Dynamic Typing

    Implementation of add:
    if (Num && Num) {
      return a + b;
    } else if (String && String) {
      return a.concat(b);
    } else if (Num && String) {
      return a.toString().concat(b);
    } else if (String && Num) {
      return a.concat(b.toString());
    } ...
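
The type-dispatching `add` above can be written as runnable Java along the following lines. This is a sketch under stated assumptions: `Double` stands in for the slide's Num, only the four listed cases are handled, and the class name and the `numberToString` helper are hypothetical.

```java
// Hypothetical sketch of a dynamically typed add over boxed values.
final class DynamicAddSketch {
    // Every call pays for the type tests before any real work happens.
    static Object add(Object a, Object b) {
        if (a instanceof Double && b instanceof Double) {
            double x = (Double) a;
            double y = (Double) b;
            return x + y;
        } else if (a instanceof String && b instanceof String) {
            return ((String) a).concat((String) b);
        } else if (a instanceof Double && b instanceof String) {
            return numberToString((Double) a).concat((String) b);
        } else if (a instanceof String && b instanceof Double) {
            return ((String) a).concat(numberToString((Double) b));
        }
        throw new UnsupportedOperationException("other type combinations are omitted");
    }

    // Assumed helper: prints small integral values without a trailing ".0".
    static String numberToString(double d) {
        if (d == Math.rint(d) && Math.abs(d) < 1e15) {
            return Long.toString((long) d);
        }
        return Double.toString(d);
    }

    public static void main(String[] args) {
        System.out.println(add(1.0, 2.0));
        System.out.println(add("a", "b"));
        System.out.println(add(1.0, "a"));
        System.out.println(add("1", 2.0));
    }
}
```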
  21. Performing the Computation

    Quickening[1]
    Instruction stream:            get a, get b, add    (generic instruction: generic add)
    Quickened instruction stream:  get a, get b, nadd   (specialized instruction: number add)
    The number add sits behind a guard that falls back to the generic add.
    [1]: S. Brunthaler, Inline Caching Meets Quickening, ECOOP 2010
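
A minimal Java sketch of quickening follows: a generic `ADD` that observes two numbers rewrites its own opcode to `NADD`, and `NADD` keeps a guard that restores `ADD` when the operand types change. The class name, opcode numbering, and the reduced `genericAdd` (numbers and strings only) are assumptions for illustration, not the scheme from the cited paper or from MBS.

```java
// Hypothetical sketch of instruction quickening with a guarded fallback.
final class QuickeningSketch {
    static final int GET = 0, ADD = 1, NADD = 2, HALT = 3;

    // Generic add: checks operand types on every call (mixed cases omitted).
    static Object genericAdd(Object a, Object b) {
        if (a instanceof Double && b instanceof Double) {
            return ((Double) a).doubleValue() + ((Double) b).doubleValue();
        }
        if (a instanceof String && b instanceof String) {
            return ((String) a).concat((String) b);
        }
        throw new UnsupportedOperationException("mixed-type cases omitted");
    }

    // Executes code in place, so rewritten opcodes persist across runs.
    static Object run(int[] code, Object[] vars) {
        Object[] stack = new Object[16];
        int sp = 0, ip = 0;
        for (;;) {
            switch (code[ip++]) {
                case GET:
                    stack[sp++] = vars[code[ip++]];
                    break;
                case ADD: {
                    Object b = stack[--sp], a = stack[--sp];
                    if (a instanceof Double && b instanceof Double) {
                        code[ip - 1] = NADD; // quicken: specialize this instruction
                    }
                    stack[sp++] = genericAdd(a, b);
                    break;
                }
                case NADD: {
                    Object b = stack[--sp], a = stack[--sp];
                    if (a instanceof Double && b instanceof Double) { // guard
                        stack[sp++] = ((Double) a).doubleValue() + ((Double) b).doubleValue();
                    } else {
                        code[ip - 1] = ADD; // guard failed: fall back to generic add
                        stack[sp++] = genericAdd(a, b);
                    }
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("unknown opcode");
            }
        }
    }

    public static void main(String[] args) {
        int[] code = {GET, 0, GET, 1, ADD, HALT};
        System.out.println(run(code, new Object[] {1.0, 2.0}));
        System.out.println(code[4] == NADD); // the add was rewritten in place
    }
}
```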
  22. Targeting a Host VM

    • A VM offers benefits for hosted applications
    • Can we utilize those benefits while building our VM?
  23. Targeting a Host VM (JVM/CLR)

    + Cross platform
    + Automatic memory management
    + Libraries
    + Better IDE support
  24. Targeting JVM: Option #1

    Compile to Java bytecode:
    - Same complexity as writing a compiler
    - Need to emulate language semantics that do not map well onto the JVM
    [Figure: x.js, x.class, JVM]
  25. Targeting JVM: Option #2

    Interpreter running on JVM:
    - Lack of low-level machine control
    - Overhead of double interpretation
    [Figure: x.js, Interpreter, JVM]
  26. Problem

    Restricted by the well-defined JVM interface
    [Figure: Application, Guest VM, JVM]
  27. Meta-Circular VM

    • A meta-circular virtual machine is written in the same language it implements
    • Original idea: meta-circular evaluator in LISP[1]
    • Smalltalk: the blue book reference implementation[2]
    [1]: John McCarthy, LISP 1.5 Programmer’s Manual 1961
    [2]: A. Goldberg & D. Robson, Smalltalk-80: the Language and Its Implementation 1983
  28. Meta-Circular JVM

    Conventional JVM vs. Maxine VM[1]
    [Figure: two software stacks, each with Application, JDK, OS, Native library, and JVM; legend: Java code, Native code. Courtesy Bernd Mathiske]
    [1]: B. Titzer et al., VEE 2010
  29. Meta-Circular JVM

    [Figure: Guest VM and Maxine, with Compiler, Garbage collector, and Runtime]

  30. Meta-Circular JVM

    Jump to an address using Maxine internals:
    Word target = ArrayAccess.getWord(threadedCode, index);
    Intrinsics.jump(target.asAddress());
    ...
  31. Modular VM

    Guest VMs: Ruby, Python, JavaScript (each with its own Parser and Execution modules)
    Host VM: Runtime, JIT, GC
  32. Current Progress

    • MBS JavaScript VM
    • Parser generated using ANTLR
    • Two interpreters
    • No regex yet
    • 19/26 SunSpider benchmarks run
  33. Managed Bytecode Script VM

    [Figure labels: JavaScript, Parser, AST walker, ANTLR, Bytecode, Baseline interpreter, Optimized interpreter, Runtime, Output]
  34. Managed Bytecode Script VM

    • Baseline interpreter
        Standard Java
        Switch-based dispatching
    • Optimized interpreter
        Direct call threading: +30%
        Quickening: +8%
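
One way to picture call threading in plain Java is to pre-decode the program into an array of handler objects, so that dispatch becomes an indirect call instead of a `switch`. The sketch below makes that assumption and uses hypothetical names (`CallThreadingSketch`, `Op`, `Frame`); it does not use Maxine intrinsics and is not the MBS optimized interpreter.

```java
// Hypothetical sketch of call threading: each instruction is an object
// whose exec method is invoked by the dispatch loop.
final class CallThreadingSketch {
    interface Op {
        void exec(Frame f);
    }

    // Mutable interpreter state shared by all handlers.
    static final class Frame {
        final int[] stack = new int[16];
        int sp = 0;             // next free stack slot
        int ip = 0;             // next entry in the threaded code
        boolean halted = false;
    }

    // Operands are captured when the threaded code is built, not decoded at run time.
    static Op push(int value) {
        return f -> { f.stack[f.sp++] = value; };
    }

    static final Op ADD = f -> { f.stack[f.sp - 2] += f.stack[f.sp - 1]; f.sp--; };
    static final Op MUL = f -> { f.stack[f.sp - 2] *= f.stack[f.sp - 1]; f.sp--; };
    static final Op HALT = f -> { f.halted = true; };

    // The dispatch loop contains no switch: it calls the next handler directly.
    static int run(Op[] thread) {
        Frame f = new Frame();
        while (!f.halted) {
            thread[f.ip++].exec(f);
        }
        return f.stack[f.sp - 1];
    }

    // Evaluates 1 + 2 * 4.
    static int demo() {
        return run(new Op[] {push(1), push(2), push(4), MUL, ADD, HALT});
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design trades a decode step per instruction for an interface call per instruction; whether that is faster depends on the host VM and is not something this sketch measures.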
  35. Comparator: Rhino JavaScript VM

    • Open source JavaScript VM written in Java, from the Mozilla Foundation
    • Compiles JavaScript source to Java classfiles
    • Interpretation mode is included (AST)
    [Figure: MBS on Maxine VM, Rhino on Maxine VM]
  36. Performance: MBS vs. Rhino on Maxine

    [Chart: one bar per benchmark, axis from 0% to 160%. Benchmarks: 3d_cube.js, 3d_morph.js, 3d_raytrace.js, access_binary_trees.js, access_fannkuch.js, access_nbody.js, access_nsieve.js, bitops_3bit_bits_in_byte.js, bitops_bits_in_byte.js, bitops_bitwise_and.js, bitops_nsieve_bits.js, controlflow_recursive.js, crypto_md5.js, crypto_sha1.js, math_cordic.js, math_partial_sums.js, math_spectral_norm.js, string_base64.js, string_fasta.js, plus arithmetic mean and geometric mean]
    Geometric mean: -25%
    Arithmetic mean: -18%
  37. MBS versus Rhino

    • MBS is 10X smaller than Rhino (jar file size): the MBS .jar is 400 KB, the Rhino .jar is 4 MB
    • MBS was written in a very short period
    • MBS’s performance is comparable with that of Rhino
  38. Future Work

    • Threaded code dispatch
        Direct threading / subroutine threading
        Jython / JRuby
    • Quickening
        Dynamic derivative generation
        Automation
  39. Thanks

  40. MBS versus Rhino: LOC

    LOC                      MBS     Rhino
    Front-end                20k     7k
    Interpreter              5.6k    3k
    Runtime                  3.4k    ~21k
    Classfile generation     -       ~22k
    Other                    -       ~20k
    Total                    29k     73k
    Total w/o front-end      9k      66k
  41. Modular VM: Related Work

    • LLVM[1]
        Framework for building optimizing compilers
        Persistent low-level intermediate representation
    • Tracing PyPy’s interpreter[2]
        Running Python interpreter on Python tracing JIT
        Trace hot code in workload
    [1]: Chris Lattner and Vikram Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO 2004
    [2]: C.F. Bolz et al., Tracing the Meta-Level: PyPy’s Tracing JIT Compiler, ICOOOLPS
  42. Memory Optimization on Maxine

    • Customize object layout for dynamic languages
        Customizable scheme for object layout
        Efficient resizable object layout[1]
    [1]: C. Chambers et al., An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes, OOPSLA 1989