Modular Virtual Machine Architecture on a Meta-Circular JVM

Wei Zhang

Department: EECS
Program: Computer Systems & Software
Committee: Professor Michael Franz (Chair), Professor Pai Chou, Professor Rainer Doemer, Professor Kwei-Jay Lin, Professor Harry Xu

Motivation
Virtual machines on mobile devices.
[Figure: an Android device hosting Dalvik, V8, and the ActionScript VM]

Motivation
Duplicated modules between VMs: Dalvik, V8, and the ActionScript VM each carry their own parser, garbage collector, and JIT compiler, and two of them also carry an interpreter.

Motivation
Drawbacks:
- Large footprint when several language implementations coexist
- Expensive investment for every new VM

Motivation
Objectives:
+ Smaller footprint for multiple language implementations
+ Simpler implementation of new VMs
[Figure: JavaScript, Python, and Ruby implementations running on one shared host VM]

Outline
• Interpreter
• Targeting a Host VM
• Meta-Circular VM
• Early Results

Interpreter
+ Easy to implement and maintain
+ Fast edit-compile-run cycle
+ More portable
+ More memory efficient

Interpreter
- But it’s slow...

Interpreter
Performance slowdown relative to an optimizing compiler[1]:
    Inefficient interpreter   1000X
    Efficient interpreter     10X
    Optimizing compiler       1X
[1]: M. Anton Ertl and D. Gregg, Journal of Instruction-Level Parallelism, 2003

Cost of Interpretation[1]
• Instruction dispatch
• Operand access
• Performing the computation
[Figure: classic five-stage pipeline, IF ID EX ME WB]
[1]: D. Gregg et al., The Case for Virtual Register Machines, IVME 2003

Instruction Dispatch: Switch-Based Dispatch
A dispatching loop fetches each opcode from the instruction stream and switches to the matching instruction implementation:
    for (;;)
        switch (program[ip++]) {
        /* ... */
        case add: sp[1] = sp[0] + sp[1]; sp++; break;   /* pop two, push sum; stack grows downward */
        /* ... */
        }
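A minimal Java sketch of the same switch-based loop follows. The talk's baseline interpreter uses standard Java switch dispatch, but the opcodes here (`PUSH`, `ADD`, `MUL`, `HALT`, with `PUSH`'s operand stored inline) are a hypothetical instruction set, not MBS's. Unlike the C fragment, the stack grows upward and `sp` indexes the top slot. Every instruction pays for a fetch (`code[ip++]`) and a `switch` decode before doing any useful work.

```java
/** Minimal switch-dispatched stack interpreter (hypothetical opcodes, not MBS's). */
final class SwitchInterp {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    /** Executes code until HALT and returns the value on top of the stack. */
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = -1;   // index of the top-of-stack slot; -1 means empty
        int ip = 0;    // index of the next opcode to fetch
        for (;;) {
            switch (code[ip++]) {              // fetch + decode on every instruction
                case PUSH:
                    stack[++sp] = code[ip++];  // operand is stored inline after the opcode
                    break;
                case ADD:
                    stack[sp - 1] += stack[sp];
                    sp--;
                    break;
                case MUL:
                    stack[sp - 1] *= stack[sp];
                    sp--;
                    break;
                case HALT:
                    return stack[sp];
                default:
                    throw new IllegalStateException("unknown opcode at " + (ip - 1));
            }
        }
    }

    public static void main(String[] args) {
        // 1 + 2 * 4 in postfix order: push 1, push 2, push 4, mul, add
        int[] code = {PUSH, 1, PUSH, 2, PUSH, 4, MUL, ADD, HALT};
        System.out.println(run(code)); // prints 9
    }
}
```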

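Instruction Dispatch: Direct Threading[1]
The program is threaded code: an array holding the addresses of the instruction implementations. Each implementation ends by jumping directly to the next one, so there is no central dispatching loop (using GCC's labels-as-values extension):
    void *thread[] = { &&add, &&pop /* ... */ };
    void **ip = thread;
    goto **ip++;                          /* start executing the thread */
    add: sp[1] = sp[0] + sp[1]; sp++;
         goto **ip++;                     /* dispatch straight to the next instruction */
    /* ... */
[1]: James R. Bell, Threaded Code, CACM, 1973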

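Instruction Dispatch: Direct vs. Indirect Threading
Direct threading: the thread holds routine addresses, with operands inline.
    thread: &get &a &get &b &add
Indirect threading: the thread holds pointers to operation records, and each record holds a routine address followed by its operands (thread -> operation -> routine). This adds one level of indirection to every dispatch.
    thread:  &i_get_a &i_get_b &i_add
    i_get_a: &get &a
    i_get_b: &get &b
    i_add:   &add
    get: *--sp = *(int *)(*ip)[1];        /* push the value at the record's operand address */
         ip++; goto ***ip;                /* jump to the routine of the next record */
    add: sp[1] = sp[0] + sp[1]; sp++;
         ip++; goto ***ip;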

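Instruction Dispatch: Subroutine Threading
The threaded code is native code: a sequence of call instructions, one per virtual instruction, each calling that instruction's implementation. Each implementation returns to the next call.
    thread: call get_a
            call get_b
            call add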

Operand Access: Stack-Based Bytecode
To perform a = 1 + 2 * 4:
    Bytecode    Stack      Stack ops
    1: push 1   1          1 push
    2: push 2   1 2        1 push
    3: push 4   1 2 4      1 push
    4: mul      1 8        2 pops, 1 push
    5: add      9          2 pops, 1 push
    6: set a    (empty)    1 pop
Total: 10 stack operations

Operand Access: Stack Caching[1]
Keep the top of the stack in registers. With the top two slots cached, the same bytecode needs only 2 memory stack operations:
    Bytecode    Stack      Stack ops
    1: push 1   1          0
    2: push 2   1 2        0
    3: push 4   1 2 4      1 push
    4: mul      1 8        1 pop
    5: add      9          0
    6: set a    (empty)    0
Total: 2 stack operations
[1]: M. Anton Ertl, Stack Caching for Interpreters, PLDI 1995
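The stack-operation counts for a = 1 + 2 * 4 can be reproduced with a small Java counter. This sketch assumes the top two stack slots are cached (in locals `r0` and `r1`, which a JIT may keep in machine registers), spills the deeper slot when a push overflows the cache, and refills eagerly after a pop; the opcode encoding and class name are hypothetical. Under those assumptions it counts 10 memory stack operations for the plain stack and 2 with caching.

```java
import java.util.ArrayDeque;

/**
 * Counts memory stack operations for a plain stack and for a stack whose top two
 * slots are cached in local variables. Opcodes and cache policy are assumptions.
 */
final class StackCaching {
    static final int PUSH = 0, ADD = 1, MUL = 2, SET = 3;

    static int apply(int op, int a, int b) {
        return op == ADD ? a + b : a * b;
    }

    /** Plain stack: every operand goes through memory. Returns {stored value, memory ops}. */
    static int[] runPlain(int[][] prog) {
        ArrayDeque<Integer> mem = new ArrayDeque<>();
        int ops = 0, stored = 0;
        for (int[] insn : prog) {
            switch (insn[0]) {
                case PUSH:
                    mem.push(insn[1]); ops++;
                    break;
                case ADD: case MUL: {
                    int b = mem.pop(), a = mem.pop();
                    mem.push(apply(insn[0], a, b));
                    ops += 3;                                   // two pops, one push
                    break;
                }
                case SET:
                    stored = mem.pop(); ops++;
                    break;
            }
        }
        return new int[] {stored, ops};
    }

    /** Top two slots cached in r0 (below) and r1 (top); n = number of cached slots. */
    static int[] runCached(int[][] prog) {
        ArrayDeque<Integer> mem = new ArrayDeque<>();
        int r0 = 0, r1 = 0, n = 0, ops = 0, stored = 0;
        // Invariant: memory holds values only while both cache slots are full (n == 2).
        for (int[] insn : prog) {
            switch (insn[0]) {
                case PUSH:
                    if (n == 2) { mem.push(r0); ops++; }        // spill the deeper cached slot
                    r0 = r1; r1 = insn[1]; n = Math.min(n + 1, 2);
                    break;
                case ADD: case MUL:
                    if (n < 2) throw new IllegalStateException("stack underflow");
                    r1 = apply(insn[0], r0, r1); n = 1;
                    if (!mem.isEmpty()) { r0 = mem.pop(); ops++; n = 2; }   // refill
                    break;
                case SET:
                    if (n == 0) throw new IllegalStateException("stack underflow");
                    stored = r1; r1 = r0; n--;
                    if (n == 1 && !mem.isEmpty()) { r0 = mem.pop(); ops++; n = 2; }  // refill
                    break;
            }
        }
        return new int[] {stored, ops};
    }

    public static void main(String[] args) {
        int[][] prog = {{PUSH, 1}, {PUSH, 2}, {PUSH, 4}, {MUL}, {ADD}, {SET}};  // a = 1 + 2 * 4
        int[] plain = runPlain(prog), cached = runCached(prog);
        System.out.println("plain:  a=" + plain[0] + ", memory stack ops=" + plain[1]);   // a=9, 10 ops
        System.out.println("cached: a=" + cached[0] + ", memory stack ops=" + cached[1]); // a=9, 2 ops
    }
}
```

The eager refill after `mul` is what puts the single pop on line 4, as in the table; a lazy policy would give the same total but move that pop to `add`.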

Operand Access: Stack-Based vs. Register-Based Architecture
c = a + b, stack-based:     1: get a   2: get b   3: add   4: set c
c = a + b, register-based:  1: add c a b
With a register-based architecture[1]: -35% native instruction count, +45% code size.
[1]: Yunhe Shi et al., Virtual Machine Showdown: Stack versus Registers, TACO 2008
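A Java sketch of the two encodings of c = a + b, using hypothetical opcodes and local-variable slots a = 0, b = 1, c = 2. It counts dispatched instructions only (4 for the stack form, 1 for the register form); it does not model code size, which is where the cited study reports the register form's cost.

```java
/** Same computation, c = a + b, as stack code and as register code (hypothetical encodings). */
final class RegisterVsStack {
    static final int GET = 0, ADD = 1, SET = 2, HALT = 3;

    /** Stack form: GET slot / ADD / SET slot. Returns the number of dispatched instructions. */
    static int runStack(int[] code, int[] locals) {
        int[] stack = new int[8];
        int sp = -1, ip = 0, dispatches = 0;
        for (;;) {
            int op = code[ip++];
            if (op == HALT) return dispatches;
            dispatches++;
            switch (op) {
                case GET: stack[++sp] = locals[code[ip++]]; break;   // copy a local onto the stack
                case ADD: stack[sp - 1] += stack[sp]; sp--; break;
                case SET: locals[code[ip++]] = stack[sp--]; break;   // pop into a local
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    /** Register form: ADD dst src1 src2, where every operand names a local slot. */
    static int runRegister(int[] code, int[] locals) {
        int ip = 0, dispatches = 0;
        for (;;) {
            int op = code[ip++];
            if (op == HALT) return dispatches;
            dispatches++;
            switch (op) {
                case ADD:
                    locals[code[ip]] = locals[code[ip + 1]] + locals[code[ip + 2]];
                    ip += 3;                                        // skip the three operand fields
                    break;
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        int[] l1 = {3, 4, 0}, l2 = {3, 4, 0};                       // a = 3, b = 4, c = 0
        int d1 = runStack(new int[] {GET, 0, GET, 1, ADD, SET, 2, HALT}, l1);
        int d2 = runRegister(new int[] {ADD, 2, 0, 1, HALT}, l2);
        System.out.println("stack:    c=" + l1[2] + ", dispatches=" + d1);  // c=7, dispatches=4
        System.out.println("register: c=" + l2[2] + ", dispatches=" + d2);  // c=7, dispatches=1
    }
}
```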

Performing the Computation: Runtime Overhead of Dynamic Typing
Possible results of a + b:
    1 + 2 -> 3
    ‘a’ + ‘b’ -> ‘ab’
    1 + ‘a’ -> ‘1a’
    ‘1’ + 2 -> ‘12’
    ...

Runtime Overhead of Dynamic Typing: Implementation of add
    if (a is Number && b is Number) {
        return a + b;
    } else if (a is String && b is String) {
        return a.concat(b);
    } else if (a is Number && b is String) {
        return a.toString().concat(b);
    } else if (a is String && b is Number) {
        return a.concat(b.toString());
    } /* ... */
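A runnable Java rendering of that case analysis, covering only numbers (boxed as `Double`) and strings, and roughly following JavaScript's rule that a string operand turns `+` into concatenation. The class name and the `toStr` helper are hypothetical; `toStr` is a simplified stand-in for JavaScript's number-to-string conversion that prints integral values without a trailing ".0".

```java
/** Generic '+' over boxed values, after the slide's case analysis (only Double and String). */
final class DynamicAdd {
    static Object add(Object a, Object b) {
        if (a instanceof Double && b instanceof Double) {
            return (Double) a + (Double) b;            // Number + Number: numeric addition
        } else if (a instanceof String || b instanceof String) {
            return toStr(a) + toStr(b);                // any String operand: concatenation
        }
        throw new UnsupportedOperationException("other type combinations are omitted here");
    }

    /** Approximates JavaScript's number-to-string for integral values ("1", not "1.0"). */
    static String toStr(Object v) {
        if (v instanceof Double) {
            double d = (Double) v;
            if (d == Math.rint(d) && Math.abs(d) < 1e15) return Long.toString((long) d);
        }
        return String.valueOf(v);                      // strings, NaN, Infinity, fractions
    }

    public static void main(String[] args) {
        System.out.println(add(1.0, 2.0));   // 3.0 (a Double)
        System.out.println(add("a", "b"));   // ab
        System.out.println(add(1.0, "a"));   // 1a
        System.out.println(add("1", 2.0));   // 12
    }
}
```

Every call walks this chain of type tests before it performs the one operation it actually needs, which is the overhead quickening targets.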

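Performing the Computation: Quickening[1]
A generic instruction in the instruction stream is rewritten in place into a specialized instruction once its operand types have been observed. The specialized instruction carries a guard and falls back to the generic version when the guard fails.
    Instruction stream:            get a   get b   add    (generic add)
    Quickened instruction stream:  get a   get b   nadd   (number add, guarded)
[1]: S. Brunthaler, Inline Caching Meets Quickening, ECOOP 2010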
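Quickening can be sketched in Java by letting an instruction object overwrite its own slot in the instruction array. This is a minimal illustration under assumed names (`GenericAdd`, `NumberAdd`, `run`), not Brunthaler's implementation or MBS's: the generic add checks types, rewrites its slot to a number add after seeing two `Double` operands, and the number add's guard falls back to (and restores) the generic add when the types change. Non-number operands are simply concatenated here, which is a simplification.

```java
import java.util.ArrayDeque;

/** Quickening sketch: an instruction slot rewrites itself after observing operand types. */
final class Quickening {
    interface Insn { void exec(Insn[] code, int pc, ArrayDeque<Object> stack); }

    static final class Push implements Insn {
        final Object value;
        Push(Object value) { this.value = value; }
        public void exec(Insn[] code, int pc, ArrayDeque<Object> stack) { stack.push(value); }
    }

    /** Generic add: checks types on every execution, then quickens its own slot. */
    static final class GenericAdd implements Insn {
        public void exec(Insn[] code, int pc, ArrayDeque<Object> stack) {
            Object b = stack.pop(), a = stack.pop();
            if (a instanceof Double && b instanceof Double) {
                code[pc] = NUMBER_ADD;                   // rewrite this slot: add -> nadd
                stack.push((Double) a + (Double) b);
            } else {
                stack.push(String.valueOf(a) + b);       // simplification: otherwise concatenate
            }
        }
    }

    /** Specialized number add: a cheap guard, with fallback to the generic instruction. */
    static final class NumberAdd implements Insn {
        public void exec(Insn[] code, int pc, ArrayDeque<Object> stack) {
            Object b = stack.pop(), a = stack.pop();
            if (a instanceof Double && b instanceof Double) {
                stack.push((Double) a + (Double) b);     // fast path
            } else {
                code[pc] = GENERIC_ADD;                  // guard failed: de-quicken
                stack.push(a);
                stack.push(b);
                GENERIC_ADD.exec(code, pc, stack);       // let the generic add handle it
            }
        }
    }

    static final Insn GENERIC_ADD = new GenericAdd();
    static final Insn NUMBER_ADD = new NumberAdd();

    /** Runs straight-line code and returns the final top of stack. */
    static Object run(Insn[] code) {
        ArrayDeque<Object> stack = new ArrayDeque<>();
        for (int pc = 0; pc < code.length; pc++) code[pc].exec(code, pc, stack);
        return stack.pop();
    }

    public static void main(String[] args) {
        Insn[] code = {new Push(1.0), new Push(2.0), GENERIC_ADD};
        System.out.println(run(code) + ", quickened: " + (code[2] == NUMBER_ADD));  // 3.0, quickened: true
    }
}
```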

Targeting a Host VM
• A VM offers benefits to the applications it hosts
• Can we use those benefits while building our own VM?

Targeting a Host VM (JVM/CLR)
+ Cross-platform
+ Automatic memory management
+ Libraries
+ Better IDE support

Targeting the JVM: Option #1
Compile to Java bytecode (x.js -> x.class -> JVM):
- As complex as writing a compiler
- Must emulate language semantics that do not map well onto the JVM

Targeting the JVM: Option #2
An interpreter running on the JVM (x.js -> interpreter -> JVM):
- Lacks low-level machine control
- Overhead of double interpretation

Problem
Application -> guest VM -> JVM: the guest VM is restricted by the well-defined JVM interface.

Meta-Circular VM
• A meta-circular virtual machine is written in the same language it implements
• Original idea: the meta-circular evaluator in LISP[1]
• Smalltalk: the "blue book" reference implementation[2]
[1]: John McCarthy, LISP 1.5 Programmer’s Manual, 1961
[2]: A. Goldberg and D. Robson, Smalltalk-80: The Language and Its Implementation, 1983

Meta-Circular JVM: Maxine VM[1]
[Figure, courtesy Bernd Mathiske: software stacks of application, JDK, JVM, native library, and OS]
• Conventional JVM: the JVM is native code
• Maxine, a meta-circular JVM: the JVM is Java code
[1]: B. Titzer et al., VEE 2010

Meta-Circular JVM
The guest VM runs on Maxine and can use Maxine's compiler, garbage collector, and runtime system.

Meta-Circular JVM
Jump to an address using Maxine internals:
    Word target = ArrayAccess.getWord(threadedCode, index);
    Intrinsics.jump(target.asAddress());
    ...

Modular VM
Guest VMs (Ruby, Python, JavaScript) each provide their own parser and execution engine; the host VM provides the shared runtime, JIT compiler, and garbage collector.
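One way to express this split in Java is a pair of interfaces: each guest supplies a parser and an execution engine, and the host routes programs to the right guest. Everything here (`HostServices`, `GuestLanguage`, `ModularHost`, and the toy `SumLanguage`) is hypothetical and only illustrates the division of responsibilities. When the host is a JVM, memory management and JIT compilation come from the JVM itself, so no guest implements them.

```java
import java.util.HashMap;
import java.util.Map;

/** Services the shared host offers every guest (hypothetical; a real host offers far more). */
interface HostServices {
    void print(String line);
}

/** What each guest VM contributes in this sketch: a parser and an execution engine. */
interface GuestLanguage {
    String name();
    Object parse(String source);
    Object execute(Object program, HostServices host);
}

/** Host VM: registers guests and routes programs to them. */
final class ModularHost implements HostServices {
    private final Map<String, GuestLanguage> guests = new HashMap<>();
    final StringBuilder output = new StringBuilder();

    void register(GuestLanguage guest) { guests.put(guest.name(), guest); }

    public void print(String line) { output.append(line).append('\n'); }

    Object run(String language, String source) {
        GuestLanguage guest = guests.get(language);
        if (guest == null) throw new IllegalArgumentException("no guest VM for " + language);
        return guest.execute(guest.parse(source), this);   // guest parses, then executes
    }

    public static void main(String[] args) {
        ModularHost host = new ModularHost();
        host.register(new SumLanguage());
        System.out.println(host.run("sum", "1 2 3"));  // 6
        System.out.print(host.output);                 // sum = 6
    }
}

/** Toy guest: a program is whitespace-separated integers, and running it sums them. */
final class SumLanguage implements GuestLanguage {
    public String name() { return "sum"; }

    public Object parse(String source) {
        String[] parts = source.trim().split("\\s+");
        int[] numbers = new int[parts.length];
        for (int i = 0; i < parts.length; i++) numbers[i] = Integer.parseInt(parts[i]);
        return numbers;
    }

    public Object execute(Object program, HostServices host) {
        int total = 0;
        for (int n : (int[]) program) total += n;
        host.print("sum = " + total);
        return total;
    }
}
```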

Current Progress
• MBS JavaScript VM
• Parser generated using ANTLR
• Two interpreters
• No regular expression support yet
• 19 of 26 SunSpider benchmarks run

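Managed Bytecode Script (MBS) VM
Components: parser (generated with ANTLR), AST walker, bytecode, baseline interpreter, optimized interpreter, runtime.
JavaScript source flows through the parser and the AST walker into bytecode, which either interpreter executes to produce the output.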
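Turning an AST into stack bytecode can be sketched as a post-order traversal. This Java sketch builds the AST by hand instead of with an ANTLR-generated parser, and its node classes and textual instruction names are assumptions, not MBS's. For a = 1 + 2 * 4 it emits the same six instructions as the operand-access example.

```java
import java.util.ArrayList;
import java.util.List;

/** Post-order AST walk that emits stack bytecode (hand-built AST; no ANTLR involved). */
final class AstToBytecode {
    interface Node {}

    static final class Num implements Node {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class BinOp implements Node {
        final char op;
        final Node left, right;
        BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    }

    static final class Assign implements Node {
        final String name;
        final Node value;
        Assign(String name, Node value) { this.name = name; this.value = value; }
    }

    /** Emits operands before operators: the order in which a stack machine consumes them. */
    static void emit(Node node, List<String> out) {
        if (node instanceof Num) {
            out.add("push " + ((Num) node).value);
        } else if (node instanceof BinOp) {
            BinOp b = (BinOp) node;
            emit(b.left, out);
            emit(b.right, out);
            switch (b.op) {
                case '+': out.add("add"); break;
                case '*': out.add("mul"); break;
                default: throw new IllegalArgumentException("unsupported operator " + b.op);
            }
        } else if (node instanceof Assign) {
            Assign a = (Assign) node;
            emit(a.value, out);
            out.add("set " + a.name);
        } else {
            throw new IllegalArgumentException("unknown node " + node);
        }
    }

    public static void main(String[] args) {
        // AST for: a = 1 + 2 * 4
        Node ast = new Assign("a", new BinOp('+', new Num(1), new BinOp('*', new Num(2), new Num(4))));
        List<String> code = new ArrayList<>();
        emit(ast, code);
        System.out.println(code);  // [push 1, push 2, push 4, mul, add, set a]
    }
}
```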

Managed Bytecode Script VM
• Baseline interpreter: standard Java switch-based dispatch
• Optimized interpreter: direct call threading (+30%), quickening (+8%)
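Java has no computed goto, so call threading is one way to drop the `switch` decode. A minimal sketch, assuming hypothetical names (`Op`, `Frame`, `run`): the thread is an array of instruction objects, and the dispatch loop makes one interface call per instruction. Whether MBS's optimized interpreter is structured exactly this way is an assumption.

```java
/**
 * Direct call threading sketch: the thread is an array of instruction objects, and
 * the dispatch loop simply calls each one. Names and the Frame layout are hypothetical.
 */
final class CallThreading {
    /** Interpreter state shared by all instruction objects. */
    static final class Frame {
        final int[] stack = new int[16];
        int sp = -1;   // top-of-stack index
        int pc = 0;    // index of the next instruction; an Op could change it to jump
    }

    interface Op { void execute(Frame f); }

    static Op push(int value) { return f -> f.stack[++f.sp] = value; }

    static final Op ADD = f -> { f.stack[f.sp - 1] += f.stack[f.sp]; f.sp--; };
    static final Op MUL = f -> { f.stack[f.sp - 1] *= f.stack[f.sp]; f.sp--; };

    /** No opcode fetch-and-decode: each step is one call through the thread array. */
    static int run(Op[] thread) {
        Frame f = new Frame();
        while (f.pc < thread.length) {
            thread[f.pc++].execute(f);
        }
        return f.stack[f.sp];
    }

    public static void main(String[] args) {
        Op[] thread = {push(1), push(2), push(4), MUL, ADD};  // 1 + 2 * 4
        System.out.println(run(thread));  // prints 9
    }
}
```

Operands such as `push`'s constant live inside the instruction object rather than in the instruction stream, so the loop never decodes an opcode; the JVM's JIT can also inline calls at sites where it sees few receiver types.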

Comparator: Rhino JavaScript VM
• Open-source JavaScript VM written in Java, from the Mozilla Foundation
• Compiles JavaScript source to Java class files
• Also includes an (AST) interpretation mode
Setup: MBS on Maxine VM vs. Rhino on Maxine VM

Performance: MBS vs. Rhino on Maxine
[Chart: relative performance per SunSpider benchmark, y axis 0% to 160%; benchmarks: 3d_cube, 3d_morph, 3d_raytrace, access_binary_trees, access_fannkuch, access_nbody, access_nsieve, bitops_3bit_bits_in_byte, bitops_bits_in_byte, bitops_bitwise_and, bitops_nsieve_bits, controlflow_recursive, crypto_md5, crypto_sha1, math_cordic, math_partial_sums, math_spectral_norm, string_base64, string_fasta]
Geometric mean: -25%. Arithmetic mean: -18%.
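The chart summarizes its 19 benchmarks with two means, which weight the per-benchmark ratios differently. Assuming each bar is $r_i$, MBS's performance relative to Rhino on benchmark $i$ (100% meaning parity), the two summaries are defined below. By the AM-GM inequality the geometric mean can never exceed the arithmetic mean, which is consistent with -25% sitting below -18%.

```latex
\mathrm{AM} = \frac{1}{n}\sum_{i=1}^{n} r_i,
\qquad
\mathrm{GM} = \Bigl(\prod_{i=1}^{n} r_i\Bigr)^{1/n},
\qquad
\mathrm{GM} \le \mathrm{AM} \quad \text{for all } r_i > 0,\ n = 19
```

The geometric mean is the usual choice for averaging ratios because its result does not depend on which VM is taken as the baseline.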

MBS versus Rhino
• MBS is 10X smaller than Rhino by jar file size (400 KB vs. 4 MB)
• MBS was written in a very short period
• MBS’s performance is comparable to Rhino’s

Future Work
• Threaded-code dispatch: direct threading / subroutine threading; Jython / JRuby
• Quickening: dynamic derivative generation; automation

Thanks

MBS versus Rhino: Lines of Code
    Component               MBS     Rhino
    Front-end               20k     7k
    Interpreter             5.6k    3k
    Runtime                 3.4k    ~21k
    Class-file generation   -       ~22k
    Other                   -       ~20k
    Total                   29k     73k
    Total w/o front-end     9k      66k

Modular VM: Related Work
• LLVM[1]: a framework for building optimizing compilers, with a persistent low-level intermediate representation
• Tracing PyPy’s interpreter[2]: a tracing JIT traces the Python interpreter itself as it runs the hot code in the workload
[1]: Chris Lattner and Vikram Adve, LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation, CGO 2004
[2]: C. F. Bolz et al., Tracing the Meta-Level: PyPy’s Tracing JIT Compiler, ICOOOLPS 2009

Memory Optimization on Maxine
• Customize object layout for dynamic languages: a customizable object-layout scheme and an efficient resizable object layout[1]
[1]: C. Chambers et al., An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes, OOPSLA 1989
