Interpreter Taming to Realize Multiple Compilations in a Meta-Tracing JIT Compiler Framework

Yusuke Izawa

March 13, 2023

Transcript

  1. Interpreter Taming to Realize Multiple Compilations in a Meta-Tracing JIT Compiler Framework

    Yusuke Izawa, Hidehiko Masuhara, and Carl Friedrich Bolz-Tereick, MoreVMs’23 workshop, March 13, 2023
  2. Background: RPython [Bol+09]

    • A language implementation framework for developing high-performance virtual machines (VMs)
      − Generates a VM with a tracing JIT compiler from an interpreter
    • Used to generate several VMs, such as PyPy [RP06], Pycket [Bau+15], and so forth
    [Figure: an interpreter written in RPython is processed by the RPython translator into a VM, written in C, that contains the interpreter and a tracing JIT]
    Interpreter Taming to Realize Multiple Compilations in a Meta-Tracing JIT Compiler Framework MoreVMs’23 2/24
  3. Background: Modern VMs Employ Multilevel Compilation

    • Supported in modern VMs such as HotSpot™, V8, and so forth
    • Balances code quality and compilation time by changing compilation levels
    [Figure: execution speed (slow to fast) vs. compilation time (fast to slow) for the interpreter, a lightweight JIT, and a JIT, with short-lived apps (e.g., batch processing) and long-lived apps (e.g., server-side) marked]
  4. Background: Modern VMs Employ Multilevel Compilation

    • Long-lived programs are handed to the JIT compiler
      − it generates high-quality code but consumes compilation time
    [Figure: execution speed vs. compilation time for the interpreter, a lightweight JIT, and a JIT; short-lived apps (e.g., batch processing) vs. long-lived apps (e.g., server-side)]
  5. Background: Modern VMs Employ Multilevel Compilation

    • Long-lived programs are handed to the JIT compiler
      − it generates high-quality code but consumes compilation time
    • Short-lived programs need to run with a lightweight compiler
      − it generates code quickly
    [Figure: execution speed vs. compilation time for the interpreter, a lightweight JIT, and a JIT; short-lived apps (e.g., batch processing) vs. long-lived apps (e.g., server-side)]
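    The trade-off on these slides can be illustrated with a counter-based tier-up policy. This is a minimal sketch, not how HotSpot or V8 actually decide: the tier names and the thresholds LIGHT_THRESHOLD and OPT_THRESHOLD are hypothetical, chosen only to show that a short-lived method stays in the cheap tier while a long-lived one eventually pays for the expensive compiler.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the cheap tier kicks in early, the optimizing tier much later.
LIGHT_THRESHOLD = 10
OPT_THRESHOLD = 1000


@dataclass
class MethodProfile:
    name: str
    invocations: int = 0
    tier: str = "interp"


def on_invoke(profile: MethodProfile) -> str:
    """Count one invocation and promote the method when it crosses a threshold."""
    profile.invocations += 1
    if profile.tier == "interp" and profile.invocations >= LIGHT_THRESHOLD:
        profile.tier = "lightweight JIT"   # quick to compile, modest code quality
    elif profile.tier == "lightweight JIT" and profile.invocations >= OPT_THRESHOLD:
        profile.tier = "optimizing JIT"    # slow to compile, high code quality
    return profile.tier


if __name__ == "__main__":
    batch_step = MethodProfile("batch_step")            # short-lived workload
    for _ in range(50):
        on_invoke(batch_step)
    request_handler = MethodProfile("request_handler")  # long-lived workload
    for _ in range(5000):
        on_invoke(request_handler)
    print(batch_step.tier, "|", request_handler.tier)   # lightweight JIT | optimizing JIT
```

    Real multilevel VMs also weigh loop back-edges, code size, and profiling feedback; a single invocation counter is the simplest stand-in.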
  6. RPython Is Not Yet Competitive with Language-Specific VMs

    • In particular: RPython does not support multilevel compilation
      − Dilemma: VMs generated by RPython are hard to extend
    [Figure: execution speed vs. compilation time for the generated interpreter, a lightweight JIT, and the meta-tracing JIT, with the lightweight JIT marked "We support"]
  7. Dilemma: Hard to Extend Generated VMs (1)

    • In a language-specific VM, all components are manageable
    [Figure: a language-specific VM compiles source to bytecode for the interpreter; its JIT compiler (target selector, IR translator, optimizer) emits native code that is then executed]
  8. Dilemma: Hard to Extend Generated VMs (2)

    • Dilemma: in RPython, only the interpreter (and the bytecode compiler) can be managed
      − How can lightweight compilation be added to RPython with less effort?
    [Figure: in a generated VM, source is compiled to bytecode (or an AST) for the interpreter, while the JIT compiler's target selector, IR translator, optimizer, and native code generation are generated by RPython]
  9. How to Add Lightweight Compilation to a Meta-Tracing JIT with Less Effort? A Hint Instruction-Based Approach

    • Approach: control the behavior of the meta-tracing JIT by inserting hint instructions into an interpreter, rather than creating compilers from scratch
    [Figure: in the generated VM, bytecode is run by the interpreter, which drives the meta-tracing JIT compiler]
  10. How to Add Lightweight Compilation to a Meta-Tracing JIT with Less Effort? A Hint Instruction-Based Approach

    • Approach: control the behavior of the meta-tracing JIT by inserting hint instructions into an interpreter, rather than creating compilers from scratch
      − Hint instruction: a pseudo function that can influence the behavior of the meta-tracing JIT
    [Figure: in the generated VM, bytecode is run by the interpreter, whose hint instructions steer the meta-tracing JIT compiler]
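    To make "a pseudo function that can influence the behavior of the meta-tracing JIT" concrete, here is a toy model in plain Python. It is not RPython's API: Tracer, start_tracing, stop_tracing, and promote_hint are hypothetical names. The hint returns its argument unchanged, so the interpreter's semantics never change; only an active tracer notices the call and records a guard. RPython's own hints (for example promote in rpython.rlib.jit) follow the same principle, but inside the real meta-tracer.

```python
from typing import Any, List, Optional, Tuple


class Tracer:
    """Toy trace recorder (hypothetical; not RPython's meta-interpreter)."""

    def __init__(self) -> None:
        self.ops: List[Tuple[str, Any]] = []


_active: Optional[Tracer] = None  # the tracer currently recording, if any


def start_tracing() -> Tracer:
    global _active
    _active = Tracer()
    return _active


def stop_tracing() -> None:
    global _active
    _active = None


def promote_hint(value: Any) -> Any:
    """Hint instruction: identity for the interpreter, but the tracer records a
    guard so that later operations may treat `value` as a constant."""
    if _active is not None:
        _active.ops.append(("guard_value", value))
    return value


def dispatch(opcode: str) -> str:
    opcode = promote_hint(opcode)  # interpreter semantics are unchanged
    return "executed " + opcode


if __name__ == "__main__":
    print(dispatch("INC"))   # plain interpretation: nothing recorded
    tracer = start_tracing()
    dispatch("INC")
    stop_tracing()
    print(tracer.ops)        # [('guard_value', 'INC')]
```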
  11. How to Add Lightweight Compilation to a Meta-Tracing JIT with Less Effort? A Hint Instruction-Based Approach

    • Previous work: added threaded code generation to the meta-tracing JIT [JOT ’22]
    [Figure: the interpreter with hint instructions drives the meta-tracing JIT compiler, which produces native code from method-based threaded code and native code from the trace-based JIT]
  12. How to Add Lightweight Compilation to a Meta-Tracing JIT with Less Effort? A Hint Instruction-Based Approach

    • Previous work: added threaded code generation to the meta-tracing JIT [JOT ’22]
    • This work: realize inline caching [DS84] in threaded code generation, and multilevel compilation
    [Figure: the interpreter with hint instructions drives the meta-tracing JIT compiler, which produces native code from method-based threaded code with inline caching and native code from the trace-based JIT]
  13. Proposal: Multilevel RPython

    [Figure: the user writes an interpreter with hint instructions; Multilevel RPython processes it at VM generation time, and the generated VM runs programs at runtime]
  14. Proposal: Multilevel RPython

    • Each interpreter represents one compilation level
    [Figure: at VM generation time, Multilevel RPython takes an interpreter with hints for threaded code generation and an interpreter with hints for the tracing JIT, and generates a meta-tracing JIT compiler]
  15. Proposal: Multilevel RPython

    • Each interpreter represents one compilation level
    [Figure: at VM generation time, Multilevel RPython takes an interpreter with hints for threaded code generation and an interpreter with hints for the tracing JIT, and generates a meta-tracing JIT compiler; at runtime, a source program is compiled to native code (threaded code)]
  16. Proposal: Multilevel RPython

    • Each interpreter represents one compilation level
    [Figure: at VM generation time, Multilevel RPython takes an interpreter with hints for threaded code generation and an interpreter with hints for the tracing JIT, and generates a meta-tracing JIT compiler; at runtime, a source program is compiled to native code (threaded code), and level shifting occurs when a hot spot suitable for the tracing JIT is found]
  17. Proposal: Multilevel RPython

    • Each interpreter represents one compilation level
    [Figure: at VM generation time, Multilevel RPython takes an interpreter with hints for threaded code generation and an interpreter with hints for the tracing JIT, and generates a meta-tracing JIT compiler; at runtime, a source program is compiled to native code (threaded code), level shifting occurs when a hot spot suitable for the tracing JIT is found, and that hot spot is compiled to native code by the tracing JIT]
  18. Level Shifting Between Threaded Code Gen. and Tracing JIT

    Interpreter shifter (entry point):

        def interp(pc):
            while True:
                try:
                    result = interp_threaded(pc)
                except ContinueInTracing as e:    # level up
                    pc = e.pc
                try:
                    result = interp_tracing(pc)
                except ContinueInThreaded as e:   # level down
                    pc = e.pc

    Interpreter for threaded code generation:

        def interp_threaded(pc):
            threaded_driver.can_enter_jit(...)
            while True:
                instr = bytecode[pc]
                pc += 1
                if instr == JUMP_BACKWARD:
                    if bytecode.counts[pc] > THRESHOLD:
                        raise ContinueInTracing(bytecode[pc])   # resume at the loop header
                    bytecode.counts[pc] += 1
                    pc = bytecode[pc]
                elif ...:
                    ...

    Interpreter for tracing JIT compilation:

        def interp_tracing(pc):
            while True:
                instr = bytecode[pc]
                pc += 1
                if instr == JUMP_BACKWARD:
                    # should_level_down is a placeholder name: the exact comparison of
                    # bytecode.counts[pc] against THRESHOLD is not recoverable here
                    if should_level_down(bytecode.counts[pc], THRESHOLD):
                        raise ContinueInThreaded(bytecode[pc])  # resume at the loop header
                    bytecode.counts[pc] += 1
                    pc = bytecode[pc]
                    tracing_driver.can_enter_jit(...)
                elif ...:
                    ...

    Both interpreters profile the execution; when a counter exceeds its threshold, execution shifts level and the corresponding lightweight (threaded code) or heavyweight (tracing JIT) compilation starts.
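    The control flow of the interpreter shifter can be exercised in plain Python. In this sketch both interpreters share one toy instruction set (JUMP_IF_ZERO, DEC, JUMP_BACKWARD, HALT) and differ only in what happens at a back-edge; ContinueInTracing and ContinueInThreaded mirror the slide's names, while Frame, step, UP_THRESHOLD, and should_level_down are hypothetical. No compilation happens: each level just logs the back-edges it executed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

UP_THRESHOLD = 3  # hypothetical back-edge count above which we shift up

Instr = Tuple[str, int]  # (opcode, operand)


class ContinueInTracing(Exception):
    """Level up: resume at `pc` in the tracing-JIT interpreter."""

    def __init__(self, pc: int) -> None:
        super().__init__(pc)
        self.pc = pc


class ContinueInThreaded(Exception):
    """Level down: resume at `pc` in the threaded-code interpreter."""

    def __init__(self, pc: int) -> None:
        super().__init__(pc)
        self.pc = pc


@dataclass
class Frame:
    bytecode: List[Instr]
    acc: int
    counts: List[int] = field(init=False)          # per-pc back-edge counters
    log: List[str] = field(default_factory=list)   # level that ran each back-edge

    def __post_init__(self) -> None:
        self.counts = [0] * len(self.bytecode)


def should_level_down(count: int) -> bool:
    """Hypothetical level-down test; it never fires in this sketch."""
    return False


def step(frame: Frame, pc: int) -> int:
    """Run one non-back-edge instruction; return the next pc, or -1 on HALT."""
    op, arg = frame.bytecode[pc]
    if op == "DEC":
        frame.acc -= arg
        return pc + 1
    if op == "JUMP_IF_ZERO":
        return arg if frame.acc == 0 else pc + 1
    if op == "HALT":
        return -1
    raise ValueError("unknown opcode: " + op)


def interp_threaded(frame: Frame, pc: int) -> int:
    while pc >= 0:
        op, target = frame.bytecode[pc]
        if op == "JUMP_BACKWARD":
            if frame.counts[pc] > UP_THRESHOLD:
                raise ContinueInTracing(target)   # level up at the loop header
            frame.counts[pc] += 1
            frame.log.append("threaded")
            pc = target
        else:
            pc = step(frame, pc)
    return frame.acc


def interp_tracing(frame: Frame, pc: int) -> int:
    while pc >= 0:
        op, target = frame.bytecode[pc]
        if op == "JUMP_BACKWARD":
            if should_level_down(frame.counts[pc]):
                raise ContinueInThreaded(target)  # level down
            frame.counts[pc] += 1
            frame.log.append("tracing")
            pc = target
        else:
            pc = step(frame, pc)
    return frame.acc


def interp(frame: Frame, pc: int = 0) -> int:
    """Interpreter shifter: entry point that moves execution between levels."""
    while True:
        try:
            return interp_threaded(frame, pc)
        except ContinueInTracing as e:
            pc = e.pc
        try:
            return interp_tracing(frame, pc)
        except ContinueInThreaded as e:
            pc = e.pc
```

    For the loop [("JUMP_IF_ZERO", 3), ("DEC", 1), ("JUMP_BACKWARD", 0), ("HALT", 0)] started with acc=8, interp returns 0 and the log records four back-edges at the threaded level followed by three at the tracing level.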
  19. Brief Recap: Threaded Code Generation [JOT ’22]

    • Generate threaded code [Bel73] by inserting hint instructions into an interpreter
      + small code size and short compilation time
      − method calls in threaded code were slow (next slide)
    • The interpreter's hint instructions (method-based shallow tracing and a traversal stack) turn bytecode into a trace in threaded-code style, which is then emitted as assembly

    Bytecode:

        L0: INC
            JUMP_IF L1
            INC
            JUMP L0
        L1: CALL "g"
            RET

    Trace styled with threaded code:

        L0: call(INC)
            guard_false(.., L1)
            call(INC)
            jump(L0)
        L1: call(CALL("g"))
            finish(..)

    Asm output:

        L0: call INC
            jnz L1
            call INC
            jmp L0
        L1: call CALL
            ret
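    Call threading, the style visible in the asm output on this slide (one call per bytecode handler), can be imitated in plain Python: each instruction is decoded once into a reference to a handler closure, and execution then just calls the handlers in sequence. This is only an analogy for what the hint-driven generator emits, not the JOT ’22 implementation. The program mirrors the slide's bytecode, but JUMP_IF_GE (jump once the accumulator reaches a hypothetical `limit`) stands in for the slide's unspecified JUMP_IF condition, and CALL "g" is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class State:
    acc: int = 0
    pc: int = 0
    limit: int = 10      # hypothetical loop bound used by JUMP_IF_GE
    running: bool = True


Handler = Callable[[State], None]


def make_handler(op: str, arg: int) -> Handler:
    """Return a closure performing one instruction; decoding happens only here."""
    if op == "INC":
        def inc(s: State) -> None:
            s.acc += 1
            s.pc += 1
        return inc
    if op == "JUMP_IF_GE":
        def jump_if_ge(s: State) -> None:
            s.pc = arg if s.acc >= s.limit else s.pc + 1
        return jump_if_ge
    if op == "JUMP":
        def jump(s: State) -> None:
            s.pc = arg
        return jump
    if op == "RET":
        def ret(s: State) -> None:
            s.running = False
        return ret
    raise ValueError("unknown opcode: " + op)


def compile_threaded(bytecode: List[Tuple[str, int]]) -> List[Handler]:
    """Build the threaded code: one handler reference per instruction."""
    return [make_handler(op, arg) for op, arg in bytecode]


def run_threaded(code: List[Handler], state: State) -> State:
    """Dispatch loop: call the handler at pc until RET clears `running`."""
    while state.running:
        code[state.pc](state)
    return state
```

    With the program [("INC", 0), ("JUMP_IF_GE", 4), ("INC", 0), ("JUMP", 0), ("RET", 0)] and limit=10, the final accumulator is 11, because the bound is only checked after the first INC of each iteration.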
  20. Technical Problem: Method Calls in Threaded Code Were Slow

    • Every call goes through the system's method lookup routine at jit_merge_point, which is slow

    Interpreter:

        if instr == CALL:
            method_f = pop(stack)            # method_f is already compiled
            handler_CALL(method_f, stack)

        def handler_CALL(method, stack):
            r = interp(method, stack)
            push(r, stack)

    Trace:

        call(handler_CALL, method_f, stack)

    [Figure: to run method_f of class A, the trace calls handler_CALL, which re-enters the interpreter and looks up method_f's compiled trace via jit_merge_point]
  21. Solution: Inline Caching with Hint Instructions

    • Convert the call into a direct call by inline caching with hint instructions

    Fast path with IC:

        guard_ptr_eq(method_f, A)
        call_assembler(method_f)
        ...

    Slow path (taken when the guard fails):

        call(handler_CALL, ..)

    [Figure: the fast path calls the compiled trace of method_f in class A directly; the slow path still goes through handler_CALL and the jit_merge_point lookup]
  22. Inline Caching with Hint Instructions

    • Technique: tame the interpreter with hint instructions
      − record the runtime type of a method during interpretation
      − verify the runtime type of a method
      − call call_assembler at CALL when the verification passes

        def handler_CALL(method, stack, pc):
            if we_are_interpreted():
                record_tbl[pc] = method.type   # record the runtime type
            ...

    Resulting trace:

        ...
        guard_ptr_eq(method_f, A)
        r = call_assembler(method_f, stack)
        setitem(r, stack)
        ...                         # fast path
        call(handler_CALL, ..)
        ...                         # slow path
  23. Inline Caching with Hint Instructions

    • Technique: tame the interpreter with hint instructions
      − record the runtime type of a method during interpretation
      − verify the runtime type of a method
      − call call_assembler at CALL when the verification passes

        while True:
            instr = bytecode[pc]
            pc += 1
            if instr == CALL:
                method_f = pop(stack)
                if check_typ(method_f, pc):    # verify the recorded type
                    ...

    Resulting trace:

        ...
        guard_ptr_eq(method_f, A)
        r = call_assembler(method_f, stack)
        setitem(r, stack)
        ...                         # fast path
        call(handler_CALL, ..)
        ...                         # slow path
  24. Inline Caching with Hint Instructions

    • Technique: tame the interpreter with hint instructions
      − record the runtime type of a method during interpretation
      − verify the runtime type of a method
      − call call_assembler at CALL when the verification passes

        if check_typ(method_f, pc):
            r = call_assembler(method_f, stack, ...)   # fast path: direct call
            push(r, stack)
        else:
            handler_CALL(method_f, stack, pc)          # slow path

    Resulting trace:

        ...
        guard_ptr_eq(method_f, A)
        r = call_assembler(method_f, stack)
        setitem(r, stack)
        ...                         # fast path
        call(handler_CALL, ..)
        ...                         # slow path
  25. Inline Caching with Hint Instructions

    • Technique: tame the interpreter with hint instructions
      − record the runtime type of a method during interpretation
      − verify the runtime type of a method
      − call call_assembler at CALL when the verification passes

        def handler_CALL(method, stack, pc):
            if we_are_interpreted():
                record_tbl[pc] = method.type   # record the runtime type
            ...

        while True:
            instr = bytecode[pc]
            pc += 1
            if instr == CALL:
                method_f = pop(stack)
                if check_typ(method_f, pc):    # verify the recorded type
                    r = call_assembler(method_f, stack, ...)   # fast path: direct call
                    push(r, stack)
                else:
                    handler_CALL(method_f, stack, pc)          # slow path

    Resulting trace:

        ...
        guard_ptr_eq(method_f, A)
        r = call_assembler(method_f, stack)
        setitem(r, stack)
        ...                         # fast path
        call(handler_CALL, ..)
        ...                         # slow path
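    The division of labor on these slides (record the receiver's type while interpreting, verify it at the call site, and take a direct call when it matches) is the classic monomorphic inline cache. The sketch below is plain Python with no JIT: record_tbl, check_typ, and handler_CALL mirror the slide's names, but their signatures here, the VM and Obj classes, and the fast/slow counters are assumptions for illustration. In the real system the fast path is a guard_ptr_eq plus call_assembler in compiled code; here it is just a dictionary hit.

```python
from typing import Callable, Dict, Tuple

Method = Callable[[int], int]


class Obj:
    def __init__(self, cls: str, value: int) -> None:
        self.cls = cls        # the receiver's runtime type
        self.value = value


class VM:
    """Toy call-site machinery with one monomorphic inline cache per call site."""

    def __init__(self, methods: Dict[str, Dict[str, Method]]) -> None:
        self.methods = methods
        # record_tbl: call-site pc -> (recorded receiver type, cached target).
        # Each call site uses a single selector, so the selector is not cached.
        self.record_tbl: Dict[int, Tuple[str, Method]] = {}
        self.fast_calls = 0
        self.slow_calls = 0

    def lookup(self, cls: str, selector: str) -> Method:
        """Generic (slow) method lookup."""
        return self.methods[cls][selector]

    def check_typ(self, receiver: Obj, pc: int) -> bool:
        """Verify the receiver's type against the one recorded at this call site."""
        entry = self.record_tbl.get(pc)
        return entry is not None and entry[0] == receiver.cls

    def handler_CALL(self, receiver: Obj, selector: str, pc: int) -> int:
        """Slow path: full lookup, then record what this call site saw."""
        self.slow_calls += 1
        target = self.lookup(receiver.cls, selector)
        self.record_tbl[pc] = (receiver.cls, target)
        return target(receiver.value)

    def call(self, receiver: Obj, selector: str, pc: int) -> int:
        if self.check_typ(receiver, pc):
            self.fast_calls += 1
            _, target = self.record_tbl[pc]
            return target(receiver.value)   # fast path: direct call, no lookup
        return self.handler_CALL(receiver, selector, pc)
```

    When a call site sees a new receiver type, the slow path overwrites the single cached entry; a polymorphic inline cache would instead keep several entries per site.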
  26. Implementation

    • Implemented on PySOM
      − PySOM: an implementation of a subset of Smalltalk, written in RPython
      − 14,000 LOC in RPython
    • 10 out of 72 instructions are instrumented for threaded code generation
      − jump_on_false, jump_backward, return_local, ...
    • Lines of code added:
      − PySOM: about 450 LOC
      − RPython: about 600 LOC
  27. Evaluation and Experiment: Overview

    Micro-Benchmark Evaluation
    • Evaluate the cost and benefit of two JITs: threaded code generation (+ inline caching) and the tracing JIT
    Multilevel Experiment in a Simulated Real-World Workload
    • Evaluate the performance of multilevel compilation in RPython against single-level compilation
      − Multilevel: threaded code + tracing JIT
      − Single level: tracing JIT
  28. Micro-benchmark Evaluation: What Is Evaluated?

    Evaluate the cost and benefit of two different JITs: threaded code generation and the tracing JIT
    • Cost: compilation time
      − Correlation between bytecode size and compilation time
    • Benefit: peak performance at steady state
      − Comparison with interpreter execution
    Targets
    • Programs: PySOM's original benchmarks + Are We Fast Yet? [MDM16] micro-benchmarks
    • Methodology: each program was run 2000 times per set, for 30 sets
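    The measurement methodology (each program run 2000 times per set, for 30 sets) can be expressed as a small harness. This is a hedged sketch, not the authors' harness: run_sets, steady_state, the use of time.perf_counter, and discarding a fixed number of warm-up sets before averaging are all assumptions.

```python
import statistics
import time
from typing import Callable, List


def run_sets(workload: Callable[[], object], runs_per_set: int = 2000, sets: int = 30) -> List[float]:
    """Time `sets` consecutive sets of `runs_per_set` calls; returns seconds per set."""
    set_times: List[float] = []
    for _ in range(sets):
        start = time.perf_counter()
        for _ in range(runs_per_set):
            workload()
        set_times.append(time.perf_counter() - start)
    return set_times


def steady_state(set_times: List[float], warmup_sets: int = 5) -> float:
    """Mean set time after discarding a hypothetical number of warm-up sets."""
    if warmup_sets >= len(set_times):
        raise ValueError("need at least one set after warm-up")
    return statistics.mean(set_times[warmup_sets:])
```

    Running all sets in one process keeps the JIT's compiled code alive across sets, so later sets reflect peak rather than warm-up performance.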
  29. Compilation Time and Bytecode Sizes

    • Threaded code: compilation time is proportional to bytecode size
    • Tracing JIT: unstable
    [Figure: compilation time (ms, 0 to 80) against bytecode size (bytes, 100 to 700) for Dispatch, Fibonacci, Sum, Recurse, Loop, Sieve, List, Storage, Queens, Mandelbrot, Fannkuch, Bounce, BubbleSort, TreeSort, and QuickSort, with fitted lines y = 0.0411x + 5.0 (R² = 0.118) and y = 0.0044x + 0.31 (R² = 0.7087)]
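    Fitted lines and R² values like those in the plot come from ordinary least squares. The sketch below shows that computation on made-up sample points (not the paper's measurements), to make concrete why a high R² supports "proportional" while a low R² reads as "unstable".

```python
from typing import List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a*x + b; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)                   # total variation
    return a, b, 1.0 - ss_res / ss_tot
```

    Points lying exactly on y = 0.0044x + 0.31 give R² = 1, while scattered points such as (100, 5), (200, 40), (300, 8), (400, 30) give an R² of roughly 0.11.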
  30. Peak Performance at Steady State

    • Overall: threaded code was 4% faster than the interpreter and 94% slower than the tracing JIT
      − Inline caching improved threaded code by approx. 20%
    [Figure: peak performance normalized to the interpreter (higher is better) for threaded code, threaded code without IC, and the tracing JIT on Fibonacci, Mandelbrot, Recurse, Sum, Loop, Dispatch, Sieve, BubbleSort, TreeSort, Fannkuch, List, Permute, Queens, QuickSort, Bounce, Storage, and their mean]
  31. Multilevel JIT Experiment: Is Adding Threaded Code Generation Beneficial for RPython?

    • Experimenting with real-world applications is currently difficult
      − access to enterprise applications is difficult, and the current implementation's size is limited
    → Simulated a real-world workload with large benchmarks
      − Richards + Json + CD + DeltaBlue
  32. #Invocation: Simulated Larger Application

    • Simulated application: Richards + Json + CD + DeltaBlue
    [Figure: number of invocations (#invoke, 0 to 300,000) of each method in the simulated application, e.g., JsonParser#read, Variable#value, OrderedCollection#do:, Vector#append:, and BenchmarkHarness#doRuns:]
  33. #Invocation: Threshold for Simulated Larger App

    • Determined by the following heuristics:
      − about 30% of the methods are compiled to threaded code
      − about 20% are compiled by the tracing JIT
    • Based on the results of the DaCapo benchmarks [Bla+06]
    [Figure: #invoke per method of the simulated application (axis 0 to 50,000; values above 50,000 omitted), with a line at 2000 and methods marked as compiled by threaded code or by tracing]
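    One way to turn the "about 30% / about 20% of methods" heuristic into concrete thresholds is to rank methods by invocation count and take the count at each cut-off. This sketch is an assumption about how such a heuristic could be applied, not the procedure used in the talk; threshold_for_share and pick_thresholds are hypothetical names, and the counts in the test are invented.

```python
from typing import Dict, List, Tuple


def threshold_for_share(counts: List[int], pct: int) -> int:
    """Smallest invocation count among the top `pct` percent of methods."""
    ranked = sorted(counts, reverse=True)
    k = max(1, (len(ranked) * pct + 99) // 100)  # integer ceiling, at least one method
    return ranked[k - 1]


def pick_thresholds(invocations: Dict[str, int], threaded_pct: int = 30,
                    tracing_pct: int = 20) -> Tuple[int, int]:
    """Return (threaded_threshold, tracing_threshold); a method reaches a tier
    once its invocation count is >= that tier's threshold."""
    counts = list(invocations.values())
    return threshold_for_share(counts, threaded_pct), threshold_for_share(counts, tracing_pct)
```

    With ten methods invoked 10, 20, ..., 100 times, this yields thresholds of 80 and 90, so three methods reach threaded code and two reach the tracing JIT. Ties at the cut-off can push the actual share slightly above the target.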
  34. Performance: Simulated Larger Application Result

    • Multilevel is the fastest
      − about 14% faster than tracing
      − about 5% faster than tracing with the same threshold as multilevel

        Policy                     Compile time   Elapsed time   Comp / elapsed
        Multilevel                 66 ms          90 ms          0.73
        Tracing                    73 ms          104 ms         0.70
        Tracing (same threshold)   62 ms          94 ms          0.65
        Threaded                   32 ms          135 ms         0.23
        Interpreter                0 ms           131 ms         0

        Policy        Loop threshold   Function threshold
        Multilevel    1539             2039
        Tracing       1039             1639

    [Figure: geometric-mean performance normalized to the interpreter (higher is better) for Multilevel, Tracing, Tracing (threshold = multilevel), and Threaded]
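    The chart summarizes performance as a geometric mean of values normalized to the interpreter. Because performance is the reciprocal of elapsed time, the normalization and the mean can be sketched as follows; normalized_performance and geo_mean are hypothetical helper names, and the inputs in the test are invented.

```python
import math
from typing import List


def normalized_performance(times: List[float], baseline_times: List[float]) -> List[float]:
    """Speedup over the baseline per run (baseline_time / time; higher is better)."""
    return [base / t for t, base in zip(times, baseline_times)]


def geo_mean(values: List[float]) -> float:
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    return math.exp(sum(math.log(v) for v in values) / len(values))
```

    For example, speedups of 2.0 and 8.0 have a geometric mean of 4.0, whereas their arithmetic mean would be 5.0; the geometric mean is the usual summary for ratios because it does not depend on which configuration is chosen as the baseline.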
  35. Adaptive Compilation in RPython

    Perform multilevel compilation with “one interpreter” and “one engine”
    [Figure: optimization levels: threaded code, lightweight 2, ..., tracing, method, tracing + method]
  36. Adaptive Compilation in RPython

    Perform multilevel compilation with “one interpreter” and “one engine”
    [Figure: optimization levels: threaded code, lightweight 2, ..., tracing, method, tracing + method]
    • One generic interpreter → a common interpreter plus slightly different definitions
  37. Adaptive Compilation in RPython

    Perform multilevel compilation with “one interpreter” and “one engine”
    [Figure: optimization levels: threaded code, lightweight 2, ..., tracing, method, tracing + method]
    • One generic interpreter → a common interpreter plus slightly different definitions
    • Performed on one engine = RPython
  38. Conclusion

    • Showed that it is possible to add new behavior to a meta-tracing JIT compiler framework by using hint instructions
      − threaded code generation [JOT ’22]
      − inline caching [this talk]
    • Multilevel compilation showed 14% better overall performance than tracing-JIT-only compilation on an application that simulates a real-world workload
  39. How to Reduce RPython's Compilation Time

    • The initial trace becomes very long when tracing recursive calls

        pc = 0; bytecode = [...]; stack = [...]
        while True:
            instr = bytecode[pc]
            pc += 1
            if instr == INT:
                n = ord(bytecode[pc])
                pc += 1
                push(stack, n)
            elif instr == LT:
                y, x = pop(stack), pop(stack)
                if x < y:
                    push(stack, True)
                else:
                    push(stack, False)
            elif instr == JUMP_IF:
                target = bytecode[pc]
                pc += 1                    # skip the operand even when not jumping
                if not top(stack):
                    pc = target
            elif instr == JUMP_BACK:
                pc = bytecode[pc]
            elif instr == CALL:
                target = bytecode[pc]
                pc += 1
                r = interp(stack, target)
                push(stack, r)
            elif ...:
                ...

        def sum(n):
            if n < 1:
                return 1
            else:
                return n + sum(n - 1)
  40. How to Reduce RPython's Compilation Time

    • The initial trace becomes very long when tracing recursive calls

        pc = 0; bytecode = [...]; stack = [...]
        while True:
            instr = bytecode[pc]
            pc += 1
            if instr == INT:
                n = ord(bytecode[pc])
                pc += 1
                push(stack, n)
            elif instr == LT:
                y, x = pop(stack), pop(stack)
                if x < y:
                    push(stack, True)
                else:
                    push(stack, False)
            elif instr == JUMP_IF:
                target = bytecode[pc]
                pc += 1                    # skip the operand even when not jumping
                if not top(stack):
                    pc = target
            elif instr == JUMP_BACK:
                pc = bytecode[pc]
            elif instr == CALL:
                target = bytecode[pc]
                pc += 1
                r = interp(stack, target)
                push(stack, r)
            elif ...:
                ...

        def sum(n):
            if n < 1:
                return 1
            else:
                return n + sum(n - 1)

    Initial trace:

        v4 = list_read(v3, v0)
        v5 = add(v0, 1)
        guard_eq(v4, LT)
        v6 = add(v5, 1)
        v7 = list_pop(v1)
        v8 = list_pop(v1)
        guard_type(v7, int)
        guard_type(v8, int)
        guard_not_less_than(v7, v8)
        list_append(v1, False)
        v10 = list_read(v3, v0)
        guard_eq(v10, DUP)
        ...
        guard_eq(v18, SUB)
        ...
        guard_eq(v26, CALL)
        ... (inlined) ...
        guard_eq(v34, DUP)
        ...
        guard_eq(v42, SUB)
        ...
        ... (stop inlining) ...
        ...
        guard_eq(v58, CALL)
        v1092 = call("sum", ..)
        ...
        finish(v2000)
  41. How to Reduce RPython's Compilation Time

    RPython spends time on:
    • inlining a user program's function calls

    Initial trace:

        v4 = list_read(v3, v0)
        v5 = add(v0, 1)
        guard_eq(v4, LT)
        v6 = add(v5, 1)
        v7 = list_pop(v1)
        v8 = list_pop(v1)
        guard_type(v7, int)
        guard_type(v8, int)
        guard_not_less_than(v7, v8)
        list_append(v1, False)
        v10 = list_read(v3, v0)
        guard_eq(v10, DUP)
        ...
        guard_eq(v18, SUB)
        ...
        guard_eq(v26, CALL)
        ... (inlined) ...
        guard_eq(v34, DUP)
        ...
        guard_eq(v42, SUB)
        ...
        ... (stop inlining) ...
        ...
        guard_eq(v58, CALL)
        v1092 = call("sum", ..)
        ...
        finish(v2000)
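    Why recursion makes the initial trace long can be reproduced with a toy tracer: each traced call to sum is inlined, so the recorded operations grow with the recursion depth until an inlining limit is hit and a residual call is emitted instead. The operation names loosely follow the trace above, but trace_sum, run_sum, the max_depth parameter, and the exact ops recorded per call are hypothetical.

```python
from typing import List


def run_sum(n: int) -> int:
    """Plain interpretation of the slide's sum (the base case returns 1)."""
    return 1 if n < 1 else n + run_sum(n - 1)


def trace_sum(n: int, ops: List[str], depth: int = 0, max_depth: int = 3) -> int:
    """Record a toy trace of sum(n), inlining recursive calls up to `max_depth`."""
    if n < 1:
        ops.append("guard_less_than(n, 1)")
        return 1
    ops.append("guard_not_less_than(n, 1)")
    ops.append("int_sub(n, 1)")
    if depth >= max_depth:
        ops.append('call("sum", ..)')  # stop inlining: leave a residual call
        result = run_sum(n - 1)
    else:
        result = trace_sum(n - 1, ops, depth + 1, max_depth)  # inline the callee
    ops.append("int_add(n, r)")
    return n + result
```

    Tracing sum(10) with max_depth=3 records 13 operations, including one residual call; with max_depth=0 the CALL is left in place after 4 operations, which is the "leave CALL instead" direction the following slides point to.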
  42. How to Reduce RPython’s Compilation Time RPython spends time

    on ..
    • inlining a user program’s function calls
    • optimizing the initial trace

    Optimized trace (constants and lists are folded):

        v6 = list_pop(v1)
        guard_type(v6, int)
        guard_not_less_than(v6, 1)
        v16 = dict_get(v2, "n")
        guard_type(v16, int)
        v21 = int_sub(v16, 1)
        ... (inlined) ...
        finish(v64)
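Much of the shrinkage comes from the bytecode and program counter being constants at trace time. The opcode reads can then be evaluated during compilation, and the guards on them can be dropped. The sketch below is a toy constant folder, not RPython's optimizer. The trace encoding (tuples of result, op name, args) and the opcode numbers are assumptions, and the input is the first few operations of the slide's initial trace.

```python
# Hypothetical opcode numbers; the bytecode is a constant tuple at trace time.
LT, DUP = 1, 2

PURE_OPS = {
    "list_read": lambda seq, i: seq[i],
    "add": lambda a, b: a + b,
}


def fold_constants(trace, known):
    """Constant-fold a trace.

    Ops are (result, name, args); result is None for guards. Arguments are
    variable names (str) or literal constants. `known` maps variables whose
    values are fixed at compile time (e.g. the bytecode and the start pc).
    """
    known = dict(known)
    out = []
    for result, name, args in trace:
        const = [not isinstance(a, str) or a in known for a in args]
        vals = tuple(known.get(a, a) if isinstance(a, str) else a for a in args)
        if name in PURE_OPS and all(const):
            known[result] = PURE_OPS[name](*vals)   # computed while compiling
        elif name == "guard_eq" and all(const):
            if vals[0] != vals[1]:
                raise ValueError("guard can never succeed")
            # the guard always holds, so it is dropped
        else:
            out.append((result, name, vals))        # keep, with constants substituted
    return out


trace = [
    ("v4", "list_read", ("v3", "v0")),
    ("v5", "add", ("v0", 1)),
    (None, "guard_eq", ("v4", LT)),
    ("v6", "add", ("v5", 1)),
    ("v7", "list_pop", ("v1",)),
    ("v8", "list_pop", ("v1",)),
    (None, "guard_type", ("v7", int)),
]
for op in fold_constants(trace, {"v3": (LT, DUP), "v0": 0}):
    print(op)
```

Only the operations on the runtime stack survive, which is the same effect visible in the optimized trace.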
  43. How to Reduce RPython’s Compilation Time RPython spends time

    on ..
    • inlining a user program’s function calls
    • optimizing the initial trace
    • recompiling after guard failures (if a guard fails many times)
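In RPython-style tracing JITs, a guard that keeps failing eventually gets its own compiled "bridge" for the other path. That compilation is the extra cost named in the third bullet. The toy model below counts failures and compiles once a threshold is crossed. The class, the threshold value, and the `compile_bridge` callback are all hypothetical.

```python
class CountingGuard:
    """Toy model of a guard that triggers recompilation after repeated failures."""

    def __init__(self, description, threshold):
        self.description = description
        self.threshold = threshold   # hypothetical failure limit
        self.failures = 0
        self.bridge = None           # code compiled for the failing path

    def fail(self, compile_bridge):
        """Handle one failure; return the bridge once it has been compiled.

        Before the threshold is reached, None is returned and execution
        would fall back to the interpreter.
        """
        if self.bridge is None:
            self.failures += 1
            if self.failures >= self.threshold:
                # The extra compilation work: trace and compile the other path.
                self.bridge = compile_bridge(self.description)
        return self.bridge


compiled = []


def compile_bridge(description):
    compiled.append(description)
    return f"bridge for {description}"


guard = CountingGuard("guard_not_less_than(v6, 1)", threshold=3)
results = [guard.fail(compile_bridge) for _ in range(5)]
print(results[:3])    # [None, None, 'bridge for guard_not_less_than(v6, 1)']
print(len(compiled))  # 1
```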
  44. How to Reduce RPython’s Compilation Time Issue: how to

    reduce compilation time in RPython
    • Tracing: RPython traces recursions
    • Optimization / code generation
    • Recompilation at guard failures
  45. How to Reduce RPython’s Compilation Time Issue: how to

    reduce compilation time in RPython
    • RPython traces recursions → leave CALL instead
    • Optimization / code generation → lightweight compilation
    • Recompilation at guard failures → method-based compilation
  46. How to Reduce RPython’s Compilation Time Issue: how to

    reduce compilation time in RPython
    • Leave CALL instead → Shallow Tracing
    • Method-based compilation → Traversal stack
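The slide only names the traversal stack. The following rests on our reading, which is an assumption and not the authors' implementation. A tracing JIT compiles only the path it actually executed and guards the others. Method-based compilation instead covers every branch of a function up front, so a failing guard does not need a later bridge. A stack of not-yet-visited branch targets is a standard way to enumerate those branches, and the sketch below does only that enumeration. The opcode names and operand counts are hypothetical.

```python
# Hypothetical opcodes and their operand counts for this sketch.
OPERANDS = {"INT": 1, "LT": 0, "JUMP_IF": 1, "JUMP_BACK": 1, "RET": 0}


def traverse(bytecode, entry=0):
    """Return the pcs of all reachable instructions of one function.

    Both successors of JUMP_IF are covered: the branch target is pushed on
    the traversal stack and visited after the fall-through path ends.
    """
    visited, seen = [], set()
    todo = [entry]                       # the traversal stack
    while todo:
        pc = todo.pop()
        while pc not in seen:
            seen.add(pc)
            visited.append(pc)
            instr = bytecode[pc]
            if instr == "RET":
                break                    # path ends; resume from the stack
            if instr == "JUMP_BACK":
                pc = bytecode[pc + 1]    # loop edge; stops once pc was seen
                continue
            if instr == "JUMP_IF":
                todo.append(bytecode[pc + 1])
            pc += 1 + OPERANDS[instr]    # fall through
    return visited


prog = ["INT", 0, "JUMP_IF", 7, "INT", 1, "RET", "INT", 2, "RET"]
print(traverse(prog))  # [0, 2, 4, 6, 7, 9]
```

Both return paths (pcs 6 and 9) are reached, although any single execution takes only one of them.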
  47. Shallow Tracing Leaves CALLs and Doesn’t Exec. Bodies

    • @dont_look_inside: leaves a call to the decorated function in the trace
      − but still executes the body during tracing
    • if dummy: return : skips executing the body during tracing
      − dummy turns into False after tracing

    Interpreter:

        while True:
            instr = bytecode[pc++]
            if instr == INC:
                handler_INC(stack)
            elif instr == CALL:
                handler_CALL(stack)
            elif ...
            ...

        @dont_look_inside
        def handler_INC(stack, dummy=True):
            if dummy: return
            x = stack[--sp]
            z = add(x, 1)
            stack[sp++] = z

        @dont_look_inside
        def handler_CALL(stack, dummy=True):
            if dummy: return
            r = interp(stack, ..)
            push(stack, r)

    Resulting trace:

        # INC
        call(handler_INC, ..)
        # CALL
        call(handler_CALL, ..)
        ...
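RPython's real `dont_look_inside` hint lives in `rpython.rlib.jit`. The plain-Python imitation below only shows how the two pieces on the slide interact. `ToyTracer`, `run_trace` and the recorded op format are hypothetical. While tracing, the decorator records a call op and still runs the body, but the body returns at once because `dummy` is True. The "compiled" replay then passes `dummy=False`, so the real work happens.

```python
import functools


class ToyTracer:
    """Minimal stand-in for a meta-tracer's recorder (hypothetical)."""

    def __init__(self):
        self.active = False   # True while a trace is being recorded
        self.ops = []         # recorded ("call", function name) ops


tracer = ToyTracer()


def dont_look_inside(func):
    """Toy imitation of @dont_look_inside: while tracing, record a call op
    instead of tracing into the body. The body itself still runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if tracer.active:
            tracer.ops.append(("call", func.__name__))
        return func(*args, **kwargs)
    return wrapper


@dont_look_inside
def handler_INC(stack, dummy=True):
    if dummy:                 # tracing time: skip the real work
        return
    stack.append(stack.pop() + 1)


HANDLERS = {"handler_INC": handler_INC}


def run_trace(ops, stack):
    """Run recorded call ops as 'compiled code', now passing dummy=False."""
    for _kind, name in ops:
        HANDLERS[name](stack, dummy=False)


stack = [41]
tracer.active = True
handler_INC(stack)            # records ("call", "handler_INC"); body returns early
handler_INC(stack)
tracer.active = False
print(stack)                  # [41]
run_trace(tracer.ops, stack)
print(stack)                  # [43]
```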
  48. Shallow Tracing is Implemented as a Decorator

    • @enable_shallow_tracing combines both mechanisms above: it leaves a call to the decorated function at tracing (like @dont_look_inside) and skips executing the body during tracing (like the "if dummy: return" check), so handlers no longer take a dummy parameter
      − the dispatch loop and the resulting trace are unchanged

        @enable_shallow_tracing
        def handler_INC(stack):
            x = stack[--sp]
            z = add(x, 1)
            stack[sp++] = z

        @enable_shallow_tracing
        def handler_CALL(stack):
            r = interp(stack, ..)
            push(stack, r)
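Folding both behaviours into one decorator can be sketched in plain Python. The decorator below is a hypothetical re-implementation, not the authors' code. While the toy tracer is active, it records a call op and returns without running the body. Otherwise it simply calls the function. Recording the argument objects is a simplification that lets the replay reuse the same stack.

```python
import functools


class ToyTracer:
    """Minimal stand-in for a tracing JIT's recorder (hypothetical)."""

    def __init__(self):
        self.active = False   # True while a trace is being recorded
        self.ops = []         # recorded (name, function, args) call ops


tracer = ToyTracer()


def enable_shallow_tracing(func):
    """Hypothetical decorator: while tracing, leave a call op and skip the body."""
    @functools.wraps(func)
    def wrapper(*args):
        if tracer.active:
            tracer.ops.append((func.__name__, func, args))
            return None                  # shallow: body not executed now
        return func(*args)               # normal execution
    return wrapper


@enable_shallow_tracing
def handler_INC(stack):
    stack.append(stack.pop() + 1)


def run_trace(ops):
    """Execute recorded call ops, as compiled code would."""
    for _name, func, args in ops:
        func(*args)


stack = [1]
tracer.active = True
handler_INC(stack)
handler_INC(stack)
tracer.active = False
print(stack)          # [1]: bodies skipped while tracing
run_trace(tracer.ops)
print(stack)          # [3]
```

The handler no longer mentions `dummy`, which matches the change from the previous slide.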
  49. Realize the Next Level by One-Step “Deeper” Shallow Tracing

    • Fold stack manipulations in shallow tracing
      − handler_INC is now traced, and only add is shallow-traced (dispatch loop unchanged)

        def handler_INC(stack):
            x = stack[--sp]
            z = add(x, 1)
            stack[sp++] = z

        @enable_shallow_tracing
        def add(x, y):
            return x + y

    Trace:

        L0:
            x1 = stack[--sp]
            z2 = add(x1, 1)
            stack[sp++] = z2
            guard_true(..)
            x3 = stack[--sp]
            z4 = add(x3, 1)
            stack[sp++] = z4
            jump(L0)
        L1:
            ...
            finish(..)
  50. Realize the Next Level by One-Step “Deeper” Shallow Tracing

    • Fold stack manipulations in shallow tracing
      − the push of z2 and the following pop into x3 cancel out, so the second increment becomes:

        z4 = add(z2, 1)
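The folding step can be sketched as a forwarding pass over the trace. When a pop reads a value pushed earlier in the same loop body, both stack operations are removed and the popped variable is renamed. This is a toy pass under an assumed tuple encoding of the slide's trace, not RPython's optimizer. It also skips resume data, which a real JIT would keep so it can rebuild the stack if `guard_true` fails.

```python
def fold_stack(trace):
    """Forward values pushed on the operand stack to the pops that read them.

    Ops are tuples: ("push", var), ("pop", var), ("call", result, fn, args),
    ("label", name), ("jump", name), or other ops such as ("guard_true",).
    A pop that reads a push recorded earlier in the same loop body is
    removed together with that push, and the popped variable is renamed.
    (A real JIT would also record folded slots in guard resume data.)
    """
    out = []        # optimized ops; None marks a push that was folded away
    pending = []    # indices in `out` of pushes not yet popped
    alias = {}      # popped variable -> variable that was pushed

    for op in trace:
        kind = op[0]
        if kind == "push":
            pending.append(len(out))
            out.append(("push", alias.get(op[1], op[1])))
        elif kind == "pop" and pending:
            i = pending.pop()
            alias[op[1]] = out[i][1]
            out[i] = None                     # push/pop pair cancels out
        elif kind == "call":
            _, result, fn, args = op
            out.append(("call", result, fn, tuple(alias.get(a, a) for a in args)))
        else:
            if kind in ("label", "jump"):
                pending.clear()               # never fold across loop edges
            out.append(op)
    return [op for op in out if op is not None]


trace = [
    ("label", "L0"),
    ("pop", "x1"),
    ("call", "z2", "add", ("x1", 1)),
    ("push", "z2"),
    ("guard_true",),
    ("pop", "x3"),
    ("call", "z4", "add", ("x3", 1)),
    ("push", "z4"),
    ("jump", "L0"),
]
for op in fold_stack(trace):
    print(op)   # includes ('call', 'z4', 'add', ('z2', 1))
```

The first pop and the last push stay because their values cross the loop edge. Only the round trip inside one iteration is folded, which yields `z4 = add(z2, 1)` as on the slide.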