Slide 1

Slide 1 text

Efficient and Thread-Safe Objects for Dynamically-Typed Languages
Benoit Daloze, Stefan Marr, Daniele Bonetta, Hanspeter Mössenböck

Slide 2

Slide 2 text

Introduction
We are in the multi-core era, but:
Dynamic languages have poor support for parallel execution (e.g., Ruby, Python, JavaScript, ...)
Their object models are either not thread-safe or inefficient
Their objects allow adding or removing fields at run time

Slide 3

Slide 3 text

How is this executed?
@field
@field = value

Slide 4

Slide 4 text

How is this executed?
@field
@field = value
... when done concurrently on the same object?

Slide 5

Slide 5 text

A simple class
class Foo
  def a
    @a
  end
  def a=(v)
    @a = v
  end
  def b
    @b
  end
  def b=(v)
    @b = v
  end
end

Slide 6

Slide 6 text

What could go wrong?
obj = Foo.new
Thread.new { obj.a = "a" }
Thread.new do
  obj.b = "b"
  obj.instance_variables # => [:@a, :@b] OK
  obj.b                  # => "b" OK
end

Slide 7

Slide 7 text

What could go wrong?
obj = Foo.new
Thread.new { obj.a = "a" }
Thread.new do
  obj.b = "b"
  obj.instance_variables # => [:@a] ??
  obj.b                  # => nil ??
end

Slide 8

Slide 8 text

What could go wrong?
obj = Foo.new
Thread.new { obj.a = "a" }
Thread.new do
  obj.b = "b"
  obj.instance_variables # => [:@b] OK
  obj.b                  # => "a" ??
end

Slide 9

Slide 9 text

Outline
Object Models
The Problems
One Solution
Performance

Slide 10

Slide 10 text

Object Models

Slide 11

Slide 11 text

The Truffle Object Storage Model
Based on maps from the SELF programming language
An Efficient Implementation of SELF, a Dynamically-Typed Object-Oriented Language Based on Prototypes. C. Chambers, D. Ungar & E. Lee, 1991.
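To make the idea of maps concrete, here is a minimal Java sketch of SELF-style maps (called shapes in Truffle). It is not the Truffle implementation: the classes `Shape` and `Obj`, the field-to-index layout and the transition cache are illustrative assumptions. Objects built by the same sequence of field additions end up with the same shape, so a field access only needs the object's shape to find the storage index.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ShapeDemo {
    // Immutable description of an object's layout: field name -> storage index.
    static final class Shape {
        final Map<String, Integer> locations;
        // Transition cache: adding the same field to the same shape always
        // yields the same successor, so similarly built objects share a shape.
        private final Map<String, Shape> transitions = new HashMap<>();

        Shape(Map<String, Integer> locations) {
            this.locations = locations;
        }

        synchronized Shape addField(String field) {
            return transitions.computeIfAbsent(field, f -> {
                Map<String, Integer> next = new HashMap<>(locations);
                next.put(f, locations.size()); // next free storage index
                return new Shape(Map.copyOf(next));
            });
        }
    }

    static final Shape EMPTY = new Shape(Map.of());

    // An object is a shape pointer plus a storage array (not thread-safe).
    static final class Obj {
        Shape shape = EMPTY;
        Object[] storage = new Object[0];

        Object read(String field) {
            Integer index = shape.locations.get(field);
            return index == null ? null : storage[index]; // undefined field reads as nil
        }

        void write(String field, Object value) {
            Integer index = shape.locations.get(field);
            if (index == null) { // new field: grow the storage, then switch shape
                Shape newShape = shape.addField(field);
                index = newShape.locations.get(field);
                storage = Arrays.copyOf(storage, storage.length + 1);
                storage[index] = value;
                shape = newShape;
            } else {
                storage[index] = value;
            }
        }
    }

    public static void main(String[] args) {
        Obj p = new Obj();
        p.write("a", 1);
        p.write("b", 2);
        Obj q = new Obj();
        q.write("a", 3);
        q.write("b", 4);
        System.out.println(p.shape == q.shape);              // true
        System.out.println(p.read("b") + " " + q.read("a")); // 2 3
    }
}
```

The transition cache is what makes shapes cheap to compare: a compiled field access can check `obj.shape == expectedShape` and then use a constant index.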

Slide 12

Slide 12 text

The Truffle Object Storage Model
An Object Storage Model for the Truffle Language Implementation Framework. A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer & H. Mössenböck, 2014.

Slide 13

Slide 13 text

The Truffle Object Storage Model
An Object Storage Model for the Truffle Language Implementation Framework. A. Wöß, C. Wirth, D. Bonetta, C. Seaton, C. Humer & H. Mössenböck, 2014.

Slide 14

Slide 14 text

The Problems

Slide 15

Slide 15 text

The 3 Safety Problems
Lost Field Definitions
Out-Of-Thin-Air Values
Lost Field Updates

Slide 16

Slide 16 text

Lost Field Definitions
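A lost field definition can be reproduced deterministically. The Java sketch below uses no real threads: the hypothetical `Definer` class splits one thread's unsynchronized field definition into a prepare step (read shape and storage, build grown copies) and two publish steps, and `replay()` runs them in one bad order. Both "threads" build from the same empty snapshot, so whichever publishes last wins and the other definition disappears. The map-based shape is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LostFieldDefinition {
    static final class Obj {
        Map<String, Integer> shape = Map.of(); // field name -> storage index
        Object[] storage = new Object[0];
    }

    // One thread's unsynchronized field definition, split into replayable steps.
    static final class Definer {
        final Obj obj;
        Map<String, Integer> newShape;
        Object[] newStorage;

        Definer(Obj obj) { this.obj = obj; }

        void prepare(String field, Object value) {
            int size = obj.storage.length;
            newStorage = Arrays.copyOf(obj.storage, size + 1);
            newStorage[size] = value;
            Map<String, Integer> s = new HashMap<>(obj.shape);
            s.put(field, size);
            newShape = Map.copyOf(s);
        }

        void publishStorage() { obj.storage = newStorage; }
        void publishShape() { obj.shape = newShape; }
    }

    static Obj replay() {
        Obj obj = new Obj();
        Definer t1 = new Definer(obj), t2 = new Definer(obj);
        t1.prepare("a", "a");   // both threads read the empty object ...
        t2.prepare("b", "b");
        t1.publishStorage();    // ... T1 finishes first ...
        t1.publishShape();
        t2.publishStorage();    // ... and T2 overwrites both references
        t2.publishShape();
        return obj;
    }

    public static void main(String[] args) {
        Obj obj = replay();
        System.out.println(obj.shape.keySet());           // [b]  (field a is lost)
        System.out.println(Arrays.toString(obj.storage)); // [b]
    }
}
```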

Slide 17

Slide 17 text

Out-Of-Thin-Air Values
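An out-of-thin-air value can also be replayed deterministically in Java, without real threads. The hypothetical `Definer` class splits one thread's unsynchronized field definition into prepare, publish-storage and publish-shape steps; both threads pick index 0 for their new field. In the order chosen by `replay()`, the object ends up with T2's shape, which maps `b` to index 0, but with T1's storage, which holds `"a"` at index 0, so reading `b` returns a value that was never written to `b`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OutOfThinAir {
    static final class Obj {
        Map<String, Integer> shape = Map.of(); // field name -> storage index
        Object[] storage = new Object[0];
    }

    // One thread's unsynchronized field definition, split into replayable steps.
    static final class Definer {
        final Obj obj;
        Map<String, Integer> newShape;
        Object[] newStorage;

        Definer(Obj obj) { this.obj = obj; }

        void prepare(String field, Object value) {
            int size = obj.storage.length;
            newStorage = Arrays.copyOf(obj.storage, size + 1);
            newStorage[size] = value;
            Map<String, Integer> s = new HashMap<>(obj.shape);
            s.put(field, size);
            newShape = Map.copyOf(s);
        }

        void publishStorage() { obj.storage = newStorage; }
        void publishShape() { obj.shape = newShape; }
    }

    static Object read(Obj obj, String field) {
        Integer index = obj.shape.get(field);
        return index == null ? null : obj.storage[index];
    }

    static Obj replay() {
        Obj obj = new Obj();
        Definer t1 = new Definer(obj), t2 = new Definer(obj);
        t1.prepare("a", "a");  // both threads pick index 0 for their new field
        t2.prepare("b", "b");
        t2.publishStorage();
        t1.publishStorage();   // T1's storage ["a"] wins ...
        t1.publishShape();
        t2.publishShape();     // ... but T2's shape {b=0} wins
        return obj;
    }

    public static void main(String[] args) {
        Obj obj = replay();
        System.out.println(obj.shape.keySet() + " b=" + read(obj, "b")); // [b] b=a
    }
}
```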

Slide 18

Slide 18 text

Lost Field Updates
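A lost field update needs only one thread that defines a field and one that updates an existing one. In this deterministic Java replay (no real threads; `Obj`, `Definer` and `update` are hypothetical), the object already has `x = 1`. T2 copies the storage while defining `y`, T1 then writes `x = 2` into the old storage array, and T2 finally installs its copy, in which `x` is still 1.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LostFieldUpdate {
    static final class Obj {
        Map<String, Integer> shape = Map.of(); // field name -> storage index
        Object[] storage = new Object[0];
    }

    // One thread's unsynchronized field definition, split into replayable steps.
    static final class Definer {
        final Obj obj;
        Map<String, Integer> newShape;
        Object[] newStorage;

        Definer(Obj obj) { this.obj = obj; }

        void prepare(String field, Object value) {
            int size = obj.storage.length;
            newStorage = Arrays.copyOf(obj.storage, size + 1); // snapshot of current values
            newStorage[size] = value;
            Map<String, Integer> s = new HashMap<>(obj.shape);
            s.put(field, size);
            newShape = Map.copyOf(s);
        }

        void publishStorage() { obj.storage = newStorage; }
        void publishShape() { obj.shape = newShape; }
    }

    // Unsynchronized write to a field that already exists.
    static void update(Obj obj, String field, Object value) {
        obj.storage[obj.shape.get(field)] = value;
    }

    static Object read(Obj obj, String field) {
        Integer index = obj.shape.get(field);
        return index == null ? null : obj.storage[index];
    }

    static Obj replay() {
        Obj obj = new Obj();
        obj.shape = Map.of("x", 0); // the object already has x = 1
        obj.storage = new Object[] {1};
        Definer t2 = new Definer(obj);
        t2.prepare("y", "y");  // T2 copies the storage: [1, y]
        update(obj, "x", 2);   // T1 writes x = 2 into the old storage array
        t2.publishStorage();   // T2 installs its copy, in which x is still 1
        t2.publishShape();
        return obj;
    }

    public static void main(String[] args) {
        Obj obj = replay();
        System.out.println("x=" + read(obj, "x") + " y=" + read(obj, "y")); // x=1 y=y
    }
}
```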

Slide 19

Slide 19 text

Defining a new field
Grow the object storage (allocate, copy, update pointer):
obj.storage = copy(obj.storage, size + 1)
and write the value, where size is the old storage size:
obj.storage[size] = value
Update the Shape pointer:
obj.shape = newShape
The two reference fields, shape and storage, cannot be read and written atomically without synchronization!
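These steps, written out sequentially in Java. This is a sketch with a hypothetical `Obj` whose shape is modeled as an immutable map from field name to storage index. The new value goes to index `size`, the old storage length, and the shape and storage updates are two independent reference writes, which is what makes the three safety problems possible.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class DefineField {
    static final class Obj {
        Map<String, Integer> shape = Map.of(); // stands in for a Shape: field -> index
        Object[] storage = new Object[0];
    }

    // Unsynchronized definition of a new field, following the slide's steps.
    static void defineField(Obj obj, String field, Object value) {
        int size = obj.storage.length;
        // 1. Grow the storage: allocate, copy, update the pointer ...
        obj.storage = Arrays.copyOf(obj.storage, size + 1);
        // ... and write the value into the new last slot (index size).
        obj.storage[size] = value;
        // 2. Update the shape pointer.
        Map<String, Integer> newShape = new HashMap<>(obj.shape);
        newShape.put(field, size);
        obj.shape = Map.copyOf(newShape);
        // obj.storage and obj.shape are two separate references: another
        // thread can observe the object between step 1 and step 2.
    }

    static Object read(Obj obj, String field) {
        Integer index = obj.shape.get(field);
        return index == null ? null : obj.storage[index];
    }

    public static void main(String[] args) {
        Obj obj = new Obj();
        defineField(obj, "a", "a");
        defineField(obj, "b", "b");
        System.out.println(read(obj, "a") + " " + read(obj, "b")); // a b
    }
}
```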

Slide 20

Slide 20 text

Can we just synchronize field updates?
[Figure: writing to a field in a loop; median time per 10M writes (ms): Unsafe 30, Synchronized 290]
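A naive Java harness in the spirit of this measurement: it writes a field 10M times with and without a `synchronized` block and reports the median of several runs. It is only a sketch: the class and method names are made up, it prints whatever your machine measures rather than the numbers above, and a JIT may merge, hoist or otherwise optimize the writes and the uncontended lock, so a careful measurement would use a dedicated harness such as JMH.

```java
import java.util.Arrays;

public class SyncWriteBench {
    static final class Holder {
        int field;
    }

    // Kept in a static field so escape analysis cannot remove the lock;
    // the JIT may still coarsen or otherwise optimize it.
    static final Holder HOLDER = new Holder();

    static void unsafeWrites(Holder h, int n) {
        for (int i = 0; i < n; i++) {
            h.field = i;
        }
    }

    static void synchronizedWrites(Holder h, int n) {
        for (int i = 0; i < n; i++) {
            synchronized (h) {
                h.field = i;
            }
        }
    }

    // Runs body `runs` times and returns the median wall-clock time in ms.
    static long medianMillis(Runnable body, int runs) {
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            body.run();
            times[r] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(times);
        return times[runs / 2];
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long unsafe = medianMillis(() -> unsafeWrites(HOLDER, n), 11);
        long synced = medianMillis(() -> synchronizedWrites(HOLDER, n), 11);
        System.out.println("Unsafe:       " + unsafe + " ms per 10M writes");
        System.out.println("Synchronized: " + synced + " ms per 10M writes");
    }
}
```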

Slide 21

Slide 21 text

One Solution

Slide 22

Slide 22 text

Local and Shared Objects

Slide 23

Slide 23 text

Local and Shared Objects

Slide 24

Slide 24 text

Synchronize only writes to shared objects
Choices:
Synchronize only writes to shared objects
Reads from shared objects are unsynchronized
Motivation:
Reads from shared objects are more frequent than writes: 28× more frequent in the concurrent DaCapo benchmarks!
A Black-box Approach to Understanding Concurrency in DaCapo. T. Kalibera, M. Mole, R. Jones & J. Vitek, 2012.
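These choices can be sketched in Java, assuming a hypothetical shape that carries a `shared` flag next to the field layout. A write checks the flag: a local object is written without any synchronization, a shared one under `synchronized (this)`, while reads never synchronize. Because all writes to a shared object, field definitions included, are serialized, concurrently defining distinct fields on a shared object loses none of them. This is a simplified model, not the Truffle code, and it does not address the Java memory-model subtleties of unsynchronized reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalSharedWrites {
    // Immutable shape: field layout plus whether objects with it are shared.
    static final class Shape {
        final Map<String, Integer> locations;
        final boolean shared;

        Shape(Map<String, Integer> locations, boolean shared) {
            this.locations = locations;
            this.shared = shared;
        }

        Shape withField(String field) {
            Map<String, Integer> m = new HashMap<>(locations);
            m.put(field, locations.size());
            return new Shape(Map.copyOf(m), shared);
        }

        Shape asShared() { return new Shape(locations, true); }
    }

    static final class Obj {
        Shape shape = new Shape(Map.of(), false);
        Object[] storage = new Object[0];

        // Reads are never synchronized, for local and shared objects alike.
        Object read(String field) {
            Shape s = shape;
            Object[] st = storage;
            Integer index = s.locations.get(field);
            // Defensive: a racy reader might pair a newer shape with older storage.
            return (index == null || index >= st.length) ? null : st[index];
        }

        void write(String field, Object value) {
            if (shape.shared) {
                synchronized (this) { writeUnsynchronized(field, value); }
            } else {
                writeUnsynchronized(field, value); // local: no other thread can see it
            }
        }

        private void writeUnsynchronized(String field, Object value) {
            Integer index = shape.locations.get(field);
            if (index != null) {
                storage[index] = value;
                return;
            }
            int size = storage.length;
            storage = Arrays.copyOf(storage, size + 1);
            storage[size] = value;
            shape = shape.withField(field); // maps field to index `size`
        }
    }

    // Each thread defines one distinct field on the same shared object.
    static Obj defineFieldsConcurrently(int threads) {
        Obj obj = new Obj();
        obj.shape = obj.shape.asShared(); // e.g. the object became globally reachable
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            String field = "f" + i;
            Thread t = new Thread(() -> obj.write(field, field));
            t.start();
            started.add(t);
        }
        for (Thread t : started) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return obj;
    }

    public static void main(String[] args) {
        Obj obj = defineFieldsConcurrently(8);
        System.out.println(obj.shape.locations.size()); // 8: no definition is lost
    }
}
```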

Slide 25

Slide 25 text

One Solution: synchronize on shared objects
Lost Field Definitions and Lost Field Updates:
Synchronize writes, but only on shared objects
Local objects need no synchronization
Out-Of-Thin-Air Values:
Use different storage locations for each field: a storage location of an object is only ever used for one field
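One way a shape could guarantee that a storage location is only ever used for one field is to hand out indices from a counter that only grows, so removing a field retires its index instead of freeing it. The Java sketch below assumes this counter-based scheme; `Shape`, `addField`, `removeField` and `nextIndex` are illustrative names. After adding `a`, removing it and adding `b`, field `b` gets index 1 rather than reusing index 0, so a value found at a given index was always written for the same field.

```java
import java.util.HashMap;
import java.util.Map;

public class NoLocationReuse {
    static final class Shape {
        final Map<String, Integer> locations; // field name -> storage index
        final int nextIndex; // only grows: indices of removed fields are never reused

        Shape(Map<String, Integer> locations, int nextIndex) {
            this.locations = locations;
            this.nextIndex = nextIndex;
        }

        Shape addField(String field) {
            Map<String, Integer> m = new HashMap<>(locations);
            m.put(field, nextIndex);
            return new Shape(Map.copyOf(m), nextIndex + 1);
        }

        Shape removeField(String field) {
            Map<String, Integer> m = new HashMap<>(locations);
            m.remove(field);
            return new Shape(Map.copyOf(m), nextIndex); // the removed index is retired
        }
    }

    static final Shape EMPTY = new Shape(Map.of(), 0);

    public static void main(String[] args) {
        Shape s = EMPTY.addField("a").removeField("a").addField("b");
        System.out.println(s.locations + " next=" + s.nextIndex); // {b=1} next=2
    }
}
```

The cost of this choice is that storage arrays may keep dead slots after fields are removed.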

Slide 26

Slide 26 text

Tracking the set of shared objects
All globally-reachable objects are initially shared, transitively
Writing to a shared object ⇒ share the written value, transitively
# Shares 1 Array, 1 Object, 1 Hash and 1 String
shared_obj.field = [Object.new, { "a" => 1 }]
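How the initial set of shared objects might be computed: the Java sketch below marks everything transitively reachable from a set of roots, standing in for global variables and constants, as shared. The `HeapObject` class and its `shared` flag are a simplified, hypothetical heap model (in the object model of this talk the flag lives in the shape). The traversal uses a worklist and skips already-shared objects, so cycles terminate; an object that merely points into the shared graph, like `local` below, stays local.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class InitialSharing {
    static final class HeapObject {
        final String name;
        boolean shared;
        final List<HeapObject> refs = new ArrayList<>();

        HeapObject(String name) { this.name = name; }
    }

    // Shares everything reachable from the roots (e.g. globals and constants).
    static void shareReachable(List<HeapObject> roots) {
        Deque<HeapObject> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            HeapObject obj = work.pop();
            if (!obj.shared) {       // already-shared objects are not revisited
                obj.shared = true;
                work.addAll(obj.refs);
            }
        }
    }

    public static void main(String[] args) {
        HeapObject global = new HeapObject("global");
        HeapObject a = new HeapObject("a");
        HeapObject b = new HeapObject("b");
        HeapObject local = new HeapObject("local");
        global.refs.add(a);
        a.refs.add(b);
        b.refs.add(a);      // cycle
        local.refs.add(a);  // points into the shared graph, but is not reachable from it
        shareReachable(List.of(global));
        for (HeapObject o : List.of(global, a, b, local)) {
            System.out.println(o.name + " shared=" + o.shared);
        }
    }
}
```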

Slide 27

Slide 27 text

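Sharing: writing to a field of a shared object
void share(DynamicObject obj) {
  if (!isShared(obj.shape)) {
    // Switch to the shared shape first, so that cycles terminate
    obj.shape = sharedShape(obj.shape);
    for (Location location : obj.getObjectLocations()) {
      Object value = location.get(obj);
      if (value instanceof DynamicObject) {
        share((DynamicObject) value); // recursive call
      }
    }
  }
}
void writeBarrier(DynamicObject sharedObject, Location location, Object value) {
  if (value instanceof DynamicObject) {
    // Share the value before it becomes reachable from the shared object
    share((DynamicObject) value);
  }
  synchronized (sharedObject) {
    location.set(sharedObject, value);
  }
}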
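A self-contained, runnable Java version of this write barrier, under simplifying assumptions: the hypothetical `Shape` only records whether it is shared (with a lazily created shared twin), and this `DynamicObject` is a stand-in that keeps its references in a plain `Object[]` instead of typed storage locations. `share` switches the shape before recursing, so cycles terminate, and the barrier shares the value before publishing it, so no other thread can reach a local object through a shared one. The demo models `shared_obj.field = [Object.new, { "a" => 1 }]` and counts 4 newly shared objects; the Integer `1` is treated as a primitive and is not shared.

```java
public class WriteBarrierDemo {
    // Here a shape only tracks sharing; real shapes also describe the layout.
    static final class Shape {
        final String name;
        final boolean shared;
        private Shape sharedTwin; // created on first use

        Shape(String name, boolean shared) {
            this.name = name;
            this.shared = shared;
        }

        synchronized Shape sharedShape() {
            if (shared) return this;
            if (sharedTwin == null) sharedTwin = new Shape(name, true);
            return sharedTwin;
        }
    }

    static final class DynamicObject {
        Shape shape;
        final Object[] fields;

        DynamicObject(Shape shape, Object... fields) {
            this.shape = shape;
            this.fields = fields;
        }
    }

    // Shares obj and everything reachable from it; returns how many objects
    // became shared. The shape is switched before recursing so cycles terminate.
    static int share(DynamicObject obj) {
        if (obj.shape.shared) return 0;
        obj.shape = obj.shape.sharedShape();
        int count = 1;
        for (Object value : obj.fields) {
            if (value instanceof DynamicObject) count += share((DynamicObject) value);
        }
        return count;
    }

    // Write barrier for storing `value` into field `index` of a shared object.
    static int writeBarrier(DynamicObject sharedObject, int index, Object value) {
        int newlyShared = 0;
        if (value instanceof DynamicObject) {
            newlyShared = share((DynamicObject) value); // share before publishing
        }
        synchronized (sharedObject) {
            sharedObject.fields[index] = value;
        }
        return newlyShared;
    }

    public static void main(String[] args) {
        Shape objectShape = new Shape("Object", false);
        DynamicObject sharedObj = new DynamicObject(objectShape, new Object[1]); // one empty field
        share(sharedObj); // assume sharedObj is globally reachable

        // [Object.new, { "a" => 1 }]: the Integer 1 is a primitive, not an object
        DynamicObject string = new DynamicObject(new Shape("String", false));
        DynamicObject hash = new DynamicObject(new Shape("Hash", false), string, 1);
        DynamicObject array = new DynamicObject(new Shape("Array", false),
                new DynamicObject(objectShape), hash);
        System.out.println(writeBarrier(sharedObj, 0, array)); // 4
    }
}
```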

Slide 28

Slide 28 text

Sharing a Rectangle containing two Points
shared_obj.field = Rectangle.new(
  Point.new(1, 2),
  Point.new(4, 3))

Slide 29

Slide 29 text

Optimized Sharing for a Rectangle and two Points
Compiled with Truffle:
Self-optimizing AST interpreters. T. Würthinger, A. Wöß, L. Stadler, G. Duboscq, D. Simon & C. Wimmer, 2012.

Slide 30

Slide 30 text

Optimized Sharing: result after Partial Evaluation
void shareRectangle(DynamicObject rect) {
  if (rect.shape == localRectangleShape) {
    rect.shape = sharedRectangleShape;
  } else { /* Deoptimize */ }
  DynamicObject tl = rect.object1; // top-left Point
  if (tl.shape == localPointShape) {
    tl.shape = sharedPointShape;
  } else { /* Deoptimize */ }
  DynamicObject br = rect.object2; // bottom-right Point
  if (br.shape == localPointShape) {
    br.shape = sharedPointShape;
  } else { /* Deoptimize */ }
}
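The specialized code above can be approximated in plain Java by a sharer "specialized" for one Rectangle shape and one Point shape: it checks the expected shapes and switches them directly, and when a shape does not match it falls back to the generic recursive `share`, standing in for deoptimization. This is a hand-written sketch, not what Truffle generates; the shapes, the `Object[]` field layout and the assumption that Points hold only primitive coordinates (so the traversal can stop at them) are all illustrative.

```java
public class SpecializedSharing {
    static final class Shape {
        final boolean shared;
        private Shape sharedTwin;

        Shape(boolean shared) { this.shared = shared; }

        synchronized Shape sharedShape() {
            if (shared) return this;
            if (sharedTwin == null) sharedTwin = new Shape(true);
            return sharedTwin;
        }
    }

    static final class DynamicObject {
        Shape shape;
        final Object[] fields;

        DynamicObject(Shape shape, Object... fields) {
            this.shape = shape;
            this.fields = fields;
        }
    }

    static final Shape LOCAL_RECTANGLE = new Shape(false);
    static final Shape SHARED_RECTANGLE = LOCAL_RECTANGLE.sharedShape();
    static final Shape LOCAL_POINT = new Shape(false);
    static final Shape SHARED_POINT = LOCAL_POINT.sharedShape();

    // Generic path: recursive sharing, returns the number of newly shared objects.
    static int share(DynamicObject obj) {
        if (obj.shape.shared) return 0;
        obj.shape = obj.shape.sharedShape();
        int count = 1;
        for (Object value : obj.fields) {
            if (value instanceof DynamicObject) count += share((DynamicObject) value);
        }
        return count;
    }

    // Specialized path for a Rectangle holding two Points.
    static int shareRectangle(DynamicObject rect) {
        if (rect.shape != LOCAL_RECTANGLE) return share(rect); // "deoptimize"
        rect.shape = SHARED_RECTANGLE;
        return 1 + sharePoint(rect.fields[0]) + sharePoint(rect.fields[1]); // top-left, bottom-right
    }

    static int sharePoint(Object value) {
        if (!(value instanceof DynamicObject)) return 0; // primitives are never shared
        DynamicObject point = (DynamicObject) value;
        if (point.shape != LOCAL_POINT) return share(point); // "deoptimize"
        point.shape = SHARED_POINT;
        return 1; // no recursion: a Point's fields are assumed to be primitives
    }

    public static void main(String[] args) {
        DynamicObject rect = new DynamicObject(LOCAL_RECTANGLE,
                new DynamicObject(LOCAL_POINT, 1, 2), new DynamicObject(LOCAL_POINT, 4, 3));
        System.out.println(shareRectangle(rect)); // 3
    }
}
```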

Slide 31

Slide 31 text

Performance

Slide 32

Slide 32 text

Performance: Are we fast yet?
[Figure: performance of MRI 2.3, JRuby 9.0.4, Node.js, JRuby+Truffle and Java 1.8.0u66]
Cross-Language Compiler Benchmarking: Are We Fast Yet? S. Marr, B. Daloze & H. Mössenböck, 2016.

Slide 33

Slide 33 text

Impact on Sequential Performance
Peak performance, normalized to Unsafe, lower is better
[Figure: Bounce, DeltaBlue, JSON, List, NBody, Richards and Towers under the Unsafe, Safe and All Shared configurations]
All Shared synchronizes on all object writes.
All object-related benchmarks from Cross-Language Compiler Benchmarking: Are We Fast Yet? S. Marr, B. Daloze & H. Mössenböck, 2016.

Slide 34

Slide 34 text

Performance for Parallel Actor Benchmarks
[Figure: runtime normalized to Unsafe (lower is better) for APSP, RadixSort and Trapezoidal under the Scala Akka, Unsafe, Safe and No Deep Sharing configurations]
Benchmarks from Savina – An Actor Benchmark Suite. S. Imam & V. Sarkar, 2014.

Slide 35

Slide 35 text

Conclusion
Concurrently growing objects need synchronization so that updates and new fields are not lost
Distinguishing local and shared objects reduces the overhead:
Only writes to shared objects are synchronized
This needs a write barrier (which can be specialized)
Thread-safe objects in dynamic languages:
Zero cost on sequential peak performance
Low overhead on parallel code
