
Making Collections in Dynamic Languages Thread-Safe and Efficient

Benoit Daloze
September 29, 2017

Built-in collections in dynamic programming languages are either inefficient or thread-unsafe. In this talk, I show an approach that makes the two main built-in collections in Ruby (Array and Hash) both as efficient as unsynchronized implementations and as thread-safe as if they were always synchronized. Moreover, I show that common operations on these collections can scale to many cores.

Transcript

  1. Concurrent Storage Strategies: Making Collections in Dynamic Languages Thread-Safe and Efficient
    Benoit Daloze, Arie Tal, Stefan Marr, Hanspeter Mössenböck, Erez Petrank
  2. Introduction
    We are in the multi-core era, but:
    - Dynamically-typed languages have poor support for parallel execution (e.g., Ruby, Python, JavaScript, ...)
    - Built-in collections are either inefficient or thread-unsafe
  3. Built-in collections
    Synchronization on collections, per implementation:
    - CRuby: Global Interpreter Lock ⇒ no parallelism
    - CPython: Global Interpreter Lock ⇒ no parallelism
    - Jython: synchronized ⇒ slow single-threaded, no scaling
    - JRuby: no synchronization ⇒ unsafe with multiple threads
    - Nashorn: no synchronization ⇒ unsafe with multiple threads
  4. Appending concurrently
    ```ruby
    array = []
    # Create 100 threads
    100.times.map {
      Thread.new {
        # Append 1000 integers to the array
        1000.times { |i| array << i }
      }
    }.each { |thread| thread.join }
    puts array.size
    ```
  7. Appending concurrently
    MRI/CRuby, the reference implementation with a GIL:
    ```
    $ ruby append.rb
    100000
    ```
    JRuby, on the JVM with concurrent threads:
    ```
    $ jruby append.rb
    64324
    # If you are not lucky
    ConcurrencyError: Detected invalid array contents due to unsynchronized modifications with concurrent users
        << at org/jruby/RubyArray.java:1256
     block at append.rb:8
    zsh: exit 1
    ```
  8. Appending concurrently
    TruffleRuby, on top of GraalVM with concurrent threads:
    ```
    $ truffleruby append.rb
    77148
    # If you are not lucky
    append.rb:8:in `<<': 1338 (RubyTruffleError)
    ArrayIndexOutOfBoundsException IntegerArrayMirror.set
        from append.rb:8:in `block (2 levels) in <main>'
    zsh: exit 1
    ```
  9. Appending concurrently
    TruffleRuby, with Thread-Safe Collections:
    ```
    $ truffleruby-safe append.rb
    100000
    ```
  10. Ruby built-in collections
    - Array (a stack, a queue, a deque, set-like operations)
    - Hash (compares keys by #hash + #eql?, or by identity)
    - String (mutable)
    That’s all!
  11. Goals
    - Dynamic languages have few but very versatile built-in collections
    - This enables a programming style that does not require many upfront decisions (e.g., choosing a collection implementation)
    - The same collections are used for both single-threaded and multi-threaded workloads
    - The collections should be efficient
    - The collections should scale when used concurrently
  12. Outline: Tracking Sharing, Concurrent Arrays, Performance

  13. Section: Tracking Sharing

  14. Local and Shared Objects
    Only synchronize on objects that are accessed concurrently.
    Tracking this exactly is expensive, so we make an over-approximation: track all objects that can be accessed concurrently, based on reachability.
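
To make the reachability-based over-approximation concrete, here is a minimal Java sketch. The `DynObject` class, its `fields` list and its `shared` flag are hypothetical names, not TruffleRuby's object model. When an object becomes reachable by other threads (for example, by being assigned to a global variable, which is an assumption of this sketch), it and everything reachable from it are marked as shared; objects that are never marked stay local and need no synchronization.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical object model: each object has fields and a local/shared flag.
class DynObject {
    final List<Object> fields = new ArrayList<>();
    boolean shared = false;

    // Marks this object and every DynObject reachable from it as shared.
    // An explicit worklist avoids deep recursion on long object chains.
    void share() {
        Deque<DynObject> worklist = new ArrayDeque<>();
        worklist.push(this);
        while (!worklist.isEmpty()) {
            DynObject obj = worklist.pop();
            if (obj.shared) {
                continue; // already shared: skip it (this also terminates on cycles)
            }
            obj.shared = true;
            for (Object value : obj.fields) {
                if (value instanceof DynObject) {
                    worklist.push((DynObject) value);
                }
            }
        }
    }
}

class SharingDemo {
    public static void main(String[] args) {
        DynObject leaf = new DynObject();
        DynObject root = new DynObject();
        DynObject unrelated = new DynObject();
        root.fields.add(leaf);
        root.fields.add(42);   // boxed primitives need no tracking
        leaf.fields.add(root); // a cycle
        root.share();          // e.g. root was assigned to a global variable
        System.out.println(root.shared + " " + leaf.shared + " " + unrelated.shared);
    }
}
```

Because reachability over-approximates "actually accessed by another thread", some shared objects may in fact only ever be used by one thread; the trade-off is that the check itself is cheap.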
  15. Local and Shared Objects
    Efficient and Thread-Safe Objects for Dynamically-Typed Languages. B. Daloze, S. Marr, D. Bonetta, H. Mössenböck, OOPSLA’16.
  17. Extending Sharing to Collections
    Collections are objects, so they can track sharing the same way.
    Shared collections use a write barrier when an element is added to the collection:
    ```ruby
    shared_array[3] = Object.new
    shared_hash["foo"] = "bar"
    ```
    Collections can change their representation when shared.
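
A write barrier on a shared collection can be sketched in Java as follows, assuming hypothetical `DynObject` and `TrackedArray` classes (not TruffleRuby's API). Before a value is stored into a shared collection, the value is shared transitively, so other threads can never reach a still-local object through the collection. Local collections skip the barrier entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical object with a local/shared flag and outgoing references.
class DynObject {
    final List<Object> refs = new ArrayList<>();
    boolean shared = false;

    // Shares this object and everything reachable from it
    // (recursive for brevity; a worklist would avoid deep recursion).
    void share() {
        if (shared) return; // invariant: reachable objects are already shared
        shared = true;
        for (Object ref : refs) {
            if (ref instanceof DynObject) ((DynObject) ref).share();
        }
    }
}

// Hypothetical collection: it is an object too, so it tracks sharing the same
// way. Its elements live in `refs`, so sharing the array shares its elements.
class TrackedArray extends DynObject {
    void append(Object value) {
        if (shared && value instanceof DynObject) {
            // Write barrier: the value becomes reachable by other threads,
            // so share it before publishing it in the collection.
            ((DynObject) value).share();
        }
        refs.add(value); // note: the append itself is not synchronized here
    }

    Object get(int index) {
        return refs.get(index);
    }
}

class WriteBarrierDemo {
    public static void main(String[] args) {
        TrackedArray localArray = new TrackedArray();
        DynObject a = new DynObject();
        localArray.append(a);  // local collection: no barrier, `a` stays local

        TrackedArray sharedArray = new TrackedArray();
        sharedArray.share();   // e.g. the array was assigned to a global variable
        DynObject b = new DynObject();
        sharedArray.append(b); // barrier runs: `b` becomes shared
        System.out.println(a.shared + " " + b.shared);
    }
}
```

The barrier only costs a flag check for local collections, which is why single-threaded code that never shares its collections is unaffected.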
  18. Impact on Single-Threaded Performance
    [Chart: peak performance normalized to TruffleRuby (lower is better) of TruffleRuby and TruffleRuby with Concurrent Collections on the benchmarks Bounce, List, Mandelbrot, NBody, Permute, Queens, Sieve, Storage, Towers, DeltaBlue, Json, and Richards.]
    No difference, because these benchmarks do not use shared collections.
    Benchmarks from Cross-Language Compiler Benchmarking: Are We Fast Yet? S. Marr, B. Daloze, H. Mössenböck, DLS’16.
  19. Section: Concurrent Arrays

  20. Array storage strategies
    [Diagram: transitions between the storage strategies empty, int[], long[], double[], and Object[], triggered by storing an int, a long, a double, or an Object.]
    ```ruby
    array = []      # empty
    array << 1      # int[]
    array << "foo"  # Object[]
    ```
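
The transitions can be sketched as a function that picks the most specific storage able to hold both the existing elements and the new value. This Java sketch rests on assumed rules (ints widen to long[], doubles stay in double[], and any other mix falls back to Object[]); the `Strategy` enum and the `afterStore` method are hypothetical names, not TruffleRuby's.

```java
// Hypothetical names: the strategies from the diagram, as an enum.
enum Strategy { EMPTY, INT, LONG, DOUBLE, OBJECT }

class StorageStrategies {
    // The most specific strategy that can hold `value` on its own.
    static Strategy forValue(Object value) {
        if (value instanceof Integer) return Strategy.INT;
        if (value instanceof Long) return Strategy.LONG;
        if (value instanceof Double) return Strategy.DOUBLE;
        return Strategy.OBJECT;
    }

    // The strategy an Array moves to after storing `value` (assumed rules).
    static Strategy afterStore(Strategy current, Object value) {
        Strategy needed = forValue(value);
        if (current == Strategy.EMPTY || current == needed) return needed;
        // int and long values can share a long[] storage.
        if ((current == Strategy.INT && needed == Strategy.LONG)
                || (current == Strategy.LONG && needed == Strategy.INT)) {
            return Strategy.LONG;
        }
        // Any other mix (e.g. ints with doubles, or any non-number) needs Object[].
        return Strategy.OBJECT;
    }

    public static void main(String[] args) {
        Strategy s = Strategy.EMPTY; // array = []
        s = afterStore(s, 1);        // array << 1
        System.out.println(s);
        s = afterStore(s, "foo");    // array << "foo"
        System.out.println(s);
    }
}
```

Transitions only ever generalize, so an Array that has reached Object[] stays there; this keeps homogeneous numeric Arrays compact without repeated conversions.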
  21. A Closer Look at Array
    ```java
    class RubyArray {
        // null, int[], long[], double[] or Object[]
        Object storage;
        // Invariant: size <= storage.length
        int size;
    }
    ```
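
Because an append reads `size`, stores into `storage`, and then updates `size` as separate steps, two unsynchronized appends can interleave. The Java sketch below replays one such interleaving deterministically on a single thread, using a simplified, hypothetical `int[]`-only `SimpleArray`, and shows how an element gets lost; this is consistent with the too-small sizes printed by JRuby and TruffleRuby in the runs above. A similar race between one thread replacing `storage` with a larger array and another indexing into it can plausibly produce out-of-bounds errors like the one TruffleRuby reported.

```java
// Simplified, int[]-only stand-in for the RubyArray fields shown above.
class SimpleArray {
    int[] storage = new int[4];
    int size = 0; // invariant: size <= storage.length
}

class LostAppendDemo {
    // Replays, step by step on one thread, an interleaving that two real
    // threads could produce when both run `array << x` without synchronization.
    static SimpleArray replayRacyAppends() {
        SimpleArray array = new SimpleArray();

        // Step 1: threads A and B both read `size` before either updates it.
        int indexA = array.size; // 0
        int indexB = array.size; // 0

        // Step 2: both store their element at the index they computed.
        array.storage[indexA] = 10; // A's element
        array.storage[indexB] = 20; // B's element overwrites A's

        // Step 3: both write back size + 1.
        array.size = indexA + 1;
        array.size = indexB + 1;

        return array; // two appends ran, but only one element is stored
    }

    public static void main(String[] args) {
        SimpleArray array = replayRacyAppends();
        System.out.println("size=" + array.size + ", storage[0]=" + array.storage[0]);
    }
}
```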
  22. Concurrent Arrays
    Goals:
    - Each Array operation should appear atomic
    - Keep the compact representation of storage strategies
    - Scale concurrent reads and writes, as they are frequent in many usages
  23. Concurrent Array Strategies
    [Diagram: the storage strategies (empty, int[], long[], double[], Object[]) and the concurrent strategies. On sharing, an Array with int[], long[], double[], or Object[] storage transitions to the SharedFixedStorage strategy of the same type. SharedDynamicStorage can hold any of the five storages (empty, int[], long[], double[], Object[]). An internal storage change (<<, delete, etc.) on a SharedFixedStorage Array migrates it to SharedDynamicStorage.]
  24. SharedFixedStorage
    Assumes the storage (e.g., int[16]) does not need to change ⇒ the Array size and the type of the elements fit the storage.
    If so, the Array can be accessed in parallel, without any synchronization and without any overhead (except the write barrier).
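
The SharedFixedStorage fast path can be sketched in Java as a write that succeeds only when the index is within the current size and the value fits the element type; otherwise the caller learns that the Array must first migrate. This is a hypothetical sketch for an `int[]`-backed Array: `SharedFixedIntStorage` and `tryWrite` are illustrative names, and the write barrier (not needed for plain ints) and the migration itself are omitted.

```java
// Hypothetical SharedFixedStorage for an int[]-backed Array. While the Array
// uses this strategy, neither the storage array nor the size changes, so
// element reads and writes are plain array accesses without a lock.
class SharedFixedIntStorage {
    private final int[] storage;
    private final int size;

    SharedFixedIntStorage(int[] storage, int size) {
        this.storage = storage;
        this.size = size;
    }

    // Ruby returns nil for out-of-range reads; null stands in for nil here.
    // (Negative, from-the-end indices are not handled in this sketch.)
    Object read(int index) {
        if (index < 0 || index >= size) return null;
        return storage[index]; // plain read, no synchronization
    }

    // Returns true if the write fits the fixed storage and was done in place,
    // false if the Array must first migrate to SharedDynamicStorage.
    boolean tryWrite(int index, Object value) {
        if (index < 0 || index >= size) return false;  // would change the size
        if (!(value instanceof Integer)) return false; // would change the element type
        storage[index] = (Integer) value;              // plain write, no synchronization
        return true;
    }

    public static void main(String[] args) {
        SharedFixedIntStorage array = new SharedFixedIntStorage(new int[] {1, 2, 3}, 3);
        System.out.println(array.tryWrite(1, 20));       // fits: written in place
        System.out.println(array.tryWrite(1, "object")); // type change: must migrate
        System.out.println(array.tryWrite(3, 4));        // append: must migrate
        System.out.println(array.read(1));
    }
}
```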
  25. Migrating to SharedDynamicStorage
    What if we need to change the storage?
    ```ruby
    $array = [1, 2, 3] # SharedFixedStorage
    # All of these migrate to SharedDynamicStorage
    $array[1] = Object.new
    $array << 4
    $array.delete_at(1)
    ```
    We use a Guest-Language Safepoint to migrate to SharedDynamicStorage.
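
A safepoint lets one thread change the Array's strategy while every other guest thread is paused between operations, so no thread observes the Array halfway through a migration. The Java sketch below is a deliberately simple, monitor-based stand-in, assuming a fixed set of threads that all call a hypothetical `poll()` method between Array operations; the class and method names are illustrative, and a production safepoint would make the polling fast path far cheaper than a synchronized call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Minimal monitor-based stand-in for a guest-language safepoint. Assumes a
// fixed number of guest threads that all keep calling poll() between operations.
class Safepoint {
    private final int parties;         // guest threads, including the requester
    private boolean requested = false; // an action is waiting to run
    private int parked = 0;            // threads currently parked in poll()
    private long epoch = 0;            // incremented after each action

    Safepoint(int parties) { this.parties = parties; }

    // Called by guest threads between Array operations.
    synchronized void poll() throws InterruptedException {
        if (!requested) return;
        parked++;
        notifyAll();                  // let the requester recount parked threads
        long seen = epoch;
        while (epoch == seen) wait(); // stay parked until the action has run
    }

    // Runs `action` while all other parties are parked in poll().
    synchronized void runAtSafepoint(Runnable action) throws InterruptedException {
        while (requested) poll();     // another safepoint is pending: join it first
        requested = true;
        while (parked < parties - 1) wait();
        try {
            action.run();
        } finally {
            requested = false;
            parked = 0;
            epoch++;
            notifyAll();              // release the parked threads
        }
    }
}

class SafepointDemo {
    static volatile String strategy = "SharedFixedStorage";

    // Returns true if no worker made progress while the migration ran.
    static boolean migrateWhileWorkersRun(int workers) {
        Safepoint safepoint = new Safepoint(workers + 1); // workers + this thread
        AtomicBoolean done = new AtomicBoolean(false);
        AtomicLong operations = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    while (!done.get()) {
                        safepoint.poll();
                        operations.incrementAndGet(); // stands in for an Array operation
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        boolean[] quiet = new boolean[1];
        try {
            safepoint.runAtSafepoint(() -> {
                long before = operations.get();
                strategy = "SharedDynamicStorage"; // the real migration would go here
                try {
                    Thread.sleep(20); // demo only: give a running worker a chance to show up
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                quiet[0] = operations.get() == before;
            });
            done.set(true);
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return quiet[0];
    }

    public static void main(String[] args) {
        boolean quiet = migrateWhileWorkersRun(3);
        System.out.println(quiet + " " + strategy);
    }
}
```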
  26. SharedDynamicStorage
    SharedDynamicStorage uses a lock to synchronize operations.
    To keep scalability when writing to different parts of the Array, an exclusive lock or a read-write lock is not enough: an exclusive lock serializes all accesses, and a read-write lock still serializes all writes.
    We use a Layout Lock, which distinguishes reads, writes, and layout changes.
    Layout Lock: A Scalable Locking Paradigm for Concurrent Data Layout Modifications. N. Cohen, A. Tal, E. Petrank, PPoPP’17.
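
The key property is that reads and writes on different parts of the Array do not exclude each other, while a layout change (such as growing the storage) excludes everything. The Java sketch below only approximates that behaviour: it treats reads and writes alike as a shared "access" mode announced in per-thread counter slots, plus an exclusive layout-change mode. It is not the Layout Lock algorithm from the PPoPP’17 paper, and all class names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.locks.ReentrantLock;

// Simplified lock: a shared "access" mode (reads and writes run in parallel)
// and an exclusive "layout change" mode. Each thread announces its accesses in
// its own counter slot, so accessors do not contend on a single counter.
class SimpleLayoutLock {
    private final AtomicIntegerArray active; // in-progress accesses per slot
    private volatile boolean layoutChanging = false;
    private final ReentrantLock layoutChangers = new ReentrantLock();

    SimpleLayoutLock(int slots) { active = new AtomicIntegerArray(slots); }

    void startAccess(int slot) {
        while (true) {
            active.incrementAndGet(slot); // announce the access
            if (!layoutChanging) return;  // no layout change: go ahead
            active.decrementAndGet(slot); // back off until it is done
            while (layoutChanging) Thread.onSpinWait();
        }
    }

    void finishAccess(int slot) { active.decrementAndGet(slot); }

    // Must not be called by a thread that is inside an access.
    void startLayoutChange() {
        layoutChangers.lock();            // one layout change at a time
        layoutChanging = true;            // stop new accesses
        for (int i = 0; i < active.length(); i++) {
            while (active.get(i) != 0) Thread.onSpinWait(); // drain current ones
        }
    }

    void finishLayoutChange() {
        layoutChanging = false;
        layoutChangers.unlock();
    }
}

// Hypothetical int[]-backed Array protected by the lock above.
class LockedIntArray {
    private final SimpleLayoutLock lock;
    private int[] storage; // replaced only in layout-change mode

    LockedIntArray(int length, int slots) {
        storage = new int[length];
        lock = new SimpleLayoutLock(slots);
    }

    void write(int slot, int index, int value) {
        lock.startAccess(slot);
        try { storage[index] = value; } finally { lock.finishAccess(slot); }
    }

    int read(int slot, int index) {
        lock.startAccess(slot);
        try { return storage[index]; } finally { lock.finishAccess(slot); }
    }

    void grow(int newLength) {
        lock.startLayoutChange();
        try { storage = Arrays.copyOf(storage, newLength); } finally { lock.finishLayoutChange(); }
    }
}

class LayoutLockDemo {
    // Each worker fills its own section while the main thread grows the array.
    static boolean run(int threads, int section) {
        LockedIntArray array = new LockedIntArray(threads * section, threads + 1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int slot = t;
            workers[t] = new Thread(() -> {
                for (int i = slot * section; i < (slot + 1) * section; i++) {
                    array.write(slot, i, slot + 1);
                }
            });
            workers[t].start();
        }
        array.grow(2 * threads * section); // layout change during the writes
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (int i = 0; i < threads * section; i++) {
            if (array.read(threads, i) != i / section + 1) return false; // main's slot
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```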
  27. Section: Performance

  28. Array Benchmarks
    All threads work on a single Array; each thread has its own section of the Array.
    Six different synchronization mechanisms:
    - SharedFixedStorage: no synchronization
    - ReentrantLock, Synchronized: from Java
    - StampedLock: a read-write lock
    - LayoutLock: scalable reads, writes, and layout changes
    - LightweightLayoutLock: an improvement over LayoutLock
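
The benchmark setup (one shared array, each thread working on its own section) can be sketched with a small Java harness. As assumptions, an `int[]` stands in for the Array and only three of the six variants are modelled: no synchronization (as with SharedFixedStorage), a `ReentrantLock`, and a `StampedLock` in its read mode. The harness only checks correctness; it is not the harness used for the measurements, and timing it meaningfully would need a proper tool such as JMH.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.StampedLock;

// Three of the six variants: no synchronization, a ReentrantLock, and a
// StampedLock used as a read-write lock (only its read mode here).
enum Mode { NONE, REENTRANT_LOCK, STAMPED_READ_LOCK }

class SectionBenchmark {
    static final ReentrantLock reentrantLock = new ReentrantLock();
    static final StampedLock stampedLock = new StampedLock();

    // Reads one element using the chosen synchronization mechanism.
    static int read(int[] array, int index, Mode mode) {
        switch (mode) {
            case REENTRANT_LOCK: {
                reentrantLock.lock();
                try { return array[index]; } finally { reentrantLock.unlock(); }
            }
            case STAMPED_READ_LOCK: {
                long stamp = stampedLock.readLock();
                try { return array[index]; } finally { stampedLock.unlockRead(stamp); }
            }
            default:
                return array[index]; // like SharedFixedStorage: no synchronization
        }
    }

    // All threads read one shared array; each thread sums only its own section.
    // Assumes array.length is divisible by the number of threads.
    static long run(int[] array, int threads, Mode mode) {
        int section = array.length / threads;
        long[] partial = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long sum = 0;
                for (int i = id * section; i < (id + 1) * section; i++) {
                    sum += read(array, i, mode);
                }
                partial[id] = sum; // each thread writes only its own slot
            });
            workers[t].start();
        }
        long total = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join(); // join() makes partial[t] visible here
                total += partial[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] array = new int[1 << 20];
        for (int i = 0; i < array.length; i++) array[i] = i % 7;
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + run(array, 4, mode));
        }
    }
}
```

Even in read mode, both locks update shared lock state on every access, which is the kind of contention the unsynchronized variant avoids.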
  29. Scalability of Array Reads
    [Chart: throughput in billions of accesses per second vs. number of threads (1 to 44) for LightweightLayoutLock, SharedFixedStorage, LayoutLock, ReentrantLock, StampedLock, and Synchronized.]
  30. Scalability of Array with 50% reads/50% writes
    [Chart: throughput in billions of accesses per second vs. number of threads (1 to 44) for LightweightLayoutLock, SharedFixedStorage, LayoutLock, ReentrantLock, StampedLock, and Synchronized.]
  31. Scalability of Array Appends
    [Chart: throughput in millions of appends per second vs. number of threads (1 to 44) for LightweightLayoutLock, LayoutLock, ReentrantLock, StampedLock, and Synchronized.]
    Appends are considered layout changes and use the exclusive lock.
  33. PyPy’s Parallel Mandelbrot
    [Chart: scalability relative to Local vs. number of threads (1 to 44) for SharedFixedStorage and Local.]
    Parallelized by distributing 64 groups of rows between threads dynamically using a global queue.
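
The parallelization scheme on this slide (groups of rows handed out dynamically through a global queue) can be sketched in Java with a `ConcurrentLinkedQueue` of row-group indices: each thread repeatedly takes the next group until the queue is empty, so faster threads simply process more groups. The image size, iteration limit, and coordinate mapping are assumptions for illustration; this is not the PyPy benchmark's code.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

class ParallelMandelbrot {
    static final int MAX_ITER = 50; // assumed iteration limit

    // Escape-time iteration count for pixel (x, y) of a size x size image.
    static int iterations(int x, int y, int size) {
        double cr = 3.0 * x / size - 2.0; // real axis: [-2, 1)
        double ci = 2.0 * y / size - 1.0; // imaginary axis: [-1, 1)
        double zr = 0, zi = 0;
        int n = 0;
        while (n < MAX_ITER && zr * zr + zi * zi <= 4.0) {
            double next = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = next;
            n++;
        }
        return n;
    }

    // Splits the rows into `groups` groups and hands them out through a global queue.
    static int[][] render(int size, int groups, int threads) {
        int[][] image = new int[size][size];
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int g = 0; g < groups; g++) queue.add(g);
        int rowsPerGroup = (size + groups - 1) / groups; // ceiling division
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                Integer group;
                while ((group = queue.poll()) != null) { // take the next group, if any
                    int end = Math.min(size, (group + 1) * rowsPerGroup);
                    for (int y = group * rowsPerGroup; y < end; y++) {
                        for (int x = 0; x < size; x++) image[y][x] = iterations(x, y, size);
                    }
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join(); // join() makes the rows visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return image;
    }

    public static void main(String[] args) {
        int[][] image = render(256, 64, 4);
        int inside = 0;
        for (int[] row : image) for (int n : row) if (n == MAX_ITER) inside++;
        System.out.println("points that did not escape: " + inside);
    }
}
```

Each thread writes disjoint rows of the shared image, so the rows need no locking; only the queue hand-out is synchronized.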
  34. Scalability of Hash
    [Chart: throughput in millions of operations per second vs. number of threads (1 to 44) for LightweightLayoutLock, LayoutLock, Local, ReentrantLock, and StampedLock.]
    Workload: 80% lookups, 10% puts, and 10% removes over a range of 65536 keys.
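
The Hash workload (80% lookups, 10% puts, 10% removes over 65536 keys) can be sketched in Java as follows. As an assumption, `java.util.concurrent.ConcurrentHashMap` stands in for the thread-safe Hash; it is a different design from the Hash variants measured here. The per-thread seeds, operation counts, and pre-filling half of the keys are also arbitrary choices of this sketch.

```java
import java.util.Map;
import java.util.SplittableRandom;
import java.util.concurrent.ConcurrentHashMap;

class HashWorkload {
    static final int KEY_RANGE = 65536;

    // Each thread performs `opsPerThread` random operations on the shared map:
    // 80% lookups, 10% puts, 10% removes, with keys in [0, KEY_RANGE).
    // Returns the number of lookups that found a key.
    static long run(Map<Integer, Integer> map, int threads, int opsPerThread) {
        long[] hits = new long[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                SplittableRandom random = new SplittableRandom(id); // per-thread, seeded
                for (int i = 0; i < opsPerThread; i++) {
                    int key = random.nextInt(KEY_RANGE);
                    int op = random.nextInt(100);
                    if (op < 80) {
                        if (map.get(key) != null) hits[id]++; // lookup
                    } else if (op < 90) {
                        map.put(key, key);                    // put
                    } else {
                        map.remove(key);                      // remove
                    }
                }
            });
            workers[t].start();
        }
        long totalHits = 0;
        try {
            for (int t = 0; t < threads; t++) {
                workers[t].join();
                totalHits += hits[t];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return totalHits;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int k = 0; k < KEY_RANGE; k += 2) map.put(k, k); // pre-fill half the keys
        long hits = run(map, 4, 1_000_000);
        System.out.println("lookup hits: " + hits + ", final size: " + map.size());
    }
}
```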
  35. Conclusion
    - Standard built-in collections in dynamic languages can be thread-safe and yet as efficient as unsynchronized collections
    - With SharedFixedStorage and the Lightweight Layout Lock, Array and Hash scale linearly up to 44 cores
    - We enable parallel programming with the existing built-in collections, without requiring upfront decisions by the programmer (e.g., choosing a collection implementation based on concurrency or usage)