Slide 1

Slide 1 text

Bringing Concurrency to Ruby
Charles Oliver Nutter
@headius

Slide 2

Slide 2 text

No content

Slide 3

Slide 3 text

Me, about 40 weeks of the year

Slide 4

Slide 4 text

Me, the other 12 weeks of the year

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Concurrency

Slide 11

Slide 11 text

Parallelism?

Slide 12

Slide 12 text

Concurrency
• Two or more jobs
• Making progress
• Over a given time span

Slide 13

Slide 13 text

Parallelism
• Two or more computations
• Executing at the same moment in time

Slide 14

Slide 14 text

Examples
• Thread APIs: concurrency
• Actor APIs: concurrency
• Native threads, processes: parallelism
• If the underlying system supports it
• SIMD, GPU, vector operations: parallelism

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

You Need Both
• Work that can split into concurrent jobs
• Platform that runs those jobs in parallel
• In an ideal world, scales with job count
• In our world, each job adds overhead

Slide 19

Slide 19 text

Process-level Concurrency
• Separate processes running concurrently
• As parallel as OS/CPU can make them
• Low risk due to isolated memory space
• High memory requirements
• High communication overhead
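The isolation and communication-overhead points above can be sketched with the stdlib. This is a minimal illustration assuming MRI on a Unix-like system, where Kernel#fork is available (it is not on Windows or JRuby):

```ruby
# Process-level concurrency: fork a child, communicate over a pipe.
# Isolated memory: the child's mutation is invisible to the parent,
# and results must be serialized back over a pipe.
reader, writer = IO.pipe

counter = 0
pid = fork do
  reader.close
  counter += 100            # mutates the child's *copy* only
  writer.write(counter.to_s)
  writer.close
end

writer.close
from_child = reader.read.to_i
Process.wait(pid)

puts counter      # parent copy untouched: 0
puts from_child   # result had to travel through the pipe: 100
```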

Slide 20

Slide 20 text

Thread-level Concurrency
• Threads in-process running concurrently
• As parallel as OS/CPU can make them
• Higher risk due to shared memory space
• Lower memory requirements
• Low communication overhead
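The shared-memory trade-off can be sketched with plain stdlib threads; the Mutex here is the coordination cost that comes with sharing:

```ruby
# Thread-level concurrency: threads share the process heap directly,
# so no serialization is needed to pass results around -- but shared
# state must be coordinated (here, with a Mutex).
results = []
lock = Mutex.new

threads = 4.times.map do |i|
  Thread.new do
    value = i * i                          # some per-thread work
    lock.synchronize { results << value }  # cheap sharing, but needs a lock
  end
end
threads.each(&:join)

puts results.sort.inspect  # => [0, 1, 4, 9]
```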

Slide 21

Slide 21 text

Popular Platforms

Platform     | Concurrency | Parallelism | GC                                                        | Notes
MRI 1.8.7    | ✔           | ✘           | Single thread, stop-the-world                             | Large C core would need much work
MRI 1.9+     | ✔           | ✘           | Single thread, stop-the-world                             | Few changes since 1.9.3
JRuby (JVM)  | ✔           | ✔           | Many concurrent and parallel options                      | JVM is the “best” platform for concurrency
Rubinius     | ✔           | ✔           | Single thread, stop-the-world, partial concurrent old gen | Promising, but a long road ahead
Topaz        | ✘           | ✘           | Single thread, stop-the-world                             | Incomplete impl
Node.js (V8) | ✘           | ✘           | Single thread, stop-the-world                             | No threads in JS
CPython      | ✔           | ✘           | Reference-counting                                        | Reference counting kills parallelism
PyPy         | ✔           | ✘           | Single thread, stop-the-world                             | Exploring STM to enable concurrency

Slide 22

Slide 22 text

Idle System

Slide 23

Slide 23 text

No content

Slide 24

Slide 24 text

MRI 1.8.7, One Thread

Slide 25

Slide 25 text

MRI 1.8.7, Eight Threads

Slide 26

Slide 26 text

Timeslicing
(diagram: Threads 1–4 multiplexed onto a single native thread, each preempted when its time is up)
“Green” or “virtual” or “userspace” threads share a single native thread. The OS then schedules that native thread on available CPUs.

Slide 27

Slide 27 text

MRI 1.8.7, Eight Threads

Slide 28

Slide 28 text

MRI 1.9.3+, Eight Threads

Slide 29

Slide 29 text

GVL: Global VM Lock
(diagram: Threads 1–4, each on its own native thread and CPU, transferring a single lock between them)
In 1.9+, each thread gets its own native thread, but a global lock prevents concurrent execution. Time slices are finer grained and variable, but threads still can’t run in parallel.

Slide 30

Slide 30 text

GVL: Global Venue Lock

Slide 31

Slide 31 text

MRI 1.9.3+, Eight Threads

Slide 32

Slide 32 text

MRI 1.9.3+, Eight Threads

Slide 33

Slide 33 text

JRuby, One Thread

Slide 34

Slide 34 text

Why Do We See Parallelism?
• Hotspot JVM has many background threads
• GC with concurrent and parallel options
• JIT threads
• Signal handling
• Monitoring and management

Slide 35

Slide 35 text

JRuby, One Thread

Slide 36

Slide 36 text

JRuby, Eight Threads

Slide 37

Slide 37 text

Time Matters Too
(chart: time per iteration, 0–7, for MRI 1.8.7, MRI 1.9.3, and JRuby; JRuby is nearly 10x faster than 1.9.3)

Slide 38

Slide 38 text

Rules of Concurrency
1. Don’t do it, if you don’t have to.
2. If you must do it, don’t share data.
3. If you must share data, make it immutable.
4. If it must be mutable, coordinate all access.

Slide 39

Slide 39 text

#1: Don’t
• Many problems won’t benefit
• Explicitly sequential work, e.g.
• Bad code can get worse
• Multiply perf, GC, alloc overhead by N
• Fixes may not be easy (esp. in Ruby)
• The risks can get tricky to address

Slide 40

Slide 40 text

No content

Slide 41

Slide 41 text

No content

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

I’m Not Perfect
• Wrote a naive algorithm
• Measured it taking N seconds
• Wrote the concurrent version
• Measured it taking roughly N seconds
• Returned to the original to optimize

Slide 45

Slide 45 text

Fix Single-thread First!
(chart: time in seconds, 0–20, for big_list versions v1–v4; gains from String slice instead of unpack/pack, simpler loops, and streaming from the file)

Slide 46

Slide 46 text

(chart: time in seconds, 0–70, processing a 23M-word file: non-threaded vs. two threads vs. four threads)

Slide 47

Slide 47 text

Before Conc Work
• Fix excessive allocation (and GC)
• Fix algorithmic complexity
• Test on the runtime you want to target
• If serial perf is still poor after optimization, the task, runtime, or system may not be appropriate for a concurrent version.

Slide 48

Slide 48 text

Concurrency won’t help code that’s using up all hardware resources.

Slide 49

Slide 49 text

#2: Don’t Share Data
• Process-level concurrency
• …have to sync up eventually, though
• Threads with their own data objects
• Rails request objects, e.g.
• APIs with a “master” object, usually
• Weakest form of concurrency

Slide 50

Slide 50 text

#3: Immutable Data
• In other words…
• Data can be shared
• Threads can pass it around safely
• Cross-thread view of data can’t mutate
• Threads can’t see concurrent mutations as they happen, avoiding data races

Slide 51

Slide 51 text

Object#freeze
• Simplest mechanism for immutability
• For read-only: make changes, freeze
• Read-mostly: dup, change, freeze, replace
• Write-mostly: same, but O(n) complexity
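A minimal sketch of the read-only and read-mostly patterns above (the hash contents are hypothetical placeholders):

```ruby
# Read-only sharing: build the data, then freeze it before handing it
# to threads.
config = { host: "example.com", port: 80 }.freeze

# Any accidental mutation now raises instead of racing:
begin
  config[:port] = 8080
rescue => e
  puts e.class  # FrozenError on modern Rubies (RuntimeError on old ones)
end

# Read-mostly update: dup, change, freeze, then swap the reference.
# Readers see either the old or the new frozen hash, never a half-edit.
updated = config.dup
updated[:port] = 8080
updated.freeze
config = updated
```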

Slide 52

Slide 52 text

Immutable Data Structure
• Designed to avoid visible mutation but still have good performance characteristics
• Copy-on-write is the poor man’s IDS
• Better: persistent data structures like Ctrie
• http://en.wikipedia.org/wiki/Ctrie

Slide 53

Slide 53 text

Persistent?
• Collection you have a reference to is guaranteed never to change
• Modifications return a new reference
• …and only duplicate affected part of trie
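A toy illustration of the "persistent" idea using a hand-rolled cons list. This is far simpler than a Ctrie, but it shows the two properties above: modification returns a new reference, and the unchanged tail is shared rather than copied:

```ruby
# A minimal persistent (immutable) cons list: "adding" returns a new
# list that shares its tail with the old one -- nothing is mutated.
Node = Struct.new(:head, :tail) do
  def cons(value)
    Node.new(value, self).freeze  # new head, shared tail
  end

  def to_a
    node, out = self, []
    while node
      out << node.head
      node = node.tail
    end
    out
  end
end

base  = Node.new(1, nil).freeze
two   = base.cons(2)     # new list [2, 1]
three = base.cons(3)     # new list [3, 1]; base is untouched, shared by both

puts two.to_a.inspect    # => [2, 1]
puts three.to_a.inspect  # => [3, 1]
puts base.to_a.inspect   # => [1]
```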

Slide 54

Slide 54 text

Hamster
• Pure-Ruby persistent data structures
• Set, List, Stack, Queue, Vector, Hash
• Based on Clojure’s Ctrie collections
• https://github.com/hamstergem/hamster

Slide 55

Slide 55 text

person = Hamster.hash(
  :name => "Simon",
  :gender => :male)
# => {:name => "Simon", :gender => :male}

person[:name]
# => "Simon"
person.get(:gender)
# => :male

friend = person.put(:name, "James")
# => {:name => "James", :gender => :male}
person
# => {:name => "Simon", :gender => :male}
friend[:name]
# => "James"
person[:name]
# => "Simon"

Slide 56

Slide 56 text

Coming Soon
• Reimplementation by Smit Shah
• Mostly “native” impl of Ctrie
• Considerably better perf than Hamster
• https://github.com/Who828/persistent_data_structures

Slide 57

Slide 57 text

Other Techniques
• Known-immutable data like Symbol, Fixnum
• Mutate for a while, then freeze
• Hand-off: if you pass mutable data, assume you can’t mutate it anymore
• Sometimes enforced by runtime, e.g. “thread-owned objects”

Slide 58

Slide 58 text

#4: Synchronize Mutation
• Trickiest to get right; usually best perf
• Fully-immutable generates lots of garbage
• Locks, atomics, and specialized collections

Slide 59

Slide 59 text

Locks
• Avoid concurrent operations
• Read + write, in general
• Many varieties: reentrant, read/write
• Many implementations

Slide 60

Slide 60 text

Mutex
• Simplest form of lock
• Acquire, do work, release
• Not reentrant

semaphore = Mutex.new
...
a = Thread.new {
  semaphore.synchronize {
    # access shared resource
  }
}

Slide 61

Slide 61 text

ConditionVariable
• Release mutex temporarily
• Signal others waiting on the mutex
• …and be signaled
• Similar to wait/notify/notifyAll in Java

Slide 62

Slide 62 text

mutex = Mutex.new
resource = ConditionVariable.new

a = Thread.new {
  mutex.synchronize {
    # Thread 'a' now needs the resource
    resource.wait(mutex)
    # 'a' can now have the resource
  }
}

b = Thread.new {
  mutex.synchronize {
    # Thread 'b' has finished using the resource
    resource.signal
  }
}

Slide 63

Slide 63 text

Monitor
• Reentrancy
• “try” acquire
• Mix-in for convenience
• Java synchronization = CondVar + Monitor

Slide 64

Slide 64 text

Monitor

require 'monitor'

lock = Monitor.new
lock.synchronize do
  # exclusive access
end
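A small runnable sketch of the reentrancy point: the nested synchronize below would deadlock with a plain Mutex, but a Monitor lets the holding thread re-enter:

```ruby
require 'monitor'

# Reentrancy: a Monitor can be re-acquired by the thread that already
# holds it. A plain Mutex would raise or deadlock on the nested call.
lock = Monitor.new
trace = []

lock.synchronize do
  trace << :outer
  lock.synchronize do   # same thread re-enters: fine with Monitor
    trace << :inner
  end
end

puts trace.inspect  # => [:outer, :inner]
```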

Slide 65

Slide 65 text

Monitor

require 'monitor'

class SynchronizedArray < Array

  include MonitorMixin

  alias :old_shift :shift

  def shift(n=1)
    self.synchronize do
      self.old_shift(n)
    end
  end
  ...

Slide 66

Slide 66 text

Atomics
• Without locking…
• …replace a value only if unchanged
• …increment, decrement safely
• Thread-safe code can use atomics instead of locks, usually with better performance

Slide 67

Slide 67 text

atomic
• Atomic operations for Ruby
• https://github.com/headius/ruby-atomic

Slide 68

Slide 68 text

require 'atomic'

my_atomic = Atomic.new(0)
my_atomic.value
# => 0
my_atomic.value = 1
my_atomic.swap(2)
# => 1
my_atomic.compare_and_swap(2, 3)
# => true, updated to 3
my_atomic.compare_and_swap(2, 3)
# => false, current is not 2

Slide 69

Slide 69 text

Specialized Collections
• thread_safe gem
• Fully-synchronized Array and Hash
• Atomic-based hash impl (“Cache”)
• java.util.concurrent
• Numerous tools for concurrency

Slide 70

Slide 70 text

Queues
• Thread-safe Queue and SizedQueue
• Pipeline data to/from threads
• Standard in all Ruby impls

Slide 71

Slide 71 text

thread_count = (ARGV[2] || 1).to_i
queue = SizedQueue.new(thread_count * 4)

word_file.each_line.each_slice(50) do |words|
  queue << words
end
queue << nil # terminating condition

Slide 72

Slide 72 text

threads = thread_count.times.map do |i|
  Thread.new do
    while true
      words = queue.pop
      if words.nil? # terminating condition
        queue.shutdown
        break
      end
      words.each do |word|
        # analyze the word
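A self-contained version of the two fragments above, runnable with only the stdlib. Note that stdlib Queue has no shutdown method (the slides presumably use a helper), so this sketch uses the common idiom of one nil sentinel per consumer; the word list and counting logic are stand-ins for the talk's word-file analysis:

```ruby
# Producer/consumer pipeline over a SizedQueue: one producer slices the
# work into batches, N consumers pop batches until they see a nil
# sentinel (one pushed per consumer, since stdlib Queue has no shutdown).
thread_count = 4
queue = SizedQueue.new(thread_count * 4)
words = ("a".."z").flat_map { |c| [c] * 10 }  # stand-in for the word file
counts = Hash.new(0)
lock = Mutex.new

consumers = thread_count.times.map do
  Thread.new do
    while (batch = queue.pop)  # nil sentinel ends the loop
      batch.each do |word|
        lock.synchronize { counts[word] += 1 }  # "analyze" the word
      end
    end
  end
end

words.each_slice(50) { |batch| queue << batch }
thread_count.times { queue << nil }  # one terminator per consumer
consumers.each(&:join)

puts counts.values.sum  # 26 letters x 10 occurrences = 260
```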

Slide 73

Slide 73 text

Putting It All Together
• That’s a lot of tools to sort out
• Others have sorted them out for you

Slide 74

Slide 74 text

Celluloid
• Actor model implementation
• OO/Ruby sensibilities
• Normal classes, normal method calls
• Async support
• Growing ecosystem
• Celluloid-IO and DCell (distributed actors)
• https://github.com/celluloid/celluloid

Slide 75

Slide 75 text

class Sheen
  include Celluloid

  def initialize(name)
    @name = name
  end

  def set_status(status)
    @status = status
  end

  def report
    "#{@name} is #{@status}"
  end
end

Slide 76

Slide 76 text

>> charlie = Sheen.new "Charlie Sheen"
=> #<...>
>> charlie.set_status "winning!"
=> "winning!"
>> charlie.report
=> "Charlie Sheen is winning!"
>> charlie.async.set_status "asynchronously winning!"
=> nil
>> charlie.report
=> "Charlie Sheen is asynchronously winning!"

Slide 77

Slide 77 text

Sidekiq
• Simple, efficient background processing
• Think Resque or DelayedJob but better
• Normal-looking Ruby class is the job
• Simple call to start it running in background
• http://mperham.github.io/sidekiq/

Slide 78

Slide 78 text

class HardWorker
  include Sidekiq::Worker

  def perform(name, count)
    puts 'Doing hard work'
  end
end

...later, in a controller...

HardWorker.perform_async('bob', 5)

Slide 79

Slide 79 text

Concurrent Ruby
• Grab bag of concurrency patterns
• Actor, Agent, Channel, Future, Promise, ScheduledTask, TimerTask, Supervisor
• Thread pools, executors, timeouts, conditions, latches, atomics
• May grow into a central lib for concurrency
• https://github.com/jdantonio/concurrent-ruby

Slide 80

Slide 80 text

…all the examples I’ve shown you and more

Slide 81

Slide 81 text

Recap
• The future of Ruby is concurrent
• The tools are there to help you
• Let’s all help move Ruby forward

Slide 82

Slide 82 text

Thank you!
• Charles Oliver Nutter
• [email protected]
• @headius