Slide 1

Slide 1 text

DEAR GOD WHAT AM I DOING? CONCURRENCY AND PARALLEL PROCESSING Adam Hawkins - tw://adman65 - gh://twinturbo

Slide 2

Slide 2 text

Who Is this Talk for? There have been a lot of talks on performance-related topics: yesterday we had Immutable Ruby, and today we had Charles talking about JRuby optimizations. So you’re already primed for this one.

Slide 3

Slide 3 text

When you type Thread or fork you feel like this

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

A Brief Introduction to Machine Architecture

Slide 6

Slide 6 text

We Can Model this Complex System with the Following Diagram

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

hamster :: thread

Slide 10

Slide 10 text

wheel :: process hamster :: thread

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

Let’s Get Serious

Slide 14

Slide 14 text

The eternal question: How can I make code faster?

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

Do Multiple Things at Once

Slide 17

Slide 17 text

Three Primitives
• Processes: separate memory, separate everything. Scheduled by the kernel.
• Threads: run inside a process and share its memory. Scheduled by the kernel.
• Fibers: like threads, but scheduled by the programmer. 4KB stack.
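A minimal sketch of what "scheduled by the programmer" means for fibers, using only Ruby's core Fiber class. The steps array and the labels in it are illustrative and not from the talk.

```ruby
# A fiber runs only when the caller resumes it, and pauses only when
# it calls Fiber.yield. Nothing preempts it.
steps = []

fiber = Fiber.new do
  steps << "fiber: part 1"
  Fiber.yield            # hand control back to the caller
  steps << "fiber: part 2"
end

steps << "main: before resume"
fiber.resume             # runs the block up to Fiber.yield
steps << "main: between resumes"
fiber.resume             # runs the rest of the block
steps << "main: done"

puts steps
```

The order of entries in steps is fully determined by where resume and Fiber.yield appear, which is the difference from kernel-scheduled threads.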

Slide 18

Slide 18 text

TL;DS
• The kernel decides which process to run (a process may have multiple threads)
• Processes or threads may block, causing the scheduler to select another thread/process for execution
• I/O is the most common blocking operation
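A rough sketch of why blocking matters, with sleep standing in for a blocking I/O call. The helper names slow_call and elapsed are hypothetical, and the timings will vary from machine to machine.

```ruby
# Simulate a blocking call (such as I/O) with sleep. While one thread
# is blocked, the scheduler is free to run another.
def slow_call
  sleep 0.1
end

# Returns the wall-clock seconds taken by the given block.
def elapsed
  started = Time.now
  yield
  Time.now - started
end

sequential = elapsed { 5.times { slow_call } }
threaded   = elapsed { 5.times.map { Thread.new { slow_call } }.each(&:join) }

puts "sequential: #{sequential.round(2)}s"
puts "threaded:   #{threaded.round(2)}s"
```

The threaded run should finish in roughly the time of one call, because the five blocked threads wait at the same time instead of one after another.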

Slide 19

Slide 19 text

Quickly
• Threads behave differently according to the platform (JRuby vs MRI vs Rubinius)
• Ruby’s Thread class is backed by native threads on 1.9
• Green threads (aka simulated threads) prior to 1.9

Slide 20

Slide 20 text

Threads* *easiest and quickest win

Slide 21

Slide 21 text

    require 'thread'

    (0..5).each do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end

Question: what will this code output?

Slide 22

Slide 22 text

    require 'thread'

    threads = (0..5).map do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end

    # Wait for every thread to finish before the program exits
    threads.each(&:join)

Slide 23

Slide 23 text

    $ ruby joining_threads.rb
    Hello from thread: 1
    Hello from thread: 2
    Hello from thread: 4
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 5

Slide 24

Slide 24 text

Order is Nondeterministic

    $ ruby joining_threads.rb
    Hello from thread: 5
    Hello from thread: 2
    Hello from thread: 1
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 4
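When the order of results matters, one option is to have each thread return its result and collect the results in creation order with Thread#value. This is a small sketch of that idea and not code from the talk; the greetings variable is illustrative.

```ruby
# Thread#value joins the thread and returns the block's result.
threads = (0..5).map do |i|
  Thread.new { "Hello from thread: #{i}" }
end

# Collecting values in creation order gives a stable result, even
# though the threads themselves may finish in any order.
greetings = threads.map(&:value)
puts greetings
```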

Slide 25

Slide 25 text

Shared Memory

    require 'thread'

    balance = 100

    interest = Thread.new do
      while true
        sleep 0.1
        balance = balance * 1.025
      end
    end

    while balance < 200
      sleep 0.25
      puts "Banking: #{balance}"
    end

Question: What’s wrong with this code?

Slide 26

Slide 26 text

Straight Up Locks

    require 'thread'

    lock = Mutex.new
    balance = 100

    Thread.new do
      while true
        sleep 0.1
        lock.synchronize do
          balance = balance * 1.025
        end
      end
    end

    while balance < 200
      lock.synchronize do
        puts "Balance: #{balance}"
        sleep 1
      end
    end
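A smaller sketch of the same locking idea that terminates, so the result can be checked. The thread and deposit counts are arbitrary choices for illustration.

```ruby
lock = Mutex.new
balance = 0

# Ten threads each make 1,000 deposits of 1.
depositors = 10.times.map do
  Thread.new do
    1_000.times do
      # Only one thread at a time may read and then write balance.
      lock.synchronize { balance += 1 }
    end
  end
end

depositors.each(&:join)
puts "Balance: #{balance}"
```

Because every read-modify-write of balance happens inside synchronize, no deposit can be lost to an interleaved update.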

Slide 27

Slide 27 text

Blocking
Notice that we haven’t done anything that could block, only simple math operations.

Slide 28

Slide 28 text

I/O Blocks

Slide 29

Slide 29 text

Processes

Slide 30

Slide 30 text

fork

Slide 31

Slide 31 text

    balance = 100

    pid = fork do
      while true
        sleep 0.5
        balance = balance * 1.0125
        puts "Child Balance: #{balance}"
      end
    end

    # parent
    if pid
      while true do
        sleep 0.5
        puts "Parent Balance: #{balance}"
      end
    end

What’s wrong with this code? What happens to balance? What happens when both processes need to access the balance? Inter-process locks.

Slide 32

Slide 32 text

Memory is Not Shared

    $ ruby forking_example.rb
    Child Balance: 101.25
    Parent Balance: 100
    Child Balance: 102.515625
    Parent Balance: 100
    Child Balance: 103.7970703125
    Parent Balance: 100
    Child Balance: 105.09453369140624
    Parent Balance: 100
    Child Balance: 106.40821536254882
    Parent Balance: 100
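Since a forked child works on its own copy of balance, the parent only sees the child's result if the child sends it back. A minimal sketch using IO.pipe, which is one of several possible IPC mechanisms and not something the talk prescribes. It needs a platform that supports fork, and child_balance is an illustrative name.

```ruby
reader, writer = IO.pipe
balance = 100

pid = fork do
  reader.close                      # the child only writes
  balance = balance * 1.0125        # changes the child's copy only
  writer.puts balance               # send the new value to the parent
  writer.close
end

writer.close                        # the parent only reads
child_balance = reader.gets.to_f
reader.close
Process.wait(pid)

puts "Parent balance: #{balance}"
puts "Child balance:  #{child_balance}"
```

The parent's balance is untouched; the updated value exists in the parent only because it was written to the pipe and read back.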

Slide 33

Slide 33 text

Fibers, TL;DR Fibers are like threads, not so important for this talk

Slide 34

Slide 34 text

Making Things Faster

Slide 35

Slide 35 text

Challenge: Fetch 100 pages of Ruby search results as fast as possible. The fetching uses HTTP (and thus I/O), which Ruby’s thread scheduler can optimize.

Slide 36

Slide 36 text

Two Approaches

Slide 37

Slide 37 text

Multithreaded

Slide 38

Slide 38 text

Before My Talk :)

    require 'net/http'

    100.times do |page|
      puts "Getting page: #{page}"
      Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
    end

    # 0.18s user 0.11s system 0% cpu 44.739 total

Slide 39

Slide 39 text

Wait, I have cores and stuff

Slide 40

Slide 40 text

Using Threads

    require 'thread'
    require 'net/http'

    queue = Queue.new
    100.times do |i|
      queue << i
    end

    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end

    threads.map(&:join)

    # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total

Slide 41

Slide 41 text

• Single Thread > 40 seconds
• Multithreaded < 10 seconds
• I know....MOAR THREADS

Slide 42

Slide 42 text

Custom # Threads

    require 'thread'
    require 'net/http'

    queue = Queue.new
    100.times do |i|
      queue << i
    end

    workers = ARGV[0].to_i

    # Start exactly `workers` threads ((0..workers) would start one extra)
    threads = workers.times.map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end

    threads.map(&:join)

Slide 43

Slide 43 text

Results

    # Threads    Time (seconds)
    4            ~8
    5            ~8.5
    6            ~9
    7            ~9
    8            ~9.5
    9            ~10
    10           ~11

Slide 44

Slide 44 text

More Threads != Faster
• A computer can only run so many threads at once
• Context switching
• Blocking I/O (HTTP) limits throughput

Slide 45

Slide 45 text

Let’s Do Some Math

Slide 46

Slide 46 text

Password Cracking

    require 'thread'
    require 'digest/sha1'

    # Dictionary and INFINITY are placeholders on the slide; they are not defined here.
    encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

    queue = Queue.new
    Dictionary.each do |plaintext|
      queue << plaintext
    end

    threads = (0..INFINITY).map do |i|
      Thread.new do
        while !queue.empty?
          plaintext = queue.pop
          result = Digest::SHA1.hexdigest plaintext
          if result == encrypted
            puts "Decrypted to: #{plaintext}"
            exit
          end
        end
      end
    end

    threads.map(&:join)

Slide 47

Slide 47 text

$ ssh adam@mothership

Slide 48

Slide 48 text

... and it’s slow. Explain why it’s slow

Slide 49

Slide 49 text

Enter the GIL (Global Interpreter Lock)

Slide 50

Slide 50 text

Also enter JRuby & Rubinius

Slide 51

Slide 51 text

G I L
• Only one thread can execute Ruby code at a given time
• Each implementation is different
• JRuby and Rubinius don’t have a GIL
• MRI has a GIL
• This makes true parallel execution of Ruby code across threads impossible on MRI
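The lock is per interpreter process, so one common way to get CPU-bound work running in parallel on MRI is to split it across processes. A hedged sketch of the dictionary search done that way: DICTIONARY and WORKERS are hypothetical stand-ins, the hash is the one from the password slide, and fork must be available on the platform.

```ruby
require 'digest/sha1'

# Hypothetical stand-ins: a tiny word list and a worker count.
DICTIONARY = %w[letmein dragon password monkey qwerty hunter2]
WORKERS    = 3
target     = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

slice_size = (DICTIONARY.size / WORKERS.to_f).ceil
readers = DICTIONARY.each_slice(slice_size).map do |words|
  reader, writer = IO.pipe
  fork do
    reader.close
    # Each child hashes its own slice in its own interpreter.
    match = words.find { |word| Digest::SHA1.hexdigest(word) == target }
    writer.puts match if match
    writer.close
  end
  writer.close
  reader
end

# gets returns nil for a child that found nothing and just closed its pipe.
found = readers.map { |reader| reader.gets }.compact.map(&:chomp).first
readers.each(&:close)
Process.waitall

puts "Found: #{found}"
```

Each child reports through its own pipe, so no locking is needed between the workers.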

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

JVM

Slide 54

Slide 54 text

JRuby or Rubinius is more performant for multithreaded programs

Slide 55

Slide 55 text

Multiprocess* *JRuby and Windows users need not apply

Slide 56

Slide 56 text

Forkin’

    require 'net/http'
    require 'thread'

    pages = 100
    workers = 4
    # Float division, so that ceil rounds up when pages is not a multiple of workers
    queue_size = (pages / workers.to_f).ceil
    queues = (1..pages).each_slice(queue_size).to_a

    queues.each do |pages|
      fork do
        pages.each do |page|
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          puts "Got page: #{page} via #{Process.pid}"
        end
      end
    end

    Process.waitall

Slide 57

Slide 57 text

Hydra: Processes + Threads

Slide 58

Slide 58 text

    require 'net/http'
    require 'thread'

    pages = 100
    workers = 4
    # Float division, so that ceil rounds up when pages is not a multiple of workers
    queue_size = (pages / workers.to_f).ceil
    queues = (1..pages).each_slice(queue_size).to_a

    queues.each do |pages|
      fork do
        queue = Queue.new
        pages.each { |i| queue << i }

        threads = (0..4).map do |i|
          Thread.new do
            while !queue.empty?
              page = queue.pop
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
              puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
            end
          end
        end

        threads.map(&:join)
      end
    end

    Process.waitall

Who needs IPC when you can use block variables?

Slide 59

Slide 59 text

psst....all that code was wrong

Slide 60

Slide 60 text

This Just Happens to Work

    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          # do stuff
        end
      end
    end

Slide 61

Slide 61 text

This Just Happens to Work

    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?   # Nonblocking
          page = queue.pop
          # do stuff
        end
      end
    end

Slide 62

Slide 62 text

This Just Happens to Work

    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?   # Nonblocking
          page = queue.pop    # Blocking
          # do stuff
        end
      end
    end

Slide 63

Slide 63 text

This code will deadlock in some cases

Slide 64

Slide 64 text

Deadlock: A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.

Slide 65

Slide 65 text

The Fix

    def pop
      @mutex.synchronize do
        @array.empty? ? false : @array.pop
      end
    end
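One way the pop method above could sit inside a complete class, as a sketch. SafeQueue is a hypothetical name; @mutex and @array follow the slide, and everything else (the constructor, the worker loop, the results collector) is an assumption made for illustration. Note that returning false for "empty" means the queue cannot hold false or nil as items.

```ruby
# SafeQueue is a hypothetical name; @mutex and @array follow the slide.
class SafeQueue
  def initialize(items = [])
    @mutex = Mutex.new
    @array = items.dup
  end

  # Check and remove in one locked step. Returns false when empty.
  def pop
    @mutex.synchronize do
      @array.empty? ? false : @array.pop
    end
  end
end

queue   = SafeQueue.new((1..100).to_a)
results = Queue.new   # thread-safe collector from the standard library

threads = 5.times.map do
  Thread.new do
    # The loop ends when pop returns false, so no thread waits forever.
    while (page = queue.pop)
      results << page
    end
  end
end
threads.each(&:join)

seen = []
seen << results.pop until results.empty?
puts "Processed #{seen.size} pages"
```

The standard library's Queue#pop also accepts a non_block argument; queue.pop(true) raises ThreadError on an empty queue instead of blocking, which is another way to avoid the check-then-pop race.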

Slide 66

Slide 66 text

Faster Web Servers

Slide 67

Slide 67 text

Prefork Model
• Start a process. Get everything ready.
• Fork a given # of times to create worker processes
• Parent manages the children
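The three steps above can be sketched in a few lines. This is a toy illustration and not how Unicorn or any real server is written: config and worker_count are hypothetical, the workers do no real work, and "managing" is reduced to waiting for exit statuses.

```ruby
# Parent: get everything ready once, before forking.
config = { greeting: "ready" }   # hypothetical shared setup
worker_count = 3                 # hypothetical worker count

pids = worker_count.times.map do |n|
  fork do
    # Child: starts with a copy of everything the parent prepared.
    puts "worker #{n} (pid #{Process.pid}) is #{config[:greeting]}"
    exit 0
  end
end

# Parent: manage the children. Here that only means waiting for
# each one and recording its exit status.
statuses = pids.map { |pid| Process.wait2(pid).last }
puts "all workers exited cleanly: #{statuses.all?(&:success?)}"
```

A real prefork server would keep the workers running and restart any that die; the point here is only that the expensive setup happens once, in the parent.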

Slide 68

Slide 68 text

Unicorn

Slide 69

Slide 69 text

Unicorn Hey guys You alive?

Slide 70

Slide 70 text

Unicorn Unix Sockets Hey guys You alive?

Slide 71

Slide 71 text

Unicorn 500MB 500MB 500MB 500MB 500MB Unix Sockets Hey guys You alive?

Slide 72

Slide 72 text

Pain Points
• Interprocess Communication (IPC)
• Synchronization must happen
• 5 processes, 5 times as much memory
• Process monitoring

Slide 73

Slide 73 text

Easier Concurrent Ruby Programs

Slide 74

Slide 74 text

No content

Slide 75

Slide 75 text

The Actor Model
• Each Actor is an object running in its own thread
• Handles communication with mailboxes
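To make the two bullets concrete, here is a hand-rolled actor built from a Thread and a Queue. It is a sketch of the general shape only and makes no claim about how any actor library is implemented; CounterActor, send_message, and stop are hypothetical names.

```ruby
# A hand-rolled actor: one thread, one mailbox.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count   = 0
    @thread  = Thread.new { run }
  end

  # Other threads never touch @count; they only send messages.
  def send_message(message)
    @mailbox << message
  end

  # Ask the actor to stop, wait for it, and return its final count.
  def stop
    @mailbox << :stop
    @thread.value
  end

  private

  def run
    loop do
      message = @mailbox.pop      # blocks until a message arrives
      break if message == :stop
      @count += message
    end
    @count
  end
end

actor = CounterActor.new
senders = 4.times.map do
  Thread.new { 250.times { actor.send_message(1) } }
end
senders.each(&:join)
total = actor.stop
puts "Total: #{total}"
```

No mutex appears anywhere: only the actor's own thread reads or writes @count, and the mailbox is the single point of contact with the outside.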

Slide 76

Slide 76 text

What Is It?
• Celluloid: an implementation of the actor model by Tony Arcieri. He’s a Ruby hero!
• Handles pooling, supervising, messaging, and many other things
• Makes writing concurrent OOP programs as easy as writing sequential OOP programs
• Avoids deadlocks by handling state internally
• Actors are threads; method calls are fibers

Slide 77

Slide 77 text

Handling Pain Points

Slide 78

Slide 78 text

Monitoring

    class Worker
      include Celluloid
    end

    worker = Worker.supervise

Slide 79

Slide 79 text

Save it for Later

    class Worker
      include Celluloid
    end

    Worker.supervise_as :worker

    # now other parts of the program
    # can access the actor instance
    Celluloid::Actor[:worker]

Slide 80

Slide 80 text

“IPC”

    class Worker
      include Celluloid
    end

    worker = Worker.new
    worker.mailbox << Message.new

Slide 81

Slide 81 text

Mailboxes Work Everywhere

    worker = Celluloid::Actor[:worker]
    worker.mailbox << Message.new

Slide 82

Slide 82 text

    require 'celluloid'

    class Story
      attr_reader :headline

      def initialize(headline)
        @headline = headline
      end
    end

    class Broadcaster
      include Celluloid

      def initialize
        async.wait_for_messages
      end

      def wait_for_messages
        loop do
          message = receive { |msg| msg.is_a? Story }
          puts "BREAKING NEWS! #{message.headline}"
        end
      end
    end

    broadcaster = Broadcaster.new

    loop do
      broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
      sleep 1
    end

Slide 83

Slide 83 text

    require 'celluloid'

    class Story
      attr_reader :headline

      def initialize(headline)
        @headline = headline
      end
    end

    class Broadcaster
      include Celluloid

      def initialize
        # Oh ya, Celluloid can make any method async
        async.wait_for_messages
      end

      def wait_for_messages
        loop do
          message = receive { |msg| msg.is_a? Story }
          puts "BREAKING NEWS! #{message.headline}"
        end
      end
    end

    broadcaster = Broadcaster.new

    loop do
      broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
      sleep 1
    end

Slide 84

Slide 84 text

    require 'celluloid'

    class Story
      attr_reader :headline

      def initialize(headline)
        @headline = headline
      end
    end

    class Broadcaster
      include Celluloid

      def initialize
        # Oh ya, Celluloid can make any method async
        async.wait_for_messages
      end

      def wait_for_messages
        loop do
          # Block until a message is received
          message = receive { |msg| msg.is_a? Story }
          puts "BREAKING NEWS! #{message.headline}"
        end
      end
    end

    broadcaster = Broadcaster.new

    loop do
      broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
      sleep 1
    end

Slide 85

Slide 85 text

    $ ruby mailbox_example.rb
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!

Slide 86

Slide 86 text

Avoid Deadlocks In ATOM mode, Celluloid actors will "pipeline" work, meaning that in cases where they might execute a "blocking" call, they will continue processing incoming requests as opposed to waiting for the call to complete. This approach prevents the type of deadlocks you might ordinarily encounter in actor RPC systems such as Erlang or Akka. - Celluloid Wiki

Slide 87

Slide 87 text

Simple Example

Slide 88

Slide 88 text

Simple To Use

    require 'celluloid'
    require 'net/http'

    class Worker
      include Celluloid

      def fetch(page)
        Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        puts "Got page: #{page} via #{Thread.current}"
      end
    end

    pool = Worker.pool # uses # of cores for default pool size

    100.times do |i|
      pool.fetch i
    end

Slide 89

Slide 89 text

K, time to scale out

Slide 90

Slide 90 text

No content

Slide 91

Slide 91 text

    require 'dcell'

    DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

    class Worker
      include Celluloid
    end

    Worker.supervise_as :worker

Slide 92

Slide 92 text

    require 'dcell'

    # Register a Node
    DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

    class Worker
      include Celluloid
    end

    Worker.supervise_as :worker

Slide 93

Slide 93 text

    require 'dcell'

    # Register a Node (the address is ømq)
    DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

    class Worker
      include Celluloid
    end

    Worker.supervise_as :worker

Slide 94

Slide 94 text

    require 'dcell'

    # Register a Node (the address is ømq)
    DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

    class Worker
      include Celluloid
    end

    # Drop a cell in this node
    Worker.supervise_as :worker

Slide 95

Slide 95 text

    require 'dcell'

    DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff

Slide 96

Slide 96 text

    require 'dcell'

    DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

    # Grab a node from the network
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff

Slide 97

Slide 97 text

    require 'dcell'

    DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

    # Grab a node from the network
    worker_node = DCell::Node["worker"]
    # Grab a cell from the node
    worker = worker_node[:worker]
    worker.do_hard_stuff

Slide 98

Slide 98 text

    require 'dcell'

    DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

    # Grab a node from the network
    worker_node = DCell::Node["worker"]
    # Grab a cell from the node
    worker = worker_node[:worker]
    # HOLY SHIT!
    worker.do_hard_stuff

Slide 99

Slide 99 text

No content

Slide 100

Slide 100 text

Oh ya, you can cluster nodes for massive pwnage

Slide 101

Slide 101 text

Now it’s up to you to do hard work