
Dear God What am I Doing? Parallel and Concurrent Processing


Presented at RubyConf AU & Wroc_love.rb

Adam Hawkins

March 04, 2013


Transcript

  1. DEAR GOD WHAT AM
    I DOING?
    CONCURRENCY AND
    PARALLEL PROCESSING
    Adam Hawkins - tw://adman65 - gh://twinturbo


  2. Who Is this Talk for?
    There have been a lot of talks on performance-related things:
    yesterday we had Immutable Ruby, and today Charles talked about
    JRuby optimizations. So you're already primed for this one.


  3. When you type
    Thread or fork you
    feel like this


  4. (image slide)

  5. A Brief Introduction to
    Machine Architecture


  6. We Can Model this
    Complex System with
    the Following Diagram


  7. (image slide)

  8. (image slide)

  9. hamster :: thread


  10. wheel :: process
    hamster :: thread


  11. (image slide)

  12. (image slide)

  13. Let’s Get Serious


  14. The eternal question:
    How can I make
    code faster?


  15. (image slide)

  16. Do Multiple Things
    at Once


  17. Three Primitives
    • Processes: separate memory, separate
    everything. Scheduled by the kernel.
    • Threads: many can live inside one process and
    share its memory. Scheduled by the kernel.
    • Fibers: Like Threads. Scheduled by the
    programmer. 4KB stack.
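
    A minimal sketch (not from the deck) contrasting the three primitives;
    it assumes MRI on a Unix-like system so that fork is available:

    require 'fiber'

    # Process: separate memory, scheduled by the kernel
    pid = fork { puts "child pid: #{Process.pid}" }
    Process.wait(pid)

    # Thread: lives inside the current process and shares its memory,
    # scheduled by the kernel
    t = Thread.new { puts "hello from #{Thread.current}" }
    t.join

    # Fiber: scheduled by the programmer via resume/yield
    f = Fiber.new { puts "fiber started"; Fiber.yield; puts "fiber resumed" }
    f.resume
    f.resume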


  18. TL;DR
    • Kernel decides which process to run
    (which may have multiple threads)
    • Processes or threads may block, causing the
    scheduler to select another thread/process
    for execution
    • I/O is the most common blocking
    operation
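
    A hedged illustration of that scheduling behaviour (sleep stands in for a
    blocking I/O call): while one thread is blocked, another gets to run.

    t1 = Thread.new { sleep 0.5; puts "t1 woke up" }        # blocks; scheduler switches away
    t2 = Thread.new { puts "t2 ran while t1 was blocked" }
    [t1, t2].each(&:join)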


  19. Quickly
    • Threads behave differently according to the
    platform (JRuby vs MRI vs Rubinius)
    • Ruby Thread classes are backed by native
    threads on 1.9
    • Green threads prior to 1.9 (aka simulated
    threads)


  20. Threads*
    *easiest and quickest win


  21. require 'thread'
    (0..5).each do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end

    Question: what will this code output?


  22. require 'thread'
    threads = (0..5).map do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end
    threads.map(&:join)


  23. $ ruby joining_threads.rb
    Hello from thread: 1
    Hello from thread: 2
    Hello from thread: 4
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 5


  24. Order is
    Nondeterministic
    $ ruby joining_threads.rb
    Hello from thread: 5
    Hello from thread: 2
    Hello from thread: 1
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 4


  25. Shared Memory
    require 'thread'
    balance = 100
    interest = Thread.new do
      while true
        sleep 0.1
        balance = balance * 1.025
      end
    end
    while balance < 200
      sleep 0.25
      puts "Banking: #{balance}"
    end

    Question: What's wrong with this code?


  26. Straight Up Locks
    require 'thread'
    lock = Mutex.new
    balance = 100
    Thread.new do
      while true
        sleep 0.1
        lock.synchronize do
          balance = balance * 1.025
        end
      end
    end
    while balance < 200
      lock.synchronize do
        puts "Balance: #{balance}"
        sleep 1
      end
    end


  27. Blocking
    notice we haven’t
    done anything that
    could block. Only
    simple math
    operations.


  28. I/O Blocks
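
    A hedged sketch of why blocking I/O matters for speed: blocked time
    overlaps when spread across threads, so wall-clock time drops
    (sleep again stands in for I/O):

    require 'benchmark'
    puts Benchmark.realtime { 2.times { sleep 1 } }                                   # ~2 seconds
    puts Benchmark.realtime { 2.times.map { Thread.new { sleep 1 } }.each(&:join) }   # ~1 second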


  29. Processes


  30. fork


  31. balance = 100
    pid = fork do
      while true
        sleep 0.5
        balance = balance * 1.0125
        puts "Child Balance: #{balance}"
      end
    end
    # parent
    if pid
      while true do
        sleep 0.5
        puts "Parent Balance: #{balance}"
      end
    end

    What's wrong with this code? What happens to balance? What happens
    when both processes need to access the balance? You'd need
    inter-process locks.
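
    One hedged sketch of those inter-process locks: an advisory file lock via
    File#flock (the lock file name here is made up for illustration):

    File.open("balance.lock", File::RDWR | File::CREAT) do |lock|
      lock.flock(File::LOCK_EX)    # blocks until this process holds the lock
      # ... read and update a balance stored outside process memory ...
      lock.flock(File::LOCK_UN)
    end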


  32. Memory is Not Shared
    $ ruby forking_example.rb
    Child Balance: 101.25
    Parent Balance: 100
    Child Balance: 102.515625
    Parent Balance: 100
    Child Balance: 103.7970703125
    Parent Balance: 100
    Child Balance: 105.09453369140624
    Parent Balance: 100
    Child Balance: 106.40821536254882
    Parent Balance: 100


  33. Fibers, TL;DR
    Fibers are like
    threads, not so
    important for
    this talk


  34. Making Things Faster


  35. Challenge: Fetch 100
    pages of Ruby search
    results as fast as
    possible
    The fetching is going to use HTTP
    (thus I/O), which Ruby's thread
    scheduler will optimize.


  36. Two Approaches


  37. Multithreaded


  38. Before My Talk :)
    require 'net/http'
    100.times do |page|
      puts "Getting page: #{page}"
      Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
    end
    # 0.18s user 0.11s system 0% cpu 44.739 total


  39. Wait, I have cores and
    stuff


  40. Using Threads
    require 'thread'
    require 'net/http'
    queue = Queue.new
    100.times do |i|
      queue << i
    end
    # (0..4) => five worker threads
    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end
    threads.map(&:join)
    # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total


  41. • Single Thread > 40 seconds
    • Multithreaded < 10 seconds
    • I know....MOAR THREADS


  42. Custom # Threads
    require 'thread'
    require 'net/http'
    queue = Queue.new
    100.times do |i|
      queue << i
    end
    workers = ARGV[0].to_i
    # note: (0..workers) actually spawns workers + 1 threads
    threads = (0..workers).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end
    threads.map(&:join)


  43. Results
    # Threads Time (seconds)
    4 ~8
    5 ~8.5
    6 ~9
    7 ~9
    8 ~9.5
    9 ~10
    10 ~11


  44. More Threads != Faster
    • A computer can only run so many
    threads at once
    • Context switching
    • Blocking I/O (HTTP) limits throughput


  45. Let’s Do Some Math


  46. Password Cracking
    require 'thread'
    require 'digest/sha1'
    encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
    queue = Queue.new
    # Dictionary stands in for a wordlist of candidate passwords
    Dictionary.each do |plaintext|
      queue << plaintext
    end
    # INFINITY is hyperbole: "just keep adding threads"
    threads = (0..INFINITY).map do |i|
      Thread.new do
        while !queue.empty?
          plaintext = queue.pop
          result = Digest::SHA1.hexdigest plaintext
          if result == encrypted
            puts "Decrypted to: #{plaintext}"
            exit
          end
        end
      end
    end
    threads.map(&:join)


  47. (image slide)

  48. ... and it’s slow.
    Explain why it’s slow


  49. Enter the GIL (Global
    Interpreter Lock)


  50. Also enter
    JRuby & Rubinius


  51. G I L
    • Only one thread can execute Ruby code at a
    given time
    • Each implementation is different
    • JRuby and Rubinius don’t have a GIL
    • MRI has a GIL
    • This makes true parallel programming
    impossible on MRI
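
    A hedged sketch of the consequence: CPU-bound work (no I/O to release the
    lock) gains roughly nothing from threads on MRI, while implementations
    without a GIL can spread it across cores:

    require 'benchmark'

    work = -> { 2_000_000.times { Math.sqrt(rand) } }

    puts Benchmark.realtime { 4.times { work.call } }                                  # sequential
    puts Benchmark.realtime { 4.times.map { Thread.new { work.call } }.each(&:join) }  # threaded
    # On MRI both take about the same time; on JRuby/Rubinius the threaded
    # run can approach a 4x speedup on four cores.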


  52. (image slide)

  53. JVM


  54. JRuby or Rubinius is
    more performant for
    multithreaded
    programs


  55. multiprocess*
    JRuby and Windows users need not apply


  56. Forkin'
    require 'net/http'
    require 'thread'
    pages = 100
    workers = 4
    # fdiv avoids integer division, so the slice count matches the worker count
    queue_size = pages.fdiv(workers).ceil
    queues = (1..pages).each_slice(queue_size).to_a
    queues.each do |pages|
      fork do
        pages.each do |page|
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          puts "Got page: #{page} via #{Process.pid}"
        end
      end
    end
    Process.waitall


  57. Hydra: Processes + Threads


  58. require 'net/http'
    require 'thread'
    pages = 100
    workers = 4
    queue_size = pages.fdiv(workers).ceil
    queues = (1..pages).each_slice(queue_size).to_a
    queues.each do |pages|
      fork do
        queue = Queue.new
        pages.each { |i| queue << i }
        threads = (0..4).map do |i|
          Thread.new do
            while !queue.empty?
              page = queue.pop
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
              puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
            end
          end
        end
        threads.map(&:join)
      end
    end
    Process.waitall

    Who needs IPC when you can use block variables?


  59. psst....all that code
    was wrong


  60. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end


  61. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end
    Nonblocking


  62. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end
    Nonblocking
    Blocking


  63. This code will deadlock
    in some cases: between the empty? check and the pop, another thread
    can take the last item, and the losing thread's pop then blocks
    forever on a queue nothing will refill.


  64. Deadlock: A deadlock is a
    situation in which two or
    more competing actions are
    each waiting for the other
    to finish, and thus neither
    ever does.
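
    A classic toy illustration of that definition (hedged sketch): two threads
    each hold one lock and wait for the other's, so neither ever finishes.

    require 'thread'
    a = Mutex.new
    b = Mutex.new
    t1 = Thread.new { a.synchronize { sleep 0.1; b.synchronize { } } }
    t2 = Thread.new { b.synchronize { sleep 0.1; a.synchronize { } } }
    [t1, t2].each(&:join)   # never completes; MRI typically aborts with "deadlock detected"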


  65. The Fix
    def pop
      @mutex.synchronize do
        # non-blocking: return false instead of waiting on an empty queue
        @array.empty? ? false : @array.pop
      end
    end
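
    With a pop like that (or the standard library's non-blocking queue.pop(true),
    which raises ThreadError when the queue is empty), the worker loop can drain
    the queue and simply stop; a hedged sketch:

    threads = (0..4).map do
      Thread.new do
        while (page = queue.pop)   # false ends the loop instead of blocking forever
          # do stuff with page
        end
      end
    end
    threads.map(&:join)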


  66. Faster Web Servers


  67. Prefork Model
    • Start a process. Get everything ready.
    • Fork a given # of times to create worker
    processes
    • Parent manages the children
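
    A minimal prefork sketch (assumptions: four workers, and the real
    accept/handle loop is elided):

    WORKERS = 4
    # parent: load the app, open listening sockets, etc. once, before forking
    pids = WORKERS.times.map do
      fork do
        loop do
          # accept a connection and handle a request here
          sleep 1
        end
      end
    end
    trap(:TERM) { pids.each { |pid| Process.kill(:TERM, pid) } }
    Process.waitall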


  68. Unicorn


  69. Unicorn
    Hey guys
    You alive?


  70. Unicorn
    Unix Sockets
    Hey guys
    You alive?


  71. Unicorn
    500MB 500MB 500MB 500MB 500MB
    Unix Sockets
    Hey guys
    You alive?


  72. Pain Points
    • Interprocess Communication (IPC)
    • Synchronization must happen
    • 5 processes, 5 times as much memory
    • Process monitoring
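
    For flavour, a hedged sketch of the simplest form of the IPC listed above:
    a pipe between the parent and a forked child:

    reader, writer = IO.pipe
    fork do
      reader.close
      writer.puts "child #{Process.pid} reporting in"
      writer.close
    end
    writer.close
    puts reader.read
    Process.waitall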


  73. Easier Concurrent
    Ruby Programs


  74. (image slide)

  75. The Actor Model
    • Each Actor is an object running in its own
    thread
    • Handles communication with mailboxes
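
    Before reaching for a library, here is a toy, dependency-free sketch of that
    idea: an object with its own thread and a Queue as its mailbox (all names
    here are made up):

    require 'thread'

    class TinyActor
      def initialize
        @mailbox = Queue.new
        @thread  = Thread.new { loop { handle(@mailbox.pop) } }
      end

      def tell(message)
        @mailbox << message
      end

      def handle(message)
        puts "received: #{message}"
      end
    end

    actor = TinyActor.new
    actor.tell "hello"
    sleep 0.1   # give the actor's thread a moment to process the message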


  76. What Is It?
    • Implementation of the actor model by Tony
    Arcieri. He’s a Ruby hero!
    • Handles pooling, supervising, messaging, and many
    other things
    • Makes writing concurrent OO programs as easy
    as writing sequential ones
    • Avoids deadlocks by handling state internally
    • Actors are threads; method calls are fibers


  77. Handling Pain Points


  78. Monitoring
    class Worker
    include Celluloid
    end
    worker = Worker.supervise


  79. Save it for Later
    class Worker
      include Celluloid
    end
    Worker.supervise_as :worker
    # now other parts of the program
    # can access the actor instance
    Celluloid::Actor[:worker]


  80. “IPC”
    class Worker
    include Celluloid
    end
    worker = Worker.new
    worker.mailbox << Message.new


  81. Mailboxes Work
    Everywhere
    worker = Celluloid::Actor[:worker]
    worker.mailbox << Message.new


  82. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end


  83. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end
    Oh ya, Celluloid can make
    any method async


  84. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end
    Oh ya, Celluloid can make
    any method async
    Block until a message is received


  85. $ ruby mailbox_example.rb
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!


  86. Avoid Deadlocks
    In ATOM mode, Celluloid actors will "pipeline"
    work, meaning that in cases where they might
    execute a "blocking" call, they will continue
    processing incoming requests as opposed to
    waiting for the call to complete. This approach
    prevents the type of deadlocks you might ordinarily
    encounter in actor RPC systems such as Erlang or
    Akka.
    - Celluloid Wiki


  87. Simple Example


  88. Simple To Use
    require 'celluloid'
    require 'net/http'

    class Worker
      include Celluloid

      def fetch(page)
        Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        puts "Got page: #{page} via #{Thread.current}"
      end
    end

    pool = Worker.pool # uses # of cores for default pool size
    100.times do |i|
      pool.fetch i
    end


  89. K, time to scale out


  90. (image slide)

  91. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker


  92. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node


  93. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node
    ømq


  94. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node
    ømq
    Drop a cell in
    this node


  95. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff


  96. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network


  97. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network
    Grab a cell
    from the node


  98. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network
    Grab a cell
    from the node
    HOLY SHIT!


  99. (image slide)

  100. Oh ya, you can
    cluster nodes for
    massive pwnage


  101. Now it’s up to
    you to do hard work
