Who Is This Talk For?
There have been a lot of talks on performance-related topics: yesterday we had Immutable Ruby, and today we had Charles talking about JRuby optimizations. So you're already primed for this one.
Three Primitives
• Processes: separate memory, separate everything. Scheduled by the kernel.
• Threads: run inside a process and share its memory. Scheduled by the kernel.
• Fibers: like threads, but scheduled by the programmer. 4KB stack.
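To make "scheduled by the programmer" concrete, here is a minimal sketch (the variable names are illustrative, not from the talk). Nothing inside the fiber runs until the caller resumes it, and the fiber hands control back explicitly with Fiber.yield.

```ruby
# A fiber runs only when the caller resumes it; nothing preempts it.
log = []

fiber = Fiber.new do
  log << "fiber: step 1"
  Fiber.yield            # hand control back to the caller
  log << "fiber: step 2"
end

log << "main: before resume"
fiber.resume             # runs the fiber up to Fiber.yield
log << "main: between resumes"
fiber.resume             # runs the rest of the fiber
log << "main: done"

puts log
```

The log always comes out in the same order, because every switch between main and the fiber is written into the program.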
TL;DS
• Kernel decides which process to run (which may have multiple threads)
• Processes or threads may block, causing the scheduler to select another thread/process for execution
• I/O is the most common blocking operation
Quickly
• Threads behave differently according to the platform (JRuby vs MRI vs Rubinius)
• Ruby's Thread class is backed by native threads on 1.9
• Prior to 1.9, MRI used green threads (aka simulated threads)
    require 'thread'

    (0..5).each do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end

Question: What will this code output?
    $ ruby joining_threads.rb
    Hello from thread: 1
    Hello from thread: 2
    Hello from thread: 4
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 5
Order Is Nondeterministic

    $ ruby joining_threads.rb
    Hello from thread: 5
    Hello from thread: 2
    Hello from thread: 1
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 4
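A small sketch of how to wait for every thread and still get a predictable result. It assumes the threads are joined before the main thread exits. The `completed` queue and `order` array are illustrative names, not part of the talk.

```ruby
require 'thread'

completed = Queue.new   # thread-safe; records the order threads finished in

threads = (0..5).map do |i|
  Thread.new { completed << i }
end
threads.each(&:join)    # wait for every thread before reading results

order = []
order << completed.pop until completed.empty?

puts "Finish order: #{order.inspect}"       # may vary from run to run
puts "Sorted:       #{order.sort.inspect}"
```

The finish order is up to the scheduler. If the program needs a stable order, it has to impose one itself, here by sorting.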
Shared Memory

    require 'thread'

    balance = 100

    # Background thread: applies interest every 0.1s.
    interest = Thread.new do
      while true
        sleep 0.1
        balance = balance * 1.025
      end
    end

    # Main thread: reads the same variable with no coordination.
    while balance < 200
      sleep 0.25
      puts "Balance: #{balance}"
    end

Question: What's wrong with this code?
Straight Up Locks

    require 'thread'

    lock = Mutex.new
    balance = 100

    Thread.new do
      while true
        sleep 0.1
        lock.synchronize do
          balance = balance * 1.025
        end
      end
    end

    while balance < 200
      lock.synchronize do
        puts "Balance: #{balance}"
        sleep 1
      end
    end
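The same locking pattern in a form with a checkable result: a rough sketch in which a counter stands in for the balance (an assumption made for illustration). Every read-modify-write goes through the mutex, so no increment can be lost.

```ruby
require 'thread'

lock = Mutex.new
counter = 0

threads = 4.times.map do
  Thread.new do
    1_000.times do
      # Read-modify-write must happen as one unit, so it goes inside the lock.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter
```

With the lock in place the total is always 4 x 1,000, regardless of how the threads interleave.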
    balance = 100

    pid = fork do
      while true
        sleep 0.5
        balance = balance * 1.0125
        puts "Child Balance: #{balance}"
      end
    end

    # parent
    if pid
      while true do
        sleep 0.5
        puts "Parent Balance: #{balance}"
      end
    end

What's wrong with this code? What happens to balance? What happens when both processes need to access the balance? Inter-process locks.
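One way to see what happens to balance, sketched under the assumption of a Unix-like platform where fork is available. The child changes only its own copy, so the value has to be sent back explicitly. A pipe is used here as one possible channel. The talk does not prescribe it.

```ruby
balance = 100
reader, writer = IO.pipe

pid = fork do
  reader.close
  balance += 25                    # changes only the child's copy
  writer.puts balance              # send the new value to the parent
  writer.close
end

writer.close
child_balance = reader.gets.to_i   # blocks until the child writes
reader.close
Process.wait(pid)

puts "Parent's own copy:   #{balance}"
puts "Value sent by child: #{child_balance}"
```

The parent's variable is untouched. Only the value that travelled through the pipe reflects the child's work.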
Challenge: Fetch 100 pages of Ruby search results as fast as possible
The fetching uses HTTP, so it is I/O-bound: while one thread waits on the network, Ruby's thread scheduler can run another.
Before My Talk :)

    require 'net/http'

    100.times do |page|
      puts "Getting page: #{page}"
      Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
    end

    # 0.18s user 0.11s system 0% cpu 44.739 total
Using Threads

    require 'thread'
    require 'net/http'

    queue = Queue.new
    100.times do |i|
      queue << i
    end

    threads = (0..4).map do |i|
      Thread.new do
        loop do
          # Non-blocking pop raises ThreadError once the queue is drained.
          # This avoids the race between empty? and a blocking pop.
          page = begin
            queue.pop(true)
          rescue ThreadError
            break
          end
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end

    threads.each(&:join)

    # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total
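Custom # Threads

    require 'thread'
    require 'net/http'

    queue = Queue.new
    100.times do |i|
      queue << i
    end

    workers = ARGV[0].to_i

    # (1..workers) starts exactly `workers` threads; (0..workers) starts one extra.
    threads = (1..workers).map do |i|
      Thread.new do
        loop do
          # Non-blocking pop raises ThreadError once the queue is drained.
          page = begin
            queue.pop(true)
          rescue ThreadError
            break
          end
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end

    threads.each(&:join)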
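For experimenting with worker counts without hitting the network, here is a rough offline sketch. `fake_fetch` is a hypothetical stand-in for Net::HTTP.get, and Queue#close assumes Ruby 2.3 or later. Neither comes from the talk.

```ruby
require 'thread'

# Hypothetical stand-in for Net::HTTP.get so the sketch runs offline.
def fake_fetch(page)
  sleep 0.01
  "body of page #{page}"
end

queue = Queue.new
20.times { |i| queue << i }
queue.close                      # no more work will be added

results = Queue.new
workers = 4

threads = Array.new(workers) do
  Thread.new do
    # On a closed queue, pop returns nil once the queue is empty.
    while (page = queue.pop)
      results << [page, fake_fetch(page)]
    end
  end
end
threads.each(&:join)

fetched_pages = []
fetched_pages << results.pop.first until results.empty?
puts "Fetched #{fetched_pages.size} pages with #{workers} threads"
```

Closing the queue gives each worker a clean way to stop: pop returns nil instead of blocking, so no thread is left waiting on an empty queue.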
Password Cracking

    require 'thread'
    require 'digest/sha1'

    encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

    # Dictionary is not defined on the slide; substitute any enumerable word list.
    queue = Queue.new
    Dictionary.each do |plaintext|
      queue << plaintext
    end

    # INFINITY is not defined on the slide; substitute a thread count.
    threads = (0..INFINITY).map do |i|
      Thread.new do
        while !queue.empty?
          plaintext = queue.pop
          result = Digest::SHA1.hexdigest plaintext
          if result == encrypted
            puts "Decrypted to: #{plaintext}"
            exit
          end
        end
      end
    end

    threads.map(&:join)
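A runnable variant of the same idea, as a sketch only. The five-word list is a hypothetical stand-in for the slide's Dictionary, two threads replace INFINITY, and Queue#close assumes Ruby 2.3 or later.

```ruby
require 'digest/sha1'
require 'thread'

target = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
# Hypothetical word list standing in for the slide's Dictionary.
words = %w[letmein qwerty password dragon monkey]

queue = Queue.new
words.each { |w| queue << w }
queue.close

found = nil
lock = Mutex.new

threads = Array.new(2) do
  Thread.new do
    while (candidate = queue.pop)
      if Digest::SHA1.hexdigest(candidate) == target
        lock.synchronize { found = candidate }
      end
    end
  end
end
threads.each(&:join)

puts "Hash matches: #{found}"
```

Hashing is CPU-bound work, so on MRI extra threads do not make this faster. That is the point of the next slide.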
GIL (Global Interpreter Lock)
• Only one thread can execute Ruby code at a given time
• Each implementation is different
• JRuby and Rubinius don't have a GIL
• MRI has a GIL
• This makes true parallel execution of threads impossible on MRI
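The GIL limits CPU-bound Ruby code, but a thread that blocks still lets other threads run, which is why the threaded HTTP fetch was faster. A rough sketch of that effect follows. It uses sleep as a stand-in for a blocking I/O call, which is an assumption made so the example runs offline.

```ruby
require 'benchmark'

elapsed = Benchmark.realtime do
  threads = 4.times.map do
    Thread.new { sleep 0.2 }   # stands in for a blocking I/O call
  end
  threads.each(&:join)
end

puts format("4 threads x 0.2s sleep took %.2fs", elapsed)
```

If the four waits ran one after another they would take about 0.8s. Because they overlap, the total stays close to a single wait.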
    require 'net/http'
    require 'thread'

    pages = 100
    workers = 4
    # Float division so ceil rounds up when pages is not a multiple of workers.
    queue_size = (pages / workers.to_f).ceil
    queues = (1..pages).each_slice(queue_size).to_a

    queues.each do |pages|
      fork do
        queue = Queue.new
        pages.each { |i| queue << i }

        threads = (0..4).map do |i|
          Thread.new do
            loop do
              # Non-blocking pop raises ThreadError once the queue is drained.
              page = begin
                queue.pop(true)
              rescue ThreadError
                break
              end
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
              puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
            end
          end
        end

        threads.each(&:join)
      end
    end

    Process.waitall

Who needs IPC when you can use block variables?
What Is It?
• Implementation of the actor model by Tony Arcieri. He's a Ruby hero!
• Handles pooling, supervising, messaging, and many other things
• Makes writing concurrent OOP programs as easy as writing sequential ones
• Avoids deadlocks by handling state internally
• Actors are threads; method calls are fibers
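To show the actor idea itself, here is a toy sketch in plain Ruby: one thread, one mailbox, and state that only that thread touches. This illustrates the model and is not how Celluloid is implemented. `CounterActor` and its methods are hypothetical names.

```ruby
require 'thread'

# Toy actor: one thread, one mailbox, state touched only by that thread.
class CounterActor
  def initialize
    @mailbox = Queue.new
    @count = 0
    @thread = Thread.new { run }
  end

  # Asynchronous send: enqueue a message and return immediately.
  def increment
    @mailbox << [:increment, nil]
  end

  # Synchronous call: send a reply queue and block until the actor answers.
  def count
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  # Messages are handled one at a time, so @count needs no lock.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
senders = 3.times.map { Thread.new { 100.times { actor.increment } } }
senders.each(&:join)
total = actor.count
actor.stop
puts total
```

Three threads send messages concurrently, yet the count is exact, because only the actor's own thread ever modifies it.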
Save It for Later

    require 'celluloid'

    class Worker
      include Celluloid
    end

    Worker.supervise_as :worker

    # now other parts of the program
    # can access the actor instance
    Celluloid::Actor[:worker]
    require 'celluloid'

    class Story
      attr_reader :headline

      def initialize(headline)
        @headline = headline
      end
    end

    class Broadcaster
      include Celluloid

      def initialize
        # Oh ya, Celluloid can make any method async
        async.wait_for_messages
      end

      def wait_for_messages
        loop do
          # Block until a message is received
          message = receive { |msg| msg.is_a? Story }
          puts "BREAKING NEWS! #{message.headline}"
        end
      end
    end

    broadcaster = Broadcaster.new

    loop do
      broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
      sleep 1
    end
Avoid Deadlocks
"In ATOM mode, Celluloid actors will "pipeline" work, meaning that in cases where they might execute a "blocking" call, they will continue processing incoming requests as opposed to waiting for the call to complete. This approach prevents the type of deadlocks you might ordinarily encounter in actor RPC systems such as Erlang or Akka."
- Celluloid Wiki
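Simple To Use

    require 'celluloid'
    require 'net/http'

    class Worker
      include Celluloid

      def fetch(page)
        Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        puts "Got page: #{page} via #{Thread.current}"
      end
    end

    pool = Worker.pool # uses # of cores for default pool size

    # A plain pool.fetch call blocks until it returns, so the loop would run
    # one fetch at a time. Futures let the pool work on pages concurrently.
    futures = 100.times.map do |i|
      pool.future.fetch i
    end

    # Wait for every fetch to finish before the program exits.
    futures.each(&:value)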
    require 'dcell'

    # Register a node (the transport is 0MQ)
    DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

    class Worker
      include Celluloid
    end

    # Drop a cell in this node
    Worker.supervise_as :worker
    require 'dcell'

    DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

    # Grab a node from the network
    worker_node = DCell::Node["worker"]

    # Grab a cell from the node
    worker = worker_node[:worker]

    worker.do_hard_stuff

HOLY SHIT!