Dear God What am I Doing? Parallel and Concurrent Processing

Presented at RubyConf AU & Wroc_love.rb

Adam Hawkins

March 04, 2013

Transcript

  1. DEAR GOD WHAT AM I DOING? CONCURRENCY AND PARALLEL PROCESSING

    Adam Hawkins - tw://adman65 - gh://twinturbo
  2. Who Is This Talk For?

    There have been a lot of talks on performance-related topics: yesterday we had Immutable Ruby, and today we had Charles talking about JRuby optimizations. So you’re already primed for this one.
  3. Three Primitives

    • Processes: separate memory, separate everything. Scheduled by the kernel.
    • Threads: units of execution inside a process that share its memory. Scheduled by the kernel.
    • Fibers: like threads, but scheduled by the programmer. 4KB stack.
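
To make "scheduled by the programmer" concrete, here is a minimal sketch using Ruby's built-in `Fiber` class. The symbols yielded are illustrative only; the point is that the fiber runs only when the caller resumes it.

```ruby
# A fiber never runs on its own: the caller decides when it runs (resume)
# and the fiber decides when to pause (Fiber.yield).
fiber = Fiber.new do
  Fiber.yield :first   # pause and hand :first back to the caller
  Fiber.yield :second  # pause again
  :done                # the block's last value is returned by the final resume
end

results = [fiber.resume, fiber.resume, fiber.resume]
p results # => [:first, :second, :done]
```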
  4. TL;DS

    • Kernel decides which process to run (which may have multiple threads)
    • Processes or threads may block, causing the scheduler to select another thread/process for execution
    • I/O is the most common blocking operation
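
A small sketch of what blocking means for the scheduler, using `sleep` as a stand-in for blocking I/O (an assumption made so the example needs no network). Five threads that each block for 0.2 seconds should finish in roughly 0.2 seconds overall, not 1 second, because a blocked thread lets another one run.

```ruby
require 'benchmark'

# Each thread blocks for 0.2s; while one is blocked the others are scheduled.
elapsed = Benchmark.realtime do
  threads = 5.times.map { Thread.new { sleep 0.2 } }
  threads.each(&:join)
end

puts format("5 blocked threads took %.2fs", elapsed)
```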
  5. Quickly

    • Threads behave differently according to the platform (JRuby vs MRI vs Rubinius)
    • Ruby's Thread class is backed by native threads on 1.9
    • Green threads prior to 1.9 (aka simulated threads)
  6.

        require 'thread'

        (0..5).each do |i|
          Thread.new do
            puts "Hello from thread: #{i}"
          end
        end

    Question: what will this code output?
  7.

        require 'thread'

        threads = (0..5).map do |i|
          Thread.new do
            puts "Hello from thread: #{i}"
          end
        end

        threads.map(&:join)
  8.

        $ ruby joining_threads.rb
        Hello from thread: 1
        Hello from thread: 2
        Hello from thread: 4
        Hello from thread: 0
        Hello from thread: 3
        Hello from thread: 5
  9. Order is Nondeterministic

        $ ruby joining_threads.rb
        Hello from thread: 5
        Hello from thread: 2
        Hello from thread: 1
        Hello from thread: 0
        Hello from thread: 3
        Hello from thread: 4
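
If the output order matters, one option (a sketch, not something shown in the talk) is to have each thread return its message and collect the results with `Thread#value`, which joins the thread and returns the block's result. The threads still run in whatever order the scheduler picks; only the collection order is fixed.

```ruby
# Each thread returns a string instead of printing it.
threads = (0..5).map do |i|
  Thread.new { "Hello from thread: #{i}" }
end

# Thread#value waits for the thread and returns the block's last value,
# so the messages come back in the order the threads were created.
messages = threads.map(&:value)
puts messages
```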
  10. Shared Memory

        require 'thread'

        balance = 100

        interest = Thread.new do
          while true
            sleep 0.1
            balance = balance * 1.025
          end
        end

        while balance < 200
          sleep 0.25
          puts "Banking: #{balance}"
        end

    Question: What’s wrong with this code?
  11. Straight Up Locks

        require 'thread'

        lock = Mutex.new
        balance = 100

        Thread.new do
          while true
            sleep 0.1
            lock.synchronize do
              balance = balance * 1.025
            end
          end
        end

        while balance < 200
          lock.synchronize do
            puts "Balance: #{balance}"
            sleep 1
          end
        end
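
A smaller, terminating sketch of the same locking idea: ten threads increment a shared counter, and the `Mutex` guarantees each read-modify-write happens as one step. The counter and thread counts are illustrative values, not from the talk.

```ruby
require 'thread'

lock = Mutex.new
counter = 0

threads = 10.times.map do
  Thread.new do
    1000.times do
      # Only one thread at a time may execute this block.
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter # => 10000
```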
  12.

        balance = 100

        pid = fork do
          while true
            sleep 0.5
            balance = balance * 1.0125
            puts "Child Balance: #{balance}"
          end
        end

        # parent
        if pid
          while true do
            sleep 0.5
            puts "Parent Balance: #{balance}"
          end
        end

    What’s wrong with this code? What happens to balance? What happens when both processes need to access the balance? That calls for inter-process locks.
  13. Memory is Not Shared

        $ ruby forking_example.rb
        Child Balance: 101.25
        Parent Balance: 100
        Child Balance: 102.515625
        Parent Balance: 100
        Child Balance: 103.7970703125
        Parent Balance: 100
        Child Balance: 105.09453369140624
        Parent Balance: 100
        Child Balance: 106.40821536254882
        Parent Balance: 100
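
Because the child works on its own copy of `balance`, the parent only sees changes if the child sends them back. The following is one possible sketch using a pipe (`IO.pipe`) as the communication channel; it assumes a platform where `fork` is available (MRI on Unix-like systems), and the three-iteration loop is an illustrative stand-in for the endless loop on the slide.

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close                   # the child only writes
  balance = 100
  3.times { balance *= 1.0125 }  # same interest rate as the slide
  writer.puts balance            # send the result to the parent
  writer.close
end

writer.close                     # the parent only reads
child_balance = reader.gets.to_f # blocks until the child writes
reader.close
Process.wait(pid)

puts "Parent received: #{child_balance}"
```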
  14. Challenge: Fetch 100 pages of Ruby search results as fast as possible

    The fetching uses HTTP (and therefore I/O), which Ruby’s thread scheduler can take advantage of: while one thread waits on I/O, another can run.
  15. Before My Talk :)

        require 'net/http'

        100.times do |page|
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end

        # 0.18s user 0.11s system 0% cpu 44.739 total
  16. Using Threads

        require 'thread'
        require 'net/http'

        queue = Queue.new

        100.times do |i|
          queue << i
        end

        threads = (0..4).map do |i|
          Thread.new do
            while !queue.empty?
              page = queue.pop
              puts "Getting page: #{page}"
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
            end
          end
        end

        threads.map(&:join)

        # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total
  17.

    • Single Thread > 40 seconds
    • Multithreaded < 10 seconds
    • I know....MOAR THREADS
  18. Custom # Threads

        require 'thread'
        require 'net/http'

        queue = Queue.new

        100.times do |i|
          queue << i
        end

        workers = ARGV[0].to_i

        # 1..workers starts exactly `workers` threads (0..workers would start one extra)
        threads = (1..workers).map do |i|
          Thread.new do
            while !queue.empty?
              page = queue.pop
              puts "Getting page: #{page}"
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
            end
          end
        end

        threads.map(&:join)
  19. More Threads != Faster

    • A computer can only run so many threads at once
    • Context switching
    • Blocking I/O (HTTP) limits throughput
  20. Password Cracking

        require 'thread'
        require 'digest/sha1'

        encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

        queue = Queue.new

        # Dictionary is not defined on the slide; it stands for a word list
        Dictionary.each do |plaintext|
          queue << plaintext
        end

        # INFINITY is not defined on the slide either; it stands for "as many threads as you like"
        threads = (0..INFINITY).map do |i|
          Thread.new do
            while !queue.empty?
              plaintext = queue.pop
              result = Digest::SHA1.hexdigest plaintext
              if result == encrypted
                puts "Decrypted to: #{plaintext}"
                exit
              end
            end
          end
        end

        threads.map(&:join)
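
A runnable sketch of the same idea, under stated assumptions: the four-word list and the thread count of 4 are made up for illustration, and the loop uses a non-blocking `pop` instead of the `empty?` check. The target hash is the one from the slide.

```ruby
require 'thread'
require 'digest/sha1'

target = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8" # hash from the slide
words  = %w[letmein hunter2 password qwerty]        # hypothetical word list

queue = Queue.new
words.each { |word| queue << word }

found = nil
lock  = Mutex.new

threads = 4.times.map do
  Thread.new do
    loop do
      begin
        word = queue.pop(true) # non-blocking: raises ThreadError when empty
      rescue ThreadError
        break                  # no work left for this thread
      end
      if Digest::SHA1.hexdigest(word) == target
        lock.synchronize { found = word }
      end
    end
  end
end

threads.each(&:join)
puts "Matched: #{found}"
```

This is CPU-bound work, so on MRI the extra threads are not expected to make it faster.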
  21. G I L

    • Only one thread can execute Ruby code at a given time
    • Each implementation is different
    • JRuby and Rubinius don’t have a GIL
    • MRI has a GIL
    • This makes true parallel execution of Ruby threads impossible on MRI
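
One rough way to see the GIL's effect is to time CPU-bound work run serially and then in threads. This is a sketch only: `busy_work` is a made-up helper, and the timings depend on the machine and the Ruby implementation. On MRI the two numbers are expected to be close; on an implementation without a GIL the threaded run can be faster.

```ruby
require 'benchmark'

# Pure computation, no I/O, so threads never block and give up the GIL voluntarily.
def busy_work
  total = 0
  2_000_000.times { |i| total += i }
  total
end

serial = Benchmark.realtime { 4.times { busy_work } }

threaded = Benchmark.realtime do
  4.times.map { Thread.new { busy_work } }.each(&:join)
end

puts format("serial:   %.2fs", serial)
puts format("threaded: %.2fs", threaded)
```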
  22. JVM

  23. Forkin’

        require 'net/http'

        pages = 100
        workers = 4

        # divide as a Float so that ceil can round up when pages is not a multiple of workers
        queue_size = (pages / workers.to_f).ceil
        queues = (1..pages).each_slice(queue_size).to_a

        # one forked process per slice of pages
        queues.each do |slice|
          fork do
            slice.each do |page|
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
              puts "Got page: #{page} via #{Process.pid}"
            end
          end
        end

        Process.waitall
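
The chunk-size arithmetic is worth checking with numbers that do not divide evenly. With Integer operands, `/` truncates before `ceil` runs, so the work can end up in more chunks than there are workers. The values of 10 pages and 4 workers below are illustrative.

```ruby
pages   = 10
workers = 4

integer_size = (pages / workers).ceil      # 10 / 4 is already 2, ceil changes nothing
float_size   = (pages / workers.to_f).ceil # 2.5 rounds up to 3

integer_chunks = (1..pages).each_slice(integer_size).to_a
float_chunks   = (1..pages).each_slice(float_size).to_a

puts "Integer division: #{integer_chunks.size} chunks" # 5 chunks for 4 workers
puts "Float division:   #{float_chunks.size} chunks"   # 4 chunks for 4 workers
```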
  24.

        require 'net/http'
        require 'thread'

        pages = 100
        workers = 4

        # divide as a Float so that ceil can round up when pages is not a multiple of workers
        queue_size = (pages / workers.to_f).ceil
        queues = (1..pages).each_slice(queue_size).to_a

        # one forked process per slice, each running its own pool of threads
        queues.each do |slice|
          fork do
            queue = Queue.new
            slice.each { |i| queue << i }

            threads = (0..4).map do |i|
              Thread.new do
                while !queue.empty?
                  page = queue.pop
                  Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
                  puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
                end
              end
            end

            threads.map(&:join)
          end
        end

        Process.waitall

    Who needs IPC when you can use block variables?
  27. This Just Happens to Work

        threads = (0..4).map do |i|
          Thread.new do
            while !queue.empty?   # Nonblocking
              page = queue.pop    # Blocking
              # do stuff
            end
          end
        end
  28. Deadlock

    A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does.
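
The check-then-pop worker loop can hang: between one thread's `queue.empty?` check and its `queue.pop`, another thread may take the last item, and the blocking `pop` then waits forever. One common way around it (a sketch, not the talk's code) is a non-blocking pop, `queue.pop(true)`, which raises `ThreadError` on an empty queue. The `processed` queue is only there to count the work done.

```ruby
require 'thread'

queue = Queue.new
100.times { |i| queue << i }

processed = Queue.new # thread-safe place to record finished work

threads = 5.times.map do
  Thread.new do
    loop do
      begin
        page = queue.pop(true) # non-blocking: raises ThreadError when empty
      rescue ThreadError
        break                  # queue drained, this worker is done
      end
      processed << page        # stand-in for "do stuff"
    end
  end
end

threads.each(&:join)
puts "Processed #{processed.size} pages" # => Processed 100 pages
```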
  29. Prefork Model

    • Start a process. Get everything ready.
    • Fork a given # of times to create worker processes
    • Parent manages the children
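
The three prefork steps can be sketched in a few lines. This is a toy version under stated assumptions: `WORKERS` and the `config` hash are hypothetical, the "work" is a single `puts`, and "managing" the children is reduced to waiting for them. It requires a platform where `fork` is available.

```ruby
WORKERS = 3

# 1. Get everything ready once, in the parent.
config = { greeting: "ready" }

# 2. Fork a fixed number of workers; each child gets its own copy of config.
pids = WORKERS.times.map do |n|
  fork do
    puts "worker #{n} (pid #{Process.pid}) is #{config[:greeting]}"
  end
end

# 3. The parent manages the children; here it simply waits for all of them.
statuses = Process.waitall
puts "all #{statuses.size} workers exited"
```

A real prefork server would also restart workers that die, which is where the process monitoring cost comes from.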
  30. Pain Points

    • Interprocess Communication (IPC)
    • Synchronization must happen
    • 5 processes, 5 times as much memory
    • Process monitoring
  31. The Actor Model

    • Each Actor is an object running in its own thread
    • Handles communication with mailboxes
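
Before bringing in a library, the two bullets can be illustrated with the standard library alone: an object that owns a thread and a `Queue` used as its mailbox. `TinyActor` is a hypothetical name for this sketch and is not how Celluloid is implemented.

```ruby
require 'thread'

# A toy actor: one object, one thread, one mailbox.
class TinyActor
  def initialize
    @mailbox  = Queue.new
    @received = []
    # The actor's own thread is the only code that touches @received while running.
    @thread = Thread.new do
      while (message = @mailbox.pop) != :stop # blocks until a message arrives
        @received << message
      end
    end
  end

  # Other threads communicate only by dropping messages in the mailbox.
  def <<(message)
    @mailbox << message
    self
  end

  # Ask the actor to finish, wait for it, and return what it handled.
  def stop
    @mailbox << :stop
    @thread.join
    @received
  end
end

actor = TinyActor.new
actor << "first" << "second"
handled = actor.stop
p handled # => ["first", "second"]
```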
  32. What Is It?

    • Celluloid: an implementation of the actor model by Tony Arcieri. He’s a Ruby hero!
    • Handles pooling, supervising, messaging, and many other things
    • Makes writing concurrent OOP programs as easy as writing sequential ones
    • Avoids deadlocks by handling state internally
    • Actors are threads; method calls are fibers
  33. Save it for Later

        require 'celluloid'

        class Worker
          include Celluloid
        end

        Worker.supervise_as :worker

        # now other parts of the program
        # can access the actor instance
        Celluloid::Actor[:worker]
  36.

        require 'celluloid'

        class Story
          attr_reader :headline

          def initialize(headline)
            @headline = headline
          end
        end

        class Broadcaster
          include Celluloid

          def initialize
            # Oh ya, Celluloid can make any method async
            async.wait_for_messages
          end

          def wait_for_messages
            loop do
              # Block until a message is received
              message = receive { |msg| msg.is_a? Story }
              puts "BREAKING NEWS! #{message.headline}"
            end
          end
        end

        broadcaster = Broadcaster.new

        loop do
          broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
          sleep 1
        end
  37.

        $ ruby mailbox_example.rb
        BREAKING NEWS! wroc_love.rb is awesome!
        BREAKING NEWS! wroc_love.rb is awesome!
        BREAKING NEWS! wroc_love.rb is awesome!
        BREAKING NEWS! wroc_love.rb is awesome!
  38. Avoid Deadlocks

    In ATOM mode, Celluloid actors will "pipeline" work, meaning that in cases where they might execute a "blocking" call, they will continue processing incoming requests as opposed to waiting for the call to complete. This approach prevents the type of deadlocks you might ordinarily encounter in actor RPC systems such as Erlang or Akka.

    - Celluloid Wiki
  39. Simple To Use

        require 'celluloid'
        require 'net/http'

        class Worker
          include Celluloid

          def fetch(page)
            Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
            puts "Got page: #{page} via #{Thread.current}"
          end
        end

        pool = Worker.pool # uses # of cores for default pool size

        100.times do |i|
          pool.fetch i
        end
  43.

        require 'dcell'

        # Register a Node (the tcp:// address is ømq)
        DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

        class Worker
          include Celluloid
        end

        # Drop a cell in this node
        Worker.supervise_as :worker
  47.

        require 'dcell'

        DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

        # Grab a node from the network
        worker_node = DCell::Node["worker"]

        # Grab a cell from the node
        worker = worker_node[:worker]

        # HOLY SHIT!
        worker.do_hard_stuff