Dear God What am I Doing? Parallel and Concurrent Processing

Presented at RubyConf AU & Wroc_love.rb

Adam Hawkins

March 04, 2013

Transcript

  1. DEAR GOD WHAT AM I DOING? CONCURRENCY AND PARALLEL PROCESSING

    Adam Hawkins - tw://adman65 - gh://twinturbo
  2. Who Is this Talk for?

    There have been a lot of talks on performance-related things: yesterday we had Immutable Ruby, and today Charles talked about JRuby optimizations. So you’re already primed for this one.
  3. When you type Thread or fork you feel like this

  5. A Brief Introduction to Machine Architecture

  6. We Can Model this Complex System with the Following Diagram

  9. hamster :: thread

  10. wheel :: process hamster :: thread

  13. Let’s Get Serious

  14. The eternal question: How can I make code faster?

  16. Do Multiple Things at Once

  17. Three Primitives

    • Processes: separate memory, separate everything. Scheduled by the kernel.
    • Threads: live inside a process (every process has at least one) and share its memory. Scheduled by the kernel.
    • Fibers: like threads, but scheduled by the programmer. 4KB stack.
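
Fibers are the one primitive in this list whose scheduling you control directly. A minimal sketch with Ruby's core `Fiber` class (the log messages are only for illustration): a fiber runs only when `resume` is called, and `Fiber.yield` hands control back to the caller until the next `resume`.

```ruby
# A fiber runs only when explicitly resumed and pauses at Fiber.yield,
# so the programmer, not the kernel, decides when it gets to run.
log = []

worker = Fiber.new do
  log << "fiber: step 1"
  Fiber.yield              # pause here and return control to the caller
  log << "fiber: step 2"
  :done                    # becomes the return value of the final resume
end

log << "main: before first resume"
worker.resume              # runs the fiber up to its Fiber.yield
log << "main: between resumes"
result = worker.resume     # continues after the yield to the end of the block

puts log
puts "fiber returned #{result.inspect}"
```

Nothing interleaves unless you call `resume`; that is the sense in which fibers are scheduled by the programmer rather than the kernel.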
  18. TL;DS

    • The kernel decides which process to run (a process may have multiple threads)
    • Processes or threads may block, causing the scheduler to select another thread/process for execution
    • I/O is the most common blocking operation
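
To watch the scheduler hand the CPU to other threads while one is blocked, this sketch uses `sleep` as a stand-in for blocking I/O (an assumption for the demo; real code would be waiting on a socket or file). Five threads each block for 0.2 s, yet the whole run takes roughly 0.2 s rather than 1 s, because the waits overlap.

```ruby
# sleep stands in for blocking I/O here (demo assumption).
# While one thread is blocked, the scheduler runs the others.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 5.times.map do |i|
  Thread.new do
    sleep 0.2              # blocking call: this thread gives up the CPU
    "thread #{i} done"
  end
end

results = threads.map(&:value)   # wait for every thread and collect its result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts results
puts format("elapsed: %.2fs (one after another would take about 1.0s)", elapsed)
```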
  19. Quickly

    • Threads behave differently according to the platform (JRuby vs MRI vs Rubinius)
    • Ruby Thread objects are backed by native threads as of 1.9
    • Green threads (aka simulated threads) prior to 1.9
  20. Threads*

    *easiest and quickest win

  21. require 'thread'

      (0..5).each do |i|
        Thread.new do
          puts "Hello from thread: #{i}"
        end
      end

    Question: what will this code output?
  22. require 'thread'

      threads = (0..5).map do |i|
        Thread.new do
          puts "Hello from thread: #{i}"
        end
      end

      threads.map(&:join)
  23. $ ruby joining_threads.rb
      Hello from thread: 1
      Hello from thread: 2
      Hello from thread: 4
      Hello from thread: 0
      Hello from thread: 3
      Hello from thread: 5
  24. Order is Nondeterministic

      $ ruby joining_threads.rb
      Hello from thread: 5
      Hello from thread: 2
      Hello from thread: 1
      Hello from thread: 0
      Hello from thread: 3
      Hello from thread: 4
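
When you need results in a predictable order despite nondeterministic scheduling, `Thread#value` helps: it joins the thread and returns the value of its block. A small sketch (the random `sleep` is only there to shuffle the finishing order):

```ruby
# Threads may finish in any order, but collecting Thread#value in creation
# order gives a deterministic result.
threads = (0..5).map do |i|
  Thread.new do
    sleep(rand * 0.01)             # jitter so threads finish in a random order
    "Hello from thread: #{i}"
  end
end

messages = threads.map(&:value)    # joins each thread, in creation order
puts messages
```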
  25. Shared Memory

      require 'thread'

      balance = 100

      interest = Thread.new do
        while true
          sleep 0.1
          balance = balance * 1.025
        end
      end

      while balance < 200
        sleep 0.25
        puts "Banking: #{balance}"
      end

    Question: What’s wrong with this code?
  26. Straight Up Locks

      require 'thread'

      lock = Mutex.new
      balance = 100

      Thread.new do
        while true
          sleep 0.1
          lock.synchronize do
            balance = balance * 1.025
          end
        end
      end

      while balance < 200
        lock.synchronize do
          puts "Balance: #{balance}"
          sleep 1
        end
      end
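
A smaller, checkable version of the same idea, with arbitrary counts: ten threads each increment a shared counter 1,000 times, and `Mutex#synchronize` turns each read-modify-write into a single unit so no update is lost.

```ruby
require 'thread'   # Mutex is core in modern Ruby; the require is harmless

lock    = Mutex.new
counter = 0

threads = 10.times.map do
  Thread.new do
    1_000.times do
      # counter += 1 is a read, an add, and a write; holding the lock keeps
      # another thread from running between those steps
      lock.synchronize { counter += 1 }
    end
  end
end

threads.each(&:join)
puts counter   # 10000: every increment happened under the lock
```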
  27. Blocking

    Notice we haven’t done anything that could block: only simple math operations.
  28. I/O Blocks

  29. Processes

  30. fork

  31. balance = 100

      pid = fork do
        while true
          sleep 0.5
          balance = balance * 1.0125
          puts "Child Balance: #{balance}"
        end
      end

      # parent
      if pid
        while true do
          sleep 0.5
          puts "Parent Balance: #{balance}"
        end
      end

    What’s wrong with this code? What happens to balance? What happens when both processes need to access the balance? Inter-process locks.
  32. Memory is Not Shared

      $ ruby forking_example.rb
      Child Balance: 101.25
      Parent Balance: 100
      Child Balance: 102.515625
      Parent Balance: 100
      Child Balance: 103.7970703125
      Parent Balance: 100
      Child Balance: 105.09453369140624
      Parent Balance: 100
      Child Balance: 106.40821536254882
      Parent Balance: 100
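
A checkable sketch of the same point, assuming a Unix-like MRI where `fork` is available: the child changes its own copy of `balance` and reports back only through its exit status, while the parent's copy never moves.

```ruby
# fork gives the child a copy of the parent's memory, so the child's
# changes are invisible to the parent. Needs fork (not Windows/JRuby).
balance = 100

pid = fork do
  balance += 50                        # changes only the child's copy
  exit!(balance == 150 ? 0 : 1)        # exit status is the only thing we report
end

Process.wait(pid)                      # reap the child; its status lands in $?
child_ok = $?.success?

puts "child saw its own update: #{child_ok}"
puts "parent balance is still:  #{balance}"
```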
  33. Fibers, TL;DR

    Fibers are like threads; not so important for this talk.
  34. Making Things Faster

  35. Challenge: Fetch 100 pages of Ruby search results as fast as possible

    The fetching is going to use HTTP (thus I/O), which Ruby’s thread scheduler will optimize: while one thread waits on the network, another can run.
  36. Two Approaches

  37. Multithreaded

  38. Before My Talk :)

      require 'net/http'

      100.times do |page|
        puts "Getting page: #{page}"
        Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
      end
      # 0.18s user 0.11s system 0% cpu 44.739 total
  39. Wait, I have cores and stuff

  40. Using Threads

      require 'thread'
      require 'net/http'

      queue = Queue.new
      100.times do |i|
        queue << i
      end

      # (0..4) is an inclusive range, so this starts 5 threads
      threads = (0..4).map do |i|
        Thread.new do
          while !queue.empty?
            page = queue.pop
            puts "Getting page: #{page}"
            Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          end
        end
      end

      threads.map(&:join)
      # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total
  41. • Single thread: > 40 seconds
      • Multithreaded: < 10 seconds
      • I know... MOAR THREADS
  42. Custom # Threads

      require 'thread'
      require 'net/http'

      queue = Queue.new
      100.times do |i|
        queue << i
      end

      workers = ARGV[0].to_i
      # (0..workers) is an inclusive range, so this starts workers + 1 threads
      threads = (0..workers).map do |i|
        Thread.new do
          while !queue.empty?
            page = queue.pop
            puts "Getting page: #{page}"
            Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          end
        end
      end

      threads.map(&:join)
  43. Results

      # Threads   Time (seconds)
      4           ~8
      5           ~8.5
      6           ~9
      7           ~9
      8           ~9.5
      9           ~10
      10          ~11
  44. More Threads != Faster

    • A computer can only run so many threads at once
    • Context switching between threads costs time
    • Blocking I/O (HTTP) limits throughput
  45. Let’s Do Some Math

  46. Password Cracking

      require 'thread'
      require 'digest/sha1'

      encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

      # Dictionary and INFINITY are placeholders (a word list and a thread count)
      queue = Queue.new
      Dictionary.each do |plaintext|
        queue << plaintext
      end

      threads = (0..INFINITY).map do |i|
        Thread.new do
          while !queue.empty?
            plaintext = queue.pop
            result = Digest::SHA1.hexdigest plaintext
            if result == encrypted
              puts "Decrypted to: #{plaintext}"
              exit
            end
          end
        end
      end

      threads.map(&:join)
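
A runnable, scaled-down version of the password-cracking slide, under stated assumptions: a tiny hard-coded word list stands in for the slide's `Dictionary`, and four threads stand in for `INFINITY`. It also swaps the `empty?`-then-`pop` pair for a non-blocking `pop(true)`, which raises `ThreadError` once the queue is drained. On MRI, expect this to be no faster than a single thread (the following slides explain why).

```ruby
require 'thread'
require 'digest/sha1'

# SHA-1 of "password", the same digest used on the slide.
encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"

# Hypothetical stand-in for the slide's Dictionary.
words = %w[letmein hunter2 qwerty password 123456 dragon]

queue = Queue.new
words.each { |w| queue << w }

found      = nil
found_lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    loop do
      # Non-blocking pop: raises ThreadError when empty instead of waiting,
      # so a worker can't block forever on a queue another worker drained.
      plaintext = begin
        queue.pop(true)
      rescue ThreadError
        break
      end
      if Digest::SHA1.hexdigest(plaintext) == encrypted
        found_lock.synchronize { found = plaintext }
      end
    end
  end
end

threads.each(&:join)
puts "Decrypted to: #{found}"
```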
  47. $ ssh adam@mothership

  48. ... and it’s slow.

    (Explain why it’s slow.)

  49. Enter the GIL (Global Interpreter Lock)

  50. Also enter JRuby & Rubinius

  51. GIL

    • Only one thread can execute Ruby code at a given time
    • Each implementation is different
    • JRuby and Rubinius don’t have a GIL
    • MRI has a GIL
    • This makes truly parallel execution of Ruby code impossible within a single MRI process
  53. JVM

  54. JRuby or Rubinius is more performant for multithreaded programs

  55. Multiprocess*

    *JRuby and Windows users need not apply (no fork)

  56. Forkin’

      require 'net/http'
      require 'thread'

      pages   = 100
      workers = 4
      # to_f so the division rounds up instead of truncating
      queue_size = (pages / workers.to_f).ceil
      queues = (1..pages).each_slice(queue_size).to_a

      queues.each do |slice|
        fork do
          slice.each do |page|
            Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
            puts "Got page: #{page} via #{Process.pid}"
          end
        end
      end

      Process.waitall
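
Forking also helps CPU-bound work on MRI: each child is a separate interpreter with its own GIL, so the children can run on separate cores. Because memory isn't shared, results have to come back through some form of IPC. This sketch assumes one `IO.pipe` per child and an arbitrary sum-of-squares workload.

```ruby
# Split CPU-bound work across forked children; each child sends its
# partial result back to the parent over its own pipe. Needs fork.
workers = 4
numbers = (1..100_000).to_a
slices  = numbers.each_slice((numbers.size / workers.to_f).ceil).to_a

readers = slices.map do |slice|
  reader, writer = IO.pipe
  fork do
    reader.close                          # child only writes
    partial = slice.sum { |n| n * n }     # CPU-bound work runs in the child
    writer.puts partial
    writer.close
    exit!(0)
  end
  writer.close                            # parent only reads
  reader
end

# read blocks until each child closes its end of the pipe
total = readers.sum { |r| Integer(r.read.strip) }
Process.waitall                           # reap all children
puts total
```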
  57. Hydra: Processes + Threads

  58. require 'net/http'
      require 'thread'

      pages   = 100
      workers = 4
      # to_f so the division rounds up instead of truncating
      queue_size = (pages / workers.to_f).ceil
      queues = (1..pages).each_slice(queue_size).to_a

      queues.each do |slice|
        fork do
          queue = Queue.new
          slice.each { |i| queue << i }

          threads = (0..4).map do |i|
            Thread.new do
              while !queue.empty?
                page = queue.pop
                Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
                puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
              end
            end
          end

          threads.map(&:join)
        end
      end

      Process.waitall

    Who needs IPC when you can use block variables?
  59. psst....all that code was wrong

  60. This Just Happens to Work

      threads = (0..4).map do |i|
        Thread.new do
          while !queue.empty?   # nonblocking
            page = queue.pop    # blocking
            # do stuff
          end
        end
      end
  63. This code will deadlock in some cases

  64. Deadlock: A deadlock is a situation in which two or

    more competing actions are each waiting for the other to finish, and thus neither ever does.
  65. The Fix

      def pop
        @mutex.synchronize do
          @array.empty? ? false : @array.pop
        end
      end
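
Filled out into a complete class (the name `SafeQueue` and its `push` method are hypothetical), the fix checks for emptiness and pops inside one `synchronize` block, so no other thread can slip in between the check and the pop. Workers loop until `pop` returns `false`. Two assumptions: queued items are never `false` or `nil`, and `Array#pop` takes from the end, so items come out last-in, first-out. Ruby's built-in `Queue#pop(true)`, which raises `ThreadError` when empty, is an alternative.

```ruby
require 'thread'

# Hypothetical queue built around the slide's pop method.
class SafeQueue
  def initialize(items = [])
    @mutex = Mutex.new
    @array = items.dup
  end

  def push(item)
    @mutex.synchronize { @array.push(item) }
  end

  # Check and pop in one critical section; returns false when empty
  # instead of blocking, so a worker can never wait forever.
  def pop
    @mutex.synchronize do
      @array.empty? ? false : @array.pop
    end
  end
end

queue     = SafeQueue.new((0...100).to_a)
processed = Queue.new                # thread-safe collector for the demo

threads = 5.times.map do
  Thread.new do
    while (page = queue.pop)         # 0 is truthy in Ruby; stops on false
      processed << page              # "do stuff" with the page
    end
  end
end

threads.each(&:join)
pages = []
pages << processed.pop until processed.empty?
puts pages.sort == (0...100).to_a    # every page handled exactly once
```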
  66. Faster Web Servers

  67. Prefork Model

    • Start a process. Get everything ready.
    • Fork a given # of times to create worker processes
    • Parent manages the children
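
A toy version of the three prefork steps, assuming a frozen hash stands in for the expensive setup an app server would do and a pipe per worker carries its output. This is not how Unicorn is implemented; a real server would share a listening socket between workers and respawn any that die.

```ruby
# Hypothetical prefork sketch. Needs fork (MRI on a Unix-like OS).
WORKERS = 3

# 1. Start a process and get everything ready (stand-in for loading an app).
app_config = { greeting: "hello" }.freeze

# 2. Fork a given number of workers; each inherits app_config from the parent.
children = WORKERS.times.map do |i|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts "worker #{i} (pid #{Process.pid}) says #{app_config[:greeting]}"
    writer.close
    exit!(0)
  end
  writer.close
  { pid: pid, reader: reader }
end

# 3. The parent manages the children: collect output, reap each one and
#    check its exit status.
reports = children.map do |child|
  line = child[:reader].read
  _pid, status = Process.wait2(child[:pid])
  [line.strip, status.success?]
end

reports.each { |line, ok| puts "#{line} -> #{ok ? 'exited cleanly' : 'failed'}" }
```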
  68. Unicorn

  69. Unicorn (diagram): five 500MB processes, Unix sockets, “Hey guys, you alive?”
  72. Pain Points

    • Interprocess Communication (IPC)
    • Synchronization must happen
    • 5 processes, 5 times as much memory
    • Process monitoring
  73. Easier Concurrent Ruby Programs

  75. The Actor Model

    • Each actor is an object running in its own thread
    • Actors communicate through mailboxes
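
The actor idea can be sketched without any library. In this plain-Ruby illustration (not how Celluloid is implemented), a hypothetical `CounterActor` owns its state, runs its own thread, and touches that state only while processing messages from its mailbox, a `Queue`. Callers never share memory with it; they send messages and, when they need an answer, include a queue to reply on.

```ruby
require 'thread'

# Hypothetical minimal actor: private state + one thread + a mailbox.
class CounterActor
  def initialize
    @count   = 0             # only ever touched by the actor's own thread
    @mailbox = Queue.new
    @thread  = Thread.new { run }
  end

  def increment              # fire-and-forget message
    @mailbox << [:increment]
  end

  def count                  # request/reply: block until the actor answers
    reply = Queue.new
    @mailbox << [:count, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop]
    @thread.join
  end

  private

  # Messages are handled one at a time, so @count needs no lock.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :count     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor   = CounterActor.new
senders = 4.times.map { Thread.new { 250.times { actor.increment } } }
senders.each(&:join)
total = actor.count   # the mailbox is FIFO, so all increments are handled first
actor.stop
puts total
```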
  76. What Is It?

    • Celluloid: an implementation of the actor model by Tony Arcieri. He’s a Ruby hero!
    • Handles pooling, supervising, messaging, and many other things
    • Makes writing concurrent OO programs as easy as writing sequential ones
    • Avoids deadlocks by handling state internally
    • Actors are threads; method calls are fibers
  77. Handling Pain Points

  78. Monitoring

      class Worker
        include Celluloid
      end

      worker = Worker.supervise

  79. Save it for Later

      class Worker
        include Celluloid
      end

      Worker.supervise_as :worker

      # now other parts of the program
      # can access the actor instance
      Celluloid::Actor[:worker]
  80. “IPC”

      class Worker
        include Celluloid
      end

      worker = Worker.new
      worker.mailbox << Message.new
  81. Mailboxes Work Everywhere

      worker = Celluloid::Actor[:worker]
      worker.mailbox << Message.new

  82. require 'celluloid'

      class Story
        attr_reader :headline

        def initialize(headline)
          @headline = headline
        end
      end

      class Broadcaster
        include Celluloid

        def initialize
          async.wait_for_messages # Oh ya, Celluloid can make any method async
        end

        def wait_for_messages
          loop do
            message = receive { |msg| msg.is_a? Story } # block until a message is received
            puts "BREAKING NEWS! #{message.headline}"
          end
        end
      end

      broadcaster = Broadcaster.new

      loop do
        broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
        sleep 1
      end
  85. $ ruby mailbox_example.rb
      BREAKING NEWS! wroc_love.rb is awesome!
      BREAKING NEWS! wroc_love.rb is awesome!
      BREAKING NEWS! wroc_love.rb is awesome!
      BREAKING NEWS! wroc_love.rb is awesome!
  86. Avoid Deadlocks

    In ATOM mode, Celluloid actors will "pipeline" work, meaning that in cases where they might execute a "blocking" call, they will continue processing incoming requests as opposed to waiting for the call to complete. This approach prevents the type of deadlocks you might ordinarily encounter in actor RPC systems such as Erlang or Akka.

    - Celluloid Wiki
  87. Simple Example

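  88. Simple To Use

      require 'celluloid'
      require 'net/http'

      class Worker
        include Celluloid

        def fetch(page)
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          puts "Got page: #{page} via #{Thread.current}"
        end
      end

      pool = Worker.pool # uses # of cores for default pool size

      # a plain pool.fetch(i) blocks until that fetch returns, so the pages
      # would be fetched one at a time; futures let the pool overlap them
      futures = 100.times.map do |i|
        pool.future.fetch i
      end
      futures.each(&:value) # wait for every fetch to finish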
  89. K, time to scale out

  91. require 'dcell'

      # Register a node (nodes talk over ØMQ)
      DCell.start :id => "worker", :addr => "tcp://127.0.0.1:9001"

      class Worker
        include Celluloid
      end

      # Drop a cell in this node
      Worker.supervise_as :worker
  95. require 'dcell'

      DCell.start :id => "producer", :addr => "tcp://127.0.0.1:9002"

      worker_node = DCell::Node["worker"]   # grab a node from the network
      worker = worker_node[:worker]         # grab a cell from the node
      worker.do_hard_stuff

    HOLY SHIT!
  100. Oh ya, you can cluster nodes for massive pwnage

  101. Now it’s up to you to do hard work