
Dear God What am I Doing? Parallel and Concurrent Processing


Presented at RubyConf AU & Wroc_love.rb

Adam Hawkins

March 04, 2013


Transcript

  1. DEAR GOD WHAT AM
    I DOING?
    CONCURRENCY AND
    PARALLEL PROCESSING
    Adam Hawkins - tw://adman65 - gh://twinturbo


  2. Who Is this Talk for?
    There have been a lot of talks on performance-related things:
    yesterday we had Immutable Ruby, and today Charles talked about
    JRuby optimizations. So you're already primed for this one.


  3. When you type
    Thread or fork you
    feel like this


  4. (image slide)

  5. A Brief Introduction to
    Machine Architecture


  6. We Can Model this
    Complex System with
    the Following Diagram


  7. (image slide)

  8. (image slide)

  9. hamster :: thread


  10. wheel :: process
    hamster :: thread


  11. (image slide)

  12. (image slide)

  13. Let’s Get Serious


  14. The eternal question:
    How can I make
    code faster?


  15. (image slide)

  16. Do Multiple Things
    at Once


  17. Three Primitives
    • Processes: separate memory, separate
    everything. Scheduled by the kernel.
    • Threads: many can live inside one process and
    share its memory. Scheduled by the kernel.
    • Fibers: Like Threads. Scheduled by the
    programmer. 4KB stack.
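
    A minimal sketch (not from the deck) contrasting the three primitives;
    it assumes MRI on a Unix-like system so that fork is available:

    require 'fiber'

    # Process: separate memory, scheduled by the kernel
    pid = fork { puts "child pid: #{Process.pid}" }
    Process.wait(pid)

    # Thread: lives inside the current process and shares its memory,
    # scheduled by the kernel
    t = Thread.new { puts "hello from #{Thread.current}" }
    t.join

    # Fiber: scheduled by the programmer via resume/yield
    f = Fiber.new { puts "fiber started"; Fiber.yield; puts "fiber resumed" }
    f.resume
    f.resume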


  18. TL;DR
    • Kernel decides which process to run
    (which may have multiple threads)
    • Processes or threads may block, causing the
    scheduler to select another thread/process
    for execution
    • I/O is the most common blocking
    operation
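
    A hedged illustration of that scheduling behaviour (sleep stands in for a
    blocking I/O call): while one thread is blocked, another gets to run.

    t1 = Thread.new { sleep 0.5; puts "t1 woke up" }        # blocks; scheduler switches away
    t2 = Thread.new { puts "t2 ran while t1 was blocked" }
    [t1, t2].each(&:join)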


  19. Quickly
    • Threads behave differently according to the
    platform (JRuby vs MRI vs Rubinius)
    • Ruby Thread classes are backed by native
    threads on 1.9
    • Green threads prior to 1.9 (aka simulated
    threads)


  20. Threads*
    *easiest and quickest win


  21. require 'thread'
    (0..5).each do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end

    Question: what will this code output?


  22. require 'thread'
    threads = (0..5).map do |i|
      Thread.new do
        puts "Hello from thread: #{i}"
      end
    end
    threads.map(&:join)


  23. $ ruby joining_threads.rb
    Hello from thread: 1
    Hello from thread: 2
    Hello from thread: 4
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 5


  24. Order is
    Nondeterministic
    $ ruby joining_threads.rb
    Hello from thread: 5
    Hello from thread: 2
    Hello from thread: 1
    Hello from thread: 0
    Hello from thread: 3
    Hello from thread: 4


  25. Shared Memory
    require 'thread'
    balance = 100
    interest = Thread.new do
      while true
        sleep 0.1
        balance = balance * 1.025
      end
    end
    while balance < 200
      sleep 0.25
      puts "Banking: #{balance}"
    end

    Question: What's wrong with this code?


  26. Straight Up Locks
    require 'thread'
    lock = Mutex.new
    balance = 100
    Thread.new do
      while true
        sleep 0.1
        lock.synchronize do
          balance = balance * 1.025
        end
      end
    end
    while balance < 200
      lock.synchronize do
        puts "Balance: #{balance}"
        sleep 1
      end
    end


  27. Blocking
    notice we haven’t
    done anything that
    could block. Only
    simple math
    operations.


  28. I/O Blocks
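
    A hedged sketch of why blocking I/O matters for speed: blocked time
    overlaps when spread across threads, so wall-clock time drops
    (sleep again stands in for I/O):

    require 'benchmark'
    puts Benchmark.realtime { 2.times { sleep 1 } }                                   # ~2 seconds
    puts Benchmark.realtime { 2.times.map { Thread.new { sleep 1 } }.each(&:join) }   # ~1 second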


  29. Processes


  30. fork


  31. balance = 100
    pid = fork do
      while true
        sleep 0.5
        balance = balance * 1.0125
        puts "Child Balance: #{balance}"
      end
    end
    # parent
    if pid
      while true do
        sleep 0.5
        puts "Parent Balance: #{balance}"
      end
    end

    What's wrong with this code? What happens to balance? What happens
    when both processes need to access the balance? You'd need
    inter-process locks.
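
    One hedged sketch of those inter-process locks: an advisory file lock via
    File#flock (the lock file name here is made up for illustration):

    File.open("balance.lock", File::RDWR | File::CREAT) do |lock|
      lock.flock(File::LOCK_EX)    # blocks until this process holds the lock
      # ... read and update a balance stored outside process memory ...
      lock.flock(File::LOCK_UN)
    end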


  32. Memory is Not Shared
    $ ruby forking_example.rb
    Child Balance: 101.25
    Parent Balance: 100
    Child Balance: 102.515625
    Parent Balance: 100
    Child Balance: 103.7970703125
    Parent Balance: 100
    Child Balance: 105.09453369140624
    Parent Balance: 100
    Child Balance: 106.40821536254882
    Parent Balance: 100


  33. Fibers, TL;DR
    Fibers are like
    threads, not so
    important for
    this talk


  34. Making Things Faster


  35. Challenge: Fetch 100
    pages of Ruby search
    results as fast as
    possible
    The fetching is going to use HTTP
    (thus I/O), which Ruby's thread
    scheduler will optimize.


  36. Two Approaches


  37. Multithreaded


  38. Before My Talk :)
    require 'net/http'
    100.times do |page|
      puts "Getting page: #{page}"
      Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
    end
    # 0.18s user 0.11s system 0% cpu 44.739 total


  39. Wait, I have cores and
    stuff


  40. Using Threads
    require 'thread'
    require 'net/http'
    queue = Queue.new
    100.times do |i|
      queue << i
    end
    # (0..4) => five worker threads
    threads = (0..4).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end
    threads.map(&:join)
    # ruby multithreaded_http.rb 0.17s user 0.11s system 3% cpu 8.419 total


  41. • Single Thread > 40 seconds
    • Multithreaded < 10 seconds
    • I know....MOAR THREADS


  42. Custom # Threads
    require 'thread'
    require 'net/http'
    queue = Queue.new
    100.times do |i|
      queue << i
    end
    workers = ARGV[0].to_i
    # note: (0..workers) actually spawns workers + 1 threads
    threads = (0..workers).map do |i|
      Thread.new do
        while !queue.empty?
          page = queue.pop
          puts "Getting page: #{page}"
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        end
      end
    end
    threads.map(&:join)


  43. Results
    # Threads Time (seconds)
    4 ~8
    5 ~8.5
    6 ~9
    7 ~9
    8 ~9.5
    9 ~10
    10 ~11


  44. More Threads != Faster
    • A computer can only run so many
    threads at once
    • Context switching
    • Blocking I/O (HTTP) limits throughput


  45. Let’s Do Some Math


  46. Password Cracking
    require 'thread'
    require 'digest/sha1'
    encrypted = "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"
    queue = Queue.new
    # Dictionary stands in for a wordlist of candidate passwords
    Dictionary.each do |plaintext|
      queue << plaintext
    end
    # INFINITY is hyperbole: "just keep adding threads"
    threads = (0..INFINITY).map do |i|
      Thread.new do
        while !queue.empty?
          plaintext = queue.pop
          result = Digest::SHA1.hexdigest plaintext
          if result == encrypted
            puts "Decrypted to: #{plaintext}"
            exit
          end
        end
      end
    end
    threads.map(&:join)


  47. (image slide)

  48. ... and it’s slow.
    Explain why it’s slow


  49. Enter the GIL (Global
    Interpreter Lock)


  50. Also enter
    JRuby & Rubinius


  51. G I L
    • Only one thread can execute Ruby code at a
    given time
    • Each implementation is different
    • JRuby and Rubinius don’t have a GIL
    • MRI has a GIL
    • This makes true parallel programming
    impossible on MRI
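
    A hedged sketch of the consequence: CPU-bound work (no I/O to release the
    lock) gains roughly nothing from threads on MRI, while implementations
    without a GIL can spread it across cores:

    require 'benchmark'

    work = -> { 2_000_000.times { Math.sqrt(rand) } }

    puts Benchmark.realtime { 4.times { work.call } }                                  # sequential
    puts Benchmark.realtime { 4.times.map { Thread.new { work.call } }.each(&:join) }  # threaded
    # On MRI both take about the same time; on JRuby/Rubinius the threaded
    # run can approach a 4x speedup on four cores.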


  52. (image slide)

  53. JVM


  54. JRuby or Rubinius is
    more performant for
    multithreaded
    programs


  55. multiprocess*
    JRuby and Windows users need not apply


  56. Forkin'
    require 'net/http'
    require 'thread'
    pages = 100
    workers = 4
    # fdiv avoids integer division, so the slice count matches the worker count
    queue_size = pages.fdiv(workers).ceil
    queues = (1..pages).each_slice(queue_size).to_a
    queues.each do |pages|
      fork do
        pages.each do |page|
          Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
          puts "Got page: #{page} via #{Process.pid}"
        end
      end
    end
    Process.waitall


  57. Hydra: Processes + Threads


  58. require 'net/http'
    require 'thread'
    pages = 100
    workers = 4
    queue_size = pages.fdiv(workers).ceil
    queues = (1..pages).each_slice(queue_size).to_a
    queues.each do |pages|
      fork do
        queue = Queue.new
        pages.each { |i| queue << i }
        threads = (0..4).map do |i|
          Thread.new do
            while !queue.empty?
              page = queue.pop
              Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
              puts "Got page: #{page} via #{Process.pid} (#{Thread.current})"
            end
          end
        end
        threads.map(&:join)
      end
    end
    Process.waitall

    Who needs IPC when you can use block variables?


  59. psst....all that code
    was wrong


  60. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end


  61. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end
    Nonblocking


  62. This Just Happens
    to Work
    threads = (0..4).map do |i|
    Thread.new do
    while !queue.empty?
    page = queue.pop
    # do stuff
    end
    end
    end
    Nonblocking
    Blocking


  63. This code will deadlock
    in some cases: between the empty? check and the pop, another thread
    can take the last item, and the losing thread's pop then blocks
    forever on a queue nothing will refill.


  64. Deadlock: A deadlock is a
    situation in which two or
    more competing actions are
    each waiting for the other
    to finish, and thus neither
    ever does.
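
    A classic toy illustration of that definition (hedged sketch): two threads
    each hold one lock and wait for the other's, so neither ever finishes.

    require 'thread'
    a = Mutex.new
    b = Mutex.new
    t1 = Thread.new { a.synchronize { sleep 0.1; b.synchronize { } } }
    t2 = Thread.new { b.synchronize { sleep 0.1; a.synchronize { } } }
    [t1, t2].each(&:join)   # never completes; MRI typically aborts with "deadlock detected"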


  65. The Fix
    def pop
      @mutex.synchronize do
        # non-blocking: return false instead of waiting on an empty queue
        @array.empty? ? false : @array.pop
      end
    end
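
    With a pop like that (or the standard library's non-blocking queue.pop(true),
    which raises ThreadError when the queue is empty), the worker loop can drain
    the queue and simply stop; a hedged sketch:

    threads = (0..4).map do
      Thread.new do
        while (page = queue.pop)   # false ends the loop instead of blocking forever
          # do stuff with page
        end
      end
    end
    threads.map(&:join)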


  66. Faster Web Servers


  67. Prefork Model
    • Start a process. Get everything ready.
    • Fork a given # of times to create worker
    processes
    • Parent manages the children
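
    A minimal prefork sketch (assumptions: four workers, and the real
    accept/handle loop is elided):

    WORKERS = 4
    # parent: load the app, open listening sockets, etc. once, before forking
    pids = WORKERS.times.map do
      fork do
        loop do
          # accept a connection and handle a request here
          sleep 1
        end
      end
    end
    trap(:TERM) { pids.each { |pid| Process.kill(:TERM, pid) } }
    Process.waitall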


  68. Unicorn


  69. Unicorn
    Hey guys
    You alive?


  70. Unicorn
    Unix Sockets
    Hey guys
    You alive?


  71. Unicorn
    500MB 500MB 500MB 500MB 500MB
    Unix Sockets
    Hey guys
    You alive?


  72. Pain Points
    • Interprocess Communication (IPC)
    • Synchronization must happen
    • 5 processes, 5 times as much memory
    • Process monitoring
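
    For flavour, a hedged sketch of the simplest form of the IPC listed above:
    a pipe between the parent and a forked child:

    reader, writer = IO.pipe
    fork do
      reader.close
      writer.puts "child #{Process.pid} reporting in"
      writer.close
    end
    writer.close
    puts reader.read
    Process.waitall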


  73. Easier Concurrent
    Ruby Programs


  74. (image slide)

  75. The Actor Model
    • Each Actor is an object running in its own
    thread
    • Handles communication with mailboxes
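
    Before reaching for a library, here is a toy, dependency-free sketch of that
    idea: an object with its own thread and a Queue as its mailbox (all names
    here are made up):

    require 'thread'

    class TinyActor
      def initialize
        @mailbox = Queue.new
        @thread  = Thread.new { loop { handle(@mailbox.pop) } }
      end

      def tell(message)
        @mailbox << message
      end

      def handle(message)
        puts "received: #{message}"
      end
    end

    actor = TinyActor.new
    actor.tell "hello"
    sleep 0.1   # give the actor's thread a moment to process the message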


  76. What Is It?
    • Implementation of the actor model by Tony
    Arcieri. He’s a Ruby hero!
    • Handles pooling, supervising, messaging, and many
    other things
    • Makes writing concurrent OO programs as easy
    as writing sequential ones
    • Avoids deadlocks by handling state internally
    • Actors are threads; method calls are fibers


  77. Handling Pain Points


  78. Monitoring
    class Worker
    include Celluloid
    end
    worker = Worker.supervise


  79. Save it for Later
    class Worker
      include Celluloid
    end
    Worker.supervise_as :worker
    # now other parts of the program
    # can access the actor instance
    Celluloid::Actor[:worker]


  80. “IPC”
    class Worker
    include Celluloid
    end
    worker = Worker.new
    worker.mailbox << Message.new


  81. Mailboxes Work
    Everywhere
    worker = Celluloid::Actor[:worker]
    worker.mailbox << Message.new


  82. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end


  83. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end
    Oh ya, Celluloid can make
    any method async


  84. require 'celluloid'
    class Story
    attr_reader :headline
    def initialize(headline)
    @headline = headline
    end
    end
    class Broadcaster
    include Celluloid
    def initialize
    async.wait_for_messages
    end
    def wait_for_messages
    loop do
    message = receive { |msg| msg.is_a? Story }
    puts "BREAKING NEWS! #{message.headline}"
    end
    end
    end
    broadcaster = Broadcaster.new
    loop do
    broadcaster.mailbox << Story.new("wroc_love.rb is awesome!")
    sleep 1
    end
    Oh ya, Celluloid can make
    any method async
    Block until a message is received


  85. $ ruby mailbox_example.rb
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!
    BREAKING NEWS! wroc_love.rb is awesome!


  86. Avoid Deadlocks
    In ATOM mode, Celluloid actors will "pipeline"
    work, meaning that in cases where they might
    execute a "blocking" call, they will continue
    processing incoming requests as opposed to
    waiting for the call to complete. This approach
    prevents the type of deadlocks you might ordinarily
    encounter in actor RPC systems such as Erlang or
    Akka.
    - Celluloid Wiki


  87. Simple Example


  88. Simple To Use
    require 'celluloid'
    require 'net/http'

    class Worker
      include Celluloid

      def fetch(page)
        Net::HTTP.get 'www.google.com', "/search?q=ruby&page=#{page}"
        puts "Got page: #{page} via #{Thread.current}"
      end
    end

    pool = Worker.pool # uses # of cores for default pool size
    100.times do |i|
      pool.fetch i
    end


  89. K, time to scale out


  90. (image slide)

  91. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker


  92. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node


  93. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node
    ømq


  94. require 'dcell'
    DCell.start :id => "worker",
    :addr => "tcp://127.0.0.1:9001"
    class Worker
    include Celluloid
    end
    Worker.supervise_as :worker
    Register a Node
    ømq
    Drop a cell in
    this node


  95. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff


  96. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network


  97. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network
    Grab a cell
    from the node


  98. require 'dcell'
    DCell.start :id => "producer",
    :addr => "tcp://127.0.0.1:9002"
    worker_node = DCell::Node["worker"]
    worker = worker_node[:worker]
    worker.do_hard_stuff
    Grab a node from
    the network
    Grab a cell
    from the node
    HOLY SHIT!


  99. (image slide)

  100. Oh ya, you can
    cluster nodes for
    massive pwnage


  101. Now it’s up to
    you to do hard work
