Managing concurrent workloads in Ruby

This is a talk about managing concurrent workloads: how to identify work that can be parallelized, and the best strategies for getting the most out of the resources available to you.

cyclotron3k

November 13, 2018
Transcript

  1. Forks, Threads & Events

    By Aidan Samuel. An adventure in hyper-productivity. Image source: mahbubzaman on Instagram
  2. What are you talking about?

    This is a talk about managing concurrent workloads: how to identify work that can be parallelized, and the best strategies to use. It’s all about maximizing the use of your resources (or even better, someone else’s resources). Image source: mahbubzaman on Instagram
  3. Why concurrency?

    Most modern computers provide multiple cores and obscene amounts of RAM. But our Ruby applications often do not make the most of these resources, being single-threaded by default. On top of that, the resources we interact with are subject to various bottlenecks, and those bottlenecks can often be circumvented with concurrent programming techniques.
  4. Good problems to solve with concurrency

    CPU bound problems
    • Factoring primes
    • Mining bitcoins
    IO bound problems
    • Bulk emailing
    • A web controller making multiple database queries
    • Web servers handling inbound requests
    Literally anything in JavaScript
  5. Back to basics

    What are threads and processes?
    Processes: Have a stack, heap, file handles, child processes, and at least one thread.
    Threads: Belong to a process. Share the address space with other threads in the same process.
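The difference in memory sharing can be seen in a few lines. This is a minimal sketch, assuming a platform where `fork` is available (Linux or macOS, not Windows); the method name `memory_sharing_demo` and the `shared` hash are made up for illustration.

```ruby
# Threads see the same objects as their parent; a forked child gets its own copy.
def memory_sharing_demo
  shared = { count: 0 }

  # The thread mutates the very same Hash the main thread holds.
  Thread.new { shared[:count] += 1 }.join
  after_thread = shared[:count]

  # The forked child increments its private copy and reports it over a pipe.
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    shared[:count] += 100
    writer.puts shared[:count]
    writer.close
    exit!(0) # leave immediately, skipping at_exit handlers copied from the parent
  end
  writer.close
  child_saw = reader.gets.to_i
  reader.close
  Process.wait(pid)

  { after_thread: after_thread, child_saw: child_saw, after_fork: shared[:count] }
end

p memory_sharing_demo
# => {:after_thread=>1, :child_saw=>101, :after_fork=>1} (Ruby 3.4+ prints after_thread: 1, ...)
```

The child's change never reaches the parent, which is why processes need pipes, sockets or files to hand results back, while threads can simply write to shared objects (and therefore need to be careful about it).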
  6. Ok, how do we use them?

    Processes: Started with fork, or by explicitly starting a subprocess with backticks, system, etc. (exec replaces the current process, so it is usually paired with fork). When using fork, the entire process is duplicated, memory, file handles and all. Child processes will almost certainly run in parallel.
    Threads: Started using the Thread class (part of Ruby core, no gem required), or with a number of other thread-related libraries. When using threads, you must guard against race conditions and try to use “thread-safe” libraries. Watch out for the GIL: on MRI, threads will probably not run Ruby code in parallel, but for IO-bound work that often doesn’t matter. There is no point tackling CPU-bound problems with threads on MRI.
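To illustrate why the GIL often doesn't matter for IO-bound work, here is a small sketch that compares serial and threaded execution. It assumes `sleep` is a fair stand-in for a blocking network or database call (MRI releases the GIL in both cases); `slow_lookup` and `timed` are hypothetical helpers, not part of any library.

```ruby
# Stand-in for an IO-bound call: the thread blocks and the GIL is released.
def slow_lookup(id)
  sleep 0.2
  id * 2
end

# Runs the block and returns [result, elapsed_seconds].
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

ids = [1, 2, 3, 4]

serial, serial_time = timed { ids.map { |id| slow_lookup(id) } }

# One thread per id; Thread#value joins the thread and returns its block's result.
threaded, threaded_time = timed do
  ids.map { |id| Thread.new { slow_lookup(id) } }.map(&:value)
end

puts "serial:   #{serial.inspect} in #{serial_time.round(2)}s"
puts "threaded: #{threaded.inspect} in #{threaded_time.round(2)}s"
```

Both runs return the same values, but the threaded run should take roughly as long as one lookup rather than four. Swapping `sleep` for a pure-Ruby CPU loop would remove most of that gain on MRI.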
  7. Thread safety

    Because threads share memory, it’s easy to make nasty mistakes. Consider this:

        counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

        threads = 5.times.map do
          Thread.new do
            100000.times do
              # map! reads and rewrites shared state, so threads can interleave here
              counters.map! { |counter| counter + 1 }
            end
          end
        end
        threads.each(&:join)

        counters # => [500000, 500000, 500000, 500000, 500000, 302425, 500000, 377316, 500000, 500000]

    Source: https://vaneyckt.io/posts/ruby_concurrency_in_praise_of_the_mutex/
  8. Mutex plz

    Use a mutex for thread-safety:

        mutex = Mutex.new # added
        counters = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

        threads = 5.times.map do
          Thread.new do
            100000.times do
              mutex.synchronize do # added: only one thread at a time may update counters
                counters.map! { |counter| counter + 1 }
              end
            end
          end
        end
        threads.each(&:join)

        counters # => [500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000, 500000]

    Source: https://vaneyckt.io/posts/ruby_concurrency_in_praise_of_the_mutex/
  9. Parallel

    It’s a nice gem. Provides an enumerable-style interface to threads and processes.

        require 'parallel'

        addresses = ['[email protected]', '[email protected]', ... 10000x ... '[email protected]']

        # Upload in batches of 1000, four batches at a time
        Parallel.each(addresses.each_slice(1000), in_threads: 4) do |batch|
          email_provider.upload(batch)
        end

        sql_queries = [:users, :permissions, :addresses, :phone_numbers]

        # Run the four queries concurrently, then combine the per-table hashes into one
        Parallel.map(sql_queries, in_threads: 4) do |table_name|
          { table_name => DB[table_name].all }
        end.reduce({}, :merge)
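For a feel of what an `in_threads`-style map involves, here is a rough stdlib-only sketch of a fixed-size worker pool. It is not the Parallel gem's implementation; `map_in_threads` is a hypothetical helper, and the table names are reused from the slide with placeholder strings instead of real queries.

```ruby
# Hypothetical helper: maps items through the block using a fixed number of threads.
def map_in_threads(items, thread_count)
  jobs = Queue.new
  items.each_with_index { |item, index| jobs << [item, index] }
  jobs.close # once the queue is drained, pop returns nil and workers stop

  workers = Array.new(thread_count) do
    Thread.new do
      local = [] # each worker collects its own results, so no lock is needed
      while (job = jobs.pop)
        item, index = job
        local << [index, yield(item)]
      end
      local
    end
  end

  # Thread#value joins each worker; sorting by index restores the input order.
  workers.flat_map(&:value).sort_by(&:first).map(&:last)
end

tables = [:users, :permissions, :addresses, :phone_numbers]
rows = map_in_threads(tables, 4) { |name| { name => "rows of #{name}" } }
p rows.reduce({}, :merge)
```

Queue is thread-safe, so handing out work needs no explicit mutex, and keeping results thread-local until the join avoids the shared-state problem from the thread safety slide. The gem adds a good deal on top of this, including process-based workers.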