Threads Aren't Evil

To skip to the actual technical content, go to slide 71 (https://speakerdeck.com/schneems/threads-arent-evil?slide=71). The earlier slides make no sense without narration.

Okay, so threads are pretty evil.

But they are also useful and, given the right development patterns, not impossible to work with.

In this talk we'll look at some real Ruby libraries where threads were applied to accomplish otherwise impossible tasks. We'll look at rewriting a synchronous library to support parallel execution for performance gains. We'll also talk about the operating system internals of exactly what makes a thread a thread.

If you're not comfortable with the "T" word (threads), this talk is the perfect introduction to practical concurrent programming in Ruby.

Richard Schneeman

September 26, 2017

Transcript

  1. Threads Aren’t Evil @schneems

  2. Threads

  3. Are

  4. Not

  5. Evil

  6. None
  7. None
  8. “Ruby doesn’t have real threads”

  9. “Ruby doesn’t have real threads”

  10. “Ruby doesn’t have real threads”

  11. “Ruby is incapable of using threads because of the GIL”

  12. “Ruby is incapable of using threads because of the GIL”

  13. “Ruby is incapable of using threads because of the GIL”

  14. "Rails is not able to run on threads so we can't use them in Ruby"

  15. "Rails is not able to run on threads so we can't use them in Ruby"

  16. "My ruby code is not thread safe and will not run in threads"
  17. What do you hear?

  18. “doesn’t”

  19. “incapable”

  20. “not able”

  21. “will not”

  22. 2012

  23. 2013

  24. 2013

  25. 2013

  26. 2013

  27. “doesn’t”

  28. “incapable”

  29. “not able”

  30. “will not”

  31. None
  32. 1490

  33. None
  34. None
  35. None
  36. 1490

  37. 1490

  38. 1490

  39. 1490

  40. 1490

  41. 1490

  42. 1490

  43. None
  44. цинга (Russian for “scurvy”)

  45. цинга (Russian for “scurvy”)

  46. цинга (Russian for “scurvy”)

  47. цинга (Russian for “scurvy”)

  48. None
  49. None
  50. None
  51. None
  52. None
  53. None
  54. None
  55. None
  56. Problem Solved?

  57. None
  58. None
  59. None
  60. None
  61. None
  62. None
  63. None
  64. None
  65. The Future

  66. BTW

  67. Find me @schneems

  68. https://www.schneems.com

  69. None
  70. None
  71. Who here knows what a thread is?

  72. Process Address Space

  73. Process Address Space - code

  74. Process Address Space - code - data

  75. Process Address Space - code - data - register

  76. Process Address Space - code - data - register - stack

  77. Process Address Space - code - data - register - stack
  78. #include <stdio.h>

    int increment(int x) {
      return x + 1;
    }

    int main() {
      int i = 0;
      printf("Incremented to: %i\n", increment(i));
    }
  79. _main:
    100000f40: 55                    pushq %rbp
    100000f41: 48 89 e5              movq %rsp, %rbp
    100000f44: 48 83 ec 10           subq $16, %rsp
    100000f48: c7 45 fc 00 00 00 00  movl $0, -4(%rbp)
    100000f4f: c7 45 f8 00 00 00 00  movl $0, -8(%rbp)
    100000f56: 83 7d f8 0a           cmpl $10, -8(%rbp)
    100000f5a: 0f 8d 1f 00 00 00     jge 31 <_main+3F>
    100000f60: 48 8d 3d 43 00 00 00  leaq 67(%rip), %rdi
    100000f67: b0 00                 movb $0, %al
    100000f69: e8 1a 00 00 00        callq 26
    100000f6e: 89 45 f4              movl %eax, -12(%rbp)
    100000f71: 8b 45 f8              movl -8(%rbp), %eax
    100000f74: 83 c0 01              addl $1, %eax
    100000f77: 89 45 f8              movl %eax, -8(%rbp)
    100000f7a: e9 d7 ff ff ff        jmp -41 <_main+16>
    100000f7f: 8b 45 fc              movl -4(%rbp), %eax
    100000f82: 48 83 c4 10           addq $16, %rsp
    100000f86: 5d                    popq %rbp
    100000f87: c3                    retq
  88. Thanks to Julia Evans for the example code: https://jvns.ca

  89. Process Address Space - code - data - register - stack

  90. Process Address Space - code - data - register - stack

  91. Process Control Block (PCB): Process State, Process Number, Program Counter, Registers, Memory Limits, List of Open Files, Signal Mask, CPU Scheduling Info
  92. None
  93. Process - code - data - register - stack

  94. Process - code - data - register - stack CPU

  95. Process - code - data - register - stack Input/Output (IO)

  96. Process - code - data - register - stack Input/Output (IO)

  97. Process - code - data - register - stack Input/Output (IO)
  98. None
  99. None
  100. None
  101. Pros?

  102. Better CPU utilization

  103. Cons?

  104. Need lots of processes

  105. Processes take up memory which is a finite resource

  106. Loading and unloading a PCB is expensive

  107. Sharing data between processes is HARD
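The point on slide 107 can be illustrated with a small sketch. It assumes a Unix-like system where `Process.fork` is available, and the method name `increment_in_child` is made up for this example. A forked child works on its own copy of the parent's data, so a change made in the child is invisible to the parent unless it is sent back explicitly, here over a pipe.

```ruby
# Hypothetical helper, not from the talk.
# Assumes a Unix-like OS (Process.fork is not available on Windows).
def increment_in_child
  counter = 0
  reader, writer = IO.pipe

  pid = Process.fork do
    reader.close
    counter += 1                # changes only the child's copy
    writer.write(counter.to_s)  # the result has to be shipped back by hand
    writer.close
  end

  writer.close
  from_child = reader.read.to_i
  reader.close
  Process.wait(pid)

  { parent_counter: counter, value_sent_by_child: from_child }
end

p increment_in_child
```

The parent's `counter` stays at 0 while the value read from the pipe is 1, which is the extra plumbing the slide calls hard.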

  108. None
  109. None
  110. None
  111. Introducing

  112. Threads

  113. Process - code - data - register - stack

  114. Process - code - data - register - stack

  115. Process - code - data - register - stack

  116. Process register stack ⚡

  117. Process register stack ⚡ ⚡ ⚡ register stack register stack

  118. Pros?

  119. Better CPU utilization

  120. Reuse existing process memory (code and data)

  121. Smaller Context Switch Time

  122. Cons?

  123. Shared code + shared data =

  124. Shared code + shared data =
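Slides 113 to 124 describe threads as sharing the process's code and data. A minimal Ruby sketch of what that sharing looks like, using a hypothetical method name `greet_from_threads`: every thread writes into the same array object, and the main thread sees all the writes without any pipes or copying.

```ruby
# Hypothetical helper, not from the talk.
def greet_from_threads(names)
  greetings = Array.new(names.size) # shared data: one slot per thread

  threads = names.each_with_index.map do |name, index|
    # Each thread receives its own name and index as block arguments,
    # but greetings is the same object for every thread.
    Thread.new(name, index) { |n, i| greetings[i] = "hello #{n}" }
  end

  threads.each(&:join)
  greetings
end

p greet_from_threads(%w[code data])
# prints ["hello code", "hello data"]
```

Each thread here writes to its own slot. The downside the slide hints at is that nothing prevents two threads from writing to the same slot at once.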

  125. Threads are simple

  126. Threads are hard

  127. In every language *

  128. It’s possible to avoid threads

  129. JavaScript is single threaded

  130. Threading is easier in some languages

  131. Functional languages (can) make threading easy

  132. Functional languages make side effects impossible

  133. Which makes coding harder (IMHO)

  134. OMSCS

  135. Threading sucks (less) in Ruby

  136. Are threads just for performance?

  137. Nope

  138. Thread.new do
      while @running
        sleep @timeout
        @reaper.reap
      end
    end

  139. Thread.new do
      while @running
        sleep @timeout
        @reaper.reap
      end
    end
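The loop on slides 138 and 139 can be wrapped into a runnable sketch. The class name `PeriodicTask` and its `stop` method are assumptions made for illustration and are not taken from the library shown in the talk.

```ruby
# Hypothetical periodic worker, loosely modeled on the loop in the slide.
class PeriodicTask
  def initialize(interval, &task)
    @interval = interval
    @task = task
    @running = true
    @thread = Thread.new do
      while @running
        sleep @interval
        @task.call
      end
    end
  end

  # Ask the loop to finish its current pass, then wait for the thread to exit.
  def stop
    @running = false
    @thread.join
  end
end

ticks = Queue.new
task = PeriodicTask.new(0.01) { ticks.push(:tick) }
sleep 0.1    # the main thread is free to do other work here
task.stop
puts "background task ran #{ticks.size} times"
```

The thread is used for housekeeping in the background rather than for speed, which is the point of slide 137.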

  140. Was that scary?

  141. Are threads scary?

  142. (hint: no)

  143. Any other examples?

  144. @background_worker_thread = Thread.new do
      @background_worker.start {
        ScoutApm::Debug.instance.call_periodic_hooks
        ScoutApm::Agent.instance.process_metrics
        clean_old_percentiles
      }
    end
  145. What about the GVL?

  146. array = Array.new(100) { String.new }

    t1 = Thread.new do
      array.each { |x| x.prepend("hello") }
    end

    t2 = Thread.new do
      array.each { |x| x << " world" }
    end

    t1.join; t2.join
    puts array
  147. hello world hello world hello world hello world hello world hello world hello world hello world

  148. hello world hello world hello world hello world hello world hello world hello world hello world
  149. What did the GVL do?

  152. The GVL means we cannot see a speed increase just by using threads
  153. None
  154. GVL is released when we do IO!

  155. require 'net/http'

    array = Array.new(100) { String.new }

    t1 = Thread.new do
      uri = URI.parse("http://example.com/hello")
      hi = Net::HTTP.get(uri)
      array.each { |x| x.prepend(hi) }
    end

    t2 = Thread.new do
      uri = URI.parse("http://example.com/hello")
      world = Net::HTTP.get(uri)
      array.each { |x| x << world }
    end

    t1.join; t2.join
    puts array
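The effect of slides 152 to 155 can be measured without a network. The sketch below assumes MRI and uses `sleep` as a stand-in for a blocking call, since a sleeping thread, like one waiting on IO, lets other Ruby threads run. The method names are hypothetical.

```ruby
require "benchmark"

# Stand-in for a blocking call such as an HTTP request or a file write.
def fake_io_call
  sleep 0.2
end

# Run the calls one after another on the current thread.
def run_serially(count)
  Benchmark.realtime { count.times { fake_io_call } }
end

# Run each call on its own thread and wait for all of them.
def run_in_threads(count)
  Benchmark.realtime do
    Array.new(count) { Thread.new { fake_io_call } }.each(&:join)
  end
end

serial = run_serially(5)
threaded = run_in_threads(5)
puts format("serial: %.2fs, threaded: %.2fs", serial, threaded)
```

The serial run takes at least one second, while the threaded run should finish in roughly the time of a single call because the waits overlap.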
  156. When is the GVL released exactly?

  157. void *
    rb_thread_call_without_gvl2(void *(*func)(void *), void *data1,
                                rb_unblock_function_t *ubf, void *data2)
    {
        return call_without_gvl(func, data1, ubf, data2, TRUE);
    }

    void *
    rb_thread_call_without_gvl(void *(*func)(void *data), void *data1,
                               rb_unblock_function_t *ubf, void *data2)
    {
        return call_without_gvl(func, data1, ubf, data2, FALSE);
    }
  159. Use a debugger to see the GVL get released!

  160. $ lldb ruby scratch.rb
    breakpoint set --name rb_thread_call_without_gvl
    breakpoint set --name rb_thread_call_without_gvl2
  163. Now run

  164. $ lldb ruby scratch.rb
    breakpoint set --name rb_thread_call_without_gvl
    breakpoint set --name rb_thread_call_without_gvl2
    run
  165. When we hit a pause, find the backtrace with

  166. call rb_backtrace()

  167. require 'net/http'

    url = "https://www.schneems.com"
    uri = URI.parse(url)
    response = Net::HTTP.get(uri)
    puts response
  173. None of the above?

  174. Use the debugger!

  175. 1309 {
    1310     void *val = 0;
    1311
    -> 1312  rb_thread_t *th = GET_THREAD();
    1313     int saved_errno = 0;
    1314
    1315     if (ubf == RUBY_UBF_IO || ubf == RUBY_UBF_PROCESS) {
    (lldb) call rb_backtrace()
    from <internal:gem_prelude>:4:in `<internal:gem_prelude>'
    from <internal:gem_prelude>:4:in `require'
    from .../rubygems.rb:1363:in `<top (required)>'
    from .../rubygems/specification.rb:873:in `load_defaults'
    from .../rubygems/specification.rb:821:in `each_spec'
    from .../rubygems/specification.rb:743:in `each_gemspec'
    from .../rubygems/specification.rb:743:in `each'
    from .../rubygems/specification.rb:744:in `block in each_gemspec'
    from .../rubygems/specification.rb:744:in `each'
    from .../rubygems/specification.rb:745:in `block (2 levels) in each_gemspec'
    from .../rubygems/specification.rb:822:in `block in each_spec'
    from .../rubygems/specification.rb:1161:in `load'
    from .../rubygems/specification.rb:1161:in `read'
    (lldb)
  177. I told a small lie

  179. #define GVL_UNLOCK_BEGIN() do { \
      rb_thread_t *_th_stored = GET_THREAD(); \
      RB_GC_SAVE_MACHINE_CONTEXT(_th_stored); \
      gvl_release(_th_stored->vm);

    #define GVL_UNLOCK_END() \
      gvl_acquire(_th_stored->vm, _th_stored); \
      rb_thread_set_current(_th_stored); \
    } while(0)
  180. BTW, you don’t need to know how to do this.

  181. I’m just showing you it’s not magic

  182. Recap: Threads are not…

  183. Recap: Threads are not evil

  184. Recap: GVL is not magic

  185. How do we start using threads?

  186. Make a Toy threading library!

  187. It’s fun I promise

  188. Make a worker thread

  189. require 'thread'

    my_queue = Queue.new

    my_thread = Thread.new do
      loop do
        job = my_queue.pop
        job.call
      end
    end
  193. Use a worker thread via a “boss”, i.e. the main thread

  194. 10.times {
      job = Proc.new { puts "hello" }
      my_queue.push(job)
    }
  195. It works!

  196. But there are bugs

  197. How can we be sure that all jobs are finished?

  198. Join the thread

  199. require 'thread'

    my_queue = Queue.new

    my_thread = Thread.new do
      loop do
        job = my_queue.pop
        job.call
      end
    end

    10.times {
      job = Proc.new { puts "hello" }
      my_queue.push(job)
    }

    my_thread.join
  200. Infinite loop

  201. Why?

  202. require 'thread'

    my_queue = Queue.new

    my_thread = Thread.new do
      loop do
        job = my_queue.pop
        job.call
      end
    end
  203. Poison our queue

  204. my_queue = Queue.new
    POISON = Object.new

    my_thread = Thread.new do
      loop do
        job = my_queue.pop
        break if job == POISON
        job.call
      end
    end
  205. require 'thread'

    my_queue = Queue.new
    POISON = Object.new

    my_thread = Thread.new do
      loop do
        job = my_queue.pop
        break if job == POISON
        job.call
      end
    end

    10.times {
      job = Proc.new { puts "hello" }
      my_queue.push(job)
    }

    my_queue.push(POISON)
    my_thread.join
  206. Exercise: Build a thread pool that has multiple threads

  207. To poison multiple workers, add multiple poison objects to the pool, one per worker
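One possible answer to the exercise on slides 206 and 207, as a toy sketch rather than a production pool. The class name `ToyPool` and its methods are made up for illustration. It extends the single-worker code from slide 205 by starting several workers on one queue and pushing one poison object per worker at shutdown.

```ruby
# Hypothetical toy pool: several workers popping jobs from one queue.
class ToyPool
  POISON = Object.new

  def initialize(size)
    @queue = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        loop do
          job = @queue.pop
          break if job.equal?(POISON) # this worker is done
          job.call
        end
      end
    end
  end

  # The "boss" side: hand a block to whichever worker is free next.
  def push(&job)
    @queue.push(job)
  end

  # One poison object per worker, then wait for every worker to exit.
  def shutdown
    @workers.size.times { @queue.push(POISON) }
    @workers.each(&:join)
  end
end

results = Queue.new
pool = ToyPool.new(3)
10.times { |i| pool.push { results.push(i * 2) } }
pool.shutdown
puts results.size
# prints 10
```

Because the poison objects are pushed after the jobs, every job is popped before any worker can run out of work and exit.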

  208. Threads are not…

  209. Threads are not evil

  210. GVL is not magic

  211. There are other thread constructs

  212. Mutex

  213. Condition Variable

  214. You can ignore most of them if you use a queue and boss/worker pattern.
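For the cases where a queue does not fit, here is a minimal sketch of the first construct named on slide 212. The class `SafeCounter` is hypothetical. A `Mutex` makes the read-modify-write in `increment` run on one thread at a time.

```ruby
# Hypothetical counter shared by several threads.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    # Only one thread at a time may run this read-modify-write.
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end

counter = SafeCounter.new
threads = Array.new(4) { Thread.new { 1_000.times { counter.increment } } }
threads.each(&:join)
puts counter.value
# prints 4000
```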
  215. How do I _actually_ use threads?

  216. concurrent-ruby

  217. require 'concurrent'

    10.times {
      Concurrent::Promise.execute(executor: :fast) do
        puts "hello"
      end
    }
  220. Protip!

  221. Always call wait!
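The reason behind this tip can be shown with plain threads. This is an analogy under stated assumptions, not the concurrent-ruby API: as far as I understand it, `wait!` on a promise blocks until the work is done and re-raises its error, much as `Thread#join` does for a thread. The method name `run_failing_job` is hypothetical.

```ruby
# Hypothetical helper: a job that fails on a background thread.
def run_failing_job
  thread = Thread.new do
    Thread.current.report_on_exception = false # keep stderr quiet for the demo
    raise ArgumentError, "job failed"
  end

  thread.join # waits for the thread and re-raises its exception here
  :finished
rescue ArgumentError => e
  "caught: #{e.message}"
end

puts run_failing_job
# prints caught: job failed
```

Without the `join`, the caller would carry on without ever learning that the job failed.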

  222. Sprockets Example

  223. After we process an asset we have to write it to disk
  224. We also want a gzip file etc.

  225. File writes release the GVL

  226. Use Concurrent Ruby!

  227. promise = Concurrent::Promise.execute(executor: executor) do
      exporter.call
    end
    concurrent_exporters << promise
  229. concurrent_exporters.each(&:wait!)
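The same shape can be sketched with only the standard library. This is not the Sprockets code: it swaps `Concurrent::Promise` for plain threads, and `export_asset` plus the file names are hypothetical. Each exporter writes one file, all exporters start together, and the method returns only after every write has finished.

```ruby
require "tmpdir"
require "zlib"

# Hypothetical exporters: each is a callable that writes one output file.
def export_asset(dir, name, source)
  plain_path = File.join(dir, name)
  gzip_path = "#{plain_path}.gz"

  exporters = [
    -> { File.write(plain_path, source) },
    -> { Zlib::GzipWriter.open(gzip_path) { |gz| gz.write(source) } }
  ]

  # Start every exporter, then wait for all of them before returning.
  exporters.map { |exporter| Thread.new { exporter.call } }.each(&:join)
  [plain_path, gzip_path]
end

Dir.mktmpdir do |dir|
  plain, gzipped = export_asset(dir, "app.js", "console.log('hi');\n")
  puts File.read(plain) == Zlib::GzipReader.open(gzipped, &:read)
end
# prints true
```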

  230. Here’s how to use threads

  231. Find where the GVL is released

  232. GVL is released where there is IO

  233. Where there is IO, threads are an amazing performance fit

  234. Create a job

  235. Make sure the job finishes

  236. Handle shared variable access (via atomic structures)

  237. Handle shared variable access (via atomic structures)

  238. What about guilds?

  239. Guilds aren’t released

  240. Can still use threads from within a guild

  241. Remember our explorers

  242. 1875

  243. 1875

  244. 1875

  245. 1875

  246. ☠☠☠

  247. WEBrick is slow

  248. WEBrick is slow

  249. Talk about what you DON’T know

  250. Talk about what you DON’T know

  251. Talk about what you DON’T know

  252. Share assumptions

  253. Share assumptions

  254. Threads are a tool

  255. Threads can be useful

  256. Threads can be fast

  257. None
  258. None
  259. None
  260. None
  261. None
  262. None
  263. Threads

  264. Are

  265. Not

  266. Evil

  267. Any Questions?

  268. None