Concurrency in Python

Mosky
March 26, 2015

This talk is mainly about multithreading and multiprocessing in Python, *in Python's flavor*.

It was also shared at Taipei.py [1].

[1] http://www.meetup.com/Taipei-py/events/220452029/

Transcript

  1. CONCURRENCY IN PYTHON MOSKY

  2. MULTITHREADING & MULTIPROCESSING IN PYTHON MOSKY

  3. MOSKY PYTHON CHARMER @ PINKOI MOSKY.TW

  8. OUTLINE
      • Introduction
      • Producer-Consumer Pattern
      • Python’s Flavor
      • Misc. Techniques

  9. INTRODUCTION

  13. MULTITHREADING
      • GIL (global interpreter lock)
      • Only one thread runs Python bytecode at any given time.
      • It can still improve IO-bound problems.
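
A minimal sketch of the IO-bound point, assuming `time.sleep` as a stand-in for a blocking I/O call (like real socket or file waits, it releases the GIL). The helper names `fake_io`, `run_sequential`, and `run_threaded` are made up for illustration and do not come from the talk.

```python
import threading
import time


def fake_io(delay):
    # Placeholder for a blocking I/O call; the GIL is released while sleeping.
    time.sleep(delay)


def run_sequential(n, delay):
    """Run n fake I/O calls one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    """Run n fake I/O calls in n threads and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

The waits overlap in the threaded version, so it should finish in roughly one `delay` rather than `n` of them. A CPU-bound function would not see the same gain.
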
  18. MULTIPROCESSING
      • It uses fork.
      • Processes can run at the same time.
      • It uses more memory.
      • Note the initial cost.
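
A rough sketch of the bullets above, assuming a Unix system where the `fork` start method is available. The prime-counting task and the names `count_primes` and `run_in_processes` are hypothetical; any CPU-bound function would do.

```python
import multiprocessing as mp


def count_primes(limit, results):
    # CPU-bound work: count the primes below `limit` by trial division.
    count = sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )
    results.put((limit, count))


def run_in_processes(limits):
    """Start one forked process per limit and collect {limit: prime_count}."""
    ctx = mp.get_context("fork")      # assumption: Unix only
    results = ctx.Queue()
    procs = [ctx.Process(target=count_primes, args=(limit, results))
             for limit in limits]
    for p in procs:
        p.start()                     # each start pays the fork cost
    collected = dict(results.get() for _ in procs)  # drain before joining
    for p in procs:
        p.join()
    return collected
```

Each process has its own interpreter and its own GIL, which is why the work can run on several cores at once. For tiny tasks the startup cost may outweigh the gain.
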
  23. IS IT HARD?
      • Avoid shared resources.
      • e.g., vars or shared memory, files, connections, …
      • Understand Python’s flavor.
      • Then it will be easy.
  28. SHARED RESOURCE
      • Race condition:
        T1: RW
        T2: RW
        T1+T2: RRWW
      • Use lock → Thread-safe:
        T1+T2: (RW) (RW)
      • But locks hurt performance and can cause deadlock.
      • Which is the hard part.
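
The (RW) (RW) case can be sketched as follows. This is an illustrative example, not code from the talk: the shared `counter` and the helper names are assumptions.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:   # the read and the write happen as one unit: (RW)
            counter += 1


def run(n_threads=4, times=10000):
    """Reset the counter, run the threads to completion, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(times,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Without the `with counter_lock:` line, two threads may both read the same value before either writes (the RRWW interleaving), and updates can be lost.
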
  31. DIAGNOSE PROBLEM
      • Where is the bottleneck?
      • Divide your problem.

  32. PRODUCER-CONSUMER PATTERN

  37. PRODUCER-CONSUMER PATTERN
      • A queue
      • Producers → A queue
      • A queue → Consumers
      • Python has a built-in Queue module (named queue in Python 3) for it.
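
A minimal sketch of the pattern using the Python 3 `queue` module. The squaring task, the `STOP` sentinel, and the function names are assumptions made for this example.

```python
import queue
import threading

STOP = object()   # sentinel that tells a consumer to return


def producer(tasks, items):
    for item in items:
        tasks.put(item)               # Producers -> a queue


def consumer(tasks, results):
    while True:
        item = tasks.get()            # a queue -> Consumers
        if item is STOP:
            break
        results.put(item * item)


def run_pipeline(items, n_consumers=2):
    """Square every item using one producer and several consumers."""
    tasks = queue.Queue()
    results = queue.Queue()
    consumers = [threading.Thread(target=consumer, args=(tasks, results))
                 for _ in range(n_consumers)]
    for c in consumers:
        c.start()
    p = threading.Thread(target=producer, args=(tasks, items))
    p.start()
    p.join()
    for _ in consumers:
        tasks.put(STOP)               # one sentinel per consumer
    for c in consumers:
        c.join()
    return sorted(results.get() for _ in items)
```

The queue is the only object the threads share, and it does its own locking, so no explicit lock appears in the worker code.
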
  38. EXAMPLES
      • https://docs.python.org/2/library/queue.html#queue-objects
      • https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py

  42. WHY .TASK_DONE?
      • It’s for .join.
      • When the counter goes to zero, it notifies the threads that are waiting.
      • It’s implemented with threading.Condition.
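
To make the counter concrete, here is a small sketch in which `Queue.join` returns only after every `put` has been matched by a `task_done`. The upper-casing task and the names `worker` and `process_all` are hypothetical.

```python
import queue
import threading


def worker(tasks, done):
    while True:
        item = tasks.get()
        try:
            # Relies on list.append being thread-safe in CPython.
            done.append(item.upper())
        finally:
            tasks.task_done()   # decrement the unfinished-task counter


def process_all(items, n_workers=3):
    """Upper-case every item with daemonic workers and wait for all of them."""
    tasks = queue.Queue()
    done = []
    for _ in range(n_workers):
        threading.Thread(target=worker, args=(tasks, done), daemon=True).start()
    for item in items:
        tasks.put(item)         # each put increments the counter
    tasks.join()                # blocks until the counter is back to zero
    return sorted(done)
```

If a worker forgot to call `task_done`, `tasks.join()` would block forever, which is why the call sits in a `finally` clause here.
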
  46. THE THREADING MODULE
      • Lock: primitive lock, .acquire / .release
      • RLock: the owner can reenter
      • Semaphore: blocks when the counter reaches zero

  50. (THE THREADING MODULE, continued)
      • Condition: .wait for .notify / .notify_all
      • Event: .wait for .set; simplified Condition
      • with lock: …
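
A short sketch that combines three of the primitives above: an `Event` as a start signal, a `Semaphore` to cap concurrency, and a `Lock` used through `with`. The counters `active` and `peak` and the function names are invented for the example.

```python
import threading
import time

slots = threading.Semaphore(2)   # at most two threads inside the block at once
state_lock = threading.Lock()    # protects the two counters below
go = threading.Event()           # start signal shared by all workers
active = 0
peak = 0


def worker():
    global active, peak
    go.wait()                    # block until some thread calls go.set()
    with slots:                  # acquire on entry, release on exit
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)         # pretend to work while holding a slot
        with state_lock:
            active -= 1


def run(n_threads=5):
    """Start the workers, release them together, return the peak concurrency."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    go.set()                     # wake every waiting thread
    for t in threads:
        t.join()
    return peak
```

The `with` form works for `Lock`, `RLock`, `Semaphore`, and `Condition` alike, and releases the primitive even when the body raises.
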
  55. THE MULTIPROCESSING MODULE
      • .Process
      • .JoinableQueue
      • .Pool
      • …
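
A hedged sketch of `.Process` together with `.JoinableQueue`, assuming the `fork` start method (Unix only). The doubling task, the `None` sentinel, and the function names are illustrative choices, not taken from the slides.

```python
import multiprocessing as mp


def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:            # sentinel: no more work for this process
            tasks.task_done()
            break
        results.put(item * 2)
        tasks.task_done()


def double_all(items, n_workers=2):
    """Double every item in worker processes fed by a JoinableQueue."""
    ctx = mp.get_context("fork")    # assumption: Unix only
    tasks = ctx.JoinableQueue()
    results = ctx.Queue()
    workers = [ctx.Process(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in items:
        tasks.put(item)
    for _ in workers:
        tasks.put(None)             # one sentinel per worker
    tasks.join()                    # same counter idea as queue.Queue.join
    doubled = sorted(results.get() for _ in items)
    for w in workers:
        w.join()
    return doubled
```

The shape mirrors the threaded producer-consumer pattern on purpose: only the queue and worker classes change.
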
  56. PYTHON’S FLAVOR

  61. DAEMONIC THREAD
      • It’s not that kind of “daemon”.
      • It just gets killed when Python shuts down.
      • Immediately.
      • Others keep running until they return.
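
A tiny sketch of the difference, with made-up thread names. The daemonic thread loops forever and is simply dropped at interpreter shutdown; the non-daemonic one has to return by itself.

```python
import threading
import time


def tick_forever():
    while True:              # never returns on its own
        time.sleep(0.1)


# Daemonic: Python will not wait for this thread when it shuts down.
background = threading.Thread(target=tick_forever, daemon=True)
background.start()

# Non-daemonic: Python waits for it at shutdown, so it must return.
finite = threading.Thread(target=time.sleep, args=(0.05,), daemon=False)
finite.start()
finite.join()
```

Because a daemonic thread is stopped abruptly, it is a poor fit for work that must finish cleanly, such as writing a file.
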
  64. SO, HOW TO STOP?
      • Set daemon and let Python clean it up.
      • Let it return.

  66. BUT, THE THREAD IS BLOCKING
      • Set timeout.
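
One way to combine "let it return" with "set timeout", sketched with a `threading.Event` as the stop flag. The 0.05 second timeout and the function names are arbitrary choices for the example.

```python
import queue
import threading


def worker(tasks, stop, handled):
    while not stop.is_set():
        try:
            item = tasks.get(timeout=0.05)   # wake up regularly, never block forever
        except queue.Empty:
            continue                         # nothing arrived; re-check the stop flag
        handled.append(item)
        tasks.task_done()


def run_and_stop(items):
    """Handle all items, then ask the worker to return and wait for it."""
    tasks = queue.Queue()
    stop = threading.Event()
    handled = []
    t = threading.Thread(target=worker, args=(tasks, stop, handled))
    t.start()
    for item in items:
        tasks.put(item)
    tasks.join()          # every item has been handled
    stop.set()            # ask the worker to return
    t.join(timeout=1)     # bounded wait on our side as well
    return handled, t.is_alive()
```

The timeout trades a little latency at shutdown for the guarantee that the thread notices the flag.
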

  69. HOW ABOUT CTRL+C?
      • Only the main thread can receive it.
      • BSD-style.

  75. BROADCAST SIGNAL TO SUB-THREAD
      • Set a global flag when the signal arrives.
      • Let each thread read it before each task.
      • No, you can’t kill a non-daemonic thread.
      • You just can’t.
      • It’s Python.
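
The global-flag idea could look roughly like this. The names `stop_requested`, `handle_sigint`, `install_handler`, and `worker` are assumptions, and a `threading.Event` stands in for the flag.

```python
import signal
import threading
import time

stop_requested = threading.Event()   # the "global flag"


def handle_sigint(signum, frame):
    # Python runs signal handlers in the main thread; this one only flips the flag.
    stop_requested.set()


def install_handler():
    # signal.signal() may only be called from the main thread.
    signal.signal(signal.SIGINT, handle_sigint)


def worker(finished):
    while not stop_requested.is_set():   # read the flag before each task
        time.sleep(0.01)                 # placeholder for one unit of work
    finished.append(threading.current_thread().name)
```

A typical use would call `install_handler()` in the main thread, start non-daemonic workers, and then join them. After Ctrl+C each worker finishes its current task and returns on its own.
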
  78. BROADCAST SIGNAL TO SUB-PROCESS
      • Just broadcast the signal to sub-processes.
      • Start by registering a signal handler:
        signal(SIGINT, _handle_to_term_signal)

  82. (BROADCAST SIGNAL TO SUB-PROCESS, continued)
      • Work out the process context if needed:
        pid = getpid()
        pgid = getpgid(0)
        proc_is_parent = (pid == pgid)
      • Turn off the handler:
        signal(signum, SIG_IGN)
      • Broadcast:
        killpg(pgid, signum)
  83. MISC. TECHNIQUES

  89. JUST THREAD IT OUT
      • Or process it out.
      • Let the main thread exit earlier. (Looks faster!)
      • Let the main thread keep dispatching tasks.
      • “Async”
      • And fix some stupid behavior. (I meant atexit with multiprocessing.Pool.)
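
A minimal sketch of "threading it out", assuming a slow side task that the caller does not need to wait for. `send_report` and `handle_request` are hypothetical names, and `time.sleep` stands in for the slow call.

```python
import threading
import time


def send_report(sent, payload):
    time.sleep(0.05)          # placeholder for a slow network or disk call
    sent.append(payload)


def handle_request(sent, payload):
    """Hand the slow part to a thread and return without waiting for it."""
    t = threading.Thread(target=send_report, args=(sent, payload))
    t.start()
    return t                  # the caller can keep dispatching other work
```

The thread is non-daemonic here, so Python still waits for the report to be sent before it exits.
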
  93. COLLECT RESULT SMARTER
      • Put into a safe queue.
      • Use a thread per instance.
      • Learn to “let it go”.
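
One possible reading of "a thread per instance", offered as an assumption rather than the speaker's design: each object owns its own thread and reports into a shared thread-safe queue. The `Job` class and its placeholder computation are invented for the example.

```python
import queue
import threading


class Job:
    """A hypothetical unit of work that runs in its own thread."""

    def __init__(self, name, results):
        self.name = name
        self._results = results
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()

    def _run(self):
        # Placeholder computation; the result goes into the shared queue.
        self._results.put((self.name, len(self.name)))


def collect(names):
    """Run one Job per name and gather {name: result} from the queue."""
    results = queue.Queue()
    jobs = [Job(name, results) for name in names]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return dict(results.get() for _ in jobs)
```

No job touches another job's state, so the queue is the only synchronization point.
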
  94. EXAMPLES
      • https://github.com/moskytw/mrbus/blob/master/mrbus/base/pool.py#L45
      • https://github.com/moskytw/mrbus/blob/master/mrbus/model/core.py#L30

  98. MONITOR THEM
      • No one is a master at first.
      • Don’t guess.
      • Just use a function to print logs.
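
Such a function can be very small. This sketch is one assumed shape: it prefixes each message with a timestamp and the current thread's name, then writes it to standard error.

```python
import sys
import threading
import time


def log(message):
    """Write a timestamped, thread-tagged line to stderr and return it."""
    line = "%.3f [%s] %s\n" % (
        time.time(),
        threading.current_thread().name,
        message,
    )
    sys.stderr.write(line)
    return line
```

Calling `log("got task")` inside a worker shows which thread did what and when, which is often enough to spot a stalled consumer.
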
  102. BENCHMARK THEM
      • No one is a master at first.
      • Don’t guess.
      • Just prove it.
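
"Prove it" can start with the standard `timeit` module. In this sketch `best_time` is a hypothetical helper, and the two candidates are placeholders for whatever variants are being compared, such as a sequential and a threaded version.

```python
import timeit


def best_time(func, repeat=5, number=10):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def with_loop():
    total = 0
    for i in range(1000):
        total += i
    return total


def with_builtin():
    return sum(range(1000))
```

Comparing `best_time(with_loop)` with `best_time(with_builtin)` on the target machine replaces a guess with a measurement. Taking the minimum reduces noise from other processes.
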
  108. CONCLUSION
      • Avoid shared resources, or just use the producer-consumer pattern.
      • Signals only go to the main thread.
      • Just thread it out.
      • Collect your results smarter.
      • Monitor and benchmark your code.