
Concurrency in Python


It is mainly about multithreading and multiprocessing in Python, and *in Python's flavor*.

It is also the talk given at Taipei.py [1].

[1] http://www.meetup.com/Taipei-py/events/220452029/

Mosky Liu

March 26, 2015
Transcript

  1. MULTITHREADING
     • GIL: only one thread runs at any given time.
     • It can still improve IO-bound workloads.
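The GIL point can be seen directly: blocking IO releases the GIL, so waits overlap even though only one thread runs Python code at a time. A minimal sketch (not from the slides), with `time.sleep` standing in for a network call:

```python
import threading
import time

def fetch(results, i):
    # Simulate an IO-bound call (e.g., a network request) with sleep;
    # sleeping releases the GIL, so other threads run meanwhile.
    time.sleep(0.1)
    results[i] = i * i

results = {}
threads = [threading.Thread(target=fetch, args=(results, i)) for i in range(4)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.1 s "requests" overlap, so the total is close to 0.1 s, not 0.4 s.
print(results, round(elapsed, 1))
```

A CPU-bound loop in place of the sleep would show no such speedup, which is exactly the GIL caveat above.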
  2. MULTIPROCESSING
     • It uses fork.
     • Processes can run at the same time.
     • It uses more memory.
     • Note the initial cost.
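A minimal sketch of the fork-based parallelism the slide describes, assuming a POSIX system (the `fork` start method matches the slide but is not available on Windows):

```python
import multiprocessing

# Each worker forked from this process has its own interpreter and its
# own GIL, so CPU-bound work can run truly in parallel -- at the cost of
# extra memory and process start-up time (the "initial cost" above).
ctx = multiprocessing.get_context('fork')  # POSIX only

def square(n):
    return n * n

with ctx.Pool(processes=2) as pool:
    squares = pool.map(square, range(5))
print(squares)  # [0, 1, 4, 9, 16]
```

For code that must also run where `fork` is unavailable, the usual pattern is the default context plus an `if __name__ == '__main__':` guard.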
  4. IS IT HARD?
     • Avoid shared resources, e.g., variables or shared memory, files, connections, …
     • Understand Python’s flavor.
     • Then it will be easy.
  7. SHARED RESOURCE
     • Race condition:
       T1: RW
       T2: RW
       T1+T2: RRWW
     • Use a lock → thread-safe:
       T1+T2: (RW) (RW)
     • But locks cost performance and can deadlock, which is the hard part.
  10. PRODUCER-CONSUMER PATTERN
     • Producers → a queue → consumers.
     • Python has the built-in Queue module (queue in Python 3) for it.
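A minimal producer-consumer sketch using the Python 3 `queue` name (the slides predate the rename; the sentinel-based shutdown is one common convention, not from the slides):

```python
import threading
import queue  # the "Queue" module in Python 2

q = queue.Queue()
results = []

def consumer():
    while True:
        item = q.get()
        if item is None:       # sentinel: tells the consumer to stop
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()

t = threading.Thread(target=consumer)
t.start()

for item in [1, 2, 3]:         # the producer side
    q.put(item)
q.put(None)

q.join()                       # blocks until every item got a .task_done()
t.join()
print(results)  # [2, 4, 6]
```

The queue does all the locking internally, which is why the pattern lets you "avoid shared resources" without writing any lock code yourself.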
  11. WHY .TASK_DONE?
     • It’s for .join.
     • When the counter of unfinished tasks goes to zero, it notifies the threads that are waiting.
     • It’s implemented with threading.Condition.
  13. THE THREADING MODULE
     • Lock: primitive lock, .acquire / .release.
     • RLock: the owner can reenter.
     • Semaphore: locks when its counter goes to zero.
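The difference between the three can be shown in a few lines (a sketch, not from the slides):

```python
import threading

# RLock: the owning thread can acquire it again without deadlocking.
rlock = threading.RLock()
with rlock:
    with rlock:        # a plain Lock would block itself here
        reentered = True

# Semaphore: an internal counter; acquiring blocks (or here, fails)
# once the counter reaches zero.
sem = threading.Semaphore(2)
acquired = [sem.acquire(blocking=False) for _ in range(3)]
print(reentered, acquired)  # True [True, True, False]
```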
  16. • Condition: .wait for .notify / .notify_all.
      • Event: .wait for .set; a simplified Condition.
      • with lock: …
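The Event half of that slide in miniature (a sketch, not from the slides): one thread blocks in `.wait()` until another calls `.set()`.

```python
import threading

event = threading.Event()
log = []

def waiter():
    log.append('waiting')
    event.wait()           # blocks until another thread calls .set()
    log.append('released')

t = threading.Thread(target=waiter)
t.start()
event.set()                # .wait() returns immediately once set
t.join()
print(log)  # ['waiting', 'released']
```

A Condition works the same way but lets you re-check an arbitrary predicate under its lock, which is the extra machinery Event drops.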
  18. DAEMONIC THREAD
     • It’s not that kind of “daemon”.
     • It just gets killed when Python is shutting down. Immediately.
     • Other threads keep running until they return.
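Marking a thread daemonic is one flag (a sketch, not from the slides):

```python
import threading
import time

def background():
    while True:
        time.sleep(0.05)   # never returns on its own

t = threading.Thread(target=background, daemon=True)
t.start()
print(t.daemon, t.is_alive())  # True True
# At interpreter shutdown this thread is killed abruptly: no finally
# blocks, no cleanup. Were it non-daemonic, its endless loop would
# keep the process alive forever.
```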
  21. SO, HOW TO STOP?
     • Set daemon and let Python clean it up.
     • Or let it return.
  22. BROADCAST A SIGNAL TO SUB-THREADS
     • Set a global flag when a signal arrives.
     • Let each thread read it before each task.
     • No, you can’t kill a non-daemonic thread. You just can’t. It’s Python.
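The flag-plus-check loop in miniature (a sketch, not from the slides; a `threading.Event` stands in for the "global flag", and `stop.set()` stands in for the signal handler):

```python
import threading
import queue
import time

stop = threading.Event()       # the "global flag"
tasks = queue.Queue()
done = []

def worker():
    while not stop.is_set():   # read the flag before each task
        try:
            task = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        done.append(task)

t = threading.Thread(target=worker)
t.start()
tasks.put('a')
tasks.put('b')
while len(done) < 2:           # wait until both tasks ran
    time.sleep(0.01)
stop.set()                     # the "broadcast": the worker exits its loop
t.join()                       # the thread stops by returning, not by being killed
print(done)  # ['a', 'b']
```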
  26. BROADCAST A SIGNAL TO SUB-PROCESSES
     • Just broadcast the signal to the sub-processes.
     • Start by registering a signal handler:
       signal(SIGINT, _handle_to_term_signal)
  28. • Realize the process context if needed:
        pid = getpid()
        pgid = getpgid(0)
        proc_is_parent = (pid == pgid)
      • Turn the handler off:
        signal(signum, SIG_IGN)
      • Broadcast:
        killpg(pgid, signum)
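The fragments above fit together roughly like this runnable sketch (POSIX assumed, and it must run in the main thread, since CPython only delivers signals there). To keep the example safe, it records the signal instead of actually calling `killpg`, which would signal the whole process group:

```python
import os
import signal

received = []

def _handle_to_term_signal(signum, frame):
    # Turn the handler off first so repeated signals are ignored;
    # a real parent process would then broadcast with
    # os.killpg(pgid, signum) -- here we only record the signal.
    signal.signal(signum, signal.SIG_IGN)
    received.append(signum)

signal.signal(signal.SIGINT, _handle_to_term_signal)

# Realize the process context:
pid = os.getpid()
pgid = os.getpgid(0)
proc_is_parent = (pid == pgid)   # the group leader's pid equals the pgid

os.kill(pid, signal.SIGINT)      # deliver SIGINT to this very process
print(received)                  # the handler ran in the main thread
```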
  31. JUST THREAD IT OUT
     • Or process it out.
     • Let the main thread exit earlier. (Looks faster!)
     • Let the main thread keep dispatching tasks.
     • “Async”
     • And fix some stupid behavior.
       (I mean atexit with multiprocessing.Pool.)
  35. COLLECT RESULTS SMARTER
     • Put them into a thread-safe queue.
     • Use a thread per instance.
     • Learn to “let it go”.
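The "safe queue" bullet in miniature (a sketch, not from the slides): `queue.Queue` is already thread-safe, so workers can push results without any lock of their own.

```python
import threading
import queue

results_q = queue.Queue()   # Queue handles its own locking

def work(n):
    results_q.put(n * n)    # no explicit lock needed around .put

threads = [threading.Thread(target=work, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Drain after all workers joined; sort because arrival order varies.
results = sorted(results_q.get() for _ in range(results_q.qsize()))
print(results)  # [0, 1, 4, 9, 16]
```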
  36. MONITOR THEM
     • No one is a master at first.
     • Don’t guess.
     • Just use a function to print logs.
  37. BENCHMARK THEM
     • No one is a master at first.
     • Don’t guess.
     • Just prove it.
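"Just prove it" can be as small as a helper like this (a sketch, not from the slides; the standard `timeit` module is the more polished option):

```python
import time

def benchmark(func, *args, repeat=5):
    """Return the best wall-clock time of `repeat` runs."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)   # min is the most stable estimate across runs

# Compare a threaded vs. multiprocessing variant of your task with the
# same helper before deciding which one to keep.
elapsed = benchmark(sum, range(10000))
print(elapsed >= 0.0)  # True
```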
  38. CONCLUSION
     • Avoid shared resources, or just use the producer-consumer pattern.
     • Signals only go to the main thread.
     • Just thread it out.
     • Collect your results smarter.
     • Monitor and benchmark your code.