
Jess Shapiro - Everything at Once: Python's Many Concurrency Models

Python makes it incredibly easy to build programs that do what you want. But what happens when you want to do what you want, but with more input? One of the easiest things to do is to make a program concurrent so that you can get more performance on large data sets. But what's involved with that?

Right now, there are any number of ways to do this, and that can be confusing! How does asyncio work? What's the difference between a thread and a process? And what's this Hadoop thing everyone keeps talking about?

In this talk, we'll survey the concurrency models available to you as a Python developer, cover the tradeoffs and advantages of each, and explain how you can select the right one for your purpose.

https://us.pycon.org/2019/schedule/presentation/222/

PyCon 2019

May 03, 2019

Transcript

  1. Concurrency
     ▪ Doing multiple things “at once”
     ▪ Concurrency isn’t just “on” or “off”
     ▪ Many available options in Python:
       ▪ Asyncio coroutines
       ▪ Python threads
       ▪ GIL-released threads
       ▪ Multiprocessing
       ▪ Distributed tasks
  2. Minimum Schedulable Unit
     ▪ Code is made up of semantic chunks
     ▪ How big are the chunks that can be run independently?
  3. Data Sharing and Isolation
     ▪ How isolated is data between tasks?
     ▪ How long does data stay the same for?
     ▪ What tools can be used to share data?
  4. Asyncio Coroutines
     ▪ One coroutine runs at a time
     ▪ MSU: “Awaitable Block”
     ▪ Global state is shared and consistent within each block
     ▪ Scheduled by the event loop
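As a small illustrative sketch (mine, not from the deck): two coroutines interleave on one event loop, and the code between `await`s — the “awaitable block” — runs without interruption:

```python
import asyncio

results = []

async def worker(name, delay):
    # Code between awaits runs without interruption; the event loop
    # only switches to another coroutine at an `await` point.
    await asyncio.sleep(delay)
    results.append(name)

async def main():
    # Only one coroutine executes at any instant, but the waits overlap.
    await asyncio.gather(worker("slow", 0.02), worker("fast", 0.01))

asyncio.run(main())
print(results)  # ['fast', 'slow']: the shorter sleep finishes first
```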
  5. Python Threads
     ▪ One thread runs at a time (GIL)
     ▪ MSU: “Bytecode”
     ▪ Global state is shared, but consistent only for single-bytecode ops*
     ▪ Combined scheduling
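A minimal sketch (mine, not from the deck) of why shared state needs protection when the schedulable unit is a single bytecode: a read-modify-write like `counter += 1` spans several bytecodes, so a `threading.Lock` is used to keep it consistent:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        # `counter += 1` is several bytecodes; another thread could be
        # scheduled between the load and the store. The lock prevents that.
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: the lock keeps the shared counter consistent
```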
  6. Which of these is a single-bytecode operation?
     ▪ x += 1
     ▪ func(**kw)
     ▪ dict.items()
     ▪ ‘{y}’.format(y=x.val)
     Lesson: Bytecode atomicity is incidental; it’s not there for you. Don’t count on it.
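You can check the slide’s quiz yourself with the standard-library `dis` module: `x += 1` compiles to several bytecodes (a load, an in-place add, and a store), so a thread switch can land in the middle of it. A quick sketch:

```python
import dis

# Disassemble the statement and collect its opcode names. The exact
# opcodes vary by CPython version, but it is never a single instruction.
ops = [ins.opname for ins in dis.get_instructions("x += 1")]
print(ops)
```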
  7. GIL-released Threads
     ▪ Multiple threads run simultaneously
     ▪ MSU: Host processor instruction (x86, etc.)
     ▪ Global state is shared but unreliable
     ▪ OS-scheduled
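As an illustrative sketch (mine, not from the deck): some CPython extension routines drop the GIL while they work — for example, `hashlib` releases it when hashing buffers larger than about 2 KiB — so plain threads can genuinely overlap that work:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four 1 MB buffers; hashlib releases the GIL while digesting large
# inputs, so these threads can run on multiple cores simultaneously.
chunks = [bytes([i]) * 1_000_000 for i in range(4)]

def digest(chunk):
    return hashlib.sha256(chunk).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    digests = list(pool.map(digest, chunks))

print(len(digests))  # 4
```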
  8. Multiprocessing
     ▪ Multiple processes run simultaneously
     ▪ MSU: Host processor instruction (x86, etc.)
     ▪ Global state starts the same as parent, but evolves independently
     ▪ OS-scheduled
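A hedged sketch (the function names are mine, not from the deck) of process-based parallelism with `multiprocessing.Pool`: each worker starts as a copy of the parent, so changes to globals in one process are never seen by the others:

```python
import multiprocessing as mp

def square(n):
    # Runs in a separate process with its own copy of global state.
    return n * n

def parallel_squares(nums):
    # The pool pickles each input to a worker process and collects
    # the pickled results; no memory is shared between workers.
    with mp.Pool(processes=4) as pool:
        return pool.map(square, nums)

if __name__ == "__main__":
    print(parallel_squares(range(5)))  # [0, 1, 4, 9, 16]
```

The `if __name__ == "__main__"` guard matters: on platforms that use the "spawn" start method, workers re-import the main module, and unguarded pool creation would recurse.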
  9. Distributed Tasks
     ▪ Multiple tasks run simultaneously
     ▪ MSU: varies; often the entire application for some subset of data
     ▪ Global state totally independent; often “process-like”
     ▪ Central orchestrator
  10. When to use each?
      ▪ Asyncio
        – Performance is I/O-bound rather than CPU-bound
        – Starting new codebase without synchronous legacy code
      ▪ Threads
        – Need preemptive multitasking
        – Integrate synchronous code
        – Need fine-grained concurrency
        – Python “glue” for GIL-unlocked C
      ▪ Processes
        – Don’t need substantial inter-task communication
        – Full parallelism required for Python code
      ▪ Distributed tasks
        – Highly-segmentable and distributable workload
        – Need for shared state minimal
        – Large enough load to overcome perf overhead of orchestrator
  11. Acknowledgements
      ▪ Allison Kaptur & Chris Fenner for slide review
      ▪ Friends & mentors for their belief and support
      ▪ https://carbon.now.sh for snippet images
      ▪ Mazarine on Market in SF for avocado toast
      ▪ YOU!