Slide 1

Slide 1 text

Greenlet-based concurrency Goran Peretin @gperetin

Slide 2

Slide 2 text

Who am I? ✤ Freelancer ✤ Interested in concurrent, parallel and distributed systems

Slide 3

Slide 3 text

What is this about? ✤ understand what it is ✤ when you should use it ✤ concurrency as execution model (as opposed to composition model)

Slide 4

Slide 4 text

There will be no... ✤ Turnkey solutions ✤ GIL ✤ Details

Slide 5

Slide 5 text

Buzzwords ahead!

Slide 6

Slide 6 text

✤ concurrent vs parallel execution ✤ cooperative vs preemptive multitasking ✤ CPU bound vs IO bound task ✤ thread-based vs event-based concurrency

Slide 7

Slide 7 text

Mandatory definitions

Slide 8

Slide 8 text

Parallel execution ✤ Simultaneous execution of multiple tasks ✤ Must have multiple CPUs

Slide 9

Slide 9 text

Concurrent execution ✤ Executing multiple tasks in the same time frame ✤ ... but not necessarily at the same time ✤ Doesn’t require multiple CPU cores

Slide 10

Slide 10 text

Why do we want concurrent execution? ✤ We need it - more tasks than CPUs ✤ CPU is much faster than anything else

Slide 11

Slide 11 text

Thread-based concurrency ✤ Executing multiple threads in the same time frame ✤ OS scheduler decides which thread runs when

Slide 12

Slide 12 text

How does the OS scheduler switch tasks? ✤ When the current thread does an IO operation ✤ When the current thread has used up its time slice

Slide 13

Slide 13 text

How does the OS scheduler switch tasks? ✤ When the current thread does an IO operation ✤ When the current thread has used up its time slice Preemptive multitasking

Slide 15

Slide 15 text

Mandatory GIL slide ✤ Global Interpreter Lock ✤ One Python interpreter can run just one thread at any point in time ✤ Only a problem for CPU bound tasks

Slide 16

Slide 16 text

CPU bound vs IO bound ✤ CPU bound - time to complete a task is determined by CPU speed ✤ calculating a Fibonacci sequence, video processing... ✤ IO bound - does a lot of IO, e.g. reading from disk, network requests... ✤ URL crawler, most web applications...

Slide 17

Slide 17 text

Python anyone? ✤ import threading ✤ Python threads - real OS threads

Slide 18

Slide 18 text

Houston, we have a...

Slide 19

Slide 19 text

Problem? ✤ Lots of threads ✤ Thousands

Slide 20

Slide 20 text

Benchmarks!

Slide 21

Slide 21 text

Sample programs ✤ Prog 1: spawn some number of threads - each sleeps 200ms ✤ Prog 2: spawn some number of threads - each sleeps 90s
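
The slides do not show the benchmark source, so the following is only a minimal sketch of what Prog 1 might look like using the standard `threading` module. The function name `run_sleeping_threads` and the thread counts in the usage part are assumptions made for illustration, and timings will differ from machine to machine.

```python
import threading
import time


def run_sleeping_threads(count, seconds):
    """Start `count` OS threads that each sleep for `seconds`.

    Returns the elapsed wall-clock time for starting and joining all of them.
    (Hypothetical helper; not taken from the original benchmark.)
    """
    threads = [
        threading.Thread(target=time.sleep, args=(seconds,))
        for _ in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()          # each start() creates a real OS thread
    for thread in threads:
        thread.join()           # wait until every thread has finished sleeping
    return time.perf_counter() - start


if __name__ == "__main__":
    for count in (100, 1000):
        elapsed = run_sleeping_threads(count, 0.2)
        print(f"{count} threads: {elapsed:.3f} s")
```

Changing the sleep duration to 90 seconds and observing memory use from outside the process would correspond to Prog 2.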

Slide 22

Slide 22 text

Prog 1 ✤ Sleep 200ms

# of threads   100      1K       10K      100K
Time           207 ms   327 ms   2.55 s   25.42 s

Slide 23

Slide 23 text

Prog 2 ✤ Sleep 90s

# of threads   100       1K         10K      100K
RAM            ~4.9 GB   ~11.8 GB   ~82 GB   ? (256 GB)

Slide 24

Slide 24 text

... and more ✤ Number of threads is limited ✤ Preemptive multitasking

Slide 25

Slide 25 text

We need ✤ Fast to create ✤ Low memory footprint ✤ We decide when to switch

Slide 26

Slide 26 text

Green threads!

Slide 27

Slide 27 text

Green threads ✤ Not managed by OS ✤ 1:N with OS threads ✤ User threads, light-weight processes

Slide 28

Slide 28 text

Greenlets ✤ “...more primitive notion of micro-thread with no implicit scheduling; coroutines, in other words.” ✤ C extension

Slide 29

Slide 29 text

Greenlets ✤ Micro-thread ✤ No implicit scheduling ✤ Coroutines

Slide 30

Slide 30 text

Coroutine ✤ Function that can suspend its execution and then later resume ✤ Can also be implemented in pure Python (PEP 342) ✤ Coroutines decide when they want to switch

Slide 31

Slide 31 text

Coroutine ✤ Function that can suspend its execution and then later resume ✤ Can also be implemented in pure Python (PEP 342) ✤ Coroutines decide when they want to switch Cooperative multitasking
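
To make the PEP 342 point concrete, here is a small sketch of a pure-Python coroutine built from a generator. It is not from the talk; `running_average` is a hypothetical example chosen only to show a function suspending at `yield` and resuming when the caller sends it a value.

```python
def running_average():
    """Generator-based coroutine: receives numbers via send(), yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # suspend here until the caller sends a value
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)               # prime the coroutine: run it up to the first yield
print(avg.send(10))     # 10.0
print(avg.send(20))     # 15.0
```

The coroutine itself picks the point where it gives up control (the `yield`), which is the property that cooperative multitasking builds on.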

Slide 32

Slide 32 text

Cooperative multitasking ✤ Each task decides when to give others a chance to run ✤ Ideal for I/O bound tasks ✤ Not so good for CPU bound tasks
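
As an illustration of tasks deciding when to give others a chance to run, the sketch below drives plain generators with a round-robin loop. It uses only the standard library rather than greenlets, and the names `worker` and `run_round_robin` are assumptions made for this example.

```python
from collections import deque


def worker(name, steps, log):
    """A task that does `steps` units of work, yielding control after each one."""
    for i in range(steps):
        log.append((name, i))
        yield                   # cooperative switch point: let other tasks run


def run_round_robin(tasks):
    """Run generator tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the task until its next yield
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # still has work; go to the back of the queue


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1), ('b', 2)]
```

A task that computes for a long time without reaching a `yield` would keep every other task waiting, which is why this model suits IO bound work better than CPU bound work.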

Slide 33

Slide 33 text

Using greenlets ✤ We need something that will know which greenlet should run next ✤ Our calls must not block ✤ We need something to notify us when our call is done

Slide 34

Slide 34 text

Using greenlets ✤ We need something that will know which greenlet should run next ✤ Our calls must not block ✤ We need something to notify us when our call is done Scheduler

Slide 35

Slide 35 text

Using greenlets ✤ We need something that will know which greenlet should run next ✤ Our calls must not block ✤ We need something to notify us when our call is done Scheduler Event loop
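
The three requirements above can be sketched with a toy scheduler in pure Python. This is an assumption-laden illustration, not how greenlet or gevent work internally: tasks are generators that yield a delay instead of calling a blocking sleep, and the hypothetical `run` function decides which task runs next and wakes each one when its timer expires.

```python
import heapq
import itertools
import time


def sleeper(seconds, done):
    """Task that 'sleeps' by yielding a delay instead of blocking the thread."""
    yield seconds               # ask the scheduler to resume us after `seconds`
    done.append(seconds)


def run(tasks):
    """Toy scheduler: resumes each task once its requested delay has passed."""
    counter = itertools.count()  # tie-breaker so the heap never compares tasks
    waiting = [(time.monotonic(), next(counter), task) for task in tasks]
    heapq.heapify(waiting)
    while waiting:
        wake_at, _, task = heapq.heappop(waiting)
        delay = wake_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)   # nothing is runnable yet; wait for the next timer
        try:
            pause = next(task)  # run the task until it yields a delay
        except StopIteration:
            continue            # task finished
        heapq.heappush(waiting, (time.monotonic() + pause, next(counter), task))


def run_sleepers(count, seconds):
    """Run `count` sleeping tasks on one thread; return (finished, elapsed)."""
    done = []
    start = time.monotonic()
    run([sleeper(seconds, done) for _ in range(count)])
    return len(done), time.monotonic() - start


if __name__ == "__main__":
    print(run_sleepers(1000, 0.2))
```

All tasks share a single OS thread, so the total run time stays close to one sleep interval however many tasks are waiting.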

Slide 36

Slide 36 text

Event loop ✤ Listens for events from OS and notifies your app ✤ Asynchronous
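
A minimal sketch of the idea, using the standard `selectors` module rather than the event loop gevent uses: the loop asks the OS which sockets are ready and then notifies the application through a callback. The function `run_event_loop` and the callback `on_readable` are hypothetical names introduced for this example.

```python
import selectors
import socket


def run_event_loop(messages):
    """Toy event loop: waits for OS readiness events and dispatches callbacks."""
    selector = selectors.DefaultSelector()
    reader, writer = socket.socketpair()    # a connected pair of local sockets
    reader.setblocking(False)
    received = []

    def on_readable(sock):
        # Called by the loop only when the OS reports data is available.
        received.append(sock.recv(1024))

    # Store the callback as the key's data so the loop knows whom to notify.
    selector.register(reader, selectors.EVENT_READ, on_readable)
    try:
        for message in messages:
            writer.sendall(message)
            # Block until the OS reports an event (or the timeout expires).
            for key, _events in selector.select(timeout=1):
                key.data(key.fileobj)
    finally:
        selector.unregister(reader)
        selector.close()
        reader.close()
        writer.close()
    return received


if __name__ == "__main__":
    print(b"".join(run_event_loop([b"ping", b"pong"])))
```

A real event loop would watch many sockets and timers at once; this sketch watches a single socket to keep the dispatch step visible.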

Slide 38

Slide 38 text

✤ Scheduler ✤ Event loop Greenlets + ...

Slide 39

Slide 39 text

Gevent

Slide 40

Slide 40 text

Gevent ✤ “...coroutine-based Python networking library that uses greenlet to provide a high-level synchronous API on top of the libevent event loop.”

Slide 42

Slide 42 text

Prog 1 ✤ Sleep 200ms

# of threads     100      1K       10K      100K
Time             207 ms   327 ms   2.55 s   25.42 s

# of Greenlets   100      1K       10K      100K
Time             204 ms   223 ms   421 ms   3.06 s

Slide 43

Slide 43 text

Prog 2 ✤ Sleep 90s

# of threads     100      1K        10K      100K
RAM              4.9 GB   11.8 GB   82 GB    ? (256 GB)

# of Greenlets   100      1K        10K      100K
RAM              33 MB    41 MB     114 MB   858 MB

Slide 44

Slide 44 text

Gevent ✤ Monkey-patching ✤ Event loop

Slide 45

Slide 45 text

Disadvantages ✤ Monkey-patching ✤ Doesn’t work with C extensions ✤ Greenlet implementation details ✤ Hard to debug

Slide 46

Slide 46 text

Alternatives ✤ Twisted ✤ Tornado ✤ Callback based

Slide 47

Slide 47 text

PEP 3156 & Tulip ✤ Attempt to standardize event loop API in Python ✤ Tulip is an implementation
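
Tulip later entered the standard library as `asyncio`, so the earlier sleep benchmark can be sketched with it. This is an illustrative sketch assuming Python 3.7 or newer (for `asyncio.run`), not code from the talk; `sleeper`, `main`, and `run_async_sleepers` are hypothetical names.

```python
import asyncio
import time


async def sleeper(seconds):
    """Coroutine that suspends itself; the event loop runs other tasks meanwhile."""
    await asyncio.sleep(seconds)
    return seconds


async def main(count, seconds):
    """Run `count` sleepers concurrently on one thread and count how many finish."""
    results = await asyncio.gather(*(sleeper(seconds) for _ in range(count)))
    return len(results)


def run_async_sleepers(count, seconds):
    """Return (number of finished tasks, elapsed wall-clock time)."""
    start = time.perf_counter()
    finished = asyncio.run(main(count, seconds))
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    print(run_async_sleepers(1000, 0.2))
```

Unlike gevent, switch points here are explicit: a coroutine can only be suspended at an `await`.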

Slide 48

Slide 48 text

Recap ✤ Concurrent execution helps with IO bound applications ✤ Use threads if they work for you ✤ Use an async library if you have lots of connections

Slide 49

Slide 49 text

Thank you! ✤ Questions?

Slide 50

Slide 50 text

Resources ✤ http://dabeaz.com/coroutines/Coroutines.pdf ✤ http://www.gevent.org/ ✤ http://greenlet.readthedocs.org/en/latest/