Slide 1

Slide 1 text

Python Concurrency and Distributed Computing Workshop
November 1-4, 2011, Chicago
David Beazley (http://www.dabeaz.com)
Copyright (C) 2010, David Beazley, http://www.dabeaz.com

Slide 2

Slide 2 text

Welcome!
• Welcome to Andersonville!
• Hope you're ready to code
• ... and drink lots of coffee

Slide 3

Slide 3 text

Overview - Day 1
• Threads and Tasks

Slide 4

Slide 4 text

Overview - Day 2
• Threads and Tasks
• Multiprocessing

Slide 5

Slide 5 text

Overview - Day 2
• Threads and Tasks
• Multiprocessing
• Async/Events

Slide 6

Slide 6 text

Overview - Day 2
• Threads and Tasks
• Multiprocessing
• Async/Events
• Message Passing

Slide 7

Slide 7 text

Overview - Day 3
• Threads and Tasks
• Multiprocessing
• Async/Events
• Message Passing
• Distributed Computing

Slide 8

Slide 8 text

Overview - Day 4
• Threads and Tasks
• Multiprocessing
• Async/Events
• Message Passing
• Distributed Computing
• Generators
• Coroutines

Slide 9

Slide 9 text

Overview - Day 4
• Threads and Tasks
• Multiprocessing
• Async/Events
• Message Passing
• Distributed Computing
• Generators
• Coroutines
x head explosion x

Slide 10

Slide 10 text

Important
• There are going to be a lot of moving parts
• Make sure you can cut/paste code
• Make sure you look at solution code
• Make sure you copy solution code as needed

Slide 11

Slide 11 text

Introductions

Slide 12

Slide 12 text

Section 1: Introduction

Slide 13

Slide 13 text

Requirements
• There is a set of support files and exercises (concurrent.zip)
    http://www.dabeaz.com/python/concurrent2011/
• We are using Python 3.2
• Optional installs
    • numpy
    • ZeroMQ and pyzmq
    • Redis

Slide 14

Slide 14 text

Concurrency
• A topic that's on a lot of programmers' minds
• Multicore CPUs, clusters, distributed computing, cloud computing, etc.
• Increased interest in concurrent and functional programming languages (e.g., Erlang, Scala, Clojure, Haskell, etc.)

Slide 15

Slide 15 text

My Personal Interest
• Concurrent programming is cool (and fun)
• In fact, it's the whole reason I went into CS
• As a book author, I'm interested in exploring different ways to understand and present advanced programming concepts
• Eventually, this workshop will become a book, but that's for later (I want your feedback)

Slide 16

Slide 16 text

Basic Concepts

Slide 17

Slide 17 text

Concurrent Programming
• Applications that work on more than one thing at a time--possibly spread out over a whole cluster of machines
• Example : A network server that communicates with several hundred clients all connected at once
• Example : A big number crunching job that spreads its work across hundreds of CPUs

Slide 18

Slide 18 text

Multitasking
• On a single machine, concurrency typically implies "multitasking"
[Diagram: Task A and Task B take turns in "run" intervals, with a task switch between each interval]
• If there is only a single CPU, the operating system rapidly switches back and forth between tasks

Slide 19

Slide 19 text

Parallel Processing
• You may have parallelism (many CPUs)
• Here, you often get simultaneous task execution
[Diagram: Task A runs on CPU 1 while Task B runs on CPU 2]
• Note: If the total number of tasks exceeds the number of CPUs, then each CPU also multitasks

Slide 20

Slide 20 text

Shared Memory
• Tasks may run in the same memory space
[Diagram: Task A on CPU 1 and Task B on CPU 2 run inside one process; one writes to an object while the other reads it]
• Simultaneous access to objects
• Often a source of unspeakable peril

Slide 21

Slide 21 text

Processes
• Tasks might run in separate processes
[Diagram: Task A on CPU 1 and Task B on CPU 2 run in two separate processes connected by IPC]
• Processes coordinate using IPC
• Pipes, FIFOs, memory mapped regions, etc.

Slide 22

Slide 22 text

Distributed Computing
• Tasks may be running on distributed systems
[Diagram: Task A and Task B run on different machines and exchange messages]
• For example, a cluster of workstations
• Or servers out in the "cloud."

Slide 23

Slide 23 text

Why Python?

Slide 24

Slide 24 text

Some Issues
• Python is interpreted
• Frankly, it doesn't seem like a natural match for this kind of programming
• Isn't this a serious business left to more "serious" programming languages?
"What the hardware giveth, the software taketh away."

Slide 25

Slide 25 text

"Enterprise" Python
• Is using Python even appropriate?
• Traditionally, distributed computing and parallelism have only been available to programmers working at elite institutions with deep pockets (and they don't tend to use "hobby" languages)
• Times change. Cheap machines have multiple CPU cores. Anyone can purchase CPU time out in the "cloud" (even my mom).

Slide 26

Slide 26 text

Why Use Python at All?
• It's very high level
• And it comes with a large library
    • Useful data types (dictionaries, lists, etc.)
    • Network protocols
    • Text parsing (regexs, XML, HTML, etc.)
    • Files and the file system
    • Databases
• Python programmers think it's awesome

Slide 27

Slide 27 text

Python as a Framework
• Python is often used as a high-level framework
• The various components might be a mix of languages (Python, C, C++, etc.)
• Concurrency may be a core part of the framework's overall architecture
• Python has to deal with it even if a lot of the underlying processing is going on in C

Slide 28

Slide 28 text

Programming Productivity
• Programmers are often able to get complex systems to "work" in much less time using a high-level language like Python than if they're spending all of their time hacking C code.
"The best performance improvement is the transition from the nonworking to the working state." - John Ousterhout
"You can always optimize it later." - Unknown
"Premature optimization is the root of all evil." - Donald Knuth

Slide 29

Slide 29 text

Performance is Irrelevant
• Many programs are "I/O bound"
• They spend virtually all of their time sitting around waiting
• Python can "wait" just as fast as C (maybe even faster--although I haven't measured it).
• If there's not much processing, who cares if it's being done in an interpreter? (One exception : if you need an extremely rapid response time as in real-time systems)

Slide 30

Slide 30 text

You Can Go Faster
• Python can be extended with C code
• Look at ctypes, Cython, Swig, etc.
• PyPy?

Slide 31

Slide 31 text

Special Cases
• Of course, there are special cases where you probably would not want to use Python
• Example :
    • Flight avionics
    • Nuclear power plants
    • Sharks with laser beams
• Fine, we won't be talking about that. There are still many other applications where Python makes a lot of sense.

Slide 32

Slide 32 text

The Roadmap

Slide 33

Slide 33 text

Class Overview
• Day 1: Threads and multitasking concepts
• Day 2: Threads, multiprocessing, async I/O
• Day 3: Messaging, distributed computing
• Day 4: Coroutines

Slide 34

Slide 34 text

Major Themes
• Software architecture. We're going to spend a lot of time studying different concurrency approaches and designs
• Tradeoffs. The good and bad of different design choices.
• Doing it yourself. We're going to build all sorts of cool things from scratch. I think it will be fun and inspiring.

Slide 35

Slide 35 text

Not a Focus
• Just plugging stuff into someone's already built programming framework
• Frameworks are fine, but this workshop is about core concepts
• It's also not promoting any specific set of tools (or even Python itself all that much)
• We will look at some frameworks as we go

Slide 36

Slide 36 text

A Caution
• Many of the course exercises are advanced
• Try to do things yourself if you can
• Know that solution code is always given
• You should absolutely be looking at the solution code throughout this workshop

Slide 37

Slide 37 text

A Final Caution
• We're going to write a lot of code, but that code should be viewed as a kind of "sketch"
• An exploration of different ideas
• It's not "production ready"
• You would need to add a lot (testing, corner cases, error checking, etc.)

Slide 38

Slide 38 text

Let's Get Started!

Slide 39

Slide 39 text

Section 2: Python Multithreading

Slide 40

Slide 40 text

Overview
• Introduction to thread programming
• How to reliably use threads
• Some programming idioms and common practices related to working with threads
• Details on Python interpreter execution

Slide 41

Slide 41 text

Secondary Objective
• Develop common programming patterns and designs related to all forms of concurrent and distributed computing (not just threads)
• Investigate a variety of issues generally related to concurrently executing tasks (debugging, control, synchronization, messaging, etc.)

Slide 42

Slide 42 text

Background : Threads
• Understanding issues related to thread programming is essential for any kind of work with concurrent programming
• First, threads are one of the oldest and most widely used approaches (so, if you ignore them, you're missing the big picture)
• Problems associated with threads tend to underlie other concurrency techniques (thus threads are the poster-child for every horrible thing that can go wrong)

Slide 43

Slide 43 text

Threads in Practice
• Threads are an important part of almost every framework or library related to network programming and distributed computing
• Even in libraries where users don't see threads
• Sometimes threads are deeply buried inside libraries to deal with very specific tasks related to I/O, waiting, and synchronization
• Examples : multiprocessing, twisted, etc.

Slide 44

Slide 44 text

Concept: Threads
• What many programmers think of when they hear about "concurrent programming"
• An independent task running inside a program
• Shares resources with the main program (memory, files, network connections, etc.)
• Has its own independent flow of execution (stack, current instruction, etc.)

Slide 45

Slide 45 text

Thread Basics
    % python program.py
• Program launch. Python loads a program and starts executing statements
[Diagram: the "main thread" executes statement, statement, ...]

Slide 46

Slide 46 text

Thread Basics
    % python program.py
• Creation of a thread. Launches a callable.
[Diagram: the main thread executes statement, statement, ..., then create thread(foo), which starts def foo():]

Slide 47

Slide 47 text

Thread Basics
    % python program.py
• Concurrent execution of statements
[Diagram: after create thread(foo), the main thread and def foo(): both execute statement, statement, ...]

Slide 48

Slide 48 text

Thread Basics
    % python program.py
• Thread terminates on return or exit
[Diagram: def foo(): executes its statements and ends at "return or exit"; the main thread continues with statement, statement, ...]

Slide 49

Slide 49 text

Thread Basics
    % python program.py
• Key idea: Thread is like a little "task" that independently runs inside your program
[Diagram: the main thread calls create thread(foo) and keeps executing statements; def foo(): runs its own statements until "return or exit" and is labeled "thread"]

Slide 50

Slide 50 text

threading Module
• How to launch a callable in a separate thread

    import threading
    import time

    def countdown(count):
        while count > 0:
            print("Counting down", count)
            count -= 1
            time.sleep(5)

    t = threading.Thread(target=countdown,args=(10,))
    t.start()

• Thread() creates a thread object
• start() method makes it run

Slide 51

Slide 51 text

Thread Objects
• Alternatively: A class that inherits from Thread

    class CountdownThread(threading.Thread):
        def __init__(self,count):
            threading.Thread.__init__(self)
            self.count = count
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

    thr = CountdownThread(10)
    thr.start()

• Comment: I don't like this approach because it entangles your code with the implementation of the Thread class (prefer decoupling)

Slide 52

Slide 52 text

Thread Execution
• A thread runs until the specified callable exits
• Unlike a function call, there is no return value (if a value is returned, it is ignored)
• If a thread dies with an exception, it does not stop your whole program--just that one thread terminates (although you will see a traceback)
• Emphasis: Threads execute independently from the code that launched them
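Since the return value of a thread's callable is thrown away, a result has to be handed back some other way. The following is only a sketch of one possibility: the `Call` wrapper and the `square` function are hypothetical names invented for this illustration, not part of the workshop code. Each thread writes only to its own `Call` object, and the launching code reads the result after `join()`.

```python
import threading

class Call:
    """Hypothetical helper: runs func(*args) and remembers the result."""
    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.result = None

    def __call__(self):
        # Thread() ignores whatever the callable returns,
        # so the value is stored on the object instead.
        self.result = self.func(*self.args)

def square(n):
    return n * n

calls = [Call(square, n) for n in range(4)]
threads = [threading.Thread(target=c) for c in calls]
for t in threads:
    t.start()
for t in threads:
    t.join()                    # wait until each thread has finished

results = [c.result for c in calls]
print(results)
```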

Slide 53

Slide 53 text

Thread Operations
• Once started, threads only have a few operations

    t.is_alive()         # Check if thread t is still running
    t.join([timeout])    # Wait for thread t to exit
    t.name               # Access the thread name

• You can't suspend threads
• You can't signal threads
• You can't kill threads
• More later...
"It's Alive!!!!!!!!!"
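A small sketch of these three operations in use. The `nap` function, the thread name, and the sleep durations are arbitrary choices made for this illustration.

```python
import threading
import time

def nap():
    time.sleep(0.5)             # stand-in for some longer-running work

t = threading.Thread(target=nap, name="napper")
t.start()

alive_before = t.is_alive()     # the thread has been started and is sleeping
t.join(0.01)                    # wait at most 0.01 seconds, then give up
timed_out = t.is_alive()        # join() returned, but the thread is still running
t.join()                        # no timeout: wait until the thread exits
alive_after = t.is_alive()

print(t.name, alive_before, timed_out, alive_after)
```

Note that `join()` with a timeout returns nothing useful by itself; checking `is_alive()` afterwards is how you tell whether the thread actually finished.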

Slide 54

Slide 54 text

Interpreter Execution
• With threads, the Python interpreter stays alive until all threads exit
• This is even true if the "main thread" exits
• Common confusion: The main program exits, but the interpreter doesn't quit because other threads are still executing

Slide 55

Slide 55 text

Daemonic Threads
• If a thread runs forever, make it "daemonic"

    t = threading.Thread(target=func)
    t.daemon = True
    t.start()

• Daemonic threads get killed on interpreter exit
• It is standard practice to use this for all non-terminating background tasks
• I tend to use it for all threads

Slide 56

Slide 56 text

Exercise thread.1

Slide 57

Slide 57 text

Using Threads
• Creating threads is easy
• It's awesome! Your program running in multiple places at the same time
• Now what?!?

Slide 58

Slide 58 text

Using Threads
• There are two primary uses of threads.
• Waiting. Program has to wait for I/O, an event, or for some other reason, but has other work that still needs to be carried out elsewhere.
• Subdivision of work. Take a big computational problem and subdivide into threads so that you can use multiple CPUs or cores (parallelism)
• In Python, threads are almost exclusively used for waiting (especially I/O).

Slide 59

Slide 59 text

The Waiting Problem
• An example: Reading on a socket

    data = s.recv(1024)

• This operation blocks until data is available
• "blocks" - The program stops and waits
• If no concurrency, then everything stops
• Obviously, this can be undesirable (e.g., what if there's a GUI or if it's a game?)

Slide 60

Slide 60 text

Multiple Clients
• Blocking is a major concern for servers
[Diagram: clients s1, s2, s3 connect to one server, which calls s1.recv(), s2.recv(), s3.recv()]
• If recv() blocks, how can multiple clients work?

Slide 61

Slide 61 text

Multiple Clients
• A solution: handle each client in a thread
[Diagram: the server calls s1.recv() in thread-1, s2.recv() in thread-2, and s3.recv() in thread-3]
• recv() still blocks, but it only affects one thread
• Other threads can still run (life is good)

Slide 62

Slide 62 text

Concurrent Network Server
• A simple threaded server template

    def handle_client(client_sock,client_addr):
        with client_sock:
            ... do stuff with the client ...

    def run_server(serv_sock):
        while True:
            # Wait for a new client connection
            c,a = serv_sock.accept()
            # Spawn a thread to handle it
            cthr = threading.Thread(target=handle_client,
                                    args=(c,a))
            cthr.daemon = True
            cthr.start()

• Idea: Launch a new thread on each connection
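To make the template concrete, here is a minimal sketch that fills in the "do stuff" part with an echo loop and tries it out with one client. The echo behavior, the loopback address, and the use of port 0 (letting the OS pick a free port) are assumptions made for this illustration, not something the template prescribes.

```python
import socket
import threading

def handle_client(client_sock, client_addr):
    # Assumed behavior for this sketch: echo data back until the client closes
    with client_sock:
        while True:
            data = client_sock.recv(1024)
            if not data:
                break
            client_sock.sendall(data)

def run_server(serv_sock):
    while True:
        # Wait for a new client connection
        c, a = serv_sock.accept()
        # Spawn a thread to handle it
        cthr = threading.Thread(target=handle_client, args=(c, a))
        cthr.daemon = True
        cthr.start()

serv_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serv_sock.bind(("127.0.0.1", 0))       # port 0: the OS chooses a free port
serv_sock.listen(5)
port = serv_sock.getsockname()[1]

# Run the accept loop in a daemonic thread so the main thread stays free
server_thread = threading.Thread(target=run_server, args=(serv_sock,))
server_thread.daemon = True
server_thread.start()

# Try it out with a client in the main thread
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()
print(reply)
```

Because both the accept loop and the handlers are daemonic, the interpreter can exit even though `run_server()` never returns.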

Slide 63

Slide 63 text

Exercise thread.2

Slide 64

Slide 64 text

Interlude
• As mentioned, creating threads is really easy
• You can create thousands of them if you want
• Developing with threads is hard
• Really hard
Q: Why did the multithreaded chicken cross the road?
A: to To other side. get the
-- Jason Whittington

Slide 65

Slide 65 text

Threads and Memory
• Although threads have independent control flow, threads share the same memory
• So, all threads see global variables
• Multiple threads can hold references to the same object and access it independently
• Threads can also share files, sockets, etc.

Slide 66

Slide 66 text

Shared Memory
• Example

    items = []          # A global variable

    def foo():
        ...
        items.append(x)
        ...

    def bar():
        ...
        y = items.pop()
        ...

    t1 = Thread(target=foo); t1.start()
    t2 = Thread(target=bar); t2.start()

• These operations are both manipulating the global variable "items"
• There is danger here: More shortly...

Slide 67

Slide 67 text

Shared Objects
• The same sharing applies to passed arguments

    def foo(obj):
        ...
        obj.method(args)
        ...

    def bar(obj):
        ...
        obj.method(args)
        ...

    obj = SomeObject()
    t1 = Thread(target=foo, args=(obj,))
    t2 = Thread(target=bar, args=(obj,))
    t1.start(); t2.start();

• This might be the same object in both threads (remember, Python functions don't copy arguments)
• Again, more danger lurks...

Slide 68

Slide 68 text

Nondeterminism
• Thread execution is non-deterministic
• Operations that take several steps might be interrupted mid-stream (non-atomic)
• Thus, access to shared data structures is also non-deterministic (which is a really good way to have your head explode)

Slide 69

Slide 69 text

Synchronization Problems
• Events. An action must be performed in response to an event in a different thread
• Concurrent updates. Multiple threads that update a shared value.

Slide 70

Slide 70 text

Event Synchronization
• Consider a shared value

    x = 0

• One thread sets the value, another reads it

    Thread-1            Thread-2
    --------            --------
    ...                 ...
    x = 42              print(x)
    ...                 ...

• Problem : Which thread runs first?
• Answer : It could be either one...

Slide 71

Slide 71 text

Concurrent Updates
• Consider a shared value

    x = 0

• And two threads that modify it

    Thread-1            Thread-2
    --------            --------
    ...                 ...
    x = x + 1           x = x - 1
    ...                 ...

• Here, it's possible that the resulting value will be corrupted due to thread scheduling

Slide 72

Slide 72 text

Concurrent Updates
• The two threads

    Thread-1            Thread-2
    --------            --------
    ...                 ...
    x = x + 1           x = x - 1
    ...                 ...

• Low level interpreter execution

    Thread-1                    Thread-2
    --------                    --------
    LOAD_GLOBAL   1 (x)
    LOAD_CONST    2 (1)
                 (thread switch)
                                LOAD_GLOBAL   1 (x)
                                LOAD_CONST    2 (1)
                                BINARY_SUB
                                STORE_GLOBAL  1 (x)
                 (thread switch)
    BINARY_ADD
    STORE_GLOBAL  1 (x)

Slide 73

Slide 73 text

Concurrent Updates
• Low level interpreter code

    Thread-1                    Thread-2
    --------                    --------
    LOAD_GLOBAL   1 (x)
    LOAD_CONST    2 (1)
                 (thread switch)
                                LOAD_GLOBAL   1 (x)
                                LOAD_CONST    2 (1)
                                BINARY_SUB
                                STORE_GLOBAL  1 (x)
                 (thread switch)
    BINARY_ADD
    STORE_GLOBAL  1 (x)

• The last two operations in Thread-1 get performed with a "stale" value of x. The computation in Thread-2 is lost.
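You can inspect this yourself with the standard `dis` module. The sketch below assumes a newer interpreter than the Python 3.2 used in class (`dis.get_instructions()` appeared in Python 3.4), and the exact instruction names vary between versions (for example, `BINARY_ADD` is spelled `BINARY_OP` in recent releases). What stays the same is that the load and the store of `x` are separate instructions.

```python
import dis

x = 0

def increment():
    global x
    x = x + 1

# Names of the bytecode instructions that make up the function body
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)

# The read of x and the write of x are distinct steps, so a thread
# switch can happen between them.
load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print("instructions between load and store:", store_index - load_index - 1)
```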

Slide 74

Slide 74 text

Atomicity
• Can you assume any operations are atomic?

    alist.append(x)
    item = alist.pop()
    adict[key] = value
    del adict[key]
    ...

• In general, you can't assume anything
    • Many implementations (jython, pypy, etc.)
    • Might be a user-defined class
• "Atomic" meaning noninterruptible

Slide 75

Slide 75 text

Race Conditions
• If program behavior depends on thread scheduling, you have a "race condition."
• It's often rather diabolical--a program may produce slightly different results each time it runs (even though you aren't using any random numbers)
• Or it may just have a mysterious gremlin that shows up every couple of weeks

Slide 76

Slide 76 text

Thread Synchronization
• Identifying and fixing a race condition will make you a better programmer (i.e., it "builds character")
• However, you'll probably never get that month of your life back...
• To fix : You have to synchronize threads (i.e., coordinate their execution so that things happen in the right order)

Slide 77

Slide 77 text

Events
• Event Objects

    e = threading.Event()
    e.is_set()           # Return True if event set
    e.set()              # Set event
    e.clear()            # Clear event
    e.wait([timeout])    # Wait for event

• Used to make one thread wait for an event to occur in another thread

Slide 78

Slide 78 text

Event Waiting
• Using an event to synchronize execution order

    x = 0
    x_event = threading.Event()

    Thread-1            Thread-2
    --------            --------
    ...                 ...
    x = 42              x_event.wait()
    x_event.set()       print(x)
    ...                 ...

  (x_event.set() in Thread-1 signals x_event.wait() in Thread-2)
• Can use to make sure threads do things in a specific order
• Caution : Events only have one-time use
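A runnable sketch of the same idea. The function names `writer` and `reader` and the `seen` list are made up for this illustration. The reader is deliberately started first to show that it still waits for the writer.

```python
import threading

x = 0
x_event = threading.Event()
seen = []                       # records what the reader observed

def writer():
    global x
    x = 42
    x_event.set()               # signal that x is ready

def reader():
    x_event.wait()              # block until the writer signals
    seen.append(x)

# Start the reader first on purpose; it blocks in wait() until set() is called
t2 = threading.Thread(target=reader)
t1 = threading.Thread(target=writer)
t2.start()
t1.start()
t1.join()
t2.join()
print(seen)
```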

Slide 79

Slide 79 text

Mutex Locks
• Mutual Exclusion Lock

    m = threading.Lock()

• Used to synchronize threads so that only one thread can make modifications to shared data at any given time
• Think transactions

Slide 80

Slide 80 text

Mutex Locks
• Using a lock

    with m:              # Acquires the lock
        statements
        statements
                         # Releases the lock
    statements

• Key feature: Only one thread can execute inside the 'with' statement at once
• If lock is already in use, a thread waits

Slide 81

Slide 81 text

Use of Mutex Locks
• Commonly used to enclose "critical sections"

    x = 0
    x_lock = threading.Lock()

    Thread-1                Thread-2
    --------                --------
    ...                     ...
    with x_lock:            with x_lock:
        x = x + 1               x = x - 1      <- Critical Section
    ...                     ...

• Only one thread can execute in critical section at a time (lock gives exclusive access)
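Here is a sketch that runs the two updates many times in a row. The helper names `add` and `sub` and the iteration count are arbitrary choices for this illustration. With the lock around both critical sections, the increments and decrements cancel out.

```python
import threading

x = 0
x_lock = threading.Lock()

def add(n):
    global x
    for _ in range(n):
        with x_lock:            # critical section
            x = x + 1

def sub(n):
    global x
    for _ in range(n):
        with x_lock:            # critical section
            x = x - 1

t1 = threading.Thread(target=add, args=(100000,))
t2 = threading.Thread(target=sub, args=(100000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(x)
```

If you remove the `with x_lock:` lines, the final value may or may not be zero depending on the interpreter version and thread scheduling, which is exactly what makes race conditions hard to track down.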

Slide 82

Slide 82 text

Using a Mutex Lock
• It is your responsibility to identify and lock all "critical sections"

    x = 0
    x_lock = threading.Lock()

    Thread-1                Thread-2
    --------                --------
    ...                     ...
    with x_lock:            x = x - 1
        x = x + 1           ...
    ...

• If you use a lock in one place, but not another, then you're missing the whole point. All modifications to shared state must be enclosed by the with statement.

Slide 83

Slide 83 text

Locks and Deadlock
• If a thread already acquired a lock, don't have it acquire it again---everything will freeze

    lock = threading.Lock()

    def foo():
        with lock:           # Freezes here (lock in use)
            statements

    def bar():
        with lock:
            foo()

    bar()

• Sometimes occurs if you try to reuse the same lock for too many things in your program (e.g., if you just had one global lock for everything)
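You can observe this without actually freezing your program by making the second attempt non-blocking. This is only a sketch for illustration; `acquire(False)` returns immediately with `False` instead of waiting when the lock is already held.

```python
import threading

lock = threading.Lock()

with lock:
    # The lock is already held by this thread. A second `with lock:` here
    # would wait forever, so try a non-blocking acquire instead.
    got_it = lock.acquire(False)

print(got_it)
```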

Slide 84

Slide 84 text

Locks and Deadlock
• Never write code that acquires more than one mutex lock at a time

    x = 0
    y = 0
    x_lock = threading.Lock()
    y_lock = threading.Lock()

    with x_lock:
        statements using x
        ...
        with y_lock:
            statements using x and y
            ...

• This almost invariably ends up creating a program that mysteriously deadlocks (see dining philosophers)

Slide 85

Slide 85 text

Alternate Interface
• Alternate interface for locks

    x = 0
    x_lock = threading.Lock()

    x_lock.acquire()
    statements using x
    ...
    x_lock.release()

• Very tricky to use correctly due to issues with exception handling
• Better to use the 'with' statement
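The exception-handling issue is that an exception raised between `acquire()` and `release()` leaves the lock held forever unless you add `try`/`finally`. The sketch below compares the manual form with the `with` statement; the function names and the deliberately bad input are made up for this illustration.

```python
import threading

x = 0
x_lock = threading.Lock()

def update_manual(value):
    global x
    x_lock.acquire()
    try:
        x = int(value)          # may raise ValueError
    finally:
        x_lock.release()        # runs even if an exception was raised

def update_with(value):
    global x
    with x_lock:                # same effect, with less to get wrong
        x = int(value)

for update in (update_manual, update_with):
    try:
        update("not a number")
    except ValueError:
        pass
    # In both versions the lock has been released despite the exception
    print(update.__name__, "lock still held:", x_lock.locked())
```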

Slide 86

Slide 86 text

Exercise thread.3

Slide 87

Slide 87 text

Other Kinds of Locks
• The threading library also defines
    • RLock
    • Semaphore
    • Condition
• What are these used for?

Slide 88

Slide 88 text

RLock
• Reentrant Mutex Lock

    m = threading.RLock()    # Create a lock
    m.acquire()              # Acquire the lock
    m.release()              # Release the lock

• Similar to a normal lock except that it can be reacquired multiple times by the same thread
• However, each acquire() must have a release()
• Common use : Code-based locking (where you're locking function/method execution as opposed to data access)

Slide 89

Slide 89 text

RLock Example
• Only allow one thread to execute methods in a class at a given time

    class Foo:
        _lock = threading.RLock()
        def bar(self):
            with Foo._lock:
                ...
        def spam(self):
            with Foo._lock:
                ...
                self.bar()
                ...

• Observe : Once any method is called, all of the methods are locked until the method returns
• Nested calls and recursion are okay
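A runnable version of the same class, as a sketch. The `calls` list is a hypothetical addition used only to record which methods ran. With a plain `Lock` the nested call to `bar()` would freeze; with an `RLock` it goes through.

```python
import threading

class Foo:
    _lock = threading.RLock()

    def __init__(self):
        self.calls = []         # hypothetical record of method calls

    def bar(self):
        with Foo._lock:
            self.calls.append("bar")

    def spam(self):
        with Foo._lock:
            self.calls.append("spam")
            self.bar()          # reacquires the RLock already held by this thread

f = Foo()
t = threading.Thread(target=f.spam)
t.start()
t.join(5)                       # the timeout guards against an unexpected freeze
print(f.calls, t.is_alive())
```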

Slide 90

Slide 90 text

Semaphores
• A counter-based synchronization primitive

    m = threading.Semaphore(n)    # Create a semaphore
    m.acquire()                   # Acquire
    m.release()                   # Release

• acquire() - Waits if the count is 0, otherwise decrements the count and continues
• release() - Increments the count and signals waiting threads (if any)
• Unlike locks, acquire()/release() can be called in any order and by any thread

Slide 91

Slide 91 text

Semaphore Uses
• Limiting concurrency. You can limit the number of threads performing certain operations. For example, performing database queries, making network connections, etc.
• Signaling. Semaphores can be used to send "signals" between threads. For example, having one thread wake up another thread.

Slide 92

Slide 92 text

Limiting Concurrency
• Using a semaphore to limit concurrency

    import threading
    import urllib.request    # In Python 3, urlopen() lives in urllib.request

    _fetch_limit = threading.Semaphore(5)    # Max: 5-threads

    def fetch_page(url):
        with _fetch_limit:
            u = urllib.request.urlopen(url)
            return u.read()

• In this example, only 5 threads can be executing in the function at once (if more want to run, they have to wait)
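To see the limit in action without touching the network, the following sketch replaces the page fetch with a short sleep and records how many threads are inside the limited region at once. The names `work`, `active`, and `peak`, the thread count, and the sleep time are all assumptions made for this illustration.

```python
import threading
import time

_limit = threading.Semaphore(5)     # at most 5 threads in the limited region
_count_lock = threading.Lock()      # protects the two counters below
active = 0                          # threads currently inside the region
peak = 0                            # highest value of active observed

def work():
    global active, peak
    with _limit:
        with _count_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)            # stand-in for a slow network fetch
        with _count_lock:
            active -= 1

threads = [threading.Thread(target=work) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("peak concurrency:", peak)
```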

Slide 93

Slide 93 text

Thread Signaling
• Using a semaphore to signal

    done = threading.Semaphore(0)

    Thread 1                Thread 2
    --------                --------
    ...                     done.acquire()
    statements              statements
    statements              statements
    statements              statements
    done.release()          ...

• Here, acquire() and release() occur in different threads and in a different order
• Sometimes used in queuing problems
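A runnable sketch of the signaling pattern. The function names and the `order` list are hypothetical and exist only to record what happened. Thread 2 is started first, but it cannot get past `acquire()` until Thread 1 calls `release()`.

```python
import threading

done = threading.Semaphore(0)       # count starts at 0, so acquire() waits
order = []                          # records the order of events

def thread1():
    order.append("thread 1: work finished")
    done.release()                  # signal thread 2

def thread2():
    done.acquire()                  # wait for the signal from thread 1
    order.append("thread 2: continuing")

t2 = threading.Thread(target=thread2)
t1 = threading.Thread(target=thread1)
t2.start(); t1.start()
t1.join(); t2.join()
print(order)
```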

Slide 94

Slide 94 text

Condition Variables
• Condition Objects

    cv = threading.Condition([lock])
    cv.acquire()       # Acquire the underlying lock
    cv.release()       # Release the underlying lock
    cv.wait()          # Wait for condition
    cv.notify()        # Signal that a condition holds
    cv.notify_all()    # Signal all threads waiting (older spelling: notifyAll)

• A combination of locking/signaling
• Lock is used to protect code that changes a shared data value
• Signal is used to notify other threads that the data has changed state ("condition" changed)

Slide 95

Slide 95 text

Condition Example
• Thread that prints value every time it changes

    x = 0
    x_cond = threading.Condition()

    Thread 1                    Thread 2
    --------                    --------
    ...                         last_x = x
    ...                         while True:
    with x_cond:                    with x_cond:
        x = new value                   while (last_x == x):
        x_cond.notify()                     x_cond.wait()
    ...                                 print("x=",x)
    ...                                 last_x = x

• Somewhat similar to an Event except that you can use it over and over again
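Below is a runnable sketch of this pattern. To keep it finite, the watcher stops once it sees a final value; the names `watcher`, `setter`, `observed`, and the `stop_value` parameter are inventions for this illustration.

```python
import threading

x = 0
x_cond = threading.Condition()
observed = []                       # values the watcher saw, in order

def watcher(stop_value):
    last_x = 0                      # the initial value of x
    while last_x != stop_value:
        with x_cond:
            while last_x == x:      # re-check the condition after every wakeup
                x_cond.wait()
            last_x = x
        observed.append(last_x)

def setter(values):
    global x
    for v in values:
        with x_cond:
            x = v
            x_cond.notify()         # wake the watcher if it is waiting

t2 = threading.Thread(target=watcher, args=(3,))
t1 = threading.Thread(target=setter, args=([1, 2, 3],))
t2.start(); t1.start()
t1.join(); t2.join()
print(observed)
```

One thing to notice: the watcher is only guaranteed to see the latest value. If the setter changes `x` several times before the watcher gets to run, intermediate values are skipped, so the printed list may be shorter than the list of values that were set.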

Slide 96

Slide 96 text

Exercise thread.4

Slide 97

Slide 97 text

Problems with Threads
• Many things all going at once
• Weird control flow and tricky data handling
• Ad-hoc coding hell and many bad examples
• Weak debugging support
• My advice : Focus on proper encapsulation

Slide 98

Slide 98 text

Tasks vs. Threads
• When you launch a thread, you are creating a concurrently executing "task"
• A "task" is simply a representation of work
• Insight : Tasks are more general than threads
• Threads are an implementation detail related to getting the task to run, but are unrelated to the actual work being carried out

Slide 99

Slide 99 text

Task Features
• Some desirable features of tasks
    • Start/stop/resume functionality
    • Crash recovery
    • Logging and diagnostics
    • Debugging support
• To get this, you need planning and discipline

Slide 100

Slide 100 text

Task Classes
• Define your tasks as a class

    class CountdownTask:
        def __init__(self,count):
            self.count = count
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

• Minimally, it has a run() method that performs the work associated with the task
• Notice: No use of threads here

Slide 101

Slide 101 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Bootstrapping 63 class CountdownTask: def __init__(self,count): self.count = count def bootstrap(self): self.run() def run(self): while self.count > 0: print("Counting down", self.count) self.count -= 1 time.sleep(5) • Always put an extra wrapper around run() • Think of it as "booting" the task • Purpose is to set up the runtime environment

Slide 102

Slide 102 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Termination 64 class CountdownTask: ... def bootstrap(self): self.runnable = True self.run() def stop(self): self.runnable = False def run(self): while self.runnable and self.count > 0: ... • Give tasks some way to stop • Note: tasks must be programmed to check • Food for thought: Is stopping == killing?

Slide 103

Slide 103 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Exception Handling 65 class CountdownTask: ... def bootstrap(self): try: self.run() except Exception: self.exc_info = sys.exc_info() # Save it! # Log/report/handle the exception ... • Give tasks a catch-all exception handler • Buggy tasks will crash. You want to have a way to manage and debug it. • Advice : Always report and save the exception

Slide 104

Slide 104 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task State 66 class CountdownTask: def __init__(self,count): self.state = "INIT" ... def bootstrap(self): self.state = "RUNNING" try: self.run() except Exception: self.exc_info = sys.exc_info() self.state = "EXIT" • Give tasks a "state" attribute • Having this is very useful for debugging • Thought: Make it user-customizable

Slide 105

Slide 105 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Logging • Give your tasks some logging 67 import logging class CountdownTask: ... def bootstrap(self): self.log = logging.getLogger("countdown") self.log.info("Starting") ... try: self.run() except Exception: self.log.error("Crashed", exc_info=True) self.log.info("Exit")

Slide 106

Slide 106 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Logging • Set up the logging module for your program import logging logging.basicConfig( filename="debug.log", filemode="w", format="%(process)d:%(threadName)s:" \ "%(levelname)s:%(message)s", level=logging.DEBUG) 68 • Examples of issuing log messages log = logging.getLogger("name") log.critical("A critical error occurred") log.error("File %s not found", filename) log.warning("This is your last warning") log.info("Just some information") log.debug("Debugging : n = %d", n)

Slide 107

Slide 107 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Logging Information • More information on logging 69 http://www.dabeaz.com/special/Logging.pdf • We'll be using it throughout

Slide 108

Slide 108 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Cleanup 70 class CountdownTask: ... def finalize(self): del self.log del self.exc_info ... self.state = "FINAL" • Define a finalization method • This method should clean-up all attributes related to the runtime environment (e.g., logging, exceptions, etc.) • Think of it as "__del__" for the runtime

Slide 109

Slide 109 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Resources 71 • Tasks have two runtime states • Managed separately Task (inactive) runtime environment Task (running) Task (inactive) bootstrap() finalize()

Slide 110

Slide 110 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Class Recap • You have a task class with this interface: class Task: def __init__(self): ... def stop(self): ... def bootstrap(self): ... def run(self): ... def finalize(self): ... 72 • There are start/stop methods • There is error handling • There is logging for diagnostics
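Putting the preceding slides together, a task class with this interface might look roughly like the sketch below. It is assembled from the individual slides and is not the workshop's solution code. Three details are assumptions made for illustration: `finalize()` tolerates a task that never ran, `CountdownTask` is written as a subclass, and it takes an `interval` argument. Note that no threads are involved yet.

```python
import logging
import sys
import time

class Task:
    def __init__(self):
        self.state = "INIT"
        self.runnable = False
        self.exc_info = None

    def stop(self):
        # Cooperative stop: run() has to check self.runnable
        self.runnable = False

    def bootstrap(self):
        # Set up the runtime environment, then run the actual work
        self.log = logging.getLogger(type(self).__name__)
        self.runnable = True
        self.state = "RUNNING"
        self.log.info("Starting")
        try:
            self.run()
        except Exception:
            self.exc_info = sys.exc_info()          # Save it!
            self.log.error("Crashed", exc_info=True)
        self.state = "EXIT"
        self.log.info("Exit")

    def run(self):
        raise NotImplementedError("subclasses define run()")

    def finalize(self):
        # Drop runtime-only attributes; pop() tolerates a task that never ran
        self.__dict__.pop("log", None)
        self.exc_info = None
        self.state = "FINAL"

class CountdownTask(Task):
    def __init__(self, count, interval=5):
        super().__init__()
        self.count = count
        self.interval = interval

    def run(self):
        while self.runnable and self.count > 0:
            self.log.info("Counting down %d", self.count)
            self.count -= 1
            time.sleep(self.interval)
```

Calling `CountdownTask(3).bootstrap()` runs the countdown in the calling thread. The state moves from "INIT" to "RUNNING" to "EXIT", and `finalize()` then moves it to "FINAL".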

Slide 111

Slide 111 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Interlude 73 • Our task class is a general representation of concurrently executing "work" • Think of it as a task environment • It is self-contained and decoupled • Task object is like a basic building block

Slide 112

Slide 112 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Adding Threads • Question : How do threads enter? • Think of threads as a very low-level implementation detail (like assembly language, low-level system calls, etc.). • Threads just enable concurrent execution 74

Slide 113

Slide 113 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Adding Threads • One approach : Add a start() method 75 class Task: ... def start(self): thr = threading.Thread(target=self.bootstrap) thr.daemon = True thr.start() def bootstrap(self): ... • Threads are used internally, but users don't know that--they just call start()

Slide 114

Slide 114 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Big Picture • Tasks have two parts 76 task execution thread task = Task() task.start() task.stop() task.finalize() task creation and control launches def bootstrap(self): self.run() def run(self): ... do work ... • Yes, there are many moving parts • It looks complex, but you'll be thankful later

Slide 115

Slide 115 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Commentary • Isn't all of this task setup a lot of work? 77 • Yes, but using threads is like playing with fire

Slide 116

Slide 116 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Food For Thought • Who is in charge here? • The threading library? • A third party library? • A framework? • Nobody? • You want to be in control of the environment • There is going to be complexity • Better to manage it than to react 78

Slide 117

Slide 117 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Counterpoint • Don't go overboard abstracting away details • Keep it simple • Make sure you can test and debug it 79 "If you add the right abstraction layer, you can make the jump from an unknown number of problems to an unknowable number of problems." - Cameron Laird

Slide 118

Slide 118 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.5 80

Slide 119

Slide 119 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Debugging Tips • Debugging concurrent tasks is tricky • Let's look at a few useful tips • Don't repeat yourself • Assigning thread/task names • How to use the main thread • Enabling post-mortem debugging support 81

Slide 120

Slide 120 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Avoid Repeating Yourself • Use inheritance to avoid code replication 82 class Task: def __init__(self): self.state = "INIT" def start(self): thr = threading.Thread(target=self.bootstrap) thr.daemon = True thr.start() def bootstrap(self): ... def stop(self): ... • To use: inherit from Task and redefine run()
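As a rough illustration of the inheritance approach, the sketch below gives the base class a thread-launching `start()` and leaves only `run()` to the subclass. `TickTask`, the `join()` helper, and setting `runnable` in `__init__` (so that an early `stop()` cannot be overwritten by a late `bootstrap()`) are assumptions for this example, not part of the slides.

```python
import threading
import time

class Task:
    def __init__(self):
        self.state = "INIT"
        self.runnable = True
        self._thread = None

    def start(self):
        # Threads are an internal detail; callers just call start()
        self._thread = threading.Thread(target=self.bootstrap,
                                        name=type(self).__name__)
        self._thread.daemon = True
        self._thread.start()

    def join(self, timeout=None):
        # Hypothetical convenience method, not shown on the slides
        self._thread.join(timeout)

    def bootstrap(self):
        self.state = "RUNNING"
        try:
            self.run()
        finally:
            self.state = "EXIT"

    def stop(self):
        self.runnable = False

    def run(self):
        raise NotImplementedError("subclasses define run()")

class TickTask(Task):
    """Example subclass: counts ticks until told to stop."""

    def __init__(self, interval=0.01):
        super().__init__()
        self.interval = interval
        self.ticks = 0

    def run(self):
        while self.runnable:
            self.ticks += 1
            time.sleep(self.interval)
```

Typical use is `t = TickTask(); t.start()`, later followed by `t.stop(); t.join()`.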

Slide 121

Slide 121 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Using the Main Thread • Never use the main execution thread to perform any real work • If using threads, launch separate threads for all parts of your application • Why? If you do this, you can still use the interactive interpreter during execution • Incredibly useful for debugging (can examine tasks, program state, etc.) 83

Slide 122

Slide 122 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Main Thread Suggestion • In production, have the main thread spin 84 def main(): import time while True: time.sleep(1) • It consumes virtually no CPU (sleeps) • Avoids annoyance of killing programs that use threads (Ctrl-C will work properly)

Slide 123

Slide 123 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Post-Mortem Debugging • Here's a neat little trick involving pdb 85 class Task: ... def bootstrap(self): try: self.run() except Exception: self.exc_info = sys.exc_info() def pm(self): import pdb pdb.post_mortem(self.exc_info[2]) • If your task crashes with an exception, the pm() method launches the debugger on the saved traceback (can inspect internals)

Slide 124

Slide 124 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.6 86

Slide 125

Slide 125 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Threads and Memory • How did we get on this topic? • Oh yeah, threads share the same memory • It's great except that... • Shared memory sucks • Mutable data sucks • Locking sucks • Blasphemy! 87

Slide 126

Slide 126 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- The Horror, The Horror! • Writing reliable and maintainable programs based on shared memory is a "sisyphean task" • Shared state is not a feature • At first glance, it seems "convenient" • Actually a trap that leads to nothing but sorrow 88

Slide 127

Slide 127 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Personal Aside • Computer Science professors love locks, synchronization, and shared state! • Gives them something to talk about for 3-4 weeks while teaching operating systems (e.g., dining philosophers) • Easily applied to making students cry on exams • Professors don't have to maintain real code 89

Slide 128

Slide 128 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Concurrency Wisdom • Experience : Concurrent programs involving tasks, shared state, and complicated locking can't be understood (or debugged) by humans • Practical advice: • Tasks should never share state • Tasks can communicate, but should only pass immutable data structures • Strive for task isolation and simplicity 90

Slide 129

Slide 129 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Shared State 91 "If there's one lesson we've learned from 30+ years of concurrent programming it is: just don't share state. It's like two drunkards trying to share a beer. It doesn't matter if they're good buddies. Sooner or later they're going to get into a fight. And the more drunkards you add to the pavement, the more they fight each other over the beer. The tragic majority of multithreaded applications look like drunken bar fights." - ØMQ (The Guide)

Slide 130

Slide 130 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Question • How do you get isolation? 92

Slide 131

Slide 131 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Thread Communication • Instead of having shared data structures, focus on communication between threads • Examples: • Producer/consumer • Publish/subscribe • Request/response • Big idea : think of threads as independent actors (like network servers) 93

Slide 132

Slide 132 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Producers & Consumers • Threaded programs can often be organized into producers and consumers 94 Task 1 (Producer) Task 2 (Consumer) inbox send(item) • Instead of "sharing" data, threads only coordinate by sending data to each other • Think pipes, sockets, etc.

Slide 133

Slide 133 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- queue Library Module • A thread-safe message queue • Basic operations from queue import Queue q = Queue([maxsize]) # Create a queue q.put(item) # Put an item on the queue q.get() # Get an item from the queue q.empty() # Check if empty q.full() # Check if full q.qsize() # Queue size 95 • To use: write your code so that it strictly adheres to get/put operations.

Slide 134

Slide 134 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Basic Queue Usage • Example of setting up a producer and consumer # Produce an item msg_q.put(item) 96 while True: item = msg_q.get() consume_item(item) from queue import Queue msg_q = Queue() Producer Thread Consumer Thread • Items are sent from one thread to another • No shared state except for the queue

Slide 135

Slide 135 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Tasks as Consumers • Recall our earlier task object 97 class Task: def __init__(self): ... def stop(self): ... def bootstrap(self): ... def run(self): ... • Queuing can be added by extending this class • Remember : You want encapsulation

Slide 136

Slide 136 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Tasks as Consumers 98 import queue class Task: def bootstrap(self): self._messages = queue.Queue() ... def send(self,msg): self._messages.put(msg) def recv(self): return self._messages.get() def run(self): while True: # Get a message from the queue msg = self.recv() # Work on msg ... • Add an internal queue, send(), and recv() • send() stores incoming messages. run() receives messages with recv()

Slide 137

Slide 137 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Consumer Termination 99 class TaskExit(Exception): pass class Task: ... def send(self,msg): self._messages.put(msg) def stop(self): # Use the TaskExit class as a sentinel to signal end of messages self._messages.put(TaskExit) ... • To shut down gracefully, use a sentinel • Sentinel value goes on the end of the queue • Previous messages will get processed first

Slide 138

Slide 138 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Consumer Termination 100 class TaskExit(Exception): pass class Task: ... def recv(self): msg = self._messages.get() if msg is TaskExit: raise TaskExit() return msg def bootstrap(self): ... try: self.run() except TaskExit: pass ... • Check for sentinel in recv() and raise exception • Can catch in bootstrap() behind the scenes
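The sentinel-based shutdown from the last two slides can be sketched end to end as follows. The queue is created in `__init__` here rather than in `bootstrap()` so that `send()` works before the thread is running. That choice, the `join()` helper, and the `CollectorTask` subclass are assumptions for illustration.

```python
import queue
import threading

class TaskExit(Exception):
    pass

class Task:
    def __init__(self):
        self._messages = queue.Queue()
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self.bootstrap)
        self._thread.daemon = True
        self._thread.start()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def send(self, msg):
        self._messages.put(msg)

    def stop(self):
        # The TaskExit class itself is the sentinel; it queues up behind
        # any messages that were sent earlier
        self._messages.put(TaskExit)

    def recv(self):
        msg = self._messages.get()
        if msg is TaskExit:
            raise TaskExit()
        return msg

    def bootstrap(self):
        try:
            self.run()
        except TaskExit:
            pass    # Normal termination, handled behind the scenes

    def run(self):
        raise NotImplementedError("subclasses define run()")

class CollectorTask(Task):
    """Example consumer: remembers every message it receives."""

    def __init__(self):
        super().__init__()
        self.received = []

    def run(self):
        while True:
            self.received.append(self.recv())
```

Because the sentinel goes through the same queue as ordinary messages, everything sent before `stop()` is still processed before the task exits.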

Slide 139

Slide 139 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.7 101

Slide 140

Slide 140 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Task Messaging 102 • Have built a messaging framework for tasks Task 1 Task 2 Task 3 Task 4 • It's extremely general purpose • Still some subtle details to work out

Slide 141

Slide 141 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Message Immutability 103 • For sanity, messages should be immutable! • Strings, Numbers, Tuples • This means... • No dictionaries or instances • Recall : "Nothing but sorrow" • Personal preference: Use tuples of immutables • If you can't do that, at least send copies

Slide 142

Slide 142 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Why Immutability? 104 • Passing data between threads involves a transfer of information (and involves memory) Task 1 msg send() Task 2 • Are you sending a reference or a value? Task 1 msg Task 2 • If reference, there is shared state (danger) refcnt=1 refcnt=2

Slide 143

Slide 143 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Message Tagging 105 • With messages, it is often useful to have a tag that identifies the kind of message task.send(('foo',msg)) task.send(('bar',msg)) • Allows the receiver to recognize different kinds of messages and process accordingly def run(self): ... tag, msg = self.recv() if tag == 'foo': # Process 'foo' messages ... elif tag == 'bar': # Process 'bar' messages

Slide 144

Slide 144 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Message Tagging 106 • Tagging can also be used to separate messages originating from multiple sources Task 1 Task 2 Task 3 ('foo', msg) ('bar', msg) • Example: A consumer that receives data from multiple input sources

Slide 145

Slide 145 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Queue Sizes • Queues can be created with a max size q = queue.Queue([maxsize]) # Bounded queue 107 • Can be used to prevent unbounded queue growth of consumers • Example: Limiting messages to slow or non- responsive consumers

Slide 146

Slide 146 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Non-blocking Queuing • Normally, put/get block a thread • Non-blocking get/put q.get(False) # Get item or queue.Empty exception q.put(item,False) # Put item or queue.Full exception 108 • Queuing with timeouts q.get(timeout=PERIOD) q.put(item, timeout=PERIOD)
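A small sketch of the non-blocking calls in practice. The helper names `try_put` and `try_get` are made up for this example; the `queue.Full` and `queue.Empty` exceptions are the ones the standard library raises.

```python
import queue

def try_put(q, item):
    """Queue item without blocking; report whether it was accepted."""
    try:
        q.put(item, False)
        return True
    except queue.Full:
        return False

def try_get(q, default=None):
    """Fetch an item without blocking; return default if the queue is empty."""
    try:
        return q.get(False)
    except queue.Empty:
        return default
```

With `q = queue.Queue(2)`, a third `try_put()` returns False instead of blocking. What to do with the rejected item (drop it, retry later, slow the producer down) is the flow-control decision the caller still has to make.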

Slide 147

Slide 147 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example : Receive Timeout 109 import queue class RecvTimeoutError(Exception): pass class Task: ... def recv(self,*,timeout=None): try: return self._messages.get(timeout=timeout) except queue.Empty: raise RecvTimeoutError() ... • Implementation • Receivers should have the option of not blocking forever (if they want)
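A self-contained version of the receive timeout, reduced to the queue-handling part of a task. `Mailbox` is a hypothetical stand-in for `Task`, and `RecvTimeoutError` is assumed to be a plain `Exception` subclass.

```python
import queue

class RecvTimeoutError(Exception):
    pass

class Mailbox:
    """Hypothetical stand-in for the message-handling part of Task."""

    def __init__(self):
        self._messages = queue.Queue()

    def send(self, msg):
        self._messages.put(msg)

    def recv(self, *, timeout=None):
        # timeout=None blocks forever; a number bounds the wait in seconds
        try:
            return self._messages.get(timeout=timeout)
        except queue.Empty:
            raise RecvTimeoutError() from None
```

Translating `queue.Empty` into a task-level exception keeps callers from depending on the fact that a `queue.Queue` is used internally.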

Slide 148

Slide 148 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Caution 110 • Queue limits, non-blocking, and timeouts are significantly more complicated to use in practice than you might imagine • Introduces problem of "flow-control" • Queue limits might result in deadlock • Non-blocking might result in discarded messages

Slide 149

Slide 149 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.8 111

Slide 150

Slide 150 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Publish/Subscribe 112 • Producers might use a subscription model Publisher Subscriber channel channel Subscriber Subscriber • Think chat, RSS, XMPP, logging, etc... • Publishers send message into a channel, subscribers receive the feed

Slide 151

Slide 151 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Publish/Subscribe • To implement, it is common to define an intermediary object for message handling 113 Publisher Subscriber Subscriber Subscriber Gateway • Gateway receives messages and deals with details of distribution, routing, subscriptions, etc. • Goal : Loose coupling

Slide 152

Slide 152 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example: Logging • You're already familiar with something that works exactly like this: the logging module 114 Handler Handler Handler Logger • Logger gets logging messages and publishes them to various subscribed handlers • Can use it as a rough design model log.info("Hi")

Slide 153

Slide 153 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example Gateway • Simplistic implementation of a Gateway 115 class Gateway: def __init__(self): self._channels = {} def subscribe(self,task,channel): self._channels.setdefault(channel,set()).add(task) def unsubscribe(self,task,channel): self._channels[channel].remove(task) def publish(self,msg,channel): for task in self._channels[channel]: task.send(msg) • Caution: It's missing some locking (added in the exercise)

Slide 154

Slide 154 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example Gateway • Internally, there are different channels 116 class Gateway: def __init__(self): self._channels = {} def subscribe(self,task,channel): self._channels.setdefault(channel,set()).add(task) def unsubscribe(self,task,channel): self._channels[channel].remove(task) def publish(self,msg,channel): for task in self._channels[channel]: task.send(msg)

Slide 155

Slide 155 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example Gateway • Subscribe/unsubscribe methods 117 class Gateway: def __init__(self): self._channels = {} def subscribe(self,task,channel): self._channels.setdefault(channel,set()).add(task) def unsubscribe(self,task,channel): self._channels[channel].remove(task) def publish(self,msg,channel): for task in self._channels[channel]: task.send(msg)

Slide 156

Slide 156 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Example Gateway • publish method 118 • Simply forwards messages to any tasks subscribed on a channel class Gateway: def __init__(self): self._channels = {} def subscribe(self,task,channel): self._channels.setdefault(channel,set()).add(task) def unsubscribe(self,task,channel): self._channels[channel].remove(task) def publish(self,msg,channel): for task in self._channels[channel]: task.send(msg)
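One way the missing locking could be added is sketched below. This is an assumption about a reasonable design, not the exercise's solution: a single `threading.Lock` protects the channel table, and `publish()` copies the subscriber set under the lock and sends outside it, so a slow subscriber cannot hold up `subscribe()` or `unsubscribe()`. It also tolerates publishing to a channel nobody has subscribed to, which the simplistic version does not. `Inbox` is a made-up stand-in for any task that has a `send()` method.

```python
import threading

class Gateway:
    def __init__(self):
        self._channels = {}
        self._lock = threading.Lock()

    def subscribe(self, task, channel):
        with self._lock:
            self._channels.setdefault(channel, set()).add(task)

    def unsubscribe(self, task, channel):
        with self._lock:
            subscribers = self._channels.get(channel)
            if subscribers is not None:
                subscribers.discard(task)

    def publish(self, msg, channel):
        # Snapshot the subscribers under the lock, deliver outside of it
        with self._lock:
            targets = list(self._channels.get(channel, ()))
        for task in targets:
            task.send(msg)

class Inbox:
    """Hypothetical subscriber: anything with a send() method will do."""

    def __init__(self):
        self.messages = []

    def send(self, msg):
        self.messages.append(msg)
```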

Slide 157

Slide 157 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Gateway Use • There is at least one gateway in the system • Used by all of the threads for publishing 119 Gateway Subscriber Subscriber Subscriber Publisher Publisher • Key: Multiple publishers and subscribers

Slide 158

Slide 158 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Subscriber Disconnect 120 class Task: ... def run(self): gateway.subscribe(self,"channel") try: ... finally: gateway.unsubscribe(self,"channel") • Problem : Tasks come and go. • Must be careful to manage subscriptions • Example: unsubscribe on exception/return

Slide 159

Slide 159 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Decoupling 121 _gateways = {} def get_gateway(name): if name not in _gateways: _gateways[name] = Gateway() return _gateways[name] • Publishers should try to remain relatively decoupled from the gateways (i.e., avoid passing direct references around). • One approach: Emulate the logging interface • Use: gateway = get_gateway("mygateway") gateway.publish(msg,"channel")

Slide 160

Slide 160 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.9 122

Slide 161

Slide 161 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Food for Thought 123 • pub/sub is a very flexible approach • Promotes loose coupling of tasks • Reliability features (task restart, redundancy, etc.) • Scalability to multiple machines (later) • Can also be used to implement system-level features such as task monitoring, events, logging, debugging, etc.

Slide 162

Slide 162 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Monitoring/Debugging • pub/sub allows diagnostic tools to be attached 124 Gateway Monitoring Consumer Publisher • Idea: Optional components can listen in on the communication and report back

Slide 163

Slide 163 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Events/Notification • Can have dedicated gateways for events 125 Task • Special tasks for handling abnormal situations, crashes, etc. Task Task Event Gateway crash ☠ Crash Manager

Slide 164

Slide 164 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.10 126

Slide 165

Slide 165 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Tasks • Sometimes concurrent tasks/threads are used to perform background work on behalf of other code 127 master worker task Request Response/result • Scenario : Master hands work over to a separate task and continues with other processing. Gets the result at some later time

Slide 166

Slide 166 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Tasks • Very tricky: Instead of just sending data, you have an asynchronous request/response cycle 128 master worker Request Response/result ??? • Issue : How does the result come back?

Slide 167

Slide 167 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- The Problem • This is nothing like a normal function call • The work is finished at some undetermined time in the future • The master doesn't know when the result will arrive--and it may want to do other things in the meantime • Comment: This problem also comes up in other settings (distributed computing, etc.) 129

Slide 168

Slide 168 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Returning Results • Define an object that represents a future result class UnavailableError(Exception): pass class FutureResult: def set(self,value): self._value = value def get(self): if hasattr(self,"_value"): return self._value else: raise UnavailableError("No result") 130 • The idea here: Return the result if it's been set, otherwise raise an exception

Slide 169

Slide 169 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Result Objects • Sample use (interactive mode) >>> r = FutureResult() >>> r.get() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "result.py", line 9, in get raise UnavailableError("No result") __main__.UnavailableError: No result >>> r.set(42) >>> r.get() 42 >>> 131 • Now, let's see how you might use it

Slide 170

Slide 170 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Task 132 • FutureResult object is created by worker and given back to the requestor • Worker sets the result when work finished class WorkerTask(Task): def request(self,msg): fresult = FutureResult() # Create FutureResult self.send((fresult,msg)) # Send along with msg return fresult # Return result object def run(self): while True: # Get a message fresult,msg = self.recv() # Work on msg ... # Set the result fresult.set(response) # Set the response

Slide 171

Slide 171 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Requesting Work • Example of making a request: 133 fresult = worker.request(msg) # Make request to worker ... ... do other things while worker works ... # Get the result (at a later time) r = fresult.get() • Keep in mind: the worker is operating concurrently in the background • When it finishes (at unknown time), it will store data in the returned result object

Slide 172

Slide 172 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Returning Exceptions • What if a worker wants to communicate an error or exception? • This is tricky---you're basically passing an exception between tasks • It's not "normal" exception handling 134

Slide 173

Slide 173 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Results w/ Exceptions • Modified Result object class UnavailableError(Exception): pass class FutureResult: def set(self,value): self._value = value def set_error(self): self._exc = sys.exc_info() def get(self): if hasattr(self,"_exc"): raise self._exc[1].with_traceback(self._exc[2]) elif hasattr(self,"_value"): return self._value else: raise UnavailableError("No result") 135 Reraise the exception that occurred in the worker Save current exception information
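The result object above can be exercised without any threads at all, which makes the control flow easier to follow. In this sketch, `run_job()` is a hypothetical helper that plays the worker's role: it calls a function and stores either the return value or the current exception.

```python
import sys

class UnavailableError(Exception):
    pass

class FutureResult:
    def set(self, value):
        self._value = value

    def set_error(self):
        # Must be called from inside an except block
        self._exc = sys.exc_info()

    def get(self):
        if hasattr(self, "_exc"):
            # Re-raise the worker's exception with its original traceback
            raise self._exc[1].with_traceback(self._exc[2])
        elif hasattr(self, "_value"):
            return self._value
        else:
            raise UnavailableError("No result")

def run_job(func, *args):
    """Hypothetical stand-in for a worker: run func and capture the outcome."""
    fresult = FutureResult()
    try:
        fresult.set(func(*args))
    except Exception:
        fresult.set_error()
    return fresult
```

`run_job(int, "42").get()` returns 42, while `run_job(int, "oops").get()` raises the `ValueError` at the point where the requestor asks for the result, not where the work was done.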

Slide 174

Slide 174 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Setting Exceptions • An example of setting an exception 136 class WorkerTask(Task): def run(self): ... try: ... some work ... fresult.set(value) except Exception: fresult.set_error() fresult = worker.request(msg) ... ... r = fresult.get() Exception actually gets raised here (when someone is interested in the outcome) Save the current exception • Admittedly, it's a little mind-bending

Slide 175

Slide 175 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.11 137

Slide 176

Slide 176 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- A Coordination Problem • How does the master thread know when the result has been made available? 138 master worker Request Response/result • Does it have to constantly poll? • Does it just wait for awhile? • This is a timing/synchronization issue

Slide 177

Slide 177 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Event Waiting • Using an event to signal "completion" def master(): ... item = create_item() evt = Event() worker.send((item,evt)) ... # Other processing ... ... ... ... ... # Wait for worker evt.wait() 139 Worker Thread item, evt = get_work() processing processing ... ... # Done evt.set()

Slide 178

Slide 178 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Results with Waiting • Using events in our result object from threading import Event class FutureResult: def __init__(self): self._evt = Event() def set(self,value): self._value = value self._evt.set() def get(self): self._evt.wait() return self._value 140 • Idea : get() will simply use the event to wait for the result to become available
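A slightly fuller sketch of the event-based result, combining it with the error handling shown earlier. The `timeout` argument on `get()` and passing the exception object directly to `set_error()` are assumptions made for this example. The standard library's `concurrent.futures.Future` offers a comparable interface.

```python
import threading

class FutureResult:
    def __init__(self):
        self._evt = threading.Event()
        self._value = None
        self._exc = None

    def set(self, value):
        self._value = value
        self._evt.set()          # Wake up anyone blocked in get()

    def set_error(self, exc):
        self._exc = exc
        self._evt.set()

    def get(self, timeout=None):
        # Event.wait() returns False if the timeout expired first
        if not self._evt.wait(timeout):
            raise TimeoutError("result not ready")
        if self._exc is not None:
            raise self._exc
        return self._value
```

The value is stored before the event is set, so a thread that wakes up in `get()` always sees the finished result.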

Slide 179

Slide 179 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.12 141

Slide 180

Slide 180 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Advanced Features • FutureResult can be expanded to support a variety of other very useful features • Cancellation • Progress/status • Completion callbacks 142

Slide 181

Slide 181 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Work Cancellation • Allow requestor to cancel 143 class FutureResult: def __init__(self): self._cancel = False def cancel(self): self._cancel = True • Example use in worker def run(self): while True: fresult, msg = self.recv() if fresult._cancel: continue ...

Slide 182

Slide 182 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Work Progress • Results can include progress information 144 class FutureResult: def __init__(self): self.progress = 0 ... • Example use in worker (requestor can monitor) def run(self): while True: fresult, msg = self.recv() # Work on the result ... fresult.progress += n # Update progress ... # Done fresult.set(response) • Alternate: publish progress on a channel

Slide 183

Slide 183 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Completion Callbacks • A function that fires when result is ready 145 class FutureResult: def __init__(self): self._callback = None def set_callback(self,cb): self._callback = cb def set(self,result): self._value = result if self._callback: # Invoke callback (if set) self._callback(result) • Example use: def when_done(result): print(result) fresult = worker.request(msg) fresult.set_callback(when_done)
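In the version above, a callback registered after the worker has already called `set()` never fires. The sketch below shows one possible way to close that gap, assuming it is acceptable for a late callback to run immediately in the registering thread. The lock and the `_done` flag are additions for this example.

```python
import threading

class FutureResult:
    def __init__(self):
        self._lock = threading.Lock()
        self._done = False
        self._value = None
        self._callback = None

    def set_callback(self, cb):
        with self._lock:
            if not self._done:
                self._callback = cb      # Worker will invoke it later
                return
            value = self._value
        cb(value)                        # Result already there: fire now

    def set(self, result):
        with self._lock:
            self._value = result
            self._done = True
            cb = self._callback
        if cb is not None:
            cb(result)                   # Runs in the worker's thread
```

The callback is always invoked outside the lock, so it can safely call back into the result object.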

Slide 184

Slide 184 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.13 146

Slide 185

Slide 185 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Pools • It is common for work to be farmed out to a pool of worker tasks 147 master worker pool Request Response/result worker worker worker worker • Example : A pool of a dozen worker threads handles incoming work

Slide 186

Slide 186 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Pool Implementation • Many possibilities • Most common : launch multiple threads that all receive messages from the same message queue 148

Slide 187

Slide 187 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Pool • Sketch of implementation 149 class WorkerPool(Task): def __init__(self,nworkers=1): ... self.nworkers = nworkers def run(self): for n in range(1,self.nworkers): thr = threading.Thread(target=self.do_work) thr.daemon = True thr.start() self.do_work() def do_work(self): while True: fresult,msg = self.recv() ... do work ... fresult.set(value) • Tricky bits : Shutdown (see exercise) Launch of multiple worker threads

Slide 188

Slide 188 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Worker Pool API • In practice, worker pools tend to have an API that is similar to this: 150 class WorkerPool(Task): ... def apply(self,func,args=(),kwargs={}): # Runs func(*args,**kwargs) in a worker thread ... def map(self,func,sequence): # Runs [func(s) for s in sequence] in worker ... • apply() - Runs a callable in a worker • map() - Applies a function to a sequence using multiple workers. (related to map-reduce)
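A compact sketch of a pool with this API, written standalone instead of on top of `Task` to keep it short. Several choices are assumptions and not taken from the exercise: `apply()` returns a `FutureResult`, `map()` blocks until all results are in, and `shutdown()` uses one `None` sentinel per worker thread.

```python
import queue
import threading

class FutureResult:
    def __init__(self):
        self._evt = threading.Event()
        self._value = None
        self._exc = None

    def set(self, value):
        self._value = value
        self._evt.set()

    def set_error(self, exc):
        self._exc = exc
        self._evt.set()

    def get(self):
        self._evt.wait()
        if self._exc is not None:
            raise self._exc
        return self._value

class WorkerPool:
    def __init__(self, nworkers=1):
        self._requests = queue.Queue()
        self._threads = []
        for _ in range(nworkers):
            thr = threading.Thread(target=self._do_work)
            thr.daemon = True
            thr.start()
            self._threads.append(thr)

    def _do_work(self):
        # All workers pull requests from the same queue
        while True:
            item = self._requests.get()
            if item is None:             # Shutdown sentinel
                break
            fresult, func, args, kwargs = item
            try:
                fresult.set(func(*args, **kwargs))
            except Exception as exc:
                fresult.set_error(exc)

    def apply(self, func, args=(), kwargs=None):
        # Queue func(*args, **kwargs) for a worker; return the future result
        fresult = FutureResult()
        self._requests.put((fresult, func, args, kwargs or {}))
        return fresult

    def map(self, func, sequence):
        # Submit everything first, then collect results in order
        pending = [self.apply(func, (s,)) for s in sequence]
        return [r.get() for r in pending]

    def shutdown(self):
        for _ in self._threads:
            self._requests.put(None)     # One sentinel per worker
        for thr in self._threads:
            thr.join()
```

Because the sentinels are queued behind pending requests, `shutdown()` lets outstanding work finish before the threads exit.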

Slide 189

Slide 189 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.14 151

Slide 190

Slide 190 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Interlude • We have done a lot of work putting it together • Task objects • Different communication patterns • Task diagnostics, debugging, logging, etc. • It sets the stage for other topics 152

Slide 191

Slide 191 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Thread Implementation • Let's conclude by peeling back the covers • Some details of how Python operates • Some limitations 153

Slide 192

Slide 192 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- What is a Thread? • Python threads are real system threads • POSIX threads (pthreads) • Windows threads • Fully managed by the host operating system • All scheduling/thread switching • Represent threaded execution of the Python interpreter process (written in C) 154

Slide 193

Slide 193 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Behind the Scenes • There's not much going on... • Here's what happens • Python creates a small data structure containing some interpreter state • A new thread (pthread) is launched • The thread calls PyEval_CallObject • Last step is just a C function call that runs whatever Python callable was specified 155

Slide 194

Slide 194 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Thread-Specific State • Each thread has its own interpreter specific data structure (PyThreadState) • Current stack frame (for Python code) • Current recursion depth • Thread ID • Some per-thread exception information • Optional tracing/profiling/debugging hooks • It's a small C structure (~84 bytes) 156

Slide 195

Slide 195 text

PyThreadState Structure

    typedef struct _ts {
        struct _ts *next;
        PyInterpreterState *interp;
        struct _frame *frame;
        int recursion_depth;
        int tracing;
        int use_tracing;
        Py_tracefunc c_profilefunc;
        Py_tracefunc c_tracefunc;
        PyObject *c_profileobj;
        PyObject *c_traceobj;
        PyObject *curexc_type;
        PyObject *curexc_value;
        PyObject *curexc_traceback;
        PyObject *exc_type;
        PyObject *exc_value;
        PyObject *exc_traceback;
        PyObject *dict;
        int tick_counter;
        int gilstate_counter;
        PyObject *async_exc;
        long thread_id;
    } PyThreadState;

Slide 196

Slide 196 text

Threads and sys module

• Certain functions in the sys module are tied to thread-specific state (exceptions, diagnostics)
• Example: sys.exc_info()

    try:
        statements
    except Exception:
        # Get per-thread exception information
        etype, value, tb = sys.exc_info()

• Some other thread-specific functions

    sys.exc_clear()      # Python 2 only
    sys.settrace()
    sys.setprofile()
    ...
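A small sketch of what "per-thread" means for sys.exc_info(): two threads are made to sit inside their exception handlers at the same time, and each still sees only its own exception. The names `worker`, `seen`, and `both_failed` are made up for this illustration.

```python
import sys
import threading

seen = {}
both_failed = threading.Barrier(2)

def worker(name, exc_type):
    try:
        raise exc_type(name)
    except Exception:
        # Wait until the other thread is also inside its own handler
        both_failed.wait(timeout=10)
        etype, value, tb = sys.exc_info()
        seen[name] = (etype, str(value))

threads = [
    threading.Thread(target=worker, args=("t1", ValueError)),
    threading.Thread(target=worker, args=("t2", RuntimeError)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)
```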

Slide 197

Slide 197 text

The Infamous GIL

• But there's a catch...
• Only one Python thread can execute in the interpreter process at once
• There is a "global interpreter lock" that carefully controls thread execution
• The GIL ensures that each thread gets exclusive access to the entire interpreter internals when it's running

Slide 198

Slide 198 text

GIL Behavior

• Whenever a thread runs, it holds the GIL
• However, the GIL is released on I/O

[Diagram: a thread alternates between "run" and "I/O"; it releases the GIL before each I/O operation and acquires it again afterwards]

• So, any time a thread is forced to wait, other "ready" threads get their chance to run
• Basically a kind of "cooperative" multitasking

Slide 199

Slide 199 text

Thread Switching

• If threads hammer the CPU and don't do I/O, they will periodically switch back and forth
• Python 3.1 and earlier: threads have the option of switching every 100 "instructions" (ticks)
• Python 3.2 and newer: threads switch every 5ms
• Switch period can be tuned (if needed)

    sys.setcheckinterval(nticks)     # Python 3.1 and earlier
    sys.setswitchinterval(secs)      # Python 3.2 and newer
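A minimal sketch of tuning the switch period on Python 3.2 or newer. The 1 ms value is an arbitrary choice for illustration, not a recommendation, and the original setting is restored afterwards.

```python
import sys

original = sys.getswitchinterval()      # seconds; 0.005 unless already changed
sys.setswitchinterval(0.001)            # request a switch opportunity every 1 ms
shortened = sys.getswitchinterval()
sys.setswitchinterval(original)         # restore the previous setting
print(original, shortened)
```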

Slide 200

Slide 200 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Why is the GIL there? • Simplifies the implementation of the Python interpreter (okay, sort of a lame excuse) • Better suited for reference counting (Python's memory management scheme) • Simplifies the use of C/C++ extensions. Extension functions do not need to worry about thread synchronization • Is it ever going away? Probably not soon. 162

Slide 201

Slide 201 text

The GIL Explained

• In June, 2009, I gave a talk about the GIL
    http://www.dabeaz.com/python/GIL.pdf
• This led to a PyCON'2010 presentation
    http://www.dabeaz.com/GIL
• I won't repeat it all here, but here's the gist: Python threads should not be used for CPU-bound processing (e.g., crunching data)

Slide 202

Slide 202 text

CPU Bound Processing

• For heavy CPU processing, Python is limited to a single CPU core
• Thus, threads cannot be used to provide any kind of parallel processing
• No performance gain at all
• In older versions of Python (< 3.2), threads might make the performance far worse
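One way to check the claim on your own machine is to time the same CPU-bound loop run twice in a row and then in two threads. This is a rough sketch, assuming a standard GIL-enabled CPython build; the function names and the loop count are arbitrary, and the actual numbers depend on the interpreter version and hardware.

```python
import threading
import time

def countdown(n):
    # Pure-Python CPU-bound loop; no I/O, so the GIL is only given up on forced switches
    while n > 0:
        n -= 1

def time_sequential(n):
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start

def time_threaded(n):
    t1 = threading.Thread(target=countdown, args=(n,))
    t2 = threading.Thread(target=countdown, args=(n,))
    start = time.perf_counter()
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    n = 2_000_000
    print("sequential:", time_sequential(n))
    print("threaded:  ", time_threaded(n))
```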

Slide 203

Slide 203 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- GIL Badness • The worst part of the GIL is not the fact that it limits the use of multiple cores • There are other ways to utilize multicore (later) • GIL actually causes all sorts of bizarre problems with timing and scheduling of threads • Examples: Response time, I/O throughput 165

Slide 204

Slide 204 text

Python Code Execution

• Code is compiled to interpreter instructions

    def countdown(n):
        while n > 0:
            print n
            n -= 1

• Instructions in the Python VM

    >>> import dis
    >>> dis.dis(countdown)
          0 SETUP_LOOP          33 (to 36)
          3 LOAD_FAST            0 (n)
          6 LOAD_CONST           1 (0)
          9 COMPARE_OP           4 (>)
         12 JUMP_IF_FALSE       19 (to 34)
         15 POP_TOP
         16 LOAD_FAST            0 (n)
         19 PRINT_ITEM
         20 PRINT_NEWLINE
         21 LOAD_FAST            0 (n)
         24 LOAD_CONST           2 (1)
         27 INPLACE_SUBTRACT
         28 STORE_FAST           0 (n)
         31 JUMP_ABSOLUTE        3
         ...

Slide 205

Slide 205 text

Instruction Execution

• Interpreter instructions are non-interruptible
• Long operations can block everything

    >>> nums = range(1000000000)
    >>> sum(nums)          # 1 instruction (~42 seconds)
    499999999500000000
    >>>

• Try hitting Ctrl-C (uninterruptible)

    >>> nums = range(1000000000)
    >>> sum(nums)
    ^C^C^C           (nothing happens, very long pause)
    ...
    KeyboardInterrupt
    >>>

Slide 206

Slide 206 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Why You Care • Long running instructions block progress • Would manifest itself as an annoying "pause" in a GUI, game, or network application • Example : A request is sent to a server, but it doesn't respond for 10 seconds 168

Slide 207

Slide 207 text

Long Running Instructions

• Illustration

[Diagram: Thread 1 runs a long instruction while holding the GIL. An event arrives for the sleeping Thread 2, which stays STALLED until the instruction is done and the GIL is released; only then does Thread 2 run and send its response]

• No way for a long instruction to be preempted
• All other threads stall, waiting for completion

Slide 208

Slide 208 text

I/O Throughput Issues

• Consider this code fragment:

    def receive_data(s):
        msg = bytearray()
        while True:
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

• It receives and assembles a message on a socket
• Imagine it's part of some web/messaging code
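To try the fragment above without writing a separate client, it can be driven over a local socket pair. This is a sketch under assumptions: `send_and_receive` is a hypothetical helper, and `socket.socketpair()` inside one process stands in for the two interpreters used in the workshop's test.

```python
import socket
import threading

def receive_data(s):
    msg = bytearray()
    while True:
        data = s.recv(1024)
        if not data:          # empty read: the sender closed its end
            break
        msg.extend(data)
    return msg

def send_and_receive(payload):
    a, b = socket.socketpair()

    def sender():
        a.sendall(payload)
        a.close()             # closing signals end-of-message to receive_data()

    t = threading.Thread(target=sender)
    t.start()
    try:
        return bytes(receive_data(b))
    finally:
        t.join()
        b.close()

payload = b"x" * (1024 * 1024)    # 1MB, as in the test described in the slides
print(len(send_and_receive(payload)))
```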

Slide 209

Slide 209 text

I/O Throughput Issues

• A test: Send 1MB of data between interpreters

[Diagram: one python interpreter sends 1MB to another python interpreter]

• Time to receive: ~0.009s (116Mbytes/sec)
• Python 3.2, quad-core MacPro

Slide 210

Slide 210 text

I/O Throughput Issues

• Now, introduce a CPU bound thread

  Thread 1:

    def receive_data(s):
        msg = bytearray()
        while True:
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

  Thread 2:

    def spin():
        while True:
            pass

• Thread 2 is just doing some "work"

Slide 211

Slide 211 text

I/O Throughput Issues

• A test: Sending 1MB of data between interpreters

[Diagram: one python interpreter sends 1MB to a python interpreter running 2 threads]

• Time to receive: ~2.23s (470Kbytes/sec)
• Almost 250x slower! Yikes!

Slide 212

Slide 212 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.15 174

Slide 213

Slide 213 text

Copyright (C) 2010, David Beazley, http://www.dabeaz.com 2- Thread Switching 175 • GIL performance problems are related to the mechanism used to switch threads • In particular, the preemption mechanism and lack of thread priorities • Must illustrate

Slide 214

Slide 214 text

Copyright (C) 2010, David Beazley, http://www.dabeaz.com 2- Thread Switching 176 Thread 1 running • Suppose there is just one thread • It runs forever • Never releases the GIL • Life is great!

Slide 215

Slide 215 text

Copyright (C) 2010, David Beazley, http://www.dabeaz.com 2- Thread Switching 177 Thread 1 Thread 2 SUSPENDED running • Now, a second thread makes an appearance... • It is suspended because it doesn't have the GIL • Somehow, it has to get it from Thread 1

Slide 216

Slide 216 text

Thread Switching

[Diagram: Thread 1 running; Thread 2 SUSPENDED in wait(gil, TIMEOUT)]

• Second thread does a timed wait on GIL
• The idea: Thread 2 waits to see if the GIL gets released voluntarily by Thread 1 (e.g., if Thread 1 performs I/O or goes to sleep)
• By default TIMEOUT is 5 milliseconds, but it can be changed

Slide 217

Slide 217 text

Thread Switching

[Diagram: Thread 1 releases the GIL when it enters an I/O wait and becomes SUSPENDED; Thread 2, which was in wait(gil, TIMEOUT), starts running]

• A thread might give up the GIL voluntarily
• This is straightforward--the GIL is just handed to the waiting thread (all is well)

Slide 218

Slide 218 text

Thread Switching

[Diagram: Thread 1 keeps running; Thread 2's wait(gil, TIMEOUT) hits TIMEOUT, it sets drop_request, and it calls wait(gil, TIMEOUT) again]

• What if a thread doesn't suspend? (CPU bound)
• After timeout, the waiting thread initiates a "drop request" and repeats the wait operation

Slide 219

Slide 219 text

Thread Switching

[Diagram: after the drop_request, Thread 1 releases the GIL; Thread 2, in its second wait(gil, TIMEOUT), starts running]

• Thread 1 suspends when drop request received
• GIL released after the current instruction completes (recall, it might take a while)

Slide 220

Slide 220 text

Thread Switching

[Diagram: after the drop_request, Thread 1 releases the GIL and blocks in wait(ack); Thread 2 starts running and sends the ack, after which Thread 1 is SUSPENDED]

• On a forced release, Thread 1 waits for an ack
• Signal indicates that the other thread successfully got the GIL and is now running

Slide 221

Slide 221 text

Thread Switching

[Diagram: same sequence as before; once Thread 1 receives the ack it becomes SUSPENDED and starts its own wait(gil, TIMEOUT) while Thread 2 runs]

• The process now repeats itself for Thread 1
• This sequence happens over and over again as CPU-bound threads execute

Slide 222

Slide 222 text

Copyright (C) 2010, David Beazley, http://www.dabeaz.com 2- Problem : Response Time 184 • If a thread wants the GIL, it might wait 5ms • There is no way for a "high priority" thread to grab the GIL away from a CPU-bound thread • Example : Critical event received on network • You can't guarantee an immediate response

Slide 223

Slide 223 text

Response Time

• Illustration

[Diagram: Thread 2 is in IOWAIT while Thread 1 runs. Data arrives and Thread 2 becomes READY, but it goes through wait(gil, TIMEOUT), TIMEOUT, drop_request, and another wait(gil, TIMEOUT) before Thread 1 releases the GIL and Thread 2 runs]

• To handle I/O, thread 2 must go through the entire timeout sequence to get control

Slide 224

Slide 224 text

Problem: Thread Selection

[Diagram: Thread 1 running, Threads 2 and 3 SUSPENDED. Thread 2 hits TIMEOUT and issues the drop_request, but when Thread 1 releases the GIL, Thread 3 starts running and Thread 2 stays SUSPENDED]

• Thread that first wants the GIL might not get it
• Not only can you not guarantee response time, you can't guarantee scheduling

Slide 225

Slide 225 text

Problem: GIL Release

• CPU-bound threads degrade I/O

[Diagram: each time data arrives, Thread 2 becomes READY but waits 5ms while Thread 1 is running; Thread 2 then runs briefly and releases the GIL on its next I/O call, and the 5ms delay repeats at every arrival]

• Each I/O call (e.g., recv/send) drops the GIL and restarts the CPU-bound Thread 1
• Each time Thread 1 runs, need 5ms to preempt

Slide 226

Slide 226 text

I/O Experiment Explained

• Recall our original code

  Thread 1:

    def receive_data(s):
        msg = bytearray()
        while True:
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

  Thread 2:

    def spin():
        while True:
            pass

• It required ~2.23s to receive 1MB (vs. 0.009s)
• ~1024 recv() operations (each releases the GIL)
• Thread 2 gets scheduled more than it ought to

Slide 227

Slide 227 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.16 189

Slide 228

Slide 228 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- The GIL and C Code • C/C++ extensions can release the interpreter lock and run independently • Caveat : Once released, C code shouldn't do any processing related to the Python interpreter or Python objects • You might be able to use this to take advantage of multiple cores 190

Slide 229

Slide 229 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- The GIL and C Extensions • Having C extensions release the GIL is how you get into true "parallel computing" 191 Python instructions C code GIL release GIL acquire C threads Python instructions Thread 1 Thread 2 Python instructions GIL acquire GIL release • Python and C can run in parallel

Slide 230

Slide 230 text

The GIL and C Extensions

• Having C extensions release the GIL is how you get into true "parallel computing"

[Diagram: Thread 1 executes Python instructions, releases the GIL, runs C code (possibly with C threads), then reacquires the GIL and resumes Python instructions; Thread 2 acquires the GIL and executes Python instructions in the meantime]

• Python and C can run in parallel
• Key part: This execution (the C code) must not involve the Python interpreter

Slide 231

Slide 231 text

How to Release the GIL

• C extensions use special macros

    PyObject *pyfunc(PyObject *self, PyObject *args) {
        ...
        Py_BEGIN_ALLOW_THREADS
        // Threaded C code
        ...
        Py_END_ALLOW_THREADS
        ...
    }

• Certain extensions such as ctypes also release the GIL automatically

Slide 232

Slide 232 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- The GIL and C Extensions • The trick with C extensions is that you have to make sure they do enough work • You won't get any benefit if the C code only runs a few simple calculations • You need to do a lot of calculation (e.g., thousands of floating point ops). 194

Slide 233

Slide 233 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Using Processes • Another GIL workaround is to delegate CPU-intensive work to a subprocess • Example : Send work to a separate Python interpreter over a pipe or socket • Have that interpreter operate independently and send results back when done • We're going to look at this later... 195

Slide 234

Slide 234 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Exercise thread.17 196

Slide 235

Slide 235 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Cost of Threads • Threads are often considered expensive • Somewhat true, but the actual overhead tends to be overblown (especially in blog posts and rants about threads) 197

Slide 236

Slide 236 text

Under the Covers

• Operating systems know how to deal with threads
• For I/O waiting, it's highly optimized

[Diagram: inside the OS kernel, sockets s1 and s2 each have a wait queue holding threads (thr1, thr2, thr3) blocked in recv(); when data is received, a waiting thread moves to the ready queue and from there to running]

• Basically a scheduler and a bunch of queues
• Waiting/waking are just queue operations

Slide 237

Slide 237 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Context Switching 199 • Each time the system switches threads, it performs a "context-switch." • Saves CPU state (registers, etc.) • Might flush CPU cache, TLB, etc. • Actual behavior depends on system • However: excessive context switching is bad • Biggest killer of thread performance? (maybe)

Slide 238

Slide 238 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Impact on Messaging 200 • Context-switching overhead is a practical concern for threading based on messaging • Does every message sent involve a context-switch from the sender to the receiver for processing? • Or do messages get queued up for awhile • What happens during rapid messaging? • Requires detailed study and performance analysis

Slide 239

Slide 239 text

Rapid Messaging

• Tasks that send messages at a high frequency should be modified to group messages

[Diagram: Producer sending many individual messages to Consumer, versus Producer sending a few grouped messages to Consumer]

• Results in fewer actual send() calls, less locking, less context switching, etc.
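A minimal sketch of message grouping between a producer and a consumer thread. The function names, the `None` sentinel, and the batch size of 64 are assumptions made for this illustration; `queue.Queue` stands in for whatever messaging layer the tasks actually use.

```python
import queue
import threading

def producer(q, items, batch_size):
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            q.put(batch)          # one put() per group, not one per item
            batch = []
    if batch:
        q.put(batch)              # flush the final partial group
    q.put(None)                   # sentinel: no more messages

def consumer(q, received, group_sizes):
    while True:
        batch = q.get()
        if batch is None:
            break
        group_sizes.append(len(batch))
        received.extend(batch)

def run(nitems, batch_size):
    q = queue.Queue()
    received, group_sizes = [], []
    c = threading.Thread(target=consumer, args=(q, received, group_sizes))
    c.start()
    producer(q, range(nitems), batch_size)
    c.join()
    return received, group_sizes

received, group_sizes = run(1000, 64)
print(len(received), "items delivered in", len(group_sizes), "messages")
```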

Slide 240

Slide 240 text

I/O Considerations

• Every I/O operation also involves a potential context switch because the GIL is released
• For best performance: Threads should perform a small number of large I/O operations instead of a large number of small I/O operations (buffering)
• Example: It's better to write a single 1MB message to a socket than 1024 1KB messages
• Make the operating system work for you...

Slide 241

Slide 241 text

I/O Performance

• Don't send a bunch of small fragments

    while expression:
        ...
        s.sendall(fragment)
        ...

• Use buffering and a single send

    msgbuf = bytearray()
    while expression:
        ...
        msgbuf.extend(fragment)
        ...
    s.sendall(msgbuf)

• Will greatly reduce system overhead, amount of thread switching, locking, etc.
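The two styles above can be compared side by side on a local socket pair. This is only a sketch: `send_fragments` and `_drain` are hypothetical helpers, and the example counts `sendall()` calls rather than measuring time, since timings vary by system.

```python
import socket
import threading

def _drain(sock, chunks):
    # Reader thread: collect everything until the sender closes its end
    while True:
        data = sock.recv(65536)
        if not data:
            break
        chunks.append(data)

def send_fragments(fragments, buffered):
    a, b = socket.socketpair()
    chunks = []
    reader = threading.Thread(target=_drain, args=(b, chunks))
    reader.start()
    calls = 0
    if buffered:
        msgbuf = bytearray()
        for fragment in fragments:
            msgbuf.extend(fragment)
        a.sendall(msgbuf)             # a single large send
        calls = 1
    else:
        for fragment in fragments:
            a.sendall(fragment)       # one small send per fragment
            calls += 1
    a.close()
    reader.join()
    b.close()
    return b"".join(chunks), calls

fragments = [b"ab"] * 500
print(send_fragments(fragments, buffered=False)[1])
print(send_fragments(fragments, buffered=True)[1])
```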

Slide 242

Slide 242 text

Thread Memory Use

• Each thread gets its own C stack

    t1 = Thread()
    t1.start()
    t2 = Thread()
    t2.start()
    t3 = Thread()
    t3.start()

[Diagram: virtual memory layout showing the heap, the main thread's stack, and a separate 8MB stack for each of threads 1, 2, and 3]

• The size is determined by the system (Python uses the default settings)
• Creating many threads may quickly eat up the VM address space

Slide 243

Slide 243 text

Thread Stack Space

• Thread stack size can be adjusted

    import threading
    threading.stack_size(65536)
    t1 = threading.Thread(...)
    t2 = threading.Thread(...)

• Must be a 4K multiple, minimum is 32K
• 32K is adequate for a lot of Python code
• Each function call < 512 bytes of stack
• Might need more for C extensions (if too small, may get a violent crash - SegFault)
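A cautious sketch of adjusting the stack size for newly created threads and then restoring the previous setting. The helper name `run_with_stack_size` and the 256K default are assumptions for illustration; some platforms reject particular sizes or do not allow changes at all, so both cases are handled.

```python
import threading

def run_with_stack_size(size=256 * 1024):
    try:
        previous = threading.stack_size(size)   # returns the old setting (0 = system default)
    except (ValueError, RuntimeError):
        return None                             # size rejected or not supported here
    try:
        result = []
        # Threads created from now on use the new stack size
        t = threading.Thread(target=lambda: result.append(threading.stack_size()))
        t.start()
        t.join()
        return result[0]
    finally:
        threading.stack_size(previous)          # restore the earlier setting

print(run_with_stack_size())
```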

Slide 244

Slide 244 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Performance Thoughts 206 "The best performance optimization is the transition from the non-working to the working state." -- John Ousterhout • If you are going to program with threads, correctness and reliability must have higher priority than all other concerns • You can optimize it later • There are other ways to scale things up

Slide 245

Slide 245 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Final Comments 207 • Threads are very useful for certain kinds of problems involving I/O, waiting, etc. • In some cases, they're the best choice • In other cases, they're terrible • And sometimes, they're the only way

Slide 246

Slide 246 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Final Comments 208 • We've covered a lot of ground • Threads/Tasks • Various design patterns • Messaging idioms • Performance considerations • All of this forms the foundation for other topics (processes, distributed computing, etc.)

Slide 247

Slide 247 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 2- Final Final Comments 209 • If you're going to use threads, keep them hidden in the background • Don't make end-users mess around with them • Observe: In our task library, end-user task classes don't do anything with threads (they receive messages, but details are hidden) • Again: Encapsulation is key

Slide 248

Slide 248 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Multiprocessing 1 Section 3

Slide 249

Slide 249 text

Introduction

• As you know, Python has a global interpreter lock (GIL) that limits thread performance
• It means that you can only utilize a single CPU within any given program
• In this section, we look at a workaround--carrying out work in a subprocess

Slide 250

Slide 250 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Concept : Process • Each running program is a "process" • Executes independently • Has own memory • Has own resources (files, sockets, etc.) • Can be scheduled on different CPUs by OS • Each instance of the Python interpreter that runs on your system is a process 3

Slide 251

Slide 251 text

Cooperating Processes

• For CPU-intensive work, a common strategy is to use cooperating processes

[Diagram: a python (master) process exchanging data with two python (worker) processes running on CPU 1 and CPU 2]

• Multiple copies of the python interpreter that run on different CPUs and exchange data
• Not networking (all on the same machine)

Slide 252

Slide 252 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Commentary • Our thread-based programs were strongly tied to this kind of design (messaging, queues, etc.) • That was intentional • Threads have known scalability problems as problems and systems get large • To solve that, you are almost forced to move to processes (and later distributed systems) • This is a universal problem--not just Python 5

Slide 253

Slide 253 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- multiprocessing Module • A standard library module for carrying out work in separate processes • Can be used to distribute work to other CPUs and to take advantage of multiple cores • Also has some distributed computing features (to be covered a little later) 6

Slide 254

Slide 254 text

Process Pools

• A process-based worker pool

    p = multiprocessing.Pool([numprocesses])

• This is the main feature you should use
• It executes functions in a subprocess
• It's very high-level (you don't need to worry a lot about internal details)

Slide 255

Slide 255 text

Process Pools

• Core pool operations

    p = multiprocessing.Pool([numprocesses])
    p.apply(func [, args [, kwargs]])
    p.apply_async(func [, args [, kwargs [, callback]]])
    p.map(func, iterable [, chunksize])

• There are some others, but these are enough to get started
• Let's see some examples

Slide 256

Slide 256 text

Pool apply()

• Running a function in another process

    from multiprocessing import Pool

    def add(x, y):
        return x + y

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply(add, (2, 3))
        print(r)

• apply() runs a function in one of the worker processes and returns the result
• Note: It waits for the result to come back
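A self-contained variation on the slide that also cleans up the pool when done. It is a sketch under assumptions: `run_apply` is a hypothetical wrapper, and the standard library's `operator.add` is used in place of the slide's `add` so that the callable can be imported by a worker however the script is launched.

```python
import operator
from multiprocessing import Pool

def run_apply():
    p = Pool(2)
    try:
        # Blocks until a worker process has computed operator.add(2, 3)
        return p.apply(operator.add, (2, 3))
    finally:
        p.close()     # no further work will be submitted
        p.join()      # wait for the workers to exit

if __name__ == '__main__':
    print(run_apply())
```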

Slide 257

Slide 257 text

apply() Illustrated

• Suppose you have a lot of threads

[Diagram: Thread 1, Thread 2, Thread 3, Thread 4]

• If they're all I/O bound, life is good
• Mostly they sleep, hardly any GIL contention

Slide 258

Slide 258 text

apply() Illustrated

• Now suppose a thread wants to do work

[Diagram: Threads 1-4; one thread performs CPU-bound processing, causing GIL contention with the others]

• Thread holds GIL
• Causes contention with other threads

Slide 259

Slide 259 text

apply() Illustrated

• Delegating work to a pool

[Diagram: Threads 1-4; one thread hands its CPU-bound processing to the Pool with apply() and receives the result back]

• Pool is separate process. No GIL waiting

Slide 260

Slide 260 text

Pool apply_async()

• Asynchronous execution

    from multiprocessing import Pool

    def add(x, y):
        return x + y

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply_async(add, (2, 3))
        # Other work
        ...
        # Collect the result at a later time
        print(r.get())

• Here, you get a handle to an object for retrieving the result at some later time (like a future result)

Slide 261

Slide 261 text

Async Results

• For asynchronous execution, you get a special AsyncResult object
• Here is a mini reference on it

    a.get([timeout])      # Get the result
    a.ready()             # Result ready?
    a.successful()        # Completed without errors
    a.wait([timeout])     # Wait for result

• get() is the most useful method, but there are other operations for polling, querying error status, etc.
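The methods in the mini reference can be exercised as follows. This is an illustrative sketch: `run_async` is a hypothetical helper, the 30-second timeouts are arbitrary, and `math.factorial` is used as the work function because it is importable in any worker and raises an exception for a negative argument.

```python
import math
from multiprocessing import Pool

def run_async():
    p = Pool(2)
    try:
        a = p.apply_async(math.factorial, (10,))
        a.wait(30)                                # block until done (or 30 seconds)
        status = (a.ready(), a.successful())
        value = a.get(timeout=30)

        bad = p.apply_async(math.factorial, (-1,))   # raises in the worker
        bad.wait(30)
        failed = bad.ready() and not bad.successful()
        return value, status, failed
    finally:
        p.close()
        p.join()

if __name__ == '__main__':
    print(run_async())
```

Calling `get()` on the failed result would re-raise the worker's exception in the caller.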

Slide 262

Slide 262 text

apply_async() Callbacks

• Asynchronous execution with callback

    from multiprocessing import Pool

    def add(x, y):
        return x + y

    def gotresult(result):
        print(result)

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply_async(add, (2, 3), callback=gotresult)

• Here, a callback function fires when the result is received

Slide 263

Slide 263 text

apply_async() Illustrated

• Used to initiate parallel computation

[Diagram: a single Thread issues several apply_async() calls to the Pool and receives the results back]

• Thread initiates multiple operations
• Collects results later

Slide 264

Slide 264 text

apply() vs apply_async()

• Usage depends on the context
• Use pool.apply() if you're using a pool to do work on behalf of a thread (and there are a lot of threads)
• Use pool.apply_async() if there's only one execution thread and it's trying to farm out work to multiple workers at once

Slide 265

Slide 265 text

Pool map()

• pool.map() - Maps a function onto a sequence

    from multiprocessing import Pool

    def square(x):
        return x*x

    if __name__ == '__main__':
        p = Pool(2)
        nums = range(1000)
        squares = p.map(square, nums, 100)

[Diagram: the input is split into chunks that go to pool workers p1 and p2, and their outputs are combined into the result]

• This subdivides a sequence into chunks and farms out work to the pool workers

Slide 266

Slide 266 text

Pool map()

• pool.map() is similar to a list comprehension

    def square(x):
        return x*x

    # Compute [x*x for x in nums]
    nums = range(1000)
    squares = p.map(square, nums)

• Restrictions:
• Mapped function must be module-level
• No lambda
• No instance methods (won't work)
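A runnable sketch of pool.map() with an explicit chunk size, checked against the equivalent list comprehension. `run_map` is a hypothetical helper, and the builtin `abs` replaces the slide's `square` so the mapped function is importable in every worker; in a real script, a function defined at module level works the same way.

```python
from multiprocessing import Pool

def run_map():
    nums = range(-500, 500)
    p = Pool(2)
    try:
        # The third argument is the chunk size: 100 inputs are sent per task
        mapped = p.map(abs, nums, 100)
    finally:
        p.close()
        p.join()
    return mapped == [abs(x) for x in nums], len(mapped)

if __name__ == '__main__':
    print(run_map())
```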

Slide 267

Slide 267 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Exercise process.1 20

Slide 268

Slide 268 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Some Technicalities • Pool workload • Pool creation and startup • Pool termination • Long-term execution/reliability 21

Slide 269

Slide 269 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Pool Workload • To effectively use a process pool, you need to make sure enough work gets carried out to recover the cost of communication • So, you probably wouldn't do it for just simple operations like adding two numbers • Likewise, you may not want to send massive amounts of data back and forth 22

Slide 270

Slide 270 text

Pool Creation

• Pools should only be created by the main thread of an application
• If created at script-level, must be protected by a __main__ check

    if __name__ == '__main__':
        p = Pool()
        ...

• If you forget, you may end up in a recursive process creation loop (a fork bomb) on Windows

Slide 271

Slide 271 text

Pool Shutdown

• To be nice, you should shut down pools

    p.close()     # Indicate no further work
    p.join()      # Wait for all pending work to finish

• Do this at program termination
• To immediately kill

    p.terminate()
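A short sketch of an orderly shutdown: work is queued, the pool is closed and joined, and the queued results are still available afterwards. `run_and_shutdown` is a hypothetical helper; the final check that a closed pool rejects new work reflects current CPython behavior (older 3.x releases raised `AssertionError` instead of `ValueError`, so both are caught).

```python
import math
from multiprocessing import Pool

def run_and_shutdown():
    p = Pool(2)
    pending = [p.apply_async(math.factorial, (n,)) for n in range(5)]
    p.close()                      # indicate no further work
    p.join()                       # wait for all pending work to finish
    results = [r.get() for r in pending]
    try:
        p.apply(abs, (-1,))        # submitting to a closed pool is refused
        rejected = False
    except (ValueError, AssertionError):
        rejected = True
    return results, rejected

if __name__ == '__main__':
    print(run_and_shutdown())
```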

Slide 272

Slide 272 text

Pools and Shared State

• Every worker in a process pool is completely isolated (no sharing)
• They do not have access to any state in the master process that created the pool
• This is the complete opposite of threads

Slide 273

Slide 273 text

Pool Startup

• Pools have an initialization option for startup

    from multiprocessing import Pool

    def my_init(a, b, c):
        # Initialize myself
        ...

    if __name__ == '__main__':
        p = Pool(initializer=my_init, initargs=(1, 2, 3))
        ...

• This is the only safe way to initialize the state of worker processes in a pool
• It is not safe to rely upon the values of global variables set prior to pool creation
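To see the initializer take effect, the sketch below makes every worker change into a temporary directory at startup and then asks a worker where it is. `run_with_initializer` is a hypothetical helper, and `os.chdir`/`os.getcwd` are used only because they are importable in any worker; a real initializer would typically be your own module-level function, as on the slide.

```python
import os
import tempfile
from multiprocessing import Pool

def run_with_initializer():
    parent_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as workdir:
        # Each worker calls os.chdir(workdir) once, before accepting any work
        p = Pool(2, initializer=os.chdir, initargs=(workdir,))
        try:
            worker_cwd = p.apply(os.getcwd)
        finally:
            p.close()
            p.join()
        worker_initialized = os.path.realpath(worker_cwd) == os.path.realpath(workdir)
    # The parent never changed directory; only the workers did
    return worker_initialized, os.getcwd() == parent_cwd

if __name__ == '__main__':
    print(run_with_initializer())
```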

Slide 274

Slide 274 text

A Unix Caution

• Pool workers will share any open files or sockets that were in use at Pool creation
• You might want this
• You might not
• Could cause "inexplicable" system behavior related to resource management (e.g., files not being closed correctly, etc.)

Slide 275

Slide 275 text

Pools and Threads

• Pools should be created before threads
• Created immediately at program startup
• Before any other threads are running
• Do not create/launch pools within threads
• It might "work", but you'll be on weak ground

Slide 276

Slide 276 text

Using Process Pools

• How to add to your application?
• Best advice: Create a single pool at application startup and use it everywhere you want to do work in a subprocess

[Diagram: an application containing task1, task2, task3, and task4, all sharing one pool that manages four subprocesses]

Slide 277

Slide 277 text

Process Pool Example

• Pool is then used as needed
• For example, in a task-thread

    class MyTask(Task):
        def run(self):
            while True:
                msg = self.recv()
                ...
                # Go run some CPU-intensive function
                r = pool.apply(some_func, arg)
                # Process the result
                ...

• The pool is just a utility that tasks use when they're going to do expensive work

Slide 278

Slide 278 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Process Pool Tips • Some other good rules of thumb • Functions should not depend on global state and have no side effects • Passed arguments should only consist of simple data structures (strings, nums, tuples, lists, dicts, etc.). • Also: No instance methods allowed 31

Slide 279

Slide 279 text

Pools and Instance Methods

• An example:

    pool = Pool()

    # Does this make any sense?
    items = []
    pool.apply(items.append, (123,))

• If you think about it long enough, you'll realize that it's nonsense (would have to send the whole instance and any modifications would be lost)
• Anyways, it's not allowed (get exception)

Slide 280

Slide 280 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Exercise process.2 33

Slide 281

Slide 281 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 3- Processes • The fundamental building block used by multiprocessing is the Process object • It's pretty low-level, but its interface mimics threads • Allows you to launch a specific python function inside a subprocess 34

Slide 282

Slide 282 text

Functions as Processes

• Launching a function in a process

    import time
    import multiprocessing

    def countdown(count):
        while count > 0:
            print("Counting down", count)
            count -= 1
            time.sleep(5)

    if __name__ == '__main__':
        p1 = multiprocessing.Process(target=countdown, args=(10,))
        p1.start()

• You create a Process object
• Use start() to launch it

Slide 283

Slide 283 text

multiprocessing Example

• Defining a process by a class

    import time
    import multiprocessing

    class CountdownProcess(multiprocessing.Process):
        def __init__(self, count):
            multiprocessing.Process.__init__(self)
            self.count = count

        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)
            return

    if __name__ == '__main__':
        p1 = CountdownProcess(10)     # Create the process object
        p1.start()                    # Launch the process

Slide 284

Slide 284 text

Other Process Features

• Joining a process (waits for termination)

    p = Process(target=somefunc)
    p.start()
    ...
    p.join()

• Making a daemonic process

    p = Process(target=somefunc)
    p.daemon = True
    p.start()

• Terminating a process

    p = Process(target=somefunc)
    ...
    p.terminate()

• join() and daemon mirror similar thread features (terminate() has no thread equivalent)

Slide 285

Slide 285 text

Commentary

• Launching processes is really easy
• Correct use of processes is hard
  • Partly due to platform differences
  • Also due to the means of process creation
• Too many layers of abstraction?

Slide 286

Slide 286 text

Process Creation

• In order to create a new process, multiprocessing carries out a process "fork"
• A clone of the calling process is created

    p1 = Process()
    p1.start()

[Diagram: one python process calls fork(), producing two python processes that execute in parallel]

• The clone is identical to the original process

Slide 287

Slide 287 text

Process Forking on Unix

• Processes are created using os.fork()
• Child process is identical to parent
  • Same state (e.g., variables, instances)
  • Same open files
  • Same open sockets
• All threads except the caller are discarded
• Assume that the worker gets the entire state of the parent process (sans threads)

Slide 288

Slide 288 text

Windows Forking

• Windows has no such OS feature
• Process creation on Windows
  • A new Python process is created, and the process arguments are pickled across a pipe connecting the two processes
  • The startup time is horrible (vs. Unix)
• Cannot rely on any sharing
• Assume that the worker only gets the parameters passed to it

Slide 289

Slide 289 text

Process Coordination

• Multiprocessing provides a large assortment of primitives for coordinating and communicating with low-level processes
  • Queue, JoinableQueue, Pipe, etc.
  • Lock, RLock, Semaphore, Event, Condition
  • Shared memory objects, etc.
• My advice: Don't bother (seriously)

Slide 290

Slide 290 text

Queue Example

• A consumer process

    def consumer(input_q):
        while True:
            # Get an item from the queue
            item = input_q.get()
            # Process item
            print(item)

• A producer process

    def producer(output_q):
        while not done:
            # Produce some item
            ...
            # Put on the output queue
            output_q.put(item)

Slide 291

Slide 291 text

Queue Example

• Running the two processes

    if __name__ == '__main__':
        from multiprocessing import Process, Queue
        q = Queue()

        # Launch the consumer process
        cons_p = Process(target=consumer, args=(q,))
        cons_p.daemon = True
        cons_p.start()

        # Run the producer function on some data
        producer(q)
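
The consumer in the queue example loops forever and is only cleaned up because it is daemonic. One common alternative is to send an end-of-stream marker. The following is a rough sketch of that idea, not something taken from the slides: the `SENTINEL` value, the `handle` parameter, and `main()` are all assumptions made for illustration.

```python
from multiprocessing import Process, Queue

SENTINEL = None   # hypothetical end-of-stream marker

def producer(items, output_q):
    # Put every item on the queue, then signal that no more are coming
    for item in items:
        output_q.put(item)
    output_q.put(SENTINEL)

def consumer(input_q, handle=print):
    # Process items until the sentinel shows up
    while True:
        item = input_q.get()
        if item is SENTINEL:
            break
        handle(item)

def main():
    q = Queue()
    cons_p = Process(target=consumer, args=(q,))
    cons_p.start()
    producer(range(5), q)
    cons_p.join()   # returns once the consumer has seen the sentinel

if __name__ == '__main__':
    main()
```

With a sentinel, the consumer does not need to be daemonic, and join() gives the parent a clean way to wait for it to finish.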

Slide 292

Slide 292 text

Final Comments

• Multiprocessing serves a very specific purpose
  • If your system has multiple CPUs/cores, then pools can be used to take advantage of them
• Advice: Do not use Process objects as the foundation for building a concurrent programming framework (like we did for threads)
  • Too many mind-boggling problems related to the reliable use of process forking and environment

Slide 293

Slide 293 text

Exercise process.3

Slide 294

Slide 294 text

Section 4: Event Driven I/O (a.k.a. Async)

Slide 295

Slide 295 text

Introduction

• Many programs are heavily based on I/O
  • For example, network servers
• Due to thread issues, some programmers have turned to alternative concurrency approaches
  • Usually based on event-driven programming

Slide 296

Slide 296 text

Goal

• Cover some basic I/O concepts
• Discuss event-driven approach
• Call attention to problems and limitations

Slide 297

Slide 297 text

I/O Basics

• I/O is a fundamental part of most programs
• However, there are many different programming models for carrying it out
  • Blocking
  • Nonblocking
  • Polling/multiplexing ("async")

Slide 298

Slide 298 text

Blocking I/O

• If an I/O operation (e.g., read or write) does not return until the operation actually completes, the operation is said to "block."
• Example: s.recv() on a network socket
• Normally, this operation temporarily stops your program and waits until some kind of data is available to be read
• This is the normal programming model 99% of programmers know about

Slide 299

Slide 299 text

Blocking I/O

• Underneath the covers, blocking I/O is tied to the underlying operating system and buffering
• Every file descriptor (file, socket, etc.) has some internal memory buffers

[Diagram: two Python sockets, s1 and s2, whose send and recv operations are backed by "out" and "in" buffers inside the OS kernel]

Slide 300

Slide 300 text

Blocking I/O

• The decision to block is entirely based on the buffer contents
• Empty buffers (recv) or full buffers (send)

[Diagram: sockets s1 and s2 with in/out buffers in the OS kernel; a sender is blocked when there is no buffer space, and a receiver is blocked when there is no data]

Slide 301

Slide 301 text

Blocking I/O Rules

• Reading: If buffered data is available, return it. Otherwise block until data becomes available.
• Writing: If buffer space is available, store output data in the buffer. If no more space is available, block until space becomes available.
• The buffering aspect of this is essential
  • Tuned to deal with mismatch between CPU speed and speed of I/O devices

Slide 302

Slide 302 text

Caution : Partial Sends

• In network programs, send() often only sends the number of bytes that will actually fit into the system buffers
• If writing low-level code, you have to check for this and use repeated sends for all data

    s = socket(AF_INET, SOCK_STREAM)
    ...
    index = 0
    while index < len(msg):
        index += s.send(msg[index:])
    ...

• Alternative: Use s.sendall(data)
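
The repeated-send loop can be packaged as a small helper. This is a minimal sketch; the function name `send_all` is hypothetical, and the use of a memoryview (to avoid copying on every slice) and of `socket.socketpair()` for the demonstration are choices made here, not something from the slides.

```python
import socket

def send_all(sock, data):
    """Keep calling send() until every byte of data has been handed to the OS."""
    view = memoryview(data)       # slicing a memoryview avoids copying the bytes
    total = 0
    while total < len(view):
        total += sock.send(view[total:])
    return total

# Demonstration on a connected pair of local sockets
a, b = socket.socketpair()
sent = send_all(a, b"hello world")
received = b.recv(1024)
a.close()
b.close()
print(sent, received)
```

In practice s.sendall() does the same job; the loop is mainly useful when you need to do something between partial sends.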

Slide 303

Slide 303 text

Socket Tuning Parameters

• The amount of buffer space can often be tuned

    # s is some socket

    # Set the receive buffer size
    s.setsockopt(SOL_SOCKET, SO_RCVBUF, 65536)

    # Set the send buffer size
    s.setsockopt(SOL_SOCKET, SO_SNDBUF, 65536)

• By changing these, you might be able to improve network performance, flow control, etc.
• There are other TCP tuning parameters (see documentation for setsockopt)
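
The value passed to setsockopt() is a request, and the operating system may adjust it. A short sketch of reading the sizes back with getsockopt() (the variable names are illustrative):

```python
from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_RCVBUF, SO_SNDBUF

s = socket(AF_INET, SOCK_STREAM)

# Request 64 KB buffers
s.setsockopt(SOL_SOCKET, SO_RCVBUF, 65536)
s.setsockopt(SOL_SOCKET, SO_SNDBUF, 65536)

# Ask the OS what it actually granted; it may round, scale, or cap the value
rcvbuf = s.getsockopt(SOL_SOCKET, SO_RCVBUF)
sndbuf = s.getsockopt(SOL_SOCKET, SO_SNDBUF)
print("receive buffer:", rcvbuf, "send buffer:", sndbuf)
s.close()
```

Checking the granted size is worthwhile before drawing conclusions from any performance experiment.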

Slide 304

Slide 304 text

Non-blocking I/O

• An alternative I/O model that changes the behavior of I/O waiting
• If an I/O operation would have blocked, the operation returns immediately with a raised exception instead of waiting

Slide 305

Slide 305 text

Non-blocking Sockets

• Example of setting up non-blocking socket I/O

    from socket import *
    s = socket(AF_INET, SOCK_STREAM)
    s.bind(("",15000))
    s.listen(5)

    # Wait for a connection
    c,a = s.accept()

    # Turn on nonblocking mode on client connection
    c.setblocking(False)

• Now, try to read from it

    >>> c.recv(8192)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    socket.error: [Errno 35] Resource temporarily unavailable
    >>>

Slide 306

Slide 306 text

Non-blocking Exceptions

• Catching a non-blocking error

    import errno
    import socket

    try:
        data = s.recv(8192)
        ...
    except socket.error as e:
        if e.errno == errno.EWOULDBLOCK:
            # Would have blocked
            ...
        else:
            # Some other socket error
            ...

• It can get messy fast
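
On Python 3.3 and later, a would-block condition is raised as BlockingIOError (a subclass of OSError), which removes the need for the errno check. A minimal sketch under that assumption; the helper name `try_recv` is hypothetical.

```python
import select
import socket

def try_recv(sock, maxbytes=8192):
    """Return received bytes, or None if the read would have blocked."""
    try:
        return sock.recv(maxbytes)
    except BlockingIOError:
        return None

a, b = socket.socketpair()
b.setblocking(False)

first = try_recv(b)               # nothing has been sent yet
a.sendall(b"hello")
select.select([b], [], [], 5)     # wait (up to 5 seconds) for the data to arrive
second = try_recv(b)
a.close()
b.close()
print(first, second)
```

Other socket errors still propagate as ordinary OSError exceptions, so the "some other socket error" branch becomes a separate except clause if you need one.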

Slide 307

Slide 307 text

Using Non-blocking I/O

• Non-blocking I/O is useful if you're trying to overlap I/O with other kinds of processing
  • Achieving a kind of concurrency between I/O operations and other computation
• It offers a guarantee that an I/O operation won't cause your program to get "stuck."
• It's heavily used in some network programming frameworks (especially those based on event handling, generators, and coroutines).

Slide 308

Slide 308 text

Exercise io.1

Slide 309

Slide 309 text

I/O Polling/Multiplexing

• Polling - An approach where you manually check for I/O activity and respond to it
• Typically associated with event-loops

    while True:
        ... processing ...
        if poll_for_io():
            process I/O
            ...
        ... processing

• For example, a program might check for I/O activity every few milliseconds

Slide 310

Slide 310 text

select module

• Used to support polling
• Provides interfaces to the following
  • select() - Unix and Windows
  • poll() - Unix
  • epoll() - Linux
  • kqueue() - BSD
  • kevent() - BSD

Slide 311

Slide 311 text

select() function

• Used for I/O multiplexing/polling
• Usage : select(rset,wset,eset [,timeout])

    readers = [...]   # sockets waiting to read
    writers = [...]   # sockets waiting to write
    exc = [...]       # sockets to check for exceptions

    rset,wset,eset = select(readers,writers,exc)
    for r in rset:
        # Handle readers
        handle_read(r)
    for w in wset:
        # Handle writers
        handle_write(w)
    for e in eset:
        # Handle exceptions
        handle_exception(e)

Slide 312

Slide 312 text

select() function

• You just give select() sets of all sockets/file descriptors of interest
• select() then returns the sets of descriptors on which different actions can be performed
  • read - Data is available in read-buffer
  • write - Buffer space is available
  • exception - An exceptional condition (meaning depends on the kind of file)

Slide 313

Slide 313 text

select() function

• select() blocks until some activity is detected
• There is an optional timeout parameter
• Use the timeout if some other processing is going on at the same time
  • For example, if I/O polling is also embedded inside a GUI event loop
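
The effect of the timeout can be seen on a connected pair of local sockets. This is a small sketch; the use of `socket.socketpair()` and the specific timeout values are choices made for the demonstration.

```python
import select
import socket

a, b = socket.socketpair()

# Nothing has been written to a, so b is not readable. A zero timeout
# makes select() return immediately instead of blocking.
idle, _, _ = select.select([b], [], [], 0)

a.sendall(b"ping")

# Now wait up to 5 seconds for b to become readable
ready, _, _ = select.select([b], [], [], 5)
data = ready[0].recv(1024) if ready else None

a.close()
b.close()
print(idle, data)
```

An empty result list is how select() reports that the timeout expired with no activity.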

Slide 314

Slide 314 text

select() Limitations

• There is often an OS limit of 1024 files
  • This limits the number of sockets/files that can be monitored by a single select()
• Performance is O(n), where n is the number of sockets
  • Causes scalability issues as n gets large
• There are workarounds for both issues, but they aren't cross platform and you'll need to experiment (e.g., poll(), epoll(), etc.)
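
Python versions newer than these slides (3.4 and later) ship a `selectors` module whose `DefaultSelector` picks the most efficient mechanism available on the platform (epoll, kqueue, poll, or select). A brief sketch, offered as one possible way around the portability problem rather than as part of the original material; the `"b-side"` label is arbitrary.

```python
import selectors
import socket

# DefaultSelector chooses epoll, kqueue, poll, or select depending on the platform
sel = selectors.DefaultSelector()
a, b = socket.socketpair()

# Register b for read events; the data argument is an arbitrary label
sel.register(b, selectors.EVENT_READ, data="b-side")

a.sendall(b"ping")

received = []
for key, events in sel.select(timeout=5):
    if events & selectors.EVENT_READ:
        received.append((key.data, key.fileobj.recv(1024)))

sel.unregister(b)
sel.close()
a.close()
b.close()
print(received)
```

The calling code stays the same on every platform; only the underlying polling mechanism changes.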

Slide 315

Slide 315 text

Exercise io.2

Slide 316

Slide 316 text

Event Driven I/O

• Can use select() to build event-driven systems
• The underlying idea is actually pretty simple
• You monitor a collection of I/O streams and create a stream of "events" that get pushed into callback or handler functions

Slide 317

Slide 317 text

Event Driven I/O

• First define a handler base class

    class IOHandler:
        # Method to return a file descriptor
        def fileno(self):
            pass

        # Reading
        def readable(self):
            return False
        def handle_read(self):
            pass

        # Writing
        def writable(self):
            return False
        def handle_write(self):
            pass

Slide 318

Slide 318 text

Event Driven I/O

• Next, write an I/O event dispatcher using polling

    from select import select

    class EventDispatcher:
        def __init__(self):
            self.handlers = set()
        def register(self,handler):
            self.handlers.add(handler)
        def unregister(self,handler):
            self.handlers.remove(handler)
        def run(self,timeout=None):
            while self.handlers:
                readers = [h for h in self.handlers if h.readable()]
                writers = [h for h in self.handlers if h.writable()]
                rset,wset,e = select(readers,writers,[],timeout)
                for r in rset:
                    r.handle_read()
                for w in wset:
                    w.handle_write()

Slide 319

Slide 319 text

Event Driven I/O

• Event handler registration
• Highlighted part of EventDispatcher: registration and management of event handlers

    def __init__(self):
        self.handlers = set()
    def register(self,handler):
        self.handlers.add(handler)
    def unregister(self,handler):
        self.handlers.remove(handler)

Slide 320

Slide 320 text

Event Driven I/O

• Collecting handler read/write status
• Highlighted part of EventDispatcher.run(): collect all of the sockets that want to read or write

    readers = [h for h in self.handlers if h.readable()]
    writers = [h for h in self.handlers if h.writable()]

Slide 321

Slide 321 text

Event Driven I/O

• Polling for activity on the handlers that want I/O
• Highlighted part of EventDispatcher.run(): the poll

    rset,wset,e = select(readers,writers,[],timeout)

Slide 322

Slide 322 text

Event Driven I/O

• Invocation of I/O callback methods
• Highlighted part of EventDispatcher.run(): invoke handlers for sockets that can read/write

    for r in rset:
        r.handle_read()
    for w in wset:
        w.handle_write()

Slide 323

Slide 323 text

Event Driven "Tasks"

• In this framework, applications get implemented as IOHandler objects wrapped around a specific file or socket object

    class SomeHandler(IOHandler):
        def __init__(self,sock):
            self.sock = sock
            ...
        def fileno(self):
            return self.sock.fileno()

• The internals don't really matter, but there must be a fileno() method to supply a file descriptor to select()/poll() operations

Slide 324

Slide 324 text

Event Driven "Tasks"

• Tasks must keep internal state that determines if they are interested in reading or writing

    class SomeHandler(IOHandler):
        def __init__(self,sock):
            ...
            self.wants_to_read = True
            self.wants_to_write = False
            ...
        def readable(self):
            return self.wants_to_read
        def writable(self):
            return self.wants_to_write
        ...

• These methods tell the polling loop what events it should be looking for at any given time

Slide 325

Slide 325 text

Event Driven "Tasks"

• Tasks must define methods to actually handle read/write events

    class SomeHandler(IOHandler):
        ...
        def handle_read(self):
            ...
            data = self.sock.recv(8192)
            ...
        def handle_write(self):
            ...
            self.sock.send(somedata)
            ...

• These methods only get called if the event loop has received some kind of matching event

Slide 326

Slide 326 text

Multitasking

• To run multiple tasks, you just register multiple handlers with the dispatcher and run its main event loop

    dispatcher = EventDispatcher()
    dispatcher.register(SomeHandler(s1))   # s1 is a socket
    dispatcher.register(SomeHandler(s2))   # s2 is a socket
    ...
    dispatcher.run()

• In theory, this setup allows your program to monitor multiple network connections

Slide 327

Slide 327 text

Example : Time Handler

• Here is a very simple UDP time handler

    from socket import *
    import time

    class TimeHandler(IOHandler):
        def __init__(self,address):
            self.sock = socket(AF_INET, SOCK_DGRAM)
            self.sock.bind(address)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return True
        def handle_read(self):
            msg,addr = self.sock.recvfrom(8192)
            resp = time.ctime().encode('ascii')
            self.sock.sendto(resp,addr)

Slide 328

Slide 328 text

Example : Time Handler

• How to run it

    dispatcher = EventDispatcher()
    dispatcher.register(TimeHandler(('',10000)))
    dispatcher.run()

• How to test it

    >>> from socket import *
    >>> s = socket(AF_INET, SOCK_DGRAM)
    >>> s.sendto(b"",("localhost",10000))
    >>> s.recvfrom(8192)
    (b'Thu Dec 23 10:52:07 2010', ('127.0.0.1', 10000))
    >>>
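
The pieces from the last few slides can be exercised in a single script. The sketch below condenses the IOHandler, EventDispatcher, and TimeHandler classes and drives one request through them. Two things here are assumptions that are not in the slides: the `poll_once()` method (a single pass of the run() loop, so the script terminates) and binding to port 0 so the OS picks a free port.

```python
import time
from select import select
from socket import socket, AF_INET, SOCK_DGRAM

class IOHandler:
    def fileno(self): pass
    def readable(self): return False
    def handle_read(self): pass
    def writable(self): return False
    def handle_write(self): pass

class EventDispatcher:
    def __init__(self):
        self.handlers = set()
    def register(self, handler):
        self.handlers.add(handler)
    def poll_once(self, timeout=None):
        # One pass of the run() loop (hypothetical helper, not in the slides)
        readers = [h for h in self.handlers if h.readable()]
        writers = [h for h in self.handlers if h.writable()]
        rset, wset, _ = select(readers, writers, [], timeout)
        for r in rset:
            r.handle_read()
        for w in wset:
            w.handle_write()

class TimeHandler(IOHandler):
    def __init__(self, address):
        self.sock = socket(AF_INET, SOCK_DGRAM)
        self.sock.bind(address)
    def fileno(self):
        return self.sock.fileno()
    def readable(self):
        return True
    def handle_read(self):
        msg, addr = self.sock.recvfrom(8192)
        self.sock.sendto(time.ctime().encode('ascii'), addr)

dispatcher = EventDispatcher()
server = TimeHandler(('127.0.0.1', 0))      # port 0: let the OS pick a free port
dispatcher.register(server)

# Client side: send an empty datagram, let the dispatcher handle it, read the reply
client = socket(AF_INET, SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"", server.sock.getsockname())
dispatcher.poll_once(timeout=5)
reply, addr = client.recvfrom(8192)
client.close()
server.sock.close()
print(reply)
```

Because everything runs in one thread, the request must be sent before the dispatcher polls; a real server would simply call run() and serve clients in other processes.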

Slide 329

Slide 329 text

Exercise io.3

Slide 330

Slide 330 text

A Complication

• How to handle streaming connections?
• Many network applications have long-lived network connections and bidirectional data transmission (TCP)
• Further complication: Each connection request creates a new socket to manage

Slide 331

Slide 331 text

Solution

• To handle long-lived connections, you define an IOHandler just for the client connection
• Handler instances are then dynamically added and removed from the event dispatcher as connections are opened and closed
  • Added when a new connection is received
  • Removed when a connection is closed

Slide 332

Slide 332 text

Example : Echo Client

    class EchoClientHandler(IOHandler):
        def __init__(self,sock,addr,dispatcher):
            self.sock = sock
            self.dispatcher = dispatcher
            self.outgoing = bytearray()
            self.closed = False
            self.dispatcher.register(self)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return not self.closed
        def handle_read(self):
            # Read handlers get data and save it in an outgoing data buffer
            msg = self.sock.recv(65536)
            if not msg:
                # Handle client shutdown
                self.closed = True
                if not self.outgoing:
                    self.dispatcher.unregister(self)
                    self.sock.close()
            else:
                self.outgoing.extend(msg)

Slide 333

Slide 333 text

Example : Echo Client

    class EchoClientHandler(IOHandler):
        def __init__(self,sock,addr,dispatcher):
            self.sock = sock
            self.dispatcher = dispatcher
            self.outgoing = bytearray()
            self.closed = False
            self.dispatcher.register(self)
        ...
        def writable(self):
            return bool(self.outgoing)
        def handle_write(self):
            # Write handler sends as much outgoing data as it can
            nsent = self.sock.send(self.outgoing)
            self.outgoing = self.outgoing[nsent:]
            # Handle client shutdown
            if not self.outgoing and self.closed:
                self.dispatcher.unregister(self)
                self.sock.close()

Slide 334

Slide 334 text

Creating Servers

• The handler class we just defined is for already established connections.
• Still need to plug it into some kind of server
• Recall : This is a traditional TCP server

    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(address)
    sock.listen(5)
    while True:
        client, addr = sock.accept()
        # Go handle the client

• Need to have an event-driven version

Slide 335

Slide 335 text

Example : TCP Server

• Event-driven server for accepting connections

    class TCPServerHandler(IOHandler):
        def __init__(self,address,handler,dispatcher):
            self.handler = handler
            self.dispatcher = dispatcher
            self.sock = socket(AF_INET, SOCK_STREAM)
            self.sock.setsockopt(SOL_SOCKET, SO_REUSEADDR,1)
            self.sock.bind(address)
            self.sock.listen(5)
            self.sock.setblocking(False)
            self.dispatcher.register(self)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return True
        def handle_read(self):
            c,addr = self.sock.accept()
            c.setblocking(False)
            self.handler(c,addr,self.dispatcher)

Slide 336

Slide 336 text

Example : TCP Server

• Event-driven server for accepting connections
• The key part of TCPServerHandler is handle_read(): on each new connection, a new client handler is created and added to the dispatcher

    def handle_read(self):
        # Triggered on each new connection
        c,addr = self.sock.accept()
        c.setblocking(False)
        # Create a new client handler for the connection
        self.handler(c,addr,self.dispatcher)

Slide 337

Slide 337 text

Example : Echo Server

• Running the server

    dispatcher = EventDispatcher()
    serv = TCPServerHandler(('',20000), EchoClientHandler, dispatcher)
    dispatcher.run()

• If it works, you'll be able to open up multiple connections and interact with it
• Assuming your head hasn't exploded

Slide 338

Slide 338 text

Exercise io.4

Slide 339

Slide 339 text

Commentary

• The overall concept underlying the last example is the basis for the much misunderstood (or maligned?) asyncore library module
• Also the basis for the Twisted framework
• As you will observe, the resulting programs can respond to multiple I/O channels without threads or processes

Slide 340

Slide 340 text

Events and Asyncore

• asyncore standard library module
• Implements a wrapper around sockets that turns all blocking I/O operations into events

  Blocking socket operations:

    s = socket(...)
    s.accept()
    s.connect(addr)
    s.recv(maxbytes)
    s.send(msg)
    ...

  Corresponding asyncore event handlers:

    from asyncore import dispatcher

    class MyApp(dispatcher):
        def handle_accept(self):
            ...
        def handle_connect(self):
            ...
        def handle_read(self):
            ...
        def handle_write(self):
            ...

    # Create a socket and wrap it
    s = MyApp(socket())

Slide 341

Slide 341 text

Using Asyncore

• You manipulate wrapped sockets that operate both as normal sockets and as event dispatchers
• The general idea is the same as what we just covered so I won't go into more detail here
• "Python Essential Reference, 4th Ed." has some detailed examples of using asyncore and the related asynchat module

Slide 342

Slide 342 text

Twisted

• A large event-driven framework built around I/O polling and multiplexing concepts (http://twistedmatrix.com)
• It's similar to what you would get if you started with asyncore and then built the entire universe on top of it using nothing but event handlers and callbacks

Slide 343

Slide 343 text

Twisted Example

• Here is an echo server in Twisted (straight from the manual)

    from twisted.internet.protocol import Protocol, Factory
    from twisted.internet import reactor

    class Echo(Protocol):
        # An event callback
        def dataReceived(self, data):
            self.transport.write(data)

    def main():
        f = Factory()
        f.protocol = Echo
        reactor.listenTCP(45000, f)
        # Running the event loop
        reactor.run()

    if __name__ == '__main__':
        main()

Slide 344

Slide 344 text

Event Driven Issues

• All event-driven I/O systems have a variety of really tricky programming issues
  • Scalability
  • Long-running calculations
  • Blocking operations
  • Interoperability with other code
• Let's briefly discuss

Slide 345

Slide 345 text

Scaling Problems

• Event-driven I/O is often based on polling mechanisms such as select() or poll()
• Both of those operations scale rather poorly as the number of monitored objects increases
• So, as the number of clients increases, more and more time is going to be spent performing the poll operation
• A real issue if there is rapid messaging (although solvable in a non-portable manner)

Slide 346

Slide 346 text

A Scaling Benefit

• One benefit of event-driven I/O is that it has predictable resource use (especially memory)
• Reduced context-switching (?)
• Having thousands of open files/sockets doesn't really consume any significant memory
• Unlike threads, you don't have to allocate extra stack space and other resources

Slide 347

Slide 347 text

Long-Running Calculations

• If an event handler runs a long calculation, it blocks everything until it completes
• Example: Parsing a large XML message
• Remember, there are no threads or preemption
• This would manifest itself as a program "stall". You've probably seen this with GUIs.

Slide 348

Slide 348 text

Blocking Operations

• Event-driven systems also have a really hard time dealing with blocking operations
  • Reading from the file system
  • Performing database queries
  • Connecting to other services
• If any of these operations take place in an event handler, the entire server/application stalls until it completes (no threads)

Slide 349

Slide 349 text

Blocking Illustrated

Slide 350

Slide 350 text

The Blocking Problem

• Consider this code...

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            # A database query (blocks?)
            results = db.execute("select * from table where ...")
            # Processing of the results (CPU-intensive?)
            for r in results:
                ...

• Everything waits until the callback method finishes its execution
• An issue if it happens to take a long time

Slide 351

Slide 351 text

Using Threads

• Blocking operations might be handed to a separate thread/process to avoid stalling

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            launch_thread(do_query, "select * from table where ...")

• But there is the tricky problem of coordinating what happens upon completion

Slide 352

Slide 352 text

Using Threads

• Common approach: Completion Callback

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            launch_thread(do_query,
                          "select * from table where ...",
                          callback=self.process_results)

        def process_results(self,results):
            ... continued processing

• Behind the scenes, system will run the operation in a separate thread, collect the result, coordinate with the event-handling framework, and then fire the callback
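
The slides leave `launch_thread()` undefined. A bare-bones sketch of what such a helper might look like follows. It is deliberately simplified: the callback fires in the worker thread itself, whereas a real framework would first hand the result back to the event loop. `do_query` here is a hypothetical stand-in for a slow database call.

```python
import threading

def launch_thread(func, *args, callback=None):
    """Run func(*args) in a worker thread, then pass its result to callback."""
    def runner():
        result = func(*args)
        if callback is not None:
            callback(result)        # note: this runs in the worker thread
    t = threading.Thread(target=runner)
    t.daemon = True
    t.start()
    return t

# Hypothetical stand-in for a slow database query
def do_query(sql):
    return [("row", sql)]

results = []
done = threading.Event()

def process_results(rows):
    results.extend(rows)
    done.set()

launch_thread(do_query, "select 1", callback=process_results)
done.wait(5)                        # wait for the completion callback
print(results)
```

Running the callback in the worker thread means the handler's state is now touched from two threads, which is exactly the coordination problem the event loop was supposed to avoid.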

Slide 353

Slide 353 text

Exercise io.5

Slide 354

Slide 354 text

Workers and Polling

• Commentary: Coordinating workers and I/O polling is a lot trickier than it looks

[Diagram: the event loop, sitting in select(), launches a worker thread; the worker later produces a result that has to reach callback(result) back in the event loop, which is again waiting in select(). What happens here?]

Slide 355

Slide 355 text

Workers and Polling

• Problem: select() only works with sockets (or other file descriptors on Unix)
• Workers utilize queues, messaging, other means

[Diagram: the event loop's select() can watch sockets, but how does it watch queues?]

• Question: How do you make this work?

Slide 356

Slide 356 text

Selectable Queues

• Common approach: Attach a loopback socket

[Diagram: a queue with a connected pair of sockets; puts write to putsocket, and the event loop's select() listens on getsocket]

• Sample code (ugly, but portable)

    # Create a pair of connected sockets
    server = socket(AF_INET, SOCK_STREAM)
    server.bind(('127.0.0.1', 0))
    server.listen(1)
    putsocket = socket(AF_INET,SOCK_STREAM)
    putsocket.connect(server.getsockname())
    getsocket, _ = server.accept()
    server.close()

Slide 357

Slide 357 text

Selectable Queues

• Add internal I/O to get/put operations

    import queue

    class SelectableQueue(queue.Queue):
        def __init__(self):
            super().__init__()
            # Set up the connected sockets
            ...
        def put(self,item):
            super().put(item)
            self.putsocket.send(b"x")
        def get(self):
            self.getsocket.recv(1)
            return super().get()
        def fileno(self):
            return self.getsocket.fileno()

• Key idea: puts do I/O for triggering select()
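
Filling in the elided socket setup gives a queue that can be passed straight to select(). The sketch below is one way to do it, with assumptions stated plainly: it uses `socket.socketpair()` instead of the loopback TCP setup from the slides (shorter, but it requires Python 3.5 or later on Windows), and the `close()` method is an addition for tidy cleanup.

```python
import queue
import select
import socket
import threading

class SelectableQueue(queue.Queue):
    def __init__(self):
        super().__init__()
        # Assumption: socketpair() stands in for the slides' loopback TCP setup
        self.putsocket, self.getsocket = socket.socketpair()

    def put(self, item):
        super().put(item)
        self.putsocket.send(b"x")       # wake up anyone blocked in select()

    def get(self):
        self.getsocket.recv(1)          # consume one wake-up byte
        return super().get()

    def fileno(self):
        return self.getsocket.fileno()

    def close(self):
        self.putsocket.close()
        self.getsocket.close()

q = SelectableQueue()

# A worker thread delivers a result through the queue
worker = threading.Thread(target=q.put, args=("result",))
worker.start()

# The event loop side can wait on the queue like any other descriptor
ready, _, _ = select.select([q], [], [], 5)
item = q.get() if ready else None
worker.join()
q.close()
print(item)
```

Each put writes exactly one byte and each get reads exactly one, so the socket becomes readable precisely when the queue has something in it.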

Slide 358

Slide 358 text

Interoperability Problems

• Event-driven programming tends to force an event-driven programming style across your entire application program
• This includes all external libraries and everything else used by your application
• However, most programming libraries are not written in an event-driven style
• For instance, the entire standard library

Slide 359

Slide 359 text

Personal Bias

• I don't like event-driven I/O programming
• Applying it across a large application is a very good way to create unmaintainable code that's a maze of twisty little passages, all different.
• I put it in the same category as assembly code (although not as easy to follow - sic)
• Okay to use it internally (in libraries), but don't expose it to the rest of the world

Slide 360

Slide 360 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 4- Final Words 67 • Event-driven I/O is still useful in certain domains • Programs that are already event-based (for example GUI programs) • Games • Most normal network programs are better served by using threads or processes • Especially small-to-medium scale projects

Slide 361

Slide 361 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 4- Exercise io.6 68

Slide 362

Slide 362 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Message Passing and Serialization 1 Section 5

Slide 363

Slide 363 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Concept: Message Passing • Multiple independent copies of the Python interpreter (or programs in other languages) • Running in separate processes • Possibly on different machines • Sending/receiving messages 2

Slide 364

Slide 364 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Commentary • Message passing is a well-established technique for concurrent programming • It has been successfully scaled up to systems involving tens of thousands of processors (e.g., supercomputers, Linux clusters, etc.) • The foundation of distributed computing • We've already covered some basic ideas 3

Slide 365

Slide 365 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Message Passing 4 Process Process send() recv() connection • On the surface, it's really simple • Processes only send and receive messages • There are really only two main issues • What is a message? • How is it transported?

Slide 366

Slide 366 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- An Issue • There is no universally accepted programming interface or implementation of messaging • There are dozens of different packages that offer different features and options • Covering every possible angle of message passing interfaces is simply impossible here • And a reference manual would be rather dull 5

Slide 367

Slide 367 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Massive Complexity • There are actually many other issues • Reliability, redundancy, fault tolerance • Security, authentication, encryption • Performance (bandwidth, latency, load- balancing, quality of service, etc.) • Routing, network topology • Interoperability, systems integration • Messaging libraries are often a terror 6

Slide 368

Slide 368 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Section Focus • Our focus is going to be on general programming idioms related to messaging • This mostly concentrates on the boundary between Python and the messaging layer 7 message transport message library Python message library Python Our Focus recv send send recv

Slide 369

Slide 369 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- What is a Message? • Usually just a collection of bytes (a buffer) • A "serialized" representation of some data 8

Slide 370

Slide 370 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Message Encoding 9 • Messages have to be formatted or encoded in some manner that enables transport • To send, a message is encoded • To receive, a message is decoded

Slide 371

Slide 371 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Encoding Example 10 • A minimal encoding (size prefixed bytes) size Message (bytes) • Message is just bytes with a size header • No interpretation of the bytes (opaque) • So, payload could be anything at all (any encoding, any programming language, etc.)
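A size-prefixed encoding can be sketched in a few lines. The 4-byte network-order header and the helper names (`encode_message`, `decode_message`, `read_exactly`) are assumptions made for this example; the slide does not specify a header size.

```python
import io
import struct

HEADER = struct.Struct("!I")   # Assumed header: 4-byte unsigned size, network order

def encode_message(payload: bytes) -> bytes:
    """Prefix an opaque payload with its size."""
    return HEADER.pack(len(payload)) + payload

def read_exactly(stream, nbytes):
    """Read exactly nbytes from a file-like object, or raise EOFError."""
    chunks = []
    while nbytes > 0:
        chunk = stream.read(nbytes)
        if not chunk:
            raise EOFError("stream closed in mid-message")
        chunks.append(chunk)
        nbytes -= len(chunk)
    return b"".join(chunks)

def decode_message(stream) -> bytes:
    """Read one size-prefixed message from a file-like object."""
    (size,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return read_exactly(stream, size)

# Two messages back to back on one byte stream; the second one is empty
stream = io.BytesIO(encode_message(b"hello") + encode_message(b""))
first = decode_message(stream)
second = decode_message(stream)
```

The receiver never inspects the payload, which is what keeps the encoding independent of language and content.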

Slide 372

Slide 372 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Message Transport • Messages have to be transmitted (somehow) between running processes • Inter-Process Communication (IPC) • Some low-level communication primitives • Pipes • FIFOs • Sockets (Network Programming) 11

Slide 373

Slide 373 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Pipes • A file-like abstraction that allows a byte stream to be transmitted across processes • Perhaps the most portable way to set up a pipe is to use the subprocess module • A standard library module for launching subprocesses that works cross-platform 12

Slide 374

Slide 374 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- An Example • Launching a subprocess and hooking up the child process via a pipe • Use the subprocess module 13 import subprocess p = subprocess.Popen(['python','child.py'], stdin=subprocess.PIPE, stdout=subprocess.PIPE) p.stdin.write(data) # Send data to subprocess p.stdout.read(size) # Read data from subprocess Parent p.stdin p.stdout Child sys.stdin sys.stdout Pipe
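A self-contained version of the parent/child pipe might look like the following. The child program is passed inline with `-c` instead of a separate `child.py`, and `communicate()` is used in place of separate write/read calls to avoid blocking on a full pipe; both choices are assumptions made to keep the sketch runnable.

```python
import subprocess
import sys

# Hypothetical child: reads all of stdin and writes it back in upper case
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

p = subprocess.Popen([sys.executable, "-c", child_code],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE)

# Send data to the child, close its stdin, and collect its stdout
out, _ = p.communicate(b"hello child\n")
```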

Slide 375

Slide 375 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Named Pipes/FIFOs • It is also possible to set up a named pipe • Creating (in Unix) 14 import os os.mkfifo("/tmp/myfifo") • Using in different processes f = open("/tmp/myfifo","wb") f.write(b"Some data\n") f.flush() f = open("/tmp/myfifo","rb") line = f.readline() Writer Reader • Note : Not on Windows with this API

Slide 376

Slide 376 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Sockets • Setting up a listener 15 # Set up a listener s = socket(AF_INET,SOCK_STREAM) s.bind(("",12345)) s.listen(5) c,a = s.accept() • Connecting as a client s = socket(AF_INET,SOCK_STREAM) s.connect(("localhost",12345))

Slide 377

Slide 377 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Exercise msg.1 16

Slide 378

Slide 378 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- High-Level Messaging 17 • There are many messaging frameworks • AMQP • ØMQ • RabbitMQ • Celery • Common theme : Putting a higher-level interface on top of sockets, pipes, etc.

Slide 379

Slide 379 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Messaging Features 18 • Support for common messaging patterns • Push/Pull (Queues) • Request/Reply • Publish/subscribe • Reliability/scalability features • Load balancing • Durable connections

Slide 380

Slide 380 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Example : ØMQ 19 • ZeroMQ (http://www.zeromq.org/) • In a nutshell : Message-based sockets • In their own words... "A ØMQ socket is what you get when you take a normal TCP socket, inject it with a mix of radioactive isotopes stolen from a secret Soviet atomic research project, bombard it with 1950-era cosmic rays, and put it into the hands of a drug-addled comic book author with a badly- disguised fetish for bulging muscles clad in spandex." • I would cautiously agree

Slide 381

Slide 381 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Example : ØMQ 20 • Here's an example of an echo server: • That's it • And this server can already handle requests from 100s (or 1000s) of connected clients # echoserver.py import zmq context = zmq.Context() sock = context.socket(zmq.REP) sock.bind("tcp://*:6000") while True: message = sock.recv() # Get a message sock.send(b"Hi:"+message) # Send a reply

Slide 382

Slide 382 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Example : ØMQ 21 • Here's an example of an echo client • That's also pretty simple (it just works) # echoclient.py import zmq context = zmq.Context() sock = context.socket(zmq.REQ) sock.connect("tcp://localhost:6000") sock.send(b"Spam") # Send a request resp = sock.recv() # Get response print(resp)

Slide 383

Slide 383 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Example : ØMQ 22 • Some cool features • Can start server or client in any order • Clients can connect to multiple servers (load balancing, redundancy, etc.) • Variety of socket types (Reply, Request, Push, Pull, Publish, Subscribe, etc.)

Slide 384

Slide 384 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Example : ØMQ 23 • Example client connected to multiple servers • Request gets sent to one of the servers • Think about scaling, redundancy, etc. import zmq context = zmq.Context() sock = context.socket(zmq.REQ) sock.connect("tcp://host1.com:6000") sock.connect("tcp://host2.com:7000") sock.connect("tcp://host3.com:7000") sock.send(b"Spam") # Send a request resp = sock.recv() # Get response

Slide 385

Slide 385 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Exercise msg.2 24

Slide 386

Slide 386 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Security Concerns 25 • Messaging systems are meant to be used internally, not exposed to end-users Web server HTTP "Mom" Messaging • Used by all of the back-end code--hidden away in dark server rooms, etc.

Slide 387

Slide 387 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Security Concerns 26 • As a general rule, you don't want internal messaging to be exposed to the outside • The usual techniques apply • Firewalls • Secure sockets (SSL) • Digital certificates, public/private key • VPNs

Slide 388

Slide 388 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Message Authentication 27 • You may have situations where messaging uses an exposed connection (e.g., allowing anyone to connect if they know the right address and port) • For this, you probably want to have some kind of authentication scheme • Common approach : Use cryptographic hash authentication (MD5, SHA, etc.)

Slide 389

Slide 389 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Hashing Authentication 28 • Both endpoints pick a secret key/password key = b"peekaboo" key = b"peekaboo" • Just to emphasize--it's secret

Slide 390

Slide 390 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Hashing Authentication 29 • Connection involves challenge/response 01ea7f6a72fae8129 client server connect challenge make random bytes compute hash digest (with secret key) 01ea7f6a72fae8129 compute hash digest (with secret key) 65aef25472 65aef25472 response 65aef25472 = yes? (authenticated)

Slide 391

Slide 391 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- hmac module 30 • Python has a library for helping (hmac) import os import hmac key = b"peekaboo" client, addr = sock.accept() message = os.urandom(32) client.send(message) hash = hmac.new(key,message,"sha256") digest = hash.digest() response = client.recv(len(digest)) if hmac.compare_digest(digest, response): # Authenticated import hmac key = b"peekaboo" sock.connect(address) message = sock.recv(32) hash = hmac.new(key,message,"sha256") digest = hash.digest() sock.send(digest) Challenge (Server) Response (Client) • See IETF RFC 2104
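The challenge/response exchange can be tried end to end inside one process. In this sketch a `socket.socketpair()` and a helper thread stand in for a real client and server, SHA-256 is an assumed digest choice, and the function names `challenge` and `respond` are made up for the example.

```python
import hmac
import os
import socket
import threading

SECRET = b"peekaboo"

def challenge(conn, key):
    """Server side: send random bytes, then verify the peer's digest."""
    message = os.urandom(32)
    conn.sendall(message)
    expected = hmac.new(key, message, "sha256").digest()
    response = conn.recv(len(expected))
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(expected, response)

def respond(conn, key):
    """Client side: hash the challenge with the shared key and send it back."""
    message = conn.recv(32)    # Short messages; a real client should loop on recv
    conn.sendall(hmac.new(key, message, "sha256").digest())

def attempt(client_key):
    """Run one handshake where the client uses client_key."""
    server_end, client_end = socket.socketpair()
    t = threading.Thread(target=respond, args=(client_end, client_key))
    t.start()
    try:
        return challenge(server_end, SECRET)
    finally:
        t.join()
        server_end.close()
        client_end.close()

ok_good = attempt(b"peekaboo")   # Client knows the secret
ok_bad = attempt(b"guess")       # Client does not
```

Only the random challenge and the digest cross the connection; the key itself is never sent.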

Slide 392

Slide 392 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Commentary 31 • This is only an authentication scheme • Can be used to keep unwanted clients and outsiders from establishing a connection to the messaging framework • Reasonably secure (based on difficulty of breaking cryptographic hashes) • It's not encryption. Messages themselves could be seen by a packet sniffer, etc. (although key is never transmitted)

Slide 393

Slide 393 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Exercise msg.3 32

Slide 394

Slide 394 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Object Serialization 33 • Serialization of Python objects • pickle • Serialization of foreign (binary) data • struct

Slide 395

Slide 395 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- The Problem 34 • How to serialize Python objects? • Lists, dictionaries, sets, instances, etc. • An issue here is that Python is extremely flexible with respect to data types • Containers can also hold mixed data • There is no easy format for describing Python objects (e.g., a simple array or by fixed binary data structures)

Slide 396

Slide 396 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- pickle Module • A module for serializing Python objects 35 • Serializing an object onto a "file" import pickle ... pickle.dump(someobj,f) • Unserializing an object from a file someobj = pickle.load(f) • Here, a file might be a file, a pipe, a wrapper around a socket, etc.

Slide 397

Slide 397 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- pickle Compatibility • What objects are compatible? • Nearly any object that consists of data • None, numbers, strings • Tuples, lists, dicts, sets, etc. • Instances of objects • Functions and classes (tricky) • The underlying message encoding is "self- describing" (which hides a lot of details) 36

Slide 398

Slide 398 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- pickle Incompatibility • Objects not compatible with pickle • Anything involving system or runtime state • Open files, sockets, etc. • Threads • Running generator functions • Stack frames • Closures 37

Slide 399

Slide 399 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Pickling to Strings • Pickle also creates byte strings import pickle # Convert to a string s = pickle.dumps(someobj) ... # Load from a string someobj = pickle.loads(s) • This can be used if you need to embed a Python object into some other messaging protocol or data encoding 38

Slide 400

Slide 400 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Pickling Class Instances • Pickle supports class instances class Point: def __init__(self,x,y): self.x = x self.y = y • Example: 39 p = Point(23,24) pickle.dump(p,f) • Caveat: Class definition must be on receiver • Advice : Don't send instances around in messaging systems (too fragile)

Slide 401

Slide 401 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Pickle and Large Objects • Large objects can cause problems • Depending on the object, pickle might make a memory copy of the entire object (either while sending or during reconstruction) • Example: Arrays (array module) 40 import array a = array.array('i',range(10000000)) ... pickle.dump(a,f) # Makes a memory copy (40MB) • Better to split into smaller messages

Slide 402

Slide 402 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Classes and Functions • Functions and classes can be pickled, but they are only name references 41 # foo.py def bar(x,y): return x+y • Example of pickled data (protocol 0 shown for readability) >>> import pickle >>> from foo import bar >>> pickle.dumps(bar, 0) b'cfoo\nbar\np0\n.' >>> • When unpickled, the name references are resolved by importing the needed modules Notice module and function name

Slide 403

Slide 403 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Pickle Security • It's not secure at all • Never use pickle with untrusted clients (malformed pickles can be used to execute arbitrary system commands) • Bottom line : Never receive pickled data on an untrusted or unauthenticated connection 42
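The reason pickle cannot be trusted is that a pickle may name any importable callable to be invoked during loading. The following sketch shows the mechanism with a harmless built-in (`sorted`) standing in for something dangerous; the class name `Surprise` is made up for the example.

```python
import pickle

class Surprise:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sorted([3, 1, 2])".
        # An attacker would name a far less friendly callable here.
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Surprise())

# Loading runs the named callable; no Surprise instance comes back at all
result = pickle.loads(payload)
```

The receiver has no say in which callable runs, which is why pickled data should only be accepted on trusted or authenticated connections.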

Slide 404

Slide 404 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Miscellaneous Comments • Pickle is really only useful for Python • Would not use if you need to communicate to other programming languages • However, you can do some pretty amazing things with it if Python is your environment • There is already built-in messaging support 43

Slide 405

Slide 405 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Connection Objects • multiprocessing provides Listener and Client objects that transmit pickled data • A Listener (receives connections) 44 from multiprocessing.connection import Listener serv = Listener(('',15000),authkey=b'12345') # Wait for a connection client = serv.accept() # Now, wait for messages to arrive while True: msg = client.recv() # process the message

Slide 406

Slide 406 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Connection Objects • Example Client 45 from multiprocessing.connection import Client conn = Client(('localhost',15000),authkey=b'12345') conn.send(msg) • You will notice a similarity to sockets • Except that it's much higher level and it sends pickled objects
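Listener and Client can be exercised together in a single script by running the listener in a thread. This is only a sketch: port 0 (let the OS pick a free port), the `None` shutdown message, and the `received` list are assumptions added for the demonstration.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"12345"
received = []

# Port 0 asks the OS for any free port; serv.address reports the real one
serv = Listener(("localhost", 0), authkey=AUTHKEY)

def serve_one():
    """Accept a single connection and collect messages until None arrives."""
    conn = serv.accept()
    try:
        while True:
            msg = conn.recv()
            if msg is None:        # Shutdown signal used by this sketch
                break
            received.append(msg)
    finally:
        conn.close()

t = threading.Thread(target=serve_one)
t.start()

client = Client(serv.address, authkey=AUTHKEY)
client.send({"name": "GOOG", "shares": 100})   # Any picklable object works
client.send([1, 2, 3])
client.send(None)
client.close()

t.join()
serv.close()
```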

Slide 407

Slide 407 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Connection Objects • Some important features of connections • Authentication (uses HMAC, a technique based on message digests such as SHA) • Instead of bytes, you send Python objects • Data is encoded using pickle • Is extremely useful if you're just going to hook two Python interpreters together 46

Slide 408

Slide 408 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Exercise msg.4 47

Slide 409

Slide 409 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Foreign Objects • If dealing with C/C++ extensions or foreign software (non-python), you often turn to binary-encoded data • Such software might transmit data in a binary encoding of some sort • If you know what you are using, you can bridge Python to it as long as you speak the protocol 48

Slide 410

Slide 410 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Binary Protocols • There are some standard binary messaging protocols in use • Protocol Buffers (Google) • Thrift (Facebook) • BSON (MongoDB??) • All of these have Python libraries and tools • Under the covers, they have to deal with binary data encoding/decoding 49

Slide 411

Slide 411 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- struct module • Packs/unpacks binary records and structures • Example: Suppose you had this structure struct Stock { char name[8]; int shares; double price; }; 50 • Now, suppose you wanted to encode/ decode raw byte streams with that record? name shares price 8 bytes 4 bytes 8 bytes

Slide 412

Slide 412 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- struct module • First, create a Struct object from struct import Struct StockStruct = Struct("8sid") 51 • Structure is described by a "format string" "8s" = char [8] "i" = int "d" = double • To write the format string, you have to precisely know what the structure is • And you need to know the format codes...

Slide 413

Slide 413 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- struct module • Packing/unpacking codes (based on C) 'c' char (1 byte string) 'b' signed char (8-bit integer) 'B' unsigned char (8-bit integer) 'h' short (16-bit integer) 'H' unsigned short (16-bit integer) 'i' int (32-bit integer) 'I' unsigned int (32-bit integer) 'l' long (32 or 64 bit integer) 'L' unsigned long (32 or 64 bit integer) 'q' long long (64 bit integer) 'Q' unsigned long long (64 bit integer) 'f' float (32 bit) 'd' double (64 bit) 's' char[] (String) 'p' char[] (String with 8-bit length) 'P' void * (Pointer) 52

Slide 414

Slide 414 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- struct module • Each code may be preceded by a repetition count '4i' 4 integers '20s' 20-byte string • Integer alignment and byte order modifiers '@' Native byte order, sizes, and alignment (the default) '=' Native byte order, standard sizes, no alignment '<' Little-endian, standard sizes, no alignment '>' Big-endian, standard sizes, no alignment '!' Network (big-endian), standard sizes, no alignment • Only one modifier is allowed and it goes first 53

Slide 415

Slide 415 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Structure Alignment • With a standard-size modifier ('=', '<', '>', '!'), structure fields are concatenated together with no padding or alignment '<8sid' • Be aware that this differs from C/C++ struct Stock { char name[8]; int shares; double price; }; • With no modifier (or with '@'), native C alignment is applied 54 Packed layout: name (8 bytes), shares (4 bytes), price (8 bytes) Native layout where double is 8-byte aligned: name (bytes 0-7), shares (8-11), unused padding (12-15), price (16-23) StockStruct = Struct("@8sid")
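When in doubt about which layout a format string produces, `struct.calcsize()` reports the record size directly. The variable names below are made up for the example, and the native result depends on the platform.

```python
import struct

packed_size = struct.calcsize("<8sid")    # Standard sizes, no padding: 8 + 4 + 8
native_size = struct.calcsize("@8sid")    # Native alignment; platform-dependent
default_size = struct.calcsize("8sid")    # No modifier behaves like '@'

print(packed_size, native_size, default_size)
```

On platforms where a C double is 8-byte aligned, the native size includes 4 bytes of padding after `shares`.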

Slide 416

Slide 416 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Packing Binary Records • Packing Python values into a byte string from struct import Struct StockStruct = Struct("<8sid") # '<' = little-endian, no padding stock = { 'name' : b'GOOG', 'shares' : 100, 'price' : 490.10 } rawbytes = StockStruct.pack(stock['name'], stock['shares'], stock['price']) >>> rawbytes b'GOOG\x00\x00\x00\x00d\x00\x00\x00\x9a\x99\x99\x99\x99\xa1~@' 55 • You do this if you're sending/writing

Slide 417

Slide 417 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Unpacking Records • Unpacking a byte string into a tuple from struct import Struct StockStruct = Struct("<8sid") # '<' = little-endian, no padding rawbytes = f.read(StockStruct.size) name, shares, price = StockStruct.unpack(rawbytes) stock = { 'name' : name.strip(b'\x00'), 'shares' : shares, 'price' : price } 56 • You do this for reading/receiving • If you need to create other datatypes (dicts, instances), that's an extra step
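Packing and unpacking are usually wrapped in a pair of helper functions so that the rest of the program only sees ordinary Python values. The helpers `pack_stock` and `unpack_stock`, the ASCII encoding of the name, and the `'<'` modifier (chosen so the layout is the same on every platform) are assumptions made for this sketch.

```python
from struct import Struct

StockStruct = Struct("<8sid")    # Little-endian, no padding: 20 bytes per record

def pack_stock(stock):
    """Turn a stock dictionary into a fixed-size binary record."""
    return StockStruct.pack(stock["name"].encode("ascii"),
                            stock["shares"],
                            stock["price"])

def unpack_stock(rawbytes):
    """Turn a binary record back into a dictionary."""
    name, shares, price = StockStruct.unpack(rawbytes)
    return {"name": name.rstrip(b"\x00").decode("ascii"),   # Drop NUL padding
            "shares": shares,
            "price": price}

original = {"name": "GOOG", "shares": 100, "price": 490.10}
rawbytes = pack_stock(original)
restored = unpack_stock(rawbytes)
```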

Slide 418

Slide 418 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Performance Tip • struct has some module-level functions 57 struct.pack("8sid", b"GOOG", 100, 490.10) struct.unpack("8sid", rawbytes) • They can be used without having to create a special Struct object • However, they don't run as fast because they have to interpret the format each time • Probably best avoided in code that is doing a lot of packing/unpacking

Slide 419

Slide 419 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- struct Cautions • Binary records are inherently "unportable" • Great attention to detail is required • Encoding can vary by platform (e.g., 32-bit vs. 64 bits) • Still useful if you have control over the environment and you know what you're doing 58

Slide 420

Slide 420 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 5- Exercise msg.5 59

Slide 421

Slide 421 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Distributed Computing 1 Section 6

Slide 422

Slide 422 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Introduction • Independent systems, networks, and message passing are the basis of distributed computing • So far, we've covered some basic design patterns and underlying mechanics • Topics not yet covered : more advanced messaging techniques • Connecting to the outside world (foreign systems, interoperability, etc.) 2

Slide 423

Slide 423 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Major Topics • More on messaging (actors, registries, message brokers, etc.) • Distributed data (key-value stores) • Remote Procedure Call (RPC) • Distributed Objects • Interoperability, foreign systems, etc. 3

Slide 424

Slide 424 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Tasks Revisited • In the thread section, we defined tasks that received and acted upon messages sent to them 4 class MyTask(Task): def run(self): while True: msg = self.recv() # Get a message ... # Do something with it ... m = MyTask() m.start() m.send(msg) # Send a task a message • Formally: this is an example of an "actor"

Slide 425

Slide 425 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Actors • Many actors might work together 5 actor actor actor send() send() send() • Again, independent tasks sending messages actor send() send()

Slide 426

Slide 426 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Features of Actors • Some desirable characteristics • No shared state (messages only) • One operation : sending a message • Messages are asynchronous • Concurrent execution • Again, we already built all of this 6

Slide 427

Slide 427 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Why Study Actors? • The programming model is very minimal • Thus, you can understand it (maybe) • There is a large body of theoretical knowledge (computer scientists have been studying them since the 1970s) • Techniques involving actors are applicable to more advanced scenarios 7

Slide 428

Slide 428 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.1 8

Slide 429

Slide 429 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Problem : Decoupling • Consider an application built from a large collection of independent actors/tasks • How do the actors find and link to each other? • What happens if actors crash, restart, etc? • As a general rule, you want decoupling 9

Slide 430

Slide 430 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Tight Coupling • Don't want: Actors programmed so that they hold direct references to other actor instances 10 class Actor(Task): def __init__(self,target): self.target = target def run(self): ... self.target.send(msg) ... • This results in a very rigid/fragile design • Makes it nearly impossible to make adjustments to the system organization Actor Actor .target

Slide 431

Slide 431 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Name Registry • Better : Indirectly refer to actors through some kind of naming registry 11 _registry = {} def register(name, actor): _registry[name] = actor def unregister(name): del _registry[name] def lookup(name): return _registry.get(name) • Give all actors an identifying name • Have them register/unregister as needed

Slide 432

Slide 432 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Name Registry • Registration of actors 12 register("spam", SpamActor()) register("foo", FooActor()) register("bar", BarActor()) • This is just building a centralized table _registry = { 'spam' : <SpamActor instance>, 'foo' : <FooActor instance>, 'bar' : <BarActor instance>, ... }

Slide 433

Slide 433 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Redefining send() • Perform all messaging through a global function that relies upon the registry 13 def send(target_name,msg): target = lookup(target_name) if target: target.send(msg) • All actors now always use the global send() class Actor(Task): def __init__(self,target_name): self.target_name = target_name def run(self): ... send(self.target_name, msg) ...
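The registry and the global send() can be tried out with a small stand-in for the Task class. The `Task` below is a hypothetical minimal version (a thread plus a queue) written only for this sketch, not the class built earlier in the workshop; `Collector` and the `None` shutdown message are also made up for the example.

```python
import queue
import threading

class Task:
    """Hypothetical minimal stand-in for the workshop's Task class."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self.run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()

    def send(self, msg):
        self._mailbox.put(msg)

    def recv(self):
        return self._mailbox.get()

_registry = {}

def register(name, actor):
    _registry[name] = actor

def unregister(name):
    del _registry[name]

def lookup(name):
    return _registry.get(name)

def send(target_name, msg):
    target = lookup(target_name)
    if target:
        target.send(msg)

class Collector(Task):
    """Example actor that records every message it receives."""
    def __init__(self):
        super().__init__()
        self.seen = []

    def run(self):
        while True:
            msg = self.recv()
            if msg is None:        # None is the shutdown signal in this sketch
                break
            self.seen.append(msg)

collector = Collector()
collector.start()
register("collector", collector)

send("collector", "hello")
send("nobody", "dropped")          # Unknown names are silently ignored
send("collector", None)
collector.join()
unregister("collector")
```

Senders only ever hold a name, so an actor can be replaced or restarted by re-registering under the same name.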

Slide 434

Slide 434 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.2 14

Slide 435

Slide 435 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Distributed Actors • To distribute actors, you additionally need to have some kind of IPC/networking component 15 process 1 process 2 • You can use the earlier messaging techniques • multiprocessing, ØMQ, etc.

Slide 436

Slide 436 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Distributed Send • To implement distributed actors, you must focus on send() and actor names 16 local process local actors remote process remote actors "a" "b" "c" "d" "e" "f" "g" send() • Essentially, send() has to seamlessly work with both local and remote actors

Slide 437

Slide 437 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Proxy Actors • Messages can be directed to a remote system through the use of a proxy 17 local process remote process remote actors "e" "f" "g" send() proxy "e" • A proxy receives messages for a remote actor and forwards them to a remote process "a" proxy "g" send()

Slide 438

Slide 438 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Proxy Implementation • Implementing a proxy 18 class ProxyTask(Task): def __init__(self,proxyname,target,conn): super().__init__(name="proxy") self.proxyname = proxyname self.target = target self.conn = conn def run(self): try: while True: msg = self.recv() self.conn.send((self.target,msg)) finally: unregister(self.proxyname) • Receives messages and forwards them on some kind of connection

Slide 439

Slide 439 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Creating Proxies 19 • Create a utility function for it from multiprocessing.connection import Client def proxy(proxyname,target,address,authkey): conn = Client(address,authkey=authkey) pxy = ProxyTask(proxyname,target,conn) pxy.start() register(proxyname,pxy) proxy("e","e",("localhost",15000),authkey=b"12345") proxy("ext:f","f",("localhost",15000),authkey=b"12345") # Send a message to a remote actor send("e","hello world") send("ext:f","hello world") • Creates a connection using multiprocessing and registers a proxy task to accept messages

Slide 440

Slide 440 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Dispatching • A dispatcher is needed to receive messages 20 local process actors "e" "f" "g" • The dispatcher is a server that accepts connections, receives messages, and forwards them to the local actors "a" dispatch proxy "e" proxy "g"

Slide 441

Slide 441 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Dispatcher 21 class DispatchClientTask(Task): def __init__(self,conn): super().__init__(name="dispatchclient") self.conn = conn def run(self): try: while True: target,msg = self.conn.recv() send(target,msg) finally: self.conn.close() • First, you need a task that receives messages from the proxy class • It just takes messages from the connection and sends them locally message handling

Slide 442

Slide 442 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Dispatcher 22 from multiprocessing.connection import Listener class DispatcherTask(Task): def __init__(self,address,authkey): super().__init__(name="dispatcher") self.address = address self.authkey = authkey def run(self): serv = Listener(self.address,authkey=self.authkey) while True: try: client = serv.accept() DispatchClientTask(client).start() except Exception as e: self.log.info("Error : %s", e, exc_info=True) • Next, you need a server that accepts connections • Launches a new client task for each connection connection handling

Slide 443

Slide 443 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Dispatcher 23 _dispatcher = None def start_dispatcher(address,authkey): global _dispatcher if _dispatcher: return _dispatcher = DispatcherTask(address,authkey) _dispatcher.start() • Finally, a function to start the dispatcher • Important point • There is usually only one dispatcher • A singleton

Slide 444

Slide 444 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Using the Dispatcher • You launch it and forget about it 24 start_dispatcher(("localhost",15000),authkey=b"12345") • It operates entirely in the background and doesn't interfere with other tasks • It just delivers outside messages
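The proxy and dispatcher pieces can be seen working together in a compressed sketch. Several simplifications are assumptions made here: both "processes" live in one interpreter and share one registry, `Task` is a hypothetical thread-plus-queue stand-in, the dispatcher handles a single connection, and `None` is used as a shutdown message.

```python
import queue
import threading
from multiprocessing.connection import Client, Listener

_registry = {}

def send(target_name, msg):
    target = _registry.get(target_name)
    if target:
        target.send(msg)

class Task:
    """Hypothetical minimal stand-in for the workshop's Task class."""
    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self.run, daemon=True)
    def start(self):
        self._thread.start()
    def join(self, timeout=None):
        self._thread.join(timeout)
    def send(self, msg):
        self._mailbox.put(msg)
    def recv(self):
        return self._mailbox.get()

class Collector(Task):
    """Plays the remote actor: records messages until it sees None."""
    def __init__(self):
        super().__init__()
        self.seen = []
    def run(self):
        while True:
            msg = self.recv()
            if msg is None:
                break
            self.seen.append(msg)

class ProxyTask(Task):
    """Forwards (target, msg) pairs over a connection."""
    def __init__(self, target, conn):
        super().__init__()
        self.target = target
        self.conn = conn
    def run(self):
        while True:
            msg = self.recv()
            self.conn.send((self.target, msg))
            if msg is None:            # Forward the shutdown, then stop
                break
        self.conn.close()

def dispatch_one(serv):
    """Accept one connection and deliver its messages to local actors."""
    conn = serv.accept()
    try:
        while True:
            target, msg = conn.recv()
            send(target, msg)
    except EOFError:                   # Proxy closed its end
        pass
    finally:
        conn.close()

authkey = b"12345"
serv = Listener(("localhost", 0), authkey=authkey)   # Port 0: any free port
dispatcher = threading.Thread(target=dispatch_one, args=(serv,), daemon=True)
dispatcher.start()

collector = Collector()
collector.start()
_registry["f"] = collector             # The real actor

proxy = ProxyTask("f", Client(serv.address, authkey=authkey))
proxy.start()
_registry["ext:f"] = proxy             # The proxy that stands in for it

send("ext:f", "hello world")           # Travels proxy -> socket -> dispatcher -> "f"
send("ext:f", None)
collector.join(timeout=5)
dispatcher.join(timeout=5)
serv.close()
```

The sender only calls `send("ext:f", ...)`; whether the name resolves to a local actor or a proxy is invisible to it.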

Slide 445

Slide 445 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Putting it All Together • Every process will have • A set of local actors/tasks • A dispatcher • A set of proxies for remote actors 25

Slide 446

Slide 446 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Big Picture 26 "b" "a" dispatch proxy proxy send() send() Incoming messages from remote actors Messages sent to remote actors • Each process local actors • Many parts working together • Note: Think about failure modes

Slide 447

Slide 447 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.3 27

Slide 448

Slide 448 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Actor-Actor Messaging • So far, actors have been sending messages directly to other actors 28 actor actor • This is fine, but what if you want to support some more advanced features? • Example : Replication, load balancing, etc.

Slide 449

Slide 449 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Actor-Actor Messaging • Example: An actor pool 29 actor actor actor actor pool • Message goes to one actor in the pool (selected round-robin, system load, or some other criteria)

Slide 450

Slide 450 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Problem • How much does the sender of a message have to know about its destination? • Does it have to know about that pool? • Does it have to do all of the routing? • Or should it be blissfully unaware? 30

Slide 451

Slide 451 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Brokers • Messages might be sent to an intermediary 31 p = Process(target=somefunc) actor actor actor actor pool • Broker is responsible for handling the message in some manner (selecting a target, routing, etc.) broker
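A minimal sketch of the broker idea, assuming round-robin selection over a fixed pool. The names RoundRobinBroker and RecordingActor are made up for illustration; the only thing assumed about actors is that they have a send() method.

```python
import itertools

class RecordingActor:
    """Stand-in actor that just records the messages it receives."""
    def __init__(self, name):
        self.name = name
        self.inbox = []

    def send(self, msg):
        self.inbox.append(msg)

class RoundRobinBroker:
    """Hypothetical broker: forwards each message to the next actor in a pool."""
    def __init__(self, pool):
        self._cycle = itertools.cycle(list(pool))

    def send(self, msg):
        # The sender only ever calls broker.send(); target selection happens here
        target = next(self._cycle)
        target.send(msg)

pool = [RecordingActor("a"), RecordingActor("b"), RecordingActor("c")]
broker = RoundRobinBroker(pool)
for n in range(6):
    broker.send(n)
```

Because the broker exposes the same send() interface as an actor, the sender does not need to know that a pool exists at all.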

Slide 452

Slide 452 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Message Brokers • Decoupling is an essential feature • sender doesn't have to know anything about how the broker operates • This kind of approach is essential for scaling and other features • Broker could transparently add/remove actors from the pool depending on load 32 p = Process(target=somefunc)

Slide 453

Slide 453 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Broker Downsides • Message brokers become critical parts that can never fail under any circumstance • Failure of a broker is far more serious than loss of a single node (e.g., it takes all 1000 servers offline instead) • Obvious solution (sic) is to add even more complexity (replicated brokers, managers for the brokers, etc.). 33 p = Process(target=somefunc)

Slide 454

Slide 454 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Reliability Assumptions • In distributed code, never assume that the mere act of sending a message is reliable • There are too many things that can go wrong in too many places • So, better to plan for the worst • We'll say more in a minute, but if a message must be delivered, you need to take extra steps to verify 34 p = Process(target=somefunc)

Slide 455

Slide 455 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.4 35

Slide 456

Slide 456 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Actor Addressing • Trying to keep track of actor locations is hard • Especially if you force programmers to do it manually by hard-coding everything • Better solution: Create a global registry for mapping actor names to dispatchers (hosts) • Basically, a shared table that tracks actor names and locations 36 p = Process(target=somefunc)

Slide 457

Slide 457 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Name Registry 37 p = Process(target=somefunc) process 1 process 2 "a" "b" "c" "d" "e" "f" "g" dispatcher dispatcher ("foo.com",15000) ("bar.com",16000) "a" : ("foo.com",15000) "b" : ("foo.com",15000) "c" : ("foo.com",15000) "d" : ("foo.com",15000) "e" : ("bar.com",16000) "f" : ("bar.com",16000) "g" : ("bar.com",16000) global registry

Slide 458

Slide 458 text

Registry Details
• In each process, the registry is consulted whenever send() is used to send a message to an unknown actor (maybe the registry knows)
• To implement the registry, you need to address the problem of managing state across the entire application
• Problem : Registry has to be available to all tasks, all machines, etc.

Slide 459

Slide 459 text

Building a Registry
• Registry is essentially just a centralized table
• A sensible option: Use a key-value store
• It's exactly what it sounds like: a dictionary
• You can easily build your own
• Or use an existing one : memcached, redis, CouchDB, MongoDB, Cassandra, etc.
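To make "just a dictionary" concrete, here is a minimal sketch of a registry table. The Registry class and its register()/lookup() methods are hypothetical names, and this version only lives inside one process. A real registry would sit behind a network service (or one of the stores listed above) so that every machine can reach it.

```python
import threading

class Registry:
    """Hypothetical in-memory table mapping actor names to dispatcher addresses."""
    def __init__(self):
        self._table = {}
        self._lock = threading.Lock()   # Several threads may register/look up at once

    def register(self, name, address):
        with self._lock:
            self._table[name] = address

    def lookup(self, name):
        with self._lock:
            return self._table.get(name)    # None if the actor is unknown

registry = Registry()
registry.register("a", ("foo.com", 15000))
registry.register("e", ("bar.com", 16000))
```

A send() to an unknown actor would call lookup() first and then route the message to the dispatcher at the returned address.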

Slide 460

Slide 460 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.5 40

Slide 461

Slide 461 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Key-value Stores • Key-value stores can be used for a variety of other purposes (more than just a registry) • Maintain system-wide configuration data • Store results from distributed calculation • Provide work queues • Etc. 41 p = Process(target=somefunc)

Slide 462

Slide 462 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Example: Results 42 p = Process(target=somefunc) actor actor actor • Obtaining results actor key-value DB get results • Example : Actor sends out some message that disappears into a "cloud" of other actors • Picks up results by watching the DB. request

Slide 463

Slide 463 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.6 43

Slide 464

Slide 464 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Request/Reply Messaging • With actors, communication is one way 44 p = Process(target=somefunc) actor actor send() • However, two-way messaging is also common client server request reply • Mainstay of client/server computing • It's also the most "tricky"

Slide 465

Slide 465 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Request/Reply Issues • Request/reply messaging adds connection state 45 p = Process(target=somefunc) send request wait request request wait reply reply client send reply server • If anything goes wrong, the connection may enter a deadlocked or invalid state

Slide 466

Slide 466 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Potential Problems • Failure modes for request/reply • Request message sent, but is lost • Server crashes before sending reply • Reply message gets lost • In all of these cases : client loses the connection or freezes waiting for a reply that never arrives 46 p = Process(target=somefunc)

Slide 467

Slide 467 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Fixing Lost Replies • Lost requests/replies can be fixed • Have client retry requests after timeout/crash • However, this opens up even more problems 47 p = Process(target=somefunc)

Slide 468

Slide 468 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Repeated Requests • Server may act upon duplicate requests • Maybe it was only the first reply that was lost 48 p = Process(target=somefunc) client • What if the request changes server state? • Example: "Do not hit reload or your credit card might be charged twice." server request reply lost??? request reply retry

Slide 469

Slide 469 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Repeated Replies • Client might get two replies (slow server) 49 p = Process(target=somefunc) client server request reply request reply retry request reply • Client should be smart enough to discard reply from the duplicated request Notice how retried request results in a duplicate reply Which request?

Slide 470

Slide 470 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Sequence Numbers • Duplicates fixed with sequence numbers 50 p = Process(target=somefunc) client server request reply request reply retry request reply • Included in every request/reply and used to detect duplicate transactions, etc. seq: 13 seq: 13 seq: 13 seq: 14 seq: 13 seq: 14 discard
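A toy sketch of both halves of the sequence number idea, with no networking involved. The Server and Client classes are invented for illustration: the server caches replies by sequence number so a retried request does not change state twice, and the client discards any reply it is no longer waiting for.

```python
class Server:
    """Toy server that remembers replies by sequence number."""
    def __init__(self):
        self.balance = 0
        self._replies = {}              # seq -> cached reply

    def handle(self, seq, amount):
        if seq in self._replies:        # Duplicate request: do not charge again
            return seq, self._replies[seq]
        self.balance += amount          # State changes once per sequence number
        self._replies[seq] = self.balance
        return seq, self.balance

class Client:
    """Toy client that discards replies it isn't waiting for."""
    def __init__(self):
        self.next_seq = 13
        self.pending = None
        self.accepted = []

    def new_request(self):
        seq = self.next_seq
        self.next_seq += 1
        self.pending = seq
        return seq

    def on_reply(self, seq, result):
        if seq != self.pending:         # Stale or duplicate reply: discard
            return False
        self.accepted.append((seq, result))
        self.pending = None
        return True

server = Server()
client = Client()
seq = client.new_request()              # seq 13
first = server.handle(seq, 100)         # Original request
retry = server.handle(seq, 100)         # Retry after a presumed timeout
```

In a real system the reply cache cannot grow forever, so some expiry policy would be needed. That detail is left out here.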

Slide 471

Slide 471 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Commentary • Using reliable messaging such as TCP means that you don't have to worry about any of this • Wrong. Dead wrong. • If you write code where a reply is expected, you have to account for failure • What if server dies unexpectedly (software crash, hardware failure, power-loss, etc.) 51 p = Process(target=somefunc)

Slide 472

Slide 472 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.7 52

Slide 473

Slide 473 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Remote Procedure Call • Remote invocation of procedures implemented on a server process 53 p = Process(target=somefunc) Server def foo(): ... def bar(): ... def spam(): ... Client s.foo() Client s.bar()

Slide 474

Slide 474 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Remote Procedure Call • RPC implementation uses a similar technique as used for distributed actors (dispatcher, proxies) 54 p = Process(target=somefunc) Server def foo(): ... def bar(): ... def spam(): ... Client s.foo() Client s.bar() Dispatcher proxy proxy

Slide 475

Slide 475 text

RPC Details
• RPC messages simply identify a method name and include method arguments
    # funcname = name of function
    # args     = tuple of positional args
    # kwargs   = dict of keyword args
    msg = (funcname, args, kwargs)    # Make an RPC message
    send(target, msg)                 # Send it somewhere
• In the server, just dispatch
    # Get a message
    funcname, args, kwargs = receive()
    # Look up the function and dispatch
    func = _functions[funcname]
    result = func(*args, **kwargs)
    send(sender, result)
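The dispatch step can be tried without any networking. This is a sketch under assumptions: register() and handle_rpc() are hypothetical helpers, and the ("ok", ...) / ("error", ...) reply shape is one possible convention, not something the slides prescribe.

```python
_functions = {}

def register(func):
    """Make func callable by name (hypothetical helper)."""
    _functions[func.__name__] = func
    return func

@register
def add(x, y):
    return x + y

def handle_rpc(msg):
    """Dispatch one (funcname, args, kwargs) message and build a reply tuple."""
    funcname, args, kwargs = msg
    try:
        func = _functions[funcname]
        return ("ok", func(*args, **kwargs))
    except Exception as e:
        # Report the failure to the caller instead of crashing the server loop
        return ("error", repr(e))

reply = handle_rpc(("add", (3, 5), {}))
```

Returning errors as data matters because the caller is in another process and cannot see a local traceback.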

Slide 476

Slide 476 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- RPC Cautions • RPC is a request/reply pattern • It has all of the reliability concerns discussed in the previous section • Missing replies • Crashed servers • Duplicate requests/replies • But it has other problems as well 56 p = Process(target=somefunc)

Slide 477

Slide 477 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- RPC Cautions • A major philosophy of RPC is that a remote procedure call should look exactly identical to a normal function call (user doesn't know) • Except that it's a flawed concept • Too much RPC leads to horrible performance (far worse than a local procedure) • Potential for partial system failures that are very difficult to debug and untangle 57 p = Process(target=somefunc)

Slide 478

Slide 478 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- RPC Cautions • It's very hard to scale to large systems (hard to incorporate features such as caching, fan-in, fan-out, monitoring, filtering, etc.) • Hard to maintain over time (software versions, API changes, etc.) • Rather than reconsider the design, there's a tendency to just keep pounding harder (resulting in even more complexity) 58 p = Process(target=somefunc)

Slide 479

Slide 479 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.8 59

Slide 480

Slide 480 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Distributed Objects • Objects live on a server (where they stay put) • Clients remotely invoke instance methods 60 p = Process(target=somefunc) Server Client a.spam() Client c.bar() instances a b c spam() bar()

Slide 481

Slide 481 text

Distributed Objects
• In principle, supporting distributed objects is similar to remote procedure call (RPC)
• But there is one really big difference
• Distributed objects involve the manipulation of state (instances) stored on the server
• With state comes extra complication (memory management, locking, persistence, etc.)

Slide 482

Slide 482 text

Server Instances
• Objects are defined by a normal class
    class Foo(object):
        def bar(self):
            ...
        def spam(self):
            ...
• On the server, various instances are created
    a = Foo()
    b = Foo()
    c = Foo()
• These are normal Python objects

Slide 483

Slide 483 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Providing Remote Access • We build upon everything so far • Distributed objects are similar to actors except that they have more methods than just send() • Method invocation is usually like RPC (methods return results) • To implement, you still need dispatchers, proxies, registry services, etc. 63 p = Process(target=somefunc)

Slide 484

Slide 484 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Server Dispatching • For remote access, a dispatcher is needed 64 p = Process(target=somefunc) Server instances a b c spam() bar() Dispatcher Client Requests • Exactly the same idea as with actors, RPC

Slide 485

Slide 485 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Request Messages • Incoming requests must identify both the instance and a method to execute 65 p = Process(target=somefunc) Server instances a b c spam() bar() Dispatcher Client Requests ("a","spam",...) "a" : a "b" : b "c" : c instance registry instance names and methods are embedded
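A sketch of how a dispatcher might resolve such a request, assuming the message is a (name, method, args, kwargs) tuple. The Foo methods and the dispatch() function are illustrative only.

```python
class Foo:
    def __init__(self, value):
        self.value = value

    def spam(self, n):
        return self.value * n

    def bar(self):
        return self.value

# Instance registry: maps names used in requests to live server objects
instances = {"a": Foo(2), "b": Foo(3), "c": Foo(5)}

def dispatch(request):
    """Find the named instance, then the named method, and call it."""
    name, methname, args, kwargs = request
    obj = instances[name]
    if methname.startswith("_"):
        # Don't let remote clients reach private or special methods
        raise AttributeError("private methods are not exposed")
    method = getattr(obj, methname)
    return method(*args, **kwargs)

result = dispatch(("a", "spam", (10,), {}))
```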

Slide 486

Slide 486 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Client Interfaces • Clients generally want to use the same programming interface as the class 66 p = Process(target=somefunc) class Foo(object): def bar(self): ... def spam(self): ... a b Server a.bar() b.spam() Client • Ideally, client code shouldn't even be aware of the server (looks like a normal instance) bar() spam()

Slide 487

Slide 487 text

Client Proxies
• To emulate the API, proxy classes are needed
    class FooProxy(object):
        def __init__(self,name,serveraddr):
            self.name = name
            self.conn = connect_to(serveraddr)
        def bar(self,*args):
            # send "bar" request to server
            # return result
            ...
        def spam(self,*args):
            # send "spam" request to server
            # return result
            ...
• The proxy has the same programming API as the original object (same methods)
• Proxy methods issue RPC requests to server
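Writing one proxy class per server class gets tedious. One possible alternative, sketched here as an assumption rather than the workshop's solution, is a generic proxy that builds request messages in __getattr__. The transport argument is a stand-in for whatever actually talks to the server.

```python
class Proxy:
    """Hypothetical generic proxy: any method call becomes a request tuple."""
    def __init__(self, name, transport):
        self._name = name
        self._transport = transport     # Callable: request tuple -> result

    def __getattr__(self, methname):
        # Only invoked for attributes that are not found normally
        if methname.startswith("_"):
            raise AttributeError(methname)
        def remote_method(*args, **kwargs):
            return self._transport((self._name, methname, args, kwargs))
        return remote_method

sent = []

def fake_transport(request):
    """Records the request instead of sending it over a connection."""
    sent.append(request)
    return "reply to %s.%s" % (request[0], request[1])

a = Proxy("a", fake_transport)
result = a.spam(1, 2, x=3)
```

The tradeoff is that a generic proxy cannot tell the caller which methods really exist until the server rejects the request.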

Slide 488

Slide 488 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Problem : Synchronization • In a distributed environment, many clients may be connected simultaneously • There might be server threads • May be concurrent access to the objects • Thus, you may need locking • All is lost (back to manipulating shared state) 68 p = Process(target=somefunc)

Slide 489

Slide 489 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Problem : Instance Creation • What happens if new instances get created on the server in response to requests? • How are they referenced by clients? • Who is responsible for managing them? • How long do they live? • Do they persist? (In a database) • Countless things can go wrong... 69 p = Process(target=somefunc)

Slide 490

Slide 490 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Problem : Reliability • What happens if the server crashes? (objects disappear and clients crash?) • Can software on the server be fixed/updated? • Can class definitions be modified? • API changes? 70 p = Process(target=somefunc)

Slide 491

Slide 491 text

Comments
• Using distributed objects is a really bad idea for most projects
• Massive amounts of added complexity, library dependencies, programming sophistication
• Example: I once had a consulting gig where I was supposed to analyze a one million line distributed C++ application. 95% of the code was related to distributed objects (and it sucked)

Slide 492

Slide 492 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Objects and RPC • Distributed objects have all of the same issues and limitations as RPC • But, with even more complexity! 72 p = Process(target=somefunc)

Slide 493

Slide 493 text

A Flawed Concept?
• Distributed object systems often follow the "objects all the way down" philosophy
[Diagram: objects on a single machine (Machine A) treated as the same as objects spread across Machine A, Machine B, and Machine C]
• If objects are perfectly encapsulated, they can live anywhere (magically, out in the cloud somewhere)
• Fine except that experience says it doesn't work

Slide 494

Slide 494 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- The Issue • Too much high-level abstraction? • Poor performance : Objects may interact in a suboptimal manner (excessive communication) • Partial failure : Part of the system dies, leaving the rest of it running, but not fully operational • Debugging and diagnostics? 74 p = Process(target=somefunc)

Slide 495

Slide 495 text

Some Resources
• Pyro (Python Remote Objects). A Python-centric distributed object framework. Assumes that you're only working in Python. Simplifies many tasks that are harder in other systems.
• CORBA. Distributed object framework designed for multiple languages. Look at: OmniORB, fnorb. Note: as far as I can tell CORBA is not hugely popular in the Python world (excessive complexity?)

Slide 496

Slide 496 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.9 76

Slide 497

Slide 497 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Interoperability • You may want parts of your distributed system to interoperate with other components • Possibly written in other languages • Possibly located elsewhere • Possibly implemented by someone else 77 p = Process(target=somefunc)

Slide 498

Slide 498 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Interoperability Tips • To connect to foreign systems, you really want to focus on well-documented standards • Use common data encodings (XML, JSON, etc.) • Use common protocols (HTTP, XML-RPC, etc.) 78 p = Process(target=somefunc)

Slide 499

Slide 499 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- XML-RPC • Remote Procedure Call • Uses HTTP as a transport protocol • Parameters/Results encoded in XML • Supported by languages other than Python 79

Slide 500

Slide 500 text

Simple XML-RPC
• How to create a stand-alone server
    from xmlrpc.server import SimpleXMLRPCServer

    def add(x,y):
        return x+y

    s = SimpleXMLRPCServer(("",8080))
    s.register_function(add)
    s.serve_forever()
• How to test it (xmlrpc.client)
    >>> from xmlrpc.client import ServerProxy
    >>> s = ServerProxy("http://localhost:8080")
    >>> s.add(3,5)
    8
    >>> s.add("Hello","World")
    'HelloWorld'
    >>>

Slide 501

Slide 501 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Simple XML-RPC • Adding multiple functions 81 from xmlrpc.server import SimpleXMLRPCServer s = SimpleXMLRPCServer(("",8080)) s.register_function(add) s.register_function(foo) s.register_function(bar) s.serve_forever() • It's fairly straightforward...

Slide 502

Slide 502 text

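XML-RPC Undercover
• Here's what gets sent for a request:
    s = ServerProxy("...")
    s.add(3,5)

    POST /RPC2 HTTP/1.0
    Content-Type: text/xml
    Content-Length: 187

    <?xml version='1.0'?>
    <methodCall>
    <methodName>add</methodName>
    <params>
    <param>
    <value><int>3</int></value>
    </param>
    <param>
    <value><int>5</int></value>
    </param>
    </params>
    </methodCall>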
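You can inspect the request body yourself without running a server. This small sketch uses the standard library's xmlrpc.client.dumps() and loads() functions to encode and decode the add(3,5) call. The HTTP headers are added separately by ServerProxy and are not shown here.

```python
import xmlrpc.client

# Encode a call to add(3, 5) as an XML-RPC request body
payload = xmlrpc.client.dumps((3, 5), methodname="add")
print(payload)

# Decode it again, as a server would
params, methodname = xmlrpc.client.loads(payload)
```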

Slide 503

Slide 503 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- XML-RPC Commentary • XML-RPC is extremely easy to use • Almost too easy to be honest • I have encountered a lot of major projects that are using XML-RPC for distributed control • Users seem to love it • I'm not so sure although I do love the quick and dirty hack aspect of it 83

Slide 504

Slide 504 text

Other RPC Libraries
• Some RPC libraries of interest.
• Thrift. A cross-language RPC framework developed by Facebook and released as open-source.
• Protocol Buffers. A cross-language RPC framework developed by Google. Also open-source.
• Both use much more efficient data serialization than XML-RPC (and have other features)

Slide 505

Slide 505 text

RESTful Services
• REST (Representational State Transfer)
• It's a data-centric software architecture where servers host data (resources) and implement methods for remotely interacting with the data
• Strongly tied to HTTP, but think about structured data instead of hacky HTML pages.

Slide 506

Slide 506 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- REST Resources • Core component of REST is a "resource" • A resource usually represents data • Resources have an associated identifier (URI) 86 p = Process(target=somefunc) http://somehost.com/someresource • The URI alone contains everything needed to locate and identify the resource (protocol, hostname, path, etc.)

Slide 507

Slide 507 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Resource Representation • Data associated with a resource is typically represented using a standard data encoding 87 p = Process(target=somefunc) • Common formats are used (XML, JSON, etc.) • May be multiple representations resource client representation

Slide 508

Slide 508 text

REST Actions
• Clients interact with servers and resources using a preset vocabulary of actions (verbs)
    GET resource
    PUT resource
    DELETE resource
    POST resource
    HEAD resource
• These are usually just HTTP methods
• PUT and DELETE are related to creating/updating and removing a resource (not common with browsers)

Slide 509

Slide 509 text

REST Examples
• Retrieving a resource (GET)
    Client request to the HTTP server:
        GET /some/resource
    Server response:
        HTTP/1.1 200 OK
        Content-type: application/xml
        ...

Slide 510

Slide 510 text

REST Examples
• Updating a resource (PUT)
    Client request to the HTTP server:
        PUT /some/resource
        Content-type: application/xml
        Content-length: 45123
        ...
    Server response:
        HTTP/1.1 200 OK
        ...
• Typically, this creates a new resource if it doesn't already exist on the server

Slide 511

Slide 511 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Stateless Implementation • REST services are stateless • Server does not record client state • GET, PUT, etc. are the only operations • May occur in any order and at any time • It's a critical feature of the architecture • May have multiple servers (heavy load) • Fault handling (if a server crashes, etc.) 91 p = Process(target=somefunc)

Slide 512

Slide 512 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Reuse of HTTP • REST web services build upon HTTP • Authentication/security • Caching • Proxies • Integrates well with existing software • HTTP servers • Middleware libraries • Almost anything that speaks HTTP 92 p = Process(target=somefunc)

Slide 513

Slide 513 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Implementing REST • You typically build a REST service using the same techniques for other web programming • CGI scripting • WSGI • Web frameworks (Django, Zope, etc.) • Stand-alone HTTP server • My preference: WSGI + WebOb 93 p = Process(target=somefunc)
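A minimal sketch of a REST-style service written as a plain WSGI application, using only the standard library (no WebOb). The names rest_app, call, and _resources are made up for illustration, JSON is chosen as the representation, and storage is an in-memory dict, so this is a toy and not a production design.

```python
import io
import json

_resources = {}     # path -> stored JSON-compatible object (in-memory only)

def rest_app(environ, start_response):
    """Tiny WSGI app: GET/PUT/DELETE on arbitrary paths."""
    path = environ["PATH_INFO"]
    method = environ["REQUEST_METHOD"]
    text = [("Content-Type", "text/plain")]
    if method == "GET":
        if path not in _resources:
            start_response("404 Not Found", text)
            return [b"not found"]
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps(_resources[path]).encode("utf-8")]
    if method == "PUT":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        _resources[path] = json.loads(body.decode("utf-8"))   # Create or update
        start_response("200 OK", text)
        return [b"stored"]
    if method == "DELETE":
        _resources.pop(path, None)
        start_response("200 OK", text)
        return [b"deleted"]
    start_response("405 Method Not Allowed", text)
    return [b"method not allowed"]

def call(method, path, body=b""):
    """Invoke the app directly, without a network (handy for experiments)."""
    status = []
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(body)), "wsgi.input": io.BytesIO(body)}
    chunks = rest_app(environ, lambda s, headers: status.append(s))
    return status[0], b"".join(chunks)

put_result = call("PUT", "/some/resource", b'{"x": 1}')
get_result = call("GET", "/some/resource")
```

To serve it for real, something like wsgiref.simple_server.make_server("", 8080, rest_app).serve_forever() should work. Note that the server keeps no per-client state: each request carries everything needed to process it.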

Slide 514

Slide 514 text

REST Links
• Too many packages to list (all Python 2)
• restlib
• restkit
• restish
• Many others on PyPI
• Note: Don't confuse these with packages related to reStructuredText (reST)

Slide 515

Slide 515 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 6- Exercise dist.10 95

Slide 516

Slide 516 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Generators and Coroutines 1 Section 7

Slide 517

Slide 517 text

Introduction
• In this section we look at generators and coroutines as a concurrency tool
• These features of Python are not as well understood as other language elements
• However, they can be used as an alternative implementation tool for various aspects of distributed computing and I/O handling

Slide 518

Slide 518 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Reference Material • I've given some related PyCon tutorials • "Generator Tricks for Systems Programmers" at PyCON'08 3 http://www.dabeaz.com/generators • "A Curious Course on Coroutines and Concurrency" at PyCON'09 http://www.dabeaz.com/coroutines • This is a highly condensed version

Slide 519

Slide 519 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Background Material 4

Slide 520

Slide 520 text

Generators
• A generator is a function that produces a sequence of results instead of a single value
    def countdown(n):
        while n > 0:
            yield n
            n -= 1

    >>> for i in countdown(5):
    ...     print(i,end=' ')
    ...
    5 4 3 2 1 >>>
• Instead of returning a value, you generate a series of values (using the yield statement)
• Typically, you hook it up to a for-loop

Slide 521

Slide 521 text

Generators
• Behavior is quite different from a normal function
• Calling a generator function creates a generator object. However, it does not start running the function.
    def countdown(n):
        print("Counting down from", n)
        while n > 0:
            yield n
            n -= 1

    >>> x = countdown(10)      # Notice that no output was produced
    >>> x
    <generator object countdown at 0x...>
    >>>

Slide 522

Slide 522 text

Generator Functions
• The function only executes on __next__()
    >>> x = countdown(10)
    >>> x
    <generator object countdown at 0x...>
    >>> x.__next__()           # Function starts executing here
    Counting down from 10
    10
    >>>
• yield produces a value, but suspends the function
• Function resumes on next call to __next__()
    >>> x.__next__()
    9
    >>> x.__next__()
    8
    >>>

Slide 523

Slide 523 text

Generator Functions
• When the generator returns, iteration stops
    >>> x.__next__()
    1
    >>> x.__next__()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>>
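The StopIteration exception is what a for-loop watches for. The sketch below spells out roughly what a for-loop does with a generator. The manual_loop() helper is made up for illustration.

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

def manual_loop(gen):
    """Roughly what `for x in gen` does: call next() until StopIteration."""
    results = []
    it = iter(gen)              # For a generator, iter() returns the generator itself
    while True:
        try:
            x = next(it)        # Same as it.__next__()
        except StopIteration:
            break               # The generator returned; the loop ends quietly
        results.append(x)
    return results

values = manual_loop(countdown(3))
```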

Slide 524

Slide 524 text

A Practical Example
• A Python version of Unix 'tail -f'
    import time
    def follow(thefile):
        thefile.seek(0,2)      # Go to the end of the file
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1)    # Sleep briefly
                continue
            yield line
• Example use : Watch a web-server log file
    logfile = open("access-log")
    for line in follow(logfile):
        print(line)

Slide 525

Slide 525 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Generators as Pipelines • One of the most powerful applications of generators is setting up processing pipelines • Similar to shell pipes in Unix 10 generator input sequence for x in s: generator generator • Idea: You can stack a series of generator functions together into a pipe and pull items through it with a for-loop

Slide 526

Slide 526 text

A Pipeline Example
• Print all server log entries containing 'python'
    def grep(pattern,lines):
        for line in lines:
            if pattern in line:
                yield line

    # Set up a processing pipe : tail -f | grep python
    logfile = open("access-log")
    loglines = follow(logfile)
    pylines = grep("python",loglines)

    # Pull results out of the processing pipeline
    for line in pylines:
        print(line)
• This is just a small taste of what's possible
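Since follow() never terminates, it's awkward to experiment with. Here is the same pipeline idea driven by a finite list, so it can be run to completion. The strip_newlines() stage and the sample log lines are invented for illustration.

```python
def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

def strip_newlines(lines):
    # An extra stage, to show that stages stack freely
    for line in lines:
        yield line.rstrip("\n")

log = ["GET /python.html\n", "GET /index.html\n", "GET /python/tutorial\n"]

# Nothing runs until the final consumer pulls items through the pipe
pylines = strip_newlines(grep("python", log))
matches = list(pylines)
```

Any iterable of lines works as the source: a list, an open file, or the follow() generator.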

Slide 527

Slide 527 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Exercise gen.1 12

Slide 528

Slide 528 text

Yield as an Expression
• In Python 2.5, a slight modification to the yield statement was introduced (PEP-342)
• You could now use yield as an expression
• For example, on the right side of an assignment
    def grep(pattern):
        print("Looking for", pattern)
        while True:
            line = yield
            if pattern in line:
                print(line)
• Question : What is its value?

Slide 529

Slide 529 text

Coroutines
• If you use yield like this, you get a "coroutine"
• These do more than just generate values
• Instead, functions can consume values sent to them
    >>> g = grep("python")
    >>> next(g)        # Prime it (explained shortly)
    Looking for python
    >>> g.send("Yeah, but no, but yeah, but no")
    >>> g.send("A series of tubes")
    >>> g.send("python generators rock!")
    python generators rock!
    >>>
• Sent values are returned by (yield)

Slide 530

Slide 530 text

Coroutine Execution
• Execution is the same as for a generator
• When you call a coroutine, nothing happens
• They only run in response to next() and send()
    >>> g = grep("python")     # Notice that no output was produced
    >>> next(g)                # On first operation, coroutine starts running
    Looking for python
    >>>

Slide 531

Slide 531 text

Coroutine Priming
• All coroutines must be "primed" by first calling next() (or send(None))
• This advances execution to the location of the first yield expression
    def grep(pattern):
        print("Looking for", pattern)
        while True:
            line = yield       # next() advances the coroutine to this first yield expression
            if pattern in line:
                print(line)
• At this point, it's ready to receive a value

Slide 532

Slide 532 text

Using a Decorator
• It's easy to forget to call next()
• Solved by wrapping coroutines with a decorator
    def coroutine(func):
        def start(*args,**kwargs):
            cr = func(*args,**kwargs)
            next(cr)
            return cr
        return start

    @coroutine
    def grep(pattern):
        ...

Slide 533

Slide 533 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Processing Pipelines 18 • Coroutines can also be used to set up pipes coroutine coroutine coroutine send() send() send() • You just chain coroutines together and push data through the pipe with send() operations • Notice the striking similarity to actors

Slide 534

Slide 534 text

An Example
• A source that mimics Unix 'tail -f'
    import time
    def follow(thefile, target):
        thefile.seek(0,2)      # Go to the end of the file
        while True:
            line = thefile.readline()
            if not line:
                time.sleep(0.1)    # Sleep briefly
                continue
            target.send(line)
• A sink that just prints the lines
    @coroutine
    def printer():
        while True:
            line = yield
            print(line)

Slide 535

Slide 535 text

An Example
• A grep filter coroutine
    @coroutine
    def grep(pattern,target):
        while True:
            line = yield           # Receive a line
            if pattern in line:
                target.send(line)  # Send to next stage
• Hooking it up
    f = open("access-log")
    follow(f, grep('python', printer()))
• A picture
    follow() --send()--> grep() --send()--> printer()
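Here is a self-contained version of the push pipeline that terminates, which makes it easier to experiment with. As assumptions for the sake of a runnable example, the endless follow() source is replaced with a feed() function over a list, and the printer() sink is replaced with a collector() that appends to a list.

```python
def coroutine(func):
    """Decorator that primes a coroutine by advancing it to its first yield."""
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start

@coroutine
def grep(pattern, target):
    while True:
        line = yield                # Receive a line
        if pattern in line:
            target.send(line)       # Send to next stage

@coroutine
def collector(results):
    """Sink that stores lines instead of printing them."""
    while True:
        line = yield
        results.append(line)

def feed(lines, target):
    """Finite source: push each line into the pipeline."""
    for line in lines:
        target.send(line)

results = []
feed(["python rocks", "a series of tubes", "more python"],
     grep("python", collector(results)))
```

Compare with the generator version: there the consumer pulls data through with a for-loop, while here the source pushes data in with send().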

Slide 536

Slide 536 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Exercise gen.2 21

Slide 537

Slide 537 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Generators as Tasks 22 • Generators and coroutines can clearly be used to set up problems in pipelining, dataflow, actors, etc. • However, generators can also serve the role of tasks as an alternative to threads or processes • It's subtle, but let's look at the big idea

Slide 538

Slide 538 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Program Execution 23 • When programs run, they alternate between CPU processing and I/O • For I/O, a program requests the services of the operating system (system calls) • I/O may cause the program to suspend run run run run I/O I/O I/O System calls in the operating system

Slide 539

Slide 539 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- Task Switching 24 • Underneath the covers the operating system task-switches on I/O run I/O run I/O run I/O run I/O I/O run Task A: Task B: task switch • Since I/O operations might take awhile, the system does other work while waiting

Slide 540

Slide 540 text

Copyright (C) 2011, David Beazley, http://www.dabeaz.com 7- An Insight 25 • The yield statement can be used to implement user-defined "task" switching • When a generator function hits a "yield" statement, it immediately suspends execution • If you are very clever, you can get your program to task switch between a collection of generator functions

Slide 541

Slide 541 text

Multitasking Example
• First, you set up a collection of "tasks"
    def countdown_task(n):
        while n > 0:
            print(n)
            yield
            n -= 1

    # A queue of tasks to run
    from collections import deque
    tasks = deque([
        countdown_task(5),
        countdown_task(10),
        countdown_task(15)
    ])
• Each task is a generator function that yields

Slide 542

Slide 542 text

Scheduling Example

• Now, write a simple task scheduler

    def scheduler(tasks):
        while tasks:
            task = tasks.popleft()
            try:
                next(task)             # Run to the next yield
                tasks.append(task)     # Reschedule
            except StopIteration:
                pass

    # Run it
    scheduler(tasks)

• This loop will just run all of the generators (cycling between them) until there's nothing left to work on

Slide 543

Slide 543 text

Scheduling Example

• Output

    5
    10
    15
    4
    9
    14
    3
    8
    13
    ...

• You'll see the different tasks cycling
• Okay, that's kind of interesting...
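To see what happens when tasks finish at different times, here is a small variation on the round-robin scheduler. It is only a sketch: the `log` argument is an addition for illustration (the slide's `countdown_task` prints instead), so that the interleaving can be inspected afterwards.

```python
from collections import deque

def countdown_task(n, log):
    while n > 0:
        log.append(n)          # Record the value instead of printing it
        yield                  # Give up control to the scheduler
        n -= 1

def scheduler(tasks):
    while tasks:
        task = tasks.popleft()
        try:
            next(task)             # Run to the next yield
            tasks.append(task)     # Reschedule
        except StopIteration:
            pass                   # Task finished: do not reschedule

log = []
scheduler(deque([countdown_task(2, log), countdown_task(3, log)]))
print(log)   # [2, 3, 1, 2, 1]
```

The shorter task drops out after its second step, and the remaining task keeps running on its own until it is also exhausted.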

Slide 544

Slide 544 text

Exercise gen.3

Slide 545

Slide 545 text

Yielding For I/O

• If you are a little clever, you can have yield integrate with "blocking" I/O requests
• The big idea: set up some kind of operation and then yield to have it carried out in the background by the generator scheduler

Slide 546

Slide 546 text

More About Yield

• In the last section, we used yield to yield control, not a value
• Although generators are used for iteration, we're talking about something completely different here
• When yielding, control goes back to the scheduler, which is free to choose what task to run next

Slide 547

Slide 547 text

Talking to the Scheduler

• Tasks can send values back to the scheduler by yielding an "interesting" value
• Consider the following classes

    class IOWait:
        def __init__(self, f):
            self.fileno = f.fileno()

    class ReadWait(IOWait): pass
    class WriteWait(IOWait): pass

• These classes represent the concept of "waiting" for a specific kind of I/O event on a given file object

Slide 548

Slide 548 text

An Example Task

• Now, consider this generator function

    # Echo data received on s back to the sender
    def echo_data(s):
        while True:
            yield ReadWait(s)       # Wait for data
            msg = s.recv(16384)     # Read data
            yield WriteWait(s)      # Wait for writing
            s.send(msg)

• This generator yields instances of the classes just defined back to the scheduler
• Now, let's go back to the scheduler code...

Slide 549

Slide 549 text

Signaling an I/O Request

• Here is the scheduler

    def scheduler(tasks):
        while tasks:
            task = tasks.popleft()
            try:
                next(task)
                tasks.append(task)
            except StopIteration:
                pass

    # Run it
    scheduler(tasks)

• When the task yields, an instance of ReadWait or WriteWait is going to be returned by next()

    # Task
    def echo_data(s):
        ...
        yield ReadWait(s)
        ...
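The point that next() hands the yielded object to the caller can be checked by stepping the task by hand. This is a sketch under one assumption: `socket.socketpair()` is used purely as a convenient pair of connected sockets for the demonstration, and the caller plays the role of the scheduler.

```python
import socket

class IOWait:
    def __init__(self, f):
        self.fileno = f.fileno()

class ReadWait(IOWait): pass
class WriteWait(IOWait): pass

def echo_data(s):
    while True:
        yield ReadWait(s)       # Wait for data
        msg = s.recv(16384)     # Read data
        yield WriteWait(s)      # Wait for writing
        s.send(msg)

a, b = socket.socketpair()      # Connected sockets, demo only
task = echo_data(a)

r1 = next(task)                 # Runs the task up to its first yield
is_read = isinstance(r1, ReadWait)
same_fd = (r1.fileno == a.fileno())

b.send(b"hello")                # Make data available so recv() will not block
r2 = next(task)                 # Task calls recv(), then yields a WriteWait
is_write = isinstance(r2, WriteWait)

next(task)                      # Task sends the data back, then waits to read again
reply = b.recv(16384)
a.close()
b.close()
```

Each call to next() returns exactly the object the task yielded, which is what lets a scheduler decide where to park the task.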

Slide 550

Slide 550 text

Signaling an I/O Request

• A modified scheduler

    def scheduler(tasks):
        while tasks:
            task = tasks.popleft()
            try:
                r = next(task)
                # Look for different I/O wait requests and take action
                if isinstance(r, ReadWait):
                    handle_read_wait(r, task)
                elif isinstance(r, WriteWait):
                    handle_write_wait(r, task)
                else:
                    tasks.append(task)
            except StopIteration:
                pass

    # Run it
    scheduler(tasks)

Slide 551

Slide 551 text

Implementing I/O Waits

• We haven't built the I/O yet, but it's easy
• To implement I/O waiting, you need two pieces
    • A holding area for tasks that are waiting for an I/O operation
    • An I/O poller that looks for I/O activity and removes tasks from the holding area when I/O is possible
• Let's look at an example
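The two pieces can be sketched in isolation before they are folded into a scheduler class. This is an illustration only: the string "task-A" stands in for a suspended generator, `socket.socketpair()` supplies a pair of connected sockets, and the `timeout` parameter of `iopoll()` is added here so the demonstration cannot hang.

```python
import socket
from select import select

a, b = socket.socketpair()

# Holding area: file descriptor -> task waiting to read from it
read_waiting = {a.fileno(): "task-A"}
ready = []                                  # Tasks that can run again

def iopoll(timeout):
    # I/O poller: move tasks whose descriptors are readable back to ready
    rset, _, _ = select(list(read_waiting), [], [], timeout)
    for fd in rset:
        ready.append(read_waiting.pop(fd))

iopoll(0)                   # Nothing has been sent yet, so nothing is readable
before = list(ready)

b.send(b"x")                # Now there is data waiting on a
iopoll(1.0)
after = list(ready)

a.close()
b.close()
```

The task leaves the holding area only once select() reports that its descriptor is readable, so resuming it will not block.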

Slide 552

Slide 552 text

Building a Scheduler

• A scheduler class

    from collections import deque
    from select import select

    class Scheduler:
        def __init__(self):
            self.numtasks = 0
            self.ready = deque()
            self.read_waiting = {}
            self.write_waiting = {}

        def iopoll(self):
            rset, wset, eset = select(self.read_waiting,
                                      self.write_waiting, [])
            for r in rset:
                self.ready.append(self.read_waiting.pop(r))
            for w in wset:
                self.ready.append(self.write_waiting.pop(w))

Slide 553

Slide 553 text

Building a Scheduler (same scheduler class, annotated)

• self.numtasks: total number of tasks being managed
• self.ready: queue of tasks that can run

Slide 554

Slide 554 text

Building a Scheduler (same scheduler class, annotated)

• self.read_waiting and self.write_waiting: dictionaries that serve as I/O holding areas
• Each dictionary maps a file descriptor to the task waiting on it, e.g. { 3 : task, 7 : task, 6 : task }

Slide 555

Slide 555 text

Building a Scheduler (same scheduler class, annotated)

• iopoll(): an I/O polling function. This looks for any I/O activity on suspended tasks. If there is I/O, it moves the task back to the ready queue

Slide 556

Slide 556 text

Building a Scheduler

• Add some scheduling methods (convenience)

    class Scheduler:
        def __init__(self):
            self.numtasks = 0
            self.ready = deque()
            self.read_waiting = {}
            self.write_waiting = {}
        ...
        def new(self, task):
            self.ready.append(task)
            self.numtasks += 1

        def readwait(self, fileno, task):
            self.read_waiting[fileno] = task

        def writewait(self, fileno, task):
            self.write_waiting[fileno] = task

Slide 557

Slide 557 text

Building a Scheduler

• Implement the main scheduler loop

    class Scheduler:
        ...
        def run(self):
            while self.numtasks:
                try:
                    task = self.ready.popleft()
                    try:
                        r = next(task)
                        if isinstance(r, ReadWait):
                            self.readwait(r.fileno, task)
                        elif isinstance(r, WriteWait):
                            self.writewait(r.fileno, task)
                        else:
                            self.ready.append(task)
                    except StopIteration:
                        self.numtasks -= 1
                except IndexError:
                    self.iopoll()

Slide 558

Slide 558 text

Building a Scheduler (same main loop, annotated)

• r = next(task) and the isinstance() checks that follow: run a task until it yields, then check the return value

Slide 559

Slide 559 text

Building a Scheduler (same main loop, annotated)

• except IndexError: self.iopoll() polls for I/O (only runs if there is no other work to do, i.e. the ready queue is empty)

Slide 560

Slide 560 text

Example : Time Server

    from socket import socket, AF_INET, SOCK_DGRAM
    import time

    def timeserver(addr):
        s = socket(AF_INET, SOCK_DGRAM)
        s.bind(addr)
        while True:
            yield ReadWait(s)
            msg, addr = s.recvfrom(8192)
            yield WriteWait(s)
            s.sendto((time.ctime() + "\n").encode('ascii'), addr)

    sched = Scheduler()
    sched.new(timeserver(('', 15000)))    # Create three server
    sched.new(timeserver(('', 16000)))    # instances and add
    sched.new(timeserver(('', 17000)))    # to the scheduler
    sched.run()

Slide 561

Slide 561 text

Example : Echo Server

    from socket import socket, AF_INET, SOCK_STREAM

    class EchoServer:
        def __init__(self, addr, sched):
            self.sched = sched
            sched.new(self.server_loop(addr))

        def server_loop(self, addr):
            s = socket(AF_INET, SOCK_STREAM)
            s.bind(addr)
            s.listen(5)
            while True:
                yield ReadWait(s)             # Wait for a connection
                c, a = s.accept()
                print("Got connection from", a)
                self.sched.new(self.client_handler(c))

        def client_handler(self, client):
            while True:
                yield ReadWait(client)
                msg = client.recv(8192)
                if not msg:                   # Empty read: client disconnected
                    break
                yield WriteWait(client)
                client.send(msg)
            client.close()
            print("Client closed")

Slide 562

Slide 562 text

Example : Echo Server

• Running the echo server

    sched = Scheduler()
    echo = EchoServer(('', 15000), sched)
    sched.run()

• Test it out with telnet
• You will find that it works fine with multiple clients
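If telnet is not handy, the whole design can be exercised in one process. The sketch below rests on several assumptions: `socket.socketpair()` replaces the listening socket, `test_client()` is a hypothetical task that plays the part of the telnet session, and the run loop checks for an empty ready queue explicitly instead of catching IndexError as the slides do.

```python
import socket
from collections import deque
from select import select

class IOWait:
    def __init__(self, f):
        self.fileno = f.fileno()

class ReadWait(IOWait): pass
class WriteWait(IOWait): pass

class Scheduler:
    def __init__(self):
        self.numtasks = 0
        self.ready = deque()
        self.read_waiting = {}
        self.write_waiting = {}

    def new(self, task):
        self.ready.append(task)
        self.numtasks += 1

    def iopoll(self):
        rset, wset, _ = select(list(self.read_waiting),
                               list(self.write_waiting), [])
        for r in rset:
            self.ready.append(self.read_waiting.pop(r))
        for w in wset:
            self.ready.append(self.write_waiting.pop(w))

    def run(self):
        while self.numtasks:
            if not self.ready:            # No runnable task: wait for I/O
                self.iopoll()
                continue
            task = self.ready.popleft()
            try:
                r = next(task)            # Run the task until it yields
            except StopIteration:
                self.numtasks -= 1
                continue
            if isinstance(r, ReadWait):
                self.read_waiting[r.fileno] = task
            elif isinstance(r, WriteWait):
                self.write_waiting[r.fileno] = task
            else:
                self.ready.append(task)

def echo_handler(client):
    while True:
        yield ReadWait(client)
        msg = client.recv(8192)
        if not msg:                       # Peer closed the connection
            break
        yield WriteWait(client)
        client.send(msg)
    client.close()

def test_client(sock, messages, replies):
    # Hypothetical client task: send each message, collect the echo
    for m in messages:
        yield WriteWait(sock)
        sock.send(m)
        yield ReadWait(sock)
        replies.append(sock.recv(8192))
    sock.close()

server_side, client_side = socket.socketpair()
replies = []
sched = Scheduler()
sched.new(echo_handler(server_side))
sched.new(test_client(client_side, [b"hello", b"world"], replies))
sched.run()
```

Both tasks run to completion: when the client closes its socket, the handler sees an empty read, exits its loop, and the scheduler stops once the task count reaches zero.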

Slide 563

Slide 563 text

Exercise gen.4

Slide 564

Slide 564 text

Comments

• Multitasking with generators has some interesting aspects to it
• First, it's based on I/O polling, just like event-driven I/O systems
• However, the execution model takes a completely different direction
• Instead of triggering callbacks, I/O events merely cause suspended generators to resume

Slide 565

Slide 565 text

Comments

• Generators have normal looking control flow

    class EchoServer:
        ...
        def client_handler(self, client):
            while True:
                yield ReadWait(client)
                msg = client.recv(8192)
                if not msg:
                    break
                yield WriteWait(client)
                client.send(msg)
            client.close()
            print("Client closed")

• Notice how it closely mimics what you would write with threads

Slide 566

Slide 566 text

Problems

• The yield statement can only be used in the top-level task function, not in subroutines it calls (a yield inside a subroutine just turns that subroutine into a separate generator)
• This makes it really difficult to write subroutine libraries based on generators
• It's not impossible, but you have to play some clever tricks (e.g., generator "trampolining")
• Being addressed in PEP 380 (the "yield from" syntax)
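For readers on Python 3.3 or later, where PEP 380 is available, the subroutine problem can be sketched as follows. This is an illustration rather than workshop code: `step()` and `task()` are hypothetical names, and the scheduler is the simple round-robin loop from earlier in this section.

```python
from collections import deque

def step(name, n, log):
    # A "subroutine" that needs to give up control n times
    for i in range(n):
        log.append((name, i))
        yield
    return n                      # Becomes the value of the yield from expression

def task(name, log):
    # yield from passes every yield inside step() up to the scheduler
    total = yield from step(name, 2, log)
    log.append((name, "done", total))

def scheduler(tasks):
    while tasks:
        t = tasks.popleft()
        try:
            next(t)
            tasks.append(t)
        except StopIteration:
            pass

log = []
scheduler(deque([task("A", log), task("B", log)]))
```

The two tasks still interleave step by step, even though the yield statements live in a helper rather than in the task functions themselves.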

Slide 567

Slide 567 text

Problems

• Generators also have the same sorts of problems as event-driven systems
    • Scalability of I/O polling
    • Long running calculations
    • How to handle blocking operations
• Solutions are similar. For example, to handle a blocking operation, you might run it in a separate thread until it completes
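One way the thread idea might look is sketched below. It is deliberately simplistic and built on assumptions: `blocking_op()` is a stand-in for a real blocking call, the work is handed to a `concurrent.futures.ThreadPoolExecutor`, and the task busy-polls the future by yielding until it is done. A real scheduler would park the task and wake it on completion instead of spinning.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def blocking_op(x):
    time.sleep(0.1)               # Stand-in for a blocking operation
    return x * 2

def worker(x, results):
    fut = pool.submit(blocking_op, x)     # Run the blocking call in a thread
    while not fut.done():
        yield                             # Let other tasks run in the meantime
    results.append(fut.result())

def ticker(n, ticks):
    for i in range(n):
        ticks.append(i)
        yield

def scheduler(tasks):
    while tasks:
        task = tasks.popleft()
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass

results, ticks = [], []
scheduler(deque([worker(21, results), ticker(3, ticks)]))
pool.shutdown()
```

The ticker task finishes its steps while the worker is still waiting, which shows that the blocking call no longer stalls the other generators.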

Slide 568

Slide 568 text

Some Links

• Some related projects (not an exhaustive list)
    • gevent
    • Cogen
    • Greenlet
    • Eventlet
    • Stackless
• Do a search on http://pypi.python.org

Slide 569

Slide 569 text

More Information

• Look at my PyCon'09 presentation for a more in-depth study of coroutines and concurrency: http://www.dabeaz.com/coroutines
• I've intentionally not included all of that here, mainly because I don't want to just duplicate my PyCon presentation (which is freely available online) and because we're probably short on time