There are going to be a lot of moving parts
• Make sure you can cut/paste code
• Make sure you look at solution code
• Make sure you copy solution code as needed
There is a set of support files and exercises (concurrent.zip):
  http://www.dabeaz.com/python/concurrent2011/
• We are using Python 3.2
• Optional installs:
  • numpy
  • ZeroMQ and pyzmq
  • Redis
Concurrency is a topic that's on a lot of programmers' minds
• Multicore CPUs, clusters, distributed computing, cloud computing, etc.
• Increased interest in concurrent and functional programming languages (e.g., Erlang, Scala, Clojure, Haskell)
Concurrent programming is cool (and fun)
• In fact, it's the whole reason I went into CS
• As a book author, I'm interested in exploring different ways to understand and present advanced programming concepts
• Eventually, this workshop will become a book, but that's for later (I want your feedback)
Concurrent programs are programs that work on more than one thing at a time--possibly spread out over a whole cluster of machines
• Example: A network server that communicates with several hundred clients, all connected at once
• Example: A big number-crunching job that spreads its work across hundreds of CPUs
On a single machine, concurrency typically implies "multitasking"
  [diagram: Task A and Task B alternate run periods, separated by task switches]
• With a single CPU, the operating system rapidly switches back and forth between tasks
You may have parallelism (many CPUs)
  [diagram: Task A runs on CPU 1 while Task B runs on CPU 2]
• Here, you often get simultaneous task execution
• Note: If the total number of tasks exceeds the number of CPUs, then each CPU also multitasks
Tasks may run in the same memory space
  [diagram: Task A on CPU 1 writes an object that Task B on CPU 2 reads]
• Simultaneous access to objects
• Often a source of unspeakable peril
Tasks might run in separate processes
  [diagram: Task A and Task B run in separate processes, connected by IPC]
• Processes coordinate using IPC
• Pipes, FIFOs, memory-mapped regions, etc.
Tasks may be running on distributed systems
  [diagram: Task A and Task B run on separate machines and exchange messages]
• For example, a cluster of workstations
• Or servers out in the "cloud"
Python is interpreted
• Frankly, it doesn't seem like a natural match for this kind of programming
• Isn't this serious business best left to more "serious" programming languages?
  "What the hardware giveth, the software taketh away."
Is using Python even appropriate?
• Traditionally, distributed computing and parallelism have only been available to programmers working at elite institutions with deep pockets (and they don't tend to use "hobby" languages)
• Times change. Cheap machines have multiple CPU cores. Anyone can purchase CPU time out in the "cloud" (even my mom).
Why Python at all?
• It's very high level
• And it comes with a large library
  • Useful data types (dictionaries, lists, etc.)
  • Network protocols
  • Text parsing (regexes, XML, HTML, etc.)
  • Files and the file system
  • Databases
• Python programmers think it's awesome
• Python is often used as a high-level framework
• The various components might be a mix of languages (Python, C, C++, etc.)
• Concurrency may be a core part of the framework's overall architecture
• Python has to deal with it even if a lot of the underlying processing is going on in C
Programmers are often able to get complex systems to "work" in much less time using a high-level language like Python than if they spend all of their time hacking C code.
  "The best performance improvement is the transition from the nonworking to the working state." - John Ousterhout
  "You can always optimize it later." - Unknown
  "Premature optimization is the root of all evil." - Donald Knuth
Many programs are "I/O bound"
• They spend virtually all of their time sitting around waiting
• Python can "wait" just as fast as C (maybe even faster--although I haven't measured it)
• If there's not much processing, who cares if it's being done in an interpreter? (One exception: if you need an extremely rapid response time, as in real-time systems)
Of course, there are special cases where you probably would not want to use Python
• Examples:
  • Flight avionics
  • Nuclear power plants
  • Sharks with laser beams
• Fine, we won't be talking about that. There are still many other applications where Python makes a lot of sense.
• Architecture. We're going to spend a lot of time studying different concurrency approaches and designs
• Tradeoffs. The good and bad of different design choices
• Doing it yourself. We're going to build all sorts of cool things from scratch. I think it will be fun and inspiring.
This is not about just plugging stuff into someone's already-built programming framework
• Frameworks are fine, but this workshop is about core concepts
• It's also not promoting any specific set of tools (or even Python itself all that much)
• We will look at some frameworks as we go
Some of the course exercises are advanced
• Try to do things yourself if you can
• Know that solution code is always given
• You should absolutely be looking at the solution code throughout this workshop
We're going to write a lot of code, but that code should be viewed as a kind of "sketch"
• An exploration of different ideas
• It's not "production ready"
• You would need to add a lot (testing, corner cases, error checking, etc.)
An introduction to thread programming
• How to reliably use threads
• Some programming idioms and common practices related to working with threads
• Details on Python interpreter execution
• Develop common programming patterns and designs related to all forms of concurrent and distributed computing (not just threads)
• Investigate a variety of issues generally related to concurrently executing tasks (debugging, control, synchronization, messaging, etc.)
• Understanding issues related to thread programming is essential for any kind of work with concurrent programming
• First, threads are one of the oldest and most widely used approaches (so, if you ignore them, you're missing the big picture)
• Problems associated with threads tend to underlie other concurrency techniques (thus threads are the poster child for every horrible thing that can go wrong)
• Threads are an important part of almost every framework or library related to network programming and distributed computing
• Even in libraries where users don't see threads
• Sometimes threads are deeply buried inside libraries to deal with very specific tasks related to I/O, waiting, and synchronization
• Examples: multiprocessing, twisted, etc.
What many programmers think of when they hear about "concurrent programming"
• An independent task running inside a program
• Shares resources with the main program (memory, files, network connections, etc.)
• Has its own independent flow of execution (stack, current instruction, etc.)
Key idea: A thread is like a little "task" that independently runs inside your program
  [diagram: the main program (% python program.py) executes statements; a create thread(foo) call launches foo() as a thread; the main program and the thread then execute statements independently until the thread returns or exits]
How to launch a callable in a separate thread:

    import threading
    import time

    def countdown(count):
        while count > 0:
            print("Counting down", count)
            count -= 1
            time.sleep(5)

    t = threading.Thread(target=countdown, args=(10,))
    t.start()

• Thread() creates a thread object
• The start() method makes it run
Alternatively: a class that inherits from Thread

    class CountdownThread(threading.Thread):
        def __init__(self, count):
            threading.Thread.__init__(self)
            self.count = count
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

    thr = CountdownThread(10)
    thr.start()

• Comment: I don't like this approach because it entangles your code with the implementation of the Thread class (prefer decoupling)
A thread runs until the specified callable exits
• Unlike a function call, there is no return value (if a value is returned, it is ignored)
• If a thread dies with an exception, it does not stop your whole program--just that one thread terminates (although you will see a traceback)
• Emphasis: Threads execute independently from the code that launched them
Once started, threads only have a few operations

    t.is_alive()       # Check if thread t is still running
    t.join([timeout])  # Wait for thread t to exit
    t.name             # Access the thread name

• You can't suspend threads
• You can't signal threads
• You can't kill threads
• More later...
  "It's Alive!!!!!!!!!"
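A minimal sketch of these operations in action (the thread name and sleep time here are arbitrary illustrative choices):

    import threading
    import time

    def worker():
        time.sleep(2)              # Simulate some work

    t = threading.Thread(target=worker, name="worker-1")
    t.start()
    print(t.name, t.is_alive())    # Almost certainly True (still sleeping)
    t.join(5)                      # Wait up to 5 seconds for it to exit
    print(t.name, t.is_alive())    # False once the thread has finished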
With threads, the Python interpreter stays alive until all threads exit
• This is even true if the "main thread" exits
• Common confusion: The main program exits, but the interpreter doesn't quit because other threads are still executing
If a thread runs forever, make it "daemonic"

    t = threading.Thread(target=func)
    t.daemon = True
    t.start()

• Daemonic threads get killed on interpreter exit
• It is standard practice to use this for all non-terminating background tasks
• I tend to use it for all threads
There are two primary uses of threads:
• Waiting. The program has to wait for I/O, an event, or some other reason, but has other work that still needs to be carried out elsewhere.
• Subdivision of work. Take a big computational problem and subdivide it into threads so that you can use multiple CPUs or cores (parallelism).
• In Python, threads are almost exclusively used for waiting (especially I/O).
• An example: reading on a socket

    data = s.recv(1024)

• This operation blocks until data is available
• "Blocks" - the program stops and waits
• If there's no concurrency, then everything stops
• Obviously, this can be undesirable (e.g., what if there's a GUI or if it's a game?)
• A solution: handle each client in a thread
  [diagram: a server with sockets s1, s2, s3; thread-1 calls s1.recv(), thread-2 calls s2.recv(), thread-3 calls s3.recv()]
• recv() still blocks, but it only affects one thread
• Other threads can still run (life is good)
• A simple threaded server template
• Idea: Launch a new thread on each connection

    def handle_client(client_sock, client_addr):
        with client_sock:
            ... do stuff with the client ...

    def run_server(serv_sock):
        while True:
            # Wait for a new client connection
            c, a = serv_sock.accept()
            # Spawn a thread to handle it
            cthr = threading.Thread(target=handle_client, args=(c, a))
            cthr.daemon = True
            cthr.start()
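To make the template concrete, here is one hypothetical way to fill it in--a minimal echo server (the port number and buffer size are arbitrary choices, not part of the template itself):

    import socket
    import threading

    def handle_client(client_sock, client_addr):
        with client_sock:
            while True:
                data = client_sock.recv(1024)
                if not data:
                    break                   # Client closed the connection
                client_sock.sendall(data)   # Echo the data back

    def run_server(serv_sock):
        while True:
            c, a = serv_sock.accept()
            cthr = threading.Thread(target=handle_client, args=(c, a))
            cthr.daemon = True
            cthr.start()

    if __name__ == '__main__':
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(('', 25000))              # Arbitrary port for illustration
        sock.listen(5)
        run_server(sock)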
As mentioned, creating threads is really easy
• You can create thousands of them if you want
• Developing with threads is hard
• Really hard

  Q: Why did the multithreaded chicken cross the road?
  A: to To other side. get the
  -- Jason Whittington
• Although threads have independent control flow, threads share the same memory
• So, all threads see global variables
• Multiple threads can hold references to the same object and access it independently
• Threads can also share files, sockets, etc.
Example:

    items = []    # A global variable

    def foo():
        ...
        items.append(x)
        ...

    def bar():
        ...
        y = items.pop()
        ...

    t1 = Thread(target=foo); t1.start()
    t2 = Thread(target=bar); t2.start()

• These operations are both manipulating the global variable "items"
• There is danger here: more shortly...
Thread execution is non-deterministic
• Operations that take several steps might be interrupted mid-stream (non-atomic)
• Thus, access to shared data structures is also non-deterministic (which is a really good way to have your head explode)
• Events. An action must be performed in response to an event in a different thread
• Concurrent updates. Multiple threads update a shared value.
Consider a shared value: x = 0
• One thread sets the value, another reads it

    Thread-1        Thread-2
    --------        --------
    ...             ...
    x = 42          print(x)
    ...             ...

• Problem: Which thread runs first?
• Answer: It could be either one...
Consider a shared value: x = 0
• And two threads that modify it

    Thread-1        Thread-2
    --------        --------
    ...             ...
    x = x + 1       x = x - 1
    ...             ...

• Here, it's possible that the resulting value will be corrupted due to thread scheduling
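A sketch that usually makes the corruption visible (the iteration count is an arbitrary choice; exact results vary from run to run):

    import threading

    x = 0

    def incr(n):
        global x
        for _ in range(n):
            x = x + 1          # Read-modify-write: not atomic

    def decr(n):
        global x
        for _ in range(n):
            x = x - 1

    t1 = threading.Thread(target=incr, args=(1000000,))
    t2 = threading.Thread(target=decr, args=(1000000,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(x)                   # Should be 0, but often isn't--updates get lost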
Low-level interpreter code:

    Thread-1                     Thread-2
    --------                     --------
    LOAD_GLOBAL    1 (x)
    LOAD_CONST     2 (1)
                   <- thread switch
                                 LOAD_GLOBAL    1 (x)
                                 LOAD_CONST     2 (1)
                                 BINARY_SUBTRACT
                                 STORE_GLOBAL   1 (x)
                   <- thread switch
    BINARY_ADD
    STORE_GLOBAL   1 (x)

• Thread-1's last two operations get performed with a "stale" value of x. The computation in Thread-2 is lost.
Can you assume any operations are atomic?

    alist.append(x)
    item = alist.pop()
    adict[key] = value
    del adict[key]
    ...

• In general, you can't assume anything
• Many implementations (jython, pypy, etc.)
• Might be a user-defined class
• "Atomic" meaning noninterruptible
If program behavior depends on thread scheduling, you have a "race condition"
• It's often rather diabolical--a program may produce slightly different results each time it runs (even though you aren't using any random numbers)
• Or it may just have a mysterious gremlin that shows up every couple of weeks
Identifying and fixing a race condition will make you a better programmer (e.g., it "builds character")
• However, you'll probably never get that month of your life back...
• To fix: You have to synchronize threads (i.e., coordinate their execution so that things happen in the right order)
Event objects

    e = threading.Event()
    e.is_set()         # Return True if event set
    e.set()            # Set event
    e.clear()          # Clear event
    e.wait([timeout])  # Wait for event

• Used to make one thread wait for an event to occur in another thread
Using an event to synchronize execution order

    x = 0
    x_event = threading.Event()

    Thread-1                  Thread-2
    --------                  --------
    ...                       ...
    x = 42                    x_event.wait()
    x_event.set()   --->      print(x)
    ...                       ...

• set() signals the wait() in the other thread
• Can use to make sure threads do things in a specific order
• Caution: Events are only for one-time use
Mutual exclusion lock

    m = threading.Lock()

• Used to synchronize threads so that only one thread can make modifications to shared data at any given time
• Think transactions
Using a lock:

    with m:            # Acquires the lock
        statements
        statements     # Releases the lock on exit from the block
    statements

• Key feature: Only one thread can execute inside the 'with' statement at once
• If the lock is already in use, a thread waits
Locks are commonly used to enclose "critical sections"

    x = 0
    x_lock = threading.Lock()

    Thread-1              Thread-2
    --------              --------
    ...                   ...
    with x_lock:          with x_lock:
        x = x + 1             x = x - 1    # Critical section
    ...                   ...

• Only one thread can execute in the critical section at a time (the lock gives exclusive access)
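Applied to the counter race from before, a minimal sketch of the fix:

    import threading

    x = 0
    x_lock = threading.Lock()

    def incr(n):
        global x
        for _ in range(n):
            with x_lock:       # The whole read-modify-write is now exclusive
                x = x + 1

    def decr(n):
        global x
        for _ in range(n):
            with x_lock:
                x = x - 1

    t1 = threading.Thread(target=incr, args=(1000000,))
    t2 = threading.Thread(target=decr, args=(1000000,))
    t1.start(); t2.start()
    t1.join(); t2.join()
    print(x)                   # Always 0 now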
It is your responsibility to identify and lock all "critical sections"

    x = 0
    x_lock = threading.Lock()

    Thread-1              Thread-2
    --------              --------
    ...                   ...
    with x_lock:          x = x - 1
        x = x + 1         ...
    ...

• If you use a lock in one place, but not another, then you're missing the whole point. All modifications to shared state must be enclosed by the with statement.
• If a thread has already acquired a lock, don't have it acquire the lock again---everything will freeze

    lock = threading.Lock()

    def foo():
        with lock:       # Freezes here (lock in use)
            statements

    def bar():
        with lock:
            foo()

    bar()

• Sometimes occurs if you try to reuse the same lock for too many things in your program (e.g., if you just had one global lock for everything)
• Never write code that acquires more than one mutex lock at a time

    x = 0
    y = 0
    x_lock = threading.Lock()
    y_lock = threading.Lock()

    with x_lock:
        statements using x
        ...
        with y_lock:
            statements using x and y
            ...

• This almost invariably ends up creating a program that mysteriously deadlocks (see: dining philosophers)
Alternate interface for locks:

    x = 0
    x_lock = threading.Lock()

    x_lock.acquire()
    statements using x
    ...
    x_lock.release()

• Very tricky to use correctly due to issues with exception handling
• Better to use the 'with' statement
Reentrant mutex lock

    m = threading.RLock()  # Create a lock
    m.acquire()            # Acquire the lock
    m.release()            # Release the lock

• Similar to a normal lock except that it can be reacquired multiple times by the same thread
• However, each acquire() must have a matching release()
• Common use: code-based locking (where you're locking function/method execution as opposed to data access)
Only allow one thread to execute methods in a class at a given time

    class Foo:
        _lock = threading.RLock()
        def bar(self):
            with Foo._lock:
                ...
        def spam(self):
            with Foo._lock:
                ...
                self.bar()
                ...

• Observe: Once any method is called, all of the methods are locked until the method returns
• Nested calls and recursion are okay
A semaphore is a counter-based synchronization primitive

    m = threading.Semaphore(n)  # Create a semaphore
    m.acquire()                 # Acquire
    m.release()                 # Release

• acquire() - waits if the count is 0, otherwise decrements the count and continues
• release() - increments the count and signals waiting threads (if any)
• Unlike locks, acquire()/release() can be called in any order and by any thread
• Limiting concurrency. You can limit the number of threads performing certain operations--for example, performing database queries, making network connections, etc.
• Signaling. Semaphores can be used to send "signals" between threads--for example, having one thread wake up another thread.
Using a semaphore to limit concurrency

    import urllib.request

    _fetch_limit = threading.Semaphore(5)  # Max: 5 threads

    def fetch_page(url):
        with _fetch_limit:
            u = urllib.request.urlopen(url)
            return u.read()

• In this example, only 5 threads can be executing in the function at once (if more want to run, they have to wait)
Using a semaphore to signal

    done = threading.Semaphore(0)

    Thread 1                  Thread 2
    --------                  --------
    statements                done.acquire()   # Blocks until released
    statements                statements
    statements                statements
    done.release()  --->      ...

• Here, acquire() and release() occur in different threads and in a different order
• Sometimes used in queuing problems
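A runnable sketch of that handoff (the sleep just makes the ordering visible):

    import threading
    import time

    done = threading.Semaphore(0)

    def worker():
        time.sleep(1)          # Pretend to do some work
        print("worker: finished")
        done.release()         # Signal completion

    t = threading.Thread(target=worker)
    t.start()
    done.acquire()             # Blocks until the worker releases
    print("main: worker is done")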
Condition objects

    cv = threading.Condition([lock])
    cv.acquire()     # Acquire the underlying lock
    cv.release()     # Release the underlying lock
    cv.wait()        # Wait for condition
    cv.notify()      # Signal that a condition holds
    cv.notify_all()  # Signal all waiting threads

• A combination of locking/signaling
• The lock is used to protect code that changes a shared data value
• The signal is used to notify other threads that the data has changed state (the "condition" changed)
A thread that prints the value every time it changes

    x = 0
    x_cond = Condition()

    Thread 1 (producer)         Thread 2 (watcher)
    -------------------         ------------------
    ...                         last_x = x
    with x_cond:                while True:
        x = new value               with x_cond:
        x_cond.notify()  --->           while last_x == x:
    ...                                     x_cond.wait()
                                        print("x=", x)
                                        last_x = x

• Somewhat similar to an Event except that you can use it over and over again
• Many things all going at once
• Weird control flow and tricky data handling
• Ad-hoc coding hell and many bad examples
• Weak debugging support
• My advice: Focus on proper encapsulation
• When you launch a thread, you are creating a concurrently executing "task"
• A "task" is simply a representation of work
• Insight: Tasks are more general than threads
• Threads are an implementation detail related to getting the task to run, but are unrelated to the actual work being carried out
Some desirable features of tasks:
• Start/stop/resume functionality
• Crash recovery
• Logging and diagnostics
• Debugging support
• To get this, you need planning and discipline
    class CountdownTask:
        def __init__(self, count):
            self.count = count
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

• Define your tasks as a class
• Minimally, it has a run() method that performs the work associated with the task
• Notice: No use of threads here
    class CountdownTask:
        def __init__(self, count):
            self.count = count
        def bootstrap(self):
            self.run()
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

• Always put an extra wrapper around run()
• Think of it as "booting" the task
• Its purpose is to set up the runtime environment
    class CountdownTask:
        ...
        def bootstrap(self):
            self.runnable = True
            self.run()
        def stop(self):
            self.runnable = False
        def run(self):
            while self.runnable and self.count > 0:
                ...

• Give tasks some way to stop
• Note: Tasks must be programmed to check for it
• Food for thought: Is stopping == killing?
    class CountdownTask:
        ...
        def bootstrap(self):
            try:
                self.run()
            except Exception:
                self.exc_info = sys.exc_info()  # Save it!
                # Log/report/handle the exception
                ...

• Give tasks a catch-all exception handler
• Buggy tasks will crash. You want to have a way to manage and debug them.
• Advice: Always report and save the exception
    class CountdownTask:
        def __init__(self, count):
            self.state = "INIT"
            ...
        def bootstrap(self):
            self.state = "RUNNING"
            try:
                self.run()
            except Exception:
                self.exc_info = sys.exc_info()
            self.state = "EXIT"

• Give tasks a "state" attribute
• Having this is very useful for debugging
• Thought: Make it user-customizable
Set up the logging module for your program:

    import logging
    logging.basicConfig(
        filename = "debug.log",
        filemode = "w",
        format   = "%(process)d:%(threadName)s:"
                   "%(levelname)s:%(message)s",
        level    = logging.DEBUG)

• Examples of issuing log messages:

    log = logging.getLogger("name")
    log.critical("A critical error occurred")
    log.error("File %s not found", filename)
    log.warning("This is your last warning")
    log.info("Just some information")
    log.debug("Debugging : n = %d", n)
    class CountdownTask:
        ...
        def finalize(self):
            del self.log
            del self.exc_info
            ...
            self.state = "FINAL"

• Define a finalization method
• This method should clean up all attributes related to the runtime environment (e.g., logging, exceptions, etc.)
• Think of it as "__del__" for the runtime
• You now have a task class with this interface:

    class Task:
        def __init__(self):
            ...
        def stop(self):
            ...
        def bootstrap(self):
            ...
        def run(self):
            ...
        def finalize(self):
            ...

• There are start/stop methods
• There is error handling
• There is logging for diagnostics
Our task class is a general representation of concurrently executing "work"
• Think of it as a task environment
• It is self-contained and decoupled
• A task object is like a basic building block
Question: How do threads enter the picture?
• Think of threads as a very low-level implementation detail (like assembly language, low-level system calls, etc.)
• Threads just enable concurrent execution
Tasks have two parts
  [diagram: task creation and control (task = Task(); task.start(); task.stop(); task.finalize()) launches task execution in a thread, where bootstrap() calls run(), which does the work]
• Yes, there are many moving parts
• It looks complex, but you'll be thankful later
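For instance, the creation/control side might wire the thread in with a start() method--a hedged sketch, not the only possible design:

    import threading

    class Task:
        ...
        def start(self):
            # The only place a thread appears: it just runs bootstrap()
            self._thread = threading.Thread(target=self.bootstrap)
            self._thread.daemon = True
            self._thread.start()

        def join(self, timeout=None):
            # Wait for the underlying thread to finish
            self._thread.join(timeout)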
• Who is in charge here?
  • The threading library?
  • A third-party library?
  • A framework?
  • Nobody?
• You want to be in control of the environment
• There is going to be complexity
• Better to manage it than to react to it
Don't go overboard abstracting away details
• Keep it simple
• Make sure you can test and debug it
  "If you add the right abstraction layer, you can make the jump from an unknown number of problems to an unknowable number of problems." - Cameron Laird
Debugging concurrent tasks is tricky
• Let's look at a few useful tips:
  • Don't repeat yourself
  • Assigning thread/task names
  • How to use the main thread
  • Enabling post-mortem debugging support
The main thread
• Never use the main execution thread to perform any real work
• If using threads, launch separate threads for all parts of your application
• Why? If you do this, you can still use the interactive interpreter during execution
• Incredibly useful for debugging (you can examine tasks, program state, etc.)
• In production, have the main thread spin:

    def main():
        import time
        while True:
            time.sleep(1)

• It consumes virtually no CPU (sleeps)
• Avoids the annoyance of killing programs that use threads (Ctrl-C will work properly)
Here's a neat little trick involving pdb:

    class Task:
        ...
        def bootstrap(self):
            try:
                self.run()
            except Exception:
                self.exc_info = sys.exc_info()
        def pm(self):
            import pdb
            pdb.post_mortem(self.exc_info[2])

• If your task crashes with an exception, the pm() method launches the debugger on the saved traceback (you can inspect internals)
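Hypothetical interactive use, assuming the main thread was kept free as advised and `task` is a Task instance that has crashed:

    >>> task.pm()          # Drops into pdb on the saved traceback
    > mytask.py(12)run()
    (Pdb) self.count       # Inspect the task's internal state
    0
    (Pdb)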
• How did we get on this topic?
• Oh yeah, threads share the same memory
• It's great except that...
  • Shared memory sucks
  • Mutable data sucks
  • Locking sucks
• Blasphemy!
The horror!
• Writing reliable and maintainable programs based on shared memory is a Sisyphean task
• Shared state is not a feature
• At first glance, it seems "convenient"
• Actually a trap that leads to nothing but sorrow
Computer science professors love locks, synchronization, and shared state!
• Gives them something to talk about for 3-4 weeks while teaching operating systems (e.g., dining philosophers)
• Easily applied to making students cry on exams
• Professors don't have to maintain real code
Experience: Concurrent programs involving tasks, shared state, and complicated locking can't be understood (or debugged) by humans
• Practical advice:
  • Tasks should never share state
  • Tasks can communicate, but should only pass immutable data structures
  • Strive for task isolation and simplicity
"If there's one lesson we've learned from 30+ years of concurrent programming it is: just don't share state. It's like two drunkards trying to share a beer. It doesn't matter if they're good buddies. Sooner or later they're going to get into a fight. And the more drunkards you add to the pavement, the more they fight each other over the beer. The tragic majority of multithreaded applications look like drunken bar fights." - ØMQ (The Guide)
Instead of having shared data structures, focus on communication between threads
• Examples:
  • Producer/consumer
  • Publish/subscribe
  • Request/response
• Big idea: Think of threads as independent actors (like network servers)
• Threaded programs can often be organized into producers and consumers
  [diagram: Task 1 (Producer) calls send(item), which deposits into Task 2's (Consumer's) inbox]
• Instead of "sharing" data, threads only coordinate by sending data to each other
• Think pipes, sockets, etc.
• A thread-safe message queue
• Basic operations:

    from queue import Queue
    q = Queue([maxsize])  # Create a queue
    q.put(item)           # Put an item on the queue
    q.get()               # Get an item from the queue
    q.empty()             # Check if empty
    q.full()              # Check if full
    q.qsize()             # Queue size

• To use: write your code so that it strictly adheres to get/put operations
• Example of setting up a producer and consumer:

    from queue import Queue
    msg_q = Queue()

    Producer Thread            Consumer Thread
    ---------------            ---------------
    # Produce an item          while True:
    msg_q.put(item)                item = msg_q.get()
                                   consume_item(item)

• Items are sent from one thread to another
• No shared state except for the queue
• Add an internal queue, send(), and recv()
• send() stores incoming messages; run() receives messages with recv()

    import queue

    class Task:
        def bootstrap(self):
            self._messages = queue.Queue()
            ...
        def send(self, msg):
            self._messages.put(msg)
        def recv(self):
            return self._messages.get()
        def run(self):
            while True:
                # Get a message from the queue
                msg = self.recv()
                # Work on msg
                ...
• To gracefully shut down, use a sentinel
• The sentinel value goes on the end of the queue
• Previous messages will get processed first

    class TaskExit(Exception):
        pass

    class Task:
        ...
        def send(self, msg):
            self._messages.put(msg)
        def stop(self):
            # Use the TaskExit sentinel to signal the end of messages
            self._messages.put(TaskExit)
        ...
• Check for the sentinel in recv() and raise an exception
• It can be caught in bootstrap() behind the scenes

    class TaskExit(Exception):
        pass

    class Task:
        ...
        def recv(self):
            msg = self._messages.get()
            if msg is TaskExit:
                raise TaskExit()
            return msg
        def bootstrap(self):
            ...
            try:
                self.run()
            except TaskExit:
                pass
            ...
• For sanity, messages should be immutable!
  • Strings, numbers, tuples
• This means...
  • No dictionaries or instances
  • Recall: "Nothing but sorrow"
• Personal preference: Use tuples of immutables
• If you can't do that, at least send copies
• Passing data between threads involves a transfer of information (and involves memory)
  [diagram: Task 1 calls send() to pass msg (refcnt=1) to Task 2; afterwards both tasks hold a reference (refcnt=2)]
• Are you sending a reference or a value?
• If a reference, there is shared state (danger)
• With messages, it is often useful to have a tag that identifies the kind of message

    task.send(('foo', msg))
    task.send(('bar', msg))

• Allows the receiver to recognize different kinds of messages and process accordingly

    def run(self):
        ...
        tag, msg = self.recv()
        if tag == 'foo':
            # Process 'foo' messages
            ...
        elif tag == 'bar':
            # Process 'bar' messages
            ...
• Tagging can also be used to separate messages originating from multiple sources
  [diagram: Task 1 sends ('foo', msg) and Task 2 sends ('bar', msg) to Task 3]
• Example: A consumer that receives data from multiple input sources
Queues can be created with a maximum size:

    q = queue.Queue([maxsize])  # Bounded queue

• Can be used to prevent unbounded queue growth at consumers
• Example: Limiting messages to slow or non-responsive consumers
Normally, put/get block a thread
• Non-blocking get/put:

    q.get(False)        # Get item or raise queue.Empty
    q.put(item, False)  # Put item or raise queue.Full

• Queuing with timeouts:

    q.get(timeout=PERIOD)
    q.put(item, timeout=PERIOD)
recv() with a timeout
• Receivers should have the option of not blocking forever (if they want)
• Implementation sketch (RecvTimeoutError is our own exception):

    import queue

    class RecvTimeoutError(Exception):
        pass

    class Task:
        ...
        def recv(self, *, timeout=None):
            try:
                return self._messages.get(timeout=timeout)
            except queue.Empty:
                raise RecvTimeoutError()
        ...
Queue limits, non-blocking mode, and timeouts are significantly more complicated to use in practice than you might imagine
• They introduce the problem of "flow control"
• Queue limits might result in deadlock
• Non-blocking mode might result in discarded messages
Producers might use a subscription model
  [diagram: a Publisher sends into channels; multiple Subscribers receive from them]
• Think chat, RSS, XMPP, logging, etc.
• Publishers send messages into a channel; subscribers receive the feed
To implement this, it is common to define an intermediary object for message handling
  [diagram: a Publisher sends to a Gateway, which distributes to the Subscribers]
• The gateway receives messages and deals with details of distribution, routing, subscriptions, etc.
• Goal: Loose coupling
You're already familiar with something that works exactly like this: the logging module
  [diagram: log.info("Hi") goes to a Logger, which publishes to multiple Handlers]
• The logger gets logging messages and publishes them to various subscribed handlers
• You can use it as a rough design model
A simplistic implementation of a Gateway:

    class Gateway:
        def __init__(self):
            # Internally, there are different channels
            self._channels = {}
        def subscribe(self, task, channel):
            self._channels.setdefault(channel, set()).add(task)
        def unsubscribe(self, task, channel):
            self._channels[channel].remove(task)
        def publish(self, msg, channel):
            # Simply forwards messages to any tasks subscribed on a channel
            for task in self._channels[channel]:
                task.send(msg)

• Caution: It's missing some locking (added in the exercise)
There is at least one gateway in the system
• Used by all of the threads for publishing
  [diagram: multiple Publishers send to the Gateway, which distributes to multiple Subscribers]
• Key: Multiple publishers and subscribers
• Problem: Tasks come and go
• You must be careful to manage subscriptions
• Example: unsubscribe on exception/return

    class Task:
        ...
        def run(self):
            gateway.subscribe(self, "channel")
            try:
                ...
            finally:
                gateway.unsubscribe(self, "channel")
• Publishers should try to remain relatively decoupled from the gateways (i.e., avoid passing direct references around)
• One approach: Emulate the logging interface

    _gateways = {}

    def get_gateway(name):
        if name not in _gateways:
            _gateways[name] = Gateway()
        return _gateways[name]

• Use:

    gateway = get_gateway("mygateway")
    gateway.publish(msg, "channel")
• Pub/sub is a very flexible approach
  • Promotes loose coupling of tasks
  • Reliability features (task restart, redundancy, etc.)
  • Scalability to multiple machines (later)
• Can also be used to implement system-level features such as task monitoring, events, logging, debugging, etc.
Pub/sub allows diagnostic tools to be attached
  [diagram: a Publisher sends to a Consumer through the Gateway; a Monitoring task also subscribes]
• Idea: Optional components can listen in on the communication and report back
You can have dedicated gateways for events
  [diagram: tasks publish crash events to an Event Gateway, which feeds a Crash Manager]
• Special tasks for handling abnormal situations, crashes, etc.
Sometimes concurrent tasks/threads are used to perform background work on behalf of other code
  [diagram: a master sends a request to a worker task and later receives a response/result]
• Scenario: The master hands work over to a separate task and continues with other processing. It gets the result at some later time.
Very tricky: Instead of just sending data, you have an asynchronous request/response cycle
  [diagram: master sends a request to the worker; how the response/result comes back is an open question]
• Issue: How does the result come back?
This is nothing like a normal function call
• The work is finished at some undetermined time in the future
• The master doesn't know when the result will arrive--and it may want to do other things in the meantime
• Comment: This problem also comes up in other settings (distributed computing, etc.)
Define an object that represents a future result:

    class UnavailableError(Exception):
        pass

    class FutureResult:
        def set(self, value):
            self._value = value
        def get(self):
            if hasattr(self, "_value"):
                return self._value
            else:
                raise UnavailableError("No result")

• The idea here: Return the result if it's been set; otherwise raise an exception
Sample use (interactive mode):

    >>> r = FutureResult()
    >>> r.get()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "result.py", line 9, in get
        raise UnavailableError("No result")
    __main__.UnavailableError: No result
    >>> r.set(42)
    >>> r.get()
    42
    >>>

• Now, let's see how you might use it
• The FutureResult object is created by the worker and given back to the requestor
• The worker sets the result when the work is finished

    class WorkerTask(Task):
        def request(self, msg):
            fresult = FutureResult()   # Create a FutureResult
            self.send((fresult, msg))  # Send along with the msg
            return fresult             # Return the result object
        def run(self):
            while True:
                # Get a message
                fresult, msg = self.recv()
                # Work on msg
                ...
                # Set the response
                fresult.set(response)
Example of making a request:

    fresult = worker.request(msg)  # Make request to worker
    ...
    ... do other things while the worker works ...
    ...
    # Get the result (at a later time)
    r = fresult.get()

• Keep in mind: the worker is operating concurrently in the background
• When it finishes (at an unknown time), it will store data in the returned result object
What if a worker wants to communicate an error or exception?
• This is tricky---you're basically passing an exception between tasks
• It's not "normal" exception handling
• Modified result object:

    import sys

    class UnavailableError(Exception):
        pass

    class FutureResult:
        def set(self, value):
            self._value = value
        def set_error(self):
            # Save the current exception information
            self._exc = sys.exc_info()
        def get(self):
            if hasattr(self, "_exc"):
                # Reraise the exception that occurred in the worker
                raise self._exc[1].with_traceback(self._exc[2])
            elif hasattr(self, "_value"):
                return self._value
            else:
                raise UnavailableError("No result")
An example of setting an exception:

    class WorkerTask(Task):
        def run(self):
            ...
            try:
                ... some work ...
                fresult.set(value)
            except:
                fresult.set_error()   # Save the current exception

    fresult = worker.request(msg)
    ...
    r = fresult.get()   # Exception actually gets raised here
                        # (when someone is interested in the outcome)

• Admittedly, it's a little mind-bending
• How does the master thread know when the result has been made available?
  [diagram: master sends a request to the worker and waits on the response/result]
• Does it have to constantly poll?
• Does it just wait for a while?
• This is a timing/synchronization issue
• Using an Event in our result object:

    from threading import Event

    class FutureResult:
        def __init__(self):
            self._evt = Event()
        def set(self, value):
            self._value = value
            self._evt.set()
        def get(self):
            self._evt.wait()
            return self._value

• Idea: get() will simply use the event to wait for the result to become available
Allow the requestor to cancel:

    class FutureResult:
        def __init__(self):
            self._cancel = False
        def cancel(self):
            self._cancel = True

• Example use in the worker:

    def run(self):
        while True:
            fresult, msg = self.recv()
            if fresult._cancel:
                continue
            ...
Results can include progress information:

    class FutureResult:
        def __init__(self):
            self.progress = 0
        ...

• Example use in the worker (the requestor can monitor):

    def run(self):
        while True:
            fresult, msg = self.recv()
            # Work on the result
            ...
            fresult.progress += n   # Update progress
            ...
            # Done
            fresult.set(response)

• Alternative: publish progress on a channel
A function that fires when the result is ready:

    class FutureResult:
        def __init__(self):
            self._callback = None
        def set_callback(self, cb):
            self._callback = cb
        def set(self, result):
            self._value = result
            if self._callback:
                # Invoke callback (if set)
                self._callback(result)

• Example use:

    def when_done(result):
        print(result)

    fresult = worker.request(msg)
    fresult.set_callback(when_done)
It is common for work to be farmed out to a pool of worker tasks
  [diagram: a master sends requests to a worker pool and receives responses/results]
• Example: A pool of a dozen different worker threads handles incoming work
Sketch of an implementation:

    class WorkerPool(Task):
        def __init__(self, nworkers=1):
            ...
            self.nworkers = nworkers
        def run(self):
            # Launch the additional worker threads
            for n in range(1, self.nworkers):
                thr = threading.Thread(target=self.do_work)
                thr.daemon = True
                thr.start()
            self.do_work()
        def do_work(self):
            while True:
                fresult, msg = self.recv()
                ... do work ...
                fresult.set(value)

• Tricky bits: Shutdown (see exercise)
• In practice, worker pools tend to have an API that is similar to this:

    class WorkerPool(Task):
        ...
        def apply(self, func, args=(), kwargs={}):
            # Runs func(*args, **kwargs) in a worker thread
            ...
        def map(self, func, sequence):
            # Runs [func(s) for s in sequence] in workers
            ...

• apply() - runs a callable in a worker
• map() - applies a function to a sequence using multiple workers (related to map-reduce)
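A hedged sketch of how apply() might be layered on the request/FutureResult machinery from earlier (this assumes the Event-based get() and the set_error() variant shown previously):

    class WorkerPool(Task):
        ...
        def apply(self, func, args=(), kwargs={}):
            # Package the call, hand it to a worker, block for the result
            fresult = FutureResult()
            self.send((fresult, (func, args, kwargs)))
            return fresult.get()   # Waits on the internal Event

        def do_work(self):
            while True:
                fresult, (func, args, kwargs) = self.recv()
                try:
                    fresult.set(func(*args, **kwargs))
                except Exception:
                    fresult.set_error()   # Propagate errors via the result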
We have done a lot of work putting it together
• Task objects
• Different communication patterns
• Task diagnostics, debugging, logging, etc.
• It sets the stage for other topics
What is a Python thread?
• Python threads are real system threads
  • POSIX threads (pthreads)
  • Windows threads
• Fully managed by the host operating system
  • All scheduling/thread switching
• They represent threaded execution of the Python interpreter process (written in C)
• There's not much going on...
• Here's what happens:
  • Python creates a small data structure containing some interpreter state
  • A new thread (pthread) is launched
  • The thread calls PyEval_CallObject
  • That last step is just a C function call that runs whatever Python callable was specified
Each thread has its own interpreter-specific data structure (PyThreadState)
• Current stack frame (for Python code)
• Current recursion depth
• Thread ID
• Some per-thread exception information
• Optional tracing/profiling/debugging hooks
• It's a small C structure (~84 bytes)
The sys module
• Certain functions in the sys module are tied to thread-specific state (exceptions, diagnostics)
• Example: sys.exc_info()

    try:
        statements
    except Exception:
        # Get per-thread exception information
        etype, value, tb = sys.exc_info()

• Some other thread-specific functions:

    sys.settrace()
    sys.setprofile()
    ...
• But there's a catch...
• Only one Python thread can execute in the interpreter process at once
• There is a "global interpreter lock" (GIL) that carefully controls thread execution
• The GIL ensures that each thread gets exclusive access to the entire interpreter internals when it's running
Whenever a thread runs, it holds the GIL
• However, the GIL is released on I/O
  [diagram: threads alternate run periods; each I/O operation releases the GIL, and a later acquire resumes execution]
• So, any time a thread is forced to wait, other "ready" threads get their chance to run
• Basically a kind of "cooperative" multitasking
If threads hammer the CPU and don't do I/O, they still periodically switch back and forth
• Python 3.1 and earlier: threads switch every 100 "instructions" (ticks)
• Python 3.2 and newer: threads switch every 5 ms
• The switch period can be tuned (if needed):

    sys.setcheckinterval(nticks)  # Python 3.1 and earlier
    sys.setswitchinterval(secs)   # Python 3.2 and newer
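For example, on Python 3.2+ you can inspect and adjust the interval (the value chosen here is purely illustrative):

    import sys

    print(sys.getswitchinterval())   # Default: 0.005 (5 milliseconds)
    sys.setswitchinterval(0.001)     # Faster preemption, more switching overhead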
Why is the GIL there?
• Simplifies the implementation of the Python interpreter (okay, sort of a lame excuse)
• Better suited for reference counting (Python's memory management scheme)
• Simplifies the use of C/C++ extensions. Extension functions do not need to worry about thread synchronization.
• Is it ever going away? Probably not soon.
• In June 2009, I gave a talk about the GIL:
  http://www.dabeaz.com/python/GIL.pdf
• I won't repeat it all here, but here's the gist: Python threads should not be used for CPU-bound processing (e.g., crunching data)
• This led to a PyCon 2010 presentation:
  http://www.dabeaz.com/GIL
• For heavy CPU processing, Python is limited to a single CPU core
• Thus, threads cannot be used to provide any kind of parallel processing
  • No performance gain at all
• In older versions of Python (< 3.2), threads might make the performance far worse
The worst part of the GIL is not the fact that it limits the use of multiple cores
• There are other ways to utilize multiple cores (later)
• The GIL actually causes all sorts of bizarre problems with the timing and scheduling of threads
• Examples: response time, I/O throughput
• Long-running instructions block progress
• This would manifest itself as an annoying "pause" in a GUI, game, or network application
• Example: A request is sent to a server, but it doesn't respond for 10 seconds
• Illustration:
  [diagram: Thread 1 is running a long instruction while holding the GIL; an event arrives for Thread 2 (sleeping), but Thread 2 stays stalled until the instruction completes and the GIL is released, only then sending the response]
• There is no way for a long instruction to be preempted
• All other threads stall, waiting for completion
• Consider this code fragment:

    def receive_data(s):
        msg = bytearray()
        while True:
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

• It receives and assembles a message on a socket
• Imagine it's part of some web/messaging code
• Now, introduce a CPU-bound thread

    Thread 1                         Thread 2
    --------                         --------
    def receive_data(s):             def spin():
        msg = bytearray()                while True:
        while True:                          pass
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

• Thread 2 is just doing some "work"
• A test: sending 1MB of data between two interpreters (one running the two threads above)
• Time to receive: ~2.23s (470KB/sec)
• Almost 250x slower! Yikes!
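A rough sketch of how the receiving side of such a measurement might be set up (the port and buffer size are arbitrary; your numbers will certainly differ):

    # receiver.py - time the receive while a CPU-bound thread runs
    import socket
    import threading
    import time

    def spin():
        while True:
            pass                   # CPU-bound: never voluntarily releases the GIL

    t = threading.Thread(target=spin)
    t.daemon = True
    t.start()

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(('', 25000))
    s.listen(1)
    c, a = s.accept()
    start = time.time()
    msg = bytearray()
    while True:
        data = c.recv(1024)
        if not data:
            break
        msg.extend(data)
    print(len(msg), "bytes in", time.time() - start, "seconds")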
• GIL performance problems are related to the mechanism used to switch threads
• In particular, the preemption mechanism and the lack of thread priorities
• Let's illustrate
  [diagram: Thread 1 running; Thread 2 suspended]
• Now, a second thread makes an appearance...
• It is suspended because it doesn't have the GIL
• Somehow, it has to get it from Thread 1
  [diagram: Thread 1 running; Thread 2, suspended, executes wait(gil, TIMEOUT)]
• The second thread does a timed wait on the GIL
• The idea: Thread 2 waits to see if the GIL gets released voluntarily by Thread 1 (e.g., if Thread 1 performs I/O or goes to sleep)
• By default, TIMEOUT is 5 milliseconds, but it can be changed
  [diagram: Thread 1 hits an I/O wait and releases the GIL; Thread 2's wait(gil, TIMEOUT) succeeds and it starts running while Thread 1 is suspended]
• A thread might give up the GIL voluntarily
• This is straightforward--the GIL is just handed to the waiting thread (all is well)
  [diagram: Thread 2's wait(gil, TIMEOUT) times out while Thread 1 keeps running; Thread 2 issues a drop_request and repeats wait(gil, TIMEOUT)]
• What if a thread doesn't suspend? (CPU-bound)
• After the timeout, the waiting thread initiates a "drop request" and repeats the wait operation
  [diagram: Thread 1 receives the drop_request, releases the GIL, and suspends; Thread 2 starts running]
• Thread 1 suspends when the drop request is received
• The GIL is released after the current instruction completes (recall, that might take a while)
  [diagram: after the forced release, Thread 1 waits for an ack from Thread 2 before fully suspending]
• On a forced release, Thread 1 waits for an ack
• The signal indicates that the other thread successfully got the GIL and is now running
  [diagram: the roles reverse--Thread 1, now suspended, does wait(gil, TIMEOUT) while Thread 2 runs]
• The process now repeats itself for Thread 1
• This sequence happens over and over again as CPU-bound threads execute
Response time
• If a thread wants the GIL, it might wait 5 ms
• There is no way for a "high priority" thread to grab the GIL away from a CPU-bound thread
• Example: A critical event is received on the network
• You can't guarantee an immediate response
Illustration:
  [diagram: Thread 2 is in an I/O wait when data arrives; it becomes ready, but must still do wait(gil, TIMEOUT), time out, and issue a drop_request before Thread 1 releases the GIL]
• To handle I/O, Thread 2 must go through the entire timeout sequence to get control
  [diagram: three threads; Thread 2 times out and issues a drop_request, but when Thread 1 releases the GIL, it is Thread 3 that starts running]
• The thread that first wants the GIL might not get it
• Not only can you not guarantee response time, you can't guarantee scheduling
• CPU-bound threads degrade I/O
  [diagram: each time data arrives, the I/O thread becomes ready, but needs 5 ms to preempt the running CPU-bound thread; each recv/send then drops the GIL and restarts the CPU-bound thread]
• Each I/O call (e.g., recv/send) drops the GIL and restarts the CPU-bound Thread 1
• Each time the I/O thread needs to run, it takes 5 ms to preempt the CPU-bound thread
• Recall our original code:

    def receive_data(s):
        msg = bytearray()
        while True:
            data = s.recv(1024)
            if not data:
                break
            msg.extend(data)
        return msg

• With the spin() thread running, it required ~2.23s to receive 1MB (vs. 0.009s alone)
• ~1024 recv() operations (each releases the GIL)
• Thread 2 gets scheduled far more than it ought to
Threads and C code
• C/C++ extensions can release the interpreter lock and run independently
• Caveat: Once the lock is released, C code shouldn't do any processing related to the Python interpreter or Python objects
• You might be able to use this to take advantage of multiple cores
Threads and C extensions
• Having C extensions release the GIL is how you get into true "parallel computing"
  [diagram: Thread 1 runs Python instructions, enters C code, and releases the GIL; its C threads run in parallel while Thread 2 acquires the GIL and runs Python instructions; the C code reacquires the GIL when it returns]
• Python and C can run in parallel
• Key part: The execution with the GIL released must not involve the Python interpreter
Releasing the GIL
• C extensions use special macros:

    PyObject *pyfunc(PyObject *self, PyObject *args) {
        ...
        Py_BEGIN_ALLOW_THREADS
        // Threaded C code
        ...
        Py_END_ALLOW_THREADS
        ...
    }

• Certain extensions, such as ctypes, also release the GIL automatically
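For instance, ctypes releases the GIL around calls made through CDLL, so a blocking C call doesn't stall other Python threads--a small sketch (the libc name is Linux-specific):

    import ctypes
    import threading
    import time

    libc = ctypes.CDLL("libc.so.6")    # Adjust for your platform

    def blocked_in_c():
        libc.sleep(3)                  # GIL is released for the duration

    t = threading.Thread(target=blocked_in_c)
    t.start()
    time.sleep(0.1)
    print("main thread still runs")    # Prints while libc.sleep() is in progress
    t.join()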
Using C extensions
• The trick with C extensions is that you have to make sure they do enough work
• You won't get any benefit if the C code only runs a few simple calculations
• You need to do a lot of computation (e.g., thousands of floating-point ops)
Another GIL workaround is to delegate CPU-intensive work to a subprocess
• Example: Send work to a separate Python interpreter over a pipe or socket
• Have that interpreter operate independently and send results back when done
• We're going to look at this later...
• Threads are often considered expensive
• Somewhat true, but the actual overhead tends to be overblown (especially in blog posts and rants about threads)
• Operating systems know how to deal with threads
• For I/O waiting, they're highly optimized
  [diagram: the OS kernel keeps per-socket wait queues and a ready queue; a thread blocked in recv() sits on a wait queue until data arrives, then moves to the ready queue]
• Basically a scheduler and a bunch of queues
• Waiting/waking are just queue operations
• Each time the system switches threads, it performs a "context switch"
  • Saves CPU state (registers, etc.)
  • Might flush CPU cache, TLB, etc.
  • Actual behavior depends on the system
• However: excessive context switching is bad
• Biggest killer of thread performance? (maybe)
• Context-switching overhead is a practical concern for threading based on messaging
  • Does every message sent involve a context switch from the sender to the receiver for processing?
  • Or do messages get queued up for a while?
  • What happens during rapid messaging?
• This requires detailed study and performance analysis
• Tasks that send messages at a high frequency should be modified to group messages (see the sketch below)
  [diagram: instead of a Producer sending many individual messages to a Consumer, it sends fewer, grouped messages]
• Results in fewer actual send() calls, less locking, less context switching, etc.
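A sketch of the idea on the producer side (the batch size is an arbitrary tuning knob, and produce_items() is a hypothetical source of work):

    batch = []
    for item in produce_items():
        batch.append(item)
        if len(batch) >= 100:      # Arbitrary batch size
            consumer.send(batch)   # One queue operation covers 100 items
            batch = []
    if batch:
        consumer.send(batch)       # Flush any remainder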
• Every I/O operation also involves a potential context switch because the GIL is released
• For best performance: Threads should perform a small number of large I/O operations instead of a large number of small I/O operations (buffering)
• Example: It's better to write a single 1MB message to a socket than 1024 1KB messages
• Make the operating system work for you...
• Don't send a bunch of small fragments:

    while expression:
        ...
        s.sendall(fragment)
        ...

• Use buffering and a single send:

    msgbuf = bytearray()
    while expression:
        ...
        msgbuf.extend(fragment)
        ...
    s.sendall(msgbuf)

• Will greatly reduce system overhead, the amount of thread switching, locking, etc.
• Each thread gets its own C stack

    t1 = Thread(); t1.start()
    t2 = Thread(); t2.start()
    t3 = Thread(); t3.start()

  [diagram: virtual memory layout--the main thread, the heap, and an 8MB stack region for each of threads 1-3]
• The size is determined by the system (Python uses the default settings)
• Creating many threads may quickly eat up the VM address space
• The thread stack size can be adjusted:

    import threading
    threading.stack_size(65536)
    t1 = threading.Thread(...)
    t2 = threading.Thread(...)

• Must be a 4K multiple; the minimum is 32K
• 32K is adequate for a lot of Python code
  • Each function call uses < 512 bytes of stack
• Might need more for C extensions (if too small, you may get a violent crash - SegFault)
"The best performance optimization is the transition from the non-working to the working state." -- John Ousterhout • If you are going to program with threads, correctness and reliability must have higher priority than all other concerns • You can optimize it later • There are other ways to scale things up
• Threads are very useful for certain kinds of problems involving I/O, waiting, etc.
• In some cases, they're the best choice
• In other cases, they're terrible
• And sometimes, they're the only way
• We've covered a lot of ground
  • Threads/tasks
  • Various design patterns
  • Messaging idioms
  • Performance considerations
• All of this forms the foundation for other topics (processes, distributed computing, etc.)
• If you're going to use threads, keep them hidden in the background
• Don't make end users mess around with them
• Observe: In our task library, end-user task classes don't do anything with threads (they receive messages, but the details are hidden)
• Again: Encapsulation is key
As you know, Python has a global interpreter lock (GIL) that limits thread performance
• It means that you can only utilize a single CPU within any given program
• In this section, we look at a workaround--carrying out work in a subprocess
• Each running program is a "process"
  • Executes independently
  • Has its own memory
  • Has its own resources (files, sockets, etc.)
  • Can be scheduled on different CPUs by the OS
• Each instance of the Python interpreter that runs on your system is a process
For CPU-intensive work, a common strategy is to use cooperating processes
  [diagram: a master python process on CPU 1 exchanges data with worker python processes on CPU 2]
• Multiple copies of the Python interpreter that run on different CPUs and exchange data
• Not networking (all on the same machine)
Our thread-based programs were strongly tied to this kind of design (messaging, queues, etc.)
• That was intentional
• Threads have known scalability problems as problems and systems get large
• To solve that, you are almost forced to move to processes (and later, distributed systems)
• This is a universal problem--not just Python's
multiprocessing: a standard library module for carrying out work in separate processes
• Can be used to distribute work to other CPUs and to take advantage of multiple cores
• Also has some distributed computing features (to be covered a little later)
A process-based worker pool

    p = multiprocessing.Pool([numprocesses])

• This is the main feature you should use
• It executes functions in a subprocess
• It's very high level (you don't need to worry a lot about internal details)
Core pool operations:

    p = multiprocessing.Pool([numprocesses])
    p.apply(func [, args [, kwargs]])
    p.apply_async(func [, args [, kwargs [, callback]]])
    p.map(func, iterable [, chunksize])

• There are some others, but these are enough to get started
• Let's see some examples
Running a function in another process:

    def add(x, y):
        return x + y

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply(add, (2, 3))
        print(r)

• apply() runs a function in one of the worker processes and returns the result
• Note: It waits for the result to come back
Suppose you have a lot of threads
  [diagram: Threads 1-4, all blocked on I/O]
• If they're all I/O bound, life is good
• Mostly they sleep; there is hardly any GIL contention
Now suppose a thread wants to do CPU-bound work
  [diagram: Thread 4 does CPU-bound processing, causing GIL contention with Threads 1-3]
• The thread holds the GIL
• This causes contention with the other threads
Delegating work to a pool
  [diagram: Thread 4 hands CPU-bound processing to a Pool via apply() and gets a result back; Threads 1-3 are unaffected]
• The pool is a separate process--no GIL waiting
Asynchronous execution:

    def add(x, y):
        return x + y

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply_async(add, (2, 3))
        # Other work
        ...
        # Collect the result at a later time
        print(r.get())

• Here, you get a handle to an object for retrieving the result at some later time (like a future result)
For asynchronous execution, you get a special AsyncResult object
• Here is a mini-reference on it:

    a.get([timeout])   # Get the result
    a.ready()          # Result ready?
    a.successful()     # Completed without errors?
    a.wait([timeout])  # Wait for the result

• get() is the most useful method, but there are other operations for polling, querying error status, etc.
Asynchronous execution with a callback:

    def add(x, y):
        return x + y

    def gotresult(result):
        print(result)

    if __name__ == '__main__':
        p = Pool(2)
        r = p.apply_async(add, (2, 3), callback=gotresult)

• Here, the callback function fires when the result is received
Used to initiate parallel computation
  [diagram: a thread issues several apply_async() calls into the Pool and collects the results later]
• The thread initiates multiple operations
• It collects the results later
• Usage depends on the context
• Use pool.apply() if you're using a pool to do work on behalf of a thread (and there are a lot of threads)
• Use pool.apply_async() if there's only one execution thread and it's trying to farm out work to multiple workers at once
pool.map() - maps a function onto a sequence:

    def square(x):
        return x * x

    if __name__ == '__main__':
        p = Pool(2)
        nums = range(1000)
        squares = p.map(square, nums, 100)

• This subdivides the sequence into chunks and farms out work to the pool workers
  [diagram: the input sequence is split into chunks, processed by pool workers p1 and p2, and reassembled into the result]
pool.map() is similar to a list comprehension:

    def square(x):
        return x * x

    # Compute [x*x for x in nums]
    nums = range(1000)
    squares = p.map(square, nums)

• Restrictions:
  • The mapped function must be module-level
  • No lambda
  • No instance methods (won't work)
To effectively use a process pool, you need to make sure enough work gets carried out to recover the cost of communication
• So, you probably wouldn't use it for simple operations like adding two numbers
• Likewise, you may not want to send massive amounts of data back and forth
Pools should only be created by the main thread of an application
• If created at script level, they must be protected by a __main__ check:

    if __name__ == '__main__':
        p = Pool()
        ...

• If you forget, you may get a recursive process-creation loop (e.g., a fork bomb) on Windows
To be nice, you should shut down pools:

    p.close()  # Indicate no further work
    p.join()   # Wait for all pending work to finish

• Do this at program termination
• To immediately kill the pool:

    p.terminate()
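Putting the pieces together, a minimal pool lifecycle sketch:

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == '__main__':
        p = Pool(4)
        results = p.map(square, range(100))
        p.close()      # No more work will be submitted
        p.join()       # Wait for the workers to finish
        print(results[:5])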
No shared state
• Every worker in a process pool is completely isolated (no sharing)
• Workers do not have access to any state in the master process that created the pool
• This is the complete opposite of threads
Pools have an initialization option for startup

    def my_init(a,b,c):
        # Initialize myself
        ...

    if __name__ == '__main__':
        p = Pool(initializer=my_init, initargs=(1,2,3))
        ...

• This is the only safe way to initialize the state of worker processes in a pool • It is not safe to rely upon the values of global variables set prior to pool creation
• Pool workers will share any open files or sockets that were in use at pool creation • You might want this • You might not • Could cause "inexplicable" system behavior related to resource management (e.g., files not being closed correctly, etc.)
• Pools should be created before threads • Created immediately at program startup • Before any other threads are running • Do not create/launch pools within threads • It might "work", but you'll be on weak ground
• How to add pools to your application? • Best advice: Create a single pool at application startup and use it everywhere you want to do work in a subprocess
• The pool is then used as needed • For example, in a task thread

    class MyTask(Task):
        def run(self):
            while True:
                msg = self.recv()
                ...
                # Go run some CPU-intensive function
                r = pool.apply(some_func,(arg,))
                # Process the result
                ...

• The pool is just a utility that tasks use when they're going to do expensive work
• Some other good rules of thumb • Functions should not depend on global state and should have no side effects • Passed arguments should only consist of simple data structures (strings, numbers, tuples, lists, dicts, etc.) • Also: no instance methods allowed
Instance methods • An example:

    pool = Pool()
    # Does this make any sense?
    items = []
    pool.apply(items.append,(123,))

• If you think about it long enough, you'll realize that it's nonsense (you would have to send the whole instance across, and any modifications would be lost) • In any case, it's not allowed (you get an exception)
The fundamental building block used by multiprocessing is the Process object • It's pretty low-level, but its interface mimics threads • Allows you to launch a specific Python function inside a subprocess
• Launching a function in a process

    import time
    import multiprocessing

    def countdown(count):
        while count > 0:
            print("Counting down", count)
            count -= 1
            time.sleep(5)

    if __name__ == '__main__':
        p1 = multiprocessing.Process(target=countdown, args=(10,))
        p1.start()

• You create a Process object • Use start() to launch it
Defining a process by a class

    import time
    import multiprocessing

    class CountdownProcess(multiprocessing.Process):
        def __init__(self,count):
            multiprocessing.Process.__init__(self)
            self.count = count
        def run(self):
            while self.count > 0:
                print("Counting down", self.count)
                self.count -= 1
                time.sleep(5)

    if __name__ == '__main__':
        p1 = CountdownProcess(10)   # Create the process object
        p1.start()                  # Launch the process
• Joining a process (waits for termination)

    p = Process(target=somefunc)
    p.start()
    ...
    p.join()

• Making a daemonic process

    p = Process(target=somefunc)
    p.daemon = True
    p.start()

• Terminating a process

    p = Process(target=somefunc)
    ...
    p.terminate()

• These mirror similar thread functions
Creating processes is really easy • Correct use of processes is hard • Partly due to platform differences • Also due to the means of process creation • Too many layers of abstraction?
In order to create a new process, multiprocessing carries out a process "fork" • A clone of the calling process is created and the two then run in parallel

    p1 = Process()
    p1.start()

• The clone is identical to the original process
Unix • Processes are created using os.fork() • Child process is identical to parent • Same state (e.g., variables, instances) • Same open files • Same open sockets • All threads except the caller are discarded • Assume that the worker gets the entire state of the parent process (sans threads)
Windows has no such OS feature • Process creation on Windows: • A new Python process is created, and the process arguments are pickled across a pipe connecting the two processes • The startup time is horrible (vs. Unix) • Cannot rely on any sharing • Assume that the worker only gets the parameters passed to it
Multiprocessing provides a large assortment of primitives for coordinating and communicating with low-level processes • Queue, JoinableQueue, Pipe, etc. • Lock, RLock, Semaphore, Event, Condition • Shared memory objects, etc. • My advice: Don't bother (seriously)
A consumer process

    def consumer(input_q):
        while True:
            # Get an item from the queue
            item = input_q.get()
            # Process the item
            print(item)

• A producer process

    def producer(output_q):
        while not done:
            # Produce some item
            ...
            # Put it on the output queue
            output_q.put(item)
Running the two processes

    if __name__ == '__main__':
        from multiprocessing import Process, Queue
        q = Queue()
        # Launch the consumer process
        cons_p = Process(target=consumer,args=(q,))
        cons_p.daemon = True
        cons_p.start()
        # Run the producer function on some data
        producer(q)
Multiprocessing serves a very specific purpose • If your system has multiple CPUs/cores, then pools can be used to take advantage of them • Advice : Do not use Process objects as the foundation for building a concurrent programming framework (like we did for threads) • Too many mind-boggling problems related to the reliable use of process forking and environment
Many programs are heavily based on I/O • For example, network servers • Due to thread issues, some programmers have turned to alternative concurrency approaches • Usually based on event-driven programming
I/O is a fundamental part of most programs • However, there are many different programming models for carrying it out • Blocking • Nonblocking • Polling/multiplexing ("async")
If an I/O operation (e.g., read or write) does not return until the operation actually completes, the operation is said to "block." • Example : s.recv() on a network socket • Normally, this operation temporarily stops your program and waits until some kind of data is available to be read • This is the normal programming model 99% of programmers know about
Underneath the covers, blocking I/O is tied to the underlying operating system and its buffering • Every file descriptor (file, socket, etc.) has internal memory buffers inside the OS kernel, one for incoming data and one for outgoing data
The decision to block is entirely based on the buffer contents • A receiver blocks when its input buffer is empty (no data); a sender blocks when its output buffer is full (no buffer space)
• Reading : If buffered data is available, return it. Otherwise block until data becomes available. • Writing : If buffer space is available, store the output data in the buffer. If no more space is available, block until space becomes available • The buffering aspect of this is essential • Tuned to deal with the mismatch between CPU speed and the speed of I/O devices
Partial sends • In network programs, send() often only sends the number of bytes that will actually fit into the system buffers • If writing low-level code, you have to check for this and use repeated sends to transmit all of the data

    s = socket(AF_INET, SOCK_STREAM)
    ...
    index = 0
    while index < len(msg):
        index += s.send(msg[index:])
    ...

• Alternative: Use s.sendall(data)
• The amount of buffer space can often be tuned

    s = some socket
    # Set the receive buffer size
    s.setsockopt(SOL_SOCKET, SO_RCVBUF, 65536)
    # Set the send buffer size
    s.setsockopt(SOL_SOCKET, SO_SNDBUF, 65536)

• By changing these, you might be able to improve network performance, flow control, etc. • There are other TCP tuning parameters (see the documentation for setsockopt)
An alternative I/O model that changes the behavior of waiting • If an I/O operation would have blocked, the operation raises an exception immediately instead of waiting
Example of setting up non-blocking socket I/O

    from socket import *
    s = socket(AF_INET, SOCK_STREAM)
    s.bind(("",15000))
    s.listen(5)
    # Wait for a connection
    c,a = s.accept()
    # Turn on nonblocking mode on the client connection
    c.setblocking(False)

• Now, try to read from it

    >>> c.recv(8192)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    socket.error: [Errno 35] Resource temporarily unavailable
    >>>
Catching a non-blocking error

    import errno
    try:
        data = s.recv(8192)
        ...
    except socket.error as e:
        if e.errno == errno.EWOULDBLOCK:
            # Would have blocked
            ...
        else:
            # Some other socket error
            ...

• It can get messy fast
• Non-blocking I/O is useful if you're trying to overlap I/O with other kinds of processing • Achieving a kind of concurrency between I/O operations and other computation • It offers a guarantee that an I/O operation won't cause your program to get "stuck." • It's heavily used in some network programming frameworks (especially those based on event handling, generators, and coroutines).
Polling - An approach where you manually check for I/O activity and respond to it • Typically associated with event loops

    while True:
        ... processing ...
        if poll_for_io():
            process I/O
        ... processing ...

• For example, a program might check for I/O activity every few milliseconds
The select module is used to support polling • Provides interfaces to the following • select() - Unix and Windows • poll() - Unix • epoll() - Linux • kqueue() - BSD • kevent() - BSD
select() is used for I/O multiplexing/polling • Usage : select(rset,wset,eset [,timeout])

    readers = [...]   # sockets waiting to read
    writers = [...]   # sockets waiting to write
    exc = [...]       # sockets to check for exceptions
    rset,wset,eset = select(readers,writers,exc)
    for r in rset:    # Handle readers
        handle_read(r)
    for w in wset:    # Handle writers
        handle_write(w)
    for e in eset:    # Handle exceptions
        handle_exception(e)
You just give select() sets of all sockets/file descriptors of interest • select() then returns the sets of descriptors on which different actions can be performed • read - Data is available in the read buffer • write - Buffer space is available • exception - An exceptional condition (meaning depends on the kind of file)
select() blocks until some activity is detected • There is an optional timeout parameter • Use the timeout if some other processing is going on at the same time • For example, if I/O polling is also embedded inside a GUI event loop
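• A sketch of such a loop (the function names are hypothetical, not from the original slides): select() is given a timeout so that other processing keeps running even when there is no I/O activity

    from select import select

    def event_loop(readers, handle_read, do_other_work, poll_interval=0.05):
        # Poll the read set, but wait no longer than poll_interval seconds
        while readers:
            rset, _, _ = select(readers, [], [], poll_interval)
            for r in rset:
                handle_read(r)
            do_other_work()   # Runs on every pass, I/O activity or not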
There is often an OS limit of 1024 file descriptors • This limits the number of sockets/files that can be monitored by a single select() • Performance is O(n), where n is the number of sockets • Causes scalability issues as n gets large • There are workarounds for both issues, but they aren't cross-platform and you'll need to experiment (e.g., poll(), epoll(), etc.)
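• For example, on Linux the standard library exposes epoll(), which avoids rescanning every descriptor on each call; a minimal (Linux-only) sketch, not part of the original slides:

    import select, socket

    serv = socket.socket()
    serv.bind(('', 16000))
    serv.listen(5)

    ep = select.epoll()
    ep.register(serv.fileno(), select.EPOLLIN)    # Watch the listening socket

    while True:
        # Returns only the descriptors with activity (no O(n) scan in user code)
        for fd, events in ep.poll(1.0):
            if fd == serv.fileno():
                c, a = serv.accept()
                ep.register(c.fileno(), select.EPOLLIN)   # Watch the new client too
                # ... (a real loop would also map fds back to their sockets)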
• You can use select() to build event-driven systems • The underlying idea is actually pretty simple • You monitor a collection of I/O streams and create a stream of "events" that get pushed into callback or handler functions
• Next, write an I/O event dispatcher using polling

    from select import select

    class EventDispatcher:
        def __init__(self):
            self.handlers = set()
        # Registration and management of event handlers
        def register(self,handler):
            self.handlers.add(handler)
        def unregister(self,handler):
            self.handlers.remove(handler)
        def run(self,timeout=None):
            while self.handlers:
                # Collect all of the handlers that want to read or write
                readers = [h for h in self.handlers if h.readable()]
                writers = [h for h in self.handlers if h.writable()]
                # Poll for activity
                rset,wset,e = select(readers,writers,[],timeout)
                # Invoke handlers for sockets that can read/write
                for r in rset:
                    r.handle_read()
                for w in wset:
                    w.handle_write()

• The pieces: registration and management of event handlers, collecting the sockets that want to read or write, polling, and invoking the I/O callback methods
• In this framework, applications get implemented as IOHandler objects wrapped around a specific file or socket object

    class SomeHandler(IOHandler):
        def __init__(self,sock):
            self.sock = sock
            ...
        def fileno(self):
            return self.sock.fileno()

• The internals don't really matter, but there must be a fileno() method to supply a file descriptor to select()/poll() operations
• Tasks must keep internal state that determines whether they are interested in reading or writing

    class SomeHandler(IOHandler):
        def __init__(self,sock):
            ...
            self.wants_to_read = True
            self.wants_to_write = False
            ...
        def readable(self):
            return self.wants_to_read
        def writable(self):
            return self.wants_to_write
        ...

• These methods tell the polling loop what events it should be looking for at any given time
• Tasks must define methods to actually handle read/write events

    class SomeHandler(IOHandler):
        ...
        def handle_read(self):
            ...
            data = self.sock.recv(8192)
            ...
        def handle_write(self):
            ...
            self.sock.send(somedata)
            ...

• These methods only get called if the event loop has received some kind of matching event
To run multiple tasks, you just register multiple handlers with the dispatcher and run its main event loop

    dispatcher = EventDispatcher()
    dispatcher.register(SomeHandler(s1))   # s1 is a socket
    dispatcher.register(SomeHandler(s2))   # s2 is a socket
    ...
    dispatcher.run()

• In theory, this setup allows your program to monitor multiple network connections
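• The slide defining the UDP time handler used in the next example isn't reproduced here; this is a reconstruction consistent with the test that follows (it assumes the IOHandler base class shown earlier):

    import time
    from socket import socket, AF_INET, SOCK_DGRAM

    class TimeHandler(IOHandler):
        def __init__(self, address):
            self.sock = socket(AF_INET, SOCK_DGRAM)
            self.sock.bind(address)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return True      # Always willing to accept a request
        def writable(self):
            return False     # Replies go out immediately in handle_read()
        def handle_read(self):
            # Any incoming datagram is a request for the current time
            msg, addr = self.sock.recvfrom(8192)
            self.sock.sendto(time.ctime().encode('ascii'), addr)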
Running the handler • How to run it

    dispatcher = EventDispatcher()
    dispatcher.register(TimeHandler(('',10000)))
    dispatcher.run()

• How to test it

    >>> from socket import *
    >>> s = socket(AF_INET, SOCK_DGRAM)
    >>> s.sendto(b"",("localhost",10000))
    >>> s.recvfrom(8192)
    (b'Thu Dec 23 10:52:07 2010', ('127.0.0.1', 10000))
    >>>
How to handle streaming connections? • Many network applications have long-lived network connections and bidirectional data transmission (TCP) • Further complication: Each connection request creates a new socket to manage
To handle long-lived connections, you define an IOHandler just for the client connection • Handler instances are then dynamically added to and removed from the event dispatcher as connections are opened and closed • Added when a new connection is received • Removed when a connection is closed
Echo client handler (reading)

    class EchoClientHandler(IOHandler):
        def __init__(self,sock,addr,dispatcher):
            self.sock = sock
            self.dispatcher = dispatcher
            self.outgoing = bytearray()
            self.closed = False
            self.dispatcher.register(self)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return not self.closed
        def handle_read(self):
            # Read handler: get data and save it in the outgoing buffer
            msg = self.sock.recv(65536)
            if not msg:
                # Handle client shutdown
                self.closed = True
                if not self.outgoing:
                    self.dispatcher.unregister(self)
                    self.sock.close()
            else:
                self.outgoing.extend(msg)
Echo client handler (writing)

    class EchoClientHandler(IOHandler):
        ...
        def writable(self):
            return True if self.outgoing else False
        def handle_write(self):
            # Write handler: send as much outgoing data as possible
            nsent = self.sock.send(self.outgoing)
            self.outgoing = self.outgoing[nsent:]
            # Handle client shutdown
            if not self.outgoing and self.closed:
                self.dispatcher.unregister(self)
                self.sock.close()
The handler class we just defined is for already-established connections • Still need to plug it into some kind of server • Recall : This is a traditional TCP server

    sock = socket(AF_INET, SOCK_STREAM)
    sock.bind(address)
    sock.listen(5)
    while True:
        client, addr = sock.accept()
        # Go handle the client

• Need to have an event-driven version
Event-driven server for accepting connections

    class TCPServerHandler(IOHandler):
        def __init__(self,address,handler,dispatcher):
            self.handler = handler
            self.dispatcher = dispatcher
            self.sock = socket(AF_INET, SOCK_STREAM)
            self.sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
            self.sock.bind(address)
            self.sock.listen(5)
            self.sock.setblocking(False)
            self.dispatcher.register(self)
        def fileno(self):
            return self.sock.fileno()
        def readable(self):
            return True
        def handle_read(self):
            # Triggered on each new connection--this is the key part:
            # a new client handler is created and added to the dispatcher
            c,a = self.sock.accept()
            c.setblocking(False)
            self.handler(c,a,self.dispatcher)
Running the server

    dispatcher = EventDispatcher()
    serv = TCPServerHandler(('',20000), EchoClientHandler, dispatcher)
    dispatcher.run()

• If it works, you'll be able to open up multiple connections and interact with it • Assuming your head hasn't exploded
The overall concept underlying the last example is the basis for the much misunderstood (or maligned?) asyncore library module • It is also the basis for the Twisted framework • As you will observe, the resulting programs can respond to multiple I/O channels without threads or processes
• The asyncore standard library module • Implements a wrapper around sockets that turns all blocking I/O operations into events

    s = socket(...)
    s.accept()
    s.connect(addr)
    s.recv(maxbytes)
    s.send(msg)
    ...

    from asyncore import dispatcher
    class MyApp(dispatcher):
        def handle_accept(self):
            ...
        def handle_connect(self):
            ...
        def handle_read(self):
            ...
        def handle_write(self):
            ...

    # Create a socket and wrap it
    s = MyApp(socket())
You manipulate wrapped sockets that operate both as normal sockets and as event dispatchers • The general idea is the same as what we just covered, so I won't go into more detail here • "Python Essential Reference, 4th Ed." has some detailed examples of using asyncore and the related asynchat module
A large event-driven framework built around I/O polling and multiplexing concepts http://twistedmatrix.com • It's similar to what you would get if you started with asyncore and then built the entire universe on top of it using nothing but event handlers and callbacks
• Here is an echo server in Twisted (straight from the manual)

    from twisted.internet.protocol import Protocol, Factory
    from twisted.internet import reactor

    class Echo(Protocol):
        def dataReceived(self, data):   # An event callback
            self.transport.write(data)

    def main():
        f = Factory()
        f.protocol = Echo
        reactor.listenTCP(45000, f)
        reactor.run()                   # Running the event loop

    if __name__ == '__main__':
        main()
• All event-driven I/O systems have a variety of really tricky programming issues • Scalability • Long-running calculations • Blocking operations • Interoperability with other code • Let's briefly discuss
• Event-driven I/O is often based on polling mechanisms such as select() or poll() • Both of those operations scale rather poorly as the number of monitored objects increases • So, as the number of clients increases, more and more time is going to be spent performing the poll operation • A real issue if there is rapid messaging (although solvable in a non-portable manner)
• One benefit of event-driven I/O is that it has predictable resource use (especially memory) • Reduced context-switching (?) • Having thousands of open files/sockets doesn't really consume any significant memory • Unlike threads, you don't have to allocate extra stack space and other resources
• If an event handler runs a long calculation, it blocks everything until it completes • Example : Parsing a large XML message • Remember, there are no threads or preemption • This would manifest itself as a program "stall". You've probably seen this with GUIs.
• Event-driven systems also have a really hard time dealing with blocking operations • Reading from the file system • Performing database queries • Connecting to other services • If any of these operations take place in an event handler, the entire server/application stalls until it completes (no threads)
• Consider this code...

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            # A database query (blocks?)
            results = db.execute("select * from table where ...")
            # Processing of the results (CPU-intensive?)
            for r in results:
                ...

• Everything waits until the callback method finishes its execution • An issue if it happens to take a long time
• Blocking operations might be handed to a separate thread/process to avoid stalling

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            launch_thread(do_query, "select * from table where ...")

• But there is the tricky problem of coordinating what happens upon completion
• Common approach: Completion callback

    class ApplicationHandler(object):
        ...
        def handle_request(self):
            ...
            launch_thread(do_query, "select * from table where ...",
                          callback=self.process_results)
        def process_results(self,results):
            ... continued processing ...

• Behind the scenes, the system will run the operation in a separate thread, collect the result, coordinate with the event-handling framework, and then fire the callback
• Commentary: Coordinating workers and I/O polling is a lot trickier than it looks • The event loop sits blocked in select() while a worker thread computes the result--so how does the finished result get back into the event loop to fire the callback?
• Problem: select() only works with sockets • Workers utilize queues, messaging, and other means that the event loop can't poll directly • Question: How do you make this work?
• Event-driven programming tends to force an event-driven programming style across your entire application program • This includes all external libraries and everything else used by your application • However, most programming libraries are not written in an event-driven style • For instance, the entire standard library
• I don't like event-driven I/O programming • Applying it across a large application is a very good way to create unmaintainable code that's a maze of twisty little passages, all different. • I put it in the same category as assembly code (although not as easy to follow - sic) • Okay to use it internally (in libraries), but don't expose it to the rest of the world
• Event-driven I/O is still useful in certain domains • Programs that are already event-based (for example, GUI programs) • Games • Most normal network programs are better served by using threads or processes • Especially small-to-medium scale projects
• Multiple independent copies of the Python interpreter (or programs in other languages) • Running in separate processes • Possibly on different machines • Sending/receiving messages
Message passing is a well-established technique for concurrent programming • It has been successfully scaled up to systems involving tens of thousands of processors (e.g., supercomputers, Linux clusters, etc.) • It is the foundation of distributed computing • We've already covered some of the basic ideas
• On the surface, message passing is really simple: processes connected to one another only send and receive messages • There are really only two main issues • What is a message? • How is it transported?
There is no universally accepted programming interface or implementation of messaging • There are dozens of different packages that offer different features and options • Covering every possible angle of message passing interfaces is simply impossible here • And a reference manual would be rather dull
There are actually many other issues • Reliability, redundancy, fault tolerance • Security, authentication, encryption • Performance (bandwidth, latency, load-balancing, quality of service, etc.) • Routing, network topology • Interoperability, systems integration • Messaging libraries are often a terror
Our focus is going to be on general programming idioms related to messaging • This mostly concentrates on the boundary between Python and the messaging library/transport layer
• A minimal encoding: size-prefixed bytes • A message is just raw bytes preceded by a size header • No interpretation of the bytes (opaque) • So, the payload could be anything at all (any encoding, any programming language, etc.)
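• A sketch of this encoding over a socket (the function names are my own, not part of any standard API):

    from struct import pack, unpack

    def send_message(s, payload):
        # 4-byte big-endian size header followed by the raw payload bytes
        s.sendall(pack('!I', len(payload)) + payload)

    def recv_exactly(s, nbytes):
        data = bytearray()
        while len(data) < nbytes:
            chunk = s.recv(nbytes - len(data))
            if not chunk:
                raise EOFError('connection closed')
            data.extend(chunk)
        return bytes(data)

    def recv_message(s):
        size, = unpack('!I', recv_exactly(s, 4))
        return recv_exactly(s, size)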
Messages have to be transmitted (somehow) between running processes • Inter-Process Communication (IPC) • Some low-level communication primitives • Pipes • FIFOs • Sockets (Network Programming)
A pipe is a file-like abstraction that allows a byte stream to be transmitted across processes • Perhaps the most portable way to set up a pipe is to use the subprocess module • A standard library module for launching subprocesses that works cross-platform
Launching a subprocess and hooking up the child process via a pipe • Use the subprocess module

    import subprocess
    p = subprocess.Popen(['python','child.py'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE)
    p.stdin.write(data)    # Send data to the subprocess
    p.stdout.read(size)    # Read data from the subprocess

• The parent's p.stdin/p.stdout are connected through the pipe to the child's sys.stdin/sys.stdout
It is also possible to set up a named pipe • Creating one (on Unix)

    import os
    os.mkfifo("/tmp/myfifo")

• Using it in different processes

    # Writer
    f = open("/tmp/myfifo","wb")
    f.write(b"Some data\n")
    f.flush()

    # Reader
    f = open("/tmp/myfifo","rb")
    line = f.readline()

• Note : Not available on Windows with this API
Setting up a listener

    from socket import *
    s = socket(AF_INET,SOCK_STREAM)
    s.bind(("",12345))
    s.listen(5)
    c,a = s.accept()

• Connecting as a client

    s = socket(AF_INET,SOCK_STREAM)
    s.connect(("localhost",12345))
• There are many messaging frameworks • AMQP • ØMQ • RabbitMQ • Celery • Common theme : Putting a higher-level interface on top of sockets, pipes, etc.
• ZeroMQ (http://www.zeromq.org/) • In a nutshell : Message-based sockets • In their own words... "A ØMQ socket is what you get when you take a normal TCP socket, inject it with a mix of radioactive isotopes stolen from a secret Soviet atomic research project, bombard it with 1950-era cosmic rays, and put it into the hands of a drug-addled comic book author with a badly-disguised fetish for bulging muscles clad in spandex." • I would cautiously agree
• Here's an example of an echo server:

    # echoserver.py
    import zmq

    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.bind("tcp://*:6000")

    while True:
        message = sock.recv()        # Get a message
        sock.send(b"Hi:"+message)    # Send a reply

• That's it • And this server can already handle requests from 100s (or 1000s) of connected clients
• Here's an example of an echo client

    # echoclient.py
    import zmq

    context = zmq.Context()
    sock = context.socket(zmq.REQ)
    sock.connect("tcp://localhost:6000")

    sock.send(b"Spam")    # Send a request
    resp = sock.recv()    # Get the response
    print(resp)

• That's also pretty simple (it just works)
• Some cool features • Can start the server or client in any order • Clients can connect to multiple servers (load balancing, redundancy, etc.) • Variety of socket types (Reply, Request, Push, Pull, Publish, Subscribe, etc.)
• Example: a client connected to multiple servers

    import zmq
    context = zmq.Context()
    sock = context.socket(zmq.REQ)
    sock.connect("tcp://host1.com:6000")
    sock.connect("tcp://host2.com:7000")
    sock.connect("tcp://host3.com:7000")
    sock.send(b"Spam")    # Send a request
    resp = sock.recv()    # Get the response

• The request gets sent to one of the servers • Think about scaling, redundancy, etc.
• Messaging systems are meant to be used internally, not exposed to end-users • End-users (even "Mom") talk to a web server over HTTP; the messaging layer sits behind it • Used by all of the back-end code--hidden away in dark server rooms, etc.
• As a general rule, you don't want internal messaging to be exposed to the outside • The usual techniques apply • Firewalls • Secure sockets (SSL) • Digital certificates, public/private key • VPNs
• You may have situations where messaging uses an exposed connection (e.g., allowing anyone to connect if they know the right address and port) • For this, you probably want some kind of authentication scheme • Common approach : Use cryptographic hash authentication (HMAC with MD5, SHA, etc.) • A small sketch follows
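• A minimal challenge-response sketch using the standard hmac module (the connection object and the key handling are hypothetical; both sides must share the secret key in advance):

    import hmac, hashlib, os

    def server_authenticate(conn, secret_key):
        challenge = os.urandom(32)      # Fresh random challenge, never reused
        conn.send(challenge)
        expected = hmac.new(secret_key, challenge, hashlib.sha256).digest()
        response = conn.recv(len(expected))
        # Prefer hmac.compare_digest() where available (constant-time compare)
        return response == expected

    def client_authenticate(conn, secret_key):
        challenge = conn.recv(32)
        conn.send(hmac.new(secret_key, challenge, hashlib.sha256).digest())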
This is only an authentication scheme • It can be used to keep unwanted clients and outsiders from establishing a connection to the messaging framework • Reasonably secure (based on the difficulty of breaking cryptographic hashes) • It's not encryption. The messages themselves could still be seen by a packet sniffer, etc. (although the key is never transmitted)
• How to serialize Python objects? • Lists, dictionaries, sets, instances, etc. • An issue here is that Python is extremely flexible with respect to data types • Containers can also hold mixed data • There is no easy format for describing Python objects (e.g., as a simple array or a fixed binary data structure)
pickle : A module for serializing Python objects • Serializing an object onto a "file"

    import pickle
    ...
    pickle.dump(someobj,f)

• Unserializing an object from a file

    someobj = pickle.load(f)

• Here, a file might be a file, a pipe, a wrapper around a socket, etc.
What objects are compatible? • Nearly any object that consists of data • None, numbers, strings • Tuples, lists, dicts, sets, etc. • Instances of objects • Functions and classes (tricky) • The underlying message encoding is "self-describing" (which hides a lot of details)
Objects not compatible with pickle • Anything involving system or runtime state • Open files, sockets, etc. • Threads • Running generator functions • Stack frames • Closures
• Pickle also creates byte strings

    import pickle
    # Convert to a byte string
    s = pickle.dumps(someobj)
    ...
    # Load from a byte string
    someobj = pickle.loads(s)

• This can be used if you need to embed a Python object into some other messaging protocol or data encoding
• Pickle supports class instances

    class Point:
        def __init__(self,x,y):
            self.x = x
            self.y = y

• Example:

    p = Point(23,24)
    pickle.dump(p,f)

• Caveat: The class definition must be available on the receiver • Advice : Don't send instances around in messaging systems (too fragile)
Large objects can cause problems • Depending on the object, pickle might make a memory copy of the entire object (either while sending or during reconstruction) • Example: Arrays (array module)

    import array
    a = array.array('i',range(10000000))
    ...
    pickle.dump(a,f)    # Makes a memory copy (40MB)

• Better to split large objects into smaller messages
• Functions and classes can be pickled, but they are only name references

    # foo.py
    def bar(x,y):
        return x+y

• Example of the pickled data (protocol 0)

    >>> import pickle, foo
    >>> pickle.dumps(foo.bar, 0)
    b'cfoo\nbar\np0\n.'
    >>>

• Notice that only the module and function name are encoded • When unpickled, the name references are resolved by importing the needed modules
Pickle is not secure at all • Never use pickle with untrusted clients (malformed pickles can be used to execute arbitrary system commands) • Bottom line : Never receive pickled data on an untrusted or unauthenticated connection
Pickle is really only useful for Python • You would not use it if you need to communicate with other programming languages • However, you can do some pretty amazing things with it if Python is your environment • There is already built-in messaging support
multiprocessing provides Listener and Client objects that transmit pickled data • A Listener (receives connections)

    from multiprocessing.connection import Listener
    serv = Listener(('',15000),authkey=b'12345')
    # Wait for a connection
    client = serv.accept()
    # Now, wait for messages to arrive
    while True:
        msg = client.recv()
        # process the message
Example Client

    from multiprocessing.connection import Client
    conn = Client(('localhost',15000),authkey=b'12345')
    conn.send(msg)

• You will notice a similarity to sockets • Except that it's much higher level and it sends pickled objects
Some important features of connections • Authentication (uses HMAC, a technique based on message digests such as SHA) • Instead of bytes, you send Python objects • Data is encoded using pickle • It is extremely useful if you're just going to hook two Python interpreters together
If dealing with C/C++ extensions or foreign (non-Python) software, you often turn to binary-encoded data • Such software might transmit data in a binary encoding of some sort • If you know what you are dealing with, you can bridge Python to it as long as you speak the protocol
There are some standard binary messaging protocols in use • Protocol Buffers (Google) • Thrift (Facebook) • BSON (MongoDB) • All of these have Python libraries and tools • Under the covers, they have to deal with binary data encoding/decoding
struct : Packs/unpacks binary records and structures • Example: Suppose you had this C structure

    struct Stock {
        char name[8];     /* 8 bytes */
        int shares;       /* 4 bytes */
        double price;     /* 8 bytes */
    };

• Now, suppose you wanted to encode/decode raw byte streams with that record layout?
First, create a Struct object

    from struct import Struct
    StockStruct = Struct("8sid")

• The structure is described by a "format string"

    "8s" = char[8]
    "i"  = int
    "d"  = double

• To write the format string, you have to precisely know what the structure is • And you need to know the format codes...
Packing/unpacking codes (based on C)

    'c' char (1-byte string)
    'b' signed char (8-bit integer)
    'B' unsigned char (8-bit integer)
    'h' short (16-bit integer)
    'H' unsigned short (16-bit integer)
    'i' int (32-bit integer)
    'I' unsigned int (32-bit integer)
    'l' long (32- or 64-bit integer)
    'L' unsigned long (32- or 64-bit integer)
    'q' long long (64-bit integer)
    'Q' unsigned long long (64-bit integer)
    'f' float (32-bit)
    'd' double (64-bit)
    's' char[] (string)
    'p' char[] (string with 8-bit length)
    'P' void * (pointer)
Each code may be preceded by a repetition count

    '4i'   4 integers
    '20s'  20-byte string

• Integer alignment and byte order modifiers

    '@' Native byte order and alignment
    '=' Native byte order, standard alignment
    '<' Little-endian, standard alignment
    '>' Big-endian, standard alignment
    '!' Network (big-endian), standard alignment

• Only one modifier is allowed and it goes first

    '<ii'  2 little-endian integers
    '!hi'  Network-order short and integer
Watch out for alignment: with no modifier, struct assumes '@' (native byte order and native alignment), which inserts padding between fields just like a C compiler

    struct Stock {
        char name[8];
        int shares;
        double price;
    };

    StockStruct = Struct("@8sid")   # name: bytes 0-7, shares: 8-11,
                                    # padding: 12-15, price: 16-23

• To get the fields packed back-to-back with no padding (8 + 4 + 8 = 20 bytes), use one of the standard-layout modifiers, e.g. Struct("=8sid")
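• Packing a record is a single call; a small sketch using the unpadded '=' layout described above:

    from struct import Struct

    StockStruct = Struct('=8sid')                        # 8 + 4 + 8 = 20 bytes
    rawbytes = StockStruct.pack(b'GOOG', 100, 490.10)    # b'GOOG' is null-padded to 8
    print(len(rawbytes), StockStruct.size)               # -> 20 20
    # The bytes can now be written to a file, pipe, or socket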
Unpacking a byte string into a tuple

    from struct import Struct
    StockStruct = Struct("=8sid")
    rawbytes = f.read(StockStruct.size)
    name, shares, price = StockStruct.unpack(rawbytes)
    stock = {
        'name'   : name.strip(b'\x00').decode('ascii'),
        'shares' : shares,
        'price'  : price
    }

• You do this for reading/receiving • If you need to create other data types (dicts, instances), that's an extra step
struct also has some module-level functions

    struct.pack("=8sid", b"GOOG", 100, 490.10)
    struct.unpack("=8sid", rawbytes)

• They can be used without having to create a special Struct object • However, they don't run as fast because they have to interpret the format each time • Probably best avoided in code that does a lot of packing/unpacking
Binary records are inherently "unportable" • Great attention to detail is required • Encoding can vary by platform (e.g., 32-bit vs. 64-bit) • Still useful if you have control over the environment and you know what you're doing
Processes, networks, and message passing are the basis of distributed computing • So far, we've covered some basic design patterns and underlying mechanics • Topics not yet covered : more advanced messaging techniques • Connecting to the outside world (foreign systems, interoperability, etc.)
In the thread section, we defined tasks that received and acted upon messages sent to them

    class MyTask(Task):
        def run(self):
            while True:
                msg = self.recv()   # Get a message
                ...                 # Do something with it
                ...

    m = MyTask()
    m.start()
    m.send(msg)    # Send the task a message

• Formally, this is an example of an "actor"
Many actors might work together • Again, just independent tasks sending messages to one another
• Some desirable characteristics • No shared state (messages only) • One operation : sending a message • Messages are asynchronous • Concurrent execution • Again, we already built all of this
• The programming model is very minimal • Thus, you can understand it (maybe) • There is a large body of theoretical knowledge (computer scientists have been studying actors since the 1970s) • Techniques involving actors are applicable to more advanced scenarios
• Consider an application built from a large collection of independent actors/tasks • How do the actors find and link to each other? • What happens if actors crash, restart, etc.? • As a general rule, you want decoupling
Don't want: actors programmed so that they hold direct references to other actor instances

    class Actor(Task):
        def __init__(self,target):
            self.target = target
        def run(self):
            ...
            self.target.send(msg)
            ...

• This results in a very rigid/fragile design • Makes it nearly impossible to make adjustments to the system organization
Better : Indirectly refer to actors through some kind of naming registry

    _registry = {}

    def register(name, actor):
        _registry[name] = actor

    def unregister(name):
        del _registry[name]

    def lookup(name):
        return _registry.get(name)

• Give all actors an identifying name • Have them register/unregister as needed
Registration of actors

    register("spam", SpamActor())
    register("foo", FooActor())
    register("bar", BarActor())

• This is just building a centralized table

    _registry = {
        'spam' : <SpamActor object at 0x38d0b0>,
        'foo'  : <FooActor object at 0x38ea80>,
        'bar'  : <BarActor object at 0x3a88a0>,
        ...
    }
Perform all messaging through a global function that relies upon the registry

    def send(target_name,msg):
        target = lookup(target_name)
        if target:
            target.send(msg)

• All actors now always use the global send()

    class Actor(Task):
        def __init__(self,target_name):
            self.target_name = target_name
        def run(self):
            ...
            send(self.target_name, msg)
            ...
To distribute actors, you additionally need to have some kind of IPC/networking component between the processes • You can use the earlier messaging techniques • multiprocessing, ØMQ, etc.
To implement distributed actors, you must focus on send() and actor names • Essentially, send() has to work seamlessly with both local actors and actors living in remote processes
Messages can be directed to a remote system through the use of a proxy • A proxy receives messages for a remote actor and forwards them to a remote process
Implementing a proxy

    class ProxyTask(Task):
        def __init__(self,proxyname,target,conn):
            super().__init__(name="proxy")
            self.proxyname = proxyname
            self.target = target
            self.conn = conn
        def run(self):
            try:
                while True:
                    msg = self.recv()
                    self.conn.send((self.target,msg))
            finally:
                unregister(self.proxyname)

• Receives messages and forwards them on some kind of connection
• Create a utility function for it

    from multiprocessing.connection import Client

    def proxy(proxyname,target,address,authkey):
        conn = Client(address,authkey=authkey)
        pxy = ProxyTask(proxyname,target,conn)
        pxy.start()
        register(proxyname,pxy)

• Example use

    proxy("e","e",("localhost",15000),authkey=b"12345")
    proxy("ext:f","f",("localhost",15000),authkey=b"12345")

    # Send a message to a remote actor
    send("e","hello world")
    send("ext:f","hello world")

• Creates a connection using multiprocessing and registers a proxy task to accept messages
A dispatcher is needed to receive messages • The dispatcher is a server that accepts connections, receives messages, and forwards them to the local actors
• First, you need a task that receives messages from the proxy class

    class DispatchClientTask(Task):
        def __init__(self,conn):
            super().__init__(name="dispatchclient")
            self.conn = conn
        def run(self):
            try:
                while True:
                    # Message handling
                    target,msg = self.conn.recv()
                    send(target,msg)
            finally:
                self.conn.close()

• It just takes messages from the connection and sends them locally
• Next, you need a server that accepts connections • It launches a new client task for each connection

    from multiprocessing.connection import Listener

    class DispatcherTask(Task):
        def __init__(self,address,authkey):
            super().__init__(name="dispatcher")
            self.address = address
            self.authkey = authkey
        def run(self):
            serv = Listener(self.address,authkey=self.authkey)
            while True:
                try:
                    # Connection handling
                    client = serv.accept()
                    DispatchClientTask(client).start()
                except Exception as e:
                    self.log.info("Error : %s", e, exc_info=True)
• Finally, a function to start the dispatcher

    _dispatcher = None

    def start_dispatcher(address,authkey):
        global _dispatcher
        if _dispatcher:
            return
        _dispatcher = DispatcherTask(address,authkey)
        _dispatcher.start()

• Important point : there is usually only one dispatcher (a singleton)
• You launch it and forget about it

    start_dispatcher(("localhost",15000),authkey=b"12345")

• It operates entirely in the background and doesn't interfere with other tasks • It just delivers outside messages
p = Process(target=somefunc) "b" "a" dispatch proxy proxy send() send() Incoming messages from remote actors Messages sent to remote actors • Each process local actors • Many parts working together • Note: Think about failure modes
So far, actors have been sending messages directly to other actors • This is fine, but what if you want to support some more advanced features? • Example : Replication, load balancing, etc.
Example: An actor pool • A message goes to one actor in the pool (selected round-robin, by system load, or by some other criteria)
How much does the sender of a message have to know about its destination? • Does it have to know about that pool? • Does it have to do all of the routing? • Or should it be blissfully unaware?
Messages might be sent to an intermediary (a broker) instead of directly to the pool • The broker is responsible for handling the message in some manner (selecting a target in the pool, routing, etc.)
Decoupling is an essential feature • The sender doesn't have to know anything about how the broker operates • This kind of approach is essential for scaling and other features • The broker could transparently add/remove actors from the pool depending on load
Message brokers become critical parts that can never fail under any circumstance • Failure of a broker is far more serious than the loss of a single node (e.g., it takes all 1000 servers offline instead of one) • The obvious solution (sic) is to add even more complexity (replicated brokers, managers for the brokers, etc.)
In distributed code, never assume that the mere act of sending a message is reliable • There are too many things that can go wrong in too many places • So, better to plan for the worst • We'll say more in a minute, but if a message must be delivered, you need to take extra steps to verify
Trying to keep track of actor locations is hard • Especially if you force programmers to do it manually by hard-coding everything • Better solution: Create a global registry for mapping actor names to dispatchers (hosts) • Basically, a shared table that tracks actor names and locations
In each process, the registry is consulted whenever send() is used to send a message to an unknown actor (maybe the registry knows) • To implement the registry, you need to address the problem of managing state across the entire application • Problem : The registry has to be available to all tasks, all machines, etc.
• The registry is essentially just a centralized table • A sensible option: Use a key-value store • It's exactly what it sounds like--a dictionary • You can easily build your own • Or use an existing one : memcached, redis, CouchDB, MongoDB, Cassandra, etc. • A small sketch using redis follows
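• For illustration, a tiny registry sketch on top of redis (assumes the third-party redis package and a running server; the key names are my own, not a standard convention):

    import redis

    r = redis.Redis(host='localhost', port=6379)

    def register(name, address):
        # Map an actor name to the host:port of the dispatcher that owns it
        r.hset('actor_registry', name, '%s:%d' % address)

    def lookup(name):
        value = r.hget('actor_registry', name)
        if value is None:
            return None
        host, _, port = value.decode('ascii').partition(':')
        return (host, int(port))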
Key-value stores can be used for a variety of other purposes (more than just a registry) • Maintain system-wide configuration data • Store results from distributed calculations • Provide work queues • Etc.
• Obtaining results • Example : An actor sends out a request message that disappears into a "cloud" of other actors • It picks up the results later by watching a key-value DB
With actors, communication is one-way (send() only) • However, two-way messaging is also common: a client sends a request and a server sends a reply • Request/reply is the mainstay of client/server computing • It's also the most "tricky"
Request/reply messaging adds connection state • The client sends a request and waits for the reply; the server waits for requests and sends replies • If anything goes wrong, the connection may enter a deadlocked or invalid state
Failure modes for request/reply • The request message is sent, but is lost • The server crashes before sending the reply • The reply message gets lost • In all of these cases : the client loses the connection or freezes waiting for a reply that never arrives
• Lost requests/replies can be fixed • Have the client retry requests after a timeout/crash • However, this opens up even more problems
The server may act upon duplicate requests • Maybe it was only the first reply that was lost--the retried request then gets executed a second time • What if the request changes server state? • Example: "Do not hit reload or your credit card might be charged twice."
The client might get two replies (slow server) • Notice how a retried request results in a duplicate reply--and the client can't tell which request a reply belongs to • The client should be smart enough to discard the reply from the duplicated request
Duplicates can be fixed with sequence numbers • A sequence number is included in every request/reply and used to detect duplicate transactions, etc. • For example, if the reply to request seq=13 is lost and the request is retried, the late reply for seq=13 can be recognized and the duplicate discarded • A sketch follows
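• A client-side sketch (assuming a multiprocessing-style connection with send()/recv()/poll(); the server is assumed to echo the sequence number back in its reply):

    import itertools

    _seq = itertools.count(1)

    def request(conn, msg, timeout=5.0):
        seq = next(_seq)
        conn.send((seq, msg))
        while True:
            if not conn.poll(timeout):
                conn.send((seq, msg))     # Timed out: retry with the SAME seq
                continue
            reply_seq, reply = conn.recv()
            if reply_seq == seq:
                return reply
            # Otherwise: a stale reply from an earlier retry--discard it

• On the server side, remembering the last sequence number seen from each client makes it possible to resend a cached reply instead of re-executing a duplicated request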
Thinking that reliable messaging such as TCP means you don't have to worry about any of this? • Wrong. Dead wrong. • If you write code where a reply is expected, you have to account for failure • What if the server dies unexpectedly (software crash, hardware failure, power loss, etc.)?
RPC : remote invocation of procedures implemented on a server process • Clients call functions such as foo() or bar() that actually execute on the server (e.g., s.foo(), s.bar())
• An RPC implementation uses a technique similar to the one used for distributed actors (a dispatcher on the server, proxies on the client side)
RPC messages simply identify a function name and include the arguments

    # funcname = name of function
    # args = tuple of positional args
    # kwargs = dict of keyword args
    msg = (funcname, args, kwargs)   # Make an RPC message
    send(target, msg)                # Send it somewhere

• In the server, just dispatch

    # Get a message
    funcname, args, kwargs = receive()
    # Look up the function and dispatch
    func = _functions[funcname]
    result = func(*args, **kwargs)
    send(sender, result)
RPC is a request/reply pattern • It has all of the reliability concerns discussed in the previous section • Missing replies • Crashed servers • Duplicate requests/replies • But it has other problems as well
A major philosophy of RPC is that a remote procedure call should look exactly identical to a normal function call (the user doesn't know) • Except that it's a flawed concept • Too much RPC leads to horrible performance (far worse than a local procedure) • Potential for partial system failures that are very difficult to debug and untangle
It's very hard to scale to large systems (hard to incorporate features such as caching, fan-in, fan-out, monitoring, filtering, etc.) • Hard to maintain over time (software versions, API changes, etc.) • Rather than reconsider the design, there's a tendency to just keep pounding harder (resulting in even more complexity)
Distributed objects : objects live on a server (where they stay put) • Clients remotely invoke instance methods (e.g., a.spam(), c.bar())
In principle, supporting distributed objects is similar to remote procedure call (RPC) • But there is one really big difference • Distributed objects involve the manipulation of state (instances) stored on the server • With state comes extra complication (memory management, locking, persistence, etc.)
Objects are defined by a normal class

    class Foo(object):
        def bar(self):
            ...
        def spam(self):
            ...

• On the server, various instances are created

    a = Foo()
    b = Foo()
    c = Foo()

• These are normal Python objects
• We build upon everything so far • Distributed objects are similar to actors except that they have more methods than just send() • Method invocation is usually like RPC (methods return results) • To implement, you still need dispatchers, proxies, registry services, etc.
For remote access, a dispatcher is needed on the server • Client requests go through the dispatcher to the instances • Exactly the same idea as with actors and RPC
Incoming requests must identify both the instance and the method to execute • For example, a request ("a","spam",...) names instance "a" in a server-side instance registry ({"a" : a, "b" : b, "c" : c}) and the method spam() to invoke • Instance names and methods are embedded in every message
Clients generally want to use the same programming interface as the class • Ideally, client code shouldn't even be aware of the server--a.bar() and b.spam() should look like calls on normal instances
To emulate the API, proxy classes are needed

    class FooProxy(object):
        def __init__(self,name,serveraddr):
            self.name = name
            self.conn = connect_to(serveraddr)
        def bar(self,*args):
            # send a "bar" request to the server
            # return the result
            ...
        def spam(self,*args):
            # send a "spam" request to the server
            # return the result
            ...

• The proxy has the same programming API as the original object (same methods) • Proxy methods issue RPC requests to the server
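• Rather than hand-writing every method, proxy methods can be generated generically with __getattr__; a sketch that matches the (name, method, args) message layout shown earlier (connect_to is hypothetical, as above):

    class RemoteProxy(object):
        def __init__(self, name, serveraddr):
            self.name = name                  # Instance name in the server registry
            self.conn = connect_to(serveraddr)
        def __getattr__(self, methname):
            # Called for any attribute not found normally: fabricate an RPC method
            def do_rpc(*args, **kwargs):
                self.conn.send((self.name, methname, args, kwargs))
                return self.conn.recv()       # Blocks until the reply arrives
            return do_rpc

• With this, RemoteProxy("a", addr).spam() and .bar() both work without writing either method by hand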
• In a distributed environment, many clients may be connected simultaneously • There might be server threads • May be concurrent access to the objects • Thus, you may need locking • All is lost (back to manipulating shared state)
Instance creation • What happens if new instances get created on the server in response to requests? • How are they referenced by clients? • Who is responsible for managing them? • How long do they live? • Do they persist? (In a database) • Countless things can go wrong...
• What happens if the server crashes? (objects disappear and clients crash?) • Can software on the server be fixed/updated? • Can class definitions be modified? • API changes?
Using distributed objects is a really bad idea for most projects • Massive amounts of added complexity, library dependencies, and required programming sophistication • Example: I once had a consulting gig where I was supposed to analyze a one-million-line distributed C++ application. 95% of the code was related to distributed objects (and it sucked)
• Distributed object systems often follow the "objects all the way down" philosophy • If objects are perfectly encapsulated, they can supposedly live anywhere--machine A, machine B, machine C, or magically out in the cloud somewhere--and behave the same • Fine, except that experience says it doesn't work
Too much high-level abstraction? • Poor performance : Objects may interact in a suboptimal manner (excessive communication) • Partial failure : Part of the system dies, leaving the rest of it running, but not fully operational • Debugging and diagnostics?
• pyro (Python Remote Objects). A Python-centric distributed object framework. Assumes that you're only working in Python. Simplifies many tasks that are harder in other systems. • CORBA. A distributed object framework designed for multiple languages. Look at: omniORB, Fnorb. Note: as far as I can tell, CORBA is not hugely popular in the Python world (excessive complexity?)
You may want parts of your distributed system to interoperate with other components • Possibly written in other languages • Possibly located elsewhere • Possibly implemented by someone else
To connect to foreign systems, you really want to focus on well-documented standards • Use common data encodings (XML, JSON, etc.) • Use common protocols (HTTP, XML-RPC, etc.)
How to create a stand-alone XML-RPC server

    from xmlrpc.server import SimpleXMLRPCServer

    def add(x,y):
        return x+y

    s = SimpleXMLRPCServer(("",8080))
    s.register_function(add)
    s.serve_forever()

• How to test it (xmlrpc.client)

    >>> from xmlrpc.client import ServerProxy
    >>> s = ServerProxy("http://localhost:8080")
    >>> s.add(3,5)
    8
    >>> s.add("Hello","World")
    'HelloWorld'
    >>>
XML-RPC is extremely easy to use 83
• Almost too easy, to be honest
• I have encountered a lot of major projects that are using XML-RPC for distributed control
• Users seem to love it
• I'm not so sure, although I do love the quick-and-dirty hack aspect of it
• Some RPC libraries of interest: 84
• Thrift. A cross-language RPC framework developed by Facebook and released as open-source.
• Protocol Buffers. A cross-language RPC framework developed by Google. Also open-source.
• Both use much more efficient data serialization than XML-RPC (and have other features)
REST (Representational State Transfer) 85
• It's a data-centric software architecture where servers host data (resources) and implement methods for remotely interacting with that data
• Strongly tied to HTTP, but think structured data instead of hacky HTML pages
Core component of REST is a "resource" 86
• A resource usually represents data
• Resources have an associated identifier (URI)

    http://somehost.com/someresource

• The URI alone contains everything needed to locate and identify the resource (protocol, hostname, path, etc.)
Data associated with a resource is typically represented using a standard data encoding 87
• Common formats are used (XML, JSON, etc.)
• There may be multiple representations

[Diagram: a client receives a representation of the resource from the server]
Clients interact with servers and resources using a preset vocabulary of actions (verbs) 88

    GET resource
    PUT resource
    DELETE resource
    POST resource
    HEAD resource

• These are usually just HTTP methods
• PUT and DELETE are related to creating/updating a resource (not common with browsers)
Retrieving a resource (GET) 89

    GET /some/resource

    HTTP/1.1 200 OK
    Content-type: application/xml
    ...
    <?xml ...?>
    <root>
    ...
    </root>
Updating a resource (PUT) 90

    PUT /some/resource
    Content-type: application/xml
    Content-length: 45123

    <?xml ...?>
    ...

    HTTP/1.1 200 OK
    ...

• Typically, this creates a new resource if it doesn't already exist on the server
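For reference, the same two exchanges performed from Python with the stdlib's http.client (somehost.com and /some/resource are the slides' placeholders, and the XML body here is made up):

from http.client import HTTPConnection

c = HTTPConnection("somehost.com")

# GET: retrieve a representation of the resource
c.request("GET", "/some/resource")
r = c.getresponse()
data = r.read()                      # e.g. an XML document

# PUT: send an updated representation back
body = b"<?xml version='1.0'?><root>...</root>"
c.request("PUT", "/some/resource", body,
          {"Content-Type": "application/xml"})
r = c.getresponse()
print(r.status, r.reason)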
REST services are stateless 91
• Server does not record client state
• GET, PUT, etc. are the only operations
• They may occur in any order and at any time
• It's a critical feature of the architecture
• May have multiple servers (heavy load)
• Fault handling (if a server crashes, etc.)
• REST web services build upon HTTP 92
  • Authentication/security
  • Caching
  • Proxies
• Integrates well with existing software
  • HTTP servers
  • Middleware libraries
  • Almost anything that speaks HTTP
You typically build a REST service using the same techniques as for other web programming 93
• CGI scripting
• WSGI
• Web frameworks (Django, Zope, etc.)
• Stand-alone HTTP server
• My preference: WSGI + WebOb (a bare-WSGI sketch follows below)
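To make the WSGI option concrete, a minimal sketch of a REST-style service written directly against WSGI (no WebOb; the /time resource and its JSON representation are made up for illustration):

import json
import time
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # Dispatch on the resource URI and the HTTP verb
    if environ['PATH_INFO'] == '/time' and environ['REQUEST_METHOD'] == 'GET':
        body = json.dumps({'time': time.ctime()}).encode('utf-8')
        start_response('200 OK', [('Content-Type', 'application/json')])
        return [body]
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'Not Found']

if __name__ == '__main__':
    # Serve the resource at http://localhost:8080/time
    make_server('', 8080, application).serve_forever()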
Too many packages to list (all Python 2) 94
• restlib
• restkit
• restish
• Many others on PyPI
• Note: don't confuse these with packages related to reStructuredText (reST)
In this section we look at generators and coroutines as a concurrency tool 2
• These features of Python are not as well understood as other language elements
• However, they can be used as an alternative implementation tool for various aspects of distributed computing and I/O handling
I've given some related PyCon tutorials 3
• "Generator Tricks for Systems Programmers" at PyCon'08
  http://www.dabeaz.com/generators
• "A Curious Course on Coroutines and Concurrency" at PyCon'09
  http://www.dabeaz.com/coroutines
• This is a highly condensed version
A generator is a function that produces a sequence of results instead of a single value 5

def countdown(n):
    while n > 0:
        yield n
        n -= 1

>>> for i in countdown(5):
...     print(i, end=' ')
...
5 4 3 2 1
>>>

• Instead of returning a value, you generate a series of values (using the yield statement)
• Typically, you hook it up to a for-loop
Behavior is quite different from a normal function
• Calling a generator function creates a generator object. However, it does not start running the function.

def countdown(n):
    print("Counting down from", n)
    while n > 0:
        yield n
        n -= 1

>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>>

Notice that no output was produced
The function only executes on __next__() 7

>>> x = countdown(10)
>>> x
<generator object at 0x58490>
>>> x.__next__()        # Function starts executing here
Counting down from 10
10
>>>

• yield produces a value, but suspends the function
• Function resumes on the next call to __next__()

>>> x.__next__()
9
>>> x.__next__()
8
>>>
• A Python version of Unix 'tail -f' 9

import time

def follow(thefile):
    thefile.seek(0, 2)       # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        yield line

• Example use: watch a web-server log file

logfile = open("access-log")
for line in follow(logfile):
    print(line)
• One of the most powerful applications of generators is setting up processing pipelines 10
• Similar to shell pipes in Unix

[Diagram: input sequence → generator → generator → generator → for x in s:]

• Idea: you can stack a series of generator functions together into a pipe and pull items through it with a for-loop
• Print all server log entries containing 'python' 11

def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line

# Set up a processing pipe : tail -f | grep python
logfile  = open("access-log")
loglines = follow(logfile)
pylines  = grep("python", loglines)

# Pull results out of the processing pipeline
for line in pylines:
    print(line)

• This is just a small taste of what's possible
Yield as an Expression 13
• In Python 2.5, a slight modification to the yield statement was introduced (PEP-342)
• You could now use yield as an expression
• For example, on the right side of an assignment

def grep(pattern):
    print("Looking for", pattern)
    while True:
        line = yield
        if pattern in line:
            print(line)

• Question: what is its value?
If you use yield like this, you get a "coroutine" 14
• These do more than just generate values
• Instead, functions can consume values sent to them

>>> g = grep("python")
>>> next(g)       # Prime it (explained shortly)
Looking for python
>>> g.send("Yeah, but no, but yeah, but no")
>>> g.send("A series of tubes")
>>> g.send("python generators rock!")
python generators rock!
>>>

• Sent values are returned by (yield)
Execution is the same as for a generator 15
• When you call a coroutine, nothing happens
• They only run in response to next() and send() calls

>>> g = grep("python")    # Notice that no output was produced
>>> next(g)               # On first operation, coroutine starts running
Looking for python
>>>
All coroutines must be "primed" by first calling next() (or .send(None)) 16
• This advances execution to the location of the first yield expression

def grep(pattern):
    print("Looking for", pattern)
    while True:
        line = yield       # next() advances the coroutine to this first yield
        if pattern in line:
            print(line)

• At this point, it's ready to receive a value (a decorator that automates priming appears below)
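Priming is easy to forget, so the slides that follow use an @coroutine decorator that does it automatically. Its definition is not shown in this excerpt; the usual version (from the PyCon'09 coroutines course) looks like this:

def coroutine(func):
    # Decorator that creates the coroutine and advances it to the
    # first yield, so it is immediately ready to receive values
    def start(*args, **kwargs):
        cr = func(*args, **kwargs)
        next(cr)
        return cr
    return start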
• Coroutines can also be used to set up pipes

[Diagram: send() → coroutine → send() → coroutine → send() → coroutine]

• You just chain coroutines together and push data through the pipe with send() operations
• Notice the striking similarity to actors
• A source that mimics Unix 'tail -f'

import time

def follow(thefile, target):
    thefile.seek(0, 2)       # Go to the end of the file
    while True:
        line = thefile.readline()
        if not line:
            time.sleep(0.1)  # Sleep briefly
            continue
        target.send(line)

• A sink that just prints the lines

@coroutine
def printer():
    while True:
        line = yield
        print(line)
• A grep filter coroutine

@coroutine
def grep(pattern, target):
    while True:
        line = yield              # Receive a line
        if pattern in line:
            target.send(line)     # Send to next stage

• Hooking it up

f = open("access-log")
follow(f, grep('python', printer()))

• A picture: follow() --send()--> grep() --send()--> printer()
• Generators and coroutines can clearly be used to set up pipelining, dataflow, actor-style processing, etc. 22
• However, generators can also serve the role of tasks, as an alternative to threads or processes
• It's subtle, but let's look at the big idea
• When programs run, they alternate between CPU processing and I/O
• For I/O, a program requests the services of the operating system (system calls)
• I/O may cause the program to suspend

[Diagram: run / I/O / run / I/O alternation, with system calls into the operating system]
• Underneath the covers, the operating system task-switches on I/O
• Since I/O operations might take a while, the system does other work while waiting

[Diagram: Task A and Task B alternating between run and I/O, with a task switch whenever one blocks]
• The yield statement can be used to implement user-defined "task" switching
• When a generator function hits a "yield" statement, it immediately suspends execution
• If you are very clever, you can get your program to task-switch between a collection of generator functions
First, you set up a collection of "tasks" 26

def countdown_task(n):
    while n > 0:
        print(n)
        yield
        n -= 1

# A queue of tasks to run
from collections import deque

tasks = deque([ countdown_task(5),
                countdown_task(10),
                countdown_task(15) ])

• Each task is a generator function that yields
Now, write a simple task scheduler 27

def scheduler(tasks):
    while tasks:
        task = tasks.popleft()
        try:
            next(task)           # Run to the next yield
            tasks.append(task)   # Reschedule
        except StopIteration:
            pass

# Run it
scheduler(tasks)

• This loop will just run all of the generators (cycling between them) until there's nothing left to work on
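Running it interleaves the three countdowns, one step per task per pass; the 5-count task finishes first and silently drops out (output abbreviated):

    5
    10
    15
    4
    9
    14
    ...
    1
    6
    11
    5
    10
    ...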
• If you are a little clever, you can have yield integrate with "blocking" I/O requests 30
• The big idea: set up some kind of operation and then yield to have it carried out in the background by the generator scheduler
• In the last section, we used yield to yield control, not a value 31
• Although generators are used for iteration, we're talking about something completely different here
• When yielding, control goes back to the scheduler, which is free to choose what task to run next
Scheduler 32
• Tasks can send values back to the scheduler by yielding an "interesting" value
• Consider the following classes

class IOWait:
    def __init__(self, f):
        self.fileno = f.fileno()

class ReadWait(IOWait):
    pass

class WriteWait(IOWait):
    pass

• These classes represent the concept of "waiting" for a specific kind of I/O event on a given file object
• Now, consider this generator function 33

# Echo data received on s back to the sender
def echo_data(s):
    while True:
        yield ReadWait(s)      # Wait for data
        msg = s.recv(16384)    # Read data
        yield WriteWait(s)     # Wait for writing
        s.send(msg)

• This generator yields instances of the classes just defined back to the scheduler
• Now, let's go back to the scheduler code...
Request 34
• Here is the scheduler from before

def scheduler(tasks):
    while tasks:
        task = tasks.popleft()
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass

# Run it
scheduler(tasks)

• When a task such as echo_data() yields ReadWait(s), an instance of ReadWait or WriteWait is going to be returned by next()
Request 35
• A modified scheduler, looking for different I/O wait requests and taking action

def scheduler(tasks):
    while tasks:
        task = tasks.popleft()
        try:
            r = next(task)
            if isinstance(r, ReadWait):
                handle_read_wait(r, task)
            elif isinstance(r, WriteWait):
                handle_write_wait(r, task)
            else:
                tasks.append(task)
        except StopIteration:
            pass

# Run it
scheduler(tasks)
• We haven't built the I/O yet, but it's easy 36
• To implement I/O waiting, you need two pieces:
  • A holding area for tasks that are waiting for an I/O operation
  • An I/O poller that looks for I/O activity and removes tasks from the holding area when I/O is possible
• Let's look at an example
• A scheduler class 37-40

from collections import deque
from select import select

class Scheduler:
    def __init__(self):
        self.numtasks = 0         # Total number of tasks being managed
        self.ready = deque()      # Queue of tasks that can run
        self.read_waiting = {}    # Dictionaries that serve as I/O holding
        self.write_waiting = {}   # areas, mapping file descriptor -> task

    def iopoll(self):
        # An I/O polling function. This looks for any I/O activity on
        # suspended tasks. If there is I/O, move the task back to the
        # ready queue.
        rset, wset, eset = select(self.read_waiting,
                                  self.write_waiting, [])
        for r in rset:
            self.ready.append(self.read_waiting.pop(r))
        for w in wset:
            self.ready.append(self.write_waiting.pop(w))

• The holding areas look like { 3: <generator>, 7: <generator>, 6: <generator> }, keyed by file descriptor
• Implement the main scheduler loop 43-44

class Scheduler:
    ...
    def run(self):
        while self.numtasks:
            try:
                task = self.ready.popleft()
                try:
                    # Run a task until it yields, check the return value
                    r = next(task)
                    if isinstance(r, ReadWait):
                        self.readwait(r.fileno, task)
                    elif isinstance(r, WriteWait):
                        self.writewait(r.fileno, task)
                    else:
                        self.ready.append(task)
                except StopIteration:
                    self.numtasks -= 1
            except IndexError:
                # Poll for I/O (only runs if no other work to do)
                self.iopoll()

• The readwait(), writewait(), and new() helpers are sketched below
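run() relies on readwait() and writewait() helpers, and the servers on the next slides call a new() method; none of these appear in this excerpt. A minimal sketch consistent with how they are used (the method bodies are inferred, not from the slides):

class Scheduler:
    ...
    def new(self, task):
        # Admit a new generator task to the scheduler
        self.ready.append(task)
        self.numtasks += 1

    def readwait(self, fileno, task):
        # Park the task until its file descriptor becomes readable
        self.read_waiting[fileno] = task

    def writewait(self, fileno, task):
        # Park the task until its file descriptor becomes writable
        self.write_waiting[fileno] = task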
Server 45

from socket import socket, AF_INET, SOCK_DGRAM
import time

def timeserver(addr):
    s = socket(AF_INET, SOCK_DGRAM)
    s.bind(addr)
    while True:
        yield ReadWait(s)
        msg, addr = s.recvfrom(8192)
        yield WriteWait(s)
        s.sendto((time.ctime()+"\n").encode('ascii'), addr)

sched = Scheduler()
sched.new(timeserver(('', 15000)))   # Create three server
sched.new(timeserver(('', 16000)))   # instances and add
sched.new(timeserver(('', 17000)))   # them to the scheduler
sched.run()
Server 47
• Running the echo server

sched = Scheduler()
echo = EchoServer(('', 15000), sched)
sched.run()

• Test it out with telnet
• You'll find that it works fine with multiple clients
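The EchoServer class itself falls in a gap in this excerpt (slide 46). A sketch consistent with the client_handler shown on a later slide; the accept loop here is inferred, not taken from the slides:

from socket import socket, AF_INET, SOCK_STREAM, SOL_SOCKET, SO_REUSEADDR

class EchoServer:
    def __init__(self, addr, sched):
        self.sched = sched
        sched.new(self.server_loop(addr))

    def server_loop(self, addr):
        # Accept connections; spawn a handler task for each client
        s = socket(AF_INET, SOCK_STREAM)
        s.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
        s.bind(addr)
        s.listen(5)
        while True:
            yield ReadWait(s)            # Wait for a connection attempt
            client, a = s.accept()
            self.sched.new(self.client_handler(client))

    def client_handler(self, client):
        # As shown on a later slide: echo data back to the sender
        while True:
            yield ReadWait(client)
            msg = client.recv(8192)
            if not msg:
                break
            yield WriteWait(client)
            client.send(msg)
        client.close()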
Multitasking with generators has some interesting aspects to it
• First, it's based on I/O polling--just like event-driven I/O systems
• However, the execution model takes a completely different direction
• Instead of triggering callbacks, I/O events merely cause suspended generators to resume
• Generators have normal-looking control flow

class EchoServer:
    ...
    def client_handler(self, client):
        while True:
            yield ReadWait(client)
            msg = client.recv(8192)
            if not msg:
                break
            yield WriteWait(client)
            client.send(msg)
        client.close()
        print("Client closed")

• Notice how closely it mimics what you would write with threads
• The yield statement can only be used in the top-level function, not in subroutines
• This makes it really difficult to write subroutine libraries based on generators
• It's not impossible, but you have to play some clever tricks (e.g., generator "trampolining"), as the sketch below illustrates
• Being addressed in PEP-380
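A toy illustration of the problem and of the trampolining workaround (the function names here are made up for illustration):

def inner():
    yield "value from inner"

def outer_broken():
    inner()                   # Just creates a generator; yields nothing onward
    yield "value from outer"

def outer_trampoline():
    for item in inner():      # Manually re-yield what the subroutine yields
        yield item
    yield "value from outer"
    # PEP-380's 'yield from inner()' makes this forwarding automatic
    # in later Pythons (3.3+)

print(list(outer_broken()))       # ['value from outer']
print(list(outer_trampoline()))   # ['value from inner', 'value from outer']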
• Generators also have the same sorts of problems as event-driven systems
• Scalability of I/O polling
• Long-running calculations
• How to handle blocking operations
• Solutions are similar. For example, to handle a blocking operation, you might run it in a separate thread until it completes (see the sketch below)
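One hedged sketch of the thread trick in this scheduler's terms: run the blocking call in a worker thread and signal completion through a pipe, so the select()-based scheduler can wait on it like any other file descriptor. run_blocking and the wrapper object are hypothetical, not from the slides (and pipes work with select() like this on Unix):

import os
import threading

class _PipeWrapper:
    # Minimal file-like object so ReadWait can wrap a raw descriptor
    def __init__(self, fd):
        self.fd = fd
    def fileno(self):
        return self.fd

def run_blocking(func, *args):
    # Start func(*args) in a thread; return (waitable, result_holder)
    r, w = os.pipe()
    result = []
    def worker():
        result.append(func(*args))
        os.write(w, b'x')          # Wake up select() when done
    threading.Thread(target=worker).start()
    return _PipeWrapper(r), result

# Inside a task:
#     waitable, box = run_blocking(some_blocking_call, arg)
#     yield ReadWait(waitable)     # Suspend until the thread finishes
#     value = box[0]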
• Look at my PyCon'09 presentation for a more in-depth study of coroutines and concurrency
  http://www.dabeaz.com/coroutines
• I've intentionally not included all of that here, mainly because I don't want to just duplicate my PyCon presentation (which is freely available online) and we're probably short on time