Slide 1

Slide 1 text

An Asynchronous, Scalable Django with Twisted

Slide 2

Slide 2 text

Hello, I’m Amber Brown (HawkOwl)

Slide 3

Slide 3 text

Twitter: @hawkieowl Pronouns: she/her

Slide 4

Slide 4 text

I live in Perth, Western Australia

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

Core Developer
Release Manager
Ported 40KLoC+ to Python 3

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Binary release management across 3 distros
Ported Autobahn|Python (Tx) and Crossbar.io to Python 3
Web API/REST integration in CB

Slide 9

Slide 9 text

Scaling Django Applications

Slide 10

Slide 10 text

Django serves one request at a time

Slide 11

Slide 11 text

gunicorn, mod_wsgi, etc. run multiple copies in threads and processes

Slide 12

Slide 12 text

Concurrent Requests == processes x threadpool size
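As a quick sanity check of that formula, with illustrative numbers (e.g. two gunicorn workers with four threads each):

```python
# Capacity of a WSGI deployment: processes x threads per process.
processes = 2            # e.g. two gunicorn workers
threads_per_process = 4  # each with a four-thread pool
concurrent_requests = processes * threads_per_process
print(concurrent_requests)  # 8 requests can be in flight at once
```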

Slide 13

Slide 13 text

(Diagram: nginx in front of two gunicorn workers, each with a four-thread pool)
Example server: two workers with four threads each

Slide 14

Slide 14 text

Need more requests? Add more web servers!

Slide 15

Slide 15 text

(Diagram: HAProxy load-balancing across three servers, each running the nginx + gunicorn stack from the previous slide)

Slide 16

Slide 16 text

Scaling has required adding a new piece

Slide 17

Slide 17 text

Higher scale means higher complexity

Slide 18

Slide 18 text

Is there a better way to handle many requests?

Slide 19

Slide 19 text

Problem Domain

Slide 20

Slide 20 text

Modern web applications have two things that take a long time to do

Slide 21

Slide 21 text

CPU-bound work: math, natural language processing, other data processing

Slide 22

Slide 22 text

On most Python interpreters, Python threads are unsuitable for dispatching CPU-heavy work

Slide 23

Slide 23 text

Of N Python threads, only 1 may run Python code at a time, because of the Global Interpreter Lock

Slide 24

Slide 24 text

Of N Python threads, all N may run C code at once, since the Global Interpreter Lock is released around C calls
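A small sketch of why this matters for I/O: `time.sleep()` blocks in C and releases the GIL, so several threads can wait concurrently (timing numbers are illustrative):

```python
import threading
import time

def blocking_io():
    # time.sleep() blocks in C and releases the GIL while waiting.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The four 0.2 s waits overlap instead of taking 0.8 s back to back.
print(f"elapsed ~= {elapsed:.2f}s")
```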

Slide 25

Slide 25 text

I/O-bound work: database requests, web requests, other network I/O

Slide 26

Slide 26 text

Threads work better for I/O-bound work

Slide 27

Slide 27 text

Thread switching overhead is expensive
Rapidly acquiring/releasing the GIL is expensive

Slide 28

Slide 28 text

First, let's focus on I/O-bound applications.

Slide 29

Slide 29 text

Asynchronous I/O & Event-Driven Programming

Slide 30

Slide 30 text

Your code is triggered on events

Slide 31

Slide 31 text

Events can be: incoming data on the network, a computation finishing, a subprocess ending, etc.

Slide 32

Slide 32 text

How do we know when events have occurred?

Slide 33

Slide 33 text

All events begin from some form of I/O, so we just wait for that!

Slide 34

Slide 34 text

Event-driven programming frameworks

Slide 35

Slide 35 text

Twisted (the project I work on!)

Slide 36

Slide 36 text

(of SVN history)

Slide 37

Slide 37 text

asyncio was introduced much later

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

Same at their core, using "selector functions"

Slide 40

Slide 40 text

select() and friends (poll, epoll, kqueue)

Slide 41

Slide 41 text

Selector functions take a list of file descriptors (e.g. sockets, open files) and tell you what is ready for reading or writing
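A minimal sketch of that behaviour using the standard library's `select` module and a connected socket pair:

```python
import select
import socket

# A connected pair of sockets stands in for real network connections.
a, b = socket.socketpair()

# Nothing has been written yet, so neither end is ready for reading.
readable, _, _ = select.select([a, b], [], [], 0)
assert readable == []

b.send(b"hello")  # data is now waiting on `a`

# select() reports `a` as readable, without us ever blocking on it.
readable, _, _ = select.select([a, b], [], [], 0)
print(readable == [a])  # True
a.close()
b.close()
```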

Slide 42

Slide 42 text

Selector loops can handle thousands of open sockets and events

Slide 43

Slide 43 text

Data is channeled through a transport to a protocol (e.g. HTTP, IMAP, SSH)

Slide 44

Slide 44 text

Sending data is queued until the network is ready
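asyncio (mentioned earlier) borrowed this same transport/protocol split, so a minimal echo protocol can be sketched with nothing beyond the standard library:

```python
import asyncio

class Echo(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        # write() queues the bytes; the event loop flushes them
        # when the socket is actually ready for writing.
        self.transport.write(data)

async def main():
    loop = asyncio.get_running_loop()
    # Port 0 lets the OS pick a free port.
    server = await loop.create_server(Echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"ping")
    data = await reader.readexactly(4)
    writer.close()
    server.close()
    await server.wait_closed()
    return data

echoed = asyncio.run(main())
print(echoed)  # b'ping'
```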

Slide 45

Slide 45 text

Nothing blocks; control simply passes to the next event to be processed

Slide 46

Slide 46 text

No blocking means no threads

Slide 47

Slide 47 text

“I/O loops” or “reactors” (as it "reacts" to I/O)

Slide 48

Slide 48 text

Higher density per core
No threads required!
Concurrency, not parallelism

Slide 49

Slide 49 text

Best case: high I/O throughput, high-latency clients, low CPU processing

Slide 50

Slide 50 text

But what if we need to process CPU-bound tasks?

Slide 51

Slide 51 text

Event-Driven Programming with Work Queues

Slide 52

Slide 52 text

CPU-bound tasks are added to a queue, rather than being run directly

Slide 53

Slide 53 text

(Diagram: Web Server → Task Queue → three Workers)

Slide 54

Slide 54 text

We have made the CPU-bound task an I/O-bound one for our web server
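A toy, single-process sketch of that pattern using the standard library's `queue` module; real deployments put the queue on the network, but the shape is the same (squaring stands in for the heavy computation):

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # Workers pull CPU-bound jobs off the queue and push results back.
    while True:
        item = tasks.get()
        if item is None:  # sentinel: shut the worker down
            break
        results.put(item * item)  # stand-in for heavy computation

w = threading.Thread(target=worker)
w.start()

for n in [2, 3, 4]:
    tasks.put(n)  # enqueueing is cheap for the caller
tasks.put(None)
w.join()

print(sorted(results.queue))  # [4, 9, 16]
```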

Slide 55

Slide 55 text

We have also made the scaling characteristics horizontal

Slide 56

Slide 56 text

(Diagram: Web Server → Task Queue → six Workers, one per CPU across Server 1 (CPU1–2) and Server 2 (CPU1–4))

Slide 57

Slide 57 text

Putting tasks on the queue and removing them is cheap

Slide 58

Slide 58 text

Task queues scale rather well

Slide 59

Slide 59 text

Add more workers to scale!

Slide 60

Slide 60 text

Do we have an implementation of this?

Slide 61

Slide 61 text

The Architecture of Django Channels

Slide 62

Slide 62 text

Project to make an "asynchronous Django"

Slide 63

Slide 63 text

Authored by Andrew Godwin (behind South, Migrations)

Slide 64

Slide 64 text

(Diagram: Interface Server (Server 1) → Channel Queue (Server 2) → six Workers on Servers 3 and 4)

Slide 65

Slide 65 text

Interface server accepts requests, puts them on the Channel (task queue)

Slide 66

Slide 66 text

Workers take requests off the Channel and process them

Slide 67

Slide 67 text

Results from processed requests are written back to the Channel

Slide 68

Slide 68 text

The interface server picks up these responses and writes them back out to the HTTP request

Slide 69

Slide 69 text

The interface server is only I/O bound and does no "work" of its own

Slide 70

Slide 70 text

Perfect application for asynchronous I/O!

Slide 71

Slide 71 text

Daphne, the reference interface server implementation, is written in Twisted

Slide 72

Slide 72 text

Daphne is capable of handling thousands of requests a second on modest hardware

Slide 73

Slide 73 text

The channel layer can be sharded

Slide 74

Slide 74 text

(Diagram, sharding: Interface Server (Server 1) → two Channel Queues (Servers 2 and 3) → six Workers on Servers 4 and 5)

Slide 75

Slide 75 text

Workers do not need to be on the web server... but you can put them there if you want!

Slide 76

Slide 76 text

For small sites, the channel layer can simply be an inter-process communication bus

Slide 77

Slide 77 text

(Diagram: on a single Server 1, the Interface Server and three Workers share a Channel Queue in shared memory)

Slide 78

Slide 78 text

And Twisted understands WebSockets... so can Channels too?

Slide 79

Slide 79 text

Yep!

Slide 80

Slide 80 text

How Channels Works

Slide 81

Slide 81 text

A Channel is where requests are put to be serviced

Slide 82

Slide 82 text

What is a request?
- an incoming HTTP request
- a newly connected WebSocket
- data on a WebSocket

Slide 83

Slide 83 text

http.request
http.disconnect
websocket.connect
websocket.receive
websocket.disconnect

Slide 84

Slide 84 text

Your worker listens on these channel names

Slide 85

Slide 85 text

Each message carries information about the request (e.g. a body and headers), and a "reply channel" identifier

Slide 86

Slide 86 text

http.response! http.request.body! websocket.send!

Slide 87

Slide 87 text

http.response!c134x7y http.request.body!c134x7y websocket.send!c134x7y

Slide 88

Slide 88 text

Reply channels are connection specific so that the correct response gets to the correct connection

Slide 89

Slide 89 text

In handling a request, your code calls send() on a response channel

Slide 90

Slide 90 text

But because Channels is event-driven, you can't get a "response" from the event

Slide 91

Slide 91 text

The workers themselves do not use asynchronous I/O by default!

Slide 92

Slide 92 text

Under Channels, you write synchronous code, but smaller synchronous code

Slide 93

Slide 93 text

@receiver(post_save, sender=BlogUpdate)
def send_update(sender, instance, **kwargs):
    Group("liveblog").send({
        "id": instance.id,
        "content": instance.content,
    })

Slide 94

Slide 94 text

Group? What's a group?

Slide 95

Slide 95 text

Pool of request-specific channels for efficiently sending one-to-many messages

Slide 96

Slide 96 text

e.g: add all open WebSocket connections to a group that is notified when your model is saved

Slide 97

Slide 97 text

Handling different kinds of requests

Slide 98

Slide 98 text

Workers can listen on specific channels; they don't have to listen to all of them!

Slide 99

Slide 99 text

(Diagram: Interface Server (Server 1) → Channel Queue (Server 2); a high-performance Worker (Server 3) listens on bigdata.process, while a standard Worker (Server 4) listens on http.request)

Slide 100

Slide 100 text

Because you can create and listen for arbitrary channels, you can funnel certain kinds of work into different workers

Slide 101

Slide 101 text

my_data_set = request.body
Channel("bigdata.process").send(
    {"mydata": my_data_set})

Slide 102

Slide 102 text

How do we support sending that data down the current request when it's done?

Slide 103

Slide 103 text

my_data_set = request.body
Channel("bigdata.process").send({
    "mydata": my_data_set,
    "reply_channel": message.reply_channel})

Slide 104

Slide 104 text

All our big data worker needs to do then is send the response on the reply channel!
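A sketch of what that worker could look like. The `Channel` class below is a minimal stand-in for `channels.Channel` (the real one talks to the channel layer), and the doubling "computation" is purely illustrative:

```python
# Minimal stand-in for channels.Channel: just enough to show the flow.
class Channel:
    registry = {}  # channel name -> list of queued messages

    def __init__(self, name):
        self.name = name

    def send(self, content):
        Channel.registry.setdefault(self.name, []).append(content)

def process_bigdata(message):
    # Pretend the heavy computation is doubling the input.
    result = {"text": message["mydata"] * 2}
    # Send the result straight down the original request's reply channel.
    Channel(message["reply_channel"]).send(result)

# Simulate one queued message that carries its reply channel's name.
process_bigdata({"mydata": 21, "reply_channel": "http.response!c134x7y"})
print(Channel.registry["http.response!c134x7y"])  # [{'text': 42}]
```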

Slide 105

Slide 105 text

Channels as a bridge to an asynchronous future

Slide 106

Slide 106 text

A channel doesn't care if you are synchronous or asynchronous

Slide 107

Slide 107 text

...or written in Django or even Python!

Slide 108

Slide 108 text

Channels implements the "Asynchronous Server Gateway Interface" (ASGI)

Slide 109

Slide 109 text

The path to a hybrid future: Go, Django, Twisted, etc.

Slide 110

Slide 110 text

Channels is due to land in Django 1.11/2.0

Slide 111

Slide 111 text

Try it out! channels.readthedocs.io

Slide 112

Slide 112 text

Questions? (pls no statements, save them for after)