Django & Twisted (Django Under The Hood 2015)

Django & Twisted Django Under The Hood, 2015

Hello, I’m Amber Brown (HawkOwl)

I live in Perth, Western Australia

I organise Django Girls events!

I organise Django Girls events! (omg it’s russ)

I serve on the Django Code of Conduct Committee.

I’m a Twisted core developer …and release manager (get hype
for 15.5!)

(image by isometri.cc)

Paraphrasing the DjangoCon AU 2014 Keynote

I’m an invited speaker

It’s expected that I have something of use to tell
you

Talks are only worthwhile if they educate or entertain

So I’m going to say this upfront, with no ambiguity:

This talk’s conclusion is NOT “Django sucks”.

This talk’s conclusion is NOT “using Twisted makes you a
better programmer”.

This talk’s conclusion is that the future of Python web
development is working together.

Any interpretation drawing a different conclusion is incorrect

>>> Django == good
True
>>> Twisted == good
True

WARNING This talk is full of spiders

No question is stupid

Concepts

For the purposes of this talk, synchronous code returns in-line

def sync(): return 1

…and asynchronous code calls another function with a result at
some later time

def async_(func):
    func(1)

However, this is also asynchronous

def asyncyieldfrom():
    a = yield from somefunc()
    return a

Contrary to how it looks, functions that use yield from do not “return
immediately”

Python suspends at the yield point, and can run other
things — purely syntactic sugar
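A minimal sketch of that suspension, using only the standard library. The names `task` and `run_all` are made up for illustration, and the round-robin loop is a toy, not how Twisted or asyncio schedule work. Each `yield` is a point where the function pauses and something else gets a turn.

```python
from collections import deque

def task(name, steps, log):
    """A generator that does a little work, then yields control."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield  # suspend here; the loop below may run another task

def run_all(tasks):
    """Toy round-robin loop: resume each task until all are finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run until the next yield
        except StopIteration:
            continue               # task finished, drop it
        queue.append(current)      # not finished, schedule it again

log = []
run_all([task("a", 2, log), task("b", 2, log)])
print(log)  # ['a0', 'b0', 'a1', 'b1']
```

The interleaved log shows that neither task ran to completion before the other started.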

Blocking, for the purposes of this talk, means that Python
cannot run anything else at all during that period, typically because it is waiting on I/O

Short, CPU-bound tasks are not considered “blocking”

Long CPU-bound operations, and I/O-bound operations whether “short” or long, are “blocking”

“Short” I/O still takes a long time

PING google.com (150.101.170.180): 56 data bytes
64 bytes: icmp_seq=0 ttl=60 time=13.217 ms
64 bytes: icmp_seq=1 ttl=60 time=18.227 ms
64 bytes: icmp_seq=2 ttl=60 time=13.117 ms

13ms in computer time is an eternity
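Some rough arithmetic on that claim. The 3 GHz clock speed is an assumption picked for illustration, not a figure from the talk; the round-trip times are the ones in the ping output above.

```python
CLOCK_HZ = 3_000_000_000  # assumed 3 GHz core; an illustrative figure only
ping_ms = [13.217, 18.227, 13.117]  # round-trip times from the ping above

def cycles_spent_waiting(milliseconds, clock_hz=CLOCK_HZ):
    """Approximate number of clock cycles that pass during the wait."""
    return round(milliseconds / 1000 * clock_hz)

for ms in ping_ms:
    print(f"{ms} ms is roughly {cycles_spent_waiting(ms):,} clock cycles")
```

Under that assumption, a single "short" round trip costs tens of millions of cycles that a blocked process cannot use.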

What is Twisted?

Asynchronous networking framework

At least a decade old

Stable & Mature (thanks to a robust Compatibility Policy)

Many protocol implementations (HTTP/1.0+1.1, SMTP, IMAP, DNS, SSH, many many
more)

Python 2.7/3.3+ (Python 3.3+ port is incomplete, 50%+ there)

Time-based versioning
15.0 == 1st release in ’15
15.5 == 6th release in ’15

How Twisted’s Reactor Works

Sockets usually block until the data is sent

Twisted configures the sockets to be non-blocking

Given a non-blocking socket called socket, socket.write() will write to the
send buffer and return immediately

If the send buffer is full, it raises EWOULDBLOCK
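A small sketch of that behaviour with the standard library. Python's socket objects call the operation `send()` rather than `write()`, and surface the error as `BlockingIOError`. The helper name `fill_send_buffer` is made up, and nobody reads from the other end, so the buffer is guaranteed to fill.

```python
import errno
import socket

def fill_send_buffer():
    """Write to a non-blocking socket until the kernel buffer is full."""
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    sent = 0
    try:
        while True:
            sent += writer.send(b"x" * 4096)  # returns immediately
    except BlockingIOError as exc:
        # Nobody is reading, so the send buffer eventually fills up.
        return sent, exc.errno
    finally:
        writer.close()
        reader.close()

sent, code = fill_send_buffer()
print(f"buffer full after {sent} bytes: {errno.errorcode[code]}")
```

How many bytes fit before the error depends on the operating system's buffer sizes.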

Further socket.write() calls are put in a secondary send buffer
by Twisted

This secondary send buffering is taken care of by the
Twisted Protocol class (socket.write() is never called directly by user code)

socket.read() is also called automatically by Protocol

Twisted’s reactor then alerts Protocol when there is more data
to be read, or more data can be written

select, poll, epoll, kqueue

Takes a list of file descriptors (eg. sockets) and returns
the ones that can have further data written/read

If more data can be written, Protocol tries to empty
its secondary send buffer

If more data can be read, Protocol reads it and
gives it to user code with the overridden dataReceived method
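The loop described above can be sketched with the standard library's `selectors` module. This is a toy under stated assumptions, not Twisted's reactor: `pump` is a made-up name, `pending` plays the role of the secondary send buffer, and the line that extends `received` is where a Protocol would call `dataReceived`.

```python
import selectors
import socket

def pump(sender, receiver, payload):
    """Move payload from sender to receiver using a select-style loop."""
    sel = selectors.DefaultSelector()
    sender.setblocking(False)
    receiver.setblocking(False)
    pending = bytearray(payload)   # toy "secondary send buffer"
    received = bytearray()
    sel.register(sender, selectors.EVENT_WRITE)
    sel.register(receiver, selectors.EVENT_READ)
    while len(received) < len(payload):
        # Ask the OS which sockets can be written to or read from right now.
        for key, _events in sel.select(timeout=1):
            if key.fileobj is sender:
                try:
                    count = sender.send(pending)  # write what fits
                except BlockingIOError:
                    continue                      # buffer full, retry later
                del pending[:count]
                if not pending:
                    sel.unregister(sender)        # nothing left to write
            elif key.fileobj is receiver:
                # A Protocol would hand this data to dataReceived().
                received.extend(receiver.recv(65536))
    sel.close()
    return bytes(received)

a, b = socket.socketpair()
payload = b"x" * 1_000_000
echoed = pump(a, b, payload)
print(len(echoed))
a.close()
b.close()
```

The payload is far larger than a typical send buffer, so the loop has to alternate between writing and reading to get it all through.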

That handles sending/receiving data, but we operate on a
higher level

Each Protocol implements something — WebSockets, SMTP, et al

The Protocol is asynchronous, so the consumption of its data
must also be asynchronous

Deferreds

<anger>

“If you don’t understand Deferreds, you’re too stupid for Twisted”

That belief has no place in any Twisted I’m a
part of

If you don’t “get” Deferreds, that is OUR failure.

We need better documentation

We need better examples

We need to adopt syntactic changes that make it easier

</anger>

A Deferred is an object which will hold a result at some
point in time

Callbacks mean ‘when you have a result, call this function
with the result’

Deferreds have a “callback chain”, where the result is passed
through

from twisted.internet.defer import Deferred
d = Deferred()
d.addCallback(lambda t: t + 1)
d.addCallback(lambda t: print(t))
d.callback(12)

>>> d = Deferred()
>>> d.addCallback(lambda t: t + 1)
<Deferred at 0x100a03c50>
>>> d.addCallback(lambda t: print(t))
<Deferred at 0x100a03c50>
>>> d.callback(12)
13

addCallback returns a Deferred, so you can chain it

Deferred() \
    .addCallback(lambda t: t + 1) \
    .addCallback(lambda t: print(t)) \
    .callback(12)
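To make the callback chain less magical, here is a toy stand-in written from scratch. `ToyDeferred` is a made-up class and is not Twisted's implementation: it leaves out errbacks, and it does not handle callbacks that return further Deferreds. It only shows a result being passed through a list of functions, and why returning `self` makes chaining work.

```python
class ToyDeferred:
    """A toy stand-in for a Deferred; NOT Twisted's implementation."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addCallback(self, func):
        self._callbacks.append(func)
        if self._fired:
            self._run()  # result already available, run straight away
        return self      # returning self is what allows chaining

    def callback(self, result):
        self._fired = True
        self._result = result
        self._run()

    def _run(self):
        # Pass the result through each callback in the order it was added.
        while self._callbacks:
            func = self._callbacks.pop(0)
            self._result = func(self._result)

seen = []
d = ToyDeferred()
d.addCallback(lambda t: t + 1).addCallback(seen.append)
d.callback(12)
print(seen)  # [13]
```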

Callbacks can be synchronous (although they should not block) or
return more Deferreds

Many things return Deferreds

>>> import treq
>>> treq.get("https://google.com")
<Deferred at 0x10d6db5c0>

import treq
from twisted.internet.task import react
def get(reactor):
    d = treq.get("http://atleastfornow.net")
    d.addCallback(treq.content)
    d.addCallback(lambda _: print(_))
    return d
react(get)

@inlineCallbacks

inlineCallbacks makes Deferreds act like Futures/coroutines

import treq
from twisted.internet.task import react
from twisted.internet.defer import inlineCallbacks
@inlineCallbacks
def get(reactor):
    request = yield treq.get("http://atleastfornow.net")
    content = yield treq.content(request)
    print(content)
react(get)

Supported in Twisted since generators were introduced

Return a value with defer.returnValue()

Works with regular Deferreds — a function wrapped with inlineCallbacks
returns a Deferred automatically

To wait for a Deferred to fire, use yield in
the function
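A rough idea of the mechanism, as a toy. `toy_inline` and `fetch` are made-up names, and this driver simply calls whatever the generator yields straight away. The real inlineCallbacks instead attaches callbacks to the yielded Deferred and resumes the generator when it fires, returning a Deferred to the caller.

```python
def toy_inline(gen_func):
    """Toy driver: resume the generator with the result of what it yields."""
    def wrapper(*args, **kwargs):
        gen = gen_func(*args, **kwargs)
        result = None
        while True:
            try:
                waiting_on = gen.send(result)  # run until the next yield
            except StopIteration as stop:
                return stop.value              # the generator's return value
            result = waiting_on()              # toy "wait": just call it now
    return wrapper

def fetch(url):
    # Hypothetical stand-in for an I/O call; returns a zero-argument callable.
    return lambda: f"<body of {url}>"

@toy_inline
def get(url):
    body = yield fetch(url)             # reads like a blocking call
    length = yield (lambda: len(body))
    return body, length

print(get("x"))  # ('<body of x>', 11)
```

The point is the shape of `get`: each `yield` hands something to the driver and gets the finished result back on the same line.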

Making Django Asynchronous

Django is synchronous at its core

WSGI relies on what it calls being synchronous
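For reference, a minimal WSGI application and a toy caller, using only the standard library. `call_app` is a made-up helper that plays the server's part. The thing to notice is that the server gets the response body from the application's return value, so it has to wait for the call to finish.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """A minimal WSGI application (PEP 3333)."""
    body = b"hello"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    # The server is waiting on this return value; nothing else happens in
    # this thread until the function has produced it.
    return [body]

def call_app(application):
    """Call the app the way a server would, collecting status and body."""
    environ = {}
    setup_testing_defaults(environ)  # fill in a plausible fake request
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(application(environ, start_response))
    return captured["status"], body

print(call_app(app))  # ('200 OK', b'hello')
```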

Django’s ORM does blocking I/O

Making either of these asynchronous is complex

Asynchronicity can’t be bolted on

Everything has to cooperate or everything falls apart

“Common Sense”
async == hard
sync == easy

In reality, each approach has tradeoffs

Synchronous Upsides
• Code flow is easier to understand: do x, then y
• Only one “thread” of execution, for simplicity
• Many libraries are synchronous

Synchronous Downsides
• You can only do one thing at once
• Although suited to the request/response cycle, it can only really do that
• Persistent connections are not simple to implement

Asynchronous Upsides
• Massively scalable network concurrency
• Multiple “threads” of execution: the code handling the request doesn’t have to finish after the request is written
• Handling persistent/evented connections is super easy
• Reactor model async is threadless
• Python 3 adds some syntactic sugar that makes it easier to write

Asynchronous Downsides
• “Callback hell” when using raw futures/deferreds
• You have to be a good citizen: blocking in the reactor loop is disastrous for performance
• Doing I/O is “harder” because you have to be explicit about it
• Python 2 lacks a bunch of async syntactic sugar

You can’t get the upsides of both

But you can try!

Threaded WSGI Runner
• The standard Django deployment method: run lots of threads, so it doesn’t matter if it blocks
• Each thread is blocking, so it can’t run multiple I/O operations at once
• To handle many concurrent requests, you need many threads
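A quick way to see the threads-versus-concurrency trade-off, using a standard library thread pool. `blocking_view` and `serve` are made-up names, and `time.sleep` stands in for a blocking database or network call. This is not a WSGI server, just the threading model behind one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_view(request_id):
    time.sleep(0.1)  # stands in for a blocking database or network call
    return f"response {request_id}"

def serve(requests, threads):
    """Handle every request with a fixed number of worker threads."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        responses = list(pool.map(blocking_view, requests))
    return responses, time.monotonic() - start

_, one_thread = serve(range(4), threads=1)
_, four_threads = serve(range(4), threads=4)
print(f"1 thread: {one_thread:.2f}s, 4 threads: {four_threads:.2f}s")
```

With one thread the four requests queue up behind each other; with four threads they overlap, at the cost of one thread per in-flight request.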

Hendrix
• Hendrix is a “Twisted Django”
• WSGI server using Twisted, plus WebSockets
• Multiprocessing, multithreaded
• https://github.com/hangarunderground/hendrix

Crochet
• Run Twisted code side-by-side with blocking code
• Runs a Twisted reactor in another thread, rather than Twisted calling Django
• https://github.com/itamarst/crochet
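The same idea can be sketched with the standard library, swapping in an asyncio event loop for the Twisted reactor. This is an analogy only and does not use Crochet's API; `fetch_later` and `blocking_caller` are made-up names.

```python
import asyncio
import threading

# Start an event loop in a background thread (the "reactor thread").
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

async def fetch_later(value):
    await asyncio.sleep(0.01)  # stands in for asynchronous I/O
    return value * 2

def blocking_caller(value):
    """Ordinary blocking code that hands work to the loop and waits."""
    future = asyncio.run_coroutine_threadsafe(fetch_later(value), loop)
    return future.result(timeout=5)

result = blocking_caller(21)
print(result)  # 42

# Shut the background loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

The blocking caller never touches the loop directly; it only submits work and waits for the answer, which is the arrangement the slide describes.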

The Future of Django (Django Channels)

Brainchild of Andrew Godwin

Django Channels makes Django event-driven

Asynchronous server (Twisted) + Synchronous “workers”

Requests and WebSocket events are now “events” sent through “channels”

You write synchronous code which handles these events

Channel events go on a queue, and are picked up
by workers

Workers can also put things on the queue (but can’t
get the result)
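A toy model of that arrangement with a standard library queue and threads. This is not the Channels API: `channel`, `worker` and `page_views` are made-up names, and the page view counter is the example from the talk.

```python
import queue
import threading

channel = queue.Queue()   # toy channel: just a thread-safe queue
page_views = []           # where the toy consumer records its work

def worker():
    """Synchronous worker: take one event at a time and handle it."""
    while True:
        event = channel.get()
        if event is None:  # sentinel: shut the worker down
            break
        page_views.append(event["path"])
        channel.task_done()

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

# The sender puts events on the queue and carries on; it gets no result back.
for path in ["/", "/about", "/"]:
    channel.put({"path": path})

channel.join()            # wait until every event has been handled
for _ in workers:
    channel.put(None)
for w in workers:
    w.join()
print(sorted(page_views))  # ['/', '/', '/about']
```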

Channels Upsides
• It allows you to use WebSockets!
• If you don’t care about the response (eg. a page view counter), it can be sent by a channel and run by a worker without blocking the current event
• The workers don’t have to be on the same machine, allowing distribution

Channels Downsides
• You can’t get the results of events you create in your code
• Your code can still only “do” one thing at a time
• Your code is a few steps removed from the real WebSocket or HTTP connections, which makes it less flexible

So, what does Channels look like?

When an HTTP/WebSocket event comes in from a client, it
sends a message to a channel

You implement consumers for these channels

When your consumer is called, it is given a channel to
send its result to

In the case of an HTTP request, you send back
a “channel encoded” response object

In the case of WebSockets, you send back content

This content is then returned to the client

WebSocket clients can be put into “Groups”

You can then broadcast a message out to a Group
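The flow above, from incoming event to consumer to reply channel to Group broadcast, as a toy in-memory model. `ToyChannelLayer` and everything on it are made up for illustration and are not the Channels API. The channel names are only labels, and the toy calls consumers directly instead of going through a queue and workers.

```python
from collections import defaultdict

class ToyChannelLayer:
    """Toy in-memory model of channels and groups; not the Channels API."""

    def __init__(self):
        self.consumers = {}               # channel name -> consumer function
        self.outbox = defaultdict(list)   # reply channel -> messages sent
        self.groups = defaultdict(set)    # group name -> reply channels

    def send(self, channel, message):
        if channel in self.consumers:
            self.consumers[channel](self, message)  # handled by your code
        else:
            self.outbox[channel].append(message)    # delivered to a client

    def group_send(self, group, message):
        for reply_channel in sorted(self.groups[group]):
            self.send(reply_channel, message)

def on_connect(layer, message):
    # Put the connecting client into a Group.
    layer.groups["chat"].add(message["reply_channel"])

def on_receive(layer, message):
    # Broadcast whatever one client sent to everyone in the Group.
    layer.group_send("chat", {"content": message["content"]})

layer = ToyChannelLayer()
layer.consumers["websocket.connect"] = on_connect
layer.consumers["websocket.receive"] = on_receive

layer.send("websocket.connect", {"reply_channel": "client-1"})
layer.send("websocket.connect", {"reply_channel": "client-2"})
layer.send("websocket.receive", {"reply_channel": "client-1", "content": "hi"})
print(dict(layer.outbox))
```

Both clients end up with the message in their outbox, including the one that sent it.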

What makes it different?

Channels doesn’t actually make your code asynchronous; it just adds
async runners for your sync code

It doesn’t tackle the “hard” problem of running Django asynchronously

So it doesn’t get all the benefits it would if it
did

Maybe that’s enough?

It’s a positive development for Django

It supports Python 2.7 and Python 3.3+

Check it out: http://git.io/vYEbp

So, why not just use Twisted?

Well…

The Future of Django (alternate)

WSGI II Electric Boogaloo

WSGI is currently inherently request/response

WebSockets are useful, and WSGI II would need to support
them

WebSockets 2?

HTTP falls out of use?

Metal WSGear Solid 3 Snake Eater

Async is undergoing another renaissance

Django has to decide where it is going to sit

Adopting an asynchronous framework is a long-term way forward

It will require a lot of broken eggs, but Django
can make the transition

This is Django…

This is Django… …with async views…

This is Django… …with async views… …with an async ORM…

…running on Twisted Web…

…running on Twisted Web… …with no WSGI.

Live Demo

Caveats: I wrote this on a plane, the ORM runs
in a threadpool, the tests fail hilariously

But it’s serving concurrent web requests in pure Python

async_create() which returns a Deferred, etc
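A sketch of the threadpool approach, using asyncio and sqlite3 from the standard library in place of Twisted and the Django ORM. The `async_create` here is a hypothetical function that returns an awaitable rather than a Deferred. It is not the code from the demo.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# check_same_thread=False lets the worker thread use this connection;
# a single worker keeps access to it serialised.
conn = sqlite3.connect(":memory:", check_same_thread=False)
conn.execute("CREATE TABLE person (name TEXT)")
pool = ThreadPoolExecutor(max_workers=1)

def create(name):
    """Blocking insert, as a synchronous ORM call would do."""
    cursor = conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
    conn.commit()
    return cursor.lastrowid

async def async_create(name):
    """Hypothetical async wrapper: run the blocking call in the pool."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(pool, create, name)

async def main():
    # Both inserts are awaited together; the event loop stays free meanwhile.
    return await asyncio.gather(async_create("amber"), async_create("russ"))

row_ids = asyncio.run(main())
names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY name")]
print(sorted(row_ids), names)
pool.shutdown()
```

The database work still blocks a thread; it is just no longer the thread running the event loop, which is why the talk lists this as a caveat.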

ORM needs more work

The ORM does a lot of things that call cursor.execute()
where you wouldn’t expect

The backends need to be truly asynchronous

More separation between SQL generation, and executing that SQL
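One way to picture that separation, as a toy with sqlite3. `build_insert` and `execute` are made-up names and have nothing to do with the ORM's internals. The first function is pure and does no I/O, so only the second would need an asynchronous version.

```python
import sqlite3

def build_insert(table, values):
    """Pure function: produce SQL and parameters without touching a database.

    Table and column names are interpolated directly, so this toy is only
    safe with trusted identifiers.
    """
    columns = ", ".join(values)
    placeholders = ", ".join("?" for _ in values)
    sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return sql, tuple(values.values())

def execute(conn, statement):
    """The only place I/O happens; this is the part that could become async."""
    sql, params = statement
    return conn.execute(sql, params)

statement = build_insert("person", {"name": "amber", "city": "Perth"})
print(statement)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT)")
execute(conn, statement)
rows = conn.execute("SELECT name, city FROM person").fetchall()
print(rows)  # [('amber', 'Perth')]
```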

Then we have all the requirements for an asynchronous Django!

Django users have to be good async citizens

Like I said, everything has to cooperate or it all
falls apart

yield from Python 3.4

await, async iterators, async context managers (PEP 492) in Python
3.5

Django might be able to support async & sync views

WSGI would work as it does now

If using Twisted as your web server, you can use
async views
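What supporting both kinds of view might look like, sketched with asyncio rather than Twisted. `dispatch` is a hypothetical function and nothing here is Django's API. Async views are awaited directly, and sync views are pushed to a thread so they cannot hold up the event loop.

```python
import asyncio
import inspect

def sync_view(request):
    return f"sync response to {request}"

async def async_view(request):
    await asyncio.sleep(0)  # stands in for non-blocking I/O
    return f"async response to {request}"

async def dispatch(view, request):
    """Hypothetical dispatcher that accepts either kind of view."""
    if inspect.iscoroutinefunction(view):
        return await view(request)
    # Run the blocking view in a thread so the event loop is not held up.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, view, request)

async def main():
    return await asyncio.gather(dispatch(sync_view, "/a"),
                                dispatch(async_view, "/b"))

responses = asyncio.run(main())
print(responses)
```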

Django’s ORM and other features would then be usable by
Twisted libraries

Then Django doesn’t need to care about WebSockets, or whatever
comes next

“Django should have been a Twisted plugin.”
(someone, unless I imagined that)

The Future of Twisted

Twisted isn’t perfect

Contributor onboarding improvements

Contributor tooling improvements

Git migration

Twisted’s future is new blood, and we need to work
for that

Adopting a Django-style Deprecation Policy (removing deprecated junk)

Shedding the past (Python 2.6 support)

Adopting Python 3 features (async def, yield from)

Twisted + Django

I would like to see this happen

Like I said earlier, you cannot get the upsides of
async and sync code at the same time

But with asyncio, writing asynchronous code in Python is becoming
“normal”

Features like yield from and async def can be adopted
by Twisted, even though they’re targeted at asyncio

This removes some of the difficulty of writing async code
(“callback hell”)

Makes async code look sequential
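The difference is easiest to see side by side. This sketch uses asyncio from the standard library rather than Twisted, and `fetch` is a made-up stand-in for an I/O call. Both functions do the same two dependent steps, once with explicit callbacks and once with `async def` and `await`.

```python
import asyncio

async def fetch(value):
    await asyncio.sleep(0)  # stands in for asynchronous I/O
    return value

def callback_style(loop):
    """Each step is registered as a callback on the previous one."""
    done = loop.create_future()

    def got_first(task):
        second = loop.create_task(fetch(task.result() + 1))
        second.add_done_callback(got_second)

    def got_second(task):
        done.set_result(task.result() * 2)

    loop.create_task(fetch(1)).add_done_callback(got_first)
    return done

async def sequential_style():
    """The same steps, written top to bottom."""
    first = await fetch(1)
    second = await fetch(first + 1)
    return second * 2

async def main():
    loop = asyncio.get_running_loop()
    return await callback_style(loop), await sequential_style()

results = asyncio.run(main())
print(results)  # (4, 4)
```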

Ugly hax: github.com/hawkowl/django

Questions answered before you ask

What about gevent?

Glyph’s “Unyielding” https://goo.gl/lYDtct

“Despite the fact that implicit coroutines masquerade under
many different names, many of which don’t include the word “thread” – for example, “greenlets”, “coroutines”, “fibers”, “tasks” – green or lightweight threads are indeed threads … In the long run, when you build a system that relies upon them, you eventually have all the pitfalls and dangers of full-blown preemptive threads.”
(Glyph)

What would an async Django get me?

• WebSockets
• More I/O efficiency
• You don’t need a task manager to run things after a response

Why do you wear a red trenchcoat?

Questions!
