
An Asynchronous, Scalable Django with Twisted (PyCon TW 2016 Keynote)



Amber Brown (HawkOwl)

June 04, 2016

Transcript

  1. An Asynchronous, Scalable Django with Twisted

  2. Hello, I’m Amber Brown (HawkOwl)

  3. Twitter: @hawkieowl; pronouns: she/her

  4. I live in Perth, Western Australia

  6. Core developer; release manager; ported 40 KLoC+ to Python 3

  8. Binary release management across 3 distros; ported Autobahn|Python (Twisted flavour) and Crossbar.io to Python 3; Web API/REST integration in Crossbar.io
  9. Scaling Django Applications

  10. Django serves one request at a time

  11. gunicorn, mod_wsgi, etc. run multiple copies in threads and processes

  12. Concurrent requests = processes × thread pool size

  13. [Diagram: nginx in front of two gunicorn workers, each with four threads.] Example server: two workers with four threads each
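The formula above, applied to the example server, can be written down as a gunicorn configuration file (gunicorn config files are plain Python). This is a sketch assuming gunicorn's threaded workers: `workers` and `threads` are real gunicorn settings, while `max_concurrent_requests` is a name made up here only to spell out the product.

```python
# gunicorn.conf.py: a sketch of the example server (two workers, four threads each)
workers = 2  # worker processes, each running its own copy of the Django app
threads = 4  # threads per worker; gunicorn switches to its gthread worker when > 1

# Not a gunicorn setting: just the slide's formula,
# concurrent requests = processes x thread pool size
max_concurrent_requests = workers * threads  # 8 for this server
```

Started with something like `gunicorn -c gunicorn.conf.py myproject.wsgi` (the project name is hypothetical), this one server can work on at most 8 requests at once; a ninth has to wait for a free thread.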
  14. Need more requests? Add more web servers!

  15. [Diagram: HAProxy load-balancing across Servers 1–3; each web server runs nginx in front of two gunicorn workers with four threads each]
  16. Scaling has required adding a new piece (a load balancer such as HAProxy)

  17. Higher scale means higher complexity

  18. Is there a better way to handle many requests?

  19. Problem Domain

  20. Modern web applications have two kinds of work that take a long time to do
  21. CPU-bound work: math, natural language processing, other data processing

  22. On most Python interpreters, Python threads are unsuitable for dispatching CPU-heavy work
  23. Of N Python threads, only 1 may run Python code at a time, because of the Global Interpreter Lock
  24. Of N Python threads, up to N may run C code at once, since the Global Interpreter Lock is released while that C code runs
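Because each process has its own interpreter and its own GIL, the usual way to spread CPU-bound Python code across cores is processes rather than threads. A minimal sketch using the standard library's `concurrent.futures`; `count_primes` is a deliberately slow, hypothetical workload and `count_primes_in_parallel` a made-up helper name.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List


def count_primes(limit: int) -> int:
    """Deliberately CPU-bound: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_in_parallel(limits: List[int]) -> List[int]:
    # Each worker process has its own interpreter and its own GIL,
    # so these CPU-bound calls can run on several cores at the same time.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))
```

On platforms that start worker processes by spawning a fresh interpreter (the default on Windows and macOS), call `count_primes_in_parallel` from under an `if __name__ == "__main__":` guard.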
  25. I/O-bound work: database requests, web requests, other network I/O

  26. Threads work better for I/O-bound work

  27. Thread-switching overhead is high: rapidly acquiring and releasing the GIL is expensive
  28. First, let's focus on I/O-bound applications.

  29. Asynchronous I/O & Event-Driven Programming

  30. Your code is triggered on events

  31. Events can be: incoming data on the network; some computation finishing; a subprocess ending; etc.
  32. How do we know when events have occurred?

  33. All events begin from some form of I/O, so we just wait for that!
  34. Event-driven programming frameworks

  35. Twisted (the project I work on!)

  36. (of SVN history)

  37. asyncio was introduced much later

  38. None
  39. Both are the same at their core, using "selector functions"

  40. select() and friends (poll, epoll, kqueue)

  41. Selector functions take a list of file descriptors (e.g. sockets, open files) and tell you what is ready for reading or writing
  42. Selector loops can handle thousands of open sockets and events
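Python exposes the selector functions through the standard `selectors` module, whose `DefaultSelector` uses the most efficient mechanism available on the platform (epoll, kqueue, poll or select). In this small sketch two connected socket pairs stand in for client connections, only one of them receives data, and a single `select()` call reports which one is ready. The helper name `readable_connections` is made up here.

```python
import selectors
import socket


def readable_connections(timeout: float = 1.0) -> list:
    """Register two connections, send data on one, ask the selector which are readable."""
    sel = selectors.DefaultSelector()
    client_a, server_a = socket.socketpair()
    client_b, server_b = socket.socketpair()
    try:
        # Watch the "server" ends for readability; data= is a label handed back to us
        sel.register(server_a, selectors.EVENT_READ, data="a")
        sel.register(server_b, selectors.EVENT_READ, data="b")
        client_a.sendall(b"GET / HTTP/1.0\r\n\r\n")  # only connection "a" has data waiting
        events = sel.select(timeout)  # waits until something is ready, or the timeout passes
        return sorted(key.data for key, mask in events if mask & selectors.EVENT_READ)
    finally:
        sel.close()
        for sock in (client_a, server_a, client_b, server_b):
            sock.close()
```

A real loop would call `select()` repeatedly and dispatch each ready socket to its handler; the same single call scales to thousands of registered sockets.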

  43. Data is channeled through a transport to a protocol (e.g. HTTP, IMAP, SSH)
  44. Sending data is queued until the network is ready
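Twisted's transport/protocol split has a close equivalent in asyncio, which is used here so the sketch runs with only the standard library. The protocol only sees bytes and decides what to reply; the transport owns the socket, and `write()` queues data that the loop sends once the socket is writable. `UpperEchoProtocol` is a made-up toy protocol, not HTTP, IMAP or SSH.

```python
import asyncio


class UpperEchoProtocol(asyncio.Protocol):
    """Toy protocol: replies with whatever it receives, upper-cased."""

    def connection_made(self, transport: asyncio.Transport) -> None:
        self.transport = transport  # the transport owns the actual socket

    def data_received(self, data: bytes) -> None:
        # write() does not block: bytes are buffered until the network is ready
        self.transport.write(data.upper())


async def demo() -> bytes:
    loop = asyncio.get_running_loop()
    server = await loop.create_server(UpperEchoProtocol, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]  # port 0 asked the OS for a free port
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello")
    reply = await reader.readexactly(5)
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))  # b'HELLO'
```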

  45. Nothing blocks; the loop simply gives control to the next event to be processed
  46. No blocking means no threads

  47. These are called "I/O loops" or "reactors" (because they "react" to I/O)

  48. Higher density per core; no threads required! Concurrency, not parallelism
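Concurrency without threads can be seen by running many simulated requests on one event loop. In this sketch `asyncio.sleep` stands in for waiting on a slow client or database (an assumption; real handlers would await network I/O), and every handler reports the thread it ran on. The function names are hypothetical.

```python
import asyncio
import threading
from typing import List, Tuple


async def handle_request(request_id: int) -> Tuple[int, int]:
    # While this handler "waits for I/O", the loop switches to other handlers
    await asyncio.sleep(0.01)
    return request_id, threading.get_ident()


async def serve_many(n: int) -> List[Tuple[int, int]]:
    # gather() runs all handlers concurrently and returns results in submission order
    return await asyncio.gather(*(handle_request(i) for i in range(n)))
```

Running `asyncio.run(serve_many(1000))` lets all 1,000 handlers overlap their waits, so the batch takes roughly as long as a single wait, yet every result carries the same thread id: concurrency, not parallelism.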

  49. Best case: high I/O throughput, high-latency clients, low CPU processing

  50. But what if we need to process CPU bound tasks?

  51. Event-Driven Programming with Work Queues

  52. CPU-bound tasks are added to a queue rather than being run directly
  53. [Diagram: Web Server → Task Queue → three Workers]

  54. We have made the CPU-bound task an I/O-bound one for our web server
  55. We have also made the scaling characteristics horizontal

  56. [Diagram: Web Server → Task Queue → six Workers, one per CPU: Server 1 (CPU1, CPU2) and Server 2 (CPU1–CPU4)]
  57. Putting tasks on the queue and removing them is cheap

  58. Task queues scale rather well

  59. Add more workers to scale!
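The work-queue pattern can be sketched in-process with the standard library. Assumptions: `queue.Queue` stands in for a real broker, and the worker threads stand in for worker processes on other CPUs or servers (given the GIL, real CPU-bound workers would be separate processes). The `None` sentinel is one common way to tell a worker to stop; `worker` and `run` are hypothetical names.

```python
import queue
import threading
from typing import Dict, List


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Pull tasks until a None sentinel arrives; push (task_id, result) back."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work for this worker
            break
        task_id, n = item
        results.put((task_id, sum(i * i for i in range(n))))  # stand-in for heavy work


def run(jobs: List[int], num_workers: int = 3) -> Dict[int, int]:
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    pool = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(num_workers)]
    for t in pool:
        t.start()
    # The "web server" side: enqueuing is cheap and never waits on the work itself
    for task_id, n in enumerate(jobs):
        tasks.put((task_id, n))
    for _ in pool:
        tasks.put(None)  # one sentinel per worker
    for t in pool:
        t.join()
    out = {}
    while not results.empty():
        task_id, value = results.get()
        out[task_id] = value
    return out
```

Scaling is a matter of raising `num_workers` (or, in the real architecture, starting more worker processes); the producer side does not change.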

  60. Do we have an implementation of this?

  61. The Architecture of Django Channels

  62. Project to make an "asynchronous Django"

  63. Authored by Andrew Godwin (the author of South and Django's migrations)

  64. [Diagram: Interface Server → Channel Queue → six Workers, spread across Servers 1–4]
  65. The interface server accepts requests and puts them on the Channel (task queue)
  66. Workers take requests off the Channel and process them

  67. Results from processed requests are written back to the Channel

  68. The interface server picks up these responses and writes them back out to the HTTP request
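The round trip on these slides can be modelled with an in-memory stand-in for the channel layer. Everything here is hypothetical and simplified (class names, function names, message keys); it is not the Channels or Daphne API, only the shape of the flow: the interface server enqueues a request carrying a per-connection reply channel, a worker answers on that reply channel, and the interface server picks the answer up.

```python
import secrets
from collections import defaultdict, deque


class InMemoryChannelLayer:
    """Hypothetical stand-in for a channel layer: one FIFO queue per channel name."""

    def __init__(self) -> None:
        self._queues = defaultdict(deque)

    def send(self, channel: str, message: dict) -> None:
        self._queues[channel].append(message)

    def receive(self, channels: list):
        """Return (channel, message) from the first non-empty channel, else (None, None)."""
        for name in channels:
            if self._queues[name]:
                return name, self._queues[name].popleft()
        return None, None


layer = InMemoryChannelLayer()


def interface_server_accept(path: str) -> str:
    # Wrap the incoming HTTP request as a message with a per-connection reply channel
    reply_channel = "http.response!" + secrets.token_hex(4)
    layer.send("http.request", {"path": path, "reply_channel": reply_channel})
    return reply_channel


def worker_step() -> None:
    # Take one request off the channel, do the "work", answer on its reply channel
    _, message = layer.receive(["http.request"])
    if message is not None:
        body = f"Hello from {message['path']}".encode()
        layer.send(message["reply_channel"], {"status": 200, "content": body})


def interface_server_respond(reply_channel: str):
    # Pick up the response so it can be written back out to the client connection
    _, response = layer.receive([reply_channel])
    return response


reply = interface_server_accept("/liveblog/")
worker_step()
print(interface_server_respond(reply))  # {'status': 200, 'content': b'Hello from /liveblog/'}
```

The interface server never runs application code itself; it only moves messages, which is why it suits an event loop.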
  69. The interface server is only I/O-bound and does no "work" of its own
  70. Perfect application for asynchronous I/O!

  71. Daphne, the reference interface server implementation, is written in Twisted

  72. Daphne is capable of handling thousands of requests a second on modest hardware
  73. The channel layer can be sharded

  74. [Diagram: sharding. An Interface Server and six Workers connected through two Channel Queues, spread across Servers 1–5]
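One simple way to shard a channel layer is to hash each channel name to pick a queue server, so every interface server and worker independently agrees on where a given channel lives. This sketch uses a plain hash-modulo scheme with made-up Redis URLs; it is an illustration, not necessarily how any particular channel layer backend shards.

```python
import zlib
from typing import List


def shard_for(channel: str, shards: List[str]) -> str:
    """Map a channel name to one shard; every process computes the same answer."""
    # crc32 is deterministic across processes and machines, unlike str hash()
    return shards[zlib.crc32(channel.encode("utf-8")) % len(shards)]


SHARDS = ["redis://queue-1:6379/0", "redis://queue-2:6379/0"]  # hypothetical queue servers
```

`zlib.crc32` is used instead of the built-in `hash()`, which is randomised per process for strings and so would not agree across machines. Plain modulo reassigns most channels when a shard is added; consistent hashing is the usual remedy.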
  75. Workers do not need to be on the web server... but you can put them there if you want!
  76. For small sites, the channel layer can simply be an inter-process communication (IPC) bus
  77. [Diagram: an Interface Server and three Workers on a single machine (Server 1), connected by a Channel Queue in shared memory]
  78. And Twisted understands WebSockets... so can Channels handle them too?

  79. Yep!

  80. How Channels Works

  81. A Channel is where requests are put to be serviced

  82. What is a request? An incoming HTTP request, a newly connected WebSocket, or data arriving on a WebSocket
  83. http.request, http.disconnect, websocket.connect, websocket.receive, websocket.disconnect

  84. Your worker listens on these channel names

  85. Each message carries information about the request (e.g. a body and headers) and a "reply channel" identifier
  86. http.response!<client>, http.request.body!<client>, websocket.send!<client>

  87. http.response!c134x7y, http.request.body!c134x7y, websocket.send!c134x7y

  88. Reply channels are connection-specific, so that the correct response gets to the correct connection
  89. In handling a request, your code calls send() on a response channel
  90. But because Channels is event-driven, you can't get a "response" back from the event
  91. The workers themselves do not use asynchronous I/O by default!

  92. Under Channels, you write synchronous code, but smaller synchronous code

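  93. Code slide: pushing model saves to a group

```python
import json
from channels import Group  # Channels 1.x API, as used in this talk
from django.db.models.signals import post_save
from django.dispatch import receiver
from .models import BlogUpdate  # the app's model (hypothetical module path)

@receiver(post_save, sender=BlogUpdate)
def send_update(sender, instance, **kwargs):
    # websocket.send messages carry their payload under "text" (or "bytes")
    Group("liveblog").send({
        "text": json.dumps({"id": instance.id, "content": instance.content}),
    })
```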
  94. Group? What's a group?

  95. Pool of request-specific channels for efficiently sending one-to-many messages

  96. e.g. add all open WebSocket connections to a group that is notified when your model is saved
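A group can be pictured as a named set of reply channels: a connect handler adds the connection's reply channel, a disconnect handler removes it, and one `send` fans a message out to every member. This in-memory sketch (`InMemoryGroups` is a made-up name) shows only that bookkeeping; in Channels the membership lives in the channel layer, so every worker shares it.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


class InMemoryGroups:
    """Hypothetical sketch: each group is a named set of reply channel names."""

    def __init__(self, send: Callable[[str, dict], None]) -> None:
        self._send = send  # delivers one message to one channel
        self._members: Dict[str, Set[str]] = defaultdict(set)

    def add(self, group: str, reply_channel: str) -> None:
        # e.g. called from a websocket.connect handler
        self._members[group].add(reply_channel)

    def discard(self, group: str, reply_channel: str) -> None:
        # e.g. called from a websocket.disconnect handler
        self._members[group].discard(reply_channel)

    def send(self, group: str, message: dict) -> None:
        # One call fans the same message out to every current member
        for channel in sorted(self._members[group]):
            self._send(channel, message)
```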
  97. Handling different kinds of requests

  98. Workers can listen on specific channels; they don't have to listen to all of them!
  99. [Diagram: Interface Server (Server 1) → Channel Queue (Server 2) → Workers on Server 3 (high performance) and Server 4 (standard), listening on the http.request and bigdata.process channels]
  100. Because you can create and listen for arbitrary channels, you can funnel certain kinds of work into different workers
  101. Code slide: sending work to a dedicated channel

```python
from channels import Channel

my_data_set = request.body
Channel("bigdata.process").send({"mydata": my_data_set})
```

  102. How do we support sending that data down the current request when it's done?
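  103. Code slide: passing the reply channel along with the work

```python
from channels import Channel

my_data_set = request.body
Channel("bigdata.process").send({
    "mydata": my_data_set,
    # pass the reply channel's name (a string) so the message stays serializable
    "reply_channel": message.content["reply_channel"],
})
```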

  104. All our big data worker needs to do then is send the response on the reply channel!
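Routing specific channels to specific workers, and answering through a reply channel carried in the message, can be sketched without Channels itself. Assumptions: queues are a plain dict of lists, `send` is any callable that delivers a message to a channel, the handler and route names are hypothetical, and summing bytes stands in for the real data processing.

```python
from typing import Callable, Dict, List

Send = Callable[[str, dict], None]      # delivers one message to a named channel
Handler = Callable[[dict, Send], None]  # consumes one message


def bigdata_process(message: dict, send: Send) -> None:
    # Placeholder for heavy processing: sum the bytes of the uploaded data
    result = sum(message["mydata"])
    # Reply straight to the waiting HTTP connection via its reply channel
    send(message["reply_channel"], {"status": 200, "content": str(result).encode()})


# A high-performance worker that listens only on bigdata.process
HIGH_PERFORMANCE_ROUTES: Dict[str, Handler] = {"bigdata.process": bigdata_process}


def run_worker(routes: Dict[str, Handler], queues: Dict[str, List[dict]], send: Send) -> None:
    """Drain only the channels this worker is routed for; other channels wait for other workers."""
    for channel, handler in routes.items():
        pending = queues.get(channel, [])
        while pending:
            handler(pending.pop(0), send)
```

A standard worker would run the same loop with a different routes table (for example, only http.request), which is all it takes to split work across machines.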
  105. Channels as a bridge to an asynchronous future

  106. A channel doesn't care if you are synchronous or asynchronous

  107. ...or written in Django or even Python!

  108. Channels implements the "Asynchronous Server Gateway Interface" (ASGI)

  109. The path to a hybrid future: Go, Django, Twisted, etc.
  110. Channels is due to land in Django 1.11/2.0

  111. Try it out! channels.readthedocs.io

  112. Questions? (pls no statements, save them for after)