Slide 1

Slide 1 text

WSGI Why Rust? Project Status Performance Demo Next steps
Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server
Thomas Schorr
Plone Conference 2020

Slide 2

Slide 2 text

PEP-3333: Python Web Server Gateway Interface

    def application(environ, start_response):
        """Simplest possible WSGI application"""
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        return [b'Hello World!\n']
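To see what the server has to do with such a callable, here is a minimal sketch that plays the server role by hand for a single request. The helper name `call_once` and the tiny environ dictionary are assumptions made for illustration; a real server fills environ with many more keys.

```python
def application(environ, start_response):
    """Simplest possible WSGI application (copied from the slide)."""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello World!\n']


def call_once(app, environ):
    """Play the server role for one request: call the app, collect the result."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The application reports status and headers through this callback.
        captured['status'] = status
        captured['headers'] = headers

    # The return value is an iterable of byte chunks forming the body.
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_once(application, {'REQUEST_METHOD': 'GET'})
```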

Slide 11

Slide 11 text

The Server Side
• The server invokes the application callable once for each HTTP request it receives
• Many possibilities for handling requests
  • Single threaded server
  • Spawn a thread for each incoming request
  • 1:1 threading, 1:n threading
  • maintain a pool of worker threads
  • multiprocessing
  • ...
• The WSGI server can give hints through environ dictionary
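The hints mentioned in the last bullet are the PEP 3333 keys `wsgi.multithread`, `wsgi.multiprocess` and `wsgi.run_once`. A small sketch of how an application could inspect them; the function name `describe_worker_model` is made up for this example, and `setup_testing_defaults` from the standard library only supplies placeholder values.

```python
from wsgiref.util import setup_testing_defaults


def describe_worker_model(environ):
    """Summarise the concurrency hints a WSGI server put into environ."""
    return {
        'multithread': bool(environ.get('wsgi.multithread')),
        'multiprocess': bool(environ.get('wsgi.multiprocess')),
        'run_once': bool(environ.get('wsgi.run_once')),
    }


environ = {}
setup_testing_defaults(environ)  # fills in plausible values for testing
hints = describe_worker_model(environ)
```

An application with non-thread-safe resources could, for instance, refuse to start or switch to per-thread resources when `multithread` is true.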

Slide 17

Slide 17 text

The Application Side
• often needs to connect to components that outlive the single request
  • databases, caches
• connection might not be thread safe
• connection/setup might be expensive
• all of the above is true for Zope
• recipe for disaster: choose a WSGI server with an inappropriate worker model
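One common way to reconcile these constraints is a fixed worker pool in which every thread keeps its own long-lived connection. The following is a sketch of that idea only, not Zope's actual connection handling; `FakeConnection` and `get_connection` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()
_setup_lock = threading.Lock()
connections_opened = 0


class FakeConnection:
    """Stand-in for an expensive, non-thread-safe connection (hypothetical)."""

    def __init__(self):
        self.owner = threading.get_ident()

    def query(self):
        # A real connection might misbehave if shared; here we just check.
        assert self.owner == threading.get_ident()
        return 'ok'


def get_connection():
    """Return this thread's connection, creating it on first use only."""
    global connections_opened
    conn = getattr(_local, 'conn', None)
    if conn is None:
        conn = _local.conn = FakeConnection()
        with _setup_lock:
            connections_opened += 1
    return conn


def handle_request(_request_id):
    return get_connection().query()


# Twenty requests, but at most two connections are ever set up.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(handle_request, range(20)))
```

With a thread-per-request server the same code would pay the setup cost on every request, which is the mismatch the last bullet warns about.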

Slide 18

Slide 18 text

Consequence: Limited Choice of WSGI servers suitable for Zope/Plone.
• waitress (the default) with very good overall performance
• bjoern: fast, non-blocking, single threaded
• ...

Slide 19

Slide 19 text

More options please
Wishlist:
• multithreaded, 1:1 threading, workerpool
• PasteDeploy entry point
• handle the Zope/Plone use case
• non-blocking
• File wrapper supporting sendfile
• competitive performance
Non Goals
• Python 2
• ASGI (not yet at least)
• Windows
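The "file wrapper" item refers to the optional `wsgi.file_wrapper` key from PEP 3333. A sketch of how an application might use it, falling back to the standard library's `wsgiref.util.FileWrapper` when the server offers nothing; the app name `file_app` and the in-memory payload are assumptions for illustration.

```python
import io
from wsgiref.util import FileWrapper


def file_app(environ, start_response):
    """Serve a file-like object, preferring the server's own file wrapper."""
    payload = io.BytesIO(b'x' * 10000)  # stand-in for an open file
    start_response('200 OK', [('Content-Type', 'application/octet-stream')])
    # PEP 3333: servers may offer an optimised wrapper under this key.
    wrapper = environ.get('wsgi.file_wrapper', FileWrapper)
    return wrapper(payload, 4096)


# No server involved here, so the fallback wrapper is used.
chunks = list(file_app({}, lambda status, headers, exc_info=None: None))
```

A server-provided wrapper can only hand the transfer to sendfile when the object has a real file descriptor, which the `BytesIO` stand-in above does not.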

Slide 20

Slide 20 text

Why Rust?
Naive expectations:
• Faster than Python
• Easier to use than C

Slide 21

Slide 21 text

Performance
Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages. https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf

Slide 22

Slide 22 text

Memory Management through Ownership
• feature unique to Rust
• a set of rules that the compiler checks at compile time (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
  • Each value in Rust has a variable that’s called its owner.
  • There can be only one owner at a time.
  • When the owner goes out of scope, the value will be dropped.
• Drop is a trait; there’s a default implementation that you can override
• You can still control where (stack or heap) your data is stored.

Slide 27

Slide 27 text

How is that relevant? Example: interfacing with Python
• Python memory management: reference counting + garbage collection
• association: increasing an object’s refcount using Py_INCREF
• should match with corresponding Py_DECREF invocations
• garbage collection when object refcount goes to 0
• Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
• 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in zope.interface (50 Py_DECREF)
• 1 Py_INCREF in rust-cpython (4 Py_DECREF)
• very hard to create a mismatch of Py_INCREF/Py_DECREF invocations, making it harder to create memory leaks or core dumps
• still possible to create more references than needed
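The bookkeeping that Py_INCREF and Py_DECREF do by hand in C can be observed from Python itself. This sketch assumes CPython, where an object is freed as soon as its last strong reference goes away; the class name `Resource` is made up.

```python
import sys
import weakref


class Resource:
    """Plain object whose lifetime we observe (hypothetical example class)."""


res = Resource()
probe = weakref.ref(res)          # does not keep the object alive

baseline = sys.getrefcount(res)
holder = [res]                    # the list stores one more strong reference
with_extra_ref = sys.getrefcount(res)

holder.clear()                    # give the extra reference back
after_release = sys.getrefcount(res)

del res                           # last strong reference gone
freed = probe() is None           # CPython frees the object right away
```

A forgotten `holder.clear()` corresponds to a missing Py_DECREF (a leak); an extra release in C would correspond to freeing the object while it is still in use.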

Slide 28

Slide 28 text

Other Rust features
• strict typing will find many problems at compile time
• Pattern matching
• very good documentation, helpful compiler messages

Slide 32

Slide 32 text

What is Pyruvate from a user perspective
• a package available from PyPI:
    pip install pyruvate
• an importable Python module:
    import pyruvate

    def application(environ, start_response):
        """WSGI application"""
        ...

    pyruvate.serve(application, '0.0.0.0:7878', 3)

Slide 33

Slide 33 text

Using Pyruvate with Zope/Plone
with plone.recipe.zope2instance:
• buildout.cfg
    [instance]
    recipe = plone.recipe.zope2instance
    http-address = 127.0.0.1:8080
    eggs =
        Plone
        pyruvate
    wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in
• pyruvate.ini.in Template
    [server:main]
    use = egg:pyruvate#main
    socket = %(http_address)s
    workers = 2
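Behind a `[server:main]` section, PasteDeploy calls an entry point with the WSGI app, the global configuration and the section's options as string keyword arguments (the `paste.server_runner` convention). The sketch below shows that shape only; it is not Pyruvate's actual entry point code. `make_runner`, the default of 2 workers and the `(application, socket, workers)` argument order of the wrapped function are assumptions.

```python
def make_runner(serve):
    """Build a paste.server_runner-style callable around a serve() function.

    `serve` is assumed to take (application, socket_address, workers).
    """

    def runner(wsgi_app, global_conf, **local_conf):
        # PasteDeploy hands over every option of the [server:...] section
        # as a string keyword argument.
        socket = local_conf['socket']
        workers = int(local_conf.get('workers', 2))  # default is an assumption
        return serve(wsgi_app, socket, workers)

    return runner


# Stand-in for a real serve() so the sketch runs without starting a server.
calls = []
runner = make_runner(lambda app, socket, workers: calls.append((socket, workers)))
runner(object(), {}, socket='127.0.0.1:8080', workers='2')
```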

Slide 40

Slide 40 text

Pyruvate project structure
• initially created with cargo new --lib
• Rust sources in src folder
• Cargo.toml pulls Rust dependencies
• setup.py
  • uses setuptools_rust to build a RustExtension
  • defines PasteDeploy entry point
• pyproject.toml to specify build system requirements (PEP 518)
• tests folder containing (currently only) Python tests (unit tests in Rust modules)
• __init__.py in pyruvate folder
  • Paste Deploy entry point
  • FileWrapper import

Slide 46

Slide 46 text

Gitlab Pipeline
• Two stages: test + build
• Linting: rustfmt, clippy
• cargo test
• coverage report using kcov, uploaded to https://codecov.io
• Python integration tests with tox
• build wheels

Slide 49

Slide 49 text

Binary packages
• manylinux2010 wheels for Python 3.6-3.9
• switched from manylinux1 after stable Rust stopped supporting the old ABI (ELF file OS ABI invalid error when loading Rust shared libraries) 1.47.0
• manylinux2010 needs recent pip and setuptools versions
  • pip >= 19.0 if pip prefers sdist over wheel (and there’s no Rust)
  • setuptools >= 42.0.0 (when using zc.buildout)
• wanted: macOS

Slide 50

Slide 50 text

Features
• rust-cpython based Python interface (https://github.com/dgrunwald/rust-cpython)
• Nonblocking IO using mio (https://github.com/tokio-rs/mio)
  • Nonblocking read
  • blocking or nonblocking write
• Worker pool based on threadpool (https://docs.rs/threadpool); 1:1 threading
• PasteDeploy entry point
• integrates with Python logging
  • asynchronous logging -> no need to hold the GIL when creating the log message
  • logging configuration in wsgi.ini
• TCP or Unix Domain sockets
• supports systemd socket activation
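For readers unfamiliar with systemd socket activation: the service manager passes already-open listening sockets to the process and announces them through the `LISTEN_PID` and `LISTEN_FDS` environment variables, with descriptors starting at 3. The sketch below illustrates that protocol in Python; it says nothing about how Pyruvate implements it in Rust, and `inherited_fds` is a hypothetical helper.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_fds(environ=None, pid=None):
    """Return the descriptors passed in by systemd, or [] if none apply."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        if int(environ.get('LISTEN_PID', '')) != pid:
            return []  # the sockets were meant for another process
        count = int(environ.get('LISTEN_FDS', ''))
    except ValueError:
        return []  # variables missing or malformed
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))


# Simulated environment so the sketch runs outside of systemd.
fds = inherited_fds({'LISTEN_PID': '42', 'LISTEN_FDS': '2'}, pid=42)
```

A server would then wrap each descriptor, for example with `socket.socket(fileno=fd)`, instead of binding a new socket.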

Slide 51

Slide 51 text

Performance
Image: Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0
• number of requests/amount of data transferred per unit of time
• Testing and eventually improving it

Slide 59

Slide 59 text

Approach
• Static code analysis + refactoring
  • reminder: pyruvate started as a Hello Rust project
  • memory allocations are expensive
• How to induce socket blocking?
  • limiting socket buffer sizes of a Vagrant box
  • Docker?
• Flame graphs from perf data (http://www.brendangregg.com/flamegraphs.html)
  • .to_lower() is much more expensive than .to_ascii_uppercase()
• load testing with siege and ab

Slide 60

Slide 60 text

Performance: Design considerations
• Python Global Interpreter Lock: Python code can only run when holding the GIL
  • Multiple worker threads need to acquire the GIL in turn
  • acquire GIL only for application execution
  • drop GIL when doing IO
  • more than one possible way to do this
• IO event polling
  • abstraction: mio Poll instance
  • accepted connections are registered for read events with a Poll instance in the main thread
  • completely read requests + connection are passed to the worker pool
  • iterate over WSGI response chunks (needs GIL)
  • blocking write: loop until response is completely written
  • non-blocking write:
    • write until EAGAIN
    • register connection for write events with per worker Poll instance
    • drop GIL, stash response
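The non-blocking write steps can be sketched with Python's `selectors` module standing in for a mio Poll instance. This is a single-threaded illustration of "write until EAGAIN, register for write events, stash the rest", not Pyruvate's Rust implementation; `send_nonblocking` and the socket pair are assumptions made so the example is self-contained.

```python
import selectors
import socket


def send_nonblocking(sock, data):
    """Write until the kernel buffer is full; return the unsent remainder."""
    view = memoryview(data)
    while view:
        try:
            sent = sock.send(view)
        except BlockingIOError:      # EAGAIN / EWOULDBLOCK
            break
        view = view[sent:]
    return bytes(view)


writer, reader = socket.socketpair()
writer.setblocking(False)
reader.setblocking(False)

payload = b'x' * (4 * 1024 * 1024)   # larger than typical socket buffers
received = bytearray()

sel = selectors.DefaultSelector()
sel.register(reader, selectors.EVENT_READ)

# First attempt: write as much as possible, stash whatever is left.
stash = send_nonblocking(writer, payload)
if stash:
    sel.register(writer, selectors.EVENT_WRITE)
else:
    writer.close()

while len(received) < len(payload):
    for key, _events in sel.select(timeout=5):
        if key.fileobj is reader:
            received.extend(reader.recv(65536))
        else:
            # Socket is writable again: resume from the stash.
            stash = send_nonblocking(writer, stash)
            if not stash:
                sel.unregister(writer)
                writer.close()

sel.close()
reader.close()
```

Where the slide says "drop GIL, stash response", this sketch simply returns to the event loop; in a multithreaded server that is the point at which other workers get to run Python code.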

Slide 61

Slide 61 text

Performance: current status
• Lenovo X390 and Vagrant (2 CPU, 2 G RAM, 8K write buffer size limit)
• faster than waitress on a Hello world WSGI application
• faster than waitress on / (looking at https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
• but slower on /Plone
• more performance testing needed

Slide 62

Slide 62 text

Live Demo

Slide 63

Slide 63 text

Release 1.0
• Planned for end of this year
• Reuse connections (keep-alive + chunked transport)
  • Branch on Gitlab, needs some work
• macOS support wanted
• optimize pipeline
  • use a kcov binary package
• async logging: thread ID
• More testing + bugfixing

Slide 64

Slide 64 text

Thanks for your attention
• Thomas Schorr
• [email protected]
• https://gitlab.com/tschorr/pyruvate
• https://pypi.org/project/pyruvate