Pyruvate, a reasonably fast, non-blocking, multithreaded WSGI server

Slides from my talk at https://2020.ploneconf.org/
#ploneconf2020 #plone #ploneconf

Thomas Schorr

January 13, 2021

Transcript

  1. WSGI Why Rust? Project Status Performance Demo Next steps
    Pyruvate, a reasonably fast, non-blocking,
    multithreaded WSGI server
    Thomas Schorr
    Plone Conference 2020

  2. PEP-3333: Python Web Server Gateway Interface
    def application(environ, start_response):
        """Simplest possible WSGI application"""
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        start_response(status, response_headers)
        return [b'Hello World!\n']
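    To make the contract concrete, here is a minimal sketch of what a server does for each request: build an environ dict, pass in a start_response callable, iterate over the returned body and call close() if it exists. It relies on the standard library's wsgiref.util.setup_testing_defaults to fill in the required keys; the helper name call_application is hypothetical, and this is not how Pyruvate is implemented.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Simplest possible WSGI application"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello World!\n']


def call_application(app, path='/'):
    """Invoke a WSGI app once, as a server would for a single request."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        # PEP 3333: start_response returns a (legacy) write() callable
        return lambda data: None

    result = app(environ, start_response)
    try:
        body = b''.join(result)  # iterate over the response chunks
    finally:
        if hasattr(result, 'close'):
            result.close()  # servers must call close() when it exists
    return captured['status'], captured['headers'], body


status, headers, body = call_application(application)
print(status, body)
```

    The status is read only after the body has been iterated, so the same driver also works for generator-based applications that call start_response lazily.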

  11. The Server Side
    • The server invokes the application callable once for each HTTP request it
    receives
    • Many possibilities for handling requests:
      • single-threaded server
      • spawn a thread for each incoming request
      • 1:1 threading, 1:n threading
      • maintain a pool of worker threads
      • multiprocessing
      • ...
    • The WSGI server can give hints to the application through the environ
    dictionary (wsgi.multithread, wsgi.multiprocess, wsgi.run_once)
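    As an illustration of the worker-pool model and of the environ hints, the following toy sketch dispatches requests to a fixed pool of threads with concurrent.futures.ThreadPoolExecutor and sets wsgi.multithread / wsgi.multiprocess accordingly. It assumes requests have already been parsed into environ dicts (no sockets involved); the handle helper is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # Report which worker thread handled the request and what the server told us.
    start_response('200 OK', [('Content-type', 'text/plain')])
    hint = 'multithread' if environ['wsgi.multithread'] else 'single'
    return [f'{threading.current_thread().name} {hint}'.encode()]


def handle(app, path):
    """Run one request through the app inside a worker thread."""
    environ = {'PATH_INFO': path}
    setup_testing_defaults(environ)
    # Hints for the application (PEP 3333): one process, several threads
    # may call the application concurrently.
    environ['wsgi.multithread'] = True
    environ['wsgi.multiprocess'] = False
    environ['wsgi.run_once'] = False
    return b''.join(app(environ, lambda status, headers, exc_info=None: None))


# A pool of at most 2 worker threads serves 6 requests; threads are reused.
with ThreadPoolExecutor(max_workers=2, thread_name_prefix='worker') as pool:
    results = list(pool.map(lambda p: handle(application, p),
                            [f'/{i}' for i in range(6)]))

print(results)
```

    An application that is not thread-safe can check wsgi.multithread at startup and refuse to run, rather than fail unpredictably under load.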

  17. The Application Side
    • Often needs to connect to components that outlive a single request
      • databases, caches
      • the connection might not be thread-safe
      • connection setup might be expensive
    • All of the above is true for Zope
    • Recipe for disaster: choosing a WSGI server with an inappropriate worker
    model

  18. Consequence: Limited Choice of WSGI servers suitable for Zope/Plone
    • waitress (the default): very good overall performance
    • bjoern: fast, non-blocking, single-threaded
    • ...

  19. More options please
    Wishlist:
    • multithreaded, 1:1 threading, worker pool
    • PasteDeploy entry point
    • handle the Zope/Plone use case
    • non-blocking
    • file wrapper (wsgi.file_wrapper) supporting sendfile
    • competitive performance
    Non-goals:
    • Python 2
    • ASGI (not yet, at least)
    • Windows
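    For reference, PEP 3333 lets a server offer an optional wsgi.file_wrapper in the environ; a server that recognizes the wrapped file can then transmit it with a platform mechanism such as sendfile. An application benefits by using the wrapper when present and falling back to plain iteration otherwise. A minimal sketch; make_file_app and run are hypothetical helpers, and wsgiref's pure-Python FileWrapper only stands in for a server-provided wrapper.

```python
import os
import tempfile
from wsgiref.util import FileWrapper

BLOCK_SIZE = 8192


def _iter_file(f, block_size):
    """Fallback: yield the file in blocks and close it when done."""
    try:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block
    finally:
        f.close()


def make_file_app(path):
    """Return a WSGI app serving the file at `path`."""
    def application(environ, start_response):
        f = open(path, 'rb')
        size = os.fstat(f.fileno()).st_size
        start_response('200 OK', [('Content-type', 'application/octet-stream'),
                                  ('Content-Length', str(size))])
        file_wrapper = environ.get('wsgi.file_wrapper')
        if file_wrapper is not None:
            # A server may recognize this object and send f via sendfile().
            return file_wrapper(f, BLOCK_SIZE)
        return _iter_file(f, BLOCK_SIZE)
    return application


def run(app, environ):
    """Minimal server loop: iterate the response and close it."""
    result = app(environ, lambda status, headers: None)
    try:
        return b''.join(result)
    finally:
        if hasattr(result, 'close'):
            result.close()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'a' * 20000)
app = make_file_app(tmp.name)
with_wrapper = run(app, {'wsgi.file_wrapper': FileWrapper})
without_wrapper = run(app, {})
os.unlink(tmp.name)
```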

  20. Why Rust?
    Naive expectations:
    • Faster than Python
    • Easier to use than C

  21. Performance
    Emmerich, P. et al. (2019): The Case for Writing Network Drivers in High-Level Programming Languages.
    https://www.net.in.tum.de/fileadmin/bibtex/publications/papers/the-case-for-writing-network-drivers-in-high-level-languages.pdf

  22. Memory Management through Ownership
    • feature unique to Rust
    • a set of rules that the compiler checks at compile time
    (https://doc.rust-lang.org/book/ch04-01-what-is-ownership.html)
    • Each value in Rust has a variable that’s called its owner.
    • There can be only one owner at a time.
    • When the owner goes out of scope, the value will be dropped.
    • Drop is a trait; there’s a default implementation that you can override
    • You can still control where (stack or heap) your data is stored.

  27. How is that relevant?
    Example: interfacing with Python
    • Python memory management: reference counting + garbage collection
    • taking a reference: increasing an object’s refcount using Py_INCREF
    • must be matched by a corresponding Py_DECREF invocation
    • the object is deallocated when its refcount drops to 0
    • Py_INCREF/Py_DECREF mismatch: memory leaks, core dumps
    • 63 occurrences of Py_INCREF in BTrees (79 Py_DECREF), 19 in
    zope.interface (50 Py_DECREF)
    • 1 Py_INCREF in rust-cpython (4 Py_DECREF)
    • with rust-cpython it is very hard to create a Py_INCREF/Py_DECREF
    mismatch, since a reference is released automatically when the owning
    Rust value is dropped; this makes memory leaks and core dumps much less likely
    • still possible to create more references than needed
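    The reference counting described above can be observed from Python itself. This sketch is CPython-specific: sys.getrefcount reports an object's current refcount (plus one for its own argument), and weakref.finalize shows that a non-cyclic object is freed as soon as its last reference disappears, without waiting for a garbage collection pass. It illustrates the Python side only; it is not rust-cpython code, and the Resource class is made up for the example.

```python
import sys
import weakref


class Resource:
    """A plain object; instances support weak references."""


freed = []
obj = Resource()
weakref.finalize(obj, freed.append, 'freed')  # runs when obj is deallocated

before = sys.getrefcount(obj)
holder = [obj]                     # a new reference, like Py_INCREF
after_incref = sys.getrefcount(obj)
del holder                         # the list and its reference go away, like Py_DECREF
after_decref = sys.getrefcount(obj)

print(after_incref - before, after_decref - before)
del obj                            # last reference gone: refcount drops to 0
print(freed)
```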

  28. Other Rust features
    • strict typing will find many problems at compile time
    • Pattern matching
    • very good documentation, helpful compiler messages

  32. What is Pyruvate from a user perspective
    • a package available from PyPI:
        pip install pyruvate
    • an importable Python module:
        import pyruvate

        def application(environ, start_response):
            """WSGI application"""
            ...

        # listen on all interfaces, port 7878, with 3 worker threads
        pyruvate.serve(application, '0.0.0.0:7878', 3)

  33. Using Pyruvate with Zope/Plone
    With plone.recipe.zope2instance:
    • buildout.cfg
        [instance]
        recipe = plone.recipe.zope2instance
        http-address = 127.0.0.1:8080
        eggs =
            Plone
            pyruvate
        wsgi-ini-template = ${buildout:directory}/templates/pyruvate.ini.in
    • pyruvate.ini.in template
        [server:main]
        use = egg:pyruvate#main
        socket = %(http_address)s
        workers = 2

  40. Pyruvate project structure
    • initially created with cargo new --lib
    • Rust sources in the src folder
    • Cargo.toml declares the Rust dependencies
    • setup.py
      • uses setuptools_rust to build a RustExtension
      • defines the PasteDeploy entry point
    • pyproject.toml specifies the build system requirements (PEP 518)
    • tests folder containing (currently only) Python tests; Rust unit tests
    live in the Rust modules
    • __init__.py in the pyruvate folder
      • PasteDeploy entry point
      • FileWrapper import
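    The `use = egg:pyruvate#main` line in the ini template points PasteDeploy at an entry point named main. One way to provide such an entry point is a runner in the paste.server_runner group, which PasteDeploy calls as runner(wsgi_app, global_conf, **local_conf), with the remaining [server:main] options passed as strings. The sketch below shows that shape; serve_paste, parse_server_options and the module path in the comment are hypothetical, not Pyruvate's actual code.

```python
def parse_server_options(local_conf):
    """Convert PasteDeploy's string options into typed server settings."""
    options = dict(local_conf)
    socket = options.pop('socket')          # e.g. '127.0.0.1:8080'
    workers = int(options.pop('workers'))   # ini values arrive as strings
    return socket, workers, options         # leftovers: any other options


def serve_paste(wsgi_app, global_conf, **local_conf):
    """Hypothetical paste.server_runner entry point."""
    import pyruvate  # imported lazily, only when actually serving
    socket, workers, _extra = parse_server_options(local_conf)
    pyruvate.serve(wsgi_app, socket, workers)


# Registration in setup.py would look roughly like this (module path hypothetical):
# entry_points={'paste.server_runner': ['main = mypackage:serve_paste']}

print(parse_server_options({'socket': '127.0.0.1:8080', 'workers': '2'}))
```

    Keeping the option parsing separate from the call into the server makes it testable without a running server.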

  46. Gitlab Pipeline
    • Two stages: test + build
    • Linting: rustfmt, clippy
    • cargo test
    • coverage report using kcov, uploaded to
    https://codecov.io
    • Python integration tests with tox
    • build wheels

  49. Binary packages
    • manylinux2010 wheels for Python 3.6-3.9
    • switched from manylinux1 after stable Rust 1.47.0 stopped supporting the
    old ABI ("ELF file OS ABI invalid" error when loading Rust shared libraries)
    • manylinux2010 needs recent pip and setuptools versions:
      • pip >= 19.0; older versions prefer the sdist over the wheel, which then
      fails to build if there is no Rust toolchain
      • setuptools >= 42.0.0 (when using zc.buildout)
    • wanted: macOS
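    Because an older pip falls back to the sdist (and then needs a Rust toolchain), it can be worth checking the installer versions before installing. A small sketch using importlib.metadata (Python 3.8+); the minimum versions come from the slide, while the helper names and the naive version parsing (pre-releases are treated as the final release) are assumptions.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions for picking up manylinux2010 wheels (from the slide).
MINIMUMS = {'pip': (19, 0), 'setuptools': (42, 0, 0)}


def parse_version(text):
    """Naive: keep the leading numeric release part, '19.0b1' -> (19, 0)."""
    match = re.match(r'\d+(?:\.\d+)*', text)
    return tuple(int(p) for p in match.group(0).split('.')) if match else ()


def is_recent_enough(installed, minimum):
    parsed = parse_version(installed)
    width = max(len(parsed), len(minimum))
    # Pad with zeros so that '42' compares equal to (42, 0, 0).
    padded = parsed + (0,) * (width - len(parsed))
    return padded >= minimum + (0,) * (width - len(minimum))


def check_installers():
    """Map each installer to (installed version or None, recent enough?)."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, is_recent_enough(installed, minimum))
    return report


print(check_installers())
```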

  50. Features
    • Python interface based on rust-cpython
    (https://github.com/dgrunwald/rust-cpython)
    • non-blocking IO using mio (https://github.com/tokio-rs/mio)
      • non-blocking read
      • blocking or non-blocking write
    • worker pool based on threadpool (https://docs.rs/threadpool); 1:1
    threading
    • PasteDeploy entry point
    • integrates with Python logging
      • asynchronous logging: no need to hold the GIL when creating the log
      message
      • logging configuration in wsgi.ini
    • TCP or Unix domain sockets
    • supports systemd socket activation
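    Pyruvate does its asynchronous logging on the Rust side. The closest pure-Python analogue, shown here only to illustrate the idea, is the standard library's QueueHandler/QueueListener pair: the thread that logs merely enqueues a record, and a separate listener thread formats and emits it. The record keeps the ID of the thread that created it, so that information survives the hand-off. The logger name and the ListHandler class are made up for the example.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Collects formatted messages in a list (stands in for a file/stream)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


log_queue = queue.Queue()
sink = ListHandler()
sink.setFormatter(logging.Formatter('%(levelname)s %(message)s [thread %(thread)d]'))

# The listener thread does the (potentially slow) formatting and output.
listener = logging.handlers.QueueListener(log_queue, sink)
listener.start()

logger = logging.getLogger('example.server')
logger.setLevel(logging.INFO)
logger.propagate = False
# Logging calls only put the record on the queue and return quickly.
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.info('accepted connection from %s', '127.0.0.1')
logger.warning('slow request: %d ms', 1500)

listener.stop()  # processes the remaining records and joins the listener thread
print(sink.messages)
```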

  51. Performance
    Pierre Terre / Rabbit Hole, Monarch’s Way / CC BY-SA 2.0
    • number of requests/amount of
    data transferred per unit of time
    • Testing and eventually
    improving it

  59. Approach
    • Static code analysis + refactoring
      • reminder: Pyruvate started as a "Hello Rust" project
      • memory allocations are expensive
    • How to induce socket blocking?
      • limiting the socket buffer sizes of a Vagrant box
      • Docker?
    • Flame graphs from perf data
    (http://www.brendangregg.com/flamegraphs.html)
      • .to_lowercase() is much more expensive than .to_ascii_uppercase()
    • Load testing with siege and ab

  60. Performance: Design considerations
    • Python Global Interpreter Lock: Python code can only run while holding
    the GIL
    • Multiple worker threads need to acquire the GIL in turn
      • acquire the GIL only for application execution
      • release the GIL when doing IO
      • more than one possible way to do this
    • IO event polling
      • abstraction: mio Poll instance
      • accepted connections are registered for read events with a Poll
      instance in the main thread
      • completely read requests are passed, together with their connection,
      to the worker pool
      • iterating over the WSGI response chunks needs the GIL
      • blocking write: loop until the response is completely written
      • non-blocking write:
        • write until EAGAIN
        • register the connection for write events with a per-worker Poll
        instance
        • release the GIL, stash the response
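    The non-blocking write path can be sketched in Python with the selectors module, which here plays the role that mio's Poll plays in Pyruvate: write until the socket reports EAGAIN (BlockingIOError in Python), then register the connection for write events and resume once it is writable again. This is a simplified single-connection illustration: unlike Pyruvate, it waits right away instead of stashing the response and serving other connections, and it cannot release the GIL. The helper names are hypothetical.

```python
import selectors
import socket
import threading


def write_nonblocking(sock, data, sel):
    """Send all of `data` on a non-blocking socket; return the EAGAIN count."""
    sock.setblocking(False)
    view = memoryview(data)
    sent = 0
    eagain = 0
    while sent < len(data):
        try:
            sent += sock.send(view[sent:])
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK: kernel buffer is full
            eagain += 1
            # Register for write readiness; a real server would serve other
            # connections here and resume when the event fires.
            sel.register(sock, selectors.EVENT_WRITE)
            sel.select()
            sel.unregister(sock)
    return eagain


def drain(sock, expected, received):
    """Peer side: read until `expected` bytes have arrived."""
    total = 0
    while total < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        total += len(chunk)
    received.append(total)


server_side, client_side = socket.socketpair()
payload = b'x' * (4 * 1024 * 1024)  # larger than typical socket buffers
received = []
reader = threading.Thread(target=drain,
                          args=(client_side, len(payload), received))
reader.start()
with selectors.DefaultSelector() as sel:
    eagain_count = write_nonblocking(server_side, payload, sel)
reader.join()
server_side.close()
client_side.close()
print(received, eagain_count)
```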

  61. Performance: current status
    • Test setup: Lenovo X390 and Vagrant (2 CPUs, 2 GB RAM, 8 KB write buffer
    size limit)
    • faster than waitress on a Hello World WSGI application
    • faster than waitress on / (using the test criteria at
    https://zope.readthedocs.io/en/4.x/wsgi.html#test-criteria-for-recommendations)
    • but slower on /Plone
    • more performance testing needed

  62. Live Demo

  63. Release 1.0
    • Planned for the end of 2020
    • Reuse connections (keep-alive + chunked transfer encoding)
      • branch on Gitlab, needs some work
    • macOS support wanted
    • Optimize the pipeline
      • use a kcov binary package
    • Async logging: include the thread ID
    • More testing + bugfixing
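    Reusing a connection requires the client to know where each response ends. When a WSGI application does not set Content-Length, HTTP/1.1 chunked transfer encoding (RFC 7230, section 4.1) provides this: each chunk is prefixed with its size in hexadecimal followed by CRLF, and a zero-size chunk terminates the body. A minimal sketch of encoding WSGI response chunks that way; the function name is hypothetical and this is not the code on the keep-alive branch.

```python
def chunked(body_iterable):
    """Yield the HTTP/1.1 chunked encoding of a WSGI response iterable."""
    try:
        for chunk in body_iterable:
            if not chunk:
                continue  # an empty chunk would be read as the end of the body
            # chunk size in hex, CRLF, chunk data, CRLF
            yield b'%X\r\n' % len(chunk) + chunk + b'\r\n'
        yield b'0\r\n\r\n'  # last chunk, no trailers, final CRLF
    finally:
        if hasattr(body_iterable, 'close'):
            body_iterable.close()  # PEP 3333: call close() when it exists


def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [b'Hello ', b'', b'World!\n']


body = b''.join(chunked(application({}, lambda status, headers: None)))
print(body)
```

    The response headers then carry Transfer-Encoding: chunked instead of Content-Length.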

  64. Thanks for your attention
    • Thomas Schorr
    • https://gitlab.com/tschorr/pyruvate
    • https://pypi.org/project/pyruvate
