
Production-grade Packaging with Anaconda


Python's packaging vs. packaging Python. Challenges in shipping Python products, and how Anaconda fits in.

Mahmoud Hashemi

April 10, 2018

Transcript

  1. Production-grade
    Packaging with Anaconda
    Mahmoud Hashemi - AnacondaCon 2018


  2. Packaging
    Turning code into a deployable archive.
    But how?


  3. TBD
    — Packaging section,
    my first design doc

  4. Developer naïveté
    A few rookie mistakes:
    ● “Packaging is just the last step”
    ● “We’ll just fudge something at the end and take the B”
    ● “If I can build it, someone will ship it”


  5. The 90-90 Rule
    “The first 90% of the code accounts for the first 90%
    of development time.
    The last 10% of the code accounts for the other 90%
    of development time.”
    — Tom Cargill, Bell Labs
    (probably talking about packaging)

  6. Part 1: Production-first development

  7. Where’s the code going?
    Think distance and target environment before writing anything.

  8. Part 2: Python Packaging revue
    Best practices, I googled them for you.

  9. Standalone modules
    The smallest unit of Python
    ● A .py file is a module
    ● Standalone: only imports from the standard library
    ● Examples: schema, ashes, boltons, bottle.py
    ● Targets Python: easy to distribute and integrate
      ○ “vendoring”

  10. Pure-Python package
    The molecule to the module’s atom
    ● A directory full of .py files is a “package”
    ● Generally includes an __init__.py
    ● Examples: Django, requests, hyperlink, face
    ● Easy to install with pip
      ○ pip installs packages, after all, right?

  11. Pardon my dist
    Packages are not proper packages.
    ● Proper packages need to be single redistributable archives
    ● A distribution is an archive of zero or more packages
    ● Motivational case studies: PIL & Pillow. PyCrypto(dome).
    ● Built by setuptools through setup.py
      ○ Example: sdist
      ○ Simple, source-only .tar.gz

  12. The full package
    Python is more than just .py files.
    ● Python ♥s interoperability & performance
    ● Examples: Pillow, gevent, lxml
    ● wheel
      ○ The shiny new binary distribution, or bdist
      ○ Supports most Windows, Mac, & Linux
      ○ No compiler necessary
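A wheel can skip the compiler because its filename declares which interpreter, ABI, and platform the prebuilt contents target. The sketch below splits a filename into the components defined by PEP 427. `parse_wheel_filename` is a hypothetical helper, and the filenames in the usage note are illustrative rather than taken from the talk.

```python
from collections import namedtuple

WheelTags = namedtuple("WheelTags", "name version build python abi platform")


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python, abi, platform = parts
        build = None  # the build tag is optional
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return WheelTags(name, version, build, python, abi, platform)
```

For a name like `Pillow-5.1.0-cp36-cp36m-manylinux1_x86_64.whl` the platform tag is `manylinux1_x86_64`, while a pure-Python wheel such as `requests-2.18.4-py2.py3-none-any.whl` carries `none` and `any` for ABI and platform.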

  13. sum()
    python setup.py sdist bdist_wheel upload
    The modern way to build and upload a Python package

  14. Questions?
    Thanks for coming!
    Show’s over, check out my projects:
    github.com/mahmoud

  15. Big questions!
    Packaging is never as easy as the docs make it out to be.

  16. System libraries?
    The world is not written in Python (yet).
    ● Prebuilt standard libraries managed through the OS
      ○ .dll and .so
      ○ libcrypto (OpenSSL), libxml2, libpng, etc.
    ● Static vs dynamic linking
      ○ Big wheels
    ● conda!
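To see how much a Python install leans on the operating system, one can ask the platform loader where the shared libraries live. This is a minimal sketch using the standard library's `ctypes.util.find_library`. The helper name `locate_system_libraries` is hypothetical, and the results depend entirely on the machine it runs on.

```python
from ctypes.util import find_library


def locate_system_libraries(names=("crypto", "xml2", "png")):
    """Map each library name to what the OS reports for it.

    A value of None means the library was not found on this machine,
    which is exactly the gap that big wheels or conda packages fill.
    """
    return {name: find_library(name) for name in names}
```

Running the same call on two hosts often produces different answers, which is the portability problem in miniature.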

  17. Seems like...
    Python’s Packaging < Packaging Python

  18. Basically...
    1. .py - standalone modules
    2. sdist - Pure-Python packages
    3. wheel - Python packages
    4. conda - Python + system libraries
    (With room to spare for static vs. dynamic linking)
    But wait...

  19. Production
    You can’t ship to production without a product!

  20. There are two kinds of packages
    Libraries
    What developers work on and with day in and day out.
    Rarely a product (maybe SDKs).
    Applications
    Services and other products with high-level, non-code interfaces.
    Most products.

  21. PyPI is not an app store
    Python is general-purpose, but pip and PyPI are not

  22. Don’t pip install prod
    Especially not from PyPI
    ● pip requires developer attention to debug and resolve
    ● No dependency resolution or transactional installs
    ● Even dev tools work better with pipsi
      ○ Every application deserves its own env
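The "own env per application" idea can be sketched with the standard library's `venv` module. This is an illustration in the spirit of what pipsi does, not its implementation. `create_app_env` and the directory layout are assumptions made for the example.

```python
import venv
from pathlib import Path


def create_app_env(root, app_name):
    """Create a dedicated virtual environment for one application."""
    env_dir = Path(root) / app_name
    # with_pip=False keeps the sketch fast and offline; a real setup
    # would normally want pip (or another installer) inside the env.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir
```

Each application then gets its own dependency set, so upgrading one tool cannot break another.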

  23. Shipping product
    Summarizing packaging for Python applications
    1. PEX - Python libraries included
    2. anaconda - Python ecosystem
    3. freezers - Python included
    4. images - system libraries included
    5. containers - sandboxed images
    6. virtual machines - kernel included
    7. hardware - plug and play appliances

  24. Shipping product
    Summarizing packaging for Python applications
    1. PEX - libraries included
    2. anaconda - Python ecosystem
    3. freezers - Python included
    4. images - system libraries included
    5. containers - sandboxed images
    6. virtual machines - kernel included
    7. hardware - plug and play appliances
    See http://sedimental.org/talks.html for the rest.

  25. Part 3: Putting Anaconda into Production
    Bigger stakes, bigger snakes.

  26. The PayPal story
    Wonderfully, spontaneously Python.
    ● Started in 2009, grew to a team in 2011
    ● 30+ midtier apps, services, and batch jobs
      ○ Max single service volume: 1.2 billion reqs/day (2016)
      ○ Max single service throughput: 10,000 reqs/sec/worker (2016)
    ● Multiprotocol, service-focused, gevent-based framework
    ● Hundreds of users, almost all grassroots
    http://github.com/paypal/support

  27. PayPal Python Environment Support Matrix (2014)
    Operating System   Architecture    Python Version
    Linux              32-bit/64-bit   2.6/2.7
    Mac                64-bit          2.7
    Solaris            32-bit          2.7
    Windows            32-bit          2.7
    More environments than any other stack.
    (8 environments x 10 binary libraries) / 5 team members = ∞ static builds.

  28. If only there was some sort of...
    cross-platform package manager?
    They’re a very rare breed!

  29. Oh!

  30. Anaconda Environment Support Matrix (2015)
    Operating System   Architecture    Python Version
    Linux              32-bit/64-bit   2.7-3.x
    Mac                64-bit          2.7-3.x
    Windows            32-bit/64-bit   2.7-3.x
    80% environment coverage x 500+ packages = ∞% better

  31. Anaconda in PayPal LIVE
    Production-first development in practice.
    ● PayPal was on RHEL5
      ○ Python 2.4
      ○ Plus 2.6-2.7 (sort of)
    ● No Anaconda. Couldn’t just target conda install
    ● We could bring our own by putting Miniconda in an RPM...

  32. 3 steps to a conda RPM
    One way to box a snake:
    1. With requirements in hand, conda install --download-only
    2. Miniconda + .tar.bz2 archives into the RPM
    3. Run a tiny installer script in the RPM postinstall section
    Ready to test and deploy!
    https://www.paypal-engineering.com/2016/09/07/python-packaging-at-paypal/
    https://github.com/paypal/support/blob/master/examples/miniconda
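Step 1 above can be sketched as a small helper that turns a requirements listing into the conda command line. This is not the script from the PayPal example linked on the slide. `read_requirements` and `download_only_command` are hypothetical names, and the package specs in the usage note are placeholders.

```python
def read_requirements(text):
    """Return package specs from requirements text, skipping blanks and comments."""
    specs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            specs.append(line)
    return specs


def download_only_command(specs):
    """Build the step-1 command: fetch package archives without installing them."""
    return ["conda", "install", "--yes", "--download-only"] + list(specs)
```

Given specs such as `gevent=1.2.2` and `lxml`, the resulting list can be handed to `subprocess.check_call` on a build host, after which the downloaded `.tar.bz2` archives are ready to be bundled for step 2.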

  33. 3+ ways to RPM conda
    The innovation never stops:
    1. https://github.com/ImmobilienScout24/snakepit
    2. https://github.com/pelson/conda-rpms
    3. https://github.com/jcrist/conda-pack
    RPM yourself a conda, today!
    (Also worth a look: https://github.com/conda/constructor )

  34. Part 4: Putting Anaconda into Containerized Envs
    RPMs and debs are so 90s.

  35. Anaconda internals
    A whole new ecosystem.
    ● Built on OS and Python features
    ● Userspace Filesystem layout
    ● Python landmarking
    ● Paths and linking
    ● PatchELF
    Anaconda’s internal layout (lib, include, bin, etc)
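"Landmarking" refers to how the interpreter locates its standard library at startup by looking for a known file, `os.py`, relative to its own location. That lookup is what lets a self-contained prefix work from userspace. The sketch below only inspects the running interpreter. `describe_layout` is a hypothetical helper, and the values differ per install.

```python
import sys
import sysconfig
from pathlib import Path


def describe_layout():
    """Report where this interpreter believes its prefix and stdlib live."""
    stdlib = Path(sysconfig.get_paths()["stdlib"])
    return {
        "prefix": sys.prefix,
        "executable": sys.executable,
        "stdlib": str(stdlib),
        # os.py is the landmark file the interpreter searches for at startup.
        "landmark_found": (stdlib / "os.py").is_file(),
    }
```

Inside a conda environment, `prefix` points at the environment directory, with `bin`, `lib`, and `include` beneath it.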

  36. Userspace images
    Even trendier than selfies!
    Anaconda meets old autorun CDs
    ● Your userspace in its own partition!
    ● Literally v1 was ISO9660
    ● E.g., AppImage / kdenlive
    https://github.com/AppImage/AppImageKit
    https://github.com/appimage-packages/kdenlive

  37. “Containers”
    Reusable and disposable, but check the seal...
    ● Userspace images
      ○ + sandboxing
      ○ + distribution
    ● Flatpak & Snappy
      ○ .deb vs. .rpm round 2
    ● Docker / Moby

  38. The shopkick story
    Legacy environment and codebase.
    ● 150k commits and 12+ services
    ● CentOS 6 + Python 2.6 + LXC
    ● 100% Mac local dev (iOS)
    One mission: Upgrade stack from 2009 to 2017.

  39. Local production
    Closing the rift.
    ● Production-first: Better to target Linux than MacOS
    ● Docker’s native MacOS xhyve virtualization
      ○ Docker Machine (virtualbox)
      ○ Minikube
    ● Avoid lethal exposure of Docker
    ● Account for new tools: GitLab & DCOS (later k8s)

  40. OpenSky
    Big, blue, and cloud-ready.
    ● Specify a legacy service into images for local, stage, and prod
    ● Library dependencies from: yum, conda, pip, and sky (git repo)
    ● Service deps specified in terms of docker images
    ● Plugins + core shipped as PEX to local developers
    ● Open-sourced just for you: https://github.com/shopkick/sky

  41. docker + conda
    Conda just works, despite docker headaches
    ● Bring your own Miniconda
    ● Custom installer script that calls conda
      ○ docker build instrumentation severely lacking. Layer proliferation.
    ● Tips:
      ○ Run after yum, before pip
      ○ Size is always an issue: nomkl unless you really need the speed, conda clean.
      ○ --no-channel-priority useful for mixing and matching
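The ordering and size tips can be captured in a small installer-script generator. This is a sketch under stated assumptions, not the shopkick installer. `build_steps` and `as_single_layer` are hypothetical helpers, and the package names in the usage note are placeholders.

```python
def build_steps(yum_pkgs, conda_pkgs, pip_pkgs, use_mkl=False):
    """Return shell commands in the order: yum, then conda, then pip."""
    steps = []
    if yum_pkgs:
        steps.append("yum install -y " + " ".join(yum_pkgs))
    if conda_pkgs:
        # The nomkl package steers conda toward non-MKL builds, which are smaller.
        extra = [] if use_mkl else ["nomkl"]
        steps.append("conda install --yes " + " ".join(extra + list(conda_pkgs)))
        # Drop cached archives and indexes to keep the image small.
        steps.append("conda clean --all --yes")
    if pip_pkgs:
        steps.append("pip install " + " ".join(pip_pkgs))
    return steps


def as_single_layer(steps):
    """Join the steps so they can run in one docker RUN instruction."""
    return " && ".join(steps)
```

Calling `build_steps(["gcc"], ["numpy"], ["boltons"])` yields four commands with conda sandwiched between yum and pip. Joining them into a single command is one way to limit layer proliferation.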

  42. Wrapping up
    One neat package.
    ● Production-first development
    ● Leverage the ecosystem
    ● Conda in OS packages works
    ● Conda in containers works
    ● Simple is better than complex

  43. Thanks!
    Questions?
    Show’s over for real this time, slides & more:
    sedimental.org/talks.html
    @mhashemi