Production-grade Packaging with Anaconda

Python's packaging vs. packaging Python. Challenges in shipping Python products, and how Anaconda fits in.

Mahmoud Hashemi

April 10, 2018

Transcript

  1. 4.

    Developer naïveté
    A few rookie mistakes:

    • “Packaging is just the last step”
    • “We’ll just fudge something at the end and take the B”
    • “If I can build it, someone will ship it”
  2. 5.

    The Rule

    “The first 90% of the code accounts for the first 90% of development time. The last 10% of the code accounts for the other 90% of development time.”
    Tom Cargill, Bell Labs (probably talking about packaging)
  3. 9.

    Standalone modules
    The smallest unit of Python

    • A .py file is a module
    • Standalone: only imports from the standard library
    • schema, ashes, boltons, bottle.py
    • Targets Python: easy to distribute and integrate
      ◦ “vendoring”
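
One way to make "standalone" concrete is to scan a module's source for imports that fall outside the standard library. The sketch below is an illustration, not something from the talk: the function name and the sample sources are hypothetical, and it relies on `sys.stdlib_module_names`, which exists only on Python 3.10 and later (a tiny, incomplete fallback set keeps it runnable on older interpreters).

```python
import ast
import sys

# sys.stdlib_module_names is available on Python 3.10+. The fallback set is a
# small, incomplete stand-in so the sketch still runs on older interpreters.
STDLIB = getattr(sys, "stdlib_module_names", frozenset({"os", "sys", "json", "re"}))


def third_party_imports(source):
    """Return the sorted top-level names imported from outside the stdlib."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports, which stay inside the project
            found.add(node.module.split(".")[0])
    return sorted(name for name in found if name not in STDLIB)


# Hypothetical module sources used only for demonstration.
standalone_src = "import os\nimport json\n"
dependent_src = "import os\nimport requests\nfrom lxml import etree\n"

print(third_party_imports(standalone_src))
print(third_party_imports(dependent_src))
```

A module for which this returns an empty list is a candidate for vendoring as a single file.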
  4. 10.

    Pure-Python package
    The molecule to the module’s atom

    • A directory full of .py files is a “package”
    • Generally includes an __init__.py
    • Django, requests, hyperlink, face
    • Easy to install with pip
      ◦ pip installs packages, after all, right?
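
As a minimal sketch of the "directory full of .py files" idea, the snippet below writes a throwaway package to a temporary directory and imports it. The package name `demo_pkg` and its contents are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_demo_package(root):
    """Create a tiny package: a directory with an __init__.py and one module."""
    pkg = Path(root) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("from .greet import hello\n")
    (pkg / "greet.py").write_text("def hello():\n    return 'hello from demo_pkg'\n")
    return pkg


with tempfile.TemporaryDirectory() as root:
    make_demo_package(root)
    sys.path.insert(0, root)  # make the parent directory importable
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")
        message = demo_pkg.hello()
    finally:
        sys.path.remove(root)

print(message)
```

Nothing here involves pip or an archive yet, which is the gap the next slide addresses.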
  5. 11.

    Pardon my dist
    Packages are not proper packages.

    • Proper packages need to be single redistributable archives
    • A distribution is an archive of zero or more packages
    • Motivational case studies: PIL & Pillow. PyCrypto(dome).
    • Built by setuptools through setup.py
      ◦ Example: sdist
      ◦ Simple, source-only .tar.gz
  6. 12.

    The full package
    Python is more than just .py files.

    • Python’s interoperability & performance
    • Pillow, gevent, lxml
    • wheel
      ◦ The shiny new binary distribution, or bdist
      ◦ Supports most Windows, Mac, & Linux
      ◦ No compiler necessary
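
The difference between a pure-Python wheel and a platform-specific one is visible in the filename, which PEP 427 defines as `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`. The following sketch splits a filename into those fields; the helper names and the two sample filenames are illustrative choices, not taken from the slides.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, python, abi, platform = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelTags(dist, version, build, python, abi, platform)


def is_pure_python(tags):
    """Pure-Python wheels carry the 'none' ABI tag and the 'any' platform tag."""
    return tags.abi == "none" and tags.platform == "any"


# Illustrative filenames: one pure-Python wheel, one with compiled extensions.
pure = parse_wheel_filename("requests-2.18.4-py2.py3-none-any.whl")
binary = parse_wheel_filename("Pillow-5.1.0-cp36-cp36m-manylinux1_x86_64.whl")

print(pure.platform, is_pure_python(pure))
print(binary.platform, is_pure_python(binary))
```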
  7. 13.

    python setup.py sdist bdist_wheel upload

    The modern way to build and upload a Python package

    sum()
  8. 16.

    System libraries?
    The world is not written in Python (yet).

    • Prebuilt standard libraries managed through the OS
      ◦ .dll and .so
      ◦ libcrypto (OpenSSL), libxml2, libpng, etc.
    • Static vs dynamic linking
      ◦ Big wheels
    • conda!
  9. 18.

    Basically...

    1. .py - standalone modules
    2. sdist - Pure-Python packages
    3. wheel - Python packages
    4. conda - Python + system libraries

    (With room to spare for static vs. dynamic linking)

    But wait...
  10. 20.

    There are two kinds of packages

    • Libraries: What developers work on and with day in and day out. Rarely a product (maybe SDKs).
    • Applications: Services and other products with high-level, non-code interfaces. Most products.
  11. 22.

    Don’t pip install prod
    Especially not from PyPI

    • pip requires developer attention to debug and resolve
    • No dependency resolution or transactional installs
    • Even dev tools work better with pipsi
      ◦ Every application deserves its own env
  12. 23.

    Shipping product
    Summarizing packaging for Python applications

    1. PEX - Python libraries included
    2. anaconda - Python ecosystem
    3. freezers - Python included
    4. images - system libraries included
    5. containers - sandboxed images
    6. virtual machines - kernel included
    7. hardware - plug and play appliances
  13. 24.

    Shipping product
    Summarizing packaging for Python applications

    1. PEX - libraries included
    2. anaconda - Python ecosystem
    3. freezers - Python included
    4. images - system libraries included
    5. containers - sandboxed images
    6. virtual machines - kernel included
    7. hardware - plug and play appliances

    http://sedimental.org/talks.html for the rest.
  14. 26.

    The PayPal story
    Wonderfully, spontaneously Python.

    • Started in 2009, grew to a team in 2011
    • 30+ midtier apps, services, and batch jobs
      ◦ Max single service volume: 1.2 billion reqs/day (2016)
      ◦ Max single service throughput: 10,000 reqs/sec/worker (2016)
    • Multiprotocol, service-focused, gevent-based framework
    • Hundreds of users, almost all grassroots

    http://github.com/paypal/support
  15. 27.

    More environments than any other stack.
    (8 environments x 10 binary libraries) / 5 team members = ∞ static builds.

    PayPal Python Environment Support Matrix (2014)

    Operating System   Architecture    Python Version
    Linux              32-bit/64-bit   2.6/2.7
    Mac                64-bit          2.7
    Solaris            32-bit          2.7
    Windows            32-bit          2.7
  16. 29.
  17. 30.

    80% environment coverage x 500+ packages = ∞% better

    Anaconda Environment Support Matrix (2015)

    Operating System   Architecture    Python Version
    Linux              32-bit/64-bit   2.7-3.x
    Mac                64-bit          2.7-3.x
    Windows            32-bit/64-bit   2.7-3.x
  18. 31.

    Anaconda in PayPal LIVE
    Production-first development in practice.

    • PayPal was on RHEL5
      ◦ Python 2.4
      ◦ Plus 2.6-2.7 (sort of)
    • No Anaconda. Couldn’t just target conda install
    • We could bring our own by putting Miniconda in an RPM...
  19. 32.

    3 steps to a conda RPM
    One way to box a snake:

    1. With requirements in hand, conda install --download-only
    2. Miniconda + .tar.bz2 archives into the RPM
    3. Run a tiny installer script in the RPM postinstall section

    Ready to test and deploy!

    https://www.paypal-engineering.com/2016/09/07/python-packaging-at-paypal/
    https://github.com/paypal/support/blob/master/examples/miniconda
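
The third step can be pictured as a script that installs Miniconda into a prefix and then installs the bundled archives without touching the network. The sketch below only builds the command lines rather than running them, and it is an assumption about what such a script might look like, not the PayPal installer: the function name, paths, and archive names are hypothetical. It uses the Miniconda installer's batch (`-b`) and prefix (`-p`) options and conda's `--offline` flag.

```python
import tempfile
from pathlib import Path


def postinstall_commands(installer, prefix, archive_dir):
    """Build (but do not run) the commands a postinstall step might execute."""
    archives = sorted(str(p) for p in Path(archive_dir).glob("*.tar.bz2"))
    conda = str(Path(prefix) / "bin" / "conda")
    return [
        # Install Miniconda non-interactively into the chosen prefix.
        ["bash", str(installer), "-b", "-p", str(prefix)],
        # Install the pre-downloaded package archives without network access.
        [conda, "install", "--yes", "--offline", *archives],
    ]


# Hypothetical paths and archive names, created empty just for the demo.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gevent-1.2.2-py27_0.tar.bz2", "lxml-4.2.1-py27_0.tar.bz2"):
        (Path(tmp) / name).touch()
    commands = postinstall_commands("/opt/app/Miniconda2.sh", "/opt/app/conda", tmp)
    archive_names = [Path(a).name for a in commands[1][4:]]

for command in commands:
    print(" ".join(command))
```

Passing each list to `subprocess.check_call` would execute the steps; keeping command construction separate makes the script easy to test before it ever runs inside an RPM.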
  20. 33.

    3+ ways to RPM conda
    The innovation never stops:

    1. https://github.com/ImmobilienScout24/snakepit
    2. https://github.com/pelson/conda-rpms
    3. https://github.com/jcrist/conda-pack

    RPM yourself a conda, today!
    (Also worth a look: https://github.com/conda/constructor )
  21. 35.

    Anaconda internals
    A whole new ecosystem.

    • Built on OS and Python features
    • Userspace Filesystem layout
    • Python landmarking
    • Paths and linking
    • PatchELF

    Anaconda’s internal layout (lib, include, bin, etc)
  22. 36.

    Userspace images
    Anaconda meets old autorun CDs

    • Your userspace in its own partition!
    • Literally v1 was ISO9660
    • E.g., AppImage / kdenlive

    Even trendier than selfies!

    https://github.com/AppImage/AppImageKit
    https://github.com/appimage-packages/kdenlive
  23. 37.

    “Containers”
    Reusable and disposable, but check the seal...

    • Userspace images
      ◦ + sandboxing
      ◦ + distribution
    • Flatpak & Snappy
      ◦ .deb vs. .rpm round 2
    • Docker / Moby
  24. 38.

    The shopkick story
    Legacy environment and codebase.

    • 150k commits and 12+ services
    • CentOS 6 + Python 2.6 + LXC
    • 100% Mac local dev (iOS)

    One mission: Upgrade stack from 2009 to 2017.
  25. 39.

    Local production
    Closing the rift.

    • Production-first: Better to target Linux than MacOS
    • Docker’s native MacOS xhyve virtualization
      ◦ Docker Machine (virtualbox)
      ◦ Minikube
    • Avoid lethal exposure of Docker
    • Account for new tools: GitLab & DCOS (later k8s)
  26. 40.

    OpenSky
    Big, blue, and cloud-ready.

    • Specify a legacy service into images for local, stage, and prod
    • Library dependencies from: yum, conda, pip, and sky (git repo)
    • Service deps specified in terms of docker images
    • Plugins + core shipped as PEX to local developers
    • Open-sourced just for you: https://github.com/shopkick/sky
  27. 41.

    docker + conda
    Conda just works, despite docker headaches

    • Bring your own Miniconda
    • Custom installer script that calls conda
      ◦ docker build instrumentation severely lacking. Layer proliferation.
    • Tips:
      ◦ Run after yum, before pip
      ◦ Size is always an issue: nomkl unless you really need the speed, conda clean.
      ◦ --no-channel-priority useful for mixing and matching
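
The tips above amount to an ordering (yum, then conda, then pip) plus a cleanup step to keep the image small. A rough sketch of how a custom installer script might assemble those steps follows; it is not the shopkick script, the function and package names are hypothetical, and `--no-channel-priority` is used as named on the slide, so its behavior may differ across conda versions.

```python
def build_steps(yum_pkgs, conda_pkgs, pip_requirements, use_mkl=False):
    """Return ordered command lists: yum first, then conda, then pip."""
    steps = []
    if yum_pkgs:
        steps.append(["yum", "install", "-y", *yum_pkgs])
    conda_args = list(conda_pkgs)
    if not use_mkl:
        # Requesting the nomkl package avoids pulling in the large MKL builds.
        conda_args.insert(0, "nomkl")
    steps.append(["conda", "install", "--yes", "--no-channel-priority", *conda_args])
    # Drop cached archives and index data so they do not bloat the image layer.
    steps.append(["conda", "clean", "--all", "--yes"])
    steps.append(["pip", "install", "-r", pip_requirements])
    return steps


# Hypothetical dependency lists for demonstration.
steps = build_steps(["libxml2"], ["gevent"], "requirements.txt")
for step in steps:
    print(" ".join(step))
```

Running all steps from one script inside a single `RUN` instruction is one way to limit the layer proliferation mentioned above.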
  28. 42.

    Wrapping up
    One neat package.

    • Production-first development
    • Leverage the ecosystem
    • Conda in OS packages works
    • Conda in containers works
    • Simple is better than complex
  29. 43.

    Thanks! Questions?

    Show’s over for real this time, slides & more: sedimental.org/talks.html
    @mhashemi