
2016 - Cindy Sridharan - The Python Deployment Albatross

PyBay
December 26, 2016

A history of Python packaging and distribution as well as deployment


Transcript

  1. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  2. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  3. What is Python? python hello_world.py Python – or /usr/bin/python –

    as your system understands it, is a program called the interpreter
  4. What is Python? • Python can be invoked via a

    script, or by calling the interpreter directly with -c or -m • When Python is invoked with -c, the program is passed in as a string (this terminates the option list) • python -c "import datetime; print datetime.datetime.now()"
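The two invocation styles above can be tried from Python itself. This is a minimal sketch, assuming a Python 3 interpreter is running it; `run_interpreter` is a hypothetical helper name, and `platform` is used only because it is a standard library module that can be run with -m.

```python
import subprocess
import sys


def run_interpreter(*args: str) -> str:
    """Run the current interpreter with the given arguments and return its stdout."""
    completed = subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# -c: the program is passed in as a string.
sum_output = run_interpreter("-c", "print(1 + 1)")

# -m: the named module is located on sys.path and executed as a script.
platform_output = run_interpreter("-m", "platform")

print(sum_output)
print(platform_output)
```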
  5. Running a Python program • Source code -> bytecode ->

    virtual machine • Bytecode compilation generates .pyc files • If the Python process has write access, .pyc files are stored on the filesystem – else in memory
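Bytecode compilation can also be triggered by hand, which makes the .pyc file easy to inspect. A minimal sketch, assuming Python 3 (where the compiled file is written under a `__pycache__` directory next to the source); the file name `hello_world.py` is taken from the earlier slide and the temporary directory is only for illustration.

```python
import py_compile
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "hello_world.py"
    source.write_text('print("hello world")\n')

    # Compile explicitly; normally the interpreter does this when a module is imported.
    pyc_path = Path(py_compile.compile(str(source), doraise=True))

    pyc_exists = pyc_path.exists()
    pyc_suffix = pyc_path.suffix
    print(pyc_path.name)
```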
  6. Running a Python program • Bytecode != machine code •

    Python runtime = bytecode compiler + virtual machine • The Python virtual machine is a loop that iterates through bytecode instructions to carry out the instructions • No build or make step is required to run a Python program
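The bytecode that the virtual machine loops over can be inspected with the standard library `dis` module. A small sketch, assuming CPython; the function `add` is a made-up example, and the exact instruction names differ between interpreter versions.

```python
import dis


def add(a, b):
    return a + b


# Every function carries a code object that holds its compiled bytecode.
raw_bytecode = add.__code__.co_code

# dis decodes the raw bytes into the instructions the VM loop steps through.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

print(len(raw_bytecode), "bytes of bytecode")
print(opnames)
```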
  7. Python import system • A file ending in .py is

    a module • In Python 2.7, a collection of modules under one directory with an __init__.py is considered a package • Even if there is no initialization code to run when the package is imported, an empty __init__.py file is still needed for the interpreter to find any modules or subpackages in that directory. • http://bit.ly/1hGmaAN
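The layout described above can be sketched as a throwaway package on disk. The names `demo_pkg` and `greet` are hypothetical; the empty `__init__.py` is what Python 2.7 requires, and Python 3 accepts it as well.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "demo_pkg"
    package_dir.mkdir()

    # Empty __init__.py: no initialization code, but it marks the directory as a package.
    (package_dir / "__init__.py").write_text("")
    (package_dir / "greet.py").write_text(
        "def hello():\n    return 'hello from demo_pkg.greet'\n"
    )

    # Make the parent directory importable for the duration of the example.
    sys.path.insert(0, tmp)
    try:
        greet = importlib.import_module("demo_pkg.greet")
        message = greet.hello()
    finally:
        sys.path.remove(tmp)

print(message)
```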
  8. Python import system • PYTHONPATH – augments the default search

    path for module files • PYTHONHOME – change the location of the standard Python libraries – can be set to a single directory • In Python 3.3, any directory on sys.path with a name that matches the package name being looked for will be recognized as contributing modules and subpackages to that package.
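Both ideas on this slide can be combined in one experiment: two separate directories, each holding a piece of the same package and neither containing an `__init__.py`, are placed on PYTHONPATH. A sketch assuming Python 3.3 or later; `ns_demo`, `part_a` and `part_b` are hypothetical names.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    roots = []
    for name in ("part_a", "part_b"):
        root = Path(tmp) / ("root_" + name)
        # Note: no __init__.py is created inside ns_demo.
        (root / "ns_demo").mkdir(parents=True)
        (root / "ns_demo" / (name + ".py")).write_text("NAME = %r\n" % name)
        roots.append(str(root))

    # PYTHONPATH augments the default module search path of the child interpreter.
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(roots))
    program = (
        "import ns_demo.part_a, ns_demo.part_b; "
        "print(ns_demo.part_a.NAME, ns_demo.part_b.NAME)"
    )
    output = subprocess.run(
        [sys.executable, "-c", program],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

print(output)
```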
  9. Python “packaging” • In Python 2.7, a collection of modules

    under one directory with an __init__.py is considered a package • On Linux based systems, software is treated as a collection of well defined units called packages • Package from the OS perspective is an archive file along with the dependencies, version number, name, vendor, checksum etc. • In Python, what an OS calls a package is called a distribution • package(distribution) -> build -> install -> execute
  10. Python import system… for directories • Any directory with an

    __init__.py is considered a package • Any directory with a __main__.py is treated as an executable • Typing python -m package will execute package/__main__.py if it exists
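A quick way to see both behaviours is to build a tiny package with a `__main__.py` and run it twice. This is a sketch with a hypothetical package name, `demo_app`, created in a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run(args, cwd):
    """Run the current interpreter with args inside cwd and return its stdout."""
    completed = subprocess.run(
        [sys.executable, *args],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


with tempfile.TemporaryDirectory() as tmp:
    package_dir = Path(tmp) / "demo_app"
    package_dir.mkdir()
    (package_dir / "__init__.py").write_text("")
    (package_dir / "__main__.py").write_text('print("running demo_app")\n')

    # python -m demo_app: imports the package, then executes demo_app/__main__.py.
    via_module = run(["-m", "demo_app"], cwd=tmp)

    # python demo_app: the directory itself is treated as an executable.
    via_directory = run([str(package_dir)], cwd=tmp)

print(via_module)
print(via_directory)
```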
  11. Python import system… for zipfiles • The zipimport module provides

    a default import hook for Python >=2.4 • If a zipfile (in either source or compiled form) has an __init__.py file, it’s considered a package • If a zipfile has a __main__.py file, it’s considered an executable
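The same experiment works with a zip archive. The sketch below builds a small archive, executes it, and then imports a module from it through `sys.path`; the archive name `demo_app.zip` and the module name `zip_helper` are hypothetical.

```python
import importlib
import subprocess
import sys
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "demo_app.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(
            "zip_helper.py",
            "def greet():\n    return 'hello from inside the zip'\n",
        )
        zf.writestr("__main__.py", "import zip_helper\nprint(zip_helper.greet())\n")

    # Executing the archive runs its top-level __main__.py.
    executed = subprocess.run(
        [sys.executable, str(archive)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

    # Putting the archive on sys.path lets the zipimport hook serve its modules.
    sys.path.insert(0, str(archive))
    try:
        imported = importlib.import_module("zip_helper").greet()
    finally:
        sys.path.remove(str(archive))

print(executed)
print(imported)
```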
  12. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • virtualenv • pex + pants • Docker • nix + conda
  13. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • Distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  14. distutils • distutils shipped with Python in 1998 • distutils

    can make tarballs of Python code and knows how to invoke compilers • A setup.py file has a call to distutils’ main entry point – the setup function • setup.py file can build, distribute, publish or install … • … which some perceive as a flaw, since everything is bundled together in setup.py
  15. setuptools • setuptools is more fully featured than distutils but

    doesn’t ship with Python • setuptools introduced easy_install, which has been replaced by pip • setuptools monkeypatches distutils • For setuptools to work with an existing setup.py, edit the target package's setup.py and add from setuptools import setup • Doing this replaces the existing import of the setup function
  16. pip • installer/package manager introduced in 2008 to download packages

    from PyPI • vastly better than easy_install • pip supports operations such as list, upgrade etc. • pip ships with wheel support and can build wheels and cache them • Python 3.4 ships with pip
  17. virtualenv • Isolates project dependencies • Isolates system dependencies • pip

    install --user --upgrade virtualenv • virtualenv .venv • Python 3.3 and later include the similar venv module in the standard library
  18. Benefits of virtualenv Allows multiple Python projects that have different

    (and often conflicting) requirements to coexist on the same computer. Each environment includes a copy of: 1. the Python binary 2. the entire Python standard library 3. the pip installer 4. site-packages directory
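An isolated environment can also be created programmatically. The sketch below uses the standard library `venv` module (available since Python 3.3), not the third-party virtualenv tool from the slides, so the resulting layout may differ in detail. `with_pip=False` is an assumption made to keep the example offline.

```python
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / ".venv"

    # with_pip=False skips bootstrapping pip; real projects usually want it installed.
    venv.create(env_dir, with_pip=False)

    # pyvenv.cfg marks the directory as a virtual environment.
    has_config = (env_dir / "pyvenv.cfg").exists()

    # The interpreter lives in bin/ on POSIX systems and Scripts/ on Windows.
    has_interpreter_dir = (env_dir / "bin").exists() or (env_dir / "Scripts").exists()

print(has_config, has_interpreter_dir)
```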
  19. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  20. Disadvantages of sdists • Closely couples build systems and installers

    • Runs arbitrary code (setup.py) to build and install • Code must be recompiled for every new virtualenv • Ergo slow • Hard to maintain • Defined by, and requires, distutils/setuptools
  21. bdists • Distribution with files + metadata that only need

    to be moved to the correct location on the target host • Doesn’t require a build step – can be installed directly on the host with an installer like pip • Python files do not have to be precompiled
  22. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  23. eggs • Distribution format of setuptools generated packages • No

    build or install step is required, just put them on PYTHONPATH or sys.path and import them • Key principle of eggs is that they should be discoverable and importable.
  24. eggs There are two basic formats currently implemented for Python

    eggs: 1. .egg format: a directory or zipfile containing the project’s code and resources, along with an EGG-INFO subdirectory that contains the project’s metadata 2. .egg-info format: a file or directory placed adjacent to the project’s code and resources, that directly contains the project’s metadata.
  25. wheels • New standard for Python distribution • Zip file

    with a specially formatted file name and the .whl extension • No build step required • No build step == no build system (C compilers etc.)
  26. wheels • Supported by pip >=1.4 and setuptools >=0.8 •

    pip builds and caches wheels by default. • Unzipped into site-packages or sys.path • Retains enough information to be later moved to the final paths
  27. wheels • Amortizes compile times over many installations • Creating

    virtual environments is even cheaper since tearing down an environment and building a new one will not require recompilation.
  28. Advantages of wheels • No build step == no build

    system (C compilers etc.) • No arbitrary code execution for installation – no setup.py • No arbitrary code execution == faster installation for pure Python and native C extension packages • Creates .pyc files during installation to match the Python interpreter used
  29. Advantages of wheels • More consistent across platforms and machines

    • Less dependent on system Python so long as it doesn’t ship with extensions that link to libpython • With manylinux, possible to distribute wheels for Linux platforms*
  30. How to create and upload wheels If package is not

    using 2to3: 1. pip install wheel 2. pip install twine 3. python setup.py sdist bdist_wheel 4. twine upload dist/*
  31. Universal Wheels • If project is Python 2/3 compatible, can

    create a universal wheel • To create a universal wheel, create a setup.cfg file with [bdist_wheel] universal=1 • Don’t push universal wheels for a project with C extensions as pip will prefer this version over source
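For reference, the setup.cfg setting is an ordinary INI section with one option. The sketch below parses such a fragment with the standard library `configparser` purely to show its shape; it does not build a wheel, and the embedded text is an illustrative setup.cfg, not one taken from a real project.

```python
import configparser

# Illustrative contents of a setup.cfg that requests a universal wheel.
SETUP_CFG = """\
[bdist_wheel]
universal = 1
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# "1" is one of the values configparser interprets as true.
is_universal = parser.getboolean("bdist_wheel", "universal")
print(is_universal)
```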
  32. Wheel file format • {distrib}-{version}-{optional-build}-{python}-{ABI}-{platform}.whl • Example: pybay-1.0-3.0-py27-abi3-linux_x86_64.whl • Python tag + ABI (Application

    Binary Interface) tag + platform tag == compatibility tags which express the package’s basic interpreter requirements
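The naming scheme can be unpacked with a few lines of string handling. This is a simplified sketch, not the parser pip uses: it assumes the components themselves contain no "-" characters, and `WheelName` and `parse_wheel_filename` are hypothetical names.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No build tag present.
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected number of components in %r" % filename)
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


with_build = parse_wheel_filename("pybay-1.0-3.0-py27-abi3-linux_x86_64.whl")
without_build = parse_wheel_filename("pybay-1.0-py2.py3-none-any.whl")
print(with_build)
print(without_build)
```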
  33. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  34. pex • Any directory with a __init__.py is considered a

    package • Any directory with a __main__.py is treated as an executable • The zipimport module provides a default import hook for Python >=2.4 • If the Python import framework sees a zip file with a proper __init__.py, it can be treated as a directory • pex == all of the above put together
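The standard library `zipapp` module packages a directory with a `__main__.py` into a single executable archive, which illustrates the same mechanism. This is a sketch of the underlying idea only, not of pex itself, which goes further by bundling third-party dependencies. The directory name `demo_app` and the `.pyz` file name are hypothetical.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "demo_app"
    source_dir.mkdir()
    (source_dir / "__main__.py").write_text('print("hello from a single-file app")\n')

    # Bundle the directory into one zip archive that the interpreter can execute.
    archive = Path(tmp) / "demo_app.pyz"
    zipapp.create_archive(source_dir, target=archive)

    output = subprocess.run(
        [sys.executable, str(archive)],
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

print(output)
```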
  35. pex

  36. pex

  37. pex

  38. pex

  39. pants • Python Ants • Build system based on Google

    Blaze • Comparable tools: Facebook Buck, Google Bazel, LinkedIn’s PyGradle etc. • Pants supports Java, Scala, Python, C/C++, Go, Thrift, Protobuf and Android code. • Builds pex files for Python.
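  40. Build tool comparison (Gradle, Maven, Ant, Setuptools, Buildout, Pants, CMake)

    Feature                            | Gradle | Maven | Ant | Setuptools | Buildout | Pants | CMake
    Powerful dependency resolution API | YES    | YES   | NO  | NO         | NO       | NO    | NO
    Decoupled dependency metadata      | YES    | YES   | NO  | NO         | NO       | YES   | NO
    Ivy based metadata                 | YES    | NO    | NO  | NO         | NO       | YES   | NO
    Pluggable                          | YES    | YES   | YES | YES*       | YES      | YES   | YES
    Scriptable                         | YES    | NO    | NO  | YES        | YES*     | YES   | NO
    Human readable                     | YES    | NO    | NO  | YES        | YES      | YES   | YES*
    Natively polyglot                  | YES    | NO    | NO  | NO         | YES      | YES   | NO
    Generic artifact hosting support   | YES    | NO    | NO  | NO         | NO       | NO    | NO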
  41. pants • Designed for fast, reproducible builds in a monorepo

    • Goal of pants was to build modularity into a monorepo • But just what is a monorepo?
  42. Monorepo • Large codebase, growing rapidly • Many subprojects that

    share a large amount of code • Complex dependencies of third party libs • Variety of languages, code gen frameworks etc. • No need to maintain strict backwards compatibility
  43. Putative benefits of a monorepo • Increases code reuse. •

    Allows for easy collaboration between many authors. • Encourages a cohesive codebase where problems are refactored - not worked around. • Simplifies dependency management within the codebase. All of the code you run with is visible at a single commit in the repo.
  44. pants • Fine-grained invalidation. • Shared build caches – remote

    and local • Concurrent task execution. • Incremental compilation. • Extensibility, via a plugin API.
  45. pants • BUILD file – describes the dependency graph •

    Many BUILD files per source tree define targets and goals • 1:1:1 rule – one target per directory representing a single package • Goals describe what you want to do to the targets • Can see all goals with the command pants goal list
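To make the idea of a target dependency graph concrete, the toy sketch below orders a handful of targets so that each is built after the targets it depends on. It does not use the Pants API or BUILD file syntax: the target names are hypothetical, and `graphlib` is the Python 3.9+ standard library module.

```python
from graphlib import TopologicalSorter

# Hypothetical targets: each key depends on the targets in its set.
DEPENDENCIES = {
    "src/python/app:bin": {"src/python/app:lib", "3rdparty/python:requests"},
    "src/python/app:lib": {"src/python/common:lib"},
    "src/python/common:lib": set(),
    "3rdparty/python:requests": set(),
}

# static_order() yields every target after all of its dependencies.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(build_order)
```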
  46. pants • Doesn’t cross outside of Python’s build • Does

    not isolate build-time dependencies • Uses system-provided .so files: any dependency that ships built .so files (greenlet, for example) will be packaged, but not the shared libraries it relies on at runtime.
  47. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • nix + conda
  48. Docker • What IS Docker? • VM replacement • Build,

    ship, run anything, anywhere • #kubernetes, #swarm, #containerize everything #otherbuzzwords • Python + Docker == nirvana? • Not really
  49. What IS Docker From the official docs - • Docker

    containers guarantee that the software will always run the same, regardless of its environment. • “The goal is to encapsulate a software component and all its dependencies • run it - without extra dependencies regardless of the underlying machine and the contents of the container.”
  50. Docker… what does it do? • The Docker engine is

    a container runtime • It is also an image format • overlay networking • With 1.12 in swarm mode, it’s also a cluster scheduler • Process manager • … and much, much more (service discovery, load balancing, TLS ...) • All compiled into one gigantic binary running as root
  51. Docker Do I still need to use a virtualenv? YES!

    Do I still need to build wheels? YES! Does this mean the dependency problem is solved for good? NO!
  52. Outline of this talk • What is Python “packaging” •

    Brief history of Python packaging • distutils, setuptools, pip and virtualenv • sdists and bdists • eggs and wheels • pex + pants • Docker • Nix + conda (frankly, another talk in its own right)