
Packaging and Deployment with Conda


Conda is a cross-platform, system-level tool for managing packages and deploying software. It is the foundation of a Python distribution called Anaconda, but can be used to package anything on any system. This talk explains why we built conda, gives an overview of what it is, and shows some examples of what it can be used for.

Travis E. Oliphant

November 08, 2013



Transcript

  1. Packaging and Deployment
    with conda
    Travis E. Oliphant, PhD
    Continuum Analytics, Inc


  2. What this tutorial is not
    •Not a re-hash of the history of packaging and
    deployment solutions in Python (though I’ve lived
    and suffered through all of it).
    •Not a general comparison of all the ways you can
    currently deploy and package Python (though I
    know something about most of them and have
    used many of them in earnest).
    •Not a commercial for anything Continuum is
    selling --- conda and binstar and binstar.org are
    free.


  3. Why this tutorial
    •Packaging is a critical part of software.
    •Poor packaging and deployment tools in Python
    have led to software engineering mistakes I’ve
    made in SciPy and NumPy.
    •Poor packaging and deployment solutions are
    everywhere in open source and industry and lead
    to software engineering mistakes with poorly
    factored code:
    •hard to test
    •hard to debug
    •hard to maintain


  4. Case Study: SciPy
    There was this thing called the
    Internet and one could make a
    web-page and put code up on
    it and people started using it ...
    Facebook for Hackers
    I started SciPy in 1999 while I was in grad school
    at the Mayo Clinic
    (it was called Multipack back then)


  5. Case Study: SciPy
    Packaging circa 1999: source tarball and
    makefile (it was all about build)
    (with pip it still is...)
    SciPy is basically a bunch of C/C++/Fortran routines
    with Python interfaces
    Observation: Popularity of Multipack (Early SciPy)
    grew significantly when Robert Kern made pre-
    built binaries for Windows


  6. Case Study: SciPy
    •Getting people to try your cool stuff -- hard if they
    have to “build” first (our internal optimizer for
    cognitive ease)
    •Getting a suitable build environment with a C/C++
    and Fortran compiler can be difficult (still at the
    root of why pip install scipy does not work well).
    •Build is harder when there are dependencies
    (combinatorial explosion of possible versions, APIs,
    etc.)


  7. Case Study: SciPy
    • Difficulty of producing binaries plus the desire to avoid
    the dependency chain and lack of broad packaging
    solutions led to early SciPy being a “distribution” instead
    of separate inter-related libraries.
    • There were (and are) too many different projects in
    SciPy (projects need 1-5 core contributors for
    communication dynamic reasons related to team-sizes)


  8. Case Study: NumPy
    NumPy started in 2005 while I was
    teaching at BYU (it was a merger of
    Numeric and Numarray)
    NumPy ABI has not changed officially
    since 1.0 came out in 2006
    Presumably extension modules (SciPy, scikit-learn, matplotlib,
    etc.) compiled against NumPy 1.0 will still work on NumPy 1.8
    This was not a design goal!!!


  9. Case Study: NumPy
    This was a point of some contention and
    community difficulty when date-time was added in
    version 1.4 (impossible without changing the ABI)
    but not really settled until version 1.7
    The fundamental reason was a user-driven
    obsession with keeping ABI compatibility.
    Windows users lacked a useful packaging
    solution in the face of the NumPy stack


  10. NumPy Stack (cry for conda...)
    NumPy
    SciPy Pandas Matplotlib
    scikit-learn
    scikit-image statsmodels
    PyTables
    OpenCV
    Cython
    Numba SymPy NumExpr
    astropy BioPython GDAL
    PySAL
    ... many many more ...


  11. Conda helps dramatically
    Set up a test environment
    $ conda update conda
    $ conda create -n test python pip
    $ source activate test
    Try to install via pip
    (test)$ pip install scikit-learn
    Try to install via conda
    (test)$ conda install scikit-learn


  12. Fundamental Principles
    •Complex things are built out of simple things
    •Fundamental principle of software engineering is
    “separation of concerns” (modularity)
    •Reusability is enhanced when you “do one thing
    and do it well”
    •To deploy you need to bring the pieces back
    together
    •This all means you need a good packaging system


  13. System Packaging solutions
    Linux: yum (rpm), apt-get (dpkg)
    OSX: macports, homebrew
    Windows: ??
    Cross-platform: conda
    With virtual environments conda provides a modern, cross-
    platform, system-level packaging and deployment solution


  14. Conda Features
    • Excellent support for real system-level environments
    (lighter weight than docker.io)
    • Minimizes code-copies (uses hard/soft links if possible)
    • Dependency solver using fast native-code satisfiability
    solver (SAT solver)
    • Simple format: binary tar-ball + meta-data
    • Meta-data allows static analysis of dependencies
    • Easy to create multiple “channels” which are repositories
    for binary packages
    • User installable (no root privileges needed)
    • Can still use standard tools like pip and virtualenv ---
    conda fills in where they fail.
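
The "minimizes code-copies" point can be illustrated with a small Python sketch. This is not conda's own code: the helper name `link_or_copy`, the toy package `toy-0.1-0`, and the directory names are assumptions made for the example. It places a file from a package cache into an environment with a hard link when the filesystem allows it, and falls back to a copy otherwise.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Place src at dst, preferring a hard link and falling back to a copy."""
    try:
        os.link(src, dst)
        return "hardlink"
    except (OSError, NotImplementedError, AttributeError):
        # Hard links can fail across filesystems or on platforms without them.
        shutil.copy2(src, dst)
        return "copy"


def demo():
    """Link one file from a toy package cache into a toy environment."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, "pkgs", "toy-0.1-0")  # hypothetical package
        env_dir = os.path.join(root, "envs", "test")       # hypothetical environment
        os.makedirs(pkg_dir)
        os.makedirs(env_dir)

        src = os.path.join(pkg_dir, "toy.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")

        dst = os.path.join(env_dir, "toy.py")
        how = link_or_copy(src, dst)

        # A hard link shares the inode with the original, so no bytes are copied.
        same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
        with open(dst) as f:
            content = f.read()
        return how, same_inode, content


if __name__ == "__main__":
    print(demo())
```

With hard links, many environments can share one exploded copy of a package, which is why creating and discarding environments stays cheap.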


  15. First steps
    Create an environment
    $ conda create -n py3k python=3.3
    $ source activate py3k
    Install IPython notebook
    (py3k) $ conda install ipython-notebook
    All in One
    $ conda create -n py3k python=3.3 ipython-notebook
    $ source activate py3k


  16. Anaconda installation
    ROOT_DIR
    The directory that Anaconda was installed into; for
    example, /opt/Anaconda or C:\Anaconda
    /pkgs
    Also referred to as PKGS_DIR. This directory contains
    exploded packages, ready to be linked in conda
    environments. Each package resides in a subdirectory
    corresponding to its canonical name.
    /envs
    The system location where additional conda environments
    are created.
    /bin, /include, /lib, /share
    The default, or root, environment.
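
The layout above can be written down as a short Python sketch. The helper `conda_layout` is hypothetical and exists only for illustration. It assumes POSIX-style paths and that each named environment under /envs has the same bin/lib structure as the root environment.

```python
from pathlib import PurePosixPath


def conda_layout(root_dir, env_name=None):
    """Return the main directories for the root or a named environment.

    Illustrative only: env_name=None means the default (root) environment,
    which lives directly in ROOT_DIR.
    """
    root = PurePosixPath(root_dir)
    prefix = root if env_name is None else root / "envs" / env_name
    return {
        "pkgs_dir": root / "pkgs",  # exploded packages, shared by all environments
        "envs_dir": root / "envs",  # where additional environments are created
        "prefix": prefix,           # top of the selected environment
        "bin": prefix / "bin",
        "lib": prefix / "lib",
    }


if __name__ == "__main__":
    for key, path in conda_layout("/opt/Anaconda", "py3k").items():
        print(key, path)
```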


  17. Look at conda package --- a simple .tar.bz2
    http://docs.continuum.io/conda/intro.html
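
Because a conda package is a plain .tar.bz2 with meta-data inside, it can be inspected with nothing but the Python standard library. The sketch below builds a toy archive in memory and reads its meta-data back. The package contents and the helper names are assumptions made for the example; it assumes the meta-data sits in `info/index.json`, and a real package carries more files than shown here.

```python
import io
import json
import tarfile


def make_toy_package(meta, files):
    """Build an in-memory .tar.bz2 holding the payload files plus info/index.json."""
    members = dict(files)
    members["info/index.json"] = json.dumps(meta).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for path, data in sorted(members.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_metadata(package_bytes):
    """Read the meta-data back without unpacking the payload files."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as tar:
        return json.load(tar.extractfile("info/index.json"))


if __name__ == "__main__":
    # Hypothetical package used only for this demonstration.
    meta = {"name": "toy", "version": "0.1", "build_number": 0, "depends": ["python"]}
    pkg = make_toy_package(meta, {"lib/toy.py": b"print('hello')\n"})
    print(read_metadata(pkg))
```

Reading only the meta-data member is what makes static analysis of dependencies possible without installing or running anything.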


  18. Environments
    One honking great idea!
    Let’s do more of those!
    Easy to make
    Easy to throw away
    Uses:
    • Testing (python 2.6, 2.7, 3.3)
    • Development
    • Trying new packages from PyPI
    • Separating deployed apps with
    different dependency needs
    • Trying new versions of Python
    • Reproducing someone’s work
    conda create -h
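
For the testing use case, one environment per Python version is enough. A minimal Python sketch that only builds the command strings, following the `conda create -n NAME python=VERSION` form shown on the "First steps" slide; the helper name and the environment names (py26, py27, py33) are assumptions for the example.

```python
def create_commands(versions):
    """Return one `conda create` command line per requested Python version."""
    commands = []
    for version in versions:
        env_name = "py" + version.replace(".", "")  # e.g. "2.6" -> "py26"
        commands.append("conda create -n {0} python={1}".format(env_name, version))
    return commands


if __name__ == "__main__":
    for command in create_commands(["2.6", "2.7", "3.3"]):
        print(command)
```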


  19. conda info -e
    conda info


  20. conda install
    Uses pip install if conda package is not found!
    Default package repositories (configurable)
    http://repo.continuum.io/pkgs/dev
    Experimental or developmental versions of packages
    http://repo.continuum.io/pkgs/gpl
    GPL licensed packages
    http://repo.continuum.io/pkgs/free
    non GPL open source packages


  21. conda search


  22. conda list
    also includes packages installed via pip!


  23. conda update


  24. conda remove


  25. conda config
    # This is a sample .condarc file
    # channel locations. These override conda defaults, i.e., conda will
    # search *only* the channels listed here, in the order given. Use "defaults" to
    # automatically include all default channels.
    channels:
      - defaults
      - http://some.custom/channel
    # Proxy settings
    # http://[username]:[password]@[server]:[port]
    proxy_servers:
      http: http://user:pass@corp.com:8080
      https: https://user:pass@corp.com:8080
    envs_dirs:
      - /opt/anaconda/envs
      - /home/joe/my-envs
    pkg_dirs:
      - /home/joe/user-pkg-cache
      - /opt/system/pkgs
    changeps1: False
    # binstar.org upload (not defined here means ask)
    binstar_upload: True
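
The channel rule in the comments above (search only the listed channels, in order, with "defaults" standing in for the default repositories) can be sketched in a few lines of Python. This is not conda's implementation. The function name is hypothetical, and the order of the three default URLs, taken from the "conda install" slide, is an assumption.

```python
# Default repositories as listed on the "conda install" slide (order assumed).
DEFAULT_CHANNELS = [
    "http://repo.continuum.io/pkgs/free",
    "http://repo.continuum.io/pkgs/gpl",
    "http://repo.continuum.io/pkgs/dev",
]


def expand_channels(channels, defaults=DEFAULT_CHANNELS):
    """Expand the 'defaults' keyword in place and drop duplicates, keeping order."""
    expanded = []
    for channel in channels:
        urls = defaults if channel == "defaults" else [channel]
        for url in urls:
            if url not in expanded:
                expanded.append(url)
    return expanded


if __name__ == "__main__":
    for url in expand_channels(["defaults", "http://some.custom/channel"]):
        print(url)
```

Leaving "defaults" out of the list means the default repositories are not searched at all, which is the "override" behaviour the sample file describes.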


  26. conda package -u
    conda package --pkg-name bulk --pkg-version 0.1
    Untracked files
    Easy way to install into an environment using
    anything (pip, make, setup.py, etc.) and then package
    up all of it into a binary tar-ball deployable via
    conda install .tar.bz2
    pickle for binary code!


  27. conda build --build-recipe
    Building new packages
    1)
    2) conda build


  28. Conda Recipe == A directory
    Required
    build.sh     BASH build commands (POSIX)
    bld.bat      CMD build commands (Win)
    meta.yaml    extended yaml declarative meta-data
    Optional
    run_test.py  will be executed during test phase
    *.patch      patch-files for the source
    *            any other resources needed by build but not included
                 in sources described in meta.yaml file


  29. MetaData
    package:
      name: # name of package
      version: # version of package
    about:
      home: # home-page
      license: # license
    # All optional from here....
    source:
      fn: # filename of source
      url: # url of source
      md5: # hash of source
      # or from git:
      git_url:
      git_tag:
      patches: # list of patches to source
        - fix.patch
    build:
      entry_points: # entry-points (binary commands or scripts)
        - name = module:function
      number: # defaults to 0
    requirements: # lists of requirements
      build: # requirements for build (as a list)
      run: # requirements for running (as a list)
    test:
      requires: # list of requirements for testing
      commands: # commands to run for testing (entry-points)
      imports: # modules to import for testing
    http://docs.continuum.io/conda/build.html
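
Declared run requirements are what make static analysis of dependencies possible. The Python sketch below turns a toy index of run requirements into an install order with a depth-first walk. It is deliberately much simpler than conda's SAT-based solver: it ignores versions and conflicts entirely, and the index contents and function name are hypothetical.

```python
def install_order(index, target):
    """Return package names so that every dependency precedes its dependents."""
    order = []
    done = set()

    def visit(name, stack=()):
        if name in stack:
            raise ValueError("dependency cycle: " + " -> ".join(stack + (name,)))
        if name in done:
            return
        for dep in index[name]["run"]:
            visit(dep, stack + (name,))
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Toy meta-data: only the "run" requirements of each package, no versions.
index = {
    "python": {"run": []},
    "numpy": {"run": ["python"]},
    "scipy": {"run": ["python", "numpy"]},
    "scikit-learn": {"run": ["python", "numpy", "scipy"]},
}

if __name__ == "__main__":
    print(install_order(index, "scikit-learn"))
    # ['python', 'numpy', 'scipy', 'scikit-learn']
```

Once version constraints enter the picture, a plain walk like this is no longer enough, which is where a satisfiability solver comes in.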


  30. Binstar.org (beta-code --- binstar in beta)
    Once you have built a conda package, you can share it
    with the world on binstar.org, then conda install


  31. Adding Binstar channels
    $ conda config --add channels 'http://conda.binstar.org/travis'
    $ conda config --add channels 'http://conda.binstar.org/asmeurer'


  32. Useful aliases
    alias workon='source activate'
    alias workoff='source deactivate'
    alias virtualenv='conda create -p'


  33. Thanks!
    Aaron Meurer (conda and binstar developer)
    Sean Ross-Ross (binstar.org)
    Bryan Van de Ven (original conda author)
    Ilan Schnell (principal conda developer)
