Slide 1

Slide 1 text

Packaging and Deployment with conda
Travis E. Oliphant, PhD
Continuum Analytics, Inc

Slide 2

Slide 2 text

What this tutorial is not
• Not a re-hash of the history of packaging and deployment solutions in Python (though I’ve lived and suffered through all of it).
• Not a general comparison of all the ways you can currently deploy and package Python (though I know something about most of them and have used many of them in earnest).
• Not a commercial for anything Continuum is selling --- conda, binstar, and binstar.org are free.

Slide 3

Slide 3 text

Why this tutorial
• Packaging is a critical part of software.
• Poor packaging and deployment tools in Python have led to software engineering mistakes I’ve made in SciPy and NumPy.
• Poor packaging and deployment solutions are everywhere in open source and industry, and they lead to software engineering mistakes and poorly factored code that is:
  • hard to test
  • hard to debug
  • hard to maintain

Slide 4

Slide 4 text

Case Study: SciPy
There was this thing called the Internet, and one could make a web page and put code up on it, and people started using it ...
Facebook for Hackers
I started SciPy in 1999 while I was in grad school at the Mayo Clinic (it was called Multipack back then).

Slide 5

Slide 5 text

Case Study: SciPy
Packaging circa 1999: source tarball and makefile (it was all about build; with pip it still is...)
SciPy is basically a bunch of C/C++/Fortran routines with Python interfaces.
Observation: the popularity of Multipack (early SciPy) grew significantly when Robert Kern made pre-built binaries for Windows.

Slide 6

Slide 6 text

Case Study: SciPy
• Getting people to try your cool stuff is hard if they have to “build” first (our internal optimizer for cognitive ease).
• Getting a suitable build environment with a C/C++ and Fortran compiler can be difficult (still at the root of why pip install scipy does not work well).
• Build is harder when there are dependencies (combinatorial explosion of possible versions, APIs, etc.).

Slide 7

Slide 7 text

Case Study: SciPy
• The difficulty of producing binaries, the desire to avoid the dependency chain, and the lack of broad packaging solutions led to early SciPy being a “distribution” instead of separate, inter-related libraries.
• There were (and are) too many different projects in SciPy (a project needs 1-5 core contributors, for communication-dynamics reasons related to team size).

Slide 8

Slide 8 text

Case Study: NumPy
NumPy started in 2005 while I was teaching at BYU (it was a merger of Numeric and Numarray).
The NumPy ABI has not officially changed since 1.0 came out in 2006.
Presumably, extension modules (SciPy, scikit-learn, matplotlib, etc.) compiled against NumPy 1.0 will still work on NumPy 1.8.
This was not a design goal!!!

Slide 9

Slide 9 text

Case Study: NumPy
This was a point of some contention and community difficulty when date-time was added in version 1.4 (impossible without changing the ABI), and it was not really settled until version 1.7.
The fundamental reason was a user-driven obsession with keeping ABI compatibility.
Windows users lacked a useful packaging solution in the face of the NumPy stack.

Slide 10

Slide 10 text

NumPy Stack (cry for conda...) NumPy SciPy Pandas Matplotlib scikit-learn scikit-image statsmodels PyTables OpenCV Cython Numba SymPy NumExpr astropy BioPython GDAL PySAL ... many many more ...

Slide 11

Slide 11 text

Conda helps dramatically
Set up a test environment
$ conda update conda
$ conda create -n test python pip
$ source activate test
Try to install via pip
(test)$ pip install scikit-learn
Try to install via conda
(test)$ conda install scikit-learn

Slide 12

Slide 12 text

Fundamental Principles
• Complex things are built out of simple things.
• A fundamental principle of software engineering is “separation of concerns” (modularity).
• Reusability is enhanced when you “do one thing and do it well”.
• To deploy, you need to bring the pieces back together.
• This all means you need a good packaging system.

Slide 13

Slide 13 text

System Packaging solutions
Linux: yum (rpm), apt-get (dpkg)
OSX: macports, homebrew
Windows: ??
Cross-platform: conda
With virtual environments, conda provides a modern, cross-platform, system-level packaging and deployment solution.

Slide 14

Slide 14 text

Conda Features
• Excellent support for real system-level environments (lighter weight than docker.io)
• Minimizes code-copies (uses hard/soft links if possible)
• Dependency solver using fast native-code satisfiability solver (SAT solver)
• Simple format: binary tar-ball + meta-data
• Meta-data allows static analysis of dependencies
• Easy to create multiple “channels”, which are repositories for binary packages
• User installable (no root privileges needed)
• Can still use standard tools like pip and virtualenv --- conda fills in where they fail.
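To make the "dependency solver" bullet concrete, here is a toy sketch of dependency resolution as a satisfiability problem. It is not conda's solver: it brute-forces every combination of versions instead of encoding the constraints as boolean clauses for a SAT solver, and the package index (the names, versions, and constraints in INDEX) is entirely made up for illustration.

```python
from itertools import product

# Hypothetical package index: name -> version -> {dependency: allowed versions}.
INDEX = {
    "app":   {"1.0": {"numpy": {"1.7", "1.8"}, "scipy": {"0.13"}}},
    "scipy": {"0.12": {"numpy": {"1.7"}}, "0.13": {"numpy": {"1.8"}}},
    "numpy": {"1.7": {}, "1.8": {}},
}

def consistent(choice):
    """True if every chosen package finds each dependency at an allowed version."""
    for name, version in choice.items():
        for dep, allowed in INDEX[name][version].items():
            if choice.get(dep) not in allowed:
                return False
    return True

def solve():
    """Try version combinations, newest first, and return the first consistent one."""
    names = sorted(INDEX)
    # Plain string sorting is good enough for these toy version numbers only.
    candidates = [sorted(INDEX[name], reverse=True) for name in names]
    for versions in product(*candidates):
        choice = dict(zip(names, versions))
        if consistent(choice):
            return choice
    return None

solution = solve()
print(solution)
```

Brute force grows exponentially with the number of packages and versions, which is the combinatorial explosion mentioned earlier in the talk and the reason a real solver is needed.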

Slide 15

Slide 15 text

First steps
Create an environment
$ conda create -n py3k python=3.3
$ source activate py3k
Install IPython notebook
(py3k) $ conda install ipython-notebook
All in One
$ conda create -n py3k python=3.3 ipython-notebook
$ source activate py3k

Slide 16

Slide 16 text

Anaconda installation
ROOT_DIR
  The directory that Anaconda was installed into; for example, /opt/Anaconda or C:\Anaconda
/pkgs
  Also referred to as PKGS_DIR. This directory contains exploded packages, ready to be linked in conda environments. Each package resides in a subdirectory corresponding to its canonical name.
/envs
  The system location for additional conda environments to be created.
/bin /include /lib /share
  The default, or root, environment.
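The idea of exploded packages in /pkgs being linked into environments can be sketched with a hard link: the file appears in both directory trees but is stored once. This is a minimal illustration, not conda's code. The directory names and the helper link_or_copy are hypothetical, and the fallback to a plain copy is an assumption about what a tool might do when the filesystem refuses hard links.

```python
import os
import shutil
import tempfile

def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to a copy if linking is not possible."""
    try:
        os.link(src, dst)
        return "hard-link"
    except OSError:
        shutil.copy2(src, dst)
        return "copy"

# Hypothetical layout mimicking ROOT_DIR/pkgs and ROOT_DIR/envs.
root = tempfile.mkdtemp()
pkg_file = os.path.join(root, "pkgs", "demo-0.1-0", "lib", "demo.py")
env_file = os.path.join(root, "envs", "test", "lib", "demo.py")
os.makedirs(os.path.dirname(pkg_file))
os.makedirs(os.path.dirname(env_file))
with open(pkg_file, "w") as f:
    f.write("VALUE = 42\n")

method = link_or_copy(pkg_file, env_file)
# samefile() is True only when both paths refer to the same underlying file.
same_file = os.path.samefile(pkg_file, env_file)
print(method, same_file)

shutil.rmtree(root)
```

Because a hard link is just another directory entry, throwing away an environment removes the entry without touching the copy under /pkgs.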

Slide 17

Slide 17 text

Look at a conda package --- a simple .tar.bz2
http://docs.continuum.io/conda/intro.html
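Since a conda package is a plain .tar.bz2, Python's standard tarfile module can build and inspect something of the same shape. The sketch below writes a tiny archive with one payload file plus a JSON metadata file under info/, then reads it back. The package name "demo", its contents, and the exact set of metadata keys are illustrative assumptions; a real package built by conda carries more metadata than this.

```python
import io
import json
import os
import tarfile
import tempfile

def write_package(path, index, files):
    """Write a bzip2 tarball holding the payload files plus info/index.json."""
    members = dict(files)
    members["info/index.json"] = json.dumps(index, indent=2).encode("utf-8")
    with tarfile.open(path, "w:bz2") as tar:
        for name in sorted(members):
            data = members[name]
            entry = tarfile.TarInfo(name)
            entry.size = len(data)
            tar.addfile(entry, io.BytesIO(data))

def read_package(path):
    """Return (sorted member names, parsed info/index.json)."""
    with tarfile.open(path, "r:bz2") as tar:
        names = sorted(tar.getnames())
        raw = tar.extractfile("info/index.json").read()
    return names, json.loads(raw.decode("utf-8"))

# Hypothetical metadata for a made-up package.
index = {"name": "demo", "version": "0.1", "build": "0",
         "build_number": 0, "depends": ["python"]}
pkg_path = os.path.join(tempfile.mkdtemp(), "demo-0.1-0.tar.bz2")
write_package(pkg_path, index, {"lib/demo.py": b"VALUE = 42\n"})

names, index_back = read_package(pkg_path)
print(names)
print(index_back["name"], index_back["version"], index_back["depends"])
```

Keeping the metadata as a small file inside the archive is what lets a tool read dependencies without running any of the package's code.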

Slide 18

Slide 18 text

Environments
One honking great idea! Let’s do more of those!
Easy to make
Easy to throw away
Uses:
• Testing (python 2.6, 2.7, 3.3)
• Development
• Trying new packages from PyPI
• Separating deployed apps with different dependency needs
• Trying new versions of Python
• Reproducing someone’s work
conda create -h
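The "Testing (python 2.6, 2.7, 3.3)" use case could be scripted roughly as follows: one environment per Python version, each created with conda create -n NAME python=X.Y. This is a sketch under assumptions: the environment naming scheme (test-py26 and so on) and the RUN switch are invented here, and the script only prints the commands unless RUN is set to True and conda is found on the PATH.

```python
import shutil
import subprocess

def create_commands(versions, prefix="test-py"):
    """Build one 'conda create' argument list per Python version."""
    commands = []
    for version in versions:
        env_name = prefix + version.replace(".", "")  # e.g. test-py27
        commands.append(
            ["conda", "create", "--yes", "-n", env_name, "python=" + version]
        )
    return commands

commands = create_commands(["2.6", "2.7", "3.3"])
for cmd in commands:
    print(" ".join(cmd))

# Dry run by default: flip RUN to True to actually create the environments.
RUN = False
if RUN and shutil.which("conda"):
    for cmd in commands:
        subprocess.check_call(cmd)
```

Because environments are easy to throw away, a test matrix like this can be rebuilt from scratch rather than repaired.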

Slide 19

Slide 19 text

conda info -e conda info

Slide 20

Slide 20 text

conda install
Uses pip install if conda package is not found!
Default package repositories (configurable)
http://repo.continuum.io/pkgs/dev
  Experimental or developmental versions of packages
http://repo.continuum.io/pkgs/gpl
  GPL licensed packages
http://repo.continuum.io/pkgs/free
  non GPL open source packages

Slide 21

Slide 21 text

conda search

Slide 22

Slide 22 text

conda list
Also includes packages installed via pip!

Slide 23

Slide 23 text

conda update

Slide 24

Slide 24 text

conda remove

Slide 25

Slide 25 text

conda config
# This is a sample .condarc file
# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "defaults" to
# automatically include all default channels.
channels:
  - defaults
  - http://some.custom/channel
# Proxy settings
# http://[username]:[password]@[server]:[port]
proxy_servers:
  http: http://user:[email protected]:8080
  https: https://user:[email protected]:8080
envs_dirs:
  - /opt/anaconda/envs
  - /home/joe/my-envs
pkg_dirs:
  - /home/joe/user-pkg-cache
  - /opt/system/pkgs
changeps1: False
# binstar.org upload (not defined here means ask)
binstar_upload: True

Slide 26

Slide 26 text

conda package -u
  Untracked files
conda package --pkg-name bulk --pkg-version 0.1
  Easy way to install into an environment using anything (pip, make, setup.py, etc.) and then package up all of it into a binary tar-ball deployable via conda install .tar.bz2
pickle for binary code!

Slide 27

Slide 27 text

conda build --build-recipe Building new packages 1) 2) conda build

Slide 28

Slide 28 text

Conda Recipe == A directory
Required
  build.sh      BASH build commands (POSIX)
  bld.bat       CMD build commands (Win)
  meta.yaml     extended yaml declarative meta-data
Optional
  run_test.py   will be executed during test phase
  *.patch       patch-files for the source
  *             any other resources needed by build but not included in sources described in meta.yaml file
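As an illustration of the optional run_test.py, a recipe's test script can be as small as "import the package and exercise one function". The sketch below uses the standard-library json module purely as a stand-in for the package under test; the module name, the list of required attributes, and the round-trip check are placeholders to replace with your own. It assumes the usual convention that an uncaught exception (here an AssertionError) ends the script with a non-zero exit status, which signals a failed test.

```python
import importlib

# Stand-in for the package being built; replace with your package's module.
MODULE = "json"
REQUIRED = ["dumps", "loads"]

def check(module_name, names):
    """Import the module and report which required attributes are missing."""
    module = importlib.import_module(module_name)
    missing = [name for name in names if not hasattr(module, name)]
    return module, missing

module, missing = check(MODULE, REQUIRED)
assert not missing, "missing attributes: %s" % ", ".join(missing)

# One cheap behavioural check: a value survives a round trip.
round_trip = module.loads(module.dumps({"a": 1}))
assert round_trip == {"a": 1}
print("run_test: ok")
```

Tests at this level catch the most common packaging failure, a package that installs but cannot be imported because a file or shared library was left out.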

Slide 29

Slide 29 text

MetaData
package:
  name:            # name of package
  version:         # version of package
about:
  home:            # home-page
  license:         # license
# All optional from here....
source:
  fn:              # filename of source
  url:             # url of source
  md5:             # hash of source
  # or from git:
  git_url:
  git_tag:
  patches:         # list of patches to source
    - fix.patch
build:
  entry_points:    # entry-points (binary commands or scripts)
    - name = module:function
  number:          # defaults to 0
requirements:      # lists of requirements
  build:           # requirements for build (as a list)
  run:             # requirements for running (as a list)
test:
  requires:        # list of requirements for testing
  commands:        # commands to run for testing (entry-points)
  imports:         # modules to import for testing
http://docs.continuum.io/conda/build.html

Slide 30

Slide 30 text

Binstar.org (beta-code --- binstar in beta) Once you have built a conda package, you can share it with the world on binstar.org then conda install

Slide 31

Slide 31 text

Adding Binstar channels
$ conda config --add channels 'http://conda.binstar.org/travis'
$ conda config --add channels 'http://conda.binstar.org/asmeurer'

Slide 32

Slide 32 text

Useful aliases
alias workon='source activate'
alias workoff='source deactivate'
alias virtualenv='conda create -p'

Slide 33

Slide 33 text

Thanks!
• Aaron Meurer (conda and binstar developer)
• Sean Ross-Ross (binstar.org)
• Bryan Van de Ven (original conda author)
• Ilan Schnell (principal conda developer)