Slide 1

Slide 1 text

Effectively using Open Source with conda
Travis E. Oliphant, PhD
Continuum Analytics, Inc.

Slide 2

Slide 2 text

The Opportunity
• Millions of projects that can be used in the enterprise
• Not enough to just adopt once: these projects change rapidly
• Effective use requires a plan for managing updates

Slide 3

Slide 3 text

The Challenge
Separation of concerns leads to granular libraries, often with deep dependencies.

Slide 4

Slide 4 text

The Challenge
• Different "entry-points" (end-user applications or scripts) can have different dependencies. Often many of the dependencies are shared, but a few applications need different versions of some packages.
• Not specific to any particular language or ecosystem. Python, Ruby, Node.js, C/C++, .NET, and Java all have the same problem: how do you manage the software life-cycle effectively?
• Production deployments need stability. IT managers want ease of deployment and testing. Developers want agility and ease of development.

Slide 5

Slide 5 text

The Challenge
How can developers and domain experts in an organization quickly and easily take advantage of the latest software developments, yet still have stable production deployments of complex software?
You cannot take full advantage of the pace of open-source development if you don't address this!

Slide 6

Slide 6 text

Case Study: SciPy
I started SciPy in 1999 while I was in grad school at the Mayo Clinic (it was called Multipack back then).
There was this thing called the Internet: one could make a web page, put code up on it, and people started using it ...
"Facebook for Hackers"

Slide 7

Slide 7 text

Case Study: SciPy
Packaging circa 1999: source tarball and makefile (users had to build).
SciPy is basically a bunch of C/C++/Fortran routines with Python interfaces.
Observation: the popularity of Multipack (early SciPy) grew significantly when Robert Kern made pre-built binaries for Windows.

Slide 8

Slide 8 text

Case Study: SciPy
• The difficulty of producing binaries, the desire to avoid the dependency chain, and the lack of broad packaging solutions led to early SciPy being a "distribution" instead of separate, inter-related libraries.
• There were (and are) too many different projects in SciPy. For reasons of team size and communication dynamics, a project needs 1-5 core contributors.

Slide 9

Slide 9 text

Case Study: NumPy
I started writing NumPy in 2005 while I was teaching at BYU (it was a merger of Numeric and Numarray).
The NumPy ABI has not changed "officially" since 1.0 came out in 2006.
Presumably extension modules (SciPy, scikit-learn, matplotlib, etc.) compiled against NumPy 1.0 will still work on NumPy 1.8.1.
This was not a design goal!!!

Slide 10

Slide 10 text

Case Study: NumPy
This was a point of some contention and community difficulty when date-time was added in version 1.4 (impossible without changing the ABI in some way), and it was not really settled until version 1.7.
The fundamental reason was a user-driven obsession with keeping ABI compatibility.
Windows users lacked a useful packaging solution in the face of the NumPy stack.

Slide 11

Slide 11 text

NumPy Stack (cry for conda...)
NumPy, SciPy, Pandas, Matplotlib, scikit-learn, scikit-image, statsmodels, PyTables, OpenCV, Cython, Numba, SymPy, NumExpr, astropy, BioPython, GDAL, PySAL
... many many more ...

Slide 12

Slide 12 text

Fundamental Principles
• Complex things are built out of simple things
• A fundamental principle of software engineering is "separation of concerns" (modularity)
• Reusability is enhanced when you "do one thing and do it well"
• But to deploy, you need to bring the pieces back together
• This means you need a good packaging system for binary artifacts, with multiple environments

Slide 13

Slide 13 text

Continuum Solutions (Free)
Anaconda: free all-in-one distribution of Python for analytics and visualization
• numpy, scipy, ipython
• matplotlib, bokeh
• pandas, statsmodels, scikit-learn
• many, many more… 100+
Miniconda: Python + conda; with these you can install exactly what you want…
$ conda install anaconda
Conda
• Cross-platform package manager
• Dependency management (uses a SAT solver to resolve all dependencies)
• System-level virtual environments (more flexible than virtualenv)
binstar.org
• Binary repository of packages (public)
• Multiple package types
• Free public build queue
• Current focus on:
  • Python PyPI-compatible packages (source distributions)
  • conda packages (binary distributions)

Slide 14

Slide 14 text

Continuum Solutions (Premium)
binstar.org
• Binary repository for private packages
Premium features:
• hosting of private packages (public packages are free)
• access to priority build queue
• $10 / month (individuals)
  • 25 private packages
  • 5 GB disk space
• $50 / month (organizations)
  • 200 private packages
  • 30 GB disk space
  • right to have private packages in organizations
• $1500 / year
  • unlimited private packages
  • 100 GB of disk space
Anaconda Server
• Internal mirror of public repositories
• Mix private internal packages with public repositories
• Build customized versions of Anaconda installers
• Environment to .exe and .rpm tools
• Comprehensive licensing
• Comprehensive support
• On-premise version of binstar.org

Slide 15

Slide 15 text

System Packaging solutions
Linux: yum (rpm), apt-get (dpkg)
OSX: macports, homebrew
Windows: chocolatey, npackd
Cross-platform: conda
With virtual environments, conda provides a modern, cross-platform, system-level packaging and deployment solution.

Slide 16

Slide 16 text

Conda Features
• Excellent support for "system-level" environments (like having mini VMs, but much lighter weight than docker.io)
• Minimizes code copies (uses hard/soft links if possible)
• Dependency solver using a fast satisfiability solver (SAT solver)
• Simple format: binary tarball + metadata
• Metadata allows static analysis of dependencies
• Easy to create multiple "channels", which are repositories for binary packages
• User installable (no root privileges needed)
• Can still use tools like pip; conda fills in where they fail
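To make the dependency-solving bullet concrete, here is a minimal sketch of the problem a solver has to answer. It is not conda's algorithm: it swaps the SAT solver for a plain brute-force search, and the package index, names, and versions are made up for illustration.

```python
from itertools import product

# Hypothetical index (an assumption for illustration):
# name -> {version: {dependency: set of acceptable versions}}
INDEX = {
    "app": {"1.0": {"scipy": {"0.12"}}},
    "scipy": {"0.12": {"numpy": {"1.7"}}, "0.13": {"numpy": {"1.8"}}},
    "numpy": {"1.7": {}, "1.8": {}},
}


def consistent(choice):
    """Return True if every chosen version's requirements are met by the choice."""
    for name, version in choice.items():
        for dep, allowed in INDEX[name][version].items():
            if choice.get(dep) not in allowed:
                return False
    return True


def solve(index):
    """Try one version per package in every combination; return the first that works."""
    names = sorted(index)
    # Reverse string sort puts higher version strings first (good enough for this toy data).
    options = [sorted(index[name], reverse=True) for name in names]
    for versions in product(*options):
        choice = dict(zip(names, versions))
        if consistent(choice):
            return choice
    return None


solution = solve(INDEX)
print(solution)
```

Because the made-up "app" pins an older scipy, the only consistent answer also picks the older numpy. Brute force grows exponentially with the number of packages, which is the reason a real tool reaches for a SAT solver.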

Slide 17

Slide 17 text

Examples
Set up a test environment
$ conda update conda
$ conda create -n test python pip
$ source activate test
Windows:
$ activate test
Install another package
(test)$ conda install scikit-learn

Slide 18

Slide 18 text

First steps
Create an environment
$ conda create -n py3k python=3.3
$ source activate py3k
Install IPython notebook
(py3k) $ conda install ipython-notebook
All in One
$ conda create -n py3k python=3.3 ipython-notebook
$ source activate py3k

Slide 19

Slide 19 text

Anaconda installation
ROOT_DIR
  The directory that Anaconda was installed into; for example, /opt/Anaconda or C:\Anaconda
/pkgs
  Also referred to as PKGS_DIR. This directory contains exploded packages, ready to be linked in conda environments. Each package resides in a subdirectory corresponding to its canonical name.
/envs
  The system location for additional conda environments to be created.
/bin, /include, /lib, /share
  The default, or root, environment.

Slide 20

Slide 20 text

Look at a conda package: a simple .tar.bz2
http://docs.continuum.io/conda/intro.html

Slide 21

Slide 21 text

Anatomy of unpacked conda package
A conda package is a bzipped tarfile of all the files comprising the package, at the full paths they would be installed to relative to a "system" install or "chroot jail". An environment is just a "union" of these paths.
/lib
/include
/bin
/man
/info
  files
  index.json
All conda packages have this info directory, which contains metadata for tracked files, dependency information, etc.
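The layout above can be reproduced with the standard library. This is a sketch only, assuming that info/files lists one path per line and that index.json holds fields such as name, version, and depends; the "hello" package and the helper names are made up for illustration and are not part of conda.

```python
import io
import json
import tarfile


def build_package(files, index):
    """Return the bytes of a .tar.bz2 holding `files` plus an info/ directory."""
    members = dict(files)
    members["info/index.json"] = json.dumps(index).encode("utf-8")
    # Assumption: info/files is a plain list of tracked paths, one per line.
    members["info/files"] = "\n".join(sorted(files)).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as tar:
        for path, data in sorted(members.items()):
            entry = tarfile.TarInfo(name=path)
            entry.size = len(data)
            tar.addfile(entry, io.BytesIO(data))
    return buffer.getvalue()


def read_index(package_bytes):
    """Read info/index.json back out of the archive without unpacking it to disk."""
    with tarfile.open(fileobj=io.BytesIO(package_bytes), mode="r:bz2") as tar:
        raw = tar.extractfile("info/index.json").read()
    return json.loads(raw.decode("utf-8"))


package = build_package(
    {"bin/hello": b"#!/bin/sh\necho hello\n", "lib/hello.txt": b"data"},
    {"name": "hello", "version": "0.1", "depends": []},
)
metadata = read_index(package)
print(metadata["name"], metadata["version"], metadata["depends"])
```

Reading the metadata straight from the archive is what makes static analysis of dependencies possible: nothing has to be installed or executed to learn what a package needs.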

Slide 22

Slide 22 text

Environments
One honking great idea! Let's do more of those!
Easy to make
Easy to throw away
Uses:
• Testing (python 2.6, 2.7, 3.3)
• Development
• Trying new packages from PyPI
• Separating deployed apps with different dependency needs
• Trying new versions of Python
• Reproducing someone's work
conda create -h

Slide 23

Slide 23 text

Getting System information
Basic info
conda info
Named-environment info
conda info -e
System info
conda info --system
All of the above
conda info --all

Slide 24

Slide 24 text

Installing packages
conda install -n py3k scipy pip
Default package repositories (configurable)
http://repo.continuum.io/pkgs/dev
  Experimental or developmental versions of packages
http://repo.continuum.io/pkgs/gpl
  GPL licensed packages
http://repo.continuum.io/pkgs/free
  Non-GPL open source packages

Slide 25

Slide 25 text

How it works
Channel 1, Channel 2, ..., Channel N each publish metadata.
conda works from the merged metadata.
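The merge step can be pictured as follows. This is a sketch under stated assumptions, not conda's implementation: each channel's metadata is modelled as a dict keyed by package filename, and when two channels offer the same filename the earlier channel wins. The channel URLs and entries are made up.

```python
def merge_channels(channels):
    """Merge per-channel metadata into one index.

    `channels` is an ordered list of (url, {filename: info}) pairs.
    Assumption for this sketch: the first channel listed wins on clashes.
    """
    merged = {}
    for url, packages in channels:
        for filename, info in packages.items():
            if filename not in merged:
                # Remember where each package came from.
                merged[filename] = dict(info, channel=url)
    return merged


# Hypothetical channel contents.
channels = [
    ("http://example.invalid/custom", {
        "sympy-0.7.5-py27_0.tar.bz2": {"name": "sympy", "version": "0.7.5"},
    }),
    ("http://example.invalid/defaults", {
        "sympy-0.7.5-py27_0.tar.bz2": {"name": "sympy", "version": "0.7.5"},
        "numpy-1.8.1-py27_0.tar.bz2": {"name": "numpy", "version": "1.8.1"},
    }),
]

index = merge_channels(channels)
for filename in sorted(index):
    print(filename, index[filename]["channel"])
```

Working from a single merged index means the dependency solver never has to care which channel a package lives in until it is time to download it.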

Slide 26

Slide 26 text

Create channels
Option 1
• Create a directory of conda packages
• Run conda index
• Either use file:///path/to/dir in .condarc or run a simple web server on /path/to/dir
Option 2
Use binstar.org (also available as an on-premise solution with Anaconda Server)

Slide 27

Slide 27 text

Binstar.org channels (request invite)
conda install -c
will install from a binstar channel,
or you can add the channel to your config file.
Free for public packages.

Slide 28

Slide 28 text

List Installed packages
conda list also includes packages installed via pip!
conda create -n py3k scipy pip
source activate py3k
pip install pint
$ conda list
Output
# packages in environment at /Users/travis/anaconda/envs/py3k:
#
numpy        1.8.1     py27_0
openssl      1.0.1g    0
pint         0.4.2
pip          1.5.4     py27_0
python       2.7.6     1
readline     6.2       2
scipy        0.13.3    np18py27_0
setuptools   3.1       py27_0
sqlite       3.7.13    1
tk           8.5.13    1
wsgiref      0.1.2
zlib         1.2.7     1

Slide 29

Slide 29 text

Update a package to latest
conda update pandas
  Get the latest pandas from the channels you are subscribed to.
conda update anaconda
  Change to the latest released Anaconda, including its specific dependencies. This can downgrade packages if they are newer than those in the "released" Anaconda.
conda update --all
  To update all the packages in an environment to the latest versions, use the --all option.

Slide 30

Slide 30 text

Search for a package
conda search typo
  Find packages and the channels they are in
typogrify   * 2.0.0     py27_0   http://conda.binstar.org/travis/osx-64/
              2.0.0     py33_1   http://conda.binstar.org/asmeurer/osx-64/
              2.0.0     py26_1   http://conda.binstar.org/asmeurer/osx-64/
conda search --outdated sympy
  Only show packages matching regex that are installed but outdated
sympy         0.7.1     py27_0   defaults
              0.7.4     py26_0   defaults
              0.7.4.1   py33_0   defaults
            * 0.7.4.1   py27_0   defaults
              0.7.4.1   py26_0   defaults
              0.7.5     py34_0   defaults
              0.7.5     py33_0   defaults

Slide 31

Slide 31 text

Removing files and environments
Removing Packages
conda remove -n py3k scipy matplotlib
Removing Environment
conda remove -n py3k --all
Note: packages are just "unlinked" from the environment. All the files are still available unpacked in a package cache.
Removing unused packages
conda clean -t   Remove unused tarballs
conda clean -p   Remove unused directories
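Why removing a package leaves the cache intact follows from how links work. The sketch below is not conda's code; it only assumes the general idea of a package cache plus hard links into an environment, with a copy as fallback where links are unavailable. All paths live in a throwaway temporary directory and the file names are made up.

```python
import os
import shutil
import tempfile


def link_into_env(cache_path, env_path):
    """Hard-link a cached file into an environment, falling back to a copy."""
    os.makedirs(os.path.dirname(env_path), exist_ok=True)
    try:
        os.link(cache_path, env_path)
        return "hard-link"
    except OSError:
        # Some filesystems or platforms do not allow hard links.
        shutil.copy2(cache_path, env_path)
        return "copy"


root = tempfile.mkdtemp()

# A file sitting unpacked in a made-up package cache.
cached = os.path.join(root, "pkgs", "demo-0.1-0", "lib", "demo.txt")
os.makedirs(os.path.dirname(cached))
with open(cached, "w") as handle:
    handle.write("payload")

# "Install": link the cached file into an environment.
installed = os.path.join(root, "envs", "py3k", "lib", "demo.txt")
method = link_into_env(cached, installed)
installed_ok = os.path.exists(installed)

# "Remove": unlink from the environment; the cached copy is untouched.
os.remove(installed)
still_cached = os.path.exists(cached)

print(method, installed_ok, still_cached)
shutil.rmtree(root)
```

With hard links the same bytes on disk serve every environment that uses the package, which is why creating and throwing away environments is cheap.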

Slide 32

Slide 32 text

Untracked Files
conda package -u
conda package --pkg-name bulk --pkg-version 0.1
An easy way to install into an environment using anything (pip, make, setup.py, etc.) and then package up all of it into a binary tarball deployable via conda install .tar.bz2
pickle for binary code!

Slide 33

Slide 33 text

Conda configuration
# This is a sample .condarc file

# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "defaults" to
# automatically include all default channels.

channels:
  - defaults
  - http://some.custom/channel

# Proxy settings
# http://[username]:[password]@[server]:[port]
proxy_servers:
  http: http://user:pass@corp.com:8080
  https: https://user:pass@corp.com:8080

envs_dirs:
  - /opt/anaconda/envs
  - /home/joe/my-envs

pkg_dirs:
  - /home/joe/user-pkg-cache
  - /opt/system/pkgs

changeps1: False

# binstar.org upload (not defined here means ask)
binstar_upload: True

Scripting interface
conda config --add KEY VALUE
conda config --remove-key KEY
conda config --get KEY
conda config --set KEY BOOL
conda config --remove KEY VALUE

Slide 34

Slide 34 text

Building new packages
conda install conda-build
Option 1
conda skeleton pypi
conda build
Option 2
conda pipbuild

Slide 35

Slide 35 text

Conda Recipe is a directory
Required
build.sh       BASH build commands (POSIX)
bld.bat        CMD build commands (Win)
meta.yaml      extended YAML declarative metadata
Optional
run_test.py    will be executed during the test phase
*.patch        patch files for the source
*              any other resources needed by the build but not included in the sources described in the meta.yaml file

Slide 36

Slide 36 text

Recipe MetaData
package:
  name:            # name of package
  version:         # version of package
about:
  home:            # home-page
  license:         # license

# All optional from here....
source:
  fn:              # filename of source
  url:             # url of source
  md5:             # hash of source
  # or from git:
  git_url:
  git_tag:
  patches:         # list of patches to source
    - fix.patch
build:
  entry_points:    # entry-points (binary commands or scripts)
    - name = module:function
  number:          # defaults to 0
requirements:      # lists of requirements
  build:           # requirements for build (as a list)
  run:             # requirements for running (as a list)
test:
  requires:        # list of requirements for testing
  commands:        # commands to run for testing (entry-points)
  imports:         # modules to import for testing
http://docs.continuum.io/conda/build.html

Slide 37

Slide 37 text

Converting to another platform
Conda packages are specific to a particular platform. However, if there are no platform-specific binary files in a package, it can be converted automatically to a package that can be installed on another platform.
Example
conda convert --output-dir win32 --platform win-32
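One rough way to guess whether a package contains platform-specific binaries is to scan its file list for compiled-artifact suffixes. This is a heuristic sketch only: the suffix list and function names are assumptions for illustration, it misses cases such as versioned shared libraries, and it is not the check that conda convert itself performs.

```python
# Assumed list of suffixes that usually indicate compiled, platform-specific files.
BINARY_SUFFIXES = (".so", ".dylib", ".dll", ".pyd", ".exe", ".a", ".lib")


def platform_specific_files(paths):
    """Return the paths whose suffix suggests a compiled artifact."""
    return [path for path in paths if path.lower().endswith(BINARY_SUFFIXES)]


def looks_convertible(paths):
    """Guess that a package is pure (convertible) if no binary-looking file is found."""
    return not platform_specific_files(paths)


# Hypothetical file lists for two packages.
pure_package = [
    "lib/python2.7/site-packages/pint/__init__.py",
    "lib/python2.7/site-packages/pint/unit.py",
]
compiled_package = [
    "lib/python2.7/site-packages/numpy/__init__.py",
    "lib/python2.7/site-packages/numpy/core/multiarray.so",
]

print(looks_convertible(pure_package))
print(looks_convertible(compiled_package))
print(platform_specific_files(compiled_package))
```

The file lists would come from a package's own metadata, so a check like this can run without unpacking or installing anything.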

Slide 38

Slide 38 text

Binstar.org (request invite)
Once you have built a conda package, you can share it with the world on binstar.org
conda install -c
Free for public packages.

Slide 39

Slide 39 text

Binstar
Adding channels
$ conda config --add channels 'http://conda.binstar.org/travis'
$ conda config --add channels 'http://conda.binstar.org/asmeurer'
Uploading packages
binstar upload /full/path/to/package.tar.bz2
binstar register /full/path/to/package.tar.bz2   (if the package has never been uploaded before)

Slide 40

Slide 40 text

Binstar Package Types
Permissions   Description
Private       Only people given permission can see this package.
Personal      Everyone will be able to see this package in your user repository.
Publish       This package will be published in the global public repository.

Slide 41

Slide 41 text

Useful aliases
workon='source activate'
workoff='source deactivate'

Slide 42

Slide 42 text

• Cross-platform tested and supported Python distribution
• Enterprise Python deployment
• Private, secure on-premise package repository
• Comprehensive licensing
• Customized installers and mirrors
• Additional products
• Enhanced support
• Optional, on-premise binstar.org

Slide 43

Slide 43 text

Thanks!
Aaron Meurer (conda and binstar developer)
Sean Ross-Ross (principal binstar.org)
Bryan Van de Ven (original conda author)
Ilan Schnell (principal conda developer)