Snakes in a Buildpack: Deploying Data Science in Cloud Foundry

Ian Huston
September 27, 2016

Ian Huston & James Wen, Pivotal

Presented first at CF Summit Frankfurt 2016

The Cloud Foundry Buildpacks team set out on a journey in February to go where no official buildpack has gone before: Data Science. Working with those most elusive creatures, data scientists, we toiled in the deepest darkest crevices of our stacks and buildpacks to find a way to easily support the PyData ecosystem. We discovered a package manager called Conda and migrated it to the buildpacks’ zoo, where it now lives in glorious captivity.

Join us as we share our story with you! From idea inception to experiment after experiment after experiment after… you get the idea. We’ll share our initial working hypothesis and subsequent surprising discoveries, dive into the innards of buildpacks and stacks, and detail what it was like working with the Pivotal Data Science team as our first customer. Finally, we’ll `cf push` a Python data science app with no vendored dependencies!

About Ian Huston
Ian Huston is a Senior Data Scientist at Pivotal Labs. Before joining Pivotal, Ian used Python to create and destroy baby universes inside numerical simulations of the inflationary phase of cosmology.

About James Wen
James Wen is the Team Lead (Anchor) of the Cloud Foundry Buildpacks team at Pivotal in New York City. He currently maintains and works on the Cloud Foundry system buildpacks, buildpack tooling, stacks, and the extensive automation behind it all. In his free time, James serves as a core maintainer and contributor to Bundler and also loves to rock climb, whether on plastic or real rock.

  1. Other Attempted Solutions
     Kenneth Reitz Buildpack
     • https://github.com/kennethreitz/conda-buildpack
     • Looks unmaintained, Heroku-specific, didn’t work with CF
     Continuum’s Conda buildpack
     • https://github.com/conda/conda-buildpack
     • No updates for 2 years, Heroku-specific, would need downstream changes
     Just use Pip!
     • https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
     Docker?
     • Don’t want responsibility of maintaining OS layer!
  2. Containers vs Buildpacks
     [Diagram: layer stacks for two deployment models. Legend: System Provides / Dev Provides]
     Container (e.g. Docker): application layer, runtime layer, OS image; system brings fixed host OS kernel
     Buildpack app container: application layer, runtime layer*, OS image; system brings fixed host OS kernel
     * Devs may bring a custom buildpack
  3. Experiments
     - Use miniconda in the CF python buildpack
     - Vendor anaconda
     - Add data science dependencies to the rootfs
     Solution: Port over Ian’s conda buildpack and maintain it as a code path in the current python buildpack.
  4. Improvement/Fixes needed after
     - Lock miniconda to a version instead of downloading latest
     - Miniconda was always installing python 3
     - Miniconda needed to save and load from natural app cache
     - Suppress miniconda progress bar output (massive staging logs)
  5. What can we do now?
     - Predict time to delivery for an international courier company
     - Order the right components for a European car manufacturer
     - Deliver warnings of dangerous road conditions while you drive
  6. Hobbyist: OMSCS Machine Learning Course
     - Taking a Machine Learning class for the GTech OMSCS program
     - HW Assignment = Applying 5 supervised machine learning models to 2 large data sets
  7. Hobbyist: OMSCS Machine Learning Course
     - For the experiments/analyses, using Pivotal Web Services > running on my dinky 4-year-old MacBook
     - Ran the experiments by wrapping them in minimal Flask apps
  8. Future of Conda in the Python Buildpack
     Your Feedback?
     - Can’t run this functionality in air-gapped environment
     - Conda cannot vendor your packages (i.e. something like `bundle package`)
     - Reduce size of end droplet?
     - Apps using conda often end up large due to slew of dependencies
     Vendor miniconda (cached buildpack)
  9. Where can I get it?
     Official CF Python Buildpack from v1.5.6: https://github.com/cloudfoundry/python-buildpack
  10. Getting your wonderful ideas into Cloud Foundry
     - Open up discussion on cf-dev mailing list
     - Open up discussion on open source Cloud Foundry slack
     - Do the CF Dojo program and gain the skills to work on CF full-time (or just join Pivotal)
     - Create a feature narrative and get feedback
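
Slide 7 mentions running experiments by wrapping them in minimal Flask apps. The slides do not show that code, so the following is only a rough sketch of what such a wrapper might look like. The names `POINTS` and `run_experiment`, and the toy 1-nearest-neighbour check, are hypothetical stand-ins for a real assignment's data sets and models. Reading the port from the `PORT` environment variable follows the usual Cloud Foundry convention; the fallback of 8080 is an arbitrary choice for local runs.

```python
import os

from flask import Flask, jsonify

app = Flask(__name__)

# Toy stand-in data: (feature, label) pairs. A real experiment would
# load its own data sets and use a proper machine learning library.
POINTS = [(0.0, 0), (0.2, 0), (0.4, 0), (1.0, 1), (1.2, 1), (1.4, 1)]


def run_experiment():
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on POINTS."""
    correct = 0
    for i, (x, label) in enumerate(POINTS):
        # Hold out point i and classify it by its nearest remaining neighbour.
        others = POINTS[:i] + POINTS[i + 1:]
        nearest = min(others, key=lambda p: abs(p[0] - x))
        correct += int(nearest[1] == label)
    return {"model": "1-nn", "n": len(POINTS), "accuracy": correct / len(POINTS)}


@app.route("/")
def index():
    # Each request runs the experiment and returns the result as JSON.
    return jsonify(run_experiment())


if __name__ == "__main__":
    # Bind to all interfaces on the platform-assigned port.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

Wrapping the work in a web endpoint means the experiment runs on the platform's resources rather than a laptop, and the result can be fetched with a plain HTTP request. To stage such an app through the conda code path, its dependencies would be declared in a conda environment file rather than vendored; the exact file layout the buildpack expects is best checked against the buildpack's own documentation.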