Snakes in a Buildpack: Deploying Data Science in Cloud Foundry

Ian Huston
September 27, 2016

Ian Huston & James Wen, Pivotal

Presented first at CF Summit Frankfurt 2016

The Cloud Foundry Buildpacks team set out on a journey in February to go where no official buildpack has gone before: Data Science. Working with those most elusive creatures, data scientists, we toiled in the deepest darkest crevices of our stacks and buildpacks to find a way to easily support the PyData ecosystem. We discovered a package manager called Conda and migrated it to the buildpacks’ zoo, where it now lives in glorious captivity.

Join us as we share our story with you! From idea inception to experiment after experiment after experiment after… you get the idea. We’ll share our initial working hypothesis and subsequent surprising discoveries, dive into the innards of buildpacks and stacks, and detail what it was like working with the Pivotal Data Science team as our first customer. Finally, we’ll `cf push` a Python data science app with no vendored dependencies!

About Ian Huston
Ian Huston is a Senior Data Scientist at Pivotal Labs. Before joining Pivotal, Ian used Python to create and destroy baby universes inside numerical simulations of the inflationary phase of cosmology.

About James Wen
James Wen is the Team Lead (Anchor) of the Cloud Foundry Buildpacks team at Pivotal in New York City. He currently maintains and works on the Cloud Foundry system buildpacks, buildpack tooling, stacks, and the extensive automation behind it all. In his free time, James serves as a core maintainer and contributor to Bundler and also loves to rock climb, whether on plastic or real rock.

Transcript

  1. Snakes in a Buildpack
    By Ian Huston and James Wen

  2. Ian has a problem
    Hard to deploy apps using great Python data science packages

  3. Why is this important?
    #productsNOTpowerpoints

  4. Current Solution
    Community Buildpack:
    https://github.com/ihuston/python-conda-buildpack
    Not tested
    No CI pipeline
    No time for maintenance

  5. Other Attempted Solutions
    Kenneth Reitz Buildpack
    ● https://github.com/kennethreitz/conda-buildpack
    ● Looks unmaintained, Heroku-specific, didn’t work with CF
    Continuum’s Conda buildpack
    ● https://github.com/conda/conda-buildpack
    ● No updates for 2 years, Heroku-specific, would need downstream changes
    Just use Pip!
    ● https://jakevdp.github.io/blog/2016/08/25/conda-myths-and-misconceptions/
    Docker?
    ● Don’t want responsibility of maintaining OS layer!

  6. Containers vs Buildpacks
    [Diagram: two app container stacks compared, with a legend marking each layer as "System Provides" or "Dev Provides"]
    Container (e.g. Docker): application layer, runtime layer, OS image; system brings fixed host OS kernel
    Buildpack: application layer, runtime layer*, OS image; system brings fixed host OS kernel
    * Devs may bring a custom buildpack

  7. Experiments
    - Use miniconda in the CF python buildpack
    - Vendor anaconda
    - Add data science dependencies to the rootfs
    Solution: Port over Ian’s conda buildpack and maintain it as a code
    path in the current python buildpack.
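Keeping conda as one code path inside the Python buildpack implies a branch at staging time. A minimal sketch of such a branch follows, assuming the conda path is selected by an `environment.yml` file at the app root. The function name `choose_code_path` is hypothetical, and the buildpack's own staging scripts may look quite different.

```python
import os


def choose_code_path(build_dir):
    """Pick the dependency-install path for an app directory.

    Assumption for illustration: an environment.yml at the app root
    selects the conda path; otherwise the regular pip path is used.
    """
    if os.path.isfile(os.path.join(build_dir, "environment.yml")):
        return "conda"
    return "pip"
```

With a branch like this, apps without an `environment.yml` would keep going through the existing pip path unchanged.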

  8. Improvement/Fixes needed after
    Lock miniconda to a version instead of downloading latest
    Miniconda was always installing python 3
    Miniconda needed to save and load from natural app cache
    Suppress miniconda progress bar output (massive staging logs)
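The four fixes above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the buildpack's actual staging code: the version number `4.1.11` is only an example pin, and the helper names `installer_name`, `locate_installer` and `staging_commands` are hypothetical. The installer flags `-b` (batch mode) and `-p` (install prefix) and conda's `-q` (quiet) flag are existing options.

```python
import os

# Example pin for illustration only; the buildpack's real pin may differ.
MINICONDA_VERSION = "4.1.11"


def installer_name(python_major, version=MINICONDA_VERSION):
    """Return a version-pinned installer name instead of a '-latest-' one.

    Choosing Miniconda2 or Miniconda3 from the requested major version
    avoids always installing Python 3.
    """
    if python_major not in (2, 3):
        raise ValueError("python_major must be 2 or 3")
    return "Miniconda{}-{}-Linux-x86_64.sh".format(python_major, version)


def locate_installer(cache_dir, python_major):
    """Return (path, needs_download) for the pinned installer in the app cache."""
    path = os.path.join(cache_dir, installer_name(python_major))
    return path, not os.path.exists(path)


def staging_commands(installer_path, prefix, env_file="environment.yml"):
    """Build the install and environment-update commands as argument lists."""
    conda = os.path.join(prefix, "bin", "conda")
    return [
        # -b: non-interactive batch install, -p: where to install
        ["bash", installer_path, "-b", "-p", prefix],
        # -q: suppress progress bars so staging logs stay short
        [conda, "env", "update", "-q", "-f", env_file],
    ]
```

A staging script would download the installer only when `locate_installer` reports it is missing from the cache, then run the two commands in order.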

  9. What can we do now?
    Predict time to delivery for an international courier company
    Order the right components for a European car manufacturer
    Deliver warnings of dangerous road conditions while you drive

  10. Hobbyist: OMSCS Machine Learning Course
    - Taking a Machine Learning class for the GTech OMSCS program
    - HW Assignment = Applying 5 supervised machine learning models to 2 large data sets

  11. Hobbyist: OMSCS Machine Learning Course
    - For the experiments/analyses, using Pivotal Web Services > running on my dinky 4-year-old MacBook
    - Ran the experiments by wrapping them in minimal Flask apps
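Wrapping an experiment in a tiny web app might look like the sketch below. To stay dependency-free, a plain WSGI callable from the standard library stands in for Flask here; a Flask route would play the same role. The `run_experiment` function is a hypothetical placeholder for a real model-training run, and its data is made up for illustration.

```python
import json
import os
from wsgiref.simple_server import make_server


def run_experiment():
    """Hypothetical placeholder for a model-training run; returns summary metrics."""
    data = [1.0, 2.0, 3.0, 4.0]
    mean = sum(data) / len(data)
    return {"n_samples": len(data), "mean": mean}


def app(environ, start_response):
    """WSGI callable: run the experiment on each request and return JSON."""
    body = json.dumps(run_experiment()).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def main():
    # Cloud Foundry tells the app which port to listen on via PORT.
    port = int(os.environ.get("PORT", "8080"))
    make_server("0.0.0.0", port, app).serve_forever()
```

Calling `main()` as the app's start command serves the result over HTTP, so the heavy computation runs on the platform rather than on a laptop.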

  12. Future of Conda in the Python Buildpack
    Your Feedback?
    - Can’t run this functionality in air-gapped environment
    - Conda cannot vendor your packages (i.e. something like `bundle package`)
    - Reduce size of end droplet?
    - Apps using conda often end up large due to slew of dependencies
    Vendor miniconda (cached buildpack)

  13. Where can I get it?
    Official CF Python Buildpack from v1.5.6:
    https://github.com/cloudfoundry/python-buildpack

  14. Getting your wonderful ideas into Cloud Foundry
    - Open up discussion on cf-dev mailing list
    - Open up discussion on open source Cloud Foundry Slack
    - Do the CF Dojo program and gain the skills to work on CF full-time (or just join Pivotal)
    - Create a feature narrative and get feedback

  15. Any Questions?
    James Wen @rochesterinnyc
    Ian Huston @ianhuston
    https://github.com/cloudfoundry/python-buildpack
