PyData London 2015 - Getting Started with Cloud Foundry for Data Science

Tutorial given at PyData London 2015

Cloud Foundry is an open source Platform-as-a-Service that makes it easy to deliver data-driven applications. In this tutorial we will learn how to push an application to a real CF instance, how to connect to managed data services like Redis, and how to deploy PyData and R projects using community buildpacks.

Ian Huston

June 19, 2015

Transcript

  1. Getting Started with Cloud Foundry

    Ian Huston, Data Scientist
    PyData London 2015
  2. Who am I?

    • Ian Huston
    • @ianhuston
    • www.ianhuston.net
    • Talk resources: https://github.com/ihuston/python-cf-examples
    • Data Scientist
    • Use PyData stack for predictive analytics and machine learning
    • Previously a theoretical physicist using Python for numerical simulations & HPC
    © Copyright 2015 Pivotal. All rights reserved.
  3. Who are Pivotal?

    OPEN DATA PLATFORM
    Pivotal Big Data Suite
  4. Aims for this tutorial

    Goals:
    • Understand how to use CF as a developer/data scientist
    • Understand how to push PyData apps and bind to services
    Anti-goals:
    • Understand CF from an operations viewpoint
    • Install & run a full Cloud Foundry installation
  5. Plan

    • What is Cloud Foundry?
    • Getting started with a hosted CF instance
    • Pushing your first app
    • What are buildpacks?
    • Using PyData packages
    • Using data services like Redis
    • Putting it all together
  6. Important Resources

    Clone this repo: https://github.com/ihuston/python-cf-examples
    OR download and unpack this file: http://tinyurl.com/cf-pydata
  7. What is Cloud Foundry?

    http://cloudfoundry.org
    Open Source Multi-Cloud Platform
    Simple App Deployment, Scaling & Availability
  8. Cloud Applications Haiku

    Here is my source code
    Run it on the cloud for me
    I do not care how.
    - Onsi Fakhouri @onsijoe
  9. How can data scientists use CF?

  10. Data Services Easy control of incoming data

  11. Distributed computation

  12. Data Driven Applications

  13. Multi-cloud

    • Same applications running across different cloud providers: Private, Public, Hosted
  14. Cloud Foundry Foundation

    PLATINUM, GOLD, SILVER
    See relative contributions at http://dashboard.cloudfoundry.org/
  15. Hosted options

    • IBM Bluemix
    • Anynines
    • HP Helion Development Platform
    • ActiveState Stackato
    • CenturyLink AppFog
  16. Getting Started on Pivotal Web Services

    • Go to http://run.pivotal.io
    • Trial lasts for 60 days, no credit card required
    • Click ‘Sign Up’ and enter your email
    • Open confirmation email and verify registration
    • Complete SMS verification check
    • Choose an organisation name (your name/handle is fine)
  17. Downloading the Command Line Interface

    • Register at http://run.pivotal.io
    • In the CF management website click ‘Tools’ and download the appropriate package
    • Or go to https://github.com/cloudfoundry/cli
    • Or use Homebrew on OS X:
        $ brew tap pivotal/tap
        $ brew install cloudfoundry-cli
    • Provides the cf command, your interface to Cloud Foundry
  18. cf push

    Challenge 1: Pushing your first app
  19. Logging in to PWS

    • Choose PWS API endpoint:
        $ cf api https://api.run.pivotal.io
    • Login:
        $ cf login
    • Enter your email address and password
    • You should see output like:
        API endpoint:   https://api.run.pivotal.io (API version: 2.28.0)
        User:           YOUR_USER_NAME
        Org:            YOUR_ORG_NAME
        Space:          development
  20. Challenge 1: Pushing your first application

    • Choose PWS API endpoint:
        $ cf api https://api.run.pivotal.io
    • Login:
        $ cf login
    • Code directory: 01-simple-python-app
    • Deploy with
        $ cf push
    • Check it worked at http://myapp-RANDOM-WORDS.cfapps.io
    • Turn off your app when finished with cf stop myapp (but not yet!)
  21. Simple Flask App Demo

    • Simple one page “Hello World” web app
    • Video: https://www.youtube.com/watch?v=QOfD6tnoAB8
    • Demonstrates:
        – Installation of requirements
        – Scaling properties
    • Need to provide:
        – App files
        – Dependencies listed in requirements.txt file
        – Optional manifest.yml file with configuration for deployment
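The demo app itself is not reproduced in this transcript. As a rough stand-in, the sketch below serves a one page "Hello World" response using only the standard library `wsgiref` module rather than Flask. It assumes the platform passes the listening port in a `PORT` environment variable; the function names and the fallback port 8080 are illustrative choices, not taken from the talk.

```python
import os
from wsgiref.simple_server import make_server


def get_port(environ=os.environ, default=8080):
    """Return the port to listen on (assumed to arrive via PORT)."""
    return int(environ.get("PORT", default))


def app(environ, start_response):
    """Minimal WSGI 'Hello World' page, standing in for the Flask app."""
    body = b"Hello World from Cloud Foundry!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def main():
    # Bind to all interfaces so the platform router can reach the app.
    server = make_server("0.0.0.0", get_port(), app)
    server.serve_forever()
```

Calling `main()` at the bottom of the file would start the server. Reading the port from the environment instead of hard-coding it is what lets the platform run several instances of the same code side by side.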
  22. What just happened?

    [Diagram labels: Source Code, $ cf push, Cloud Controller, CF Router, Instances, Browser (send request to URL)]
    1. Upload code
    2. Set up domain
    3. Install Python & dependencies
    4. Copy app into containerised instances
    5. Start app and accept connections; load balance between instances
  23. What just happened?

    1. Application code is uploaded to CF
    2. Domain URL is set up ready for routing
    3. Cloud controller builds application in container:
        – Python interpreter selected
        – Dependencies installed with pip
    4. Container is replicated to provide instances
    5. App starts and Router load balances requests
    • See what’s happening using logs:
        $ cf logs myapp --recent
  24. Python on Cloud Foundry

    • First class language (with Go, Java, Ruby, Node.js, PHP)
    • Automatic app type detection
        – Looks for requirements.txt or setup.py
    • Buildpack takes care of
        – Detecting that a Python app is being pushed
        – Installing Python interpreter
        – Installing packages in requirements.txt using pip
        – Starting web app as requested (e.g. python myapp.py)
  25. Scaling memory and instances

    • We can scale our application as needed in terms of memory and number of instances:
        $ cf scale myapp -i 5
        $ cf scale myapp -m 256M
    • Check app in browser to see different ports being used.
    • Scale back down with
        $ cf scale myapp -i 1
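To make the effect of scaling visible, an app can report which instance answered a request. The sketch below is one way this might be done, assuming the platform exposes instance metadata as JSON in a `VCAP_APPLICATION` environment variable with an `instance_index` field and the listening port in `PORT`. The function name is hypothetical and the field name should be checked against the platform documentation.

```python
import json
import os


def describe_instance(environ=os.environ):
    """Summarise which app instance is handling the current request."""
    # VCAP_APPLICATION is assumed to hold JSON metadata about this instance.
    meta = json.loads(environ.get("VCAP_APPLICATION", "{}"))
    index = meta.get("instance_index", "unknown")
    port = environ.get("PORT", "unknown")
    return "instance {} listening on port {}".format(index, port)
```

Refreshing a page that shows this string after scaling up should display changing values as requests are load balanced across instances.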
  26. How does this work?

    • Containerised application is cached and ready to be deployed.
    • Scaling number of instances replicates container and load balances requests across all instances.
    • Scaling memory requires restarting app.
    • Auto-scaling is also possible.
  27. Buildpacks

    Challenge 2: Pushing PyData apps
  28. What are buildpacks?

    • Idea and format from Heroku
    • Responsible for doing whatever is necessary to get your app running.
    • Buildpacks take care of
        – Detecting which type of application is being pushed
        – Installing the appropriate run-time
        – Installing required dependencies or other artifacts
        – Starting the application as requested
    • Official buildpacks for Python, Java, Node.js, Go, Ruby, PHP & for static websites and running binaries
  29. Containers vs Buildpacks

    [Diagram comparing a Container (e.g. Docker) with a Buildpack app container. Each stack shows an application layer, a runtime layer and an OS image on top of a fixed host OS kernel brought by the system; a legend marks layers as “System Provides” or “Dev Provides”.]
    * Devs may bring a custom buildpack (footnote on the buildpack runtime layer)
  30. Custom buildpacks

    • Instead of using an official buildpack you can use any custom buildpack installed on your CF or available on GitHub.
    • Only 3 shell scripts needed:
        – detect
        – compile
        – release
    • Specify buildpack with -b or in manifest.yml
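The slides describe these as shell scripts. Purely to illustrate the logic of the detect step, here is a Python sketch that applies the rule mentioned earlier for Python apps (look for requirements.txt or setup.py). The convention that an exit status of 0 means "this buildpack applies" and that the script prints a name is assumed here; the function names are hypothetical.

```python
import os
import sys


def detect(build_dir):
    """Return True if build_dir looks like a Python app."""
    markers = ("requirements.txt", "setup.py")
    return any(os.path.isfile(os.path.join(build_dir, m)) for m in markers)


def main(argv):
    # Assumed convention: exit status 0 means "this buildpack applies".
    build_dir = argv[1] if len(argv) > 1 else "."
    if detect(build_dir):
        print("Python")
        return 0
    return 1
```

A real detect script would end with something like `sys.exit(main(sys.argv))`; the compile and release steps are not sketched here.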
  31. Community Buildpacks

    https://github.com/cloudfoundry-community/cf-docs-contrib/wiki/Buildpacks
    • Lots of languages: Clojure, Haskell, .NET, Erlang, Elixir, etc.
    • RShiny app buildpack: https://github.com/alexkago/cf-buildpack-r
    • Can also use some Heroku buildpacks without modification.
  32. Official Python buildpack

    ✓ Great for simple pip based requirements
    ✓ Well tested and officially maintained
    ✓ Covers both Python 2 and 3
    ✗ Suffers from the Python Packaging Problem:
        - Hard to build packages with C, C++ or Fortran extensions
        - Complicated local configuration of libraries and paths needed
        - Takes a long time to build main PyData packages from source
  33. Using conda for package management

    • http://conda.pydata.org
    • Benefits:
        – Uses precompiled binary packages
        – No fiddling with Fortran or C compilers and library paths
        – Known good combinations of main package versions
        – Really simple environment management (better than virtualenv)
        – Easy to run Python 2 and 3 side-by-side
    Go try it out if you haven’t already!
  34. How to use the conda buildpack

    https://github.com/ihuston/python-conda-buildpack
    • Specify as a custom buildpack when pushing app with manifest or -b command line option.
    • Export your current environment to an environment.yml file
    • Or write requirements.txt (pip) and conda_requirements.txt
    • Send me feedback & pull requests!
  35. Challenge 2: Pushing a PyData app

    • Code directory: 02-pydata-spyre-app
    • Spyre – Adam Hajari https://github.com/adamhajari/spyre
    • This app uses the PyData buildpack to install Matplotlib, NumPy and more.
    • Spyre provides a simple way to build interactive web based visualisations similar to RShiny.
  36. Using Services

    Challenge 3: Using Redis in an app
  37. Services

    These are available on Pivotal CF (Pivotal’s packaged Cloud Foundry offering). See http://run.pivotal.io for the services available on Pivotal Web Services.
  38. Cloud Native Applications

    • Suitable for deployment on cloud platforms
    • Can scale up easily without changes
    • Follow these rules: http://12factor.net
    • In a nutshell:
        – Apps are stateless processes
        – Easy to create and destroy apps with no side-effects
        – Persistent state handled through backing services
        – Interact with services through port binding
  39. Challenge 3: Binding a service to an app

    • Create a free Redis service:
        $ cf cs rediscloud 30mb myredis
    • Bind to our earlier app:
        $ cf bind-service myapp myredis
    • See how apps find credentials in environment variables:
        $ cf env myapp
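Inside the app, the same credentials can be read from the environment at start-up. The sketch below assumes they arrive as JSON in a `VCAP_SERVICES` variable, laid out as a mapping from service label to a list of bound instances, each with a `name` and a `credentials` dictionary. The function name is hypothetical, and the exact credential keys depend on the service provider, so inspect the `cf env myapp` output before relying on any of them.

```python
import json
import os


def find_credentials(service_name, environ=os.environ):
    """Return the credentials dict for a bound service instance, or None."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    # Assumed layout: {label: [{"name": ..., "credentials": {...}}, ...]}
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    return None
```

Looking the instance up by the name chosen at creation time (`myredis` above) rather than by label keeps the app code the same if the service is later swapped for another provider.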
  40. Challenge 3: Using Services in an app

    • Code directory: 03-services-redis
    • This app binds to a Redis service if available and counts the number of hits on the app homepage.
    • Add keys and values by adding /keyname/value to URL.
    • Data is persisted in Redis, so it will survive deleting and restarting the app.
    • Multiple instances all access the same service.
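The behaviour described above can be sketched without a running Redis by coding against a small store interface. `InMemoryStore` below is a hypothetical stand-in that mimics the `incr`, `set` and `get` methods of a redis-py client; the key name `hits` and the helper names are illustrative and not taken from the workshop code.

```python
class InMemoryStore:
    """Stand-in exposing the few redis-py style methods used below."""

    def __init__(self):
        self._data = {}

    def incr(self, key):
        self._data[key] = int(self._data.get(key, 0)) + 1
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def record_hit(store):
    """Count one homepage visit and return the running total."""
    return store.incr("hits")


def put_value(store, path):
    """Store a pair taken from a URL path of the form /keyname/value."""
    key, value = path.strip("/").split("/", 1)
    store.set(key, value)
    return key, value
```

With a real Redis client passed in as `store`, the counter lives in the service rather than in the process, which is why every instance sees the same total and the count outlasts a restart. Note that a real client typically returns stored values as bytes unless configured otherwise.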
  41. User Provided Services

    How to add User Provided Services:
        $ cf cups SERVICE_INSTANCE -p "host, port, username, password"
    Data Service: standalone Hadoop or Apache Spark cluster, Big Data System, RDBMS etc.
    [Diagram: one Data Service box and six App boxes]
  42. Putting it all together

    Challenge 4: Build your own prediction API
  43. Prediction API Architecture

    [Diagram labels: REST API, Data Ingest, Model, Create Model, Redis]
    • Send data as JSON
    • Save training data
    • Kicking off periodic retraining
    • Save model object
    • Send JSON data without label
    • Receive prediction from trained model instance
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/alexkago/ds-cfpylearning
  44. Final challenge: Build your own predictive API

    • Code directory: 04-learning-api
    • Simple Flask + scikit-learn based machine learning API
    • Push the application and go to the app URL to see instructions on how to use.
    • Model is built in scikit-learn and persisted in Redis
    • Simplified version of this project by Alex Kagoshima: https://github.com/alexkago/ds-cfpylearning
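The idea of training a model, saving it to a key-value store and loading it back to serve predictions can be sketched with the standard library alone. This is not the workshop code: a toy least-squares line fit stands in for a scikit-learn estimator, a plain dict stands in for Redis, and all function names are hypothetical.

```python
import pickle


def fit_line(xs, ys):
    """Toy 'model': least-squares slope and intercept for one feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def save_model(store, key, model):
    # Serialise to bytes so any key-value store can hold the model.
    store[key] = pickle.dumps(model)


def load_model(store, key):
    # Only unpickle data written by a trusted process.
    return pickle.loads(store[key])


def predict(model, x):
    return model["intercept"] + model["slope"] * x
```

Because the model lives in the store rather than in process memory, a retraining job can overwrite it and any app instance can pick up the latest version on its next request.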
  45. Resources

    • Docs: http://docs.cloudfoundry.org
    • CF Summit videos: http://cfsummit.com
    • Join the CF community: http://cloudfoundry.org
    • CF meetups: http://cloud-foundry.meetup.com
    • Don’t forget to stop your apps with cf stop myapp.
  46. Show off your data science related Cloud Foundry apps:

    Twitter: @dsoncf
    http://dsoncf.com
  47. @ianhuston