
PyData London 2015 - Getting Started with Cloud Foundry for Data Science

Tutorial given at PyData London 2015

Cloud Foundry is an open source Platform-as-a-Service that can be used to easily deliver data-driven applications. In this tutorial we will learn how to push an application to a real CF instance, how to connect to managed data services like Redis and how to deploy PyData and R projects using community buildpacks.

Ian Huston

June 19, 2015

Transcript

  1. Getting Started with
    Cloud Foundry
    Ian Huston, Data Scientist
    PyData London 2015

  2.
    © Copyright 2015 Pivotal. All rights reserved.
    Who am I?
    •  Ian Huston
    •  @ianhuston
    •  www.ianhuston.net
    •  Talk resources: https://github.com/ihuston/python-cf-examples
    •  Data Scientist
    •  Use PyData stack for predictive analytics and machine learning
    •  Previously a theoretical physicist using Python for numerical simulations & HPC

  3.
    Who are Pivotal?
    Open Data Platform
    Pivotal Big Data Suite

  4.
    Aims for this tutorial
    Goals:
    •  Understand how to use CF as a developer/data scientist
    •  Understand how to push PyData apps and bind to services
    Anti-goals:
    •  Understand CF from an operations viewpoint
    •  Install & run a full Cloud Foundry installation

  5.
    Plan
    •  What is Cloud Foundry?
    •  Getting started with a hosted CF instance
    •  Pushing your first app
    •  What are buildpacks?
    •  Using PyData packages
    •  Using data services like Redis
    •  Putting it all together

  6.
    Important Resources
    Clone this repo:
    https://github.com/ihuston/python-cf-examples

    Or download and unpack this file:
    http://tinyurl.com/cf-pydata

  7. What is Cloud Foundry?
    http://cloudfoundry.org

    Open Source
    Multi-Cloud Platform

    Simple App Deployment,
    Scaling & Availability

  8. Cloud Applications Haiku

    Here is my source code
    Run it on the cloud for me
    I do not care how.
    -  Onsi Fakhouri
    @onsijoe

  9. How can data scientists use CF?

  10. Data Services
    Easy control of incoming data

  11. Distributed computation

  12. Data Driven Applications

  13.
    Multi-cloud
    •  Same applications running across different cloud providers: private, public and hosted

  14.
    Cloud Foundry Foundation
    Membership tiers: Platinum, Gold, Silver
    See relative contributions at http://dashboard.cloudfoundry.org/

  15.
    Hosted options
    •  IBM Bluemix
    •  Anynines
    •  HP Helion Development Platform
    •  ActiveState Stackato
    •  CenturyLink AppFog

  16.
    Getting Started on Pivotal Web Services
    •  Go to http://run.pivotal.io
    •  Trial lasts for 60 days, no credit card required
    •  Click ‘Sign Up’ and enter your email
    •  Open confirmation email and verify registration
    •  Complete SMS verification check
    •  Choose an organisation name (your name/handle is fine)

  17.
    Downloading the Command Line Interface
    •  Register at http://run.pivotal.io
    •  In the CF management website click ‘Tools’ and download the appropriate package
    •  Or go to https://github.com/cloudfoundry/cli
    •  Or use Homebrew on OS X:
    $ brew tap pivotal/tap
    $ brew install cloudfoundry-cli
    •  The CLI provides the cf command, your interface to Cloud Foundry

  18.
    cf push
    Challenge 1: Pushing your first app

  19.
    Logging in to PWS
    •  Choose PWS API endpoint:
    $ cf api https://api.run.pivotal.io
    •  Log in:
    $ cf login
    •  Enter your email address and password
    •  You should see output like:

    API endpoint:   https://api.run.pivotal.io (API version: 2.28.0)
    User:           YOUR_USER_NAME
    Org:            YOUR_ORG_NAME
    Space:          development

  20.
    Challenge 1: Pushing your first application
    •  Choose PWS API endpoint: $ cf api https://api.run.pivotal.io
    •  Log in: $ cf login
    •  Code directory: 01-simple-python-app
    •  From that directory, deploy with: $ cf push
    •  Check it worked at http://myapp-RANDOM-WORDS.cfapps.io
    •  Turn off your app when finished with $ cf stop myapp (but not yet!)

  21.
    Simple Flask App Demo
    •  Simple one-page “Hello World” web app
    •  Video: https://www.youtube.com/watch?v=QOfD6tnoAB8
    •  Demonstrates:
    –  Installation of requirements
    –  Scaling properties
    •  You need to provide:
    –  App files
    –  Dependencies listed in a requirements.txt file
    –  An optional manifest.yml file with configuration for deployment
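A minimal sketch of the kind of app this challenge pushes. The tutorial's app uses Flask; to stay dependency-free, this stand-in uses the standard library's wsgiref instead, so it is not the code in 01-simple-python-app. It relies on Cloud Foundry passing each instance the port to listen on in the PORT environment variable; the 8080 fallback for local runs is an arbitrary, hypothetical choice.

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """WSGI app: a one-line Hello World page that reports the port in use."""
    port = os.environ.get("PORT", "8080")
    body = "Hello World! I am running on port {}\n".format(port).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve():
    """Listen on the port Cloud Foundry assigns via $PORT (8080 is a hypothetical local fallback)."""
    port = int(os.environ.get("PORT", "8080"))
    make_server("0.0.0.0", port, app).serve_forever()
```

Saved as a hypothetical app.py that ends with a call to serve(), and placed next to a requirements.txt so the Python buildpack recognises it, this could be pushed with $ cf push myapp -c "python app.py". Reporting the port makes the scaling demo visible, because different instances answer on different ports.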

  22. What just happened?
    [Diagram: $ cf push sends the source code to the Cloud Controller: 1. upload code; 2. set up domain on the CF Router; 3. install Python & dependencies; 4. copy app into containerised instances; 5. start app and accept connections. A browser sends a request to the URL and the Router load balances between instances.]

  23.
    What just happened?
    1.  Application code is uploaded to CF
    2.  Domain URL is set up ready for routing
    3.  Cloud Controller builds application in container:
    –  Python interpreter selected
    –  Dependencies installed with pip
    4.  Container is replicated to provide instances
    5.  App starts and Router load balances requests
    •  See what’s happening using logs: $ cf logs myapp --recent

  24.
    Python on Cloud Foundry
    •  First-class language (with Go, Java, Ruby, Node.js, PHP)
    •  Automatic app type detection
    –  Looks for requirements.txt or setup.py
    •  Buildpack takes care of
    –  Detecting that a Python app is being pushed
    –  Installing Python interpreter
    –  Installing packages in requirements.txt using pip
    –  Starting web app as requested (e.g. python myapp.py)
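The detection rule on this slide can be sketched as a small Python function. It only mirrors the rule as stated here (look for requirements.txt or setup.py); the real buildpack's detect step is a separate executable whose exact checks may differ between versions. The function names are hypothetical; the exit-code convention (0 means "this buildpack applies") is the one buildpack detect scripts follow.

```python
from pathlib import Path

# Marker files named on the slide; the real buildpack may check for others too.
PYTHON_MARKERS = ("requirements.txt", "setup.py")


def looks_like_python_app(app_dir):
    """Return True if app_dir contains any of the Python marker files."""
    root = Path(app_dir)
    return any((root / name).is_file() for name in PYTHON_MARKERS)


def detect(argv):
    """Hypothetical detect-style entry point: returns 0 if the buildpack applies, else 1."""
    if len(argv) < 2:
        return 1  # no app directory given
    return 0 if looks_like_python_app(argv[1]) else 1
```

This is why an app needs a requirements.txt (even an empty one) or a setup.py before the Python buildpack will pick it up automatically.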

  25.
    Scaling memory and instances
    •  We can scale our application as needed in terms of memory and number of instances:
    $ cf scale myapp -i 5
    $ cf scale myapp -m 256M
    •  Check app in browser to see different ports being used.
    •  Scale back down with $ cf scale myapp -i 1
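The memory limit applies to each instance, so the memory an app reserves against your quota is instances × memory per instance. A small sketch of that arithmetic, assuming size strings like those passed to cf scale -m; the helper names are hypothetical and only M/MB and G/GB suffixes are handled.

```python
def to_megabytes(size):
    """Convert a size string such as '256M', '512MB' or '1G' to megabytes."""
    text = size.strip().upper()
    if text.endswith("B"):
        text = text[:-1]  # accept both '256M' and '256MB'
    if text.endswith("G"):
        return int(text[:-1]) * 1024
    if text.endswith("M"):
        return int(text[:-1])
    raise ValueError("unsupported size: {!r}".format(size))


def total_memory_mb(instances, memory_per_instance):
    """Memory reserved across all instances of one app, in megabytes."""
    return instances * to_megabytes(memory_per_instance)


# After `cf scale myapp -i 5` and `cf scale myapp -m 256M`:
print(total_memory_mb(5, "256M"))  # 1280
```

Scaling back to one instance returns the reservation to 256 MB, which matters on a trial account with a fixed memory quota.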

  26.
    How does this work?
    •  Containerised application is cached and ready to be deployed.
    •  Scaling number of instances replicates container and load balances requests across all instances.
    •  Scaling memory requires restarting app.
    •  Auto-scaling is also possible.

  27.
    Buildpacks
    Challenge 2: Pushing PyData apps

  28.
    What are buildpacks?
    •  Idea and format from Heroku
    •  Responsible for doing whatever is necessary to get your app running.
    •  Buildpacks take care of
    –  Detecting which type of application is being pushed
    –  Installing the appropriate run-time
    –  Installing required dependencies or other artifacts
    –  Starting the application as requested
    •  Official buildpacks for Python, Java, Node.js, Go, Ruby, PHP & for static websites and running binaries

  29.
    Containers vs Buildpacks
    [Diagram comparing two stacks; in both, the system brings a fixed host OS kernel.
    Container (e.g. Docker): the dev provides the application layer, runtime layer and OS image.
    Buildpack app container: the dev provides the application layer; the system provides the runtime layer* and OS image.
    * Devs may bring a custom buildpack.]

  30.
    Custom buildpacks
    •  Instead of using an official buildpack you can use any custom buildpack installed on your CF or available on GitHub.
    •  Only 3 scripts needed:
    – detect
    – compile
    – release
    •  Specify buildpack with -b or in manifest.yml
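As a sketch of the last of the three scripts, here is what a release script might print. Following the Heroku-style convention Cloud Foundry adopted, release is called with the build directory and writes YAML to stdout whose default_process_types.web entry is the app's start command. The scripts are ordinary executables, so this sketch assumes a Python interpreter is available during staging; the start command python app.py is a placeholder.

```python
import sys


def release_yaml(web_command):
    """Build the YAML document a buildpack's release script writes to stdout."""
    return (
        "---\n"
        "default_process_types:\n"
        "  web: " + web_command + "\n"
    )


def main(argv):
    """Entry point for a hypothetical release script: argv[1] is the build directory."""
    if len(argv) < 2:
        sys.stderr.write("usage: release BUILD_DIR\n")
        return 1
    # Placeholder start command; a real buildpack might inspect argv[1] first.
    sys.stdout.write(release_yaml("python app.py"))
    return 0
```

Used as an actual bin/release file, it would start with a #!/usr/bin/env python line and end with sys.exit(main(sys.argv)).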

  31.
    Community Buildpacks
    https://github.com/cloudfoundry-community/cf-docs-contrib/wiki/Buildpacks
    •  Lots of languages: Clojure, Haskell, .NET, Erlang, Elixir, etc.
    •  R Shiny app buildpack: https://github.com/alexkago/cf-buildpack-r
    •  Can also use some Heroku buildpacks without modification.

  32.
    Official Python buildpack
    ✓  Great for simple pip-based requirements
    ✓  Well tested and officially maintained
    ✓  Covers both Python 2 and 3
    ✗  Suffers from the Python Packaging Problem:
    -  Hard to build packages with C, C++ or Fortran extensions
    -  Complicated local configuration of libraries and paths needed
    -  Takes a long time to build main PyData packages from source

  33.
    Using conda for package management
    •  http://conda.pydata.org
    •  Benefits:
    –  Uses precompiled binary packages
    –  No fiddling with Fortran or C compilers and library paths
    –  Known good combinations of main package versions
    –  Really simple environment management (better than virtualenv)
    –  Easy to run Python 2 and 3 side-by-side
    Go try it out if you haven’t already!

  34.
    How to use the conda buildpack
    https://github.com/ihuston/python-conda-buildpack
    •  Specify it as a custom buildpack when pushing your app, either in the manifest or with the -b command line option.
    •  Export your current environment to an environment.yml file ($ conda env export > environment.yml)
    •  Or write requirements.txt (pip) and conda_requirements.txt
    •  Send me feedback & pull requests!

  35.
    Challenge 2: Pushing a PyData app
    •  Code directory: 02-pydata-spyre-app
    •  Spyre, by Adam Hajari: https://github.com/adamhajari/spyre
    •  This app uses the PyData buildpack to install Matplotlib, NumPy and more.
    •  Spyre provides a simple way to build interactive web-based visualisations similar to R Shiny.

  36.
    Using Services
    Challenge 3: Using Redis in an app

  37.
    Services
    These are available on Pivotal CF (Pivotal’s packaged Cloud Foundry offering).
    See http://run.pivotal.io for the services available on Pivotal Web Services.

  38.
    Cloud Native Applications
    •  Suitable for deployment on cloud platforms
    •  Can scale up easily without changes
    •  Follow these rules: http://12factor.net
    •  In a nutshell:
    –  Apps are stateless processes
    –  Easy to create and destroy apps with no side-effects
    –  Persistent state handled through backing services
    –  Apps export their services via port binding

  39.
    Challenge 3: Binding a service to an app
    •  Create a free Redis service (cs is short for create-service):
    $ cf cs rediscloud 30mb myredis
    •  Bind to our earlier app:
    $ cf bind-service myapp myredis
    •  See how apps find credentials in environment variables:
    $ cf env myapp
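cf env shows bound services in the VCAP_SERVICES environment variable: a JSON object keyed by service label (here rediscloud) whose values are lists of bound instances, each with its credentials. A minimal sketch of reading it in Python, assuming the Redis Cloud credentials use the keys hostname, port and password (check the actual keys in your cf env output); the function name is hypothetical.

```python
import json
import os


def redis_settings(label="rediscloud", env=None):
    """Return host/port/password for the first bound service with this label, or None."""
    env = os.environ if env is None else env
    raw = env.get("VCAP_SERVICES")
    if not raw:
        return None  # nothing bound, e.g. when running locally
    instances = json.loads(raw).get(label, [])
    if not instances:
        return None
    creds = instances[0]["credentials"]
    # Assumed key names; confirm them against `cf env myapp`.
    return {
        "host": creds["hostname"],
        "port": int(creds["port"]),
        "password": creds.get("password"),
    }
```

With the redis-py package the result could be passed straight to a client, e.g. redis.StrictRedis(**settings). Returning None when nothing is bound lets the same code fall back to local behaviour.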

  40.
    Challenge 3: Using Services in an app
    •  Code directory: 03-services-redis
    •  This app binds to a Redis service if available and counts the number of hits on the app homepage.
    •  Add keys and values by adding /keyname/value to the URL.
    •  Data is persisted in Redis, so it survives deleting and restarting the app.
    •  Multiple instances all access the same service.
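The behaviour described above can be sketched as a small WSGI app. This is not the code in 03-services-redis: it uses only the standard library and accepts any store object with Redis-style incr, set and get methods, so an in-memory stand-in works locally and, by assumption, a redis-py client can be swapped in when a Redis service is bound. The key name hits is hypothetical.

```python
class InMemoryStore:
    """Local stand-in for a Redis client; state is lost on restart and not shared."""

    def __init__(self):
        self._data = {}

    def incr(self, key, amount=1):
        self._data[key] = int(self._data.get(key, 0)) + amount
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def make_app(store):
    """Build a WSGI app that counts homepage hits and stores /key/value pairs."""

    def app(environ, start_response):
        parts = [p for p in environ.get("PATH_INFO", "/").split("/") if p]
        if not parts:
            hits = store.incr("hits")  # hypothetical counter key
            status, text = "200 OK", "Hits: {}\n".format(hits)
        elif len(parts) == 2:
            key, value = parts
            store.set(key, value)
            status, text = "200 OK", "Stored {} = {}\n".format(key, value)
        else:
            status, text = "404 Not Found", "Not found\n"
        body = text.encode("utf-8")
        start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
        return [body]

    return app
```

The in-memory store shows why the backing service matters: each instance would keep its own count and lose it on restart, whereas a shared client such as redis.StrictRedis(host=..., port=..., password=...) gives every instance the same persistent data. Note that redis-py's get returns bytes rather than str.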

  41. User Provided Services
     
    How to add User Provided Services (e.g. a standalone Hadoop or Apache Spark cluster, Big Data system, RDBMS etc.):
    $ cf cups SERVICE_INSTANCE -p "host, port, username, password"
    (cups is short for create-user-provided-service)
    [Diagram: a single data service shared by many apps]
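Once bound, a user-provided service appears in VCAP_SERVICES like any other, under the label user-provided, carrying the credentials you entered. Looking an instance up by its name rather than its label therefore works for both managed and user-provided services. A sketch, with a hypothetical function name and a hypothetical instance name my-spark in the usage below:

```python
import json
import os


def service_credentials(instance_name, env=None):
    """Return the credentials dict of the bound service called instance_name, or None."""
    env = os.environ if env is None else env
    services = json.loads(env.get("VCAP_SERVICES", "{}"))
    # VCAP_SERVICES maps each label (e.g. "user-provided") to a list of instances.
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                return instance.get("credentials", {})
    return None
```

For example, service_credentials("my-spark") would return the host, port, username and password entered at the cf cups prompt. The values come back as strings, so convert the port before using it.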

  42.
    Putting it all together
    Challenge 4: Build your own
    prediction API

  43. Prediction API architecture
    [Diagram: a REST API in front of Data Ingest and Model components backed by Redis. Send training data as JSON to Data Ingest, which saves the training data in Redis and kicks off periodic retraining; Create Model saves the model object in Redis; send JSON data without a label to a model instance to receive a prediction from the trained model.]
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/alexkago/ds-cfpylearning
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME

  44.
    Final challenge: Build your own predictive API
    •  Code directory: 04-learning-api
    •  Simple Flask + scikit-learn based machine learning API
    •  Push the application and go to the app URL to see instructions on how to use it.
    •  Model is built in scikit-learn and persisted in Redis
    •  Simplified version of this project by Alex Kagoshima: https://github.com/alexkago/ds-cfpylearning
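Persisting a model in Redis means serialising the fitted object to bytes and storing them under a key, so any instance can load the same model. A minimal sketch using pickle, assuming a store with Redis-style set/get (a dict-backed stand-in here) and a toy model standing in for a scikit-learn estimator. This is not the project's actual code, and the class and key names are hypothetical. Only unpickle data you wrote yourself, because loading a pickle can execute arbitrary code.

```python
import pickle


class MeanModel:
    """Toy stand-in for a scikit-learn estimator: predicts the mean training label."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class DictStore:
    """Minimal stand-in for a Redis client's set/get."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def save_model(store, key, model):
    """Serialise the fitted model to bytes and store it under key."""
    store.set(key, pickle.dumps(model))


def load_model(store, key):
    """Load a model previously saved with save_model, or None if absent."""
    data = store.get(key)
    return None if data is None else pickle.loads(data)
```

A fitted scikit-learn estimator can be pickled the same way, and periodic retraining then amounts to calling save_model again under the same key.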

  45.
    Resources
    •  Docs: http://docs.cloudfoundry.org
    •  CF Summit videos: http://cfsummit.com
    •  Join the CF community: http://cloudfoundry.org
    •  CF meetups: http://cloud-foundry.meetup.com
    •  Don’t forget to stop your apps with $ cf stop myapp.

  46.

    Show off your data
    science related Cloud
    Foundry apps:

    Twitter: @dsoncf
    http://dsoncf.com

  47.
    @ianhuston
