CF Summit - Data Science on Cloud Foundry

Talk by Ian Huston and Alexander Kagoshima at CF Summit 2015
Video: https://www.youtube.com/watch?v=n95hCVvuPKQ

Data scientists frequently need to create applications that enable interactive data exploration, deliver predictive analytics APIs, or simply publish results. Cloud Foundry provides an ideal platform for data scientists by making it easy to quickly deploy data-driven apps backed by a variety of data stores. In this talk, Ian Huston will outline how to use Cloud Foundry for data science, describe how CF has been used in customer projects, explain why data services are essential, and discuss how community buildpacks enable data scientists to use their familiar R and Python data packages with CF.

Ian Huston

May 11, 2015

Transcript

  1. (no text)

  2. Data Science on Cloud Foundry
    Ian Huston @ianhuston
    Alexander Kagoshima @akagoshima

  3. Who are we?

    •  Data Scientists at Pivotal Labs
    •  Using Cloud Foundry since 2013
    •  Working with enterprises to get value out of their data

  4. Image by Drew Conway: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

  5. Data Scientist (n.):

    Person who is better at statistics than any
    software engineer and better at software
    engineering than any statistician.

    - Josh Wills

  6. Typical Projects
    •  Risk Analysis
    •  Predictive Maintenance
    •  Understanding Your Customer

  7. (no text)

  8. Data Services
    Easy control of incoming data

  9. Data Services
    Bind and scale system services
    –  Databases, NoSQL, message queues, etc.
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    $ cf bind-service APP_NAME INSTANCE_NAME

    Add User Provided Services
    –  Standalone Hadoop or Apache Spark cluster, Big Data System
    $ cf cups SERVICE_INSTANCE -p "host, port, username, password"

    [Diagram: one Data Service bound to several apps]
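
Once a service is bound with `cf bind-service`, or created as a user-provided service with `cf cups` (the short alias of `cf create-user-provided-service`), Cloud Foundry hands its connection details to the app through the `VCAP_SERVICES` environment variable: a JSON object keyed by service label, with user-provided services listed under `user-provided`. The following is a minimal Python sketch of looking up an instance's credentials by name. The instance names `my-redis` and `my-spark` and the credential keys in the sample payload are assumptions for illustration; each service broker (or whatever was typed in for `cf cups -p`) determines the real keys.

```python
import json
import os


def find_service_credentials(instance_name, vcap_json=None):
    """Return the credentials dict of the bound service instance named `instance_name`.

    VCAP_SERVICES maps each service label (e.g. "rediscloud" or
    "user-provided") to a list of bound instances; each instance
    carries its "name" and a "credentials" object.
    """
    if vcap_json is None:
        vcap_json = os.environ.get("VCAP_SERVICES", "{}")
    services = json.loads(vcap_json)
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == instance_name:
                return instance.get("credentials", {})
    raise KeyError(f"no bound service instance named {instance_name!r}")


# Hypothetical VCAP_SERVICES payload for illustration only; real credential
# keys depend on the service broker or on the values entered for `cf cups -p`.
EXAMPLE_VCAP = json.dumps({
    "rediscloud": [{
        "name": "my-redis",
        "label": "rediscloud",
        "credentials": {"hostname": "redis.example.com", "port": "6379",
                        "password": "secret"},
    }],
    "user-provided": [{
        "name": "my-spark",
        "label": "user-provided",
        "credentials": {"host": "spark.example.com", "port": "7077",
                        "username": "ds", "password": "secret"},
    }],
})

if __name__ == "__main__":
    redis_creds = find_service_credentials("my-redis", EXAMPLE_VCAP)
    spark_creds = find_service_credentials("my-spark", EXAMPLE_VCAP)
    print(redis_creds["hostname"], spark_creds["host"])
```

Looking services up by instance name rather than by label keeps the app code identical whether the backing store is a marketplace service or an external cluster registered with `cf cups`.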

  10. Deploy a Model Prediction API
    Control distributed computation

  11. https://github.com/ihuston/python-conda-buildpack
    Install PyData packages with binary builds using conda

  12. https://github.com/alexkago/cf-buildpack-r
    R interpreter and package setup, ready for RShiny

  13. HOW TO DEPLOY MODELS?
    [Diagram: Siloed Data, Siloed Systems, a Data Extract step, and a Distributed Big Data Platform, annotated "Model development happens here!" and "Business needs model predictions here!"; a "?" marks the missing deployment path]

  14. [Diagram: five apps, a Big Data Platform, and Big Data Storage]

  15. PREDICTION API ARCHITECTURE
    [Diagram: a REST API in front of Data Ingest and Model components, both backed by Redis]
    –  Data Ingest: send data as JSON; save training data to Redis
    –  Create Model: kick off periodic retraining; save model object to Redis
    –  Model: send JSON data without label; receive prediction from trained model instance
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/pivotalsoftware/ds-cfpylearning
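
The prediction API architecture separates ingesting labelled training data, periodically retraining a model, and serving predictions, with Redis holding both the training data and the serialized model. The following is a hedged, standard-library-only Python sketch of that flow; it is not the code in the ds-cfpylearning repository. The `InMemoryStore` class is a hypothetical stand-in that mimics a few Redis list and key operations, the nearest-centroid classifier is a deliberately simple placeholder model, and the key names `training_data` and `model` are assumptions. In a deployed app the store would be a real Redis client configured from the bound service's credentials.

```python
import json
import math
import pickle


class InMemoryStore:
    """Hypothetical stand-in for Redis: byte values under string keys, plus lists."""

    def __init__(self):
        self._values = {}
        self._lists = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def rpush(self, key, value):
        self._lists.setdefault(key, []).append(value)

    def lrange(self, key, start, end):
        # Redis treats `end` as inclusive; -1 means "through the last element".
        stop = None if end == -1 else end + 1
        return self._lists.get(key, [])[start:stop]


class NearestCentroid:
    """Placeholder model: predicts the label whose mean feature vector is closest."""

    def fit(self, features, labels):
        sums, counts = {}, {}
        for x, y in zip(features, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: math.dist(x, self.centroids[y]))


def ingest(store, payload):
    """Data ingest: store one labelled JSON record, e.g. {"features": [...], "label": ...}."""
    record = json.loads(payload)
    if "features" not in record or "label" not in record:
        raise ValueError("training records need 'features' and 'label'")
    store.rpush("training_data", json.dumps(record).encode())


def retrain(store):
    """Retraining step (meant to run periodically): fit on all data, save the pickled model."""
    records = [json.loads(raw) for raw in store.lrange("training_data", 0, -1)]
    model = NearestCentroid().fit([r["features"] for r in records],
                                  [r["label"] for r in records])
    store.set("model", pickle.dumps(model))


def predict(store, payload):
    """Prediction: load the saved model and score one unlabelled JSON record."""
    model = pickle.loads(store.get("model"))
    features = json.loads(payload)["features"]
    return json.dumps({"prediction": model.predict(features)})
```

Keeping the trained model as a serialized object in the shared store means every app instance can load the same model, which is one reason this layout suits scaling the API out with `cf scale`.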

  16. MODEL INTERFACE
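
One common way to give a trained model an interface on Cloud Foundry is a small JSON-over-HTTP endpoint; Cloud Foundry routes traffic to the port given in the app's `PORT` environment variable. The following is a minimal sketch using only Python's standard-library `http.server`. The `/predict` path, the request shape, and the fixed-weight `score` function are hypothetical placeholders, and a production app would more likely use a web framework and load a trained model instead.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Hypothetical placeholder model: a fixed linear score. Replace with a trained model."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


class PredictionHandler(BaseHTTPRequestHandler):
    """Accepts POST /predict with a JSON body like {"features": [1.0, 2.0]}."""

    def do_POST(self):
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            result = {"prediction": score(body["features"])}
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing key, or non-numeric features.
            self._send_json(400, {"error": "expected JSON with a 'features' list"})
            return
        self._send_json(200, result)

    def _send_json(self, status, payload):
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port=None):
    # Cloud Foundry supplies the listening port in $PORT; 8080 is a local fallback.
    if port is None:
        port = int(os.environ.get("PORT", 8080))
    return HTTPServer(("0.0.0.0", port), PredictionHandler)
```

Calling `make_server().serve_forever()` at startup serves requests; locally, `curl -X POST -d '{"features": [2.0, 4.0]}' http://localhost:8080/predict` would then return `{"prediction": 0.0}`.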

  17. Data Driven Applications

  18. SIMPLE HTML + JS
    MODEL PREDICTIONS
    http://ds-demo-transport.cfapps.io

  19. RSHINY APP
    INTERACTIVE EXPLORATION
    https://ak-insurance-demo.cfapps.io:4443/

  20. Show off your data-science-related Cloud Foundry apps:

    Twitter: @dsoncf
    http://dsoncf.com

  21. @ianhuston
    @akagoshima

  22. PREDICTION API ARCHITECTURE
    [Diagram: a REST API in front of Data Ingest and Model components, both backed by Redis, plus a Visualization component]
    –  Data Ingest: send data as JSON; save training data to Redis
    –  Create Model: kick off periodic retraining; save model object to Redis
    –  Model: send JSON data without label; receive prediction from trained model instance
    –  Visualization
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/pivotalsoftware/ds-cfpylearning

  23. Data Services
    Bind and scale system services
    –  Databases, NoSQL, message queues, etc.
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    $ cf bind-service APP_NAME INSTANCE_NAME

    Add User Provided Services
    –  Standalone Hadoop or Apache Spark cluster, Big Data System
    $ cf cups SERVICE_INSTANCE -p "host, port, username, password"