CF Summit - Data Science on Cloud Foundry

Talk by Ian Huston and Alexander Kagoshima at CFSummit 2015
Video: https://www.youtube.com/watch?v=n95hCVvuPKQ

Data scientists frequently need to create applications that enable interactive data exploration, deliver predictive analytics APIs, or simply publish results. Cloud Foundry suits this work well because it makes it easy to quickly deploy data-driven apps backed by a variety of data stores. In this talk, Ian Huston will outline how to use Cloud Foundry for data science, describe how CF has been used in customer projects, explain why data services are essential, and discuss how community buildpacks enable data scientists to use their familiar R and Python data packages with CF.

Ian Huston

May 11, 2015

Transcript

  1. None
  2. Data Science on Cloud Foundry
    Ian Huston @ianhuston
    Alexander Kagoshima @akagoshima
  3. Who are we?
    • Data Scientists at Pivotal Labs
    • Using Cloud Foundry since 2013
    • Working with enterprises to get value out of their data
  4. Image by Drew Conway: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

  5. Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician. - Josh Wills
  6. Typical Projects: Risk Analysis, Predictive Maintenance, Understanding Your Customer

  7. None
  8. Data Services: Easy control of incoming data

  9. Data Services
    Bind and scale system services – Databases, NoSQL, message queues etc.
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    $ cf bind-service APP_NAME INSTANCE_NAME
    Add User Provided Services – Standalone Hadoop or Apache Spark cluster, Big Data System
    $ cf cups SERVICE_INSTANCE -p "host, port, username, password"
    (Diagram: several Apps connected to a Data Service)
  10. Deploy a Model: Prediction API, Control distributed computation

  11. https://github.com/ihuston/python-conda-buildpack
    Install PyData packages with binary builds using conda
  12. https://github.com/alexkago/cf-buildpack-r
    R interpreter and package setup, ready for RShiny
  13. HOW TO DEPLOY MODELS?
    Siloed Data, Siloed Systems, Distributed Big Data Platform
    Data Extract (Model development happens here!)
    ? (Business needs model predictions here!)
  14. (Diagram: several Apps on top of a Big Data Platform and Big Data Storage)
  15. PREDICTION API ARCHITECTURE
    Components: REST API, Data Ingest, Model, Create Model, Redis
    Flow: Send data as JSON; Kicking off periodic retraining; Save training data; Save model object; Send JSON data without label; Receive prediction from trained model instance
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/pivotalsoftware/ds-cfpylearning
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
  16. MODEL INTERFACE

  17. Data Driven Applications

  18. SIMPLE HTML + JS: MODEL PREDICTIONS http://ds-demo-transport.cfapps.io

  19. RSHINY APP: INTERACTIVE EXPLORATION https://ak-insurance-demo.cfapps.io:4443/

  20. Show off your data science related Cloud Foundry apps: Twitter: @dsoncf http://dsoncf.com
  21. @ianhuston @akagoshima

  22. PREDICTION API ARCHITECTURE
    Components: REST API, Data Ingest, Model, Create Model, Redis, Visualization
    Flow: Send data as JSON; Kicking off periodic retraining; Save training data; Save model object; Send JSON data without label; Receive prediction from trained model instance
    Deployed at: http://dsoncf.cfapps.io
    Code: https://github.com/pivotalsoftware/ds-cfpylearning
  23. Data Services
    Bind and scale system services – Databases, NoSQL, message queues etc.
    $ cf create-service rediscloud PLAN_NAME INSTANCE_NAME
    $ cf bind-service APP_NAME INSTANCE_NAME
    Add User Provided Services – Standalone Hadoop or Apache Spark cluster, Big Data System
    $ cf cups SERVICE_INSTANCE -p "host, port, username, password"
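
The slides show how to create and bind a service but not how an app then finds it. Cloud Foundry exposes bound services to the app through the VCAP_SERVICES environment variable, a JSON object keyed by service label (user-provided services appear under "user-provided"). The following is a minimal sketch, not code from the talk: the helper name `find_service_credentials` is hypothetical, and the credential field names in the sample payload (`hostname`, `port`, `password`) are assumptions that depend on the service broker.

```python
import json
import os


def find_service_credentials(label, environ=None):
    """Return the credentials dict of the first bound service instance
    listed under `label` in VCAP_SERVICES, or None if nothing is bound."""
    environ = os.environ if environ is None else environ
    # VCAP_SERVICES is a JSON object: {label: [instance, instance, ...]}
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    instances = services.get(label, [])
    if not instances:
        return None
    return instances[0].get("credentials", {})


# Hypothetical payload for illustration only; a real app would read
# os.environ, and the credential keys depend on the service broker.
example_env = {
    "VCAP_SERVICES": json.dumps({
        "rediscloud": [{
            "name": "INSTANCE_NAME",
            "label": "rediscloud",
            "credentials": {
                "hostname": "redis.example.com",
                "port": "6379",
                "password": "secret",
            },
        }]
    })
}

creds = find_service_credentials("rediscloud", example_env)
print(creds["hostname"], creds["port"])
```

Inside a deployed app, calling `find_service_credentials("rediscloud")` with no second argument reads the real environment. Returning None when nothing is bound lets the same code fall back to a local data store during development.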