Operationalizing Data Science using Cloud Foundry

SpringOne Platform 2016

Video: https://www.youtube.com/watch?v=SczNclMttcc


Lawrence Spracklen

August 03, 2016


  1. Operationalizing Data Science using Cloud Foundry (Alpine Data, Lawrence Spracklen)

  2. Alpine Data

  3. Operationalization
    •  What happens after the models are created?
    •  How does the business benefit from the insights?
    •  Operationalization is frequently the weak link
       –  Operationalizing PowerPoint?
       –  Hand-rolled scoring flows?
  4. Barriers to Model Ops
    •  Scoring often performed on a different data source to training
    •  Batch training versus RT/stream scoring
    •  How frequently are models updated?
    •  How is performance monitored?
  5. The Chorus Process
    [Diagram labels: Define, Act, Transform, Deploy, Model, Business Leader, Employees and Customers, Chorus]
  6. Turn-key solutions
    [Diagram labels: Train models, Coordinate, Govern, Deploy, RESTful Scoring]

  7. Pivotal BDS
    •  Provides support for high-performance SQL on both Hadoop and traditional data warehouses
       –  HDB/HAWQ and Greenplum
    •  Alpine supports SQL- and MADlib-accelerated machine learning algorithms on both HAWQ and GPDB
    •  Alpine models trained on HAWQ can be scored on GPDB and vice versa
  8. Cloud Foundry (CF)
    •  Models trained on HAWQ or GPDB may not be scored against these systems
       –  May not use the Hadoop cluster at all
    •  Need standalone scoring support
       –  Readily deployed, maintained and scaled to meet the requirements of specific customers
    •  CF provides an elegant way to deploy scalable scoring engines
       –  Across a variety of public and private clouds and datacenters
    •  Requires an execution-framework-agnostic way to specify models
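To make the idea of a standalone, horizontally scalable scoring engine concrete, here is a minimal sketch of a stateless HTTP scoring service. It is not the Alpine scoring engine: the linear model, its coefficients, and the JSON request shape are all assumptions made for illustration. The one platform detail relied on is that Cloud Foundry tells an application which port to listen on through the `PORT` environment variable.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical model parameters, for illustration only.
COEFFICIENTS = {"age": 0.5, "income": 0.001}
INTERCEPT = 1.0


def score(row):
    """Return intercept + sum(coefficient * value) for one input row."""
    return INTERCEPT + sum(c * float(row[name]) for name, c in COEFFICIENTS.items())


class ScoringHandler(BaseHTTPRequestHandler):
    """Scores one JSON row per POST request; keeps no state between requests."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        row = json.loads(self.rfile.read(length))
        body = json.dumps({"score": score(row)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Cloud Foundry supplies the listening port via PORT; 8080 is a local fallback.
    port = int(os.environ.get("PORT", "8080"))
    HTTPServer(("0.0.0.0", port), ScoringHandler).serve_forever()
```

Calling `main()` starts the server. Because the handler holds no per-request state, additional instances can be added behind the platform router without coordination, which is the property that makes this style of engine easy to scale.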
  9. PMML
    •  XML-based predictive model interchange format
       –  Created in 1998
       –  Version 4.3 just released
    •  Good for specifying many common model types
    •  Limited support for complex data preprocessing
       –  Can require companion scripts/code
    •  Broad PMML export support
    •  Limited import support
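As an illustration of what an XML model interchange document looks like, the sketch below embeds a very small PMML 4.3 regression model and scores one row against it with a hand-rolled reader. The model (one predictor `x`, intercept 1.5, coefficient 2.0) is invented for the example, and the reader only understands `NumericPredictor` elements with the default exponent; a real PMML consumer handles far more of the specification.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up regression model expressed in PMML 4.3.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_3"}


def score_regression(pmml_text, row):
    """Toy scorer: intercept plus coefficient * value for each NumericPredictor."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        result += float(predictor.get("coefficient")) * float(row[predictor.get("name")])
    return result


prediction = score_regression(PMML_DOC, {"x": 3.0})  # 1.5 + 2.0 * 3.0
```

Note that the document describes only the model itself. Any feature engineering applied before training would have to be reproduced separately, which is the "companion scripts/code" limitation noted on the slide.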
  10. Turn-key model updates
    Conditionally push model to Cloud Foundry scoring engine
  11. Turnkey Model Ops
    1) Launch CF scoring engine
    2) Configure export
    3) Score data
       curl -X POST …
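The third step, scoring data with an HTTP POST, can be sketched in Python as well. The slide elides the actual command, so everything specific here is hypothetical: the URL `https://scoring.example.com/score`, the JSON payload shape, and the helper name `build_scoring_request` are placeholders rather than the scoring engine's real interface.

```python
import json
import urllib.request


def build_scoring_request(url, row):
    """Build (but do not send) a JSON POST request carrying one row to score."""
    payload = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and input row, for illustration only.
request = build_scoring_request("https://scoring.example.com/score", {"x": 3.0})

# To send it against a live engine:
#     with urllib.request.urlopen(request) as response:
#         result = json.loads(response.read())
```

Building the request separately from sending it keeps the example runnable without a live endpoint.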
  12. PFA
    •  Portable Format for Analytics is the JSON-based successor to PMML
       –  Version 0.8.1 available
    •  Significant flexibility in encapsulating complex data pre- and post-processing
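For a sense of the format, the sketch below builds about the smallest useful PFA document: a scoring engine that takes a double and returns it plus one. It is written as a Python dictionary and serialized to JSON; the variable names are arbitrary, and the document is not taken from the talk.

```python
import json

# Minimal PFA document: declares input and output types and a single action.
pfa_document = {
    "input": "double",
    "output": "double",
    "action": [{"+": ["input", 1]}],  # add 1 to each incoming value
}

pfa_json = json.dumps(pfa_document, indent=2)
```

Because `action` is a list of expressions rather than a fixed model type, the same structure can hold arbitrary pre- and post-processing around the model call.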
  13. Complex scoring flows
    Push entire flow to Alpine Scoring Engine
    [Diagram labels: Trained Model, Pre-processing, Scoring Flow]
  14. PFA Support
    •  Not only model operators need to export PFA
    •  Process entire DAG from raw data input to final model output
       –  Synthesize PFA doc to represent the flow
    •  PFA is capable of representing many key operations
       –  Much richer than PMML
    •  Provides support for supplemental info to be leveraged by the scoring flows
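The idea of synthesizing one PFA document from a whole flow can be sketched as follows. This is a simplified illustration, not Alpine's exporter: the flow is assumed to be a linear chain rather than a general DAG, and the step descriptors (`standardize`, `linear`), their parameters, and the helper names are hypothetical. Each step wraps the expression built so far, so the final document covers everything from raw input to model output. The small evaluator at the end only understands the four arithmetic functions used here and exists purely to check the result; it is not a PFA engine.

```python
import operator

# Hypothetical flow: one pre-processing operator followed by one model operator.
FLOW = [
    {"op": "standardize", "mean": 10.0, "std": 2.0},
    {"op": "linear", "coef": 3.0, "intercept": 1.0},
]


def step_to_pfa(step, expr):
    """Wrap the upstream expression in the PFA expression for this step."""
    if step["op"] == "standardize":
        return {"/": [{"-": [expr, step["mean"]]}, step["std"]]}
    if step["op"] == "linear":
        return {"+": [{"*": [expr, step["coef"]]}, step["intercept"]]}
    raise ValueError("unsupported operator: " + step["op"])


def synthesize_pfa(flow):
    """Fold every step of the flow into a single PFA document."""
    expr = "input"
    for step in flow:
        expr = step_to_pfa(step, expr)
    return {"input": "double", "output": "double", "action": [expr]}


OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(expr, value):
    """Toy evaluator for the arithmetic subset above (not a PFA engine)."""
    if expr == "input":
        return value
    if isinstance(expr, (int, float)):
        return float(expr)
    ((name, args),) = expr.items()
    left, right = (evaluate(arg, value) for arg in args)
    return OPS[name](left, right)


document = synthesize_pfa(FLOW)
result = evaluate(document["action"][0], 14.0)  # ((14 - 10) / 2) * 3 + 1
```

The resulting document carries the standardization step alongside the model, so the scoring side needs no companion script to reproduce the pre-processing.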
  15. Conclusions
    •  Operationalization of data science findings is often overlooked
    •  Need easy model deployment to ensure maximum impact
    •  PFA makes it much simpler to deploy complex scoring flows
    •  Pivotal + Alpine Chorus provide turn-key model operationalization support
  16. Questions? lawrence@alpinenow.com

  17. Additional information

  18. External interaction
    [Diagram labels: Streaming, RESTful, Batch, Hadoop, DB, Chorus, Alpine IR, JDBC, PFA/SQL/PMML, Internal, External]
  19. Model lifecycles
    [Diagram labels: Sources, Stores, Models, Engines, Apps, Actions]