Slide 1

Slide 1 text

Operationalising Data Science on CF
Ian Huston

Slide 2

Slide 2 text

Who am I?
● Data Scientist working with clients at Pivotal Labs
● Cloud Foundry user
● Community Buildpack writer
@ianhuston ihuston

Slide 3

Slide 3 text

“Everybody wants systems that are smarter, everybody wants systems that are more predictive, everybody wants everything scored, everybody wants to understand what’s the next best offer, next best opportunity, how to make things a little bit more efficient.”
Marc Benioff, Forbes CIO Summit, March 7, 2016

Slide 4

Slide 4 text

What’s the problem?
If your Machine Learning model is not in production, it does not provide business value.
A slide deck does not count as production!

Slide 5

Slide 5 text

Who has this problem?
● Data Scientists: want their model to make an impact
● Developers: want to add ML to their app
● CIOs/CDOs: want return on ‘Big Data’ investment
● Our Clients: want to implement their first ML models

Slide 6

Slide 6 text

Day 1 Problems
● Load and Transform Data
● Train the Predictive Model
● Connect to Incoming Data
● Apply Model
● Take Action
● Run the Model

Slide 7

Slide 7 text

‘Scoring As A Service’
● Ingest Data
● Build Model elsewhere with offline data
● Serve Result or Take Action
● Apply Model
● Store Model
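The ‘Scoring As A Service’ flow on this slide can be illustrated with a minimal Python sketch. It assumes a logistic regression whose coefficients were fitted elsewhere and stored as JSON; the field names (`visits`, `basket_value`), the threshold, and the `make_offer` action are all hypothetical, and a real service would fetch the stored model from a data service rather than an inline string.

```python
import json
import math

# Hypothetical stored model. In practice this blob would be read from a
# data service; it is inlined here so the sketch is self-contained.
STORED_MODEL = json.dumps({
    "version": "1",
    "intercept": -1.0,
    "weights": {"visits": 0.5, "basket_value": 0.02},
})


def load_model(blob):
    """Deserialise a model that was built elsewhere with offline data."""
    return json.loads(blob)


def score(model, record):
    """Apply the model: a logistic regression score between 0 and 1."""
    z = model["intercept"] + sum(
        weight * float(record.get(name, 0.0))
        for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def handle_request(model, record, threshold=0.5):
    """Serve the result together with a simple action derived from it."""
    probability = score(model, record)
    action = "make_offer" if probability >= threshold else "no_action"
    return {
        "model_version": model["version"],
        "score": probability,
        "action": action,
    }


model = load_model(STORED_MODEL)
result = handle_request(model, {"visits": 4, "basket_value": 30})
```

Keeping training out of the service means the app only needs the stored coefficients and a scoring function, which is what makes this the simplest of the patterns in the deck.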

Slide 8

Slide 8 text

‘CF powered Learning’
● Ingest Data
● Build Model in CF or elsewhere
● Serve Result or Take Action
● Apply Model
● Store Model
● Batch Update

Slide 9

Slide 9 text

In-Stream (Online) Learning
● Ingest Data
● Serve Result or Take Action
● Update & Apply Model
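As a rough illustration of the "Update & Apply Model" step, the sketch below uses a linear model trained by stochastic gradient descent, one event at a time. The slide does not name an algorithm, so SGD, the class name `OnlineLinearModel`, the learning rate, and the synthetic stream (y = 2x + 1) are all assumptions chosen for illustration.

```python
class OnlineLinearModel:
    """Linear regression updated one event at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        """Apply the current model to one feature vector."""
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x, y):
        """Take one gradient step on the squared error for a single event."""
        error = self.predict(x) - y
        self.weights = [
            w - self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


def process_stream(model, events):
    """Predict on each event first, then learn from its observed outcome."""
    predictions = []
    for x, y in events:
        predictions.append(model.predict(x))
        model.update(x, y)
    return predictions


# Synthetic stream following y = 2x + 1, repeated to mimic ongoing traffic.
events = [([x], 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0)] * 1000
model = OnlineLinearModel(n_features=1)
predictions = process_stream(model, events)
```

Predicting before updating means every served result comes from a model that has not yet seen that event's outcome, which mirrors how a live stream would behave.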

Slide 10

Slide 10 text

How can I do this?
Build it:
● Spring Cloud Data Flow
● Marketplace data services
● Spring Boot
● Python ML microservices
Use it:
● Initial offerings from GE, IBM, Alpine, Bosch

Slide 11

Slide 11 text

http://moves.cfapps.pez.pivotal.io/about

Slide 12

Slide 12 text

Day 2 Problems
● Which predictions were made with the old model or new?
● Do I need to continue serving the old predictions?
● Which library versions were used with old model and new?
● Has the data schema changed in the underlying system?
● Can I provide the right inputs for the different model versions?
● Can I replay the stream?
Update the Model!
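Two of the Day 2 questions, which library versions a model was built with and whether the data schema has changed, can be approached by recording metadata next to each model. The sketch below is one possible way to do that with the Python standard library; the helper names, the version label, and the package and field names are hypothetical.

```python
import hashlib
import json
import platform
from importlib import metadata


def library_versions(names):
    """Return the installed version of each package, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def schema_fingerprint(field_names):
    """Hash the input field names so that field order does not matter."""
    canonical = json.dumps(sorted(field_names))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def describe_model(model_version, field_names, libraries):
    """Collect the metadata to store alongside a trained model."""
    return {
        "model_version": model_version,
        "python_version": platform.python_version(),
        "library_versions": library_versions(libraries),
        "schema_fingerprint": schema_fingerprint(field_names),
    }


def schema_changed(model_info, record):
    """Check whether an incoming record still matches the training schema."""
    return schema_fingerprint(record.keys()) != model_info["schema_fingerprint"]


# Hypothetical model description; package names are only examples.
info = describe_model("v2", ["visits", "basket_value"], ["numpy", "scikit-learn"])
```

Storing this record with every model version gives later debugging something concrete to compare against when predictions from old and new models disagree.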

Slide 13

Slide 13 text

Using a Model Service
● Version control for model
● Parse data with varying schemas
● Serve appropriate model version based on consuming app
● Store underlying data for model re-training and reproducibility
Ingest Data / Model Service / Serve Result or Take Action / Apply Model / Store Model
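The responsibilities listed for a model service can be sketched as a small in-memory registry. This is an illustrative outline under stated assumptions, not an existing Cloud Foundry service: the `ModelService` class, its methods, the app names, and the toy scoring functions are all hypothetical, and a real service would persist models and logged inputs in a data service.

```python
class ModelService:
    """In-memory sketch of a versioned model service."""

    def __init__(self):
        self._models = {}        # version -> (required fields, predict function)
        self._app_versions = {}  # consuming app -> pinned model version
        self._latest = None
        self.audit_log = []      # inputs and outputs kept for re-training

    def register(self, version, required_fields, predict_fn):
        """Store a new model version together with the fields it expects."""
        self._models[version] = (tuple(required_fields), predict_fn)
        self._latest = version

    def pin(self, app, version):
        """Keep a consuming app on a specific model version."""
        if version not in self._models:
            raise KeyError(f"unknown model version: {version}")
        self._app_versions[app] = version

    def predict(self, app, record):
        """Serve the version pinned for this app, or the latest by default."""
        version = self._app_versions.get(app, self._latest)
        fields, predict_fn = self._models[version]
        missing = [name for name in fields if name not in record]
        if missing:
            raise ValueError(f"model {version} is missing inputs: {missing}")
        # Select only the fields this version needs, so records with
        # extra fields from a newer schema still work with older models.
        inputs = {name: record[name] for name in fields}
        prediction = predict_fn(inputs)
        self.audit_log.append({
            "app": app,
            "model_version": version,
            "inputs": inputs,
            "prediction": prediction,
        })
        return {"model_version": version, "prediction": prediction}


service = ModelService()
service.register("v1", ["visits"], lambda r: 2 * r["visits"])
service.register("v2", ["visits", "basket_value"],
                 lambda r: 2 * r["visits"] + r["basket_value"])
service.pin("legacy-app", "v1")

record = {"visits": 3, "basket_value": 10}
old = service.predict("legacy-app", record)
new = service.predict("new-app", record)
```

Logging the model version and the exact inputs with each prediction is what later makes it possible to tell old predictions from new ones and to re-train on the same data.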

Slide 14

Slide 14 text

What’s Next for Data Science on CF?
● More data services
● More demos of data science/ML models
● Examples of successful projects
● Building blocks for building your own ML services
dsoncf.com @ianhuston