Slide 1

Slide 1 text

Data Products
Or how to get models into production
Data Science Meetup Luxembourg, October 29th, 2014
[email&#160;protected]
All opinions my own

Slide 2

Slide 2 text

Who am I?
I work as a Data Analytics Consultant for an IT consultancy.
Master's in Mathematics, specialized in Statistics and Machine Learning.
Recently I've been an analytics product architect.

Slide 3

Slide 3 text

When I was a data science neophyte I ran into this conversation...

Slide 4

Slide 4 text

But then I do laugh at things like this

Slide 5

Slide 5 text

And probably looked like this

Slide 6

Slide 6 text

So how do analysts serve the business?
Well, Amazon have a celebrated recommendation engine...
This is what I call a 'data product'.

Slide 7

Slide 7 text

Analytics Project Lifecycle

Slide 8

Slide 8 text

To help the business most
I believe that data science offers the most value when the models are in production. Some of us call this a 'Data Product'.
In this talk I will explain how to use ScienceOps from Yhat to build a model in production.
Why should Amazon or Google get all the fun? Or competitive advantage?

Slide 9

Slide 9 text

The story goes a little like this
(Stolen from Yhat slides)

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

We all get this kind of conversation

Slide 12

Slide 12 text

It is hard to incorporate data into day-to-day operations.

Slide 13

Slide 13 text

Hiring data scientists is hard...

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Why?
Well, as I discovered, software developers (especially web designers) speak a different language...
Try explaining what an ODE is to a web developer... It is like explaining CSS and JavaScript to me :)
I'm sure there is a book that could be written about this... Any volunteers?

Slide 16

Slide 16 text

Possible Solutions (and their problems)
Write models in Ruby --> Turned out Ruby doesn't have an ODE solver
Port code to Java --> Cross-language validation
PMML --> Doesn't have great language support
Batch jobs --> High maintenance and config
More tools, more work, more time

Slide 17

Slide 17 text

So what can we do?
Teach Software Developers math...

Slide 18

Slide 18 text

Or more realistically
Use a product like ScienceOps from Yhat
Key Tenets
1. Work with the tools you already know
2. Iterate quickly
3. Low touch
4. No rewriting code

Slide 19

Slide 19 text

The schema I used
In my use case the ML server was actually solving an Ordinary Differential Equation (ODE).
This is specific to ScienceOps by Yhat; undoubtedly there are other products on the market.
An alternative would be to build your own server and expose it via a service.
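The "build your own server" alternative can be sketched with nothing but the Python standard library. This is a minimal illustration, not how ScienceOps works internally: the `predict` function, the payload fields (`y0`, `k`, `t`), the handler name, and port 8000 are all assumptions made for the example, and the model is a stand-in (the closed-form solution of dy/dt = -k * y) rather than the model from the project.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(payload):
    """Hypothetical model: closed-form solution of dy/dt = -k * y.

    Stands in for the real model, which the slides do not show.
    """
    y0 = float(payload["y0"])
    k = float(payload["k"])
    t = float(payload["t"])
    return {"y": y0 * math.exp(-k * t)}


class ModelHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON and replies with the model output as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            status, result = 200, predict(payload)
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or missing/invalid fields become a 400 response.
            status, result = 400, {"error": str(exc)}
        body = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host="127.0.0.1", port=8000):
    """Run the service until interrupted (the port is an arbitrary choice)."""
    HTTPServer((host, port), ModelHandler).serve_forever()
```

Calling `serve()` starts the service; a web developer can then POST JSON such as `{"y0": 1.0, "k": 0.5, "t": 2.0}` and read the result without touching the model code. A real deployment would also need authentication, logging, and versioning, which is the work a hosted product takes off your hands.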

Slide 20

Slide 20 text

Example

Slide 21

Slide 21 text

Wake Vortex Project
(This is what I've been working on for a while)

Slide 22

Slide 22 text

Result looks like this...

Slide 23

Slide 23 text

Who are Y-hat?
Yhat (pronounced y-hat) is a data science technology company that provides tools and systems that allow enterprises to turn data insights into data-driven products.
ScienceOps, Yhat's flagship product, is a data science operations system for managing predictive and advanced decision-making APIs and workflows. From product recommendation systems to credit scoring models and customer attrition estimators, ScienceOps lets data science teams go from insight to prototype to data-driven product efficiently and at scale.
(They helped me during my project so I promised to plug them)

Slide 24

Slide 24 text

So if you wanna build a data product

Slide 25

Slide 25 text

Why is it so hard?
Analysts and software developers use different sets of tools.
For advanced mathematics such as ODEs or statistics, not all languages have the libraries to do the job effectively. For example, Ruby doesn't have good stats libraries.
Producing tools is harder than producing a report, but provides a lot more value.
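To make the ODE point concrete, here is a minimal sketch of what "solving an ODE" means in code. It is a hand-rolled classical Runge-Kutta (RK4) integrator, chosen only so the example has no dependencies; in Python one would more likely call a library routine such as SciPy's `solve_ivp`. The function name `rk4_solve` and the decay problem are illustrative assumptions, not taken from the project.

```python
import math


def rk4_solve(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with classical RK4."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Illustrative problem (not from the talk): exponential decay dy/dt = -0.5 * y.
y_end = rk4_solve(lambda t, y: -0.5 * y, y0=1.0, t0=0.0, t1=2.0, steps=100)

# The exact solution at t = 2 is exp(-1), so the two values can be compared.
exact = math.exp(-1.0)
```

Even this small routine has to be written and validated by hand in a language without a numerical library, which is the practical cost of porting a model out of the analyst's toolset.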

Slide 26

Slide 26 text

Plus, explaining this to a Software Developer is hard!

Slide 27

Slide 27 text

This conversation happened

Slide 28

Slide 28 text

The backlash against 'big data'
1) There is a bit of a backlash against 'big data' and data science.
2) A possible solution is producing results, and data products are a good way to do that.
3) Producing results allows you to give knowledge to customers. The goal of data science is to turn data into knowledge!

Slide 29

Slide 29 text

Lessons learned
- I can write a model in Python and have it deployed!
- Software Engineers aren't data scientists and shouldn't be expected to write models in code.
- Models only provide value when they are in production.
- Getting information from stakeholders is really valuable in improving models.
- A data scientist is often a 'translator' between business and developers.

Slide 30

Slide 30 text

Challenges
Data quality is a challenge for all data science projects, and this one is no exception.
ScienceOps has a great free service, but it is a challenge when they shut you down :(
Products such as ScienceOps bring us a step closer to closing the gap in understanding between Mathematician (or data scientist) and Web Developer.
Someone should write a book titled 'Probability Distributions for Software Engineers'.

Slide 31

Slide 31 text

Successes
In a few months it was possible to have an analytics product in production, using information consumed from a variety of APIs.
I have no idea how else I could have deployed the models, except maybe by using PMML.

Slide 32

Slide 32 text

But when you finish a data product

Slide 33

Slide 33 text

Other kinds of data science products
Credit risk modelling
Customer attrition modelling
Recommendation engines
Airline delay analysis
The list goes on...

Slide 34

Slide 34 text

Wanna learn more?
You can email me if you wanna chat about data: [email&#160;protected]
For Yhat: www.yhathq.com
They have a really good blog :)

Slide 35

Slide 35 text

Questions?