Putting Data Science Models into production

springcoil
October 07, 2014

An exploration of how to use ScienceOps to get a data science model into production. Key words: Data Products, Data Science, Mathematical Modelling, Ordinary Differential Equations


Transcript

  1. Data Products, or how to get models into production

    peadarcoyle@googlemail.com. All opinions my own. Data Science Meetup Luxembourg, October 29th, 2014
  2. Who am I?

    I work as a Data Analytics Consultant for an IT consultancy. I have a Masters in Mathematics, specialized in Statistics and Machine Learning. Recently I've been an analytics product architect.
  3. When I was a data science neophyte I ran into this conversation...
  4. But then I do laugh at things like this
  5. And probably looked like this

  6. So how do analysts serve the business?

    Well, Amazon have a celebrated recommendation engine... This is what I call a 'data product'.
  7. Analytics Project Lifecycle

  8. To help the business most

    I believe that data science offers the most value when the models are in production. Some of us call this a 'Data Product'. In this talk I will explain how to use ScienceOps from Yhat to put a model into production. Why should Amazon or Google get all the fun? Or competitive advantage?
  9. The story goes a little like this (Stolen from yhat slides)
  10. None
  11. We all get this kind of conversation
  12. It is hard to incorporate data into day to day operations.
  13. Hiring data scientists is hard...

  14. None
  15. Why?

    Well, as I discovered, software developers, especially web designers, speak a different language.... Try explaining what an ODE is to a web developer.... It is like explaining CSS and JavaScript to me :) I'm sure there is a book that could be written about this.... Any volunteers?
  16. Possible Solutions (and their problems)

    Write models in Ruby --> Turned out Ruby doesn't have an ODE solver
    Port code to Java --> Cross language validation
    PMML --> Doesn't have great language support
    Batch Jobs --> High maintenance and config
    More tools, more work, more time
  17. So what can we do?

    Teach Software Developers math.....
  18. Or more realistically

    Use a product like ScienceOps from Yhat. Key Tenets:
    1. Work with the tools you already know
    2. Iterate quickly
    3. Low touch
    4. No rewriting code
  19. The schema I used

    In my use case the ML server was actually running an Ordinary Differential Equation model. This is specific to ScienceOps by Yhat; undoubtedly there are other products on the market. An alternative would be to build your own server and expose it via a service.
  20. Example

  21. Wake Vortex Project (This is what I've been working on for a while)
  22. Result looks like this...

  23. Who are Y-hat?

    Yhat (pronounced y-hat) is a data science technology company that provides tools and systems that allow enterprises to turn data insights into data-driven products. ScienceOps, Yhat's flagship product, is a data science operations system for managing predictive and advanced decision-making APIs and workflows. From product recommendation systems to credit scoring models and customer attrition estimators, ScienceOps lets data science teams go from insight to prototype to data-driven product efficiently and at scale. (They helped me during my project so I promised to plug them)
  24. So if you wanna build a data product
  25. Why is it so hard?

    Analysts and software developers use different sets of tools.
    For advanced mathematics such as ODEs or statistics, not all languages have the libraries to do the job effectively. For example, Ruby doesn't have good stats libraries.
    Producing tools is harder than producing a report but provides a lot more value.
  26. Plus explaining this to a Software Developer is hard!
  27. This conversation happened

  28. The backlash against 'big data'

    1) There is a bit of a backlash against 'big data' and data science.
    2) A possible solution is producing results, and data products are a good way to do that.
    3) Producing results allows you to give knowledge to customers. The goal of data science is to turn data into knowledge!
  29. Lessons learned

    - I can write a model in Python and have it deployed!
    - Software Engineers aren't data scientists and shouldn't be expected to write models in code.
    - Models only provide value when they are in production.
    - Getting information from stakeholders is really valuable in improving models.
    - A data scientist is often a 'translator' between business and developers.
  30. Challenges

    Data quality is a challenge for all data science projects, and this one is no exception.
    ScienceOps has a great free service but it is a challenge when they shut you down :(
    Products such as ScienceOps bring us a step closer to covering the gap in understanding between Mathematician (or data scientist) and Web Developer.
    Someone should write a book titled 'Probability Distributions for Software Engineers'
  31. Successes

    In a few months it was possible to have an analytics product in production, using information consumed from a variety of APIs. I have no idea how else I could have deployed the models, maybe using PMML.
  32. But when you finish a data product
  33. Other kinds of data science Products

    Credit risk modelling
    Customer attrition modelling
    Recommendation engines
    Airline delay analysis
    The list goes on....
  34. Wanna learn more?

    You can email me if you wanna chat about data: peadarcoyle@googlemail.com. For Yhat: www.yhathq.com They have a really good blog :)
  35. Questions?
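
Slide 19 mentions that an alternative to ScienceOps would be to build your own server and expose the model via a service. A minimal sketch of that idea follows, using only the Python standard library. Everything in it is an assumption made for illustration: the names `rk4_solve`, `DecayModel` and `handle_request` are hypothetical, the exponential decay equation is a stand-in and not the wake vortex model from the talk, and none of this is the ScienceOps API. It only shows the general shape: an ODE solved in Python behind a function that takes JSON in and gives JSON out.

```python
import json


def rk4_solve(f, y0, t_end, steps):
    """Integrate dy/dt = f(t, y) from t=0 to t_end with the classic RK4 scheme."""
    h = t_end / steps
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


class DecayModel:
    """Hypothetical stand-in model: exponential decay, dy/dt = -rate * y."""

    def predict(self, data):
        # The request fields (rate, y0, t_end) are illustrative assumptions.
        rate = float(data["rate"])
        y0 = float(data["y0"])
        t_end = float(data["t_end"])
        y = rk4_solve(lambda t, y: -rate * y, y0, t_end, steps=100)
        return {"y": y}


def handle_request(body, model=DecayModel()):
    """Decode a JSON request body, run the model, and encode a JSON reply."""
    return json.dumps(model.predict(json.loads(body)))


if __name__ == "__main__":
    print(handle_request('{"rate": 1.0, "y0": 1.0, "t_end": 1.0}'))
```

A web framework or HTTP server would call something like `handle_request` for each incoming request, so the developers on the other side only ever see JSON and never need to know what an ODE is.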