An exploration of how to use ScienceOps to get a data science model into production. Key words: Data Products, Data Science, Mathematical Modelling, Ordinary Differential Equations
Data Analytics Consultant for an IT consultancy. Master's in Mathematics, specialized in Statistics and Machine Learning. Recently I've been an analytics product architect.
I believe that data science offers the most value when the models are in production. Some of us call this a 'Data Product'. In this talk I will explain how to use ScienceOps from Yhat to get a model into production. Why should Amazon or Google get all the fun? Or the competitive advantage?
web designers speak a different language... Try explaining what an ODE is to a web developer. It is like explaining CSS and JavaScript to me :) I'm sure a book could be written about this. Any volunteers?
an ODE solver
Possible Solutions (and their problems)
Port code to Java -----> Cross-language validation
PMML -----> Doesn't have great language support
Batch jobs -----> High maintenance and config
More tools, more work, more time
an Ordinary Differential Equation. This is specific to ScienceOps by Yhat; undoubtedly there are other products on the market. An alternative would be to build your own server and expose the model via a service. The schema I used
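To make the "build your own server" alternative concrete, here is a minimal sketch of the model side of such a service. It is not the code from the talk: the logistic growth equation, the classical Runge-Kutta (RK4) integrator, and the names `logistic_rhs`, `rk4_solve`, `handle_request` and the JSON fields `y0`, `t_end`, `r`, `k`, `steps` are all assumptions chosen for illustration. It uses only the Python standard library and does not call the ScienceOps API.

```python
import json
import math


def logistic_rhs(t, y, r, k):
    """Right-hand side of the (hypothetical) model dy/dt = r * y * (1 - y / k)."""
    return r * y * (1.0 - y / k)


def logistic_exact(t, y0, r, k):
    """Closed-form solution of the logistic ODE, used only as a sanity check."""
    return k / (1.0 + ((k - y0) / y0) * math.exp(-r * t))


def rk4_solve(f, y0, t_end, steps, *params):
    """Integrate dy/dt = f(t, y, *params) from t = 0 to t_end with classical RK4."""
    h = t_end / steps  # fixed step size; steps must be a positive integer
    t, y = 0.0, y0
    for _ in range(steps):
        k1 = f(t, y, *params)
        k2 = f(t + h / 2, y + h * k1 / 2, *params)
        k3 = f(t + h / 2, y + h * k2 / 2, *params)
        k4 = f(t + h, y + h * k3, *params)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


def handle_request(body):
    """Parse a JSON request body, solve the ODE, and return a JSON response string."""
    data = json.loads(body)
    y = rk4_solve(
        logistic_rhs,
        data["y0"],
        data["t_end"],
        data.get("steps", 1000),  # default step count is an arbitrary choice
        data["r"],
        data["k"],
    )
    return json.dumps({"y": y})


if __name__ == "__main__":
    request = json.dumps({"y0": 1.0, "t_end": 5.0, "r": 0.8, "k": 10.0})
    print(handle_request(request))
    print(logistic_exact(5.0, 1.0, 0.8, 10.0))
```

The point of the design is that a web developer only ever sees JSON in and JSON out. `handle_request` could be wired into any HTTP framework or hosted model service without the caller needing to know what an ODE is.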
a data science technology company that provides tools and systems that allow enterprises to turn data insights into data-driven products. ScienceOps, Yhat's flagship product, is a data science operations system for managing predictive and advanced decision-making APIs and workflows. From product recommendation systems to credit scoring models and customer attrition estimators, ScienceOps lets data science teams go from insight to prototype to data-driven product efficiently and at scale. (They helped me during my project so I promised to plug them)
Analysts and software developers use different sets of tools. With advanced mathematics such as ODEs or statistics, not all languages have the libraries to do the work effectively. For example, Ruby doesn't have good stats libraries. Producing tools is harder than producing a report, but provides a lot more value.
1) There is a bit of a backlash against 'big data' and data science.
2) A possible solution is to produce results, and data products are a good way to do that.
3) Producing results allows you to give knowledge to customers. The goal of data science is to turn data into knowledge!
in Python and have it deployed!
- Software engineers aren't data scientists and shouldn't be expected to write models in code.
- Models only provide value when they are in production.
- Getting information from stakeholders is really valuable in improving models.
- A data scientist is often a 'translator' between business and developers.
science projects, this is no exception. ScienceOps has a great free service, but it is a challenge when they shut you down :( Products such as ScienceOps bring us a step closer to closing the gap in understanding between mathematician (or data scientist) and web developer. Someone should write a book titled 'Probability Distributions for Software Engineers'.
have an analytics product in production, using information consumed from a variety of APIs. I have no idea how else I could deploy models, except maybe by using PMML.