SF Data Mining January 26, 2015
Deploying predictive models
Nick Elprin
Domino Data Lab
dominodatalab.com
Slide 2
Who am I?
• Founder of Domino Data Lab, a software platform for
enterprise data science
• Previously built analytical software at a big hedge fund
• BA, MS in computer science
Slide 3
Motivation
Build predictive models
Build production software systems
Different languages are good for different tasks
Slide 4
Motivation
Organizational design friction
Model improvements moving from Data Scientists to Software Engineering are delayed because of:
• Integration / porting of logic
• Out-of-phase release cycles
• Mismatched priorities
Slide 5
Solution
Publish
Data scientists create predictive models and publish them to Domino.
Domino provides a secure, low-latency infrastructure for hosting predictive models as web services:
• Failover
• Security
• Logging
• Seamless updates
• etc.
Consume
Developers can invoke models from general-purpose languages by making simple HTTP calls.
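As an illustration of what invoking a published model over HTTP might look like from application code, here is a minimal Python sketch using only the standard library. The endpoint URL, the `X-Api-Key` header, and the `parameters` payload shape are all assumptions made for this example; they are not taken from the talk or from Domino's actual API.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only (not a real Domino URL).
MODEL_URL = "https://models.example.com/v1/loan-approval/predict"


def build_request(url, api_key, features):
    """Package the model inputs as a JSON POST request."""
    body = json.dumps({"parameters": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # assumed auth scheme, not from the source
        },
    )


def predict(url, api_key, features, timeout=2.0):
    """Send the request and decode the JSON response from the model service."""
    request = build_request(url, api_key, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would then write something like `predict(MODEL_URL, api_key, {"income": 52000})`. The short timeout reflects the low-latency goal, and the calling application never needs to know which language the model was written in.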
Slide 6
Demo
Slide 7
Production concerns
• Very low latency
• Zero-downtime upgrades
• High availability
• Reproducibility
• Logging
• Security
Slide 8
Best practices
• Separate training, initialization, and prediction
• Make your prediction functions thread-safe
• Don’t mutate any shared state
• Leverage persistence/serialization tools (e.g., pickle)
Slide 9
Use cases
• Lease / loan approval
• Recommendation systems
  • Music, books, products, cars, etc.
• Insurance
  • Quoting premiums; claims estimates
Slide 10
Check us out
dominodatalab.com
blog.dominodatalab.com
@dominodatalab
Webinar on parallel programming in R and Python
Jan 28, 10:30am
dominodatalab.com/webinar