Tania Allard, PhD @ixek
Developer Advocate @Microsoft
Google Developer Expert, ML (TensorFlow)
Practical DevOps for the busy Data Scientist
OSCON 2019
http://bit.ly/OSCON-MLOps
Story time…
Down the rabbit hole
A common story...
Model / application to be productised
R&D - develop, iterate fast, usually local or cloud
Magic
Is it live??
Replacing the magic
Model/app ready to productise
R&D - develop, iterate fast, usually local or cloud
MLOps, automation, controlled deployment
Worry-free deployment!
Wait and relax
How skills are perceived
What is DevOps anyway?
DevOps is the union of people, process, and products to enable continuous delivery of value into production.
How does it fit for DS?
Sort of DevOps applied to data-intensive applications.
Requires close collaboration between engineers, data scientists, architects, data engineers and Ops.
Story time… The advice… getting started with MLOps
MLOps aims to reduce the end-to-end cycle time of data analytics/science, from the origin of ideas to the creation of data artifacts.
Getting started
What to automate?
Establish checkpoints
Find the low-hanging fruit
How stable and robust are my processes?
Devise a long-term strategy
What can I readily improve?
Where am I? Can I count?
It’s all madness
Practical steps
Keep everything in source control (data, code,
infrastructure) - but allow for experimentation
Standardize and define your environments in
code (conda, pipfiles, Docker)
Use canonical data sources - always know what data you are using (where it comes from and where it goes)
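One way to "know what data you are using" is to record a content hash and the origin of every dataset a run touches. The sketch below is only an illustration: the function names, the `source` label, and the JSON manifest layout are assumptions, not something the talk prescribes.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_dataset(path: Path, source: str, manifest: Path) -> dict:
    """Append the dataset's origin and hash to a JSON manifest.

    The manifest layout (a list of {file, source, sha256}) is hypothetical.
    """
    entries = json.loads(manifest.read_text()) if manifest.exists() else []
    entry = {"file": path.name, "source": source, "sha256": fingerprint(path)}
    entries.append(entry)
    manifest.write_text(json.dumps(entries, indent=2))
    return entry
```

Committing the manifest alongside the code makes it possible to tell later whether two runs really used the same data.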
Automate wisely
What and when to automate?
● What should we automate?
● Define success and failure metrics
● Go from simple to complex tasks
● Evaluate and monitor
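A rough way to decide what to automate, and to define a success metric for it, is a break-even estimate: compare the time spent automating a task with the time the automation saves. This is a minimal sketch; the function names and the five-year default horizon are assumptions chosen for illustration.

```python
def time_saved_hours(seconds_saved_per_run: float,
                     runs_per_day: float,
                     horizon_days: int = 5 * 365) -> float:
    """Total hours saved over the horizon if each run gets faster."""
    return seconds_saved_per_run * runs_per_day * horizon_days / 3600


def worth_automating(hours_to_automate: float,
                     seconds_saved_per_run: float,
                     runs_per_day: float,
                     horizon_days: int = 5 * 365) -> bool:
    """True when the automation effort is smaller than the time it saves."""
    saved = time_saved_hours(seconds_saved_per_run, runs_per_day, horizon_days)
    return hours_to_automate < saved
```

For example, a task that runs once a day and would be one minute faster saves about six hours over a year, so `worth_automating(5, 60, 1, horizon_days=365)` is true while `worth_automating(10, 60, 1, horizon_days=365)` is not.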
https://xkcd.com/1205/
Use pipelines for repeatability and explainability
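To make the idea concrete, here is a minimal sketch of a pipeline as an ordered list of named steps. Running the same list always applies the same steps in the same order (repeatability), and the logged trail shows what happened to the data (explainability). The step functions and names are hypothetical; a real project would more likely use a dedicated pipeline tool.

```python
import logging
from typing import Any, Callable, List, Tuple

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

Step = Tuple[str, Callable[[Any], Any]]


def run_pipeline(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run the steps in order, logging each one; return the result and the step trail."""
    trail: List[str] = []
    for name, func in steps:
        log.info("running step %s", name)
        data = func(data)
        trail.append(name)
    return data, trail


# Hypothetical steps, for illustration only.
def drop_missing(rows: List[Any]) -> List[float]:
    """Remove missing (None) values."""
    return [row for row in rows if row is not None]


def scale(rows: List[float]) -> List[float]:
    """Scale values by the largest one."""
    top = max(rows)
    return [row / top for row in rows]


steps = [("drop_missing", drop_missing), ("scale", scale)]
result, trail = run_pipeline(steps, [2, None, 4])
```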
Deploy portable models
Test continuously and monitor production: push left
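As an illustration of both halves of this advice, the sketch below pairs an early test (something that could run on every commit) with a simple production check. The `predict` function is a toy stand-in for a real model, and the drift rule with its 25% tolerance is an assumption, not a recommended threshold.

```python
from statistics import mean
from typing import Sequence


def predict(features: Sequence[float]) -> float:
    """Toy stand-in model: the mean of the features, clipped into [0, 1]."""
    return min(1.0, max(0.0, mean(features)))


def test_prediction_is_a_probability() -> None:
    """Early test: outputs must stay in [0, 1], even for odd inputs."""
    for features in ([0.2, 0.4], [-5.0, 1.0], [3.0, 9.0]):
        assert 0.0 <= predict(features) <= 1.0


def drifted(training_values: Sequence[float],
            live_values: Sequence[float],
            tolerance: float = 0.25) -> bool:
    """Production check: flag drift when the live mean moves more than
    `tolerance` (relative) away from the training mean."""
    baseline = mean(training_values)
    return abs(mean(live_values) - baseline) > tolerance * abs(baseline)
```

A test runner such as pytest would pick up the `test_` function automatically, while `drifted` could run on a schedule against recent production inputs.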
Summary
1. DataOps helps create value and improve end-to-end ML
2. Start by identifying the low-hanging fruit and defining automation success
3. Choose the right tooling and processes
4. Leverage people and processes
5. Implement wisely