DevOps for Data Science
Experiences from building
a cloud-based data science platform
...
Anand Chitipothu
rorodata
Slide 2
Who is Speaking?
Anand Chitipothu
@anandology
Co-founder and platform architect of
@rorodata
Worked at Internet Archive & Strand Life
Sciences
Advanced programming courses at
@pipalacademy
Slide 3
DevOps for Data Science
Slide 4
Managing Data Science in Production is Hard!
• The tools and practices are not very mature
• Everyone ends up building their own solutions
• Building your own solution requires careful system architecture and
complex devops
Slide 5
The Goal: The Effective Data Science Team
• The data science team is self-sufficient to build end-to-end ML
applications
• No steep learning curve
Slide 6
How?
???
Slide 7
Inspiration from Origami
Slide 8
Flapping Bird - Crease Patterns
Slide 9
Takeaways
• Focus on the experience
• Right level of abstraction
Slide 12
Example: Heroku
To deploy a web application:
$ git push heroku master
Slide 13
Case Studies
• Launching notebooks
• Deploying ML models as APIs
Slide 14
Case Study 1:
Launching Notebooks
Slide 15
Launching Notebooks - Challenges
• Switching between different compute needs
• Installing required software dependencies
• Data storage
• GPU support
Slide 16
Abstractions
• Project
• Runtime
• Instance Size
Slide 17
Project
Manages all the notebooks, data, and the specified software
dependencies.
Slide 18
Runtime
Base software setup for the project.
• python3-tensorflow
• python3-keras
• python3-pytorch
Slide 19
Instance Size
Different instance sizes to pick from:
• S1 - 1 CPU core, 1 GB RAM
• S2 - 1 CPU core, 3.5 GB RAM
• M1 - 2 CPU cores, 15 GB RAM
• X1 - 64 CPU cores, 1024 GB RAM
• G1 - 4 CPU cores, 60 GB RAM, 1 K100 GPU
Slide 20
How to specify additional dependencies?
• runtime.txt: specifies the runtime
• environment.yml: conda environment file with python
dependencies
• requirements.txt: python dependencies to be installed
from pip
• apt.txt: system packages that need to be installed
• postBuild: script for custom needs
Writing a Dockerfile is too low-level.
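For illustration, a project that builds on the TensorFlow runtime and needs a couple of extra pip packages might carry files like these (the file names come from the slide; the specific contents are made-up examples):

```text
# runtime.txt -- selects one of the base runtimes listed earlier
python3-tensorflow

# requirements.txt -- extra python dependencies installed from pip
pandas
scikit-learn
```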
Slide 21
Behind the Scenes
• Two docker images are built for each project - one for CPU and
another for GPU
• Runtimes are also built using the same approach
• Manages compute instances
• Pools the compute resources to optimize resource consumption
• Uses a network file system to persist data and notebooks
• Automatic endpoint and HTTPS management
Slide 22
Sample Usage (1/4)
Slide 23
Sample Usage (2/4)
Slide 24
Sample Usage (3/4)
Slide 25
Sample Usage (4/4)
Slide 26
Discussion
Slide 27
Case Study 2:
Deploying Machine Learning Models
Slide 28
Challenges
• Designing and documenting APIs
• Running the service and configuring URL endpoints
• Scaling to meet the usage
• Client library to use the API
• Authentication
• Tracking the usage and performance
Slide 29
Traditional Approach
Slide 30
Observations
Running a Python function as an API is hard!
It doesn't have to be!!
Slide 31
Right Level of Abstraction!
Step 1: Write your function
# sq.py
def square(n):
    """Compute the square of a number."""
    return n * n
Slide 32
Step 2: Run it as an API
$ firefly sq.square
http://127.0.0.1:8000/
...
Firefly is the open-source library that we built to solve that
problem.
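The slides don't show firefly's internals, but the core idea - turn a plain Python function into an HTTP endpoint - can be sketched with nothing but the standard library. This is an illustration of the idea, not firefly's actual protocol; the path-based routing and JSON argument convention below are assumptions:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def square(n):
    """Compute the square of a number."""
    return n * n

# Registry of functions exposed over HTTP, keyed by URL path.
FUNCTIONS = {"square": square}

class FunctionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The function name comes from the URL path; arguments arrive as JSON.
        name = self.path.strip("/")
        length = int(self.headers.get("Content-Length", 0))
        kwargs = json.loads(self.rfile.read(length) or b"{}")
        result = FUNCTIONS[name](**kwargs)
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Serve square() at http://127.0.0.1:8000/square
    HTTPServer(("127.0.0.1", 8000), FunctionHandler).serve_forever()
```

A POST to `/square` with body `{"n": 4}` returns `16` - the same round trip the firefly client performs on the next slide.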
Slide 33
Step 3: Use it
>>> import firefly
>>> api = firefly.Client("http://127.0.0.1:8000/")
>>> api.square(n=4)
16
Out-of-the-box client library to access the API.
Slide 34
What about ML models?
Write your predict function and run it as an API.
# face_detection.py
import joblib
model = joblib.load("model.pkl")
def predict(image):
    ...
Slide 35
Integration
Write a config file in the project specifying what services to run.
services:
  - name: api
    function: face_detection.predict
    size: S2
Slide 36
Need Javascript support?
services:
  - name: api
    function: face_detection.predict
    size: S2
    cors_allow_origins: "*"
Slide 37
Need to scale up?
services:
  - name: api
    function: face_detection.predict
    size: S2
    cors_allow_origins: "*"
    scale: 4
Slide 38
The Push Button
The deploy command submits the code to the platform, which starts the
required services in that project.
$ roro deploy
...
Slide 39
Behind the Scenes
• It builds the docker images
• Starts the specified services
• Provides URL endpoints with HTTPS
Slide 40
Discussion
Slide 41
Good Design is Invisible!
Slide 42
Summary
• Making the data science team self-sufficient is key to their
productivity
• Optimize for developer experience
• The right level of abstraction is the key!