DevOps for Data Science

DevOps for Data Science
Experiences from building a cloud-based data science platform
...
Anand Chitipothu
rorodata

Who is Speaking?
Anand Chitipothu
@anandology
Co-founder and platform architect of @rorodata
Worked at Internet Archive & Strand Life Sciences
Advanced programming courses at @pipalacademy

Managing Data Science in Production is Hard!
• The tools and practices are not very mature
• Everyone ends up building their own solutions
• Building your own solution requires careful system architecture and complex DevOps

The Goal: The Effective Data Science Team
• The data science team is self-sufficient to build end-to-end ML applications
• No steep learning curve

How? ???

Inspiration from Origami

Flapping Bird - Crease Patterns

Takeaways
• Focus on the experience
• Right level of abstraction

Example: Heroku
To deploy a web application:
    $ git push heroku master

Case Studies
• Launching notebooks
• Deploying ML models as APIs

Case Study 1: Launching Notebooks

Launching Notebooks - Challenges
• Switching between different compute needs
• Installing required software dependencies
• Data storage
• GPU support

Abstractions
• Project
• Runtime
• Instance Size

Project
Manages all the notebooks, data and the specified software dependencies.

Runtime
Base software setup for the project.
• python3-tensorflow
• python3-keras
• python3-pytorch

Instance Size
Different instance sizes to pick from:
• S1 - 1 CPU core, 1 GB RAM
• S2 - 1 CPU core, 3.5 GB RAM
• M1 - 2 CPU cores, 15 GB RAM
• X1 - 64 CPU cores, 1024 GB RAM
• G1 - 4 CPU cores, 60 GB RAM, 1 K100 GPU

How to specify additional dependencies?
• runtime.txt: specifies the runtime
• environment.yml: conda environment file with Python dependencies
• requirements.txt: Python dependencies to be installed from pip
• apt.txt: system packages that need to be installed
• postBuild: script for custom needs
Writing a Dockerfile is too low-level.
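To make the abstraction concrete, here is a minimal sketch of how a platform could turn those spec files into a Dockerfile so that the user never writes one. It is an illustration only, not rorodata's actual build code: the function name render_dockerfile, the registry prefix, and the /project working directory are all assumptions made for this example.

```python
from typing import Iterable, List

# Hypothetical registry prefix; the platform's real image names are not given in the talk.
BASE_IMAGE_PREFIX = "registry.example.com/runtimes/"


def render_dockerfile(runtime: str, present_files: Iterable[str]) -> str:
    """Build Dockerfile text from the runtime name and the spec files found in a project."""
    present = set(present_files)
    # runtime.txt names the base image that everything else is layered on.
    lines: List[str] = [f"FROM {BASE_IMAGE_PREFIX}{runtime}", "WORKDIR /project"]

    if "apt.txt" in present:
        # System packages first, since Python packages may need them to build.
        lines += [
            "COPY apt.txt .",
            "RUN apt-get update && xargs -a apt.txt apt-get install -y",
        ]
    if "environment.yml" in present:
        lines += [
            "COPY environment.yml .",
            "RUN conda env update -n base -f environment.yml",
        ]
    if "requirements.txt" in present:
        lines += [
            "COPY requirements.txt .",
            "RUN pip install -r requirements.txt",
        ]
    if "postBuild" in present:
        # The custom script runs last so it can rely on everything installed above.
        lines += ["COPY postBuild .", "RUN chmod +x postBuild && ./postBuild"]

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile("python3-keras", ["requirements.txt", "apt.txt"]))
```

The point of the design is that each file maps to one well-understood build step, so the user only states what is needed and the ordering and Docker details stay with the platform.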

Behind the Scenes
• Two Docker images are built for each project: one for CPU and another for GPU
• Runtimes are also built using the same approach
• Manages compute instances
• Pools the compute resources to optimize resource consumption
• Uses a network file system to persist data and notebooks
• Automatic endpoint and HTTPS management

Sample Usage (1/4)

Sample Usage (2/4)

Sample Usage (3/4)

Sample Usage (4/4)

Discussion

Case Study 2: Deploying Machine Learning Models

Challenges
• Designing and documenting APIs
• Running the service and configuring URL endpoints
• Scaling to meet the usage
• Client library to use the API
• Authentication
• Tracking the usage and performance

Traditional Approach

Observations
Running a Python function as an API is hard!
It doesn't have to be!!

Right Level of Abstraction!
Step 1: Write your function
    # sq.py
    def square(n):
        """Compute the square of a number."""
        return n*n

Step 2: Run it as an API
    $ firefly sq.square
    http://127.0.0.1:8000/
    ...
Firefly is the open-source library that we built to solve that problem.

Step 3: Use it
    >>> import firefly
    >>> api = firefly.Client("http://127.0.0.1:8000/")
    >>> api.square(n=4)
    16
Out-of-the-box client library to access the API.
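For readers curious what "a function as an API" involves underneath, the three steps can be sketched with only the Python standard library. This is not Firefly's implementation or its wire protocol; the helper names make_handler, call and demo, and the choice of POSTing keyword arguments as a JSON object to /<function name>, are assumptions made for illustration.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def square(n):
    """Compute the square of a number."""
    return n * n


def make_handler(functions):
    """Create a handler class that dispatches POST /<name> to functions[name]."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            func = functions.get(self.path.strip("/"))
            if func is None:
                self.send_error(404, "unknown function")
                return
            # The request body is a JSON object holding the keyword arguments.
            length = int(self.headers.get("Content-Length", 0))
            kwargs = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(func(**kwargs)).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # suppress per-request logging to keep the example quiet

    return Handler


def call(host, port, name, **kwargs):
    """Minimal client: send keyword arguments as JSON, decode the JSON result."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request(
            "POST",
            f"/{name}",
            body=json.dumps(kwargs),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


def demo():
    """Serve square() on a free local port, call it once, then shut down."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), make_handler({"square": square}))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call("127.0.0.1", server.server_port, "square", n=4)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Even this toy version needs routing, argument decoding, response encoding and a matching client, which is the boilerplate a one-line command is meant to hide.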

What about ML models?
Write your predict function and run it as an API.
    # face_detection.py
    import joblib
    model = joblib.load("model.pkl")

    def predict(image):
        ...

Integration
Write a config file in the project specifying what services to run.
    services:
      - name: api
        function: face_detection.predict
        size: S2

Need JavaScript support?
    services:
      - name: api
        function: face_detection.predict
        size: S2
        cors_allow_origins: *

Need to scale up?
    services:
      - name: api
        function: face_detection.predict
        size: S2
        cors_allow_origins: *
        scale: 4
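One way to read the `function: face_detection.predict` entry is as a dotted "module.function" path that gets imported when the service starts. The sketch below shows that idea with the standard library. It is an assumption about how such a config might be interpreted, not the platform's code: ServiceSpec, parse_service, resolve and the default values are hypothetical, and math.sqrt stands in for face_detection.predict so the example runs anywhere.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class ServiceSpec:
    """One entry of the services list, already parsed into a dict."""
    name: str
    function: str                 # dotted path of the form "module.function"
    size: str = "S1"              # default is an assumption for this sketch
    scale: int = 1                # default is an assumption for this sketch
    cors_allow_origins: str = ""  # empty string means CORS is not enabled


def parse_service(entry: Dict[str, Any]) -> ServiceSpec:
    """Turn a config entry into a ServiceSpec; unknown keys raise TypeError."""
    return ServiceSpec(**entry)


def resolve(dotted_path: str) -> Callable[..., Any]:
    """Import the module part of the path and return the named callable."""
    module_name, _, func_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    func = getattr(importlib.import_module(module_name), func_name)
    if not callable(func):
        raise TypeError(f"{dotted_path!r} is not callable")
    return func


if __name__ == "__main__":
    # math.sqrt is a stand-in for face_detection.predict.
    spec = parse_service({"name": "api", "function": "math.sqrt", "size": "S2", "scale": 4})
    print(spec.name, spec.size, spec.scale, resolve(spec.function)(16))
```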

The Push Button
The deploy command submits the code to the platform, which starts the required services in that project.
    $ roro deploy
    ...

Behind the Scenes
• Builds the Docker images
• Starts the specified services
• Provides URL endpoints with HTTPS

Discussion

Good Design is Invisible!

Summary
• Making the data science team self-sufficient is key to their productivity
• Optimize for developer experience
• Right level of abstraction is the key!

Thank You! Anand Chitipothu @anandology https://rorodata.com/
