Slide 1

Slide 1 text

DevOps for Data Science Experiences from building a cloud-based data science platform ... Anand Chitipothu rorodata

Slide 2

Slide 2 text

Who is Speaking? Anand Chitipothu @anandology Co-founder and platform architect of @rorodata Worked at Internet Archive & Strand Life Sciences Advanced programming courses at @pipalacademy

Slide 3

Slide 3 text

DevOps for Data Science

Slide 4

Slide 4 text

Managing Data Science in Production is Hard! • The tools and practices are not very mature • Everyone ends up building their own solutions • Building your own solutions requires careful system architecture and complex devops

Slide 5

Slide 5 text

The Goal: The Effective Data Science Team • The data science team is self-sufficient to build end-to-end ML applications • No steep learning curve

Slide 6

Slide 6 text

How? ???

Slide 7

Slide 7 text

Inspiration from Origami

Slide 8

Slide 8 text

Flapping Bird - Crease Patterns

Slide 9

Slide 9 text

Takeaways • Focus on the experience • Right level of abstraction

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

No content

Slide 12

Slide 12 text

Example: Heroku

To deploy a web application:

$ git push heroku master

Slide 13

Slide 13 text

Case Studies • Launching notebooks • Deploying ML models as APIs

Slide 14

Slide 14 text

Case Study 1: Launching Notebooks

Slide 15

Slide 15 text

Launching Notebooks - Challenges • Switching between different compute needs • Installing required software dependencies • Data storage • GPU support

Slide 16

Slide 16 text

Abstractions • Project • Runtime • Instance Size

Slide 17

Slide 17 text

Project Manages all the notebooks, data, and the specified software dependencies.

Slide 18

Slide 18 text

Runtime Base software setup for the project. • python3-tensorflow • python3-keras • python3-pytorch

Slide 19

Slide 19 text

Instance Size Different instance sizes to pick from: • S1 - 1 CPU core, 1 GB RAM • S2 - 1 CPU core, 3.5 GB RAM • M1 - 2 CPU cores, 15 GB RAM • X1 - 64 CPU cores, 1024 GB RAM • G1 - 4 CPU cores, 60 GB RAM, 1 K100 GPU

Slide 20

Slide 20 text

How to specify additional dependencies? • runtime.txt: specifies the runtime • environment.yml: conda environment file with Python dependencies • requirements.txt: Python dependencies to be installed from pip • apt.txt: system packages that need to be installed • postBuild: script for custom needs Writing a Dockerfile is too low-level.

Slide 21

Slide 21 text

Behind the Scenes • Two Docker images are built for each project - one for CPU and another for GPU • Runtimes are also built using the same approach • Manages compute instances • Pools the compute resources to optimize resource consumption • Uses a network file system to persist data and notebooks • Automatic endpoint and HTTPS management

Slide 22

Slide 22 text

Sample Usage (1/4)

Slide 23

Slide 23 text

Sample Usage (2/4)

Slide 24

Slide 24 text

Sample Usage (3/4)

Slide 25

Slide 25 text

Sample Usage (4/4)

Slide 26

Slide 26 text

Discussion

Slide 27

Slide 27 text

Case Study 2: Deploying Machine Learning Models

Slide 28

Slide 28 text

Challenges • Designing and documenting APIs • Running the service and configuring URL endpoints • Scaling to meet the usage • Client library to use the API • Authentication • Tracking the usage and performance

Slide 29

Slide 29 text

Traditional Approach

Slide 30

Slide 30 text

Observations Running a Python function as an API is hard! It doesn't have to be!!

Slide 31

Slide 31 text

Right Level of Abstraction!

Step 1: Write your function

# sq.py
def square(n):
    """compute square of a number."""
    return n*n

Slide 32

Slide 32 text

Step 2: Run it as an API

$ firefly sq.square
http://127.0.0.1:8000/
...

Firefly is the open-source library that we built to solve that problem.

Slide 33

Slide 33 text

Step 3: Use it

>>> import firefly
>>> api = firefly.Client("http://127.0.0.1:8000/")
>>> api.square(n=4)
16

Out-of-the-box client library to access the API.

Slide 34

Slide 34 text

What about ML models? Write your predict function and run it as an API.

# face_detection.py
import joblib

model = joblib.load("model.pkl")

def predict(image):
    ...
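The essential pattern is: load the model once at import time, then serve a plain function that uses it. Here is a runnable stand-in that uses stdlib pickle and a trivial threshold "model" instead of joblib and a trained face detector; everything in it is illustrative, not the slide's actual model.

```python
import pickle
import tempfile

# Create a stand-in "model.pkl" so the example is self-contained.
# A real project would ship a trained model saved with joblib.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump({"threshold": 0.5}, f)
    model_path = f.name

# Load the model once, at import time, so every API call reuses it
# instead of paying the deserialization cost per request.
with open(model_path, "rb") as f:
    model = pickle.load(f)

def predict(score):
    """Return True when the score clears the model's threshold."""
    return score >= model["threshold"]
```

Because predict is an ordinary function, the same firefly command that served sq.square can serve it unchanged.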

Slide 35

Slide 35 text

Integration

Write a config file in the project specifying what services to run.

services:
  - name: api
    function: face_detection.predict
    size: S2

Slide 36

Slide 36 text

Need JavaScript support?

services:
  - name: api
    function: face_detection.predict
    size: S2
    cors_allow_origins: "*"

Slide 37

Slide 37 text

Need to scale up?

services:
  - name: api
    function: face_detection.predict
    size: S2
    cors_allow_origins: "*"
    scale: 4

Slide 38

Slide 38 text

The Push Button

The deploy command submits the code to the platform, and the platform starts the required services in that project.

$ roro deploy
...

Slide 39

Slide 39 text

Behind the Scenes • It builds the Docker images • Starts the specified services • Provides URL endpoints with HTTPS

Slide 40

Slide 40 text

Discussion

Slide 41

Slide 41 text

Good Design is Invisible!

Slide 42

Slide 42 text

Summary • Making the data science team self-sufficient is key to their productivity • Optimize for developer experience • The right level of abstraction is the key!

Slide 43

Slide 43 text

Thank You! Anand Chitipothu @anandology https://rorodata.com/