Slide 1

BUILDING A PREDICTION API AS A SERVICE
KYRYLO PEREVOZCHYKOV

Slide 2

WHO AM I
KYRYLO PEREVOZCHYKOV
‣ I work on an easy-to-use machine learning service that helps people build models and predict with them
‣ My daily tasks split roughly 70% development, 30% operations
‣ I call myself a software engineer

Slide 3

MACHINE LEARNING AS A SERVICE PIPELINE
DATA INTAKE → DATA TRANSFORM → BUILD MODEL → DEPLOY MODEL → PREDICT
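The five stages above can be sketched as plain functions chained together. This is a toy illustration only: the stage names mirror the slide, and the "model" (a mean predictor) and registry are stand-ins I've invented for the sketch.

```python
# Illustrative only: each MLaaS pipeline stage as a plain function.

def data_intake():
    # In a real service this reads from an upload, queue, or database.
    return [1.0, 2.0, 3.0, 4.0]

def data_transform(raw):
    # Toy transform: scale values to [0, 1].
    hi = max(raw)
    return [x / hi for x in raw]

def build_model(rows):
    # Toy "model": predict the mean of the training data.
    return {"mean": sum(rows) / len(rows)}

def deploy_model(model, registry):
    # Deployment here is just publishing to an in-memory registry.
    registry["current"] = model
    return registry

def predict(registry, x):
    # The toy model ignores x; a real model would use the features.
    return registry["current"]["mean"]

registry = deploy_model(build_model(data_transform(data_intake())), {})
print(predict(registry, x=0.5))  # 0.625
```

In a real deployment each arrow in the pipeline is a service or batch-job boundary, but the data-flow shape is the same.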

Slide 4

USE CASES
Inputs: Products clicked
Application: Product recommendation
Outputs: List of similar products

Slide 5

USE CASES
Inputs: Item (house, airline ticket)
Application: Price intelligence and forecasting
Outputs: Price trend and likelihood to change

Slide 6

USE CASES
Inputs: Transaction details
Application: Fraud detection
Outputs: Transaction approved/declined

Slide 7

USE CASES
Inputs: In-game activity
Application: Game engagement
Outputs: Change difficulty in the game

Slide 8

ML AS A SERVICE CHALLENGES
▸ Time constraints: run and respond fast enough
▸ Needs to be robust and "just work"

Slide 9

NOT ALL TIMES MATTER EQUALLY
▸ Time to respond
▸ Time to build and deploy

Slide 10

LATENCY FOR PREDICTIONS: WHAT IS FAST? (BABY DON'T HURT ME)
For what | Good enough
Recommendations | <300 ms
Ads, click-through rates | <70 ms
Fraud detection | <30 ms
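One way to keep a service honest about these budgets is to time every prediction and compare it to the target for its use case. A minimal sketch, with the budget numbers taken from the table above and the helper names (`timed_predict`, `within_budget`) being my own illustration:

```python
import time

# Latency budgets from the slide's table, in milliseconds.
BUDGET_MS = {
    "recommendations": 300,
    "ads_ctr": 70,
    "fraud_detection": 30,
}

def timed_predict(predict_fn, features):
    """Run a prediction and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = predict_fn(features)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

def within_budget(use_case, elapsed_ms):
    return elapsed_ms < BUDGET_MS[use_case]

# Trivial stand-in model that is certainly fast enough.
result, ms = timed_predict(lambda f: sum(f), [1, 2, 3])
print(f"result={result}, latency={ms:.3f} ms, "
      f"ok={within_budget('fraud_detection', ms)}")
```

In production the same measurement would feed a histogram (p50/p95/p99) rather than a single check, since tail latency is what violates budgets first.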

Slide 11

MODEL.PREDICT(DATA)
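`model.predict(data)` is the scikit-learn-style interface the whole service wraps. A self-contained sketch of what sits behind that call, using a hand-rolled one-dimensional least-squares model (the class is illustrative, not the service's actual model):

```python
# Minimal model exposing the fit/predict interface the slide alludes to.
# Simple 1-D linear regression via least squares, no dependencies.

class LinearModel:
    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        self.slope = (
            sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs)
        )
        self.intercept = my - self.slope * mx
        return self

    def predict(self, data):
        return [self.intercept + self.slope * x for x in data]

model = LinearModel().fit([0, 1, 2, 3], [1, 3, 5, 7])  # learns y = 2x + 1
print(model.predict([10]))  # [21.0]
```

Everything the rest of the deck discusses (loading, memory, routing, deployment) exists to serve this one call quickly and reliably.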

Slide 12

PREDICTION SERVER
[Diagram: CLIENT → HTTP (network IO) → SERVER (CPU, RAM) → DB / FILESYSTEM]

Slide 13

PREDICTION API AS A SERVICE: BOTTLENECKS EVERYWHERE
▸ Distributed filesystems are slow and unreliable
▸ Deserializing a model and loading it into RAM is expensive
▸ model.predict(data) is a CPU-bound operation
▸ A model can be a heavy, 1 GB object

Slide 14

PREDICTION WORKERS: PYTHON. PAIN.
▸ One (same) 1 GB model takes 16 GB across 16 processes
▸ Ten different 1 GB models take 10 GB * 16 = 160 GB
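The back-of-the-envelope math behind those numbers: CPython worker processes do not share deserialized model memory, so every process holds its own full copy of every model it serves. A one-liner makes the scaling explicit:

```python
# RAM footprint when each worker process loads every model independently
# (no sharing between CPython processes).

def worker_ram_gb(model_sizes_gb, n_processes):
    return sum(model_sizes_gb) * n_processes

# One 1 GB model across 16 worker processes:
print(worker_ram_gb([1], 16))       # 16
# Ten different 1 GB models across 16 worker processes:
print(worker_ram_gb([1] * 10, 16))  # 160
```

Copy-on-write forking or `multiprocessing.shared_memory` can reduce this in practice, but reference-count updates touch object pages and erode the savings for plain pickled Python objects.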

Slide 15

CUSTOMER RULE
▸ A limit of 10 models at once is not acceptable to some customers

Slide 16

POSSIBLE SOLUTIONS
▸ Horizontal scaling with a hash ring
▸ Benefits
▸ Custom routing (good UX)
▸ Gaps
▸ Growing complexity (operations effort)
▸ Utilization problem in the opposite use case: can't scale 1 model to N servers
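The hash-ring idea above routes each model ID to a specific server, so a model's memory cost is paid on one machine instead of all of them. A minimal consistent-hash-ring sketch (server names and virtual-node count are illustrative, not from the deck):

```python
import bisect
import hashlib

# Minimal consistent hash ring: each model ID maps to one server, and
# adding/removing a server only remaps keys near its ring positions.

class HashRing:
    def __init__(self, servers, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, model_id):
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(model_id)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["srv-a", "srv-b", "srv-c"])
print(ring.server_for("model-42"))  # same server on every call
```

The "gap" on the slide is visible here too: the mapping is one model to one server, so a single hot model cannot be fanned out to N servers without extra machinery (e.g. replication factors on the ring).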

Slide 17

POSSIBLE SOLUTIONS
▸ Flatten your models
▸ Simplify the serialization format (use something other than pickle)
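One reading of "flatten your models": instead of pickling a Python object graph, export only the learned numbers to a simple, portable format and rebuild prediction from them in a few lines. A sketch using JSON for a linear model (the format and helper names are my illustration, not the deck's actual solution):

```python
import json

# Flattened model: just the coefficients, in a language-agnostic format.

def flatten(intercept, coefs):
    return json.dumps({"intercept": intercept, "coefs": coefs})

def predict_from_flat(blob, features):
    model = json.loads(blob)  # cheap to parse, safe, no pickle
    return model["intercept"] + sum(
        c * x for c, x in zip(model["coefs"], features)
    )

blob = flatten(0.5, [2.0, -1.0])
print(predict_from_flat(blob, [3.0, 1.0]))  # 0.5 + 6.0 - 1.0 = 5.5
```

Flat formats load orders of magnitude faster than unpickling deep object graphs, avoid pickle's arbitrary-code-execution risk, and can be read from non-Python services.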

Slide 18

ZERO-DOWNTIME DEPLOYMENT AND SCALING
▸ Blue-green deployment strategy with cache warming
▸ AMIs plus autoscaling (pre-baked AMIs make scale-out fast)
▸ K8s is not cost-effective
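The blue-green-with-cache-warming idea can be reduced to one sketch: load and exercise the new ("green") model before an atomic reference swap flips traffic to it. The `Deployment` class and warm-up inputs are illustrative assumptions:

```python
# Sketch of blue-green model deployment with cache warming.

class Deployment:
    def __init__(self, model):
        self.live = model      # "blue": currently serving traffic

    def deploy(self, new_model, warmup_inputs):
        # Warm the green model (populate caches, trigger lazy loads)
        # before any real traffic reaches it.
        for x in warmup_inputs:
            new_model(x)
        self.live = new_model  # atomic reference swap: zero downtime

    def predict(self, x):
        return self.live(x)

blue = lambda x: x + 1
green = lambda x: x * 2

d = Deployment(blue)
print(d.predict(10))                      # 11: blue serving
d.deploy(green, warmup_inputs=[0, 1, 2])
print(d.predict(10))                      # 20: green serving, pre-warmed
```

Requests in flight never see a half-loaded model: they either hit the old reference or the new one, and the new one has already paid its cold-start cost.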

Slide 19

MONITORING PREDICTIONS
▸ Protect against model deterioration
▸ Alert on deterioration or functional violations (500s, high latency)
▸ Know when to retire and replace an important model
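A crude but concrete way to catch deterioration: keep a sliding window of recent predictions and alert when the window mean drifts too far from a baseline recorded at deployment time. The window size and tolerance below are assumptions for the sketch, not values from the talk:

```python
from collections import deque

# Sketch: alert when recent predictions drift from a deployment-time baseline.

class PredictionMonitor:
    def __init__(self, baseline_mean, window=100, tolerance=0.25):
        self.baseline = baseline_mean
        self.window = deque(maxlen=window)  # last N predictions
        self.tolerance = tolerance          # allowed relative deviation

    def record(self, prediction):
        self.window.append(prediction)

    def drifted(self):
        if not self.window:
            return False
        mean = sum(self.window) / len(self.window)
        return abs(mean - self.baseline) > self.tolerance * abs(self.baseline)

mon = PredictionMonitor(baseline_mean=10.0)
for p in [9.8, 10.1, 10.0]:
    mon.record(p)
print(mon.drifted())  # False: close to baseline

for p in [20.0] * 100:  # a distribution shift fills the window
    mon.record(p)
print(mon.drifted())  # True: window mean far from 10.0
```

Real monitoring would track full distributions (and input features, not just outputs), but the principle is the same: compare live behavior against a frozen reference and alert on divergence.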

Slide 20

THANKS! QUESTIONS? ‣ github.com/Axik ‣ twitter.com/axique