
Building Prediction API as a Service

Kyrylo Perevozchikov
November 04, 2017


Transcript

  1. BUILDING PREDICTION API AS A SERVICE KYRYLO PEREVOZCHYKOV

  2. WHO AM I KYRYLO PEREVOZCHYKOV ‣ Easy-to-use Machine Learning service

    that helps people build models and run predictions on them ‣ My daily tasks split 70% development, 30% operations. ‣ I call myself a software engineer
  3. MACHINE LEARNING AS A SERVICE PIPELINE: DATA INTAKE → DATA TRANSFORM → BUILD MODEL → DEPLOY MODEL → PREDICT
  4. USE CASES. Inputs: Products clicked. Application: Product recommendation. Outputs: List of similar products.
  5. USE CASES. Inputs: Item (house, airline ticket). Application: Price intelligence and forecasting. Outputs: Price trend and likelihood to change.
  6. USE CASES. Inputs: Transaction details. Application: Fraud detection. Outputs: Transaction approved/declined.
  7. USE CASES. Inputs: In-game activity. Application: Game engagement. Outputs: Change difficulty in the game.
  8. ML AS A SERVICE CHALLENGES ▸ Time constraints — run and respond fast enough ▸ Needs to be robust and “just work”
  9. NOT ALL TIMES MATTER ▸ Time to respond ▸ Time to build and deploy
  10. LATENCY FOR PREDICTIONS: WHAT IS FAST? (BABY DON’T HURT ME)

    For what                     Good enough
    Recommendations              <300 ms
    Ads, click-through rates     <70 ms
    Fraud detection              <30 ms
  11. MODEL.PREDICT(DATA)

  12. PREDICTION SERVER: CLIENT ↔ SERVER over HTTP (network IO); server resources: CPU, RAM, DB, FILESYSTEM
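Slides 11 and 12 reduce the whole service to model.predict(data) running behind an HTTP server. A minimal sketch of that request path, using only the standard library; the DummyModel and the JSON request shape are stand-ins of my own, since the talk does not specify an API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DummyModel:
    """Stand-in for a real trained model (hypothetical)."""

    def predict(self, features):
        # Toy rule: sum of features as the "prediction".
        return sum(features)


MODEL = DummyModel()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"features": [...]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = MODEL.predict(payload["features"])
        body = json.dumps({"prediction": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve_once():
    # Port 0 asks the OS for a free port; serve from a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_once()
    port = server.server_address[1]
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": [1, 2, 3]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))  # {'prediction': 6}
    server.shutdown()
```

Every resource the slide lists (CPU, RAM, filesystem, network IO) sits on this path, which is why the next slide finds bottlenecks everywhere.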
  13. PREDICTION API AS A SERVICE: BOTTLENECKS EVERYWHERE ▸ Distributed filesystems are slow and unreliable ▸ Deserializing a model and loading it into RAM is expensive ▸ model.predict(data) is a CPU-bound operation ▸ a model can be a heavy 1 GB object.
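The deck does not prescribe a fix on this slide, but a common mitigation for the "deserializing is expensive" bottleneck is keeping hot models in an in-process LRU cache, so repeated requests for the same model skip the filesystem and deserialization cost. A sketch with a pluggable loader (the class and its capacity are my own illustration):

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` deserialized models in RAM, evicting the
    least recently used one. A sketch: a real service would also need
    byte-size based limits and locking for multi-threaded access."""

    def __init__(self, loader, capacity=10):
        self.loader = loader          # model_id -> model object (e.g. a pickle.load wrapper)
        self.capacity = capacity
        self._cache = OrderedDict()
        self.loads = 0                # how often we paid the deserialization cost

    def get(self, model_id):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)   # mark as recently used
            return self._cache[model_id]
        self.loads += 1
        model = self.loader(model_id)           # the expensive step
        self._cache[model_id] = model
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)     # evict least recently used
        return model
```

The capacity bound is exactly what slide 15 pushes back on: a fixed cap on simultaneously hot models is visible to customers.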
  14. PREDICTION WORKERS: PYTHON. PAIN. ▸ One (same) 1 GB model takes 16 GB across 16 processes. ▸ Ten different 1 GB models take 10 GB × 16 = 160 GB.
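One way to avoid paying for N copies of the same model across N worker processes, which the talk does not spell out but which is the pattern behind preload-then-fork deployments (e.g. gunicorn's preload mode), is to load the model once in the parent and fork workers so the OS shares the pages copy-on-write. A POSIX-only sketch; note that CPython's reference counting writes to object headers, so pages holding many small Python objects get copied anyway, and large flat buffers share best:

```python
import multiprocessing as mp

# Loaded ONCE, before workers are forked. Forked children inherit the same
# memory pages; the OS copies a page only when a process writes to it
# (copy-on-write). A ~1 MB stand-in for the slide's 1 GB model.
BIG_MODEL = {"weights": bytes(1024 * 1024)}


def worker(out_queue):
    # No per-worker load: BIG_MODEL is already in this process's address space.
    out_queue.put(len(BIG_MODEL["weights"]))


if __name__ == "__main__":
    ctx = mp.get_context("fork")      # fork start method: POSIX only
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    sizes = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(sizes)  # [1048576, 1048576, 1048576, 1048576]
```

This attacks the "same model × 16 processes" case; it does nothing for the "ten different models" case, which is what the next slides address.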
  15. CUSTOMER RULE ▸ A limit of 10 models at once is not acceptable to some customers
  16. POSSIBLE SOLUTIONS ▸ Horizontal scaling with a hash ring ▸ Benefits ▸ Custom routing (good UX) ▸ Gaps ▸ Growing complexity (operations effort) ▸ Poor utilization in the opposite use case — can’t scale 1 model to N servers
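The "hash ring" option maps each model id to one server with consistent hashing, so a model's deserialized copy lives on exactly one box and adding a server only remaps the keys adjacent to it on the ring. A minimal sketch (virtual-node count and naming are my own choices, not from the talk):

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: model ids map to servers; adding or removing
    a server only remaps the keys that fall into its arc of the ring."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._ring = []                     # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # Stable 64-bit position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, server):
        # Many virtual nodes per server smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def route(self, model_id):
        # First ring point clockwise from the key's hash owns the key.
        h = self._hash(model_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

This makes the slide's "gap" concrete: the routing function sends one model id to exactly one server, so a single hot model cannot be spread across N servers.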
  17. POSSIBLE SOLUTIONS ▸ Flatten your models ▸ Simplify the serialization format (use something other than pickle)
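As one reading of "flatten your models" and "something other than pickle": store the model's parameters as a raw numeric buffer with a tiny JSON header, so loading is a cheap bulk read instead of reconstructing an object graph. The file layout below is my own illustrative invention, stdlib only:

```python
import array
import json
import struct


def save_flat(path, coefficients, intercept):
    """Write a linear model as: 4-byte header length, JSON header, then
    raw float64 coefficients (native byte order; a real format would pin
    endianness explicitly)."""
    header = json.dumps({"n": len(coefficients), "intercept": intercept}).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        array.array("d", coefficients).tofile(f)


def load_flat(path):
    """Read the header, then bulk-load the coefficient buffer."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(hlen))
        coeffs = array.array("d")
        coeffs.fromfile(f, header["n"])
    return list(coeffs), header["intercept"]
```

Unlike pickle, a flat format like this also cannot execute arbitrary code on load, and the buffer can be memory-mapped or shared between processes.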
  18. ZERO-DOWNTIME DEPLOYMENT AND SCALING ▸ Blue-green deployment strategy with cache warming ▸ Pre-baked AMIs with autoscaling (for fast scale-out) ▸ K8s is not cost-effective
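The "cache warming" half of the blue-green strategy can be read as: before flipping traffic to the green fleet, fire synthetic predictions at it until latency settles, so the first real user never hits a cold model cache. A sketch with illustrative thresholds of my own choosing:

```python
import time


def warm_up(predict, sample, max_requests=50, target_ms=30.0, window=5):
    """Send synthetic predictions to a fresh instance until the last
    `window` calls all finish under `target_ms`; returns True when warm.
    Thresholds here are illustrative, not numbers from the talk."""
    recent = []
    for _ in range(max_requests):
        start = time.perf_counter()
        predict(sample)
        elapsed_ms = (time.perf_counter() - start) * 1000
        recent = (recent + [elapsed_ms])[-window:]
        if len(recent) == window and max(recent) < target_ms:
            return True     # cache is hot; safe to switch traffic over
    return False            # never settled; keep traffic on the old fleet
```

In a blue-green switch this runs against the green fleet, and only a True result triggers the load-balancer flip; a False result leaves blue serving.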
  19. MONITORING PREDICTIONS ▸ Protect against model deterioration ▸ Alert on deterioration or functional violations (500s, high latency) ▸ Retire and replace an important model
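The "alert on functional violations (500s, high latency)" bullet can be sketched as a rolling window over recent requests that flags when the error rate or tail latency crosses a threshold. Window size and thresholds below are illustrative defaults, not numbers from the talk:

```python
from collections import deque


class PredictionMonitor:
    """Rolling window over recent prediction requests; raises the alerts
    slide 19 mentions: too many 500s or high tail latency."""

    def __init__(self, window=1000, max_error_rate=0.01, max_p99_ms=300.0):
        self.requests = deque(maxlen=window)   # (status, latency_ms) pairs
        self.max_error_rate = max_error_rate
        self.max_p99_ms = max_p99_ms

    def record(self, status, latency_ms):
        self.requests.append((status, latency_ms))

    def alerts(self):
        if not self.requests:
            return []
        out = []
        errors = sum(1 for s, _ in self.requests if s >= 500)
        if errors / len(self.requests) > self.max_error_rate:
            out.append("error_rate")
        latencies = sorted(ms for _, ms in self.requests)
        p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
        if p99 > self.max_p99_ms:
            out.append("latency_p99")
        return out
```

This only covers functional violations; detecting model deterioration proper also needs the prediction distribution itself tracked against a baseline, which is out of scope for this sketch.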
  20. THANKS! QUESTIONS? ‣ github.com/Axik ‣ twitter.com/axique