
Building Prediction API as a Service

Kyrylo Perevozchikov
November 04, 2017


Transcript

  1. BUILDING PREDICTION API AS A SERVICE KYRYLO PEREVOZCHYKOV

  2. WHO AM I KYRYLO PEREVOZCHYKOV ‣ Easy-to-use Machine Learning service

    that helps people build models and run predictions on them ‣ My daily tasks split 70% development, 30% operations. ‣ I call myself a software engineer
  3. MACHINE LEARNING AS A SERVICE PIPELINE: DATA INTAKE → DATA TRANSFORM → BUILD MODEL → DEPLOY MODEL → PREDICT
  4. USE CASES. Inputs: Products clicked. Application: Product recommendation. Outputs: List of similar products.
  5. USE CASES. Inputs: Item (house, airline ticket). Application: Price intelligence and forecasting. Outputs: Price trend and likelihood to change.
  6. USE CASES. Inputs: Transaction details. Application: Fraud detection. Outputs: Transaction approved/declined.
  7. USE CASES. Inputs: In-game activity. Application: Game engagement. Outputs: Change difficulty in the game.
  8. ML AS A SERVICE CHALLENGES ▸ Time constraints — run and respond fast enough ▸ Needs to be robust and “just work”
  9. NOT ALL TIMES MATTER ▸ Time to respond ▸ Time to build and deploy
  10. LATENCY FOR PREDICTIONS: WHAT IS FAST? (BABY DON’T HURT ME)

    For what                     Good enough
    Recommendations              <300 ms
    Ads, click-through rates     <70 ms
    Fraud detection              <30 ms
  11. MODEL.PREDICT(DATA)

  12. PREDICTION SERVER: CLIENT ↔ SERVER over HTTP (network IO); server resources: CPU, RAM, DB, FILESYSTEM
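Slides 11 and 12 reduce the whole service to model.predict(data) running behind an HTTP server. A minimal sketch of that request path, using only the standard library; the DummyModel and the JSON request shape are stand-ins of my own, since the talk does not specify an API:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DummyModel:
    """Stand-in for a real trained model (hypothetical)."""

    def predict(self, features):
        # Toy rule: sum of features as the "prediction".
        return sum(features)


MODEL = DummyModel()


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body: {"features": [...]}
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        result = MODEL.predict(payload["features"])
        body = json.dumps({"prediction": result}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve_once():
    # Port 0 asks the OS for a free port; serve from a background thread.
    server = ThreadingHTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = serve_once()
    port = server.server_address[1]
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/predict",
        data=json.dumps({"features": [1, 2, 3]}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read()))  # {'prediction': 6}
    server.shutdown()
```

Every resource the slide lists (CPU, RAM, filesystem, network IO) sits on this path, which is why the next slide finds bottlenecks everywhere.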
  13. PREDICTION API AS A SERVICE: BOTTLENECKS EVERYWHERE ▸ Distributed filesystems are slow and unreliable ▸ Deserializing a model and loading it into RAM is expensive ▸ model.predict(data) is a CPU-bound operation ▸ a model can be a heavy 1 GB object.
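The deck does not prescribe a fix on this slide, but a common mitigation for the "deserializing is expensive" bottleneck is keeping hot models in an in-process LRU cache, so repeated requests for the same model skip the filesystem and deserialization cost. A sketch with a pluggable loader (the class and its capacity are my own illustration):

```python
from collections import OrderedDict


class ModelCache:
    """Keep at most `capacity` deserialized models in RAM, evicting the
    least recently used one. A sketch: a real service would also need
    byte-size based limits and locking for multi-threaded access."""

    def __init__(self, loader, capacity=10):
        self.loader = loader          # model_id -> model object (e.g. a pickle.load wrapper)
        self.capacity = capacity
        self._cache = OrderedDict()
        self.loads = 0                # how often we paid the deserialization cost

    def get(self, model_id):
        if model_id in self._cache:
            self._cache.move_to_end(model_id)   # mark as recently used
            return self._cache[model_id]
        self.loads += 1
        model = self.loader(model_id)           # the expensive step
        self._cache[model_id] = model
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)     # evict least recently used
        return model
```

The capacity bound is exactly what slide 15 pushes back on: a fixed cap on simultaneously hot models is visible to customers.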
  14. PREDICTION WORKERS: PYTHON. PAIN. ▸ One (same) 1 GB model takes 16 GB across 16 processes. ▸ Ten different 1 GB models take 10 GB × 16 = 160 GB.
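One way to avoid paying for N copies of the same model across N worker processes, which the talk does not spell out but which is the pattern behind preload-then-fork deployments (e.g. gunicorn's preload mode), is to load the model once in the parent and fork workers so the OS shares the pages copy-on-write. A POSIX-only sketch; note that CPython's reference counting writes to object headers, so pages holding many small Python objects get copied anyway, and large flat buffers share best:

```python
import multiprocessing as mp

# Loaded ONCE, before workers are forked. Forked children inherit the same
# memory pages; the OS copies a page only when a process writes to it
# (copy-on-write). A ~1 MB stand-in for the slide's 1 GB model.
BIG_MODEL = {"weights": bytes(1024 * 1024)}


def worker(out_queue):
    # No per-worker load: BIG_MODEL is already in this process's address space.
    out_queue.put(len(BIG_MODEL["weights"]))


if __name__ == "__main__":
    ctx = mp.get_context("fork")      # fork start method: POSIX only
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    sizes = [queue.get() for _ in procs]
    for p in procs:
        p.join()
    print(sizes)  # [1048576, 1048576, 1048576, 1048576]
```

This attacks the "same model × 16 processes" case; it does nothing for the "ten different models" case, which is what the next slides address.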
  15. CUSTOMER RULE ▸ A limit of 10 models at once is not acceptable to some customers
  16. POSSIBLE SOLUTIONS ▸ Horizontal scaling with a hash ring ▸ Benefits ▸ Custom routing (good UX) ▸ Gaps ▸ Growing complexity (operations effort) ▸ Poor utilization in the opposite use case — can’t scale 1 model to N servers
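The "hash ring" option maps each model id to one server with consistent hashing, so a model's deserialized copy lives on exactly one box and adding a server only remaps the keys adjacent to it on the ring. A minimal sketch (virtual-node count and naming are my own choices, not from the talk):

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring: model ids map to servers; adding or removing
    a server only remaps the keys that fall into its arc of the ring."""

    def __init__(self, servers, vnodes=64):
        self.vnodes = vnodes
        self._ring = []                     # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key):
        # Stable 64-bit position on the ring.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add(self, server):
        # Many virtual nodes per server smooth out the key distribution.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def route(self, model_id):
        # First ring point clockwise from the key's hash owns the key.
        h = self._hash(model_id)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

This makes the slide's "gap" concrete: the routing function sends one model id to exactly one server, so a single hot model cannot be spread across N servers.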
  17. POSSIBLE SOLUTIONS ▸ Flatten your models ▸ Simplify the serialization format (use something other than pickle)
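As one reading of "flatten your models" and "something other than pickle": store the model's parameters as a raw numeric buffer with a tiny JSON header, so loading is a cheap bulk read instead of reconstructing an object graph. The file layout below is my own illustrative invention, stdlib only:

```python
import array
import json
import struct


def save_flat(path, coefficients, intercept):
    """Write a linear model as: 4-byte header length, JSON header, then
    raw float64 coefficients (native byte order; a real format would pin
    endianness explicitly)."""
    header = json.dumps({"n": len(coefficients), "intercept": intercept}).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        array.array("d", coefficients).tofile(f)


def load_flat(path):
    """Read the header, then bulk-load the coefficient buffer."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(hlen))
        coeffs = array.array("d")
        coeffs.fromfile(f, header["n"])
    return list(coeffs), header["intercept"]
```

Unlike pickle, a flat format like this also cannot execute arbitrary code on load, and the buffer can be memory-mapped or shared between processes.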
  18. ZERO-DOWNTIME DEPLOYMENT AND SCALING ▸ Blue-green deployment strategy with cache warming ▸ Pre-baked AMIs with autoscaling (for fast scale-out) ▸ K8s is not cost-effective
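The "cache warming" half of the blue-green strategy can be read as: before flipping traffic to the green fleet, fire synthetic predictions at it until latency settles, so the first real user never hits a cold model cache. A sketch with illustrative thresholds of my own choosing:

```python
import time


def warm_up(predict, sample, max_requests=50, target_ms=30.0, window=5):
    """Send synthetic predictions to a fresh instance until the last
    `window` calls all finish under `target_ms`; returns True when warm.
    Thresholds here are illustrative, not numbers from the talk."""
    recent = []
    for _ in range(max_requests):
        start = time.perf_counter()
        predict(sample)
        elapsed_ms = (time.perf_counter() - start) * 1000
        recent = (recent + [elapsed_ms])[-window:]
        if len(recent) == window and max(recent) < target_ms:
            return True     # cache is hot; safe to switch traffic over
    return False            # never settled; keep traffic on the old fleet
```

In a blue-green switch this runs against the green fleet, and only a True result triggers the load-balancer flip; a False result leaves blue serving.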
  19. MONITORING PREDICTIONS ▸ Protect against model deterioration ▸ Alert on deterioration or functional violations (500s, high latency) ▸ Retire and replace an important model
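The "alert on functional violations (500s, high latency)" bullet can be sketched as a rolling window over recent requests that flags when the error rate or tail latency crosses a threshold. Window size and thresholds below are illustrative defaults, not numbers from the talk:

```python
from collections import deque


class PredictionMonitor:
    """Rolling window over recent prediction requests; raises the alerts
    slide 19 mentions: too many 500s or high tail latency."""

    def __init__(self, window=1000, max_error_rate=0.01, max_p99_ms=300.0):
        self.requests = deque(maxlen=window)   # (status, latency_ms) pairs
        self.max_error_rate = max_error_rate
        self.max_p99_ms = max_p99_ms

    def record(self, status, latency_ms):
        self.requests.append((status, latency_ms))

    def alerts(self):
        if not self.requests:
            return []
        out = []
        errors = sum(1 for s, _ in self.requests if s >= 500)
        if errors / len(self.requests) > self.max_error_rate:
            out.append("error_rate")
        latencies = sorted(ms for _, ms in self.requests)
        p99 = latencies[min(len(latencies) - 1, int(0.99 * len(latencies)))]
        if p99 > self.max_p99_ms:
            out.append("latency_p99")
        return out
```

This only covers functional violations; detecting model deterioration proper also needs the prediction distribution itself tracked against a baseline, which is out of scope for this sketch.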
  20. THANKS! QUESTIONS? ‣ github.com/Axik ‣ twitter.com/axique