
Building Prediction API as a Service

Kyrylo Perevozchikov
November 04, 2017

Transcript

  1. WHO AM I: KYRYLO PEREVOZCHYKOV
    ‣ Easy-to-use machine learning service that helps people build models and make predictions with them
    ‣ My daily work is split 70% development, 30% operations
    ‣ I call myself a software engineer
  2. MACHINE LEARNING PIPELINE: MACHINE LEARNING AS A SERVICE PIPELINE
    ▸ Data intake → Data transform → Build model → Deploy model → Predict
  3. USE CASES
    ▸ Inputs: item (house, airline ticket)
    ▸ Application: price intelligence and forecasting
    ▸ Outputs: price trend and likelihood to change
  4. ML AS A SERVICE: CHALLENGES
    ▸ Time constraints: run and respond fast enough
    ▸ Needs to be robust and “just work”
  5. LATENCY FOR PREDICTIONS: WHAT IS FAST? (BABY DON’T HURT ME)
    Good-enough latency by use case:
    ▸ Recommendations: < 300 ms
    ▸ Ads, click-through rates: < 70 ms
    ▸ Fraud detection: < 30 ms
  6. PREDICTION API AS A SERVICE: BOTTLENECKS EVERYWHERE
    ▸ Distributed file systems are slow and unreliable
    ▸ Deserializing a model and loading it into RAM is expensive
    ▸ model.predict(data) is a CPU-bound operation
    ▸ A model can be a heavy, 1 GB object
  7. PREDICTION WORKERS: PYTHON. PAIN.
    ▸ The same 1 GB model takes 16 GB across 16 processes, since each process holds its own copy
    ▸ Ten different models take 10 GB × 16 = 160 GB
  8. CUSTOMER RULE
    ▸ A limit of 10 models at once is not acceptable to some customers
  9. POSSIBLE SOLUTIONS
    ▸ Horizontal scaling with a hash ring
    ▸ Benefits
      ▸ Custom routing (good UX)
    ▸ Gaps
      ▸ Growing complexity (more operations effort)
      ▸ Poor utilization in the opposite use case: one model can’t be scaled across N servers
  10. ZERO-DOWNTIME DEPLOYMENT AND SCALING
    ▸ Blue-green deployment strategy with cache warming
    ▸ AMIs with autoscaling (for fast scale-out)
    ▸ K8s is not cost-effective
  11. MONITORING PREDICTIONS
    ▸ Protect against model deterioration
    ▸ Alert on deterioration or functional violations (HTTP 500 errors, high latency)
    ▸ Retire and replace an important model
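Slide 2 lists the five stages of the pipeline. The sketch below walks a toy dataset through those stages as plain functions, purely to make the data flow concrete. The CSV format, the in-memory `MODEL_REGISTRY`, and the mean-price "model" are hypothetical stand-ins, not the service's actual components.

```python
import csv
import io
from statistics import mean

# Hypothetical deployment target: model_id -> model (a real service would use storage)
MODEL_REGISTRY = {}


def intake(raw_csv: str) -> list[dict]:
    """Data intake: parse raw CSV text into rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows: list[dict]) -> list[float]:
    """Data transform: keep only the numeric price column."""
    return [float(r["price"]) for r in rows]


def build_model(prices: list[float]) -> dict:
    """Build model: a toy baseline that always predicts the mean price."""
    return {"mean_price": mean(prices)}


def deploy_model(model_id: str, model: dict) -> None:
    """Deploy model: make it available to the prediction step."""
    MODEL_REGISTRY[model_id] = model


def predict(model_id: str) -> float:
    """Predict: look up the deployed model and apply it."""
    return MODEL_REGISTRY[model_id]["mean_price"]


raw = "item,price\nticket-a,100\nticket-b,200\nticket-c,300\n"
deploy_model("tickets-v1", build_model(transform(intake(raw))))
print(predict("tickets-v1"))  # 200.0
```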
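Slide 4's first challenge is responding within a time limit. One common way to enforce a limit, assumed here rather than taken from the talk, is to wait for a prediction only until a deadline and return a fallback answer otherwise. `predict_with_deadline`, `slow_model`, and `fast_model` are hypothetical names. In CPython, threads do not make a CPU-bound `predict()` faster; they only let the caller stop waiting.

```python
import concurrent.futures
import time

# One shared pool; creating a pool per request would add latency of its own.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def predict_with_deadline(predict_fn, data, timeout_s, fallback):
    """Return predict_fn(data), or `fallback` if it misses the deadline.

    The timed-out call is not cancelled; it keeps running in its worker thread.
    """
    future = _POOL.submit(predict_fn, data)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback


def slow_model(data):
    time.sleep(0.5)  # simulate a heavy or stalled prediction
    return "real"


def fast_model(data):
    return "real"


print(predict_with_deadline(fast_model, {}, 1.0, "fallback"))   # real
print(predict_with_deadline(slow_model, {}, 0.05, "fallback"))  # fallback
```

What the fallback should be (a cached answer, a default, or an error) depends on the use case and is not specified on the slide.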
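The budgets on slide 5 only become useful once they are compared against measured latencies. A minimal sketch, assuming the comparison is made on the 99th percentile (the slide does not say which statistic to use); `BUDGET_MS`, `percentile`, and `within_budget` are illustrative names, and the sample latencies are made up.

```python
import math

# "Good enough" budgets from slide 5, in milliseconds.
BUDGET_MS = {
    "recommendations": 300,
    "ads_ctr": 70,
    "fraud_detection": 30,
}


def percentile(samples, q):
    """Nearest-rank percentile (q in (0, 100]) of a non-empty list."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def within_budget(use_case, latencies_ms, q=99):
    """True if the q-th percentile latency is under the use case's budget."""
    return percentile(latencies_ms, q) < BUDGET_MS[use_case]


samples = [12, 15, 18, 22, 25, 28, 35, 40, 55, 65]
print(percentile(samples, 99))                    # 65
print(within_budget("ads_ctr", samples))          # True
print(within_budget("fraud_detection", samples))  # False
```

Checking a high percentile rather than the mean matters here: a mean of about 32 ms would hide the 55 and 65 ms outliers that break a fraud-detection budget.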
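Slide 6 names two costs that happen before any prediction: reading the model from a slow, unreliable distributed file system and deserializing it. A minimal sketch, assuming models are stored as pickled blobs returned by a caller-supplied `fetch(model_id)` function (hypothetical), retries transient `OSError`s and keeps each deserialized model in process memory so that cost is paid once per process.

```python
import pickle
import time

_loaded = {}  # model_id -> deserialized model held in this process's RAM


def fetch_with_retries(fetch, model_id, attempts=3, backoff_s=0.1):
    """Call fetch(model_id) -> bytes, retrying OSError with linear backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(model_id)
        except OSError:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)


def get_model(fetch, model_id):
    """Fetch and unpickle a model once per process; later calls reuse it."""
    if model_id not in _loaded:
        blob = fetch_with_retries(fetch, model_id)
        # Only unpickle blobs from a trusted store: pickle can run arbitrary code.
        _loaded[model_id] = pickle.loads(blob)
    return _loaded[model_id]


calls = {"n": 0}


def flaky_fetch(model_id):
    """Hypothetical storage client whose first read times out."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError("distributed FS timeout")
    return pickle.dumps({"id": model_id, "weights": [0.1, 0.2]})


m1 = get_model(flaky_fetch, "m1")
m2 = get_model(flaky_fetch, "m1")
print(m1 is m2, calls["n"])  # True 2
```

This per-process cache is exactly what produces the memory multiplication described on slide 7: every worker process ends up with its own copy.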
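The numbers on slide 7 follow from each worker process holding its own copy of every model it serves. A small helper makes the arithmetic explicit; `max_models_per_host` and the 160 GB host size are illustrative assumptions, not figures from the talk.

```python
def worker_memory_gb(model_sizes_gb, processes):
    """RAM used when every process keeps its own copy of every model."""
    return sum(model_sizes_gb) * processes


def max_models_per_host(host_ram_gb, model_size_gb, processes):
    """How many equally sized models fit if each is copied into every process."""
    return host_ram_gb // (model_size_gb * processes)


print(worker_memory_gb([1], 16))        # 16   (one 1 GB model, 16 processes)
print(worker_memory_gb([1] * 10, 16))   # 160  (ten 1 GB models, 16 processes)
print(max_models_per_host(160, 1, 16))  # 10   (hypothetical 160 GB host)
```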
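Slide 8 does not say how the 10-model limit is enforced. One plausible policy, assumed here, is a per-worker LRU cache: once the limit is reached, loading another model evicts the least recently used one, so a customer cycling through more than 10 models keeps paying the load cost from slide 6. `BoundedModelCache` and its `loader` are hypothetical.

```python
from collections import OrderedDict


class BoundedModelCache:
    """Keep at most `capacity` models in RAM, evicting the least recently used."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader        # model_id -> model (slow: fetch + deserialize)
        self._models = OrderedDict()
        self.loads = 0              # how many times the load cost was paid

    def get(self, model_id):
        if model_id in self._models:
            self._models.move_to_end(model_id)  # mark as most recently used
            return self._models[model_id]
        model = self.loader(model_id)
        self.loads += 1
        self._models[model_id] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)    # evict the least recently used
        return model


cache = BoundedModelCache(capacity=10, loader=lambda mid: f"model-{mid}")
for i in range(11):  # the 11th model evicts model 0
    cache.get(i)
cache.get(0)         # model 0 has to be loaded again
print(cache.loads)   # 12
```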
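Slide 9's first option routes each model to a fixed server through a hash ring. The sketch below is a standard consistent-hash ring with virtual nodes; the BLAKE2b hash, 100 virtual nodes per server, and the worker names are assumptions. It also reproduces the gap the slide mentions: a given model ID always maps to exactly one server, so a single hot model cannot be spread across N servers.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping model IDs to servers, with virtual nodes."""

    def __init__(self, servers, vnodes=100):
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(key):
        # 64-bit BLAKE2b digest, used for even spreading, not for security
        digest = hashlib.blake2b(key.encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def server_for(self, model_id):
        """Walk clockwise from the model's hash to the next server point."""
        idx = bisect.bisect(self._hashes, self._hash(model_id)) % len(self._hashes)
        return self._servers[idx]


ring = HashRing(["worker-a", "worker-b", "worker-c"])
print(ring.server_for("customer-42/model-7"))  # always the same worker for this ID

bigger = HashRing(["worker-a", "worker-b", "worker-c", "worker-d"])
model_ids = [f"model-{i}" for i in range(1000)]
moved = [m for m in model_ids if ring.server_for(m) != bigger.server_for(m)]
print(len(moved))  # roughly a quarter of the IDs; every one now maps to worker-d
```

Adding a server only moves models whose nearest ring point now belongs to the new server; all other models keep their owner and therefore their warm in-memory copy.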
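Slide 10 pairs blue-green deployment with cache warming. A minimal sketch of the routing side, assuming the new release is already running on the idle pool and that the caller knows which models are currently hot; `Pool`, `BlueGreenRouter`, and `hot_model_ids` are hypothetical. The point is ordering: models are loaded into the idle pool before traffic is flipped, so no user request pays the cold-load cost.

```python
class Pool:
    """Hypothetical worker pool that remembers which models it has loaded."""

    def __init__(self, name):
        self.name = name
        self.loaded = set()

    def warm(self, model_id):
        self.loaded.add(model_id)  # stand-in for fetch + deserialize

    def predict(self, model_id, data):
        cold = model_id not in self.loaded
        self.warm(model_id)
        return self.name, ("cold" if cold else "warm")


class BlueGreenRouter:
    """Send traffic to the live pool; flip to the idle pool only after warming it."""

    def __init__(self, blue, green):
        self.pools = {"blue": blue, "green": green}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, hot_model_ids):
        target = self.pools[self.idle]
        for model_id in hot_model_ids:
            target.warm(model_id)  # pay the load cost before users reach this pool
        self.live = self.idle      # switch; the old pool stays up for rollback

    def predict(self, model_id, data):
        return self.pools[self.live].predict(model_id, data)


router = BlueGreenRouter(Pool("blue"), Pool("green"))
print(router.predict("m1", {}))  # ('blue', 'cold')
router.deploy(["m1"])
print(router.predict("m1", {}))  # ('green', 'warm')
```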
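Slide 11's functional violations (HTTP 500s, high latency) can be checked over a rolling window of recent requests. A minimal sketch; the window size, the 1% error-rate threshold, and the 70 ms p99 limit (borrowed from the ads row on slide 5) are illustrative assumptions, and `PredictionMonitor` is a hypothetical name. Detecting model deterioration itself would also need ground-truth outcomes, which this sketch does not track.

```python
import math
from collections import deque


class PredictionMonitor:
    """Rolling-window alerts on 5xx error rate and p99 latency for one model."""

    def __init__(self, window=100, max_error_rate=0.01, max_p99_ms=70):
        self.events = deque(maxlen=window)  # most recent (status_code, latency_ms)
        self.max_error_rate = max_error_rate
        self.max_p99_ms = max_p99_ms

    def record(self, status_code, latency_ms):
        self.events.append((status_code, latency_ms))

    def alerts(self):
        """Return the names of thresholds currently violated."""
        if not self.events:
            return []
        found = []
        errors = sum(1 for status, _ in self.events if status >= 500)
        if errors / len(self.events) > self.max_error_rate:
            found.append("error_rate")
        latencies = sorted(lat for _, lat in self.events)
        rank = math.ceil(99 * len(latencies) / 100)  # nearest-rank p99
        if latencies[rank - 1] > self.max_p99_ms:
            found.append("high_latency")
        return found


mon = PredictionMonitor()
for _ in range(98):
    mon.record(200, 20)
mon.record(500, 20)
mon.record(500, 120)
print(mon.alerts())  # ['error_rate']
```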