
Serverless Inferences

There are several ways to host a machine learning model on AWS, ranging from a dedicated EC2 instance to a fully managed, Lambda-based solution. This presentation covers several ways to host machine learning models on AWS while embracing the serverless paradigm.

Nikhil Kuriakose

December 04, 2018

Transcript

  1. Define → Collect Data → Clean Data → Build Model → Train Model → Deploy Model → Monitor/Operate Model
  2. Pre-trained model: model artifacts (structure, helper files, parameters, weights, ...), in JSON or binary format, that can be serialized (saved) and de-serialized (loaded).
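     The deck does not prescribe a framework, but a minimal sketch of this serialize/de-serialize step, assuming scikit-learn and joblib (the file name is illustrative), could look like:

         from sklearn.datasets import load_iris
         from sklearn.linear_model import LogisticRegression
         import joblib

         # Train a small model so there is an artifact to save.
         X, y = load_iris(return_X_y=True)
         model = LogisticRegression(max_iter=1000).fit(X, y)

         # Serialize (save) the model artifact to a binary file.
         joblib.dump(model, "model.joblib")

         # Later, e.g. inside the inference code, de-serialize (load) it back.
         restored = joblib.load("model.joblib")
         print(restored.predict(X[:3]))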
  3. Serving from Lambda, fronted by API Gateway or Kinesis Streams, with models in S3. Where does the pre-trained model live?
     • Part of the Lambda deployment
     • On demand from S3
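     A minimal sketch of the "on demand from S3" option, assuming Python, boto3 and joblib (bucket, key and file names are placeholders, not from the deck): the model is downloaded once at cold start and reused across warm invocations.

         import json
         import boto3
         import joblib

         s3 = boto3.client("s3")
         MODEL_PATH = "/tmp/model.joblib"   # Lambda's writable scratch space

         # Download and load the artifact once, at module load (cold start),
         # so warm invocations reuse the already-loaded model.
         s3.download_file("my-model-bucket", "models/model.joblib", MODEL_PATH)
         model = joblib.load(MODEL_PATH)

         def handler(event, context):
             # Assumes an API Gateway proxy event with a JSON body such as
             # {"features": [[5.1, 3.5, 1.4, 0.2]]}.
             features = json.loads(event["body"])["features"]
             prediction = model.predict(features).tolist()
             return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}

     If the model ships as part of the deployment package instead, the S3 download is dropped and joblib.load points at a file bundled with the function code.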
  4. Trade-offs:
     • Size of the model
     • Memory for a Lambda
     • Processing power for a Lambda
     • Multiple models?
     • Latency of transfer from S3
  5. SageMaker. What does SageMaker do?
     ◍ Runs the inference code
     ◍ Copies models from S3
     ◍ Load balancing
     ◍ Monitoring
     ◍ A/B tests on new models
     What does SageMaker need?
     ◍ Inference code as an ECR image
     ◍ Location of the models on S3
     ◍ Type of machine to run
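     A hedged sketch of handing SageMaker those three inputs with boto3 (every name, ARN and URI below is a placeholder, not from the deck):

         import boto3

         sm = boto3.client("sagemaker")

         # Inference code as an ECR image + location of the model on S3.
         sm.create_model(
             ModelName="my-model",
             PrimaryContainer={
                 "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-inference:latest",
                 "ModelDataUrl": "s3://my-model-bucket/models/model.tar.gz",
             },
             ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
         )

         # Type of machine to run the inference code on.
         sm.create_endpoint_config(
             EndpointConfigName="my-endpoint-config",
             ProductionVariants=[{
                 "VariantName": "AllTraffic",
                 "ModelName": "my-model",
                 "InstanceType": "ml.m5.large",
                 "InitialInstanceCount": 1,
             }],
         )

         sm.create_endpoint(EndpointName="my-endpoint",
                            EndpointConfigName="my-endpoint-config")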
  6. Specifications:
     /opt/ml/model              }  the directory to which the models are copied from S3
     docker run <image> serve   }  the SageMaker run command
     /ping, /invocations        }  endpoints that must be defined
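     A minimal sketch of an inference container that satisfies this contract, assuming Flask as the web framework and a joblib artifact (both are illustrative, not mandated by the deck):

         import joblib
         from flask import Flask, Response, jsonify, request

         MODEL_DIR = "/opt/ml/model"   # SageMaker copies the S3 artifacts here
         model = joblib.load(f"{MODEL_DIR}/model.joblib")

         app = Flask(__name__)

         @app.route("/ping", methods=["GET"])
         def ping():
             # Health check: 200 means the container is up and the model loaded.
             return Response(status=200)

         @app.route("/invocations", methods=["POST"])
         def invocations():
             payload = request.get_json()
             prediction = model.predict(payload["features"]).tolist()
             return jsonify({"prediction": prediction})

         if __name__ == "__main__":
             # The image's "serve" entry point would start this server on port 8080,
             # which is where SageMaker sends /ping and /invocations requests.
             app.run(host="0.0.0.0", port=8080)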