Slide 1

Serverless Inference on AWS

Slide 2

What are we talking about?

Slide 3

Define → Collect Data → Clean Data → Build Model → Train Model → Deploy Model → Monitor / Operate Model

Slide 4

Pre-trained Model
● Model artifacts (structure, helper files, parameters, weights, ...)
● JSON or binary format
● Can be serialized (saved) and de-serialized (loaded)
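To make "serialized and de-serialized" concrete, here is a minimal sketch using scikit-learn and pickle; the slide does not name a framework, so the toy model, file names, and metadata are illustrative assumptions:

```python
import json
import pickle

from sklearn.linear_model import LogisticRegression

# Stand-in for any pre-trained model (hypothetical toy data).
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# Serialize (save): binary format for the parameters/weights ...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ... and JSON for structure / helper metadata.
with open("model.json", "w") as f:
    json.dump({"framework": "sklearn", "features": ["x"]}, f)

# De-serialize (load) the artifacts and run an inference.
with open("model.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.predict([[0.5]]))
```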

Slide 5

Serverless!

Slide 6

Inference!

Slide 7

1. Generic Architecture: What should a good design look like?

Slide 8

Load Balancer → Model Server → Model Artifacts

Slide 9

2. Lambda and S3 Buckets

Slide 10

API Gateway / Kinesis Streams → Lambda → S3

Where does the pre-trained model live?
● Part of the Lambda deployment
● On demand from S3
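A sketch of the "on demand from S3" option, with hypothetical bucket and key names; the model is cached in /tmp and in memory so that warm invocations skip the S3 transfer:

```python
import os
import pickle

import boto3

BUCKET = "my-models-bucket"      # hypothetical bucket
KEY = "models/model.pkl"         # hypothetical key
LOCAL_PATH = "/tmp/model.pkl"    # /tmp is the only writable path in Lambda

_model = None  # survives across warm invocations of the same container


def _load_model():
    global _model
    if _model is None:
        if not os.path.exists(LOCAL_PATH):
            boto3.client("s3").download_file(BUCKET, KEY, LOCAL_PATH)
        with open(LOCAL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model


def handler(event, context):
    # Invoked via API Gateway or a Kinesis stream, as in the diagram.
    prediction = _load_model().predict([event["features"]])
    return {"prediction": prediction.tolist()}
```

Bundling the model into the deployment package instead removes the S3 transfer latency, but runs into the size limits discussed on the next slide.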

Slide 11

Trade-Offs
● Size of the model
● Memory for a Lambda
● Processing power for a Lambda
● Multiple models?
● Latency of transfer from S3

Slide 12

3. SageMaker

Slide 13

[Diagram: the Python SDK drives SageMaker, which pulls inference code from ECR and models from S3]

Slide 14

What does the sage do?

Slide 15

SageMaker

What does SageMaker do?
◍ Runs the inference code
◍ Copies models from S3
◍ Load balancing
◍ Monitoring
◍ A/B tests on new models

What does SageMaker need?
◍ Inference code as an ECR image
◍ Location of the models on S3
◍ Type of machine to run on
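With the Python SDK, supplying those three inputs looks roughly like this; the image URI, S3 path, and IAM role are placeholders:

```python
from sagemaker.model import Model

model = Model(
    # Inference code as an ECR image (placeholder URI).
    image_uri="123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-inference:latest",
    # Location of the model artifacts on S3 (placeholder path).
    model_data="s3://my-models-bucket/model.tar.gz",
    # Execution role SageMaker assumes (placeholder ARN).
    role="arn:aws:iam::123456789012:role/SageMakerRole",
)

# Type (and count) of machine to run on.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```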

Slide 16

Specifications

◍ /opt/ml/model: the directory to which the models are copied from S3
◍ docker run <image> serve: the command SageMaker uses to start the container
◍ /ping and /invocations: the HTTP endpoints that must be defined
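A minimal sketch of the container side. SageMaker dictates the paths and the serve command but not the web framework, so Flask here is an assumption, as is the model file name inside /opt/ml/model:

```python
import pickle

from flask import Flask, jsonify, request

MODEL_DIR = "/opt/ml/model"  # SageMaker copies the S3 artifacts here

app = Flask(__name__)

# Hypothetical artifact name inside /opt/ml/model.
with open(f"{MODEL_DIR}/model.pkl", "rb") as f:
    model = pickle.load(f)


@app.route("/ping", methods=["GET"])
def ping():
    # Health check; SageMaker expects HTTP 200 when the container is ready.
    return "", 200


@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json()
    prediction = model.predict([payload["features"]])
    return jsonify({"prediction": prediction.tolist()})
```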

Slide 17

[Diagram recap: the Python SDK drives SageMaker, which exposes /ping and /invocations; inference code comes from ECR, and S3 is where the models live]
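Once the endpoint is up, a client does not call /invocations directly; it goes through the runtime API. A sketch with a placeholder endpoint name:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",           # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"features": [0.5]}),
)
print(json.loads(response["Body"].read()))
```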

Slide 18

Inference Code

Slide 19

Incoming requests → Reverse Proxy → Inference (/ping, /invocations)
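In the common bring-your-own-container layout this slide describes, nginx acts as the reverse proxy and forwards /ping and /invocations to a gunicorn-backed Flask app. A sketch of the serve entrypoint that docker run <image> serve invokes, simplified to start gunicorn directly; the app:app module path is an assumption:

```python
#!/usr/bin/env python
# Sketch of a "serve" entrypoint. A production container would typically
# start nginx as the reverse proxy in front; here gunicorn serves directly.
import os

# Replace the current process with gunicorn serving the Flask app
# (assumed to live in app.py as "app"). SageMaker talks to port 8080.
os.execvp("gunicorn", ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"])
```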

Slide 20

Thanks! Any questions?