ML and Serverless

Slide 2

© 2021, Amazon Web Services, Inc. or its Affiliates.

Key Lambda announcements from re:Invent 2020
• Package code and dependencies as a Docker or Open Container Initiative (OCI) compatible container image of up to 10 GB.
• Billing granularity for function duration reduced from 100 ms to 1 ms.
• Allocate up to 10 GB of memory to a Lambda function and get access to up to 6 vCPUs in each execution environment.
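
This small worked sketch makes the billing change concrete. It assumes only that measured duration is rounded up to the next billing increment. The function name `billed_duration_ms` is hypothetical, and the sketch ignores the separate per-request charge.

```python
import math


def billed_duration_ms(actual_ms: float, granularity_ms: int) -> int:
    """Round a measured duration up to the next billing increment (in ms).

    Hypothetical helper: models only the rounding rule, not pricing.
    """
    return math.ceil(actual_ms / granularity_ms) * granularity_ms


# Old (100 ms) versus new (1 ms) granularity for a short invocation.
for granularity in (100, 1):
    print(granularity, billed_duration_ms(23.4, granularity))
```

Under these assumptions, a 23.4 ms invocation was billed as 100 ms before the change and as 24 ms after it. The difference matters most for short, frequently invoked functions.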

Slide 3

AWS Lambda and containers
• Ways to package a Lambda function: managed runtimes (Python, Node.js, Java, .NET Core, Go, Ruby), custom runtimes, and container images.
• Events that invoke the function come from sources such as Amazon API Gateway, Amazon SNS, Amazon SQS, Amazon S3, Amazon DynamoDB, and many more…
• The function can in turn call databases, AWS services, and third-party APIs.

Slide 7

Packaging code for Lambda
• Managed runtimes (max 250 MB unzipped, including layers): the function code lives in /var/task and optional function layers live in /opt. Both run on the operating system provided by Lambda (Amazon Linux or Amazon Linux 2).
• Container images (max 10 GB): a single function container image that packages your code and dependencies.
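
ML dependencies such as PyTorch, plus the model weights, can exceed the managed-runtime limit. That is where container images help. A rough pre-deployment check could look like the sketch below. The helper names are hypothetical, and treating 250 MB as 250 * 1024 * 1024 bytes is an assumption. The check sums unzipped file sizes in the directories you would deploy to /var/task and /opt.

```python
import os

# Assumption: 250 MB interpreted as binary megabytes; covers code plus layers, unzipped.
MANAGED_RUNTIME_LIMIT_BYTES = 250 * 1024 * 1024


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def fits_managed_runtime(*paths: str) -> bool:
    """Hypothetical check: do the function code and layer directories fit the limit?"""
    return sum(directory_size_bytes(p) for p in paths) <= MANAGED_RUNTIME_LIMIT_BYTES
```

If `fits_managed_runtime("build/function", "build/layer")` (paths hypothetical) returns False, packaging the function as a container image is the natural next step.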

Slide 8

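Handling requests (managed runtimes)
Inside the execution environment, the runtime fetches each invocation event through the Runtime API and passes it to your handler. Lambda Extensions can run alongside the runtime but are optional.

    def handler(event, _):
        # Fall back to "World" when the event carries no "name" key
        name = event.get("name", "World")
        return f"Hello, {name}!"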
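
With a managed runtime you only write the handler, because the runtime implements the Runtime API loop for you. A custom runtime has to implement that loop itself. The sketch below shows one iteration using only the Python standard library:

1. Long-poll the `/2018-06-01/runtime/invocation/next` endpoint.
2. Call the handler.
3. Post the result to the matching `/response` endpoint.

Error reporting (the `/error` endpoint), the context object, and retries are omitted. `process_next_invocation` is a hypothetical name.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"  # version segment used in Lambda Runtime API paths


def handler(event, _):
    name = event.get("name", "World")
    return f"Hello, {name}!"


def process_next_invocation(runtime_api: str, fn=handler) -> str:
    """Fetch one event from the Runtime API, run `fn`, and post its result."""
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"
    # Blocks until Lambda has an event for this execution environment.
    with urllib.request.urlopen(f"{base}/next") as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())
    result = fn(event, None)  # a real runtime passes a context object here
    req = urllib.request.Request(
        f"{base}/{request_id}/response",
        data=json.dumps(result).encode(),
        method="POST",
    )
    with urllib.request.urlopen(req):
        pass
    return request_id


if __name__ == "__main__" and "AWS_LAMBDA_RUNTIME_API" in os.environ:
    # Inside Lambda, AWS_LAMBDA_RUNTIME_API holds the host:port of the Runtime API.
    while True:
        process_next_invocation(os.environ["AWS_LAMBDA_RUNTIME_API"])
```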

Slide 11

One use case: machine learning!
• Lambda can run ML inference on requests from many different event sources.
• For irregular, lower-volume invocations, Lambda can be significantly cheaper and faster than Amazon SageMaker or Amazon EC2, because you pay per request and per millisecond of execution instead of for idle capacity.
• Many pretrained models are available on Hugging Face (huggingface.co). The demo uses a standard DistilBERT model.
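
The same inference code can be triggered by different event sources, so it helps to normalize the input before calling the model. This minimal sketch assumes a hypothetical payload shape of {"text": "..."} in every case. The record fields it reads are the standard event shapes:

- `eventSource` and `body` for SQS
- `EventSource` and `Sns.Message` for SNS
- `body` and `isBase64Encoded` for API Gateway proxy events

`extract_texts` is an illustrative name.

```python
import base64
import json
from typing import Any, Dict, List


def extract_texts(event: Dict[str, Any]) -> List[str]:
    """Pull the input text(s) to classify out of several common event shapes.

    Assumes a hypothetical {"text": "..."} payload in every case.
    """
    records = event.get("Records")
    if records:
        texts = []
        for record in records:
            if record.get("eventSource") == "aws:sqs":  # SQS: payload string in "body"
                texts.append(json.loads(record["body"])["text"])
            elif record.get("EventSource") == "aws:sns":  # SNS: payload in Sns.Message
                texts.append(json.loads(record["Sns"]["Message"])["text"])
            else:
                raise ValueError(f"unsupported record source: {record}")
        return texts
    if "body" in event:  # API Gateway proxy integration
        body = event["body"]
        if event.get("isBase64Encoded"):
            body = base64.b64decode(body).decode("utf-8")
        return [json.loads(body)["text"]]
    return [event["text"]]  # direct invocation, e.g. from the AWS CLI or an SDK
```

A batch from SQS yields several texts, which the model can then score in a single call.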

Slide 13

Keeping ML inference performant
• Loading the ML model during the first, "cold" invocation can take a few seconds. Load the model outside the Lambda handler so that it happens once per execution environment, during initialization, and is reused by every later invocation.
• For synchronous invocations through API Gateway, Provisioned Concurrency keeps a configured number of execution environments initialized ahead of time, which significantly lowers cold start latency.
• For queue- or stream-based invocations, no caller is waiting on the response, so the cold start is far less of a problem and you may not need to do anything.
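
The "load outside the handler" advice works because module-level code runs once per execution environment, during initialization. The handler, by contrast, runs on every invocation. The sketch below demonstrates this with a hypothetical `load_model` stand-in that only simulates a slow load and counts how often it runs. In the real demo, this step would load the DistilBERT model instead.

```python
import time

LOAD_COUNT = 0  # how many times the model has been loaded in this environment


def load_model():
    """Hypothetical stand-in for an expensive model load.

    A real function would read model weights here; we only simulate the delay
    and return a toy classifier.
    """
    global LOAD_COUNT
    LOAD_COUNT += 1
    time.sleep(0.2)  # simulate a slow, cold initialization
    return lambda text: "POSITIVE" if "good" in text.lower() else "NEGATIVE"


# Module scope: runs once when the execution environment initializes,
# then stays in memory for every warm invocation that follows.
MODEL = load_model()


def handler(event, _):
    # Only the cheap per-request work happens inside the handler.
    return {"label": MODEL(event["text"])}
```

Calling `handler` repeatedly in the same process leaves `LOAD_COUNT` at 1. This mirrors how warm invocations skip the model load.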

Slide 14

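Execution Environment Lifecycle
Each execution environment is initialized once, serves a sequence of invocations (one at a time), and is eventually shut down. The diagram shows two environments on a shared time axis:
• The first goes through Initialization, five Invokes, and Shutdown.
• The second goes through Initialization, two Invokes, and Shutdown.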
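
The lifecycle also explains where cold starts come from: a new environment, with its own Initialization phase, is needed only when no existing environment is idle. The following is a deliberately simplified model, not Lambda's actual placement logic. Each environment handles one request at a time, a request reuses any idle environment, and initialization time, idle timeouts, and shutdown are ignored. `simulate_environments` is a hypothetical helper.

```python
from typing import List, Tuple


def simulate_environments(requests: List[Tuple[float, float]]) -> List[int]:
    """Return how many invocations each execution environment served.

    requests: (start_time, duration) pairs. Simplified model: reuse the first
    idle environment, otherwise initialize a new one (a cold start).
    """
    busy_until: List[float] = []  # time at which each environment becomes idle
    served: List[int] = []        # invocation count per environment
    for start, duration in sorted(requests):
        for i, idle_at in enumerate(busy_until):
            if idle_at <= start:  # warm start: reuse this environment
                busy_until[i] = start + duration
                served[i] += 1
                break
        else:  # no idle environment: Initialization, then Invoke
            busy_until.append(start + duration)
            served.append(1)
    return served
```

For example, `simulate_environments([(0, 1), (0.5, 1), (1, 1), (2, 1), (2.5, 1), (3, 1), (4, 1)])` returns `[5, 2]`. This reproduces the pattern in the diagram: one environment serves five invocations, and a second serves two. The second one is initialized only because a request arrived while the first was busy.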

Slide 15

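Sizing the Execution Environment
• More memory = more CPU: Lambda allocates CPU power in proportion to the memory you configure.
• Memory ranges from 128 MB to 10 GB, with up to 6 vCPUs.
• AWS Lambda Power Tuning helps find the best memory setting by running your function at several memory sizes and comparing duration and cost: https://github.com/alexcasalboni/aws-lambda-power-tuning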
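
Lambda bills duration in GB-seconds (memory times duration), so adding memory does not necessarily cost more. If the extra CPU shortens the run enough, the cost per invocation stays the same or drops. This is the trade-off Power Tuning measures for you. The sketch below only does the arithmetic. The durations in `MEASURED` are hypothetical, not results from the talk, and the per-request charge is ignored because it is identical across memory sizes.

```python
from typing import Dict


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Compute consumed by one invocation, in GB-seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cheapest_memory(measurements: Dict[int, float]) -> int:
    """Pick the memory size (MB) with the lowest GB-seconds per invocation.

    measurements maps memory size in MB to average measured duration in ms.
    """
    return min(measurements, key=lambda mb: gb_seconds(mb, measurements[mb]))


# Hypothetical measurements: more memory -> more CPU -> shorter runs.
MEASURED = {1024: 800.0, 2048: 380.0, 3008: 300.0}
```

With these numbers, 2048 MB costs 0.76 GB-seconds per invocation versus 0.8 at 1024 MB, so the larger setting is both faster and cheaper. At 3008 MB the speed-up no longer offsets the extra memory.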

Slide 17

Further reading and sources
• CDK code for the ML demo: https://github.com/marekq/lambda-pytorch
• A Terraform sample for Lambda with Docker images: https://github.com/marekq/terraform-lambda-docker
• Detailed documentation about the feature: https://docs.aws.amazon.com/lambda/latest/dg/images-create.html