Slide 1

Slide 1 text

Deep Dive into Lambda Response Streaming
Yohei Watanabe [AWS Ambassador, AWS Community Builder] [JAWS-UG Tokyo]

Slide 2

Slide 2 text

Happy 10th Anniversary, Lambda!
https://aws.amazon.com/jp/blogs/aws/run-code-cloud/

Slide 3

Slide 3 text

Introduction
● Introduced in April 2023, Response Streaming is a key feature of AWS Lambda.
● It is particularly relevant in the current LLM boom, where streaming large volumes of text output is becoming the norm.
● Although Lambda supports this capability, comprehensive documentation is still limited.

Slide 4

Slide 4 text

Me
Watanabe Yohei (@watany)
● NTT TechnoCross Corporation
● JAWS-UG Tokyo Organizer
● Title https://jawsug.connpass.com/event/316451/

Slide 5

Slide 5 text

Table of Contents
1. What is Lambda Response Streaming?
2. Managed Runtime
3. Custom Runtime
4. Lambda Web Adapter (LWA)

Slide 6

Slide 6 text

1. What is Lambda Response Streaming?

Slide 7

Slide 7 text

What is "Response Streaming"?
Lambda Response Streaming lets a function send response data back to the caller as soon as it becomes available.
● Efficiently returns large payloads, up to 20MB
● Reduces time to first byte (TTFB) to just a few milliseconds, minimizing response latency
https://aws.amazon.com/jp/blogs/compute/introducing-aws-lambda-response-streaming/

Slide 8

Slide 8 text

What is "Response Streaming"?
Key focus areas:
● LLM chat UI responses
  ○ Examples: Claude, ChatGPT
● Server-side rendering (SSR)
  ○ Examples: Next.js, Remix

Slide 9

Slide 9 text

What is "Response Streaming"? Written in SAM/CFn (set InvokeMode: RESPONSE_STREAM on the function URL)

Slide 10

Slide 10 text

What is "Response Streaming"? Written in CDK (pass invokeMode: lambda.InvokeMode.RESPONSE_STREAM when adding the function URL)

Slide 11

Slide 11 text

Response Streaming Limitations
● Types of streaming
● Data size limits
● Buffering
● Runtime-specific limitations

Slide 12

Slide 12 text

Types of "Response Streaming"
Which types of streaming does Lambda Response Streaming support?
● Transfer-Encoding: chunked: Yes
● Server-Sent Events (SSE): Yes
● WebSocket: No
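SSE needs nothing Lambda-specific: it is just a text/event-stream body written over the chunked response. A minimal sketch of how each SSE message is framed (formatSseEvent is our own helper name, not an AWS API, and it assumes LF line endings in the payload); the response would also need Content-Type: text/event-stream.

```typescript
// Frame one Server-Sent Events message: an optional "event:" line,
// one "data:" line per line of payload, and a blank line to end it.
// Assumes the payload uses "\n" (not "\r\n") as its line separator.
function formatSseEvent(data: string, event?: string): string {
  let message = "";
  if (event !== undefined) {
    message += `event: ${event}\n`;
  }
  // Multi-line payloads become several "data:" fields; the browser's
  // EventSource rejoins them with "\n".
  for (const line of data.split("\n")) {
    message += `data: ${line}\n`;
  }
  return message + "\n";
}

// Example: LLM tokens emitted as individual SSE messages.
const tokens = ["Hello", ", ", "world"];
const body = tokens.map((t) => formatSseEvent(t, "token")).join("");
```

Each formatted message can be written to the response stream as soon as the token is available.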

Slide 13

Slide 13 text

Limits
Payload size limits
● Buffered (normal): 6MB
● Response streaming: 20MB
Bandwidth limits
● First 6MB: uncapped bandwidth for the initial 6MB of your function's response.
● After 6MB: streaming is capped at 2MB/s.
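A back-of-the-envelope sketch of what the bandwidth cap means in practice. It assumes the first 6MB take no time at all and that the 2MB/s cap is the only bottleneck afterwards, so the results are lower bounds rather than measurements; minStreamingSeconds is a hypothetical helper name.

```typescript
// Documented limits for a streamed response (values in MB and MB/s).
const UNCAPPED_MB = 6;
const CAPPED_RATE_MB_PER_S = 2;
const MAX_STREAMED_MB = 20;

// Lower bound on streaming time: only the part beyond 6MB is rate-limited.
function minStreamingSeconds(responseMb: number): number {
  if (responseMb < 0 || responseMb > MAX_STREAMED_MB) {
    throw new RangeError(`response must be between 0 and ${MAX_STREAMED_MB} MB`);
  }
  const cappedMb = Math.max(0, responseMb - UNCAPPED_MB);
  return cappedMb / CAPPED_RATE_MB_PER_S;
}

for (const size of [5, 10, 20]) {
  console.log(`${size} MB -> at least ${minStreamingSeconds(size)} s`);
}
```

Under these assumptions a full 20MB response spends at least (20 - 6) / 2 = 7 seconds in the capped phase.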

Slide 14

Slide 14 text

Buffering
● Undocumented behavior: small response chunks may be buffered instead of being sent to the client right away.
● Estimated flush threshold: around 100KB.
● Ref: Forcing Lambda Response Streaming to Flush Content
  ○ https://betterdev.blog/lambda-response-streaming-flush-content/

Slide 15

Slide 15 text

Runtime-specific Limitations
Each runtime has its own limitations; let's go through them one by one.
● Managed runtime
● Custom runtime
● Managed or container runtime with Lambda Web Adapter

Slide 16

Slide 16 text

2. Managed Runtime

Slide 17

Slide 17 text

Managed Runtime
● Available in the Node.js 14.x and later managed runtimes.
● Not supported by the managed runtimes of other languages.
● Wrap your handler with the decorator provided by the Node.js managed runtime.

Slide 18

Slide 18 text

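Decorator
Wrap the handler with the awslambda.streamifyResponse decorator (code on the right). The wrapped handler receives (event, responseStream, context) and writes to responseStream instead of returning a value.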

Slide 19

Slide 19 text

awslambda.streamifyResponse
Prefer pipeline() over write() for streams: pipeline() handles backpressure and ends the response stream for you.
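A minimal TypeScript sketch of a streaming handler, assuming the Node.js managed runtime injects the global awslambda object as described above. The StreamingHandler type and the runtimeGlobal lookup are our own additions so the module also loads outside Lambda (for local tests); they are not part of any AWS package.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical typing: only the member used here is declared.
type StreamingHandler = (event: unknown, responseStream: Writable, context: unknown) => Promise<void>;
const runtimeGlobal = (globalThis as any).awslambda as
  | { streamifyResponse: (h: StreamingHandler) => unknown }
  | undefined;

// Each yielded chunk is forwarded as soon as it is produced.
async function* generateChunks(): AsyncGenerator<string> {
  for (let i = 1; i <= 3; i++) {
    yield `chunk ${i}\n`;
  }
}

// pipeline() applies backpressure and ends responseStream when the
// source is exhausted, so no manual write()/end() bookkeeping is needed.
export const streamHandler: StreamingHandler = async (_event, responseStream) => {
  await pipeline(Readable.from(generateChunks()), responseStream);
};

// Inside Lambda the global exists; elsewhere fall back to the raw function.
export const handler = runtimeGlobal ? runtimeGlobal.streamifyResponse(streamHandler) : streamHandler;
```

Behind a function URL with invoke mode RESPONSE_STREAM, the lines should arrive incrementally, subject to the buffering behavior noted earlier.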

Slide 20

Slide 20 text

Write metadata
awslambda.HttpResponseStream.from(stream, metadata)
Use cases:
● Custom HTTP response status codes
● Custom HTTP headers
● Cookie data sent to the client
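A sketch of attaching metadata, assuming HttpResponseStream.from accepts an object with statusCode, headers and cookies (the three use cases listed on this slide). The HttpResponseMetadata interface and the globalThis fallback are illustrative, not an official type definition.

```typescript
import { Writable } from "node:stream";

// Illustrative shape for the metadata object passed to from().
interface HttpResponseMetadata {
  statusCode: number;
  headers?: Record<string, string>;
  cookies?: string[];
}

type HttpResponseStreamApi = { from: (s: Writable, m: HttpResponseMetadata) => Writable };
const runtimeGlobal = (globalThis as any).awslambda as
  | { HttpResponseStream: HttpResponseStreamApi }
  | undefined;

export function streamJson(responseStream: Writable): void {
  const metadata: HttpResponseMetadata = {
    statusCode: 201,
    headers: { "Content-Type": "application/json", "X-Custom-Header": "demo" },
    cookies: ["session=abc123; HttpOnly"],
  };
  // Inside Lambda, from() sends the metadata ahead of the body.
  // Outside Lambda (no global) the stream is used unchanged.
  const stream = runtimeGlobal
    ? runtimeGlobal.HttpResponseStream.from(responseStream, metadata)
    : responseStream;
  stream.write('{"status":');
  stream.write('"created"}');
  stream.end();
}
```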

Slide 21

Slide 21 text

3. Custom Runtime

Slide 22

Slide 22 text

Custom Runtime Overview
● Customizable environment: lets you define your own runtime environment instead of using AWS's managed runtimes.
● bootstrap script: central to custom runtimes; it handles initialization, request processing, and responses.
● Runtime API communication: interacts with the Lambda Runtime API for event handling and response delivery.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html

Slide 23

Slide 23 text

Custom Runtime Overview
After the execution environment starts, the runtime and function exchange requests and responses through the Lambda Runtime API.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html

Slide 24

Slide 24 text

Custom Runtime Overview
Request: GET /2018-06-01/runtime/invocation/next on the Runtime API fetches the next event, which the runtime and function then process. The request ID arrives in the Lambda-Runtime-Aws-Request-Id response header.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html
curl -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next"

Slide 25

Slide 25 text

Custom Runtime Overview
Response: after the runtime and function produce the result, the runtime POSTs it to the Runtime API at /2018-06-01/runtime/invocation/AwsRequestId/response.
https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html
curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/${AwsRequestId}/response"
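The request/response cycle from these two slides can be sketched as a buffered (non-streaming) event loop. This assumes Node.js 18+ for the global fetch; handleEvent is a hypothetical placeholder for your function code, and error reporting is left out for brevity.

```typescript
const API_VERSION = "2018-06-01";

function nextInvocationUrl(runtimeApi: string): string {
  return `http://${runtimeApi}/${API_VERSION}/runtime/invocation/next`;
}

function invocationResponseUrl(runtimeApi: string, requestId: string): string {
  return `http://${runtimeApi}/${API_VERSION}/runtime/invocation/${requestId}/response`;
}

// Hypothetical function code: echoes the event back.
async function handleEvent(event: unknown): Promise<unknown> {
  return { echo: event };
}

async function runLoop(runtimeApi: string): Promise<void> {
  for (;;) {
    // GET blocks until the next event is available.
    const next = await fetch(nextInvocationUrl(runtimeApi));
    const requestId = next.headers.get("Lambda-Runtime-Aws-Request-Id");
    if (requestId === null) throw new Error("missing Lambda-Runtime-Aws-Request-Id header");
    const event = await next.json();

    // Errors should be reported to /runtime/invocation/{id}/error (omitted here).
    const result = await handleEvent(event);
    await fetch(invocationResponseUrl(runtimeApi, requestId), {
      method: "POST",
      body: JSON.stringify(result),
    });
  }
}

// Only start polling when running inside a Lambda execution environment.
const runtimeApi = process.env.AWS_LAMBDA_RUNTIME_API;
if (runtimeApi) {
  void runLoop(runtimeApi);
}
```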

Slide 26

Slide 26 text

How to Enable Response Streaming in a Custom Runtime
When POSTing to the Runtime API at /2018-06-01/runtime/invocation/AwsRequestId/response, do the following:
● Add these request headers:
  ○ Lambda-Runtime-Function-Response-Mode: streaming
  ○ Transfer-Encoding: chunked
● Send the response in chunks, then close the connection.
● Handle errors: failures that occur after streaming has started are reported through the HTTP trailers Lambda-Runtime-Function-Error-Type and Lambda-Runtime-Function-Error-Body.
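What "send the response in chunks" means on the wire: a sketch of HTTP/1.1 chunked transfer coding. Most HTTP clients apply this encoding for you once Transfer-Encoding: chunked is set, so treat encodeChunk and lastChunk (our own names) as an illustration of the byte format rather than something a runtime must hand-roll.

```typescript
// Chunked coding: size in hex, CRLF, the bytes, CRLF; a zero-size chunk ends the body.
const encoder = new TextEncoder();

function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

function encodeChunk(data: Uint8Array): Uint8Array {
  if (data.length === 0) {
    throw new Error("an empty chunk would terminate the body; use lastChunk()");
  }
  return concatBytes([encoder.encode(`${data.length.toString(16)}\r\n`), data, encoder.encode("\r\n")]);
}

// Terminating chunk, sent here without trailers.
function lastChunk(): Uint8Array {
  return encoder.encode("0\r\n\r\n");
}

// Headers for the streaming POST to .../runtime/invocation/{AwsRequestId}/response.
const streamingResponseHeaders: Record<string, string> = {
  "Lambda-Runtime-Function-Response-Mode": "streaming",
  "Transfer-Encoding": "chunked",
};

const wire = concatBytes([
  encodeChunk(encoder.encode("Hello, ")),
  encodeChunk(encoder.encode("streaming world!")),
  lastChunk(),
]);
console.log(JSON.stringify(new TextDecoder().decode(wire)));
```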

Slide 27

Slide 27 text

How to Enable Response Streaming in a Custom Runtime
● The implementation so far covers the equivalent of `awslambda.streamifyResponse`.
● To implement the equivalent of `awslambda.HttpResponseStream.from`, you need to:
  ○ Set Content-Type to `application/vnd.awslambda.http-integration-response`.
  ○ Send the custom metadata (status code, headers, cookies) as JSON.
  ○ Add 8 NULL bytes as a separator between the JSON and the body.
  ○ Encode the response using HTTP/1.1 chunked transfer encoding.
● Ref: https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
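A sketch of the prelude described on this slide, assuming the layout "JSON metadata, then 8 NULL bytes, then the body". encodePrelude and HttpPrelude are hypothetical names; in a real runtime these bytes would be the first data written to the chunked POST body, followed by the body chunks.

```typescript
interface HttpPrelude {
  statusCode: number;
  headers?: Record<string, string>;
  cookies?: string[];
}

const INTEGRATION_CONTENT_TYPE = "application/vnd.awslambda.http-integration-response";
const PRELUDE_SEPARATOR = new Uint8Array(8); // eight 0x00 bytes

// Bytes sent first: the JSON metadata followed by the separator.
// The response body is streamed after this.
function encodePrelude(prelude: HttpPrelude): Uint8Array {
  const json = new TextEncoder().encode(JSON.stringify(prelude));
  const out = new Uint8Array(json.length + PRELUDE_SEPARATOR.length);
  out.set(json, 0);
  out.set(PRELUDE_SEPARATOR, json.length);
  return out;
}

const prelude = encodePrelude({
  statusCode: 201,
  headers: { "Content-Type": "text/plain" },
  cookies: ["session=abc123; HttpOnly"],
});
```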

Slide 28

Slide 28 text

Implementation Example
Refer to the implementation in the "Rust Runtime for AWS Lambda" for guidance.
https://github.com/awslabs/aws-lambda-rust-runtime/blob/fbf212f4eef8c0fd8bd87f87998239fa17bc2b23/lambda-runtime/src/streaming.rs

Slide 29

Slide 29 text

4. Lambda Web Adapter (LWA)

Slide 30

Slide 30 text

What is Lambda Web Adapter (LWA)?
● Run web apps on AWS Lambda using familiar frameworks such as Express.js, Flask, and Spring Boot.
● Deploy the same Docker image across AWS Lambda, EC2, Fargate, and local environments.
https://github.com/awslabs/aws-lambda-web-adapter

Slide 31

Slide 31 text

Why Does Response Streaming Work with LWA?
Lambda Web Adapter includes its own "Runtime Interface Client," so it talks to the Runtime API directly.
https://github.com/awslabs/aws-lambda-web-adapter

Slide 32

Slide 32 text

Why Does Response Streaming Work with LWA?
Lambda Web Adapter connects the "Web App" and the "Runtime API."
● LWA converts events fetched from the Runtime API into HTTP requests.
● The web app starts processing when it receives the HTTP request.
https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/
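To make the conversion step concrete, a heavily simplified sketch of turning a Lambda function URL event (payload format 2.0) into a request for the local web app. This is not LWA's actual code (LWA is written in Rust and handles more event types and edge cases); the LocalRequest shape and the default port 8080 are assumptions for illustration.

```typescript
// Subset of the function URL (payload 2.0) event used here.
interface FunctionUrlEvent {
  rawPath: string;
  rawQueryString: string;
  headers: Record<string, string>;
  cookies?: string[];
  body?: string;
  isBase64Encoded: boolean;
  requestContext: { http: { method: string } };
}

// Hypothetical description of the HTTP request sent to the local app.
interface LocalRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body?: Uint8Array;
}

function toLocalRequest(event: FunctionUrlEvent, port = 8080): LocalRequest {
  const query = event.rawQueryString ? `?${event.rawQueryString}` : "";
  const headers = { ...event.headers };
  // Payload 2.0 moves cookies out of the headers; restore the Cookie header.
  if (event.cookies && event.cookies.length > 0) {
    headers["cookie"] = event.cookies.join("; ");
  }
  let body: Uint8Array | undefined;
  if (event.body !== undefined) {
    body = event.isBase64Encoded
      ? Uint8Array.from(atob(event.body), (c) => c.charCodeAt(0))
      : new TextEncoder().encode(event.body);
  }
  return {
    url: `http://127.0.0.1:${port}${event.rawPath}${query}`,
    method: event.requestContext.http.method,
    headers,
    body,
  };
}
```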

Slide 33

Slide 33 text

Why Does Response Streaming Work with LWA?
Lambda Web Adapter connects the "Web App" and the "Runtime API."
● The web app returns the HTTP response to Lambda Web Adapter (LWA).
● LWA converts the HTTP response and POSTs it to the Runtime API.
https://aws.amazon.com/jp/blogs/compute/using-response-streaming-with-aws-lambda-web-adapter-to-optimize-performance/

Slide 34

Slide 34 text

Why Does Response Streaming Work with LWA?
Two common ways to use LWA:
1. Container runtime
2. Managed runtime

Slide 35

Slide 35 text

4-1. Lambda Web Adapter with Container Runtime

Slide 36

Slide 36 text

Container Runtime + LWA
Copy the LWA binary into `/opt/extensions/`.

Slide 37

Slide 37 text

Container Runtime + LWA
It works with just one additional line in the Dockerfile (the 2nd line). Nothing else is special.

Slide 38

Slide 38 text

4-2. Lambda Web Adapter with Managed Runtime

Slide 39

Slide 39 text

Managed Runtime + LWA
1. Attach LWA as a Lambda layer

Slide 40

Slide 40 text

Managed Runtime + LWA
2. Lambda Environment Variables

Slide 41

Slide 41 text

Managed Runtime + LWA
2. Lambda Environment Variables
Required:
● AWS_LAMBDA_EXEC_WRAPPER: /opt/bootstrap
  ○ Required for LWA with managed runtimes.
● AWS_LWA_INVOKE_MODE: RESPONSE_STREAM
  ○ Essential for streaming: setting the function URL's invoke mode to RESPONSE_STREAM is not enough on its own.

Slide 42

Slide 42 text

Managed Runtime + LWA
2. Lambda Environment Variables
Optional:
● https://github.com/awslabs/aws-lambda-web-adapter

Slide 43

Slide 43 text

Managed Runtime + LWA
3. Run Script

Slide 44

Slide 44 text

Managed Runtime + LWA
3. Run Script (FastAPI)

Slide 45

Slide 45 text

Managed Runtime + LWA
3. Run Script (Express.js)
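Whatever framework the run script starts, the app only has to write its response incrementally. A framework-free sketch using Node's built-in http module, assuming the app listens on the port given in PORT (falling back to 8080, LWA's default) and that AWS_LWA_INVOKE_MODE is set as on the earlier slide. The AWS_LAMBDA_RUNTIME_API check is only our way of not opening a port during local tests.

```typescript
import { createServer, Server } from "node:http";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Streams the body in several writes; with response streaming enabled,
// each write can reach the client before the handler finishes.
export function createStreamingServer(): Server {
  return createServer(async (_req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain; charset=utf-8" });
    for (const word of ["streaming", "from", "a", "web", "app"]) {
      res.write(`${word}\n`);
      await sleep(100); // simulate work, e.g. waiting for LLM tokens
    }
    res.end();
  });
}

// Listen only inside a Lambda execution environment (assumption, see above).
if (process.env.AWS_LAMBDA_RUNTIME_API) {
  createStreamingServer().listen(Number(process.env.PORT ?? 8080));
}
```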

Slide 46

Slide 46 text

Managed Runtime + LWA
Appendix: the LWA GitHub repository contains examples for various frameworks.
https://github.com/awslabs/aws-lambda-web-adapter/tree/main/examples/

Slide 47

Slide 47 text

Closing

Slide 48

Slide 48 text

Which one should you use?
● If you can use Node.js, that's the best choice.
  ○ Also consider using Hono.
● If your app can be triggered by HTTP requests, LWA is ideal.
● Custom runtimes are challenging, but examples are available.

Slide 49

Slide 49 text

See you 👋