feature in AWS Lambda. • It's particularly relevant in the current LLM boom, where streaming large volumes of text output is becoming the norm. • While Lambda supports this capability, comprehensive documentation is still limited. 3
sending of available response data back to the caller. • Efficiently returns large data, up to 20MB • Reduces time to first byte (TTFB) to just a few milliseconds, minimizing response latency 7 https://aws.amazon.com/jp/blogs/compute/introducing-aws-lambda-resp onse-streaming/
Streaming: 20MB Bandwidth limits • First 6MB: Uncapped bandwidth for the initial 6MB of your function’s response. • After 6MB: Streaming limited to 2MBps. 13
your own runtime environment instead of using AWS’s managed runtimes. • bootstrap Script: Central to custom runtimes, handling initialization, request processing, and response. • Runtime API Communication: Interacts with the Lambda Runtime API for event handling and response delivery. 22 https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html
the event, which the Runtime and Function then process. 24 https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html curl -X GET "http://${AWS_LAMBDA_RUNTIME_API}/r untime/invocation/next"
result, then POST it to the Runtime API at /runtime/invocation/AwsRequestId/response. 25 https://docs.aws.amazon.com/lambda/latest/dg/runtimes-api.html curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/r untime/invocation/${AwsRequestId}/re sponse"
POST to Runtime API /runtime/invocation/AwsRequestId/response, do the following • "Add the following to the headers ◦ Lambda-Runtime-Function-Response-Mode: streaming ◦ Transfer-Encoding: chunked • Send the response in chunks, then close the connection. • and other error handling 26
The current implementation covers the equivalent of `awslambda.streamifyResponse`. • To implement the equivalent of `awslambda.HttpResponseStream`.from, you need: ◦ Set Content-Type to `application/vnd.awslambda.http-integration-response`. ◦ Send custom headers (status code, headers, cookies) in JSON. ◦ Add 8 NULL characters as separators. ◦ Encode the response using HTTP/1.1 chunked transfer. • Ref:https://aws.amazon.com/jp/blogs/compute/using-response-streaming -with-aws-lambda-web-adapter-to-optimize-performance/ 27
Lambda" for guidance. 28 https://github.com/awslabs/aws-lambda-rust-runtime/blob/fbf212f4eef8c0fd8bd87f87998239fa17bc2b23/lambda-runtime/src/streaming.rs
AWS Lambda using familiar frameworks like Express.js, Flask, and SpringBoot. • Deploy the same Docker image across AWS Lambda, EC2, Fargate, and local environments. 30 https://github.com/awslabs/aws-lambda-web-adapter
the "Web App" and "Runtime API. 32 https://aws.amazon.com/jp/blogs/compute/using-response-streaming-wit h-aws-lambda-web-adapter-to-optimize-performance/ Triggers the app to start upon receiving an HTTP request. Converts events fetched from the Runtime API into HTTP requests.
the "Web App" and "Runtime API. 33 https://aws.amazon.com/jp/blogs/compute/using-response-streaming-wit h-aws-lambda-web-adapter-to-optimize-performance/ Returns the HTTP Response to the Lambda Web Adapter (LWA). Converts the HTTP Response and POSTs it to the Runtime API.
AWS_LWA_EXEC_WRAPPER: /opt/bootstrap ◦ Required for LWA + Managed runtimes. • AWS_LWA_INVOKE_MODE: RESPONSE_STREAM ◦ Important for streaming. (Functions URL alone is not enough) 41
Node.js, that's the best choice. ◦ Also, consider using Hono. • If you can trigger your app with an HTTP request, LWA is ideal. • Custom runtimes are challenging, but examples are available. 48