at once, and process occasionally
- Streaming Processing
  - Micro-batch
    - ex) AWS Lambda with “Batch Size = 2 ~ n” (n is a not-too-large natural number)
  - Real Streaming
    - ex) AWS Lambda with “Batch Size = 1” (see the sketch below for where this knob is set)
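A hedged sketch of where the “Batch Size” setting lives when wiring a Kinesis stream to a Lambda function with aws-sdk-go; the stream ARN and function name are placeholders, and the deck itself only mentions the setting, not this particular API call.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/lambda"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := lambda.New(sess)

	// BatchSize = 1 gives "real streaming" behavior; 2..n gives micro-batches.
	_, err := svc.CreateEventSourceMapping(&lambda.CreateEventSourceMappingInput{
		EventSourceArn:   aws.String("arn:aws:kinesis:ap-northeast-1:123456789012:stream/example-stream"), // placeholder ARN
		FunctionName:     aws.String("example-aggregator"),                                                // placeholder name
		BatchSize:        aws.Int64(1),
		StartingPosition: aws.String(lambda.EventSourcePositionLatest),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```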
which events actually occurred.
- Processing time, which is the time at which events are observed in the system.

Figure: Example time-domain mapping (“Event Time” vs “Processing Time”), from https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
CEO at Confluent, which was founded by the creators of Apache Kafka at LinkedIn
- “Questioning the Lambda Architecture” https://www.oreilly.com/ideas/questioning-the-lambda-architecture
- Streaming Layer + Serving Layer
| Framework | Key Concepts / Notes | Batch | Streaming |
| --- | --- | --- | --- |
| Hadoop | HDFS (Hadoop Distributed File System), YARN, MapReduce | O | X |
| Storm | Topology (Spout + Bolt), Tuple, Task | X | O |
| Kafka | Broker, Producer/Consumer (“Un-Managed Kinesis”?) | X | O |
| Spark | inspired by Hadoop’s MapReduce engine | O | O (Spark Streaming) |
| Flink | Batch and Streaming in One System, ML Support, DataStream API | O | O |
discard late logs
- Watermarks
  - “all input data with event times less than X have been observed.”
- Triggers
  - declaring when the output for a window should be materialized
- Accumulation
  - accumulating multiple results that are observed for the same window

Read https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 for more details :)
memory
  - e.g. RDS, DynamoDB
- B. Calculate on-memory
  - formula … `watermark = max(event time) - late_threshold` (see the sketch below)
    - e.g. late threshold = 10 minutes
  - use `median(event time)` instead, to handle too-future logs
    - e.g. mobile devices’ system clocks are sometimes modified incorrectly by users
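A tiny Go sketch of the formula above; the function and parameter names are illustrative, and the caller supplies either max(event time) or median(event time) as described.

```go
package watermark

import "time"

// Watermark implements: watermark = eventTime - lateThreshold, where eventTime is
// max(event time), or median(event time) to tolerate "too-future" logs from
// devices with misconfigured clocks, and lateThreshold is e.g. 10 * time.Minute.
func Watermark(eventTime time.Time, lateThreshold time.Duration) time.Time {
	return eventTime.Add(-lateThreshold)
}
```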
directly via kinesis:PutRecord
  - once per 90 sec, from at most 15,000 devices
- Lambda polls Kinesis and aggregates impressions by incrementing counters
- DynamoDB stores the incremented records via the UpdateItem (ADD) operation (see the sketch below)
- another Lambda (or several) polls DynamoDB Streams and aggregates hourly/daily
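A minimal sketch of the UpdateItem (ADD) step in Go with aws-sdk-go; the table name, key, and attribute names are placeholders, not taken from the deck.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

func main() {
	sess := session.Must(session.NewSession())
	db := dynamodb.New(sess)

	// Atomically increment the impression counter for one (placeholder) key.
	_, err := db.UpdateItem(&dynamodb.UpdateItemInput{
		TableName: aws.String("impressions"), // placeholder table name
		Key: map[string]*dynamodb.AttributeValue{
			"ad_id": {S: aws.String("ad-123")}, // placeholder key
		},
		UpdateExpression: aws.String("ADD #count :inc"),
		ExpressionAttributeNames: map[string]*string{
			"#count": aws.String("count"),
		},
		ExpressionAttributeValues: map[string]*dynamodb.AttributeValue{
			":inc": {N: aws.String("1")},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```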
-> hourly -> daily
- Partition usual logs/lagged logs, and write the records to separate tables
  - NOT discarding lagged logs (for now), to see how many logs would be discarded
- Determine whether each log’s timestamp is behind the watermark or not (see the sketch after the median code below)
  - watermark … the median of all timestamps
    - because users can change system clocks to the future
```go
// Signature reconstructed for this write-up; Mean and EventTime are helpers
// from the surrounding codebase.
func Median(eventTimes []int) EventTime {
	var median EventTime
	sort.Ints(eventTimes) // 1. sort
	l := len(eventTimes)  // 2. get the median
	if l%2 == 0 { // when even
		median = Mean(eventTimes[l/2-1 : l/2+1])
	} else { // when odd
		median = EventTime(eventTimes[l/2])
	}
	return median
}
```
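A sketch of how the median could then be used to partition a log into the usual vs. lagged table. It combines the median from above with the late threshold from the earlier formula; the function name, `lateThresholdSec`, and the assumption that `EventTime` is an integer Unix timestamp are all illustrative.

```go
// isLagged reports whether a single log's timestamp falls behind the watermark,
// i.e. median(event times) - late threshold. Lagged logs go to a separate
// table instead of being discarded, as described above.
func isLagged(logTimestamp int, eventTimes []int, lateThresholdSec int) bool {
	watermark := Median(eventTimes) - EventTime(lateThresholdSec) // e.g. 600 sec = 10 min
	return EventTime(logTimestamp) < watermark
}
```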
updated records’ timestamps and the current time, and compare with the threshold.
- timestamp … taken from the DynamoDB Streams event records
- threshold … passed via ENV (currently 10 min)
- calculation … `time.Since(timestamp).Minutes() >= threshold` (sketch below)
- [DynamoDB Stream -> Custom CloudWatch Metrics -> SNS -> Lambda -> Slack]
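A sketch of that calculation in Go; the env var name `LATE_THRESHOLD_MINUTES` and the function name are assumptions for illustration, since the deck only says the threshold is passed via ENV.

```go
package alert

import (
	"os"
	"strconv"
	"time"
)

// isTooLate applies the calculation above to a single record timestamp
// extracted from the DynamoDB Streams event.
func isTooLate(timestamp time.Time) bool {
	threshold, err := strconv.ParseFloat(os.Getenv("LATE_THRESHOLD_MINUTES"), 64)
	if err != nil {
		threshold = 10 // fall back to the 10-minute default mentioned above
	}
	return time.Since(timestamp).Minutes() >= threshold
}
```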
- which can kinesis:PutRecord to the target Kinesis Stream ARN
- use unauthenticated identities (no login), because the Android app does not need any login feature
- much more secure than embedding an API_KEY/CREDENTIAL_KEY in the Android clients
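For reference, a sketch of the kinesis:PutRecord call that this unauthenticated role permits, written in Go with aws-sdk-go; the real callers are Android clients using Cognito-vended credentials, and the stream name and payload here are placeholders.

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/kinesis"
)

func main() {
	// In the real setup, credentials come from the Cognito identity pool's
	// unauthenticated role; here the default credential chain is used.
	sess := session.Must(session.NewSession())
	svc := kinesis.New(sess)

	_, err := svc.PutRecord(&kinesis.PutRecordInput{
		StreamName:   aws.String("example-impression-stream"), // placeholder
		PartitionKey: aws.String("device-123"),                // placeholder
		Data:         []byte(`{"ad_id":"ad-123","ts":1514764800}`),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```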
2018
  - https://dev.classmethod.jp/cloud/aws/aws-lambda-supports-go/
- aws/aws-lambda-go
  - https://github.com/aws/aws-lambda-go
  - its typed event structs make it easy to grasp what kind of JSON event records will be available (see the handler sketch below)
- IMHO
  - one of my favorite languages among the other officially supported ones
    - types, runtime performance, ecosystem, etc.
  - goroutines/channels have too much overhead for most code running on Lambda
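A minimal handler sketch with aws-lambda-go showing the typed Kinesis event; the log statement is illustrative only.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

// handler receives the Kinesis batch as a typed struct instead of raw JSON,
// so the available fields (PartitionKey, Data, ...) are visible at compile time.
func handler(ctx context.Context, e events.KinesisEvent) error {
	for _, r := range e.Records {
		log.Printf("partitionKey=%s data=%s", r.Kinesis.PartitionKey, r.Kinesis.Data)
	}
	return nil
}

func main() {
	lambda.Start(handler)
}
```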
global vars

> A single instance of your Lambda function will never handle multiple events simultaneously

https://docs.aws.amazon.com/lambda/latest/dg/go-programming-model-handler-types.html
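Because of that guarantee, expensive clients can be kept in package-level vars and reused across invocations without locking. A sketch; the DynamoDB client and the ListTables call are just examples, not from the deck.

```go
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/dynamodb"
)

// Initialized once per container and reused across invocations; since one
// instance never handles events concurrently, no extra synchronization is needed.
var db = dynamodb.New(session.Must(session.NewSession()))

func handler(ctx context.Context) error {
	_, err := db.ListTables(&dynamodb.ListTablesInput{}) // placeholder work
	return err
}

func main() {
	lambda.Start(handler)
}
```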
to configure)
- Motivation (never used it in production before)
- (Might be) easy to migrate to CloudFormation/SAM later
- Why not Apex?
  - Easy to deploy Lambda, but that’s all
- Why not SAM?
  - Writing CloudFormation stacks from scratch might take time
  - However, sam-local is a great tool, so we might migrate to SAM in the near future
stage via the `--stage` option
- $ serverless metrics
  - shows simple metrics for functions
- $ serverless invoke
  - useful Lambda event fixtures can be found at https://github.com/aws/aws-lambda-go/tree/master/events/testdata