Serverless Streaming Log Architecture ~ Theory & Practice ~

Stream Processing

Kenju Wagatsuma

August 12, 2018

Transcript

  1. 2.

    Agenda

    Part I … Theory
    - What is “streaming”?
    - “Batch” vs “Streaming”
    - “Event Time” vs “Processing Time”
    - Lambda Architecture
    - Kappa Architecture
    - Apache Hadoop/Storm/Spark/Kafka/Flink
    - Late Logs
      - Discarding, Watermark, Trigger, Accumulation

    Part II … Practice
    - Overall Data-flow
    - Watermark Implementation
    - Aggregation
      - Kinesis -> Lambda -> DynamoDB
      - DynamoDB Streams -> Lambda -> DynamoDB
    - Monitoring
      - “GetRecords.IteratorAgeMilliseconds”
      - DynamoDB Streams -> Lambda -> Slack
    - Misc (Cognito, Golang, Serverless Framework)
  2. 3.

    Who are you?

    Kenju Wagatsuma
    - Serverside Engineer at Cookpad Inc.
    - Ruby, Golang, AWS
    - https://github.com/kenju/
    - “Development story behind improving network ads by introducing Header Bidding” (original title: “Header Bidding 導入によるネットワーク広告改善の開発事情”)
  3. 6.

    What is “streaming”?

    Definition:
    - a type of data processing engine that is designed with infinite data sets in mind.
      (https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101)
  4. 7.

    “Batch” vs “Streaming”

    - Batch Processing
      - Process grouped logs at once, and run only occasionally
    - Streaming Processing
      - Micro-batch
        - ex) AWS Lambda with “Batch Size = 2 ~ n” (n is a not-too-large natural number)
      - Real Streaming
        - ex) AWS Lambda with “Batch Size = 1”
  5. 8.

    “Event Time” vs “Processing Time”

    - Event time, which is the time at which events actually occurred.
    - Processing time, which is the time at which events are observed in the system.

    Figure: Example time domain mapping, from https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
  6. 10.

    Lambda Architecture

    - introduced by Nathan Marz, the creator of Apache Storm
      - “How to beat the CAP theorem” http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
    - Batch Layer + Serving Layer + Speed Layer
      - Batch Layer … re-computable, can ensure consistency
      - Serving Layer … merges views from the Batch Layer with real-time logs from the Speed Layer
      - Speed Layer … low latency
  7. 14.

    Kappa Architecture

    - introduced by Jay Kreps, a co-founder and CEO of Confluent, who co-created Apache Kafka while at LinkedIn
      - “Questioning the Lambda Architecture” https://www.oreilly.com/ideas/questioning-the-lambda-architecture
    - Streaming Layer + Serving Layer
  8. 16.

    Lambda vs Kappa Architecture

    - Lambda Architecture
      - Pros: robust data consistency
      - Cons: harder to maintain multiple layers
    - Kappa Architecture
      - Pros: simple implementation
      - Cons: needs extra work to guarantee data consistency
  9. 17.

    Apache Hadoop/Storm/Spark/Kafka/Flink

    Name    Speciality                                                     Batch Processing  Stream Processing
    Hadoop  HDFS (file system), YARN, MapReduce                            O                 X
    Storm   Topology (Spout + Bolt), Tuple, Task                           X                 O
    Kafka   Broker, Producer/Consumer (“Un-Managed Kinesis”?)              X                 O
    Spark   inspired by Hadoop’s MapReduce engine                          O                 O (Spark Streaming)
    Flink   Batch and Streaming in One System, ML Support, DataStream API  O                 O

    (O = supported, X = not supported)
  10. 20.

    How to tackle “Late Logs”?

    - Discarding
      - simply discard late logs
    - Watermarks
      - “all input data with event times less than X have been observed.”
    - Triggers
      - declare when the output for a window should be materialized
    - Accumulation
      - accumulate multiple results that are observed for the same window

    Read https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102 for more details :)
  11. 21.

    What is “Watermark”

    Definition: a special mark contained in electronic documents, pictures, music etc that is used to stop people from copying them

    Example: audio watermark detection, photo watermark, copyright watermark
  12. 22.

    How to “Watermark”

    - A. Save state in external storage
      - e.g. RDS, DynamoDB
    - B. Calculate in memory
      - formula … watermark = max(event time) - late_threshold
        - e.g. late threshold = 10 minutes
      - use `median(event time)` instead of `max(event time)` to handle logs stamped too far in the future
        - e.g. when users have set their mobile devices’ system clocks incorrectly
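The in-memory formula above can be sketched as follows. This is a minimal illustration, not the code from the talk: it assumes event times are Unix seconds, and the names `Watermark`, `IsLate`, and `lateThresholdSec` are hypothetical.

```go
package main

import "fmt"

// lateThresholdSec is the allowed lateness in seconds
// (10 minutes, matching the example on the slide).
const lateThresholdSec int64 = 10 * 60

// Watermark returns max(event time) - late threshold.
// Event times are assumed to be Unix seconds; ok is false for an empty input.
func Watermark(eventTimes []int64) (wm int64, ok bool) {
	if len(eventTimes) == 0 {
		return 0, false
	}
	latest := eventTimes[0]
	for _, t := range eventTimes[1:] {
		if t > latest {
			latest = t
		}
	}
	return latest - lateThresholdSec, true
}

// IsLate reports whether an event time falls behind the watermark.
func IsLate(eventTime, watermark int64) bool {
	return eventTime < watermark
}

func main() {
	events := []int64{1534075200, 1534075260, 1534074000}
	wm, _ := Watermark(events)
	for _, t := range events {
		fmt.Println(t, "late:", IsLate(t, wm))
	}
}
```

Because the watermark follows the maximum, a single record stamped far in the future pushes the watermark forward and makes every normal record look late, which is the motivation for switching to the median.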
  13. 27.

    Overall Data-flow

    - Kinesis Stream receives logs from Android clients directly via kinesis:PutRecord
      - once per 90 sec, from at most 15,000 devices
    - Lambda polls Kinesis and aggregates the logs as impressions by incrementing counters
    - DynamoDB stores the incremented records via the UpdateItem (ADD) operation
    - other Lambda function(s) poll DynamoDB Streams and aggregate hourly/daily
  14. 30.

    Aggregation

    - Aggregate gradually: per minute -> per 10 min -> hourly -> daily
    - Partition usual logs/lagged logs, and write them to separate tables
      - NOT discarding (for now), to see how many logs would be discarded
      - determine whether each log’s timestamp is behind the watermark or not
    - watermark … the median of all timestamps
      - because users can change system clocks to the future
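The partitioning step can be sketched like this. It is an illustration under assumptions rather than the production code: timestamps are Unix seconds, the watermark is taken as the median minus a 10-minute late threshold (one reading of the slides), and the two maps stand in for the separate DynamoDB tables. `Partition`, `median`, and `lateThresholdSec` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// lateThresholdSec is an assumed allowed lateness of 10 minutes.
const lateThresholdSec int64 = 10 * 60

// median returns the median of ts without modifying the input.
// ts must be non-empty.
func median(ts []int64) int64 {
	sorted := append([]int64(nil), ts...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	n := len(sorted)
	if n%2 == 0 {
		return (sorted[n/2-1] + sorted[n/2]) / 2
	}
	return sorted[n/2]
}

// Partition counts logs per minute bucket, keeping on-time and late logs
// in separate maps (standing in for the separate tables).
func Partition(ts []int64) (onTime, late map[int64]int) {
	onTime, late = map[int64]int{}, map[int64]int{}
	if len(ts) == 0 {
		return onTime, late
	}
	wm := median(ts) - lateThresholdSec
	for _, t := range ts {
		bucket := t - t%60 // start of the minute the log belongs to
		if t < wm {
			late[bucket]++
		} else {
			onTime[bucket]++
		}
	}
	return onTime, late
}

func main() {
	// 99999 simulates a device whose clock is set far in the future.
	onTime, late := Partition([]int64{6000, 6010, 6070, 5000, 99999})
	fmt.Println("on time:", onTime)
	fmt.Println("late:", late)
}
```

With the median, the far-future record lands in its own bucket but does not move the watermark, so the ordinary records are still classified as on time.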
  15. 31.

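    Watermark Implementation

    func (sr *StreamRecords) watermark(eventTimes []EventTime) (median EventTime) {
        l := len(eventTimes)
        if l == 0 {
            return median // no events: return the zero value instead of panicking
        }
        // 1. sort (sort.Ints only accepts []int, so sort.Slice is used for []EventTime)
        sort.Slice(eventTimes, func(i, j int) bool { return eventTimes[i] < eventTimes[j] })
        // 2. get the median
        if l%2 == 0 { // when even: mean of the two middle values (Mean is a helper not shown on the slide)
            median = Mean(eventTimes[l/2-1 : l/2+1])
        } else { // when odd
            median = eventTimes[l/2]
        }
        return median
    }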
  16. 34.

    Monitoring - The Four Golden Signals

    What to “monitor”? Google’s well-known “SRE” book lists them in Chapter 6, in the section “The Four Golden Signals”:
    - Latency
    - Traffic
    - Errors
    - Saturation
  17. 35.

    Monitoring - CloudWatch Dashboard

    - Latency
      - How long does it take?
    - Traffic
      - How many Get/PutRecords?
    - Errors
      - Availability?
      - How many errors occur?
    - Saturation
      - Any delayed data?
      - IteratorAge?
  18. 36.

    Monitoring - Custom Metrics & Alarm

    - Create custom metrics
      - via cloudwatch:PutMetricData
    - Flexible alarm setting
      - Period
      - Evaluation Period
      - Datapoints to Alarm

    Figure: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
  19. 37.

    Monitoring - Implementation Example

    dataInput := &cloudwatch.PutMetricDataInput{
        Namespace: aws.String("StoreTvAdLambdaMetrics"),
        MetricData: []*cloudwatch.MetricDatum{
            {
                Dimensions: []*cloudwatch.Dimension{
                    {
                        Name:  aws.String("Function"),
                        Value: aws.String("monitor-late-data"),
                    },
                },
                MetricName: aws.String("LateLogCount"),
                Unit:       aws.String("Count"),
                Value:      aws.Float64(float64(lateLogCount)),
            },
        },
    }
    cloudWatchClient.PutMetricData(dataInput)
  20. 38.

    Monitoring - Kinesis Streams IteratorAge

    - “GetRecords.IteratorAgeMilliseconds” metric
      - can monitor “how much stream processing is delayed”
      - https://docs.aws.amazon.com/streams/latest/dev/monitoring-with-cloudwatch.html
    - [CloudWatch Alarms -> SNS Topic -> Lambda -> Slack]
  21. 39.

    Monitoring - Stream Saturation

    - Simply calculate the diff between the updated records’ timestamp and the current time, and compare it with the threshold.
      - timestamp … taken from DynamoDB Stream event records
      - threshold … passed via ENV (currently 10 min)
      - calculation … time.Since(timestamp).Minutes() >= threshold
    - [DynamoDB Stream -> Custom CloudWatch Metrics -> SNS -> Lambda -> Slack]
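A small sketch of the saturation check, assuming the record timestamp arrives as Unix seconds (as a DynamoDB Streams record's approximate creation time would). The environment variable name `LATE_THRESHOLD_MINUTES` and all function names are hypothetical. Unlike `time.Since`, the current time is passed in explicitly so the check can be tested deterministically.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// IsSaturated mirrors time.Since(timestamp).Minutes() >= threshold,
// with "now" injected instead of read from the system clock.
func IsSaturated(updatedAt, now time.Time, thresholdMinutes float64) bool {
	return now.Sub(updatedAt).Minutes() >= thresholdMinutes
}

// IsSaturatedUnix is the same check for timestamps given as Unix seconds.
func IsSaturatedUnix(updatedAtSec, nowSec int64, thresholdMinutes float64) bool {
	return IsSaturated(time.Unix(updatedAtSec, 0), time.Unix(nowSec, 0), thresholdMinutes)
}

// thresholdFromEnv reads the threshold (in minutes) from the environment,
// falling back to the given default when the variable is unset or invalid.
func thresholdFromEnv(key string, fallback float64) float64 {
	v, err := strconv.ParseFloat(os.Getenv(key), 64)
	if err != nil {
		return fallback
	}
	return v
}

func main() {
	// LATE_THRESHOLD_MINUTES is a hypothetical variable name.
	threshold := thresholdFromEnv("LATE_THRESHOLD_MINUTES", 10)
	updatedAt := time.Now().Add(-12 * time.Minute)
	fmt.Println("saturated:", IsSaturated(updatedAt, time.Now(), threshold))
}
```

When the check returns true, the function would publish a custom metric so that the CloudWatch alarm, SNS topic, and Slack notification chain can fire.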
  22. 40.
  23. 41.
  24. 42.
  25. 43.

    Cognito

    - pass a role to the Android clients
      - which allows kinesis:PutRecord to the target Kinesis Stream ARN
    - use unauthenticated identities in the identity pool
      - because the Android app does not need any login feature
    - much more secure than embedding API_KEY/CREDENTIAL_KEY in the Android clients
  26. 44.

    Lambda x Golang 1.x

    - Officially supported since January 15th, 2018
      - https://dev.classmethod.jp/cloud/aws/aws-lambda-supports-go/
    - aws/aws-lambda-go
      - https://github.com/aws/aws-lambda-go
      - the event types make it easy to grasp what kind of JSON event records will be available
    - IMHO
      - One of my favorite languages among the officially supported ones
        - Type, Runtime Performance, ecosystem, etc.
      - goroutine/channels have too much overhead for running on Lambda
  27. 45.

    Lambda x Golang 1.x

    1. init() function for declaring the global vars

    > A single instance of your Lambda function will never handle multiple events simultaneously
    https://docs.aws.amazon.com/lambda/latest/dg/go-programming-model-handler-types.html
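A minimal sketch of the init() pattern, written without the AWS libraries so that it runs on its own. The `client` type is a hypothetical stand-in for an SDK client, and `handle` stands in for the Lambda handler; in a real function the handler would be registered with `lambda.Start` from aws-lambda-go.

```go
package main

import "fmt"

// client is a hypothetical stand-in for an SDK client that is
// expensive to construct (e.g. a DynamoDB or CloudWatch client).
type client struct {
	calls int
}

// sharedClient is package-level state reused across invocations
// handled by the same function instance.
var sharedClient *client

// init runs once per function instance, before any event is handled.
func init() {
	sharedClient = &client{}
}

// handle plays the role of the Lambda handler. It mutates shared state
// without locking, relying on the guarantee quoted above that one instance
// never handles multiple events simultaneously.
func handle(event string) string {
	sharedClient.calls++
	return fmt.Sprintf("%s handled (call #%d on this instance)", event, sharedClient.calls)
}

func main() {
	// Simulate two sequential invocations on a warm instance.
	fmt.Println(handle("event-1"))
	fmt.Println(handle("event-2"))
}
```

The benefit is that construction cost is paid once per cold start rather than once per event.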
  28. 46.

    Lambda x Golang 1.x

    2. use -ldflags="-s -w" to reduce the binary size
    - -s … Omit the symbol table and debug information.
    - -w … Omit the DWARF symbol table.
    by https://golang.org/cmd/link/
  29. 47.

    go build without -ldflags="-s -w"

    $ ls -lh bin/
    total 145704
    -rwxr-xr-x 1 kenju-wagatsuma staff 12M Aug 9 22:26 aggregate-daily-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 12M Aug 9 22:26 aggregate-hourly-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 12M Aug 9 22:26 aggregate-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 12M Aug 9 22:27 monitor-late-data
    -rwxr-xr-x 1 kenju-wagatsuma staff 14M Aug 9 22:26 put-s3
    -rwxr-xr-x 1 kenju-wagatsuma staff 7.8M Aug 9 22:27 sns-notification
  30. 48.

    go build -ldflags="-s -w"

    $ ls -lh bin/
    total 96776
    -rwxr-xr-x 1 kenju-wagatsuma staff 8.3M Aug 9 22:05 aggregate-daily-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 8.3M Aug 9 22:05 aggregate-hourly-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 8.2M Aug 9 22:05 aggregate-logs
    -rwxr-xr-x 1 kenju-wagatsuma staff 7.8M Aug 9 22:05 monitor-late-data
    -rwxr-xr-x 1 kenju-wagatsuma staff 9.3M Aug 9 22:05 put-s3
    -rwxr-xr-x 1 kenju-wagatsuma staff 5.4M Aug 9 22:05 sns-notification
  31. 49.

    [NOTE] Golang dependencies

    $ dep status
    PROJECT                             CONSTRAINT     VERSION        REVISION  LATEST   PKGS USED
    github.com/aws/aws-lambda-go        ^1.0.0         v1.2.0         4d30d0f   e630af3  4
    github.com/aws/aws-sdk-go           ^1.14.23       v1.15.3        cc03a15   36aaf21  37
    github.com/go-ini/ini               *              v1.38.1        358ee76   358ee76  1
    github.com/jmespath/go-jmespath     *                             0b12d6b            1
    github.com/kenju/go-cloudwatch      branch master  branch master  c60ecc3   c60ecc3  1
    github.com/kenju/go-nested-counter  branch master  branch master  c6ca0d8   c6ca0d8  1
    github.com/kenju/go-slack-webhook   branch master  branch master  627aa7e   627aa7e  1
    github.com/satori/go.uuid           ^1.2.0         v1.2.0         f58768c   f58768c  1
  32. 50.

    Serverless Framework

    - Why Serverless Framework?
      - Development speed (easy to configure)
      - Motivation (never used it in production before)
      - (Might be) easy to migrate to CloudFormation/SAM later
    - Why not Apex?
      - Easy to deploy Lambda, but that’s all
    - Why not SAM?
      - Writing CloudFormation stacks from scratch might take time
      - However, sam-local is a great tool, so we might migrate to SAM in the near future
  33. 51.

    Serverless Framework

    - $ serverless deploy --stage (dev|prod|staging)
      - change stage via `--stage` option
    - $ serverless metrics
      - show simple metrics for functions
    - $ serverless invoke
      - Useful Lambda Event fixtures can be found at ... https://github.com/aws/aws-lambda-go/tree/master/events/testdata
  34. 52.

    serverless invoke

    $ cat Makefile | tail -n8
    run-sns-notification: deploy-dev
        serverless invoke \
            --log \
            --stage dev \
            --function sns-notification \
            --path fixtures/sns-events.json