Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK

This presentation compares Amazon Kinesis Data Streams with Amazon Managed Streaming for Kafka (MSK) from both architectural and operational perspectives. It also presents common architectural patterns: (1) Data Hub (Event Bus), (2) Log Aggregation, (3) IoT, and (4) Event Sourcing and CQRS.

Sungmin Kim

April 21, 2022

Transcript

  1. Agenda
    • Key Components of Real-time Analytics
    • Anatomy of Amazon Kinesis Data Streams and MSK
    • Comparing Amazon Kinesis Data Streams to MSK
    • Monitoring Metrics
    • Reference Architecture
    • Key Takeaways
  2. From Batch to Real-time: Lambda Architecture
    [Diagram] Data Source, Stream Ingestion, Stream Storage, Stream Process, Stream Delivery, Consumer
    • Speed Layer: Streaming Data, Stream Process, Real-time View
    • Batch Layer: Raw Data Storage, Batch Process, Batch View
    • Serving Layer: Query & Merge Results
  3. Lambda Architecture
    [Diagram]
    • Speed Layer: Streaming Data, Stream Process, Real-time View
    • Batch Layer: Raw Data, Batch Process, Batch View
    • Serving Layer: Query against Batch View and Real-time View
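The "Query & Merge Results" step of the serving layer can be sketched as follows. This is a minimal illustration, assuming both views hold additive counts keyed by the same identifier; the page names and numbers are hypothetical and do not come from the slides.

```python
from collections import Counter


def merge_views(batch_view: dict, realtime_view: dict) -> dict:
    """Answer a query by combining the batch view with the real-time view.

    Assumption: values are additive counts, so merging is a per-key sum.
    """
    merged = Counter(batch_view)
    merged.update(realtime_view)  # Counter.update adds counts per key
    return dict(merged)


# Hypothetical page-view counts: the batch view covers data up to the last
# batch run, the real-time view covers events that arrived since then.
batch_view = {"page_a": 100, "page_b": 40}
realtime_view = {"page_a": 3, "page_c": 1}

merged = merge_views(batch_view, realtime_view)
print(merged)
```

Non-additive metrics (for example distinct counts) would need a different merge function; the per-key sum is only the simplest case.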
  4. Key Components of Real-time Analytics
    • Data Source: Devices and/or applications that produce real-time data at high velocity
    • Stream Ingestion: Data from tens of thousands of data sources can be written to a single stream
    • Stream Storage: Data is stored in the order it was received for a set duration and can be replayed any number of times during that period
    • Stream Process: Records are read in the order they are produced, enabling real-time analytics or streaming ETL
    • Data Sink: Data lake (most common), database (least common)
  5. Stream Storage
    [Diagram] Data Source, Stream Ingestion, Stream Storage, Stream Process, Data Sink
    • Stream Storage options: Amazon Kinesis Data Streams, Amazon Managed Streaming for Kafka
  6. Key Features of Kinesis Data Streams and MSK
    • Distributed Queue
    • Stream Storage
    #Queue #Distributed #Storage
  7. #Queue: FIFO, Scale-Up vs Scale-Out
    [Diagram] Producers, a queue of numbered records ordered from oldest data to newest data, and a Consumer
  8. #Distributed: Scale-Out
    [Diagram] Producers send records with a partition key (PK) through a Hash Function to shard/partition-1, shard/partition-2, and shard/partition-3; each shard/partition holds numbered records ordered from oldest data to newest data and is read by a Consumer in a Consumer Group
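The hash-function step on this slide can be sketched as below. This is a minimal illustration in the style of Kinesis Data Streams, which maps a partition key to a 128-bit integer with MD5 and assigns it to the shard that owns that hash range. The even split of the hash space across three shards and the function name `shard_for` are assumptions made for this sketch; Kafka's default partitioner uses a different hash.

```python
import hashlib

NUM_SHARDS = 3          # assumption: three shards, as drawn on the slide
HASH_SPACE = 2 ** 128   # size of the MD5 hash key space


def shard_for(partition_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a partition key to a shard index in [0, num_shards).

    Assumption: each shard owns an equal, contiguous slice of the hash space.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_SPACE


for key in ["device-1", "device-2", "device-3", "device-1"]:
    print(key, "-> shard", shard_for(key))
```

Because the mapping is deterministic, every record with the same partition key lands in the same shard, which is what preserves per-key ordering while the stream scales out.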
  9. #Storage: Stream Buffer
    [Diagram] Same layout as the previous slide (Producers, PK, Hash Function, shard/partition-1 to shard/partition-3, Consumer Group); a marker on each shard/partition shows the next consumer offset
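The "next consumer offset" idea can be sketched with an append-only log. This is a simplified in-memory model, not the Kinesis or Kafka client API; the class and method names are hypothetical.

```python
class Partition:
    """An append-only log: records are kept after being read (stream buffer)."""

    def __init__(self):
        self.records = []

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        return self.records[offset:offset + max_records]


class Consumer:
    """Tracks its own next offset, so consumers do not affect each other."""

    def __init__(self, partition: Partition):
        self.partition = partition
        self.next_offset = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.partition.read(self.next_offset, max_records)
        self.next_offset += len(batch)
        return batch


partition = Partition()
for value in ["r0", "r1", "r2", "r3"]:
    partition.append(value)

fast, slow = Consumer(partition), Consumer(partition)
first_fast = fast.poll()               # reads everything available
first_slow = slow.poll(max_records=2)  # reads only the two oldest records
print(first_fast, first_slow, fast.next_offset, slow.next_offset)
```

Reading does not remove records, so a consumer can also be rewound by resetting `next_offset`, which is the replay behavior described earlier in the deck.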
  10. Anatomy of Amazon Kinesis Data Streams and Amazon Managed Streaming for Kafka
    [Diagram] Producers, PK, Hash Function, shard/partition-1 to shard/partition-3 (oldest data to newest data), Consumer Group, with a marker for the next consumer offset
  11. Benefits of Stream Storage
    • Decouple producers & consumers
    • Persistent buffer
    • Collect multiple streams
    • Preserve client ordering
    • Parallel consumption
    • Streaming MapReduce
  12. Amazon Kinesis Data Streams vs Amazon Managed Streaming for Kafka: Operational Perspective
    Amazon Managed Streaming for Kafka
    • Number of clusters?
    • Number of brokers per cluster?
    • Number of topics per broker?
    • Number of partitions per topic?
    • Cluster provisioning model
    • Can only increase the number of partitions; can’t decrease
    • Integration with a few AWS services, such as Kinesis Data Analytics for Apache Flink
    Amazon Kinesis Data Streams
    • Number of Kinesis Data Streams?
    • Number of shards per stream?
    • Throughput provisioning model
    • Can increase or decrease the number of shards
    • Full integration with AWS services such as Lambda, Kinesis Data Analytics, etc.
  13. Metrics to Monitor: MSK (Kafka)
    • RequestQueue: Length, WaitTime
    • ResponseQueue: Length, WaitTime
    • Network: Packet Drop?
    • Produce/Consume Rate Imbalance
    • Who is Leader?
    • Disk Full?
    • Too many topics?
  14. Metrics to Monitor: MSK (Kafka)
    Metric (Level): Description
    • ActiveControllerCount (DEFAULT): Only one controller per cluster should be active at any given time.
    • OfflinePartitionsCount (DEFAULT): Total number of partitions that are offline in the cluster.
    • GlobalPartitionCount (DEFAULT): Total number of partitions across all brokers in the cluster.
    • GlobalTopicCount (DEFAULT): Total number of topics across all brokers in the cluster.
    • KafkaAppLogsDiskUsed (DEFAULT): The percentage of disk space used for application logs.
    • KafkaDataLogsDiskUsed (DEFAULT): The percentage of disk space used for data logs.
    • RootDiskUsed (DEFAULT): The percentage of the root disk used by the broker.
    • PartitionCount (PER_BROKER): The number of partitions for the broker.
    • LeaderCount (PER_BROKER): The number of leader replicas.
    • UnderMinIsrPartitionCount (PER_BROKER): The number of under minIsr partitions for the broker.
    • UnderReplicatedPartitions (PER_BROKER): The number of under-replicated partitions for the broker.
    • FetchConsumerTotalTimeMsMean (PER_BROKER): The mean total time in milliseconds that consumers spend on fetching data from the broker.
    • ProduceTotalTimeMsMean (PER_BROKER): The mean produce time in milliseconds.
  15. How about monitoring Kinesis Data Streams?
    How long does a record stay in a shard?
    [Diagram] Data, shard, GetRecords(), Consumer Application
    • Writes: up to 1 MB or 1,000 records per second, per shard
    • Reads: 5 transactions per second, per shard; with only one consumer application, records can be retrieved every 200 ms
    • GetRecords(): up to 10 MB and up to 10,000 records per call; sustained read throughput is capped at 2 MB per second, per shard
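The per-shard limits translate into a rough shard-count estimate, sketched below. The write limits (1 MB or 1,000 records per second, per shard) are the ones quoted on the slide; the 2 MB per second shared read limit is taken from the AWS quotas documentation rather than from the slide, and the function name and workload numbers are hypothetical.

```python
import math

WRITE_MB_PER_SEC = 1.0        # per shard, from the slide
WRITE_RECORDS_PER_SEC = 1000  # per shard, from the slide
READ_MB_PER_SEC = 2.0         # per shard, shared by all GetRecords consumers (assumption: AWS quota)


def required_shards(avg_record_kb: float, records_per_sec: int, consumer_apps: int) -> int:
    """Estimate the shards needed so that neither writes nor reads are throttled."""
    ingress_mb = avg_record_kb * records_per_sec / 1024
    egress_mb = ingress_mb * consumer_apps  # every consumer app reads the full stream
    return max(
        1,
        math.ceil(ingress_mb / WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(egress_mb / READ_MB_PER_SEC),
    )


# Hypothetical workload: 2 KB records, 3,000 records per second, 2 consumer apps.
shards = required_shards(avg_record_kb=2, records_per_sec=3000, consumer_apps=2)
print(shards)
```

The estimate ignores hot partition keys and traffic bursts, so in practice the throttling metrics on the next slide remain the signal for adding shards.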
  16. Metrics to Monitor: Kinesis Data Streams
    Metric: Description
    • GetRecords.IteratorAgeMilliseconds: Age of the last record in all GetRecords calls
    • ReadProvisionedThroughputExceeded: Number of GetRecords calls throttled
    • WriteProvisionedThroughputExceeded: Number of PutRecord(s) calls throttled
    • PutRecord.Success, PutRecords.Success: Number of successful PutRecord(s) operations
    • GetRecords.Success: Number of successful GetRecords operations
  17. Kafka vs MSK vs Kinesis Data Streams
    [Diagram] Kinesis Data Streams, Amazon MSK, and Kafka positioned along two axes: Operational Excellence and Degree of Freedom ≈ Complexity
  18. Comparison Summary
    Attribute | Apache Kafka | Kinesis Data Streams | Managed Streaming for Kafka
    Cost | $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
    Ease of use | Advanced setup required | Get started in minutes | Get started in minutes
    Management Overhead | High | Low | Low
    Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
    Throughput | Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
    Durability | Configurable | 3x by default | Configurable
    Infrastructure | You manage | AWS manages | AWS manages
    Write-to-Read Latency | <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
    Open Sourced? | Yes | No | Yes
  19. Example Usage Pattern 1: Data Hub
    [Diagram] Services shown: Kinesis Data Streams, Amazon MSK, Kinesis Data Firehose, Amazon S3, Amazon EC2, AWS Lambda, Amazon ECS, Kinesis Data Analytics, Amazon ES, Amazon Athena, Amazon CloudWatch
    https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/
  20. Example Usage Pattern 2: Web Analytics and Leaderboards
    [Diagram] Components shown: Lightweight JS client code OR Web server on Amazon EC2, Amazon Cognito, Amazon Kinesis Data Streams, Amazon MSK, Amazon Kinesis Data Analytics, Lambda function, Amazon DynamoDB
    • Ingest web app data
    • Compute top 10 users
    • Persist to feed live apps
    https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/
  21. IoT
    [Diagram] IoT Things, Events, Analytics, AI/ML
    • Remote Control
    • Prediction/Fraud Detection
    • Device Monitoring
    • Quality Control
    • Data Visualization
  22. Example Usage Pattern 3: Monitoring IoT Devices
    [Diagram] Components shown: Raspberry Pi + Sense HAT, MQTT Topic, AWS IoT Core, IoT rule, Amazon Kinesis Data Streams, AWS Glue Streaming Job, S3 Bucket, Glue Crawler, Glue Data Catalog, Amazon Athena (inside AWS Cloud)
    • Ingest sensor data
    • Convert JSON to Parquet
    • Store all data points in an S3 data lake
    https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/
  23. Event Sourcing and CQRS
    [Diagram, generic] App Write Interface, Event Queue, Event Store, Event Handler, Application State, App Read Interface
    [Diagram, Kafka] App Write Interface, Kafka Topic (Event Store), Kafka Streams Topology (Event Handler + App State), Kafka Streams State Store, App Read Interface
    https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
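The roles named on this slide (write interface, event store, event handler, application state, read interface) can be sketched in memory as below. This is an illustrative model only and does not use Kafka or Kafka Streams; the event type, class names, and account data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    """Hypothetical event: a signed amount applied to an account."""
    account: str
    amount: int


class EventStore:
    """Append-only log of events (the role a Kafka topic plays on the slide)."""

    def __init__(self):
        self._events = []

    def append(self, event) -> None:
        self._events.append(event)

    def replay(self) -> list:
        return list(self._events)


class BalanceReadModel:
    """Event handler plus application state, queried by the read interface."""

    def __init__(self):
        self.balances = {}

    def apply(self, event: Deposited) -> None:
        self.balances[event.account] = self.balances.get(event.account, 0) + event.amount

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)


def write(store: EventStore, read_model: BalanceReadModel, event: Deposited) -> None:
    """Write interface: record the event first, then let the handler update state."""
    store.append(event)
    read_model.apply(event)


store, view = EventStore(), BalanceReadModel()
write(store, view, Deposited("alice", 100))
write(store, view, Deposited("alice", -30))
write(store, view, Deposited("bob", 5))

# The state is derived data: it can be rebuilt at any time by replaying the log.
rebuilt = BalanceReadModel()
for event in store.replay():
    rebuilt.apply(event)

print(view.balance_of("alice"), rebuilt.balances)
```

Keeping the write path (append an event) separate from the read path (query derived state) is the CQRS half of the pattern; the replayable log is the event sourcing half.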
  24. Example Usage Pattern 4: Streaming SQL
    [Diagram] App Write Interface, Amazon Kinesis Data Streams (Event Store), Amazon Kinesis Data Analytics (SQL) (Event Handler + App State), S3 Bucket, App Read Interface
    • Continuous filter
    • Aggregate function
    • Data enrichment (join)
    • Anomaly Detection
    Sample stream records:
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63}
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64}
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32}
    Reference data for the join (Ticker, Company):
    AMZN, Amazon
    ASD, SomeCompanyA
    BAC, SomeCompanyB
    CRM, SomeCompanyC
    https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
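The three operations named on this slide (continuous filter, aggregate function, data enrichment by join) can be mimicked in plain Python over the sample records. This is a sketch of the semantics only, not Kinesis Data Analytics SQL; it processes a finite list and ignores time windows, and the function names are hypothetical.

```python
from collections import defaultdict

# Sample records and reference data as shown on the slide.
records = [
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63},
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64},
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32},
]
companies = {"AMZN": "Amazon", "ASD": "SomeCompanyA", "BAC": "SomeCompanyB", "CRM": "SomeCompanyC"}


def continuous_filter(stream, sector):
    """Pass through only the records of one sector (like a WHERE clause)."""
    return (r for r in stream if r["SECTOR"] == sector)


def average_price_by_sector(stream):
    """Aggregate: average PRICE per SECTOR (like AVG with GROUP BY)."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in stream:
        totals[r["SECTOR"]][0] += r["PRICE"]
        totals[r["SECTOR"]][1] += 1
    return {sector: total / count for sector, (total, count) in totals.items()}


def enrich(stream, lookup):
    """Join each record with reference data; unmatched tickers get None."""
    for r in stream:
        yield {**r, "COMPANY": lookup.get(r["TICKER_SYMBOL"])}


tech = list(continuous_filter(records, "TECHNOLOGY"))
avg = average_price_by_sector(records)
enriched = list(enrich(records, companies))
print([r["TICKER_SYMBOL"] for r in tech], avg, enriched[0]["COMPANY"])
```

None of the three sample tickers appears in the reference table on the slide, so the enrichment step yields `None` for each of them here.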
  25. Key Takeaways
    • Distributed Queue as Stream Storage
      • Preserve Ordering
      • Parallel Consumption
      • Persistent Buffer
      • Decouple producers & consumers
    • Trade-off: Operational Excellence vs Degree of Freedom
    • MUST keep an eye on the right monitoring metrics
    • Architectural Patterns
      • Data Hub: (Asynchronous) Event-Bus
      • Log Aggregation
      • IoT
      • Event Sourcing and CQRS
  26. Where To Go Next?
    • Amazon MSK Labs https://amazonmsk-labs.workshop.aws/
    • Amazon Managed Streaming for Kafka: Best Practices https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
    • Monitoring Kafka performance metrics (2020-04-16) https://tinyurl.com/y6hrhwbq
    • Understanding and Optimizing Metrics for Apache Kafka Monitoring (2018-11, in Korean) https://tinyurl.com/y4uwyenx
    • AWS Analytics Immersion Day - Build BI System from Scratch
      • Workshop - https://tinyurl.com/yapgwv77
      • Slides - https://tinyurl.com/ybxkb74b
    • Realtime Analytics on AWS https://tinyurl.com/y3evwm3v
    • Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
      • Part1 - https://tinyurl.com/y8vo8q7o
      • Part2 - https://tinyurl.com/ycbv7wel