Choose Right Stream Storage: Amazon Kinesis Data Streams vs MSK

Choose Right Stream Storage: Kinesis Data Streams vs MSK
Sungmin Kim, Solutions Architect, AWS

Agenda
• Key Components of Real-time Analytics
• Anatomy of Amazon Kinesis Data Streams and MSK
• Comparing Amazon Kinesis Data Streams to MSK
• Monitoring Metrics
• Reference Architecture
• Key Takeaways

Key Components of Real-time Analytics

From Batch to Real-time: Lambda Architecture
[Diagram: Lambda architecture. Components shown:]
• Data Source, Stream Ingestion, Stream Storage, Stream Delivery
• Speed Layer: Streaming Data, Stream Process, Real-time View
• Batch Layer: Raw Data Storage, Batch Process, Batch View
• Service Layer: Query & Merge Results
• Consumer

Lambda Architecture
[Diagram: components shown:]
• Speed Layer: Streaming Data, Stream Process, Real-time View
• Batch Layer: Raw Data, Batch Process, Batch View
• Serving Layer: Query against the Batch View and the Real-time View
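To make the serving layer's "query and merge" step concrete, here is a minimal sketch. The page-view events, `build_view`, and `query` are hypothetical names chosen for illustration; a real batch layer and speed layer would be separate jobs rather than one function.

```python
from collections import Counter


def build_view(events):
    """Count page views per page; stands in for a batch or streaming job."""
    return Counter(event["page"] for event in events)


def query(batch_view, realtime_view, page):
    """Serving layer: merge the precomputed batch view with the real-time view."""
    return batch_view.get(page, 0) + realtime_view.get(page, 0)


# Hypothetical data: events already processed by the batch layer ...
raw_data = [{"page": "/home"}, {"page": "/home"}, {"page": "/cart"}]
# ... and events that arrived after the last batch run.
recent_stream = [{"page": "/home"}, {"page": "/checkout"}]

batch_view = build_view(raw_data)
realtime_view = build_view(recent_stream)
```

The batch view is complete but stale, the real-time view is fresh but partial, and a query only sees the full picture after merging both.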

Key Components of Real-time Analytics
• Data Source: Devices and/or applications that produce real-time data at high velocity
• Stream Ingestion: Data from tens of thousands of data sources can be written to a single stream
• Stream Storage: Data are stored in the order they were received for a set duration of time and can be replayed indefinitely during that time
• Stream Process: Records are read in the order they are produced, enabling real-time analytics or streaming ETL
• Data Sink: Data lake (most common), Database (least common)

Stream Storage
[Diagram: Data Source, Stream Ingestion, Stream Storage, Stream Process, Data Sink]
Stream Storage:
• Amazon Kinesis Data Streams
• Amazon Managed Streaming for Kafka

Anatomy of Amazon Kinesis Data Streams and MSK

Key Features of Kinesis Data Streams and MSK
• Distributed Queue
• Stream Storage
#Queue #Distributed #Storage

#Queue: FIFO, Scale-Up vs Scale-Out
[Diagram: Producers append numbered records to a queue; the Consumer reads them in order, from the oldest data to the newest data.]

#Distributed: Scale-Out
[Diagram: Producers send records with a partition key (PK) through a Hash Function, which assigns each record to shard/partition-1, shard/partition-2, or shard/partition-3. Each shard/partition holds its records from oldest data to newest data and is read by a Consumer in a Consumer Group.]
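The hash-function routing in this diagram can be sketched as follows. Kinesis Data Streams maps a partition key to a 128-bit integer with MD5 and looks up the shard whose hash key range contains it; Kafka's default partitioner uses a different hash (murmur2) over the key. The equal-sized ranges and the helper names `shard_for_key` and `route` are assumptions for illustration only.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Assumption: the hash space is split into equal, contiguous ranges.
    range_size = HASH_SPACE // shard_count
    # Clamp so the top of the hash space still lands in the last shard.
    return min(hash_value // range_size, shard_count - 1)


def route(records, shard_count):
    """Distribute (partition_key, payload) pairs across shards, keeping arrival order."""
    shards = [[] for _ in range(shard_count)]
    for key, payload in records:
        shards[shard_for_key(key, shard_count)].append(payload)
    return shards
```

Because the same key always hashes to the same shard, ordering is preserved per partition key, not across the whole stream.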

#Storage: Stream Buffer
[Diagram: Producers send records with a partition key (PK) through a Hash Function into shard/partition-1, shard/partition-2, and shard/partition-3, each ordered from oldest data to newest data. A marker in each shard/partition shows the next consumer offset for the Consumers in the Consumer Group.]
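A minimal sketch of the stream-buffer idea, assuming a single shard/partition held in memory: reading does not delete records, and each consumer group only advances its own "next offset". The class `PartitionLog` and its methods are hypothetical and do not mirror the Kinesis or Kafka client APIs.

```python
class PartitionLog:
    """Append-only log for one shard/partition; reads never remove records."""

    def __init__(self):
        self._records = []
        self._next_offset = {}  # consumer group -> next offset to read

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next batch for a group and advance only that group's offset."""
        start = self._next_offset.get(group, 0)
        batch = self._records[start:start + max_records]
        self._next_offset[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind or skip ahead, for example to replay retained data."""
        self._next_offset[group] = offset
```

Two groups polling the same log each see every record, which is what lets independent applications consume one stream at their own pace.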

Anatomy of Amazon Kinesis Data Streams and Amazon Managed Streaming for Kafka
[Diagram: Producers send records with a partition key (PK) through a Hash Function into shard/partition-1, shard/partition-2, and shard/partition-3, each ordered from oldest data to newest data. Consumers in a Consumer Group read from the shards/partitions, with a marker for the next consumer offset.]

Benefits of Stream Storage
• Decouple producers & consumers
• Persistent buffer
• Collect multiple streams
• Preserve client ordering
• Parallel consumption
• Streaming MapReduce

Comparing Amazon Kinesis Data Streams to MSK

Comparing Kinesis Data Streams to MSK
[Diagram: Amazon Kinesis Data Streams shown next to Amazon Managed Streaming for Kafka, with the label "Topic"]

Amazon Kinesis Data Streams: Operational Perspective
• Number of Kinesis Data Streams?
• Number of shards per stream?
• Throughput provisioning model
• Increase/decrease the number of shards
• Full integration with AWS services such as Lambda functions, Kinesis Data Analytics, etc.

Amazon Managed Streaming for Kafka: Operational Perspective
• Number of clusters?
• Number of brokers per cluster?
• Number of topics per broker?
• Number of partitions per topic?
• Cluster provisioning model
• Can only increase the number of partitions; can't decrease it
• Integration with a few AWS services, such as Kinesis Data Analytics for Apache Flink

Monitoring Metrics

Metrics to Monitor: MSK (Kafka)
• RequestQueue
  - Length
  - WaitTime
• ResponseQueue
  - Length
  - WaitTime
• Network
  - Packet Drop?
• Produce/Consume Rate
• Unbalance
• Who is Leader?
• Disk Full?
• Too many topics?

Metrics to Monitor: MSK (Kafka)
Metric (Level): Description
• ActiveControllerCount (DEFAULT): Only one controller per cluster should be active at any given time.
• OfflinePartitionsCount (DEFAULT): Total number of partitions that are offline in the cluster.
• GlobalPartitionCount (DEFAULT): Total number of partitions across all brokers in the cluster.
• GlobalTopicCount (DEFAULT): Total number of topics across all brokers in the cluster.
• KafkaAppLogsDiskUsed (DEFAULT): The percentage of disk space used for application logs.
• KafkaDataLogsDiskUsed (DEFAULT): The percentage of disk space used for data logs.
• RootDiskUsed (DEFAULT): The percentage of the root disk used by the broker.
• PartitionCount (PER_BROKER): The number of partitions for the broker.
• LeaderCount (PER_BROKER): The number of leader replicas.
• UnderMinIsrPartitionCount (PER_BROKER): The number of under minIsr partitions for the broker.
• UnderReplicatedPartitions (PER_BROKER): The number of under-replicated partitions for the broker.
• FetchConsumerTotalTimeMsMean (PER_BROKER): The mean total time in milliseconds that consumers spend on fetching data from the broker.
• ProduceTotalTimeMsMean (PER_BROKER): The mean produce time in milliseconds.
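As a rough illustration of how a few of these metrics turn into checks, the sketch below evaluates already-collected metric values. The input dictionaries, the function name `check_msk_health`, and the 80% disk threshold are assumptions for illustration; fetching the values (for example from CloudWatch) is deliberately left out.

```python
def check_msk_health(cluster, brokers, disk_threshold_pct=80.0):
    """Return a list of human-readable findings; an empty list means no issue found.

    cluster: dict of cluster-wide metric name -> latest value (hypothetical shape)
    brokers: dict of broker id -> dict of metric name -> latest value
    """
    findings = []
    # Exactly one controller should be active in the cluster.
    if cluster.get("ActiveControllerCount") != 1:
        findings.append("ActiveControllerCount is not 1")
    if cluster.get("OfflinePartitionsCount", 0) > 0:
        findings.append("offline partitions present")
    for broker_id, metrics in sorted(brokers.items()):
        if metrics.get("UnderReplicatedPartitions", 0) > 0:
            findings.append(f"broker {broker_id}: under-replicated partitions")
        if metrics.get("UnderMinIsrPartitionCount", 0) > 0:
            findings.append(f"broker {broker_id}: partitions under minIsr")
        # Assumed threshold: warn well before the data-log disk fills up.
        if metrics.get("KafkaDataLogsDiskUsed", 0.0) > disk_threshold_pct:
            findings.append(f"broker {broker_id}: data-log disk above threshold")
    return findings
```

A small set of checks like this keeps the alerting focused on the conditions that usually need action: controller state, partition availability, replication, and disk.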

How about monitoring Kinesis Data Streams?
How long does a record stay in a shard?
[Diagram: Data is written to a shard; a Consumer Application reads it with GetRecords()]
Writes, per shard:
• up to 1 MB or 1,000 records per second
Reads with GetRecords(), per shard:
• 5 transactions per second; with only one consumer application, records can be retrieved every 200 ms
• up to 10 MB per call (sustained reads are capped at 2 MB per second, per shard)
• up to 10,000 records per call
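The per-shard limits lend themselves to a quick back-of-the-envelope calculation. The sketch below assumes provisioned shards read by shared-throughput consumers; the write limits (1 MB or 1,000 records per second) and the 5 read transactions per second come from the slide, while the 2 MB per second read limit is taken from the published Kinesis Data Streams quotas. Function and constant names are hypothetical.

```python
import math

WRITE_MB_PER_SEC = 1.0            # per shard
WRITE_RECORDS_PER_SEC = 1000      # per shard
READ_MB_PER_SEC = 2.0             # per shard (assumption: shared-throughput consumers)
READ_TRANSACTIONS_PER_SEC = 5     # GetRecords calls per shard


def required_shards(write_mb_per_sec, write_records_per_sec, read_mb_per_sec):
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,
        math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC),
        math.ceil(write_records_per_sec / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_per_sec / READ_MB_PER_SEC),
    )


def min_poll_interval_ms(consumer_apps):
    """How often each application can call GetRecords on a shard without throttling."""
    return consumer_apps * 1000 / READ_TRANSACTIONS_PER_SEC
```

For example, `required_shards(5, 3000, 10)` is driven by the 5 MB per second of writes and the 10 MB per second of reads, and `min_poll_interval_ms(1)` reproduces the 200 ms figure for a single consumer application.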

Metrics to Monitor: Kinesis Data Streams
Metric: Description
• GetRecords.IteratorAgeMilliseconds: Age of the last record in all GetRecords calls
• ReadProvisionedThroughputExceeded: Number of GetRecords calls throttled
• WriteProvisionedThroughputExceeded: Number of PutRecord(s) calls throttled
• PutRecord.Success, PutRecords.Success: Number of successful PutRecord(s) operations
• GetRecords.Success: Number of successful GetRecords operations

Choosing Right Metrics
Too Much = Useless = Too Little

Kafka vs MSK vs Kinesis Data Streams
[Chart: Kafka, Amazon MSK, and Kinesis Data Streams positioned along two axes: Operational Excellence and Degree of Freedom ≈ Complexity]

Comparison Summary
Attribute | Apache Kafka | Kinesis Data Streams | Managed Streaming for Kafka
Cost | $$$ | $ (pay for what you use) | $$ (pay for infrastructure)
Ease of use | Advanced setup required | Get started in minutes | Get started in minutes
Management Overhead | High | Low | Low
Scalability | Difficult to scale | Scale in seconds with one click | Scale in minutes with one click
Throughput | Infinite | Scales with shards, supports up to 1 MB payloads | Infinite
Durability | Configurable | 3x by default | Configurable
Infrastructure | You manage | AWS manages | AWS manages
Write-to-Read Latency | <100 ms is achievable | <100 ms (with HTTP/2) | <100 ms is achievable
Open Sourced? | Yes | No | Yes

Reference Architecture

Data Hub: (Asynchronous) Event-Bus

Example Usage Pattern 1: Data Hub
[Diagram: Kinesis Data Streams or Amazon MSK as the hub, connected to Kinesis Data Firehose, Amazon S3, Amazon EC2, AWS Lambda, Amazon ECS, Kinesis Data Analytics, Amazon ES, Amazon Athena, and Amazon CloudWatch]
https://aws.amazon.com/solutions/case-studies/autodesk-log-analytics/

Log Aggregation Web servers access log Aggregated logs

Example Usage Pattern 2: Web Analytics and Leaderboards
[Diagram: components and steps shown:]
• Lightweight JS client code, Amazon Cognito, Web server on Amazon EC2
• Ingest web app data: Amazon Kinesis Data Streams OR Amazon MSK
• Compute top 10 users: Amazon Kinesis Data Analytics
• Persist to feed live apps: Lambda function, Amazon DynamoDB
https://aws.amazon.com/solutions/implementations/real-time-web-analytics-with-kinesis/

IoT
[Diagram: IoT Things and their use cases:]
• Remote Control
• Prediction / Fraud Detection
• Device Monitoring
• Quality Control
• Data Visualization
• Events Analytics
• AI/ML

Example Usage Pattern 3: Monitoring IoT Devices
[Diagram: components shown: Raspberry Pi + Sense HAT, MQTT Topic, AWS IoT Core, IoT rule, Amazon Kinesis Data Streams, AWS Glue Streaming Job, S3 Bucket, Glue Crawler, Glue Data Catalog, Amazon Athena (inside AWS Cloud)]
• Ingest sensor data
• Convert JSON to Parquet
• Store all data points in an S3 data lake
https://aws.amazon.com/blogs/aws/new-serverless-streaming-etl-with-aws-glue/

Event Sourcing and CQRS
[Diagram, two variants:]
• App Write Interface, Event Queue (Kafka Topic) as the Event Store, Event Handler, Application State, App Read Interface
• App Write Interface, Event Store, Kafka Streams Topology (Event Handler + App State, backed by a Kafka Streams State Store), App Read Interface
https://www.confluent.io/blog/event-sourcing-cqrs-stream-processing-apache-kafka-whats-connection/
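The split between the write interface (append events) and the read interface (query derived state) can be sketched in a few lines. This is an in-memory illustration only: `EventStore`, `BalanceView`, and the account events are hypothetical stand-ins for a Kafka topic and an event handler with its state store.

```python
class EventStore:
    """Append-only list of events; stands in for the event queue/topic."""

    def __init__(self):
        self._events = []
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def append(self, event):
        """Write interface: record the event, then notify event handlers."""
        self._events.append(event)
        for handler in self._handlers:
            handler(event)

    def replay(self, handler):
        """Rebuild state from scratch by feeding every stored event to a handler."""
        for event in self._events:
            handler(event)


class BalanceView:
    """Read interface: application state derived purely from events."""

    def __init__(self):
        self.balances = {}

    def apply(self, event):
        sign = 1 if event["type"] == "deposited" else -1
        account = event["account"]
        self.balances[account] = self.balances.get(account, 0) + sign * event["amount"]
```

Because state is derived only from events, a new read model can be added later and populated with `replay`, without touching the write path.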

Example Usage Pattern 4: Streaming SQL
[Diagram: Amazon Kinesis Data Streams feeding Amazon Kinesis Data Analytics (SQL), with reference data from an S3 Bucket]
• Continuous filter
• Aggregate function
• Data enrichment (join)
• Anomaly Detection

Sample stream records:
{"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63}
{"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64}
{"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32}

Reference data for the join (Ticker, Company):
AMZN, Amazon
ASD, SomeCompanyA
BAC, SomeCompanyB
CRM, SomeCompanyC

https://docs.aws.amazon.com/kinesisanalytics/latest/dev/examples.html
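The three streaming SQL operations named on this slide (continuous filter, aggregate function, join for enrichment) can be mimicked over the sample records. This is a plain-Python sketch of the logic, not Kinesis Data Analytics SQL; windowing is omitted, and the function names are hypothetical.

```python
from collections import defaultdict

# Reference data from the slide (Ticker -> Company).
COMPANIES = {"AMZN": "Amazon", "ASD": "SomeCompanyA", "BAC": "SomeCompanyB", "CRM": "SomeCompanyC"}

# Sample stream records from the slide.
RECORDS = [
    {"TICKER_SYMBOL": "CVB", "SECTOR": "TECHNOLOGY", "CHANGE": 0.81, "PRICE": 53.63},
    {"TICKER_SYMBOL": "ABC", "SECTOR": "RETAIL", "CHANGE": -1.14, "PRICE": 23.64},
    {"TICKER_SYMBOL": "JKL", "SECTOR": "TECHNOLOGY", "CHANGE": 0.22, "PRICE": 15.32},
]


def continuous_filter(records, sector):
    """Pass through only the records of one sector, as they arrive."""
    for record in records:
        if record["SECTOR"] == sector:
            yield record


def average_price_by_sector(records):
    """Aggregate: mean PRICE per SECTOR over the records seen (no window here)."""
    totals = defaultdict(lambda: [0.0, 0])
    for record in records:
        totals[record["SECTOR"]][0] += record["PRICE"]
        totals[record["SECTOR"]][1] += 1
    return {sector: total / count for sector, (total, count) in totals.items()}


def enrich(records, companies):
    """Join: attach the company name, or None when the ticker is not in the reference data."""
    for record in records:
        yield {**record, "COMPANY": companies.get(record["TICKER_SYMBOL"])}
```

Note that none of the three sample tickers appears in the reference table, so enriching them yields `None` for `COMPANY`; a left-join style result like this keeps unmatched records in the stream instead of dropping them.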

Takeaways

Lambda vs Kappa Architecture
[Diagram: Lambda architecture and Kappa architecture side by side]

Key Takeaways
• Distributed Queue as Stream Storage
  - Preserve Ordering
  - Parallel Consumption
  - Persistent Buffer
  - Decouple producers & consumers
• Trade-off: Operational Excellence vs Degree of Freedom
• MUST keep an eye on the right monitoring metrics
• Architectural Patterns
  - Data Hub: (Asynchronous) Event-Bus
  - Log Aggregation
  - IoT
  - Event Sourcing and CQRS

Where To Go Next?
• Amazon MSK Labs https://amazonmsk-labs.workshop.aws/
• Amazon Managed Streaming for Kafka: Best Practices https://docs.aws.amazon.com/msk/latest/developerguide/bestpractices.html
• Monitoring Kafka performance metrics (2020-04-16) https://tinyurl.com/y6hrhwbq
• Understanding and Optimizing Metrics for Apache Kafka Monitoring (2018-11, in Korean) https://tinyurl.com/y4uwyenx
• AWS Analytics Immersion Day - Build BI System from Scratch
  - Workshop - https://tinyurl.com/yapgwv77
  - Slides - https://tinyurl.com/ybxkb74b
• Realtime Analytics on AWS https://tinyurl.com/y3evwm3v
• Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1, 2
  - Part 1 - https://tinyurl.com/y8vo8q7o
  - Part 2 - https://tinyurl.com/ycbv7wel
