Cloud Native Kafka - What It Means for a Distributed Data Platform to Aim to Be Cloud Native

Cloud Native Days Tokyo 2022
Cloud Native Kafka - When a Data Platform Aims to Be Cloud Native
Shinichi Hashitani, Solutions Engineer, Nov. 2022.

Enter Apache Kafka

@ShinHashitani | developer.confluent.io
#1 Durable Event Driven Architecture
Highly durable event-driven architecture with Kafka. Unlike a typical message broker, an event-driven design built on Kafka brings major advantages in full-scale production use:
• Extremely high scalability
• No need for back pressure
• Long-term retention of events
• Order Guarantee
• Exactly Once Semantics
• Fan-out
• Connectivity with the storage/OLAP domain
(Diagram labels: Service 1 through Service 6.)

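#2 Streaming ETL
A real-time pipeline connecting OLTP and OLAP.
• Data flows that used to assume batch processing become real-time.
• The intermediate data stores and assorted processing logic that used to be required are replaced by stream processing.
• Data is handed to each store that needs it only after primary processing into the required form has been done in real time.
(Diagram labels: Databases (RDBMS/NoSQL), Files (CSV/JSON/XML…), Application Events (Webhook), SaaS Applications (REST APIs), Continuous Data Processing, Stream Processing, Data Warehouse, Analytics, Databases.)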

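#3 Real Time Action
React to events that have just happened: by the time the stream reaches the sink, it has been converted into actionable real-time events.
• Fraud detection
• Extracting a few hundred investigation targets from hundreds of millions of events
• Real-time status updates (Daily/Hourly status)
(Diagram labels: Stream Processing, Data Warehouse, Analytics, Databases, Backend Service, Alert, Immediate Action Needed.)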

Stream = A Series of Continuous Events
Event stream: customer login: abc; order confirmed: #001; order updated: #002; customer login: efg; order canceled: #003; package received: #a01; at dist center: #b02; left dist center: #a02; delivered: #a01; customer C: 0001; order U: 0003; payment U: 0002; payment C: 0003; customer U: 0002
The same events grouped by topic:
• store-order: order confirmed: #001; order updated: #002; order canceled: #003
• store-customer: customer login: abc; customer login: efg
• logistic: package received: #a01; left dist center: #a02; delivered: #a01; at dist center: #b02
• orderdb-c: customer C: 0001; customer U: 0002
• orderdb-o: order U: 0003
• orderdb-p: payment C: 0003; payment U: 0002

Why Kafka? - Kafka Keeps Data Consistent
Events are stored as a transaction log. Each event is persisted in the log, so the same event can be read and processed any number of times. Because Kafka also uses a pull model, events can be delivered quickly, in order, and without loss.
(Diagram: an Append-Only, Immutable log holding events such as customer login, order confirmed, order updated, customer logout, and order canceled at numbered offsets running from Old to New.)
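
The append-only log and pull model described on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not Kafka's implementation; `AppendOnlyLog`, `append`, `read`, and the two reader names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical in-memory stand-in for the log of a single partition."""
    _events: list = field(default_factory=list)

    def append(self, event: str) -> int:
        """Append an event at the end and return the offset it was assigned."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset: int, max_events: int = 10) -> list:
        """Pull up to max_events starting at offset; reading never alters the log."""
        return self._events[offset:offset + max_events]


log = AppendOnlyLog()
for event in ["customer login", "order confirmed", "order updated",
              "customer logout", "order canceled"]:
    log.append(event)

# Two independent readers track their own offsets and pull at their own pace.
billing_offset, audit_offset = 0, 0
billing_batch = log.read(billing_offset, max_events=2)
billing_offset += len(billing_batch)
audit_batch = log.read(audit_offset, max_events=5)
audit_offset += len(audit_batch)

# Re-reading from offset 0 yields the same events in the same order.
replayed = log.read(0, max_events=5)
```

Because each reader owns its offset, a slow reader does not hold back a fast one, and any reader can rewind and process the same events again.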

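Event, Total Order, and Data Consistency
"Streams and Tables in Apache Kafka: A Primer", Michael Noll, Confluent Blog.
The individual moves of a chess game and the state of the chessboard are two representations of the same data.
• The chessboard represents the complete state at a specific point in time.
• Applying every move, with none missing and in order, reproduces the state of the chessboard.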
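
The chess analogy can be made concrete with a small sketch: the list of moves is the stream, the board is the state, and replaying the moves in order rebuilds the board. The board encoding (a dict from square to piece) and the function names are assumptions made for this illustration; no chess rules are checked.

```python
def apply_move(board: dict, move: tuple) -> dict:
    """Return a new board with the piece on the source square moved to the target."""
    src, dst = move
    new_board = dict(board)
    new_board[dst] = new_board.pop(src)
    return new_board


def replay(initial: dict, moves: list) -> dict:
    """Rebuild the board state by applying every move in order."""
    board = dict(initial)
    for move in moves:
        board = apply_move(board, move)
    return board


initial = {"e2": "white pawn", "e7": "black pawn", "g1": "white knight"}
moves = [("e2", "e4"), ("e7", "e5"), ("g1", "f3")]

final = replay(initial, moves)
partial = replay(initial, moves[:2])  # a missing move gives a different board
```

Dropping or reordering a move changes the resulting state, which is why the slide stresses that every event must be applied, in order.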

Kafka is a Durable Storage
(Diagram: Broker 1 through Broker 4 host Topic1 partition1 through partition4, with three copies of each partition spread across the brokers. Labels: Concurrent Access, Data Replication.)

Communication with Kafka

Communications Are All Direct
(Diagram: Transaction with partition1 through partition4, Application with partition1 through partition3, and Payment with partition1 and partition2.)
The app knows to whom it should send events.

Kafka Producer and Broker
(Diagram label: Producer Client.)
Partitioning is done at the client side. Batching and Compression are done per broker.

Producer Internal Process
Serializer
• Checks the schema.
• When Schema Registry is used, checks the cache; on a cache miss, fetches the schema from Schema Registry.
• Serializes according to the schema definition.
Partitioner
• The Java client hashes the key with murmur2.
• If a Partitioner is specified, the partition is determined by its rules.
• If there is no key, the partition is chosen by round robin.
Sender thread
• Groups batches by target broker.
• Transfers the data as a single request.
Record accumulator
• Buffers per partition.
Compression
• Compresses per broker.
• Optimizes the volume of data transferred to the broker.
• Also optimizes the replication load between brokers.
(Diagram labels: record, batch, request.)
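
The partitioner, record accumulator, and sender steps listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Kafka client: `zlib.crc32` stands in for murmur2, the partition-to-broker mapping is invented, and every name (`choose_partition`, `accumulate`, `build_requests`, `PARTITION_LEADER`) is hypothetical. Serialization and compression are left out.

```python
import zlib
from collections import defaultdict
from itertools import count

NUM_PARTITIONS = 4
# Hypothetical mapping from partition to the broker that leads it.
PARTITION_LEADER = {0: "broker-1", 1: "broker-2", 2: "broker-1", 3: "broker-2"}
_round_robin = count()


def choose_partition(key):
    """Hash the key when there is one; otherwise rotate through the partitions."""
    if key is None:
        return next(_round_robin) % NUM_PARTITIONS
    # Stand-in hash for illustration: the Java client uses murmur2, not CRC32.
    return zlib.crc32(key) % NUM_PARTITIONS


def accumulate(records):
    """Record accumulator: buffer record values per partition."""
    batches = defaultdict(list)
    for key, value in records:
        batches[choose_partition(key)].append(value)
    return batches


def build_requests(batches):
    """Sender: group the partition batches by target broker, one request each."""
    requests = defaultdict(dict)
    for partition, batch in batches.items():
        requests[PARTITION_LEADER[partition]][partition] = batch
    return dict(requests)


records = [(b"customer-1", b"login"), (b"customer-2", b"login"),
           (b"customer-1", b"order"), (None, b"heartbeat")]
requests = build_requests(accumulate(records))
```

Records that share a key always land in the same partition batch and keep their order there, which is what makes per-key ordering possible.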

Kafka Internals

Control Plane and Data Plane
(Diagram labels: Kafka Cluster, Brokers, Data Plane, Controller, three Zookeeper nodes.)
Control Plane
• Cluster Membership
• Broker and Controller
• Topic Partition and Partition Leader
• Access Control List

Inside The Apache Kafka Broker
(Diagram: six Brokers.)

Network Thread Adds Request to Queue

IO Thread Verifies Record Batch And Stores

Kafka Physical Storage
/var/lib/kafka/data/account-deposits-1
00000000000047926734.log
00000000000047926734.index
...
00000000000052497535.log
00000000000052497535.index
...
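
Each segment file in the listing above is named after the base offset of the segment, zero-padded to 20 digits. A rough sketch of how a reader could find the segment that holds a given offset follows. The helper names `segment_file_name` and `find_segment` are hypothetical, and the real broker additionally consults the `.index` file to find the position inside the segment.

```python
from bisect import bisect_right


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Build a segment file name: the base offset zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets: list, offset: int) -> int:
    """Return the base offset of the segment that would contain `offset`.

    base_offsets must be sorted in ascending order.
    """
    i = bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError("offset is earlier than the first segment")
    return base_offsets[i]


# Base offsets taken from the two file names shown on the slide.
base_offsets = [47926734, 52497535]
log_file = segment_file_name(find_segment(base_offsets, 50000000))
index_file = segment_file_name(find_segment(base_offsets, 52497535), ".index")
```

A binary search over base offsets is enough to pick the file, since every segment covers the range from its own base offset up to the next segment's base offset.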

Purgatory Holds Requests Being Replicated

Response Added to Socket Send Buffer
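
The request path covered by the preceding slides (network thread, request queue, IO thread, storage, purgatory, socket send buffer) can be modeled as a toy state machine. This is a heavily simplified sketch under assumptions: `BrokerSketch` and its methods are hypothetical, validation is reduced to an emptiness check, and replication is modeled as explicit acknowledgements rather than the follower fetch mechanism a real broker uses.

```python
from collections import deque


class BrokerSketch:
    """Toy model of how one produce request moves through a broker."""

    def __init__(self, replicas_required: int):
        self.request_queue = deque()   # filled by the network thread
        self.log = []                  # stands in for the on-disk log
        self.purgatory = {}            # request id -> acknowledgements still missing
        self.send_buffer = deque()     # responses waiting to be sent
        self.replicas_required = replicas_required

    def network_thread_receive(self, request_id, batch):
        """Network thread: place the incoming request on the request queue."""
        self.request_queue.append((request_id, batch))

    def io_thread_handle(self):
        """IO thread: verify the batch, store it, then park or answer the request."""
        request_id, batch = self.request_queue.popleft()
        if not batch:
            self.send_buffer.append((request_id, "error: empty batch"))
            return
        self.log.extend(batch)
        if self.replicas_required == 0:
            self.send_buffer.append((request_id, "ok"))
        else:
            self.purgatory[request_id] = self.replicas_required

    def follower_ack(self, request_id):
        """A replica has caught up; release the request once all have done so."""
        self.purgatory[request_id] -= 1
        if self.purgatory[request_id] == 0:
            del self.purgatory[request_id]
            self.send_buffer.append((request_id, "ok"))


broker = BrokerSketch(replicas_required=2)
broker.network_thread_receive(1, ["deposit: 100"])
broker.io_thread_handle()
held_in_purgatory = 1 in broker.purgatory
broker.follower_ack(1)
broker.follower_ack(1)
response = broker.send_buffer.popleft()
```

The point of the purgatory step is that the IO thread is not blocked while replication is in progress: the request is parked and the response is produced later.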

KIPs

KIP: Kafka Improvement Proposal

KIP-405: Kafka Tiered Storage
Local storage on the broker:
• Compute and storage are tightly coupled
• Fast: disk I/O
• High log retention cost
• On failure: large volumes of data to rebalance
Remote (tiered) storage:
• Compute and storage are separated
• Slow: network I/O
• Low log retention cost
• On failure: only a limited amount of data to rebalance
(Diagram labels: Segment 1, Segment 2, Segment 3, Active Segment; Read, Write, Read Offset = earliest.)
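
The read path implied by the slide can be sketched as a lookup that sends a fetch to the local or the remote tier depending on where the segment lives. The `Segment` type, the offsets, and the tier labels are assumptions for illustration only; in practice a segment can exist in both tiers for a period of time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    base_offset: int
    end_offset: int   # inclusive
    tier: str         # "local" (broker disk) or "remote" (tiered storage)


def locate(segments: list, offset: int) -> Segment:
    """Find the segment that covers the requested offset."""
    for segment in segments:
        if segment.base_offset <= offset <= segment.end_offset:
            return segment
    raise KeyError(f"offset {offset} is not covered by any segment")


# Hypothetical layout: older segments are tiered out, recent ones stay local.
segments = [
    Segment(0, 999, "remote"),
    Segment(1000, 1999, "remote"),
    Segment(2000, 2999, "local"),
    Segment(3000, 3499, "local"),   # active segment, still being written
]

earliest_read_tier = locate(segments, 0).tier   # a read from offset = earliest
tail_read_tier = locate(segments, 3400).tier    # a consumer near the end of the log
```

Consumers that keep up with the log are served from fast local disk, while a read from the earliest offset pays the network I/O cost of the remote tier.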

KIP-500: Replace Zookeeper with a Self-Managed Metadata Quorum
With Zookeeper (three ZK nodes):
• A separate Zookeeper cluster is required
• Consensus is reached in Zookeeper
• The Controller serves metadata
With a self-managed metadata quorum:
• No separate Zookeeper cluster is required
• Consensus is reached among three or more Controllers
• The Controller serves metadata

KIP-595: A Raft Protocol for Metadata Quorum
• Metadata consensus (Leader/Follower)
• Leader Election
Metadata moves from a persistent store to a log-based model. Kafka has always handled both user data and metadata on a log basis, so it has a strong affinity with the Raft model. Metadata is managed in the internal topic __cluster_metadata. Snapshots and the topic are used together to recover metadata quickly.
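
The core of majority-based consensus on a log can be shown with a small calculation: a record counts as committed once a majority of voters have replicated it. The sketch below is generic Raft-style arithmetic, not the KRaft implementation; the controller names and offsets are invented, and real Raft also requires the record to belong to the current leader's term.

```python
def majority(voter_count: int) -> int:
    """Smallest number of voters that forms a majority."""
    return voter_count // 2 + 1


def committed_offset(replicated_offsets: list) -> int:
    """Highest offset that a majority of voters have replicated."""
    ranked = sorted(replicated_offsets, reverse=True)
    return ranked[majority(len(ranked)) - 1]


# Hypothetical replication progress of three controllers.
progress = {"controller-1": 120, "controller-2": 118, "controller-3": 97}
commit = committed_offset(list(progress.values()))
```

With three controllers the quorum tolerates one lagging or failed node, which is one reason to run three or more controllers.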

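KIP-630: Kafka Raft Snapshot
Restoring state from zero. The Controller updates its state from the metadata log, holds it in memory, and also persists it. Reproducing this Metadata State requires the log, but the log grows continuously and the time needed to reproduce the state grows in proportion.
The Active Controller periodically takes a snapshot of the Metadata Store at a committed state. A Controller joining the quorum (new or recovering) starts from the snapshot and consumes the metadata log from the required offset to reproduce the state.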
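
The benefit of a snapshot can be illustrated by rebuilding the same state twice: once by replaying the whole log, and once by loading a snapshot and replaying only the records after it. The record format (key/value pairs, with `None` meaning deletion) and the function names are assumptions for this sketch and do not reflect the actual metadata record schema.

```python
def apply(state: dict, record: tuple) -> dict:
    """Apply one metadata record; a value of None deletes the key."""
    key, value = record
    new_state = dict(state)
    if value is None:
        new_state.pop(key, None)
    else:
        new_state[key] = value
    return new_state


def rebuild(log: list, snapshot=None):
    """Rebuild state from a log, optionally starting from (state, next_offset).

    Returns the state and the number of records that had to be replayed.
    """
    state, start = snapshot if snapshot else ({}, 0)
    state = dict(state)
    for record in log[start:]:
        state = apply(state, record)
    return state, len(log) - start


metadata_log = [
    ("topic:orders", {"partitions": 3}),
    ("topic:tmp", {"partitions": 1}),
    ("topic:orders", {"partitions": 6}),
    ("topic:tmp", None),
    ("topic:payments", {"partitions": 2}),
]

full_state, full_replayed = rebuild(metadata_log)

# Snapshot taken after the first four records; it covers offsets 0 to 3.
snapshot = (rebuild(metadata_log[:4])[0], 4)
fast_state, fast_replayed = rebuild(metadata_log, snapshot)
```

Both paths reach the same state, but the snapshot path replays one record instead of five, and that gap widens as the log grows.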

Your Apache Kafka® journey begins here
developer.confluent.io
