strong messaging system.
• Its core capabilities are
  ◦ To write and read streams of events with high performance
  ◦ To store streams of events durably and reliably
• Messages are grouped into "Topics".
[Diagram: Producers write streams of events to Kafka, which stores them in Topics; Consumers read them]
[Diagram: Topic A partitions 1–3, each with a Leader and Follower replicas spread across brokers (e.g. Broker01)]
• Kafka replicates the messages of each partition across brokers.
• The partition that Producers and Consumers interact with is the Leader partition; the replicas that fetch messages from the Leader are Follower partitions.
• When a server fails, a Follower partition is automatically promoted to Leader partition.
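One way to see this layout from a client is to describe the topic with the AdminClient. A minimal sketch, with the broker address and topic name as placeholder assumptions:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class DescribeReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder address

        try (Admin admin = Admin.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("topic-a"))
                                         .all().get().get("topic-a");
            for (TopicPartitionInfo p : desc.partitions()) {
                // leader() is the broker serving reads/writes; replicas() includes the followers
                System.out.printf("partition %d: leader=%s replicas=%s isr=%s%n",
                        p.partition(), p.leader(), p.replicas(), p.isr());
            }
        }
    }
}
```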
partitioning strategy.
• The key in a message is used to decide the partition:
  partition = hash(key) % numPartitions
[Diagram: Partitioner routing Message<Key, Value> to one of partitions 1–4]
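A minimal sketch of that selection logic; the built-in DefaultPartitioner actually hashes the serialized key bytes with murmur2, but the modulo idea is the same (the helper below is illustrative only):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionSketch {
    // Illustrative hash(key) % numPartitions; Kafka's real partitioner uses murmur2.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions;  // mask keeps the result non-negative
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition,
        // which is what preserves per-key ordering.
        System.out.println(partitionFor("user-42", 4));
        System.out.println(partitionFor("user-42", 4));
    }
}
```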
[Diagram: Producer writing to the Leader of Topic A - Partition 1 on Broker01, with Followers fetching from it; min.insync.replicas = 2]
• Kafka provides a parameter (acks) to decide when an acknowledgment is returned to the Producer.
  ◦ acks = 0
    ▪ The Producer does not wait for any acknowledgment from Brokers.
  ◦ acks = 1
    ▪ The Producer waits until the Leader partition has written the record to its local log.
  ◦ acks = all (-1)
    ▪ The Producer waits until the Leader partition has written the record to its local log and enough Followers are in sync to satisfy min.insync.replicas.
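A minimal producer sketch for the strictest setting; note that min.insync.replicas is a broker/topic-side setting, not a producer property. Broker addresses and the topic name are placeholder assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksAllProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");  // placeholder addresses
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");  // wait for the leader plus enough in-sync followers

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is acknowledged only after the acks requirement is met
            producer.send(new ProducerRecord<>("topic-a", "user-42", "payload"))
                    .get();  // block to surface errors such as NotEnoughReplicas
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```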
• A Consumer group is a set of consumers that cooperate to consume data from topics.
• When the group membership changes, the assigned partitions are re-assigned (rebalanced) among the members.
• After fetching messages, a consumer commits its offset to keep track of its position.
[Diagram: consumers in a group reading partitions at different committed offsets, e.g. offset=3, 5, 7, 8]
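A minimal sketch of a group member that commits its offsets manually after processing; the broker address, group id, and topic name are placeholder assumptions:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder
        props.put("group.id", "example-group");          // consumers sharing this id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");        // commit explicitly after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("topic-a"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
                consumer.commitSync();  // record the position so a restart resumes from here
            }
        }
    }
}
```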
• What are the producer requirements?
  ◦ Produce into a single DC, or dual produce?
• What are the consumer requirements?
  ◦ Does ordering matter to consumers?
  ◦ Can consumers handle duplicates?
• Failover requirements
  ◦ No automatic client failover
  ◦ No offset preservation
• Is asynchronous replication acceptable, or is synchronous replication needed?
• Data governance?
cluster stretched over one or multiple locations.
  ◦ Based on synchronous replication
• The patterns in this category are
  ◦ One Location Cluster
  ◦ Stretched Cluster
    ▪ 2 DCs
    ▪ 2.5 DCs
    ▪ 3 DCs
Description: This is a single-cluster, single-DC layout and the simplest of the typical cluster patterns. It can handle node failure without any data loss, but it cannot continue operating through a DC failure. RPO = 0, and RTO is the time needed to promote a replica (Follower) partition to Leader partition.
Note: Fundamental cluster layout.
[Diagram: one DC with a Zookeeper quorum, Kafka brokers, and Producers/Consumers writing to and reading from a Topic]
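A minimal sketch of creating such a replicated topic with the AdminClient; the topic name, partition/replica counts, and broker address are placeholder assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");  // placeholder

        try (Admin admin = Admin.create(props)) {
            // 3 partitions, replication factor 3: each partition survives a single node failure
            NewTopic topic = new NewTopic("topic-a", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));  // needed for acks=all durability
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```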
[Diagram: Stretched Cluster variants: 2 DCs, 2.5 DCs, and 3 DCs, each with brokers and Zookeeper nodes spread over the locations]
• A Stretched Cluster is one big cluster stretched over multiple DCs.
• It provides a robust cluster across DCs, and failures are handled more easily with more DC locations.
• Messages are synchronously replicated across the locations.
• Low network latency between all DCs is required.
2 DCs
Description: This is one cluster stretched over 2 DCs. Kafka brokers are clustered across the 2 DCs and Zookeeper runs as a hierarchical quorum, with each broker connecting to its local Zookeeper group. On a DC failure, RTO > 0 because the failed Zookeeper group must be removed from the quorum. RPO depends on the min.insync.replicas setting.
Note: A failover strategy (Consistency vs Availability) must be chosen.
[Diagram: two DCs, each with 3 brokers and 3 Zookeeper nodes, with Producers and Consumers reading and writing the Topic in both DCs]
• A Hierarchical Quorum is effectively a quorum of quorums.
  server.1=zk1:2888:3888
  server.2=zk2:2888:3888
  server.3=zk3:2888:3888
  server.4=zk4:2888:3888
  server.5=zk5:2888:3888
  server.6=zk6:2888:3888
  group.1=1:2:3
  group.2=4:5:6
[Diagram: zk1–zk3 form group.1 and zk4–zk6 form group.2]
[Diagram: the Leader partition fails over from DC#1 to DC#2; initial settings: acks=all, min.insync.replicas=3 (min.insync.replicas > replication-factor / 2)]
Failover
  1. DC#1 failure occurs.
  2. A new Leader partition cannot be elected.
  3. Remove the Zookeeper servers and group of the failed DC from the configuration and restart Zookeeper:
     #server.1=zk1:2888:3888
     #server.2=zk2:2888:3888
     #server.3=zk3:2888:3888
     server.4=zk4:2888:3888
     server.5=zk5:2888:3888
     server.6=zk6:2888:3888
     #group.1=1:2:3
     #group.2=4:5:6
  4. A new Leader partition is elected on DC#2.
  5. Change min.insync.replicas from 3 to 2.
  6. The Producer can send messages to the new Leader partition.
Failback
  1. Restore the Zookeeper hierarchical quorum.
  2. Restore min.insync.replicas.
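Step 5 (and its reversal during failback) could be done with the AdminClient's incremental config API; this is a minimal sketch, and the topic name and broker address are placeholder assumptions:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class LowerMinIsr {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker4:9092");  // a surviving DC#2 broker (placeholder)

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic-a");
            // Failover step 5: lower the requirement from 3 to 2 so acks=all can succeed
            // with only the replicas left in DC#2; failback sets it back to 3.
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("min.insync.replicas", "2"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(op))).all().get();
        }
    }
}
```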
2.5 DCs
Description: This is one cluster stretched over 2 DCs plus a third DC running only a single Zookeeper node, so the Zookeeper quorum is maintained across 3 DCs. RPO & RTO are 0 when the DC running only the single Zookeeper fails.
Note: In terms of RPO, the Consistency vs Availability consideration still exists when one of the DCs running Kafka brokers fails.
[Diagram: DC#1 and DC#2 each with brokers, a ZK node, and Producers/Consumers reading and writing the Topic; DC#3 with a single ZK node]
3 DCs
Description: This is one cluster stretched over 3 DCs. RTO & RPO are 0 when one DC fails. This is the simplest and most robust of all the patterns.
Note: This pattern is very common in public clouds (using multiple AZs).
[Diagram: three DCs, each with brokers and a ZK node, with Producers and Consumers reading and writing the Topic in every DC]
clusters located in separate locations
  ◦ Asynchronous replication between clusters
• The patterns in this category are
  ◦ Active - Passive
  ◦ Active - Active
  ◦ Aggregation
Description: The primary cluster (Active) mirrors data to a standby cluster (Passive) using MM2. When the active site fails, Producer and Consumer applications must be moved to the standby site. RTO depends on how warm the standby site is kept. As for RPO, data loss can happen because MM2 copies data from the active site to the standby site asynchronously.
Note: MM2 runs on the destination site. Applications can independently consume the mirrored data on the standby site.
[Diagram: DC#1 (active) with Producer/Consumer on its Topic; MM2 replicating DC#1 → DC#2; a Consumer reading the mirrored topic on DC#2 (standby)]
Description: Two clusters mirror to each other bidirectionally; messages produced at either cluster are mirrored to the other. RPO can be lower than in the Active-Passive pattern because both sites are hot, but you still need to decide which site is active when a problem happens.
Note: Because MM2 creates destination topics with a cluster-alias prefix, consumer applications may have to subscribe to topics by a pattern such as *.Topic.
[Diagram: DC#1 and DC#2 each with Producer/Consumer on the local Topic plus the mirrored DC#2.Topic / DC#1.Topic; MM2 running DC#1 → DC#2 and DC#2 → DC#1]
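A minimal consumer sketch for that case, assuming a local topic named Topic mirrored in as DC#1.Topic; the broker address and group id are placeholder assumptions:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PrefixedTopicConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "dc2-broker1:9092");  // placeholder
        props.put("group.id", "orders-processor");           // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Matches both the local "Topic" and the mirrored "DC#1.Topic"
            consumer.subscribe(Pattern.compile("(.*\\.)?Topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.printf("%s offset=%d%n", r.topic(), r.offset()));
            }
        }
    }
}
```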
messages across multiple clusters into another cluster. This pattern allows messages generated across multiple clusters to be analyzed centrally.
Note: Hybrid & multi-cloud architectures are also similar to this pattern.
[Diagram: DC#1 Cluster (DC#1.Topic) and DC#2 Cluster (DC#2.Topic) both mirrored by MM2 into an aggregation cluster, where a Consumer reads DC#1.Topic and DC#2.Topic]
layout, how can consumers resume from where they left off on the source cluster?
• MM2 supports syncing committed offsets via the RemoteClusterUtils API.
  ◦ Transferring Commit Offset with MirrorMaker 2
• MirrorCheckpointConnector tracks offsets for consumer groups, and consumers can resume on the destination cluster by translating those offsets with the RemoteClusterUtils API.
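A minimal sketch of that offset translation with RemoteClusterUtils (from the connect-mirror-client artifact); the broker address, the source-cluster alias "DC1", and the group id are placeholder assumptions:

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class ResumeOnDestination {
    public static void main(String[] args) throws Exception {
        // Connection properties for the destination (DC#2) cluster -- placeholder address
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "dc2-broker1:9092");

        // Translate the group's committed offsets from the source cluster alias "DC1"
        // into offsets that are valid on the mirrored topics of the destination cluster.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(props, "DC1", "orders-processor", Duration.ofSeconds(30));

        // The returned positions can then be applied with consumer.seek() or
        // Admin.alterConsumerGroupOffsets() before consumption resumes on DC#2.
        translated.forEach((tp, offset) ->
                System.out.printf("%s -> resume at offset %d%n", tp, offset.offset()));
    }
}
```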
• Kafka Streams & ksqlDB use internal topics to manage their state.
• Avoid mirroring these internal topics to another cluster, because their consistency can be corrupted when applications are launched on the destination site.
Kafka Connect
• Kafka Connect has to keep all related settings and state consistent (source system, destination system, Kafka Connect settings, and the running Connectors' settings).
Approach
• The single cluster pattern is much safer for these components.
• Running these components as standbys on the destination site is another approach (they keep updating state information on the destination site, which reduces RTO).
two ways:
  ◦ Single Cluster Pattern
  ◦ Multi Cluster Pattern
• Single Cluster Pattern
  ◦ One cluster stretched over multiple locations
  ◦ Synchronous replication
  ◦ Minimizes RTO & RPO
  ◦ Simpler and easier to operate and maintain
• Multi Cluster Pattern
  ◦ Multiple clusters located in separate locations
  ◦ Asynchronous replication between clusters
  ◦ Preserving committed offsets needs attention, if required
  ◦ Suits hybrid environments