
Replication Bottleneck of topic receiving huge traffic


Wilson Li

April 09, 2019

Transcript

  1. Replication Bottleneck of a Topic Receiving Huge Traffic
     Li Yan Kit, Wilson, LINE Corporation
  2. Self Introduction

  3. Who am I
     Name: Li Yan Kit, Wilson
     Company: LINE Corporation
     Work: Development Engineer, providing the company-wide Kafka service
  4. Around 2019-01-01 00:00

  5. Happy New Year!

  6. Traffic Increased, Kafka Latency Spiked

  7. Replicas were out of sync
     A few partitions of the big topic now had only 2 replicas
  8. No problems in CPU, network, or disk
     CPU: <30%  Network: <50%  Disk: <10%
  9. Traffic increased, replica lost. Let's check Kafka's replication

  10. How Kafka Replication Works

  11. Replication
      Kafka replicates messages to multiple brokers; the default replication factor is 3
      Leader (master replica):
        Handles clients' read (consume) and write (produce) requests
        Keeps track of the followers
        Acknowledges a produce only after making sure the followers have caught up
      Follower (standby replica):
        Reads from the leader and stores a copy
  12. Illustration
      For simplicity, we consider the case with 3 replicas and ack = -1 (all ISR)
      [Diagram: leader log (msg1, msg2), two follower logs (msg1, msg2), HW, user, purgatory]
  13. 1. Leader receives a Produce request
      Appends the message to its local log
      Adds a DelayedProduce operation to the produce purgatory, with the new offset and a timeout
      [Diagram: user sends Produce(msg3); leader log becomes msg1, msg2, msg3; msg3 enters purgatory]
  14. 2. Leader receives a follower Fetch request
      The follower sends a fetch request to the leader with its log end offset
      The leader replies with the new messages
      The leader updates its record of the follower's offset
      [Diagram: follower sends Fetch(off3); leader replies with msg3]
  15. 3. Leader updates the high watermark
      When all replicas have fetched the new message, the leader advances the high watermark
      It completes the DelayedProduce operations in the produce purgatory with offsets smaller than the high watermark
      It replies to the client that the produce has completed
      [Diagram: all three logs hold msg1, msg2, msg3; HW advances; user receives OK, msg3]
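The three steps on slides 13-15 can be sketched as a toy simulation. This is illustration only, not Kafka's actual code; the class, method names, and offset bookkeeping here are invented for clarity.

```python
# Toy model of the acks = -1 produce flow: the leader appends, parks a
# DelayedProduce in "purgatory", and acknowledges only once every follower
# has fetched past the message, which advances the high watermark (HW).

class Leader:
    def __init__(self, follower_ids, log):
        self.log = list(log)                     # leader's local log
        self.follower_offsets = {f: len(log) for f in follower_ids}
        self.high_watermark = len(log)
        self.purgatory = {}                      # required offset -> message
        self.acks = []                           # replies sent to the client

    def produce(self, msg):
        # Step 1 (slide 13): append locally, park a DelayedProduce with
        # the new offset; the client is not acknowledged yet.
        self.log.append(msg)
        self.purgatory[len(self.log)] = msg

    def handle_fetch(self, follower_id, fetch_offset):
        # Step 2 (slide 14): record the follower's log end offset and
        # reply with any messages it does not have yet.
        self.follower_offsets[follower_id] = fetch_offset
        self._maybe_advance_hw()
        return self.log[fetch_offset:]

    def _maybe_advance_hw(self):
        # Step 3 (slide 15): HW = minimum offset all replicas have;
        # complete every parked DelayedProduce at or below it.
        self.high_watermark = min(self.follower_offsets.values())
        for off in sorted(self.purgatory):
            if off <= self.high_watermark:
                self.acks.append(f"OK, {self.purgatory.pop(off)}")

leader = Leader(["f1", "f2"], ["msg1", "msg2"])
leader.produce("msg3")            # parked in purgatory, no ack yet
reply = leader.handle_fetch("f1", 2)   # f1 fetches msg3
leader.handle_fetch("f1", 3)      # f1 is caught up
leader.handle_fetch("f2", 2)      # f2 fetches msg3
leader.handle_fetch("f2", 3)      # both caught up -> HW advances -> ack
```

The point of the purgatory is visible in the trace: the produce completes only on the final fetch, when the slowest follower has confirmed the new offset.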
  16. How Partitions Are Assigned to Brokers

  17. Assign partitions to brokers
      For a topic with n partitions:
        Pick a random broker ID N; assign the leader of partition i to broker N + i
        Pick a random broker ID M; assign the followers of partition i to brokers M + i and M + i + 1
        (all modulo the number of brokers, avoiding repeats of the same (leader, follower) pattern)

      Partition | Leader | Followers
      0         | 5      | 2, 3
      1         | 0      | 3, 4
      2         | 1      | 4, 5
      3         | 2      | 5, 0
      4         | 3      | 0, 1
      5         | 4      | 1, 2
      6         | 5      | 3, 4
      7         | 0      | 4, 5
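The placement above can be sketched as follows. This is a simplification in the spirit of Kafka's `AdminUtils.assignReplicasToBrokers`; the start index 5 and initial shift 2 are assumed here so the output reproduces the slide's table (the real code picks them randomly).

```python
def assign_replicas(n_partitions, n_brokers, replication=3,
                    start_index=5, start_shift=2):
    """Round-robin leaders; followers use a shift that grows each full
    cycle over the brokers, to avoid repeating (leader, follower) patterns."""
    assignment = {}
    shift = start_shift
    for p in range(n_partitions):
        if p > 0 and p % n_brokers == 0:
            shift += 1                        # new pattern after each cycle
        leader = (start_index + p) % n_brokers
        followers = [(leader + 1 + (shift + j) % (n_brokers - 1)) % n_brokers
                     for j in range(replication - 1)]
        assignment[p] = (leader, followers)
    return assignment

table = assign_replicas(8, 6)
# table[0] -> (5, [2, 3]) and table[7] -> (0, [4, 5]), matching the slide
```

Note the shift change at partition 6: that is why partitions 6 and 7 get followers 3, 4 and 4, 5 instead of repeating the pattern of partitions 0 and 1.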
  18. Workload distribution
      Each broker is the leader of 1~2 partitions
      Each broker is a follower of 2~3 partitions
      Evenly distributed within a topic (round robin)
      Randomized so that load is evenly distributed across many topics

      Partition | Leader | Followers
      0         | 5      | 2, 3
      1         | 0      | 3, 4
      2         | 1      | 4, 5
      3         | 2      | 5, 0
      4         | 3      | 0, 1
      5         | 4      | 1, 2
      6         | 5      | 3, 4
      7         | 0      | 4, 5
  19. Replica Fetcher Thread (Fetcher)

  20. Replica fetcher thread
      Kafka spawns threads to perform the fetching in the background
      Each fetcher sends fetch requests to the leader, receives new messages, and stores them to the log
      [Diagram: follower repeatedly sends Fetch(off3), Fetch(off4), Fetch(off5), Fetch(off6) to the leader]
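The fetch loop on this slide can be sketched as below. This is only the illustrative shape of the loop, not Kafka's actual `ReplicaFetcherThread`:

```python
def run_fetcher(leader_log, follower_log, rounds=4):
    """Repeatedly fetch from the leader at the follower's log end offset
    and append whatever the leader returns."""
    for _ in range(rounds):
        fetch_offset = len(follower_log)           # follower's log end offset
        new_messages = leader_log[fetch_offset:]   # leader's reply
        follower_log.extend(new_messages)          # store the copy locally

leader_log = ["msg1", "msg2", "msg3", "msg4"]
follower_log = ["msg1", "msg2"]
run_fetcher(leader_log, follower_log)
# follower_log is now a full copy of leader_log
```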
  21. Throughput of a single fetcher
      Topic with 1 partition, continuously producing: ~1 Gbps
  22. How Partitions Are Assigned to Fetchers

  23. num.replica.fetchers
      Controls the number of fetcher threads replicating from one source broker
      Each fetcher is responsible for fetching partitions from one broker
      Total number of fetchers = (number of brokers - 1) * num.replica.fetchers
      [Diagram: Broker2's fetchers send Fetch(p0) ... Fetch(p5) to Broker1]
  24. Assign partitions to fetchers
      For a topic with n partitions:
        Compute K = hash(topic) modulo the number of fetchers
        Assign partition i to fetcher K + i (modulo the number of fetchers)

      Partition | Fetcher
      0         | 1
      1         | 2
      2         | 0
      3         | 1
      4         | 2
      5         | 0
      6         | 1
      7         | 2
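A sketch of this mapping; in Kafka the fetcher id is derived roughly as a hash of the topic name plus the partition id, modulo the number of fetchers. K = 1 is assumed here so the result matches the slide's table.

```python
NUM_FETCHERS = 3
K = 1   # hash(topic) % NUM_FETCHERS, assumed for this example

def fetcher_for(partition):
    """Fetcher id for a partition: round robin starting from K."""
    return (K + partition) % NUM_FETCHERS

mapping = {p: fetcher_for(p) for p in range(8)}
# {0: 1, 1: 2, 2: 0, 3: 1, 4: 2, 5: 0, 6: 1, 7: 2}, as on the slide
```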
  25. Workload distribution
      Each fetcher fetches 2~3 partitions
      Evenly distributed within a topic (round robin)
      Randomized so that load is evenly distributed across many topics

      Partition | Fetcher
      0         | 1
      1         | 2
      2         | 0
      3         | 1
      4         | 2
      5         | 0
      6         | 1
      7         | 2
  26. Collision of the Two Assignments

  27. Partitions - Brokers + Fetchers - Brokers

      Partition | Fetcher
      0         | 1
      1         | 2
      2         | 0
      3         | 1
      4         | 2
      5         | 0
      6         | 1
      7         | 2

      Partition | Leader | Followers
      0         | 5      | 2, 3
      1         | 0      | 3, 4
      2         | 1      | 4, 5
      3         | 2      | 5, 0
      4         | 3      | 0, 1
      5         | 4      | 1, 2
      6         | 5      | 3, 4
      7         | 0      | 4, 5
  28. Partitions - Brokers + Fetchers - Brokers

      Partition | Leader | Followers | Fetchers
      0         | 5      | 2, 3      | 1, 1
      1         | 0      | 3, 4      | 2, 2
      6         | 5      | 3, 4      | 1, 1
      7         | 0      | 4, 5      | 1, 1
  29. Partitions - Brokers + Fetchers - Brokers

      Partition | Leader | Followers | Fetchers
      0         | 5      | 2, 3      | 1, 1
      1         | 0      | 3, 4      | 2, 2
      6         | 5      | 3, 4      | 1, 1
      7         | 0      | 4, 5      | 2, 2
  30. Workload distribution
      Fetcher 1 on broker 3 is responsible for 2 partitions, but fetcher 0 is idle
      This breaks the load balancing
      This happens because both assignments are round robin and the moduli are multiples of each other

      Broker | Fetcher | Number of partitions
      3      | 0       | 0
      3      | 1       | 2
      3      | 2       | 1
      4      | 0       | 1
      4      | 1       | 0
      4      | 2       | 2
      5      | 0       | 1
      5      | 1       | 1
      5      | 2       | 1
  31. Workload distribution
      Fetcher 1 on broker 3 is responsible for 2 partitions, but fetcher 0 is idle
      This breaks the load balancing
      This happens because both assignments are round robin and the moduli are multiples of each other

      Broker | Fetcher | Number of partitions
      3      | 0       | 0
      3      | 1       | 2
      3      | 2       | 1
      4      | 0       | 1
      4      | 1       | 0
      4      | 2       | 2
      5      | 0       | 1
      5      | 1       | 1
      5      | 2       | 1

      We have 3 fetchers and want to assign partitions to all of them, but the partitions end up on only 1 fetcher
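The collision can be demonstrated with the two formulas from slides 17 and 24. Partitions led by the same broker are spaced n_brokers apart in the round robin, and the fetcher id is (K + partition) mod n_fetchers, so whenever n_fetchers divides n_brokers all of those partitions land on the same fetcher. K = 1 and start index 5 are assumed to match the earlier tables.

```python
def fetchers_used(n_partitions, n_brokers, n_fetchers, K=1,
                  leader=5, start_index=5):
    """Set of fetcher ids that handle the partitions led by `leader`."""
    return {(K + p) % n_fetchers
            for p in range(n_partitions)
            if (start_index + p) % n_brokers == leader}

# 6 brokers, 3 fetchers (3 divides 6): partitions 0 and 6 share leader 5
# and both map to fetcher 1, so fetchers 0 and 2 stay idle for that leader.
print(fetchers_used(8, 6, 3))

# 5 fetchers (not a factor of 6): the same two partitions spread out.
print(fetchers_used(8, 6, 5))
```

This is exactly the slide's situation: three fetchers exist, but every partition replicated from broker 5 hashes to fetcher 1.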
  32. Verification

  33. Testing the theory
      Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
      Multiples (# of brokers is a multiple of # of fetchers)
      [Chart]
  34. Testing the theory
      Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
      Multiples (# of brokers is a multiple of # of fetchers)
      [Chart]
  35. Testing the theory
      Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
      Multiples (# of brokers is a multiple of # of fetchers): 25 (5 x 5) brokers, 5 fetchers
      [Chart]
  36. Summary

  37. Lessons learned
      Round robin is not always perfect
      When load balancing at multiple levels, prefer a different hashing function at each level
      Avoid setting the number of fetchers to a factor of the number of brokers, or to a multiple of it
  38. Q & A