Slide 1

Replication Bottleneck of a Topic Receiving Huge Traffic
Li Yan Kit, Wilson
LINE Corporation

Slide 2

Self Introduction

Slide 3

Who am I
- Name: Li Yan Kit, Wilson
- Company: LINE Corporation
- Work: Development engineer, providing the company-wide Kafka service

Slide 4

Around 2019-01-01 00:00

Slide 5

Happy New Year!

Slide 6

Traffic Increased, Kafka Latency Spiked

Slide 7

Replicas were out of sync: a few partitions of the big topic now had only 2 replicas in sync

Slide 8

No Problems in Network, CPU, or Disk
- CPU: <30%
- Network: <50%
- Disk: <10%

Slide 9

Traffic Increased, Replica Lost
Let's check Kafka's replication

Slide 10

How Kafka Replication works

Slide 11

Replication
- Kafka replicates messages to multiple brokers
  - The default replication factor is 3
- Leader: the master replica
  - Handles clients' read (consume) and write (produce) requests
  - Keeps track of followers
  - Acknowledges produces after making sure followers have caught up
- Follower: a standby replica
  - Reads from the leader and stores a copy
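
As a concrete reference, here is a minimal sketch of creating such a replicated topic with Kafka's Java Admin API; the bootstrap address, topic name, and partition count are illustrative placeholders:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // 8 partitions, replication factor 3: one leader plus two followers,
            // matching the default replication factor mentioned above.
            NewTopic topic = new NewTopic("big-topic", 8, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```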

Slide 12

Illustration
- For simplicity, we consider the case with 3 replicas and acks = -1 (all ISR)
[Diagram: a leader and two followers, each holding a log with msg1 and msg2; the high watermark (HW), the user, and the produce purgatory are also shown]

Slide 13

1. Leader Receives a Produce Request
- Appends the message to its local log
- Adds a DelayedProduce operation to the produce purgatory
  - with the new offset
  - with a timeout
[Diagram: (1) the user sends "Produce, msg3" to the leader; (2) the leader appends msg3 to its log (msg1, msg2, msg3); (3) msg3 is registered in the purgatory]

Slide 14

2. Leader Receives a Follower Fetch Request
- The follower sends a fetch request to the leader with its log end offset
- The leader replies with the message
- The leader updates its record of the follower's offset
[Diagram: (1) the follower sends "Fetch, off3" to the leader; (2) the leader returns msg3]

Slide 15

3. Leader Updates the High Watermark
- When all replicas have fetched the new message
- Completes DelayedProduce operations in the produce purgatory whose offset has been reached by the high watermark
- Replies to the client that the produce is completed
[Diagram: all three logs now hold msg1-msg3; (1) the HW advances on the leader and followers; (2) the leader replies "OK, msg3" to the user and msg3 leaves the purgatory]
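
To make the three steps concrete, here is a toy model in Java of the leader-side bookkeeping described on slides 13-15. This is not Kafka's actual code; every class and method name here is invented for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy model of the leader-side flow: append, park in purgatory,
// track follower offsets, advance the high watermark, acknowledge.
class LeaderModel {
    static class DelayedProduce {
        final long requiredOffset;                    // HW must reach this offset
        DelayedProduce(long offset) { this.requiredOffset = offset; }
        void complete() { System.out.println("ack produce up to offset " + requiredOffset); }
    }

    long logEndOffset = 0;                            // leader's own log end
    long highWatermark = 0;
    final Map<Integer, Long> followerOffsets = new HashMap<>(); // brokerId -> caught-up offset
    final List<DelayedProduce> purgatory = new ArrayList<>();

    // Step 1: append to the local log and park the request in the purgatory.
    void onProduce(byte[] message) {
        logEndOffset++;
        purgatory.add(new DelayedProduce(logEndOffset));
    }

    // Step 2: a follower's fetch offset tells the leader how far it has caught up.
    void onFollowerFetch(int brokerId, long fetchOffset) {
        followerOffsets.put(brokerId, fetchOffset);
        maybeAdvanceHighWatermark();
    }

    // Step 3: HW = minimum offset across all replicas; complete every
    // DelayedProduce the HW has now reached, acknowledging the client.
    private void maybeAdvanceHighWatermark() {
        long min = logEndOffset;
        for (long off : followerOffsets.values()) min = Math.min(min, off);
        if (min > highWatermark) {
            highWatermark = min;
            purgatory.removeIf(dp -> {
                if (dp.requiredOffset <= highWatermark) { dp.complete(); return true; }
                return false;
            });
        }
    }
}
```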

Slide 16

How partitions are assigned to brokers

Slide 17

Assign Partitions to Brokers
- For a topic with n partitions:
  - Pick a random broker ID N; assign the leader of partition i to broker N + i
  - Pick a random broker ID M; assign the followers of partition i to brokers M + i and M + i + 1
  - This avoids repeating the same (leader, follower) pattern
- (All broker IDs are taken modulo the number of brokers; see the sketch below)

Partition | Leader | Followers
0 | 5 | 2, 3
1 | 0 | 3, 4
2 | 1 | 4, 5
3 | 2 | 5, 0
4 | 3 | 0, 1
5 | 4 | 1, 2
6 | 5 | 3, 4
7 | 0 | 4, 5
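
The table above can be reproduced by a Java port of Kafka's round-robin assignment (modeled on kafka.admin.AdminUtils.assignReplicasToBrokersRackUnaware). Kafka picks startIndex and nextReplicaShift at random; the values 5 and 2 below are the ones that happen to produce the slide's table.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicaAssignment {
    // Places the j-th follower at a shifted position after the leader;
    // the shift grows with j so (leader, follower) patterns don't repeat.
    static int replicaIndex(int leaderIndex, int shiftBase, int j, int nBrokers) {
        int shift = 1 + (shiftBase + j) % (nBrokers - 1);
        return (leaderIndex + shift) % nBrokers;
    }

    public static void main(String[] args) {
        int nBrokers = 6, nPartitions = 8, replicationFactor = 3;
        int startIndex = 5;        // random in Kafka; 5 matches the slide
        int nextReplicaShift = 2;  // random in Kafka; 2 matches the slide

        for (int p = 0; p < nPartitions; p++) {
            if (p > 0 && p % nBrokers == 0)
                nextReplicaShift++;           // shift followers after each full wrap
            int leader = (p + startIndex) % nBrokers;
            List<Integer> followers = new ArrayList<>();
            for (int j = 0; j < replicationFactor - 1; j++)
                followers.add(replicaIndex(leader, nextReplicaShift, j, nBrokers));
            System.out.printf("partition %d -> leader %d, followers %s%n", p, leader, followers);
        }
    }
}
```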

Slide 18

Workload Distribution
- Each broker is the leader of 1~2 partitions
- Each broker is a follower of 2~3 partitions
- Evenly distributed within the topic
  - Round robin
- Randomized so that load is also evenly distributed across many topics

Partition | Leader | Followers
0 | 5 | 2, 3
1 | 0 | 3, 4
2 | 1 | 4, 5
3 | 2 | 5, 0
4 | 3 | 0, 1
5 | 4 | 1, 2
6 | 5 | 3, 4
7 | 0 | 4, 5

Slide 19

Replica Fetcher Thread (Fetcher)

Slide 20

Replica Fetcher Thread
- Kafka spawns threads to perform the fetching in the background; each one:
  - Sends fetch requests to the leader
  - Receives new messages
  - Stores the new messages to the log
[Diagram: the follower's fetcher repeatedly sends "Fetch, off3", "Fetch, off4", "Fetch, off5", "Fetch, off6" to the leader]
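
A toy sketch of that loop in Java (the real implementation is kafka.server.ReplicaFetcherThread; FollowerLog and LeaderClient here are invented stand-ins for Kafka's internals):

```java
// Invented abstractions standing in for the follower's local log and
// the network client toward the leader.
interface FollowerLog { long logEndOffset(); void append(byte[] message); }
interface LeaderClient { byte[][] fetch(long offset); }

class ReplicaFetcherSketch extends Thread {
    private final FollowerLog log;
    private final LeaderClient leader;

    ReplicaFetcherSketch(FollowerLog log, LeaderClient leader) {
        this.log = log;
        this.leader = leader;
    }

    @Override public void run() {
        while (!isInterrupted()) {
            long fetchOffset = log.logEndOffset();         // e.g. "Fetch, off3"
            byte[][] messages = leader.fetch(fetchOffset); // blocks until data arrives
            for (byte[] m : messages) log.append(m);       // store the copy locally
        }
    }
}
```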

Slide 21

Throughput of a single fetcher
- Topic with 1 partition
- Keep producing: ~1 Gbps

Slide 22

How partitions are assigned to fetchers

Slide 23

num.replica.fetchers
- Controls the number of fetchers toward one leader
  - Each group of fetchers is responsible for fetching partitions from one broker
- Total number of fetchers:
  - (No. of brokers - 1) * num.replica.fetchers
  - e.g., with 6 brokers and num.replica.fetchers = 3, each broker runs (6 - 1) * 3 = 15 fetchers
[Diagram: Broker2's fetchers send "Fetch, p0" through "Fetch, p5" to Broker1]

Slide 24

Assign Partitions to Fetchers
- For a topic with n partitions:
  - Calculate the hash of the topic name modulo the number of fetchers: K
  - Assign partition i to fetcher K + i (modulo the number of fetchers; see the sketch below)

Partition | Fetcher
0 | 1
1 | 2
2 | 0
3 | 1
4 | 2
5 | 0
6 | 1
7 | 2
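
This mapping is modeled on the broker's AbstractFetcherManager.getFetcherId (Scala in Kafka itself); a Java translation follows. The topic name "big-topic" is only an illustration: which fetcher gets partition 0 (K on the slide) depends on the topic's hash.

```java
public class FetcherAssignment {
    // Modeled on kafka.server.AbstractFetcherManager.getFetcherId:
    // the fetcher id depends only on the topic's hash and the partition id.
    // The & 0x7fffffff mirrors Kafka's Utils.abs, which clears the sign bit.
    static int getFetcherId(String topic, int partitionId, int numFetchers) {
        return ((31 * topic.hashCode() + partitionId) & 0x7fffffff) % numFetchers;
    }

    public static void main(String[] args) {
        // The slide's table corresponds to a topic whose hash lands on K = 1;
        // the ids printed here depend on "big-topic"'s actual hash, but they
        // always cycle round-robin: K, K+1, K+2, K, K+1, ...
        for (int p = 0; p < 8; p++)
            System.out.println("partition " + p + " -> fetcher " + getFetcherId("big-topic", p, 3));
    }
}
```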

Slide 25

Workload Distribution
- Each fetcher fetches 2~3 partitions
- Evenly distributed within the topic
  - Round robin
- Randomized so that load is also evenly distributed across many topics

Partition | Fetcher
0 | 1
1 | 2
2 | 0
3 | 1
4 | 2
5 | 0
6 | 1
7 | 2

Slide 26

Collision of the two assignments

Slide 27

Partitions - Brokers + Partitions - Fetchers

Partition | Fetcher
0 | 1
1 | 2
2 | 0
3 | 1
4 | 2
5 | 0
6 | 1
7 | 2

Partition | Leader | Followers
0 | 5 | 2, 3
1 | 0 | 3, 4
2 | 1 | 4, 5
3 | 2 | 5, 0
4 | 3 | 0, 1
5 | 4 | 1, 2
6 | 5 | 3, 4
7 | 0 | 4, 5

Slide 28

Partitions - Brokers + Partitions - Fetchers

Partition | Leader | Followers | Fetcher (on each follower)
0 | 5 | 2, 3 | 1, 1
1 | 0 | 3, 4 | 2, 2
6 | 5 | 3, 4 | 1, 1
7 | 0 | 4, 5 | 1, 1

Slide 29

Partitions - Brokers + Partitions - Fetchers

Partition | Leader | Followers | Fetcher (on each follower)
0 | 5 | 2, 3 | 1, 1
1 | 0 | 3, 4 | 2, 2
6 | 5 | 3, 4 | 1, 1
7 | 0 | 4, 5 | 2, 2

Slide 30

Workload Distribution
- Fetcher 1 in broker 3 is responsible for 2 partitions, but fetcher 0 is idle
- This breaks the load balancing
- This happens because both assignments are round robin and one modulus (the broker count) is a multiple of the other (the fetcher count)

Broker | Fetcher | Number of Partitions
3 | 0 | 0
3 | 1 | 2
3 | 2 | 1
4 | 0 | 1
4 | 1 | 0
4 | 2 | 2
5 | 0 | 1
5 | 1 | 1
5 | 2 | 1

Slide 31

Workload Distribution
- Fetcher 1 in broker 3 is responsible for 2 partitions, but fetcher 0 is idle
- This breaks the load balancing
- This happens because both assignments are round robin and one modulus (the broker count) is a multiple of the other (the fetcher count)

Broker | Fetcher | Number of Partitions
3 | 0 | 0
3 | 1 | 2
3 | 2 | 1
4 | 0 | 1
4 | 1 | 0
4 | 2 | 2
5 | 0 | 1
5 | 1 | 1
5 | 2 | 1

Callout: we have 3 fetchers and want to assign partitions to all of them, but the assignment uses only 1 fetcher.
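
A short Java sketch of why the imbalance is structural (the topic name is hypothetical; the partition pairs come from the tables on slides 28-29). Partitions 0 and 6 share both leader 5 and follower 3, and because the broker count (6) is a multiple of the fetcher count (3), their fetcher ids cannot differ:

```java
public class CollisionDemo {
    static int getFetcherId(String topic, int partitionId, int numFetchers) {
        return ((31 * topic.hashCode() + partitionId) & 0x7fffffff) % numFetchers;
    }

    public static void main(String[] args) {
        int nBrokers = 6, nFetchers = 3;
        // Broker 3 follows partitions 0 and 6, both led by broker 5. Their ids
        // differ by nBrokers = 6, and 6 % 3 == 0, so (h + 0) % 3 == (h + 6) % 3:
        // both partitions always land on the same fetcher, leaving others idle.
        for (int p : new int[] {0, 6})
            System.out.println("partition " + p + " -> fetcher " + getFetcherId("big-topic", p, nFetchers));
    }
}
```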

Slide 32

Verification

Slide 33

Testing the theory
- Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
- Multiples (# of brokers is a multiple of # of fetchers)
[Chart with x-axis values 1-10]

Slide 34

Testing the theory
- Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
- Multiples (# of brokers is a multiple of # of fetchers)
[Chart with x-axis values 1-10]

Slide 35

Testing the theory
- Large topic (# of partitions > # of brokers): 40 partitions, 25 brokers
- Multiples (# of brokers is a multiple of # of fetchers)
[Chart with x-axis values 1-10; annotated: 25 (5 x 5) brokers, 5 fetchers]

Slide 36

Summary

Slide 37

Lessons Learned
- Round robin is not always perfect
- As with multi-level load balancing, each level should preferably use a different hashing function
- Avoid setting the number of fetchers to a factor of the number of brokers
  - Nor to a multiple of the number of brokers
- (A quick check for the last two rules is sketched below)
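
A minimal helper capturing those last two bullets (plain Java; the method name is my own):

```java
public class FetcherCountCheck {
    // True when num.replica.fetchers divides the broker count or vice versa,
    // i.e. the configurations where the two round-robin assignments collide.
    static boolean fetcherCountIsRisky(int numBrokers, int numFetchers) {
        return numBrokers % numFetchers == 0 || numFetchers % numBrokers == 0;
    }

    public static void main(String[] args) {
        System.out.println(fetcherCountIsRisky(25, 5)); // true: 5 divides 25
        System.out.println(fetcherCountIsRisky(25, 4)); // false
    }
}
```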

Slide 38

Q & A