LINE is an INSTANT messenger!
- Reliable Delivery
  - DON'T LOSE any messages!
- Powerful Delivery
  - >5 billion messages per day
  - >164 million users in 4 dominant countries
of LINE Apps
- Multi-DC deployment (FAST!)
- Nginx was not enough, so we developed our own!
  - From scratch
  - In Erlang (built for telecom systems)
- Low latency (FAST!)
Redis (Search "Redis in LINE")
- https://www.slideshare.net/linecorp/redis-at-line-99471322
- HBase (Search "HBase in LINE")
  - https://www.slideshare.net/linecorp/a-5-47983106
- Armeria
  - Open-sourced async HTTP/2 RPC/Thrift Java library
  - https://github.com/line/armeria
- Develop tools / client SDKs for easier use
- Capacity planning of the Kafka cluster
- Consulting users on development
- SRE (Site Reliability Engineering)
  - Troubleshooting when performance violates the SLO (e.g., latency increase)
  - Patching Kafka for bug fixes / performance improvements
In-sync Replica (ISR)
- Messages in Kafka are replicated (copied) to three servers
  - 1x Leader handles clients
  - 2x Followers fetch from the Leader
- If a Follower cannot keep up, it gets removed from the ISR
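The ISR shrink rule above can be sketched as follows. This is a simplified illustration, not Kafka's actual code: real brokers evict laggards based on the `replica.lag.time.max.ms` setting, and the method and variable names here are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IsrSketch {
    // Keep only followers that caught up to the leader within maxLagMs.
    // (Illustrative: real Kafka tracks per-replica fetch state on the leader.)
    static Set<Integer> shrinkIsr(Map<Integer, Long> lastCaughtUpMs,
                                  long nowMs, long maxLagMs) {
        Set<Integer> isr = new TreeSet<>();
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            if (nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey()); // still in sync
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        // Brokers 1 (leader), 2, 3; broker 3 last caught up 20 s ago.
        Map<Integer, Long> caughtUp = Map.of(1, 100_000L, 2, 99_500L, 3, 80_000L);
        // With a 10 s lag limit, broker 3 falls out of the ISR.
        System.out.println(shrinkIsr(caughtUp, 100_000L, 10_000L)); // [1, 2]
    }
}
```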
Network (~4 Gbps usage)
- Are we reaching the limit of a server? NO!
  - We have 60 servers serving the topic
  - With the spec we have, one server can handle 10 Gbps (usage is ~1 Gbps)
- Hmmm... Distributed... Is the load distributed evenly?
- Let's check how Kafka distributes load
- In this case we had 96 partitions
- Partitions were assigned to servers (brokers) round-robin
  - So load should be evenly distributed, at least within one topic
- Followers fetch from the Leader continuously with multiple fetcher threads
  - 1 fetcher thread can fetch around 1 Gbps
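The round-robin placement above can be sketched minimally as follows. This is an assumed simplification: the leader of partition i is broker i % numBrokers, whereas real Kafka also randomizes the starting broker and spreads follower replicas.

```java
public class RoundRobinAssign {
    // Simplified round-robin: leader of partition i is broker (i % numBrokers).
    static int leaderOf(int partitionId, int numBrokers) {
        return partitionId % numBrokers;
    }

    public static void main(String[] args) {
        int brokers = 60;
        // With 60 brokers and 96 partitions, partitions 0..59 cover every
        // broker once, and partitions 60..95 wrap around to brokers 0..35:
        System.out.println(leaderOf(0, brokers) == leaderOf(60, brokers)); // true
        System.out.println(leaderOf(95, brokers));                         // 35
    }
}
```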
1 Gbps per fetcher thread
- Total topic traffic: 6.5 GBps (52 Gbps)
- Each partition = ~550 Mbps (52 Gbps / 96 partitions)
- No. of partitions on 1 server = 1 or 2 (96 partitions / 60 brokers)
- Traffic on 1 server = 550 Mbps or 1.1 Gbps
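A quick sanity check of the arithmetic above, using the slide's values (1 GBps taken as 8 Gbps):

```java
public class TopicMath {
    // Per-partition traffic in Mbps, given total topic traffic in GBps.
    static double perPartitionMbps(double totalGBps, int partitions) {
        return totalGBps * 8 * 1000 / partitions;
    }

    // Partitions on the busiest broker (ceiling division).
    static int maxPartitionsPerBroker(int partitions, int brokers) {
        return (partitions + brokers - 1) / brokers;
    }

    public static void main(String[] args) {
        // 6.5 GBps = 52 Gbps; 52 Gbps / 96 partitions ~ 542 Mbps (~550 Mbps)
        System.out.printf("per-partition: %.0f Mbps%n", perPartitionMbps(6.5, 96));
        // 96 partitions over 60 brokers: 1 or 2 per broker
        System.out.println("busiest broker: " + maxPartitionsPerBroker(96, 60) + " partitions"); // 2
    }
}
```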
broker (configurable)
- Kafka assigned the 2 partitions of 1 topic to 1 fetcher thread
- Why were they not assigned to multiple threads? (Code Time!)
- The hash acts as a shuffle, distributing partitions across fetchers:
  Utils.abs(31 * topic.hashCode() + partitionId) % numFetchersPerBroker
- Looks legit... NO! Not in our case
Leader assignment is done in RR
- Partition assignment is done with RR too
- For 60 brokers
  - partition i and i + 60 are on the same broker
- For 6 fetchers
  - partition i, i + 6, ..., i + 60 are on the same fetcher
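The collision can be reproduced with the slide's formula. A sketch under stated assumptions: the topic name and counts are illustrative, and the sign-bit mask stands in for Utils.abs. Because 60 is a multiple of 6, partitions i and i + 60 (which sit on the same broker) always hash to the same fetcher thread.

```java
public class FetcherCollision {
    static final int NUM_FETCHERS = 6; // assumed num.replica.fetchers

    // Same shape as the formula on the slide; masking the sign bit
    // keeps the result non-negative, standing in for Utils.abs.
    static int fetcherId(String topic, int partitionId) {
        return ((31 * topic.hashCode() + partitionId) & 0x7fffffff) % NUM_FETCHERS;
    }

    public static void main(String[] args) {
        String topic = "big-topic"; // hypothetical topic name
        // Partitions i and i+60 share a broker (round-robin over 60 brokers),
        // and since 60 % 6 == 0 they also share a fetcher thread:
        for (int i = 0; i < 3; i++) {
            System.out.printf("p%d -> fetcher %d, p%d -> fetcher %d%n",
                    i, fetcherId(topic, i), i + 60, fetcherId(topic, i + 60));
        }
    }
}
```

So each broker's two partitions of the topic land on one fetcher thread, capping that broker's replication throughput at roughly the ~1 Gbps a single fetcher can pull.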