LINE messaging architecture, how we use and contribute OSS / LINE Campus Talk in Hong Kong by Li Yan Kit, Wilson

LINE messaging architecture, how we use and contribute OSS
Li Yan Kit, Wilson
LINE Corporation

Who am I

Self Introduction
- 2007 – 2012: CE + iBBA, CUHK
- 2012 – 2014: MPhil in CSE, CUHK
- 2015 – 2018: Research Scientist, Rakuten
- 2018 – current: Development Engineer

In short,

What does LINE messaging architecture look like

Requirements of LINE Server System
- Fast Delivery
  - It is an INSTANT messenger!
- Reliable Delivery
  - DON’T LOSE any messages!
- Powerful Delivery
  - >5 billion messages per day
  - >164 million users in 4 dominant countries

Messaging System Overview
[Diagram labels: LEGY DE, LEGY JP, LEGY SG, Talk-server, Data Store, Async Task Processing]

LEGY
- API Gateway / Reverse Proxy
- Contact Points of LINE Apps
- Multi-DC deployment (FAST!)
- Nginx was not enough, let’s develop our own!
- From Scratch
- In Erlang (Telecom Systems)
- Low Latency (FAST!)

Talk-server + Data Store
- Web Application Server
- Message Delivery
- Friendship Management
- Java + Spring + Thrift RPC
- Redis – Cache
- HBase – Persistent Store

OSS are (hidden) in the picture
- Redis (Search Redis in LINE)
  - https://www.slideshare.net/linecorp/redis-at-line-99471322
- HBase (Search HBase in LINE)
  - https://www.slideshare.net/linecorp/a-5-47983106
- Armeria
  - Open Sourced Async HTTP/2 RPC THRIFT Java Library
  - https://github.com/line/armeria

Which part am I working on? (hidden in the overview)

IMF (Internal Message Flow)
- Client Actions -> Async Processing System
- Accounting System
- Abuser Detection
- Recommendation (Friends/News/Timeline)
[Diagram labels: Talk-server, Async Task Processing]

Apache Kafka
- Distributed Streaming Platform
- Distributed
- Persistent (configurable retention period)
- Message Queue
- Pub-Sub Model
- Built-in load balancing, failover

Example – Add a new friend

What exactly am I doing?
- DevOps (Development, Operations)
  - Develop Tools / Client SDKs for easier use
  - Capacity Planning of the Kafka Cluster
  - Consulting users on development
- SRE (Site Reliability Engineering)
  - Troubleshooting when performance violates SLO (e.g. latency increase)
  - Patch Kafka for bug fix / performance improvement

Example of troubleshooting

Traffic Surged -> Some replicas could not keep up

Traffic Surged -> Some replicas could not keep up
- In-sync Replica (ISR)
  - Messages in Kafka are replicated (copied) to three servers
  - 1x Leader handles clients
  - 2x Followers fetch from Leader
  - If a Follower cannot keep up, it is removed from the ISR
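
A toy model may make the in-sync replica rule above more concrete. This is only a sketch under stated assumptions: the `Partition` class, its field names and the `max_lag_s` window are invented for illustration, and the rule is simplified to "a follower that has not caught up with the leader within a time window drops out of the ISR", loosely modelled on Kafka's `replica.lag.time.max.ms` setting rather than on the broker's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model: one leader log plus followers that copy from it."""
    followers: list
    max_lag_s: float = 10.0   # hypothetical lag window, in seconds
    leader_end: int = 0       # number of messages written to the leader
    fetched: dict = field(default_factory=dict)         # follower -> messages copied
    last_caught_up: dict = field(default_factory=dict)  # follower -> last time it matched the leader

    def __post_init__(self):
        for f in self.followers:
            self.fetched[f] = 0
            self.last_caught_up[f] = 0.0

    def produce(self, n):
        """Clients append n messages to the leader."""
        self.leader_end += n

    def fetch(self, follower, max_messages, now):
        """A follower copies up to max_messages from the leader."""
        self.fetched[follower] = min(self.leader_end, self.fetched[follower] + max_messages)
        if self.fetched[follower] == self.leader_end:
            self.last_caught_up[follower] = now

    def isr(self, now):
        """Leader plus every follower that caught up within the lag window."""
        ok = [f for f in self.followers if now - self.last_caught_up[f] <= self.max_lag_s]
        return ["leader"] + ok


p = Partition(followers=["f1", "f2"])
for t in range(1, 21):                # 20 seconds at 100 messages per second
    p.produce(100)
    p.fetch("f1", 100, now=float(t))  # f1 keeps pace with the leader
    p.fetch("f2", 60, now=float(t))   # f2 fetches slower than the produce rate
print(p.isr(now=20.0))
```

In this run `f2` never reaches the end of the leader's log, so after the window passes only the leader and `f1` remain in the ISR.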

Thoughts
- Network Saturation?
  - NO! Servers have a 10 Gbps network (~4 Gbps usage)
- Are we reaching the limit of the servers?
  - NO! We have 60 servers serving the topic
  - With the spec we have, one server can handle 10 Gbps (~1 Gbps)
- Hmmm....... Distributed...... Is the load distributed evenly?
  - Let’s check how Kafka distributes load

Kafka Load Distribution
- Topic is sub-divided into multiple partitions
  - In this case we had 96 partitions
- Partitions were assigned to servers (brokers) in round-robin style
  - So it should be evenly distributed in the local (topic) sense
- Followers fetch from Leader continuously with multiple fetcher threads
  - 1 fetcher thread is able to fetch around 1 Gbps
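
The round-robin placement described above can be sketched in a few lines. This is a simplified illustration, not Kafka's assignment code: it assumes partition `p` simply lands on broker `p % num_brokers`, ignores the start offset and replica placement that a real cluster applies, and `assign_round_robin` is a hypothetical helper name.

```python
from collections import Counter

NUM_BROKERS = 60      # from the slides
NUM_PARTITIONS = 96   # from the slides


def assign_round_robin(num_partitions, num_brokers):
    """Map partition id -> broker id, walking the brokers in a ring."""
    return {p: p % num_brokers for p in range(num_partitions)}


assignment = assign_round_robin(NUM_PARTITIONS, NUM_BROKERS)
partitions_per_broker = Counter(assignment.values())
spread = Counter(partitions_per_broker.values())

# How many brokers hold 1 partition and how many hold 2
print(spread)
```

Under this model 36 brokers end up with two partitions of the topic and 24 brokers with one, so the spread is even to within one partition per broker.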

Hmm... These numbers...
- 60 Brokers
- 96 Partitions
- 1 Gbps per fetcher thread
- 6.5 GBps (52 Gbps)
- Each partition = 550 Mbps
- No. of partitions in 1 server = 1 or 2
- Traffic in 1 server = 550 Mbps or 1.1 Gbps
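
The arithmetic behind these numbers can be checked quickly. The sketch below assumes that 6.5 GBps is the total traffic of the topic and that it is spread evenly over the 96 partitions; the variable names are illustrative only. The slide rounds the per-partition figure up to 550 Mbps.

```python
TOTAL_GBPS = 6.5 * 8        # 6.5 GB/s expressed in gigabits per second (52 Gbps)
NUM_PARTITIONS = 96
FETCHER_LIMIT_GBPS = 1.0    # rough throughput of one fetcher thread, per the slides

# Assumption: traffic is spread evenly over all partitions
per_partition_gbps = TOTAL_GBPS / NUM_PARTITIONS

for partitions_on_one_fetcher in (1, 2):
    load = partitions_on_one_fetcher * per_partition_gbps
    verdict = "over" if load > FETCHER_LIMIT_GBPS else "within"
    print(f"{partitions_on_one_fetcher} partition(s): {load:.2f} Gbps, {verdict} the fetcher limit")
```

One partition fits comfortably inside a single fetcher thread, while two partitions on the same thread exceed the roughly 1 Gbps that one thread can fetch.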

What’s wrong?
- We had 6 fetcher threads in 1 broker (configurable)
- Kafka assigned the 2 partitions of 1 topic to 1 fetcher thread
- Why were they not assigned to multiple threads? (Code Time!)
  - Hash as a shuffle, and distributes partitions to multiple fetchers
  - Looks legit....... NO! Not in our case

    Utils.abs(31 * topic.hashCode() + partitionId) % numFetchersPerBroker
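
To see why the expression above does not spread the two partitions apart, here is a small Python rendering of it. It is a sketch under assumptions: `TOPIC_HASH` is an arbitrary non-negative stand-in for `topic.hashCode()` (for one topic it is just a constant offset), Java's 32-bit integer overflow is ignored, and broker placement uses the simplified `partition % brokers` round-robin.

```python
NUM_BROKERS = 60
NUM_FETCHERS = 6
TOPIC_HASH = 12345   # hypothetical stand-in for topic.hashCode(); constant per topic


def broker_of(partition_id):
    """Simplified round-robin placement of a partition on a broker."""
    return partition_id % NUM_BROKERS


def fetcher_of(partition_id):
    """Python rendering of the expression quoted on the slide."""
    return abs(31 * TOPIC_HASH + partition_id) % NUM_FETCHERS


# Partitions 0 and 60 share a broker; do they also share a fetcher?
for pid in (0, 60):
    print("partition", pid, "broker", broker_of(pid), "fetcher", fetcher_of(pid))
```

Because the topic hash is the same for every partition of the topic, the fetcher index only depends on `partitionId % 6`, and 60 is a multiple of 6. The two partitions that share a broker therefore also share a fetcher thread.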

Partitions (7) -> Broker (6), Partitions -> Fetcher (3)
Partition  1     2     3     4     5     6     7
Broker     b1    b2    b3    b4    b5    b6    b1
Fetcher    b1t1  b2t2  b3t3  b4t1  b5t2  b6t3  b1t1

Partitions (7) -> Broker (6), Partitions -> Fetcher (3)
Partition  1     2     3     4     5     6     7
Broker     b1    b2    b3    b4    b5    b6    b1
Fetcher    b1t1                                b1t1

Problem is “Collision” of two rings
- Partitions to brokers is done in RR
- Partitions to fetchers is done with RR too
- For 60 brokers
  - partition i and i + 60 are on the same broker
- For 6 fetchers
  - partition i, i + 6, ... i + 60 are on the same fetcher
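
The collision of the two rings can be explored by sweeping the number of fetcher threads. The sketch below uses the same simplified model as the slides (round-robin over brokers, round-robin over fetchers) and is not Kafka code; `busiest_fetcher` and `topic_offset` are hypothetical names, with `topic_offset` standing in for the constant contributed by the topic hash.

```python
NUM_BROKERS = 60
NUM_PARTITIONS = 96


def busiest_fetcher(num_fetchers, topic_offset=0):
    """Largest number of partitions that share one (broker, fetcher) pair."""
    load = {}
    for p in range(NUM_PARTITIONS):
        key = (p % NUM_BROKERS, (topic_offset + p) % num_fetchers)
        load[key] = load.get(key, 0) + 1
    return max(load.values())


for n in range(1, 9):
    print(n, "fetchers ->", busiest_fetcher(n), "partition(s) on the busiest fetcher")
```

In this model every fetcher count that divides 60 (1 through 6) still leaves two partitions on one thread, so adding threads does not help, while 7 or 8 fetchers separate them.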

Benchmark with fetcher threads
So 5 fetchers = 1 fetcher. What could be the number of brokers?

What have I said?

What LINERs do
- Solve real problems!
  - (Millions of Users)
- Work on any problems!
  - (Tens of services: Messaging, Pay, Insurance, ...)
- Meet LINE Friends!
  - (We have Brown in the office)

Thank you!
