
LINE messaging architecture, how we use and contribute OSS / LINE Campus Talk in Hong Kong by Li Yan Kit, Wilson


25.03.2019 Campus Talk in HKUST, CUHK
26.03.2019 Campus Talk in HKU
Presented by Li Yan Kit, Wilson

LINE Developers

March 25, 2019


Transcript

1. Self Introduction
   - 2007 – 2012: CE + iBBA, CUHK
   - 2012 – 2014: MPhil in CSE, CUHK
   - 2015 – 2018: Research Scientist, Rakuten
   - 2018 – current: Development Engineer
2. Requirements of the LINE Server System
   - Fast delivery: it is an INSTANT messenger!
   - Reliable delivery: DON'T LOSE any messages!
   - Powerful delivery: >5 billion messages per day, >164 million users in 4 dominant countries
3. LEGY
   - API gateway / reverse proxy: the contact point of the LINE apps
   - Multi-DC deployment (FAST!)
   - Nginx was not enough, so we developed our own: from scratch, in Erlang (a language built for telecom systems)
   - Low latency (FAST!)
4. Talk-server + Data Store
   - Web application server: message delivery, friendship management
   - Java + Spring + Thrift RPC
   - Redis – cache
   - HBase – persistent store
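The Redis-as-cache, HBase-as-persistent-store split above is the classic cache-aside read path. Here is a minimal sketch of that pattern, with plain dicts standing in for Redis and HBase; the key names and function are hypothetical, not LINE's actual API.

```python
# Cache-aside read path: try the cache, fall back to the persistent
# store, and populate the cache on a miss. Dicts stand in for Redis
# and HBase purely for illustration.

redis_cache = {}                      # hot data, fast lookups
hbase_store = {"msg:1": "hello"}      # durable source of truth

def get_message(key):
    # 1) Try the cache first.
    if key in redis_cache:
        return redis_cache[key]
    # 2) On a miss, read the persistent store and warm the cache.
    value = hbase_store.get(key)
    if value is not None:
        redis_cache[key] = value
    return value
```

The second read of the same key is then served from the cache without touching the store.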
5. OSS is (hidden) in the picture
   - Redis (search "Redis in LINE")
     https://www.slideshare.net/linecorp/redis-at-line-99471322
   - HBase (search "HBase in LINE")
     https://www.slideshare.net/linecorp/a-5-47983106
   - Armeria: open-sourced async HTTP/2 RPC / Thrift Java library
     https://github.com/line/armeria
6. IMF (Internal Message Flow)
   - Client actions on the talk-server are forwarded to an async task processing system
   - Accounting system
   - Abuser detection
   - Recommendation (Friends / News / Timeline)
7. Apache Kafka
   - Distributed streaming platform
   - Distributed, persistent (configurable retention period)
   - Message queue / pub-sub model
   - Built-in load balancing and failover
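The pub-sub and partitioning ideas above can be sketched in a few lines. This is a toy in-memory model for illustration only, not Kafka's API; the topic and key names are made up.

```python
# Toy model of a partitioned topic: producers append keyed messages,
# and the key decides the partition, so per-key ordering is preserved.

class Topic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Route by key hash, as Kafka does for keyed messages.
        self.partitions[hash(key) % len(self.partitions)].append(value)

topic = Topic(num_partitions=3)
topic.produce("user-42", "sent a message")
topic.produce("user-42", "added a friend")

# Both messages share a key, so they land in the same partition,
# keeping their relative order.
sizes = sorted(len(p) for p in topic.partitions)
```

Consumers in the same group would then split these partitions among themselves, which is Kafka's built-in load balancing.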
8. What exactly am I doing?
   - DevOps (Development + Operations)
     - Develop tools / client SDKs for easier use
     - Capacity planning of the Kafka cluster
     - Consulting users on development
   - SRE (Site Reliability Engineering)
     - Troubleshooting when performance violates the SLO (e.g. latency increases)
     - Patching Kafka for bug fixes / performance improvements
9. Traffic surged -> some replicas could not keep up
   - In-sync replicas (ISR)
   - Messages in Kafka are replicated (copied) to three servers
   - 1x leader handles clients; 2x followers fetch from the leader
   - If a follower cannot keep up, it gets removed from the ISR
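The ISR rule the slide describes can be sketched as follows. The offsets and the message-count threshold here are invented for illustration; real Kafka uses a time-based lag bound (replica.lag.time.max.ms) rather than a fixed message count.

```python
# Followers that lag the leader by more than a threshold drop out of
# the in-sync replica set (ISR). Numbers are illustrative only.

MAX_LAG = 100            # hypothetical threshold, in messages
leader_offset = 1000
follower_offsets = {"follower-1": 990, "follower-2": 700}

isr = {"leader"} | {
    name for name, offset in follower_offsets.items()
    if leader_offset - offset <= MAX_LAG
}
```

Here follower-2 is 300 messages behind, so it falls out of the ISR, which is exactly the symptom the traffic surge produced.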
10. Thoughts
   - Network saturation? NO! Servers have 10 Gbps NICs (~4 Gbps usage)
   - Are we reaching the limit of a server? NO! We have 60 servers serving the topic; with our spec, one server can handle 10 Gbps (actual: ~1 Gbps)
   - Hmmm... distributed... is the load distributed evenly? Let's check how Kafka distributes load
11. Kafka Load Distribution
   - A topic is sub-divided into multiple partitions; in this case we had 96 partitions
   - Partitions were assigned to servers (brokers) in round-robin style, so load should be evenly distributed in the local (topic) sense
   - Followers fetch from the leader continuously with multiple fetcher threads
   - 1 fetcher thread is able to fetch around 1 Gbps
12. Hmm... these numbers...
   - 60 brokers
   - 96 partitions
   - 1 Gbps per fetcher thread
   - Total topic traffic: 6.5 GB/s (52 Gbps)
   - Each partition = 550 Mbps
   - No. of partitions on 1 server = 1 or 2
   - Traffic on 1 server = 550 Mbps or 1.1 Gbps
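The slide's arithmetic can be checked directly. The inputs below come from the deck (52 Gbps total, 96 partitions, 60 brokers); the punchline is that a broker holding two partitions must move about 1.1 Gbps, more than the ~1 Gbps a single fetcher thread can pull.

```python
# Checking the slide's arithmetic: 52 Gbps of topic traffic over 96
# partitions, spread round-robin across 60 brokers.

total_mbps = 52 * 1000
partitions = 96
brokers = 60

per_partition_mbps = total_mbps / partitions    # ~542; the deck rounds to 550
# 96 partitions over 60 brokers round-robin: some brokers get 1, some get 2.
min_parts, max_parts = partitions // brokers, partitions // brokers + 1

# A broker holding 2 partitions must replicate ~1.1 Gbps, which exceeds
# the ~1 Gbps one fetcher thread can sustain.
worst_broker_mbps = max_parts * per_partition_mbps
```

So the traffic fits comfortably on the hardware only if each broker's partitions are spread over more than one fetcher thread.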
13. What's wrong?
   - We had 6 fetcher threads per broker (configurable)
   - Kafka assigned both partitions of the topic on a broker to 1 fetcher thread
   - Why not assign them to multiple threads? (Code time!)
   - The code hashes to shuffle, distributing partitions over the fetchers:
     Utils.abs(31 * topic.hashCode() + partitionId) % numFetchersPerBroker
   - Looks legit... NO! Not in our case
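The quoted expression can be reproduced outside the broker to see how it behaves. This is a sketch in Python: `java_string_hash` mimics Java's `String.hashCode()` including 32-bit wrapping, `abs()` stands in for Kafka's `Utils.abs`, and the topic name "talk" is just an example.

```python
# Re-implementation of the fetcher-id expression quoted above:
# Utils.abs(31 * topic.hashCode() + partitionId) % numFetchersPerBroker

def java_string_hash(s):
    # Mimics Java's String.hashCode(), including 32-bit overflow.
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

def fetcher_id(topic, partition_id, num_fetchers):
    # Keep the intermediate value in signed 32-bit range, like Java.
    x = (31 * java_string_hash(topic) + partition_id) & 0xFFFFFFFF
    x = x - (1 << 32) if x >= (1 << 31) else x
    return abs(x) % num_fetchers
```

For a fixed topic the hash is a constant, so the fetcher id is effectively (constant + partitionId) % numFetchersPerBroker: consecutive partition ids spread nicely over the threads, but ids that differ by a multiple of numFetchersPerBroker always collide.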
14. Partitions (7) -> Brokers (6), Partitions -> Fetchers (3)
   - Round-robin maps partitions 1–7 to brokers b1, b2, b3, b4, b5, b6, then wraps: partition 7 goes back to b1
   - Round-robin over 3 fetcher threads maps them to b1t1, b2t2, b3t3, b4t1, b5t2, b6t3, and partition 7 again to b1t1
15. Partitions (7) -> Brokers (6), Partitions -> Fetchers (3)
   - Partitions 1 and 7 collide twice: same broker (b1) and same fetcher thread (b1t1)
16. The problem is a "collision" of two rings
   - Partition-to-broker assignment is done in round-robin
   - Partition-to-fetcher assignment is done in round-robin too
   - With 60 brokers, partitions i and i + 60 land on the same broker
   - With 6 fetchers, partitions i, i + 6, ..., i + 60 land on the same fetcher
   - Since 60 is a multiple of 6, the partitions that share a broker always share a fetcher thread as well
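The collision can be demonstrated in a few lines, using plain modular round-robin in place of the hash (for a fixed topic the hash reduces to the same shift-by-partition-id pattern); the counts are the ones from the deck.

```python
# Two rings colliding: 96 partitions round-robin over 60 brokers, with
# 6 fetcher threads per broker. Because 60 is a multiple of 6, partitions
# p and p + 60, which share a broker, also share a fetcher thread.

BROKERS, FETCHERS, PARTITIONS = 60, 6, 96

fetchers_per_broker = {}  # broker -> set of fetcher threads actually used
for p in range(PARTITIONS):
    fetchers_per_broker.setdefault(p % BROKERS, set()).add(p % FETCHERS)
```

Every broker ends up using exactly one of its six fetcher threads, so a broker holding two partitions funnels ~1.1 Gbps through a single ~1 Gbps thread, which is why some replicas could not keep up.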
17. Benchmark with fetcher threads
   - In the benchmark, 5 fetchers performed the same as 1 fetcher
   - Quiz: what could be the number of brokers?
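The quiz can be brute-forced under the same round-robin model: 5 fetcher threads degenerate into 1 exactly when the broker count is a multiple of 5, the same "two rings" collision as before. The search range of 100 brokers and the partition count of 960 are arbitrary choices for illustration.

```python
def degenerates(brokers, fetchers, partitions=960):
    # True if every broker sees all of its partitions on one fetcher thread.
    used = {}
    for p in range(partitions):
        used.setdefault(p % brokers, set()).add(p % fetchers)
    return all(len(f) == 1 for f in used.values())

# Broker counts (up to 100) for which 5 fetchers behave like 1:
candidates = [b for b in range(1, 101) if degenerates(b, fetchers=5)]
```

The candidates come out as exactly the multiples of 5, matching the benchmark's symptom.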
18. What LINERs do
   - Solve real problems! (Millions of users)
   - Work on any problems! (Tens of services: Messaging, Pay, Insurance, ...)
   - Meet LINE Friends! (We have Brown in the office)