LINE messaging architecture, how we use and contribute OSS / LINE Campus Talk in Hong Kong by Li Yan Kit, Wilson

25.03.2019 Campus Talk in HKUST, CUHK
26.03.2019 Campus Talk in HKU
Presented by Li Yan Kit, Wilson

LINE Developers

March 25, 2019

Transcript

  1. LINE messaging architecture, how we use and contribute OSS

    Li Yan Kit, Wilson
    LINE Corporation
  2. Who am I

  3. Self Introduction

    • 2007 – 2012: CE + iBBA, CUHK
    • 2012 – 2014: MPhil in CSE, CUHK
    • 2015 – 2018: Research Scientist, Rakuten
    • 2018 – current: Development Engineer
  4. In short,

  5. What does LINE messaging architecture look like

  6. Requirements of LINE Server System

    • Fast Delivery
        • It is an INSTANT messenger!
    • Reliable Delivery
        • DON’T LOSE any messages!
    • Powerful Delivery
        • >5 billion messages per day
        • >164 million users in 4 dominant countries
  7. Messaging System Overview

    [Diagram components: LEGY JP, LEGY SG, LEGY DE; Talk-server; Data Store; Async Task Processing]
  8. LEGY

    • API Gateway / Reverse Proxy
    • Contact Points of LINE Apps
    • Multi-DC deployment (FAST!)
    • Nginx was not enough, let’s develop our own!
        • From Scratch
        • In Erlang (Telecom Systems)
        • Low Latency (FAST!)
  9. Talk-server + Data Store

    • Web Application Server
        • Message Delivery
        • Friendship Management
    • Java + Spring + Thrift RPC
    • Redis – Cache
    • HBase – Persistent Store
  10. OSS are (hidden) in the picture

    • Redis (Search Redis in Line)
        • https://www.slideshare.net/linecorp/redis-at-line-99471322
    • HBase (Search HBase in Line)
        • https://www.slideshare.net/linecorp/a-5-47983106
    • Armeria
        • Open Sourced Async HTTP/2 RPC Thrift Java Library
        • https://github.com/line/armeria
  11. Which part am I working on? (hidden in the overview)

  12. IMF (Internal Message Flow)

    • Client Actions -> Async Processing System
        • Accounting System
        • Abuser Detection
        • Recommendation (Friends/News/Timeline)

    [Diagram components: Talk-server, Async Task Processing]
  13. Apache Kafka

    • Distributed Streaming Platform
        • Distributed
        • Persistent (configurable retention period)
        • Message Queue
        • Pub-Sub Model
        • Built-in load balancing, failover
  14. Example – Add a new friend

  15. What exactly am I doing?

    • DevOps (Development, Operations)
        • Develop Tools / Client SDKs for easier use
        • Capacity Planning of the Kafka Cluster
        • Consulting users on development
    • SRE (Reliability Engineering)
        • Troubleshooting when performance violates the SLO (e.g. latency increase)
        • Patch Kafka for bug fixes / performance improvements
  16. Example of troubleshooting

  17. Traffic Surged -> Some replicas could not keep up

  18. Traffic Surged -> Some replicas could not keep up

    • In-sync Replica (ISR)
        • Messages in Kafka are replicated (copied) to three servers
        • 1x Leader handles clients
        • 2x Followers fetch from the Leader
        • If a Follower cannot keep up, it is removed from the ISR
  19. Thoughts

    • Network Saturation?
        • NO! Servers have a 10 Gbps network (~4 Gbps usage)
    • Are we reaching the limit of the servers?
        • NO! We have 60 servers serving the topic
        • With the spec we have, one server can handle 10 Gbps (~1 Gbps usage)
    • Hmmm....... Distributed...... Is the load distributed evenly?
        • Let’s check how Kafka distributes load
  20. Kafka Load Distribution

    • A topic is sub-divided into multiple partitions
        • In this case we had 96 partitions
    • Partitions were assigned to servers (brokers) in Round-Robin style
        • So it should be evenly distributed in the local (topic) sense
    • Followers fetch from the Leader continuously with multiple fetcher threads
        • 1 fetcher thread is able to fetch around 1 Gbps
  21. Hmm... These numbers...

    • 60 Brokers
    • 96 Partitions
    • 1 Gbps per fetcher thread
    • 6.5 GB/s (52 Gbps) in total
        • Each partition ≈ 550 Mbps
        • No. of partitions in 1 server = 1 or 2
        • Traffic in 1 server = 550 Mbps or 1.1 Gbps
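The arithmetic on this slide can be checked with a short sketch. It assumes the 52 Gbps is spread evenly over the 96 partitions and that partition `p` lands on broker `p % 60` (a simplified reading of "round-robin"; the real assignment starts at an offset and also places replicas). The function names are illustrative, not Kafka APIs.

```python
# Figures taken from the slide; the even split and the p % brokers
# placement are simplifying assumptions for illustration only.
TOTAL_GBPS = 52.0       # 6.5 GB/s * 8 bits per byte
NUM_PARTITIONS = 96
NUM_BROKERS = 60


def per_partition_mbps(total_gbps, num_partitions):
    """Average traffic per partition in Mbps, assuming an even split."""
    return total_gbps * 1000 / num_partitions


def partitions_per_broker(num_partitions, num_brokers):
    """Count partitions per broker under a plain round-robin placement."""
    counts = [0] * num_brokers
    for partition in range(num_partitions):
        counts[partition % num_brokers] += 1
    return counts


if __name__ == "__main__":
    rate = per_partition_mbps(TOTAL_GBPS, NUM_PARTITIONS)
    counts = partitions_per_broker(NUM_PARTITIONS, NUM_BROKERS)
    print(f"per partition: {rate:.0f} Mbps")
    print(f"brokers with 2 partitions: {counts.count(2)}")
    print(f"brokers with 1 partition:  {counts.count(1)}")
    print(f"traffic on a 2-partition broker: {2 * rate / 1000:.2f} Gbps")
```

Under these assumptions the per-partition rate comes out slightly below the slide's rounded 550 Mbps, and a broker holding two partitions sees a little over 1 Gbps, which is already at the quoted limit of a single fetcher thread.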
  22. What’s wrong?

    • We had 6 fetcher threads in 1 broker (configurable)
    • Kafka assigned the 2 partitions of 1 topic to 1 fetcher thread
    • Why were they not assigned to multiple threads? (Code Time!)
        • Hash as a shuffle, and distribute partitions to multiple fetchers
        • Looks legit....... NO! Not in our case

    Utils.abs(31 * topic.hashCode() + partitionId) % numFetchersPerBroker
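The formula quoted on this slide can be emulated outside the JVM to see what it does to partition IDs. The sketch below is an illustration, not Kafka code: it reimplements Java's `String.hashCode()` with 32-bit wraparound (valid for ASCII topic names), and it assumes `Utils.abs` returns 0 for `Integer.MIN_VALUE` and the ordinary absolute value otherwise. The topic name `demo-topic` is hypothetical.

```python
def to_int32(n):
    """Wrap a Python int to a signed 32-bit value, like Java int arithmetic."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def java_string_hash(s):
    """Emulate java.lang.String.hashCode() for ASCII strings."""
    h = 0
    for ch in s:
        h = to_int32(31 * h + ord(ch))
    return h


def kafka_abs(n):
    """Assumed behaviour of Utils.abs: 0 for Integer.MIN_VALUE, else abs."""
    return 0 if n == -2**31 else abs(n)


def fetcher_id(topic, partition_id, num_fetchers):
    """The slide's formula: abs(31 * hash(topic) + partitionId) % numFetchers."""
    return kafka_abs(to_int32(31 * java_string_hash(topic) + partition_id)) % num_fetchers


if __name__ == "__main__":
    topic = "demo-topic"  # hypothetical topic name
    for partition in (0, 60):
        print(partition, fetcher_id(topic, partition, 6), fetcher_id(topic, partition, 7))
```

For a fixed topic the hash term is a constant, so the formula is just the partition ID shifted by a fixed offset and reduced modulo the fetcher count. It shuffles across topics, not across the partitions of one topic.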
  23. Partitions (7) -> Broker (6), Partitions -> Fetcher (3)

    Partitions  1     2     3     4     5     6     7
    Brokers     b1    b2    b3    b4    b5    b6    b1
    Fetchers    b1t1  b2t2  b3t3  b4t1  b5t2  b6t3  b1t1
  24. Partitions (7) -> Broker (6), Partitions -> Fetcher (3)

    Partitions  1     2     3     4     5     6     7
    Brokers     b1    b2    b3    b4    b5    b6    b1
    Fetchers    b1t1                                b1t1
  25. Problem is “Collision” of two rings

    • Partitions to broker is done in RR
    • Partitions to fetcher is done in RR too
    • For 60 brokers
        • partition i and i + 60 are on the same broker
    • For 6 fetchers
        • partition i, i + 6, ..., i + 60 are on the same fetcher
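The collision can be reproduced with a toy model of the two rings. This is a simplified sketch, not Kafka's assignment logic: it assumes partition `p` goes to broker `p % num_brokers` and to fetcher `p % num_fetchers`, ignoring the constant topic-hash offset and replica placement. The function name is illustrative.

```python
def fetchers_used_per_broker(num_partitions, num_brokers, num_fetchers):
    """For each broker, count how many distinct fetcher IDs its partitions hit."""
    used = {}
    for partition in range(num_partitions):
        broker = partition % num_brokers      # ring 1: partition -> broker
        fetcher = partition % num_fetchers    # ring 2: partition -> fetcher
        used.setdefault(broker, set()).add(fetcher)
    return {broker: len(fetchers) for broker, fetchers in used.items()}


if __name__ == "__main__":
    for num_fetchers in (5, 6, 7):
        spread = fetchers_used_per_broker(96, 60, num_fetchers)
        print(f"{num_fetchers} fetchers -> at most {max(spread.values())} "
              "distinct fetcher(s) per broker")
```

In this model, two partitions on the same broker differ by a multiple of the broker count, so they share a fetcher whenever the fetcher count divides that difference. With 60 brokers and 96 partitions, fetcher counts of 5 or 6 both collapse to a single busy fetcher per broker, while 7 spreads the two partitions over two threads.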
  26. Benchmark with fetcher threads

    So 5 fetchers = 1 fetcher. What could be the number of brokers?
  27. What have I said?

  28. What LINERs do

    • Solve real problems!
        • (Millions of Users)
    • Work on any problems!
        • (Tens of services: Messaging, Pay, Insurance, ...)
    • Meet LINE Friends!
        • (We have Brown in the office)
  29. Thank you!