Scalable Multi Datacenter ID Generator for LINE's Messaging Application

Masahiro Ide
LINE / LINE Platform Development Center1 Messaging Platform Development / Senior Engineer

https://linedevday.linecorp.com/2021/ja/sessions/163
https://linedevday.linecorp.com/2021/en/sessions/163
https://linedevday.linecorp.com/2021/ko/sessions/163

LINE DEVDAY 2021

November 10, 2021

Transcript

  1. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Messaging Application Backend
  2. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend
  3. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message
  4. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message
  5. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message Hello!
  6. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message Hello!
  7. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked Hello!
  8. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked Read Hello!
  9. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked Read Hello! How are you?
  10. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked 1’. send message 2’. recv message Read Hello! How are you?
  11. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked 1’. send message 2’. recv message Read How are you? Hello! How are you?
  12. LINE Messaging Application - Around 4 billion messages sent every day

    - Approx. 200 million active users. Hello! Messaging Application Backend 1. send message 2. recv message 3. read message 4. notify message checked 1’. send message 2’. recv message Read How are you? Hello! … … … … How are you?
  13. Message ID - Assign a Message ID to the message

    - Messages are distinguished from one another by Message ID - Globally unique - Monotonically increasing - 64-bit integer. Diagram: Messaging Application Backend; 1. send message 2. recv message 3. read message 4. notify message checked 1’. send message 2’. recv message; Hello! Read How are you? … …
  14. Generate Message ID - Requirements - Fast - Simple - Low maintenance cost

    - Since we already have tons of Redis clusters internally, we use Redis INCRBY to implement this ID generator
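The counter-based approach on this slide can be sketched as follows. This is a minimal illustration, not LINE's implementation: `InMemoryCounterStore` is a hypothetical in-process stand-in that models only the atomic add-and-return behaviour of Redis `INCRBY`, and the key name `message-id` is an assumption.

```python
import threading


class InMemoryCounterStore:
    """Hypothetical stand-in for a Redis server; models only INCRBY."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incrby(self, key, amount=1):
        # Like Redis INCRBY: atomically add `amount` and return the new value.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + amount
            return self._values[key]


def next_message_id(store, key="message-id"):
    # Every caller shares one counter, so IDs are unique and increasing
    # as long as the single store stays available and consistent.
    return store.incrby(key, 1)


store = InMemoryCounterStore()
ids = [next_message_id(store) for _ in range(5)]
print(ids)
```

The simplicity comes from having exactly one shared counter, which is also why the design depends entirely on that one store.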
  15. Generate Message ID using Redis - Single Master-Replica setup

    - But… does this setup have enough reliability, availability, and scalability?
  16. Case 1: Sudden death of Redis hosts - Let’s consider what happens if the Redis Master dies suddenly

    - How do we recover from this situation? Fall back to the Replica? - Was the last generated ID replicated to the Replica correctly? - If not, the new Master may generate duplicate IDs - Users can’t resume their conversations until we fix the Redis hosts
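The duplicate-ID risk on this slide can be reproduced with a toy simulation. It assumes asynchronous replication, so the Replica may lag behind the Master when the Master dies; `CounterNode` is a hypothetical class that models only a counter value, not a real Redis host.

```python
class CounterNode:
    """Hypothetical model of one Redis host holding the ID counter."""

    def __init__(self, value=0):
        self.value = value

    def incr(self):
        self.value += 1
        return self.value


master = CounterNode()
replica = CounterNode()
issued = []

# Normal operation: each increment reaches the replica before the next one.
for _ in range(3):
    issued.append(master.incr())
    replica.value = master.value

# Two more IDs are issued, but the master dies before they are replicated.
issued.append(master.incr())
issued.append(master.incr())

# Failover: the stale replica is promoted and keeps counting from 3.
new_master = replica
issued.append(new_master.incr())

duplicates = sorted({i for i in issued if issued.count(i) > 1})
print(issued)      # [1, 2, 3, 4, 5, 4]
print(duplicates)  # [4]
```

In this run, ID 4 is handed out twice: once by the old Master and once by the promoted Replica.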
  17. Case 2: Network partition - What if the network between the backends and Redis is partitioned?

    - Can we wait until someone fixes the network partition? - Can we block the users’ conversations…?
  18. Then, how to fix it? - Requirements - Globally unique,

    monotonically increasing ID - Existing messaging features rely on this (e.g. message order) - Scalable - No breaking change at storage/API level - The ID format is already exposed to client apps, so we cannot change it
  19. Globally unique, monotonically increasing ID at high scale - Twitter

    Snowflake - Timestamp-based ID format - Monotonically increasing 64-bit integer ID - Scalable without sharing state
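For reference, a timestamp-based layout of this kind can be sketched as below. The bit widths (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence) and the epoch constant follow Twitter's published Snowflake implementation rather than anything stated in the slides; the function names `compose` and `decompose` are illustrative.

```python
TWITTER_EPOCH_MS = 1288834974657  # custom epoch used by Twitter Snowflake
TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12


def compose(timestamp_ms, worker_id, sequence):
    """Pack the three fields into one 64-bit integer, timestamp in the high bits."""
    elapsed = timestamp_ms - TWITTER_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= worker_id < (1 << WORKER_BITS):
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < (1 << SEQUENCE_BITS):
        raise ValueError("sequence out of range")
    return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def decompose(snowflake_id):
    """Unpack an ID back into (timestamp_ms, worker_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + TWITTER_EPOCH_MS
    return timestamp_ms, worker_id, sequence


earlier = compose(TWITTER_EPOCH_MS + 1000, worker_id=1023, sequence=4095)
later = compose(TWITTER_EPOCH_MS + 1001, worker_id=0, sequence=0)
print(earlier < later)  # True: the timestamp dominates the ordering
```

Because the timestamp occupies the high bits, IDs from different workers sort by time without the workers sharing any state; within the same millisecond, ordering across workers is not guaranteed.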
  20. Our Approach: Message ID Generator (MIG) - Reduce timestamp resolution

    from 1 msec to 10 msec - Can use this format until 2188/12/24 - More sequence bits (12 bits → 18 bits) - Prepare for big message bursts (64 workers; 262,144 sequence values)
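A generator with these parameters might look like the sketch below. Several details are assumptions rather than facts from the slides: 64 workers is read as a 6-bit worker field, the timestamp is given the remaining 39 bits of a non-negative 64-bit integer, the field order is timestamp, worker, sequence, and `EPOCH_MS` is a hypothetical epoch. With 39 bits of 10 msec ticks the format lasts about 174 years, which would fit the 2188 end date if the real epoch were in 2014.

```python
import time

TICK_MS = 10                  # timestamp resolution from the slide
WORKER_BITS = 6               # 2**6 = 64 workers
SEQUENCE_BITS = 18            # 2**18 = 262,144 sequence values per tick
TIMESTAMP_BITS = 63 - WORKER_BITS - SEQUENCE_BITS  # 39 bits (assumed)
EPOCH_MS = 1388534400000      # hypothetical epoch: 2014-01-01 00:00:00 UTC


class MessageIdGenerator:
    """Illustrative timestamp-based generator; not thread-safe."""

    def __init__(self, worker_id, clock_ms=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last_tick = -1
        self._sequence = 0

    def next_id(self):
        # This sketch has no protection against the clock moving backwards.
        tick = (self._clock_ms() - EPOCH_MS) // TICK_MS
        if tick == self._last_tick:
            self._sequence += 1
            if self._sequence >= (1 << SEQUENCE_BITS):
                raise RuntimeError("sequence exhausted within one 10 msec tick")
        else:
            self._last_tick = tick
            self._sequence = 0
        return (
            (tick << (WORKER_BITS + SEQUENCE_BITS))
            | (self.worker_id << SEQUENCE_BITS)
            | self._sequence
        )


generator = MessageIdGenerator(worker_id=3)
print(generator.next_id(), generator.next_id())
```

Each worker only needs its own `worker_id` and a local clock, so adding workers requires no coordination at ID-generation time.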
  21. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 workerId:0 workerId:1
  22. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 workerId:0 workerId:1
  23. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 369342083984785409 workerId:0 workerId:1
  24. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 369342083984785409 MIG workerId:2 workerId:0 workerId:1
  25. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 369342083984785409 369342118327746561 MIG workerId:2 workerId:0 workerId:1
  26. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 369342083984785409 369342118327746561 MIG workerId:2 workerId:0 workerId:1
  27. iDC 1 Our Approach: Message ID Generator (MIG) MIG MIG

    MessageId 369341229940604929 369341242657734657 369341264921100289 369341276933586945 369342083984785409 369342118327746561 369342149328531952 MIG workerId:2 workerId:0 workerId:1
  28. Datacenter 1 Our Approach: Message ID Generator (MIG) MIG MIG

    workerId:0 workerId:1 MIG workerId:2 MIG workerId:3
  29. Datacenter 3 Datacenter 1 Our Approach: Message ID Generator (MIG)

    MIG MIG workerId:0 workerId:1 MIG workerId:2 MIG workerId:3
  30. Datacenter 3 Datacenter 1 Our Approach: Message ID Generator (MIG)

    MIG MIG workerId:0 workerId:1 MIG workerId:2 MIG workerId:3 MIG MIG MIG MIG workerId:4 workerId:5 workerId:6 workerId:7
  31. Datacenter 3 Datacenter 1 Our Approach: Message ID Generator (MIG)

    MIG MIG workerId:0 workerId:1 MIG workerId:2 MIG workerId:3 MIG MIG MIG MIG workerId:4 workerId:5 workerId:6 workerId:7
  32. Datacenter 3 Datacenter 1 Our Approach: Message ID Generator (MIG)

    MIG MIG workerId:0 workerId:1 MIG workerId:2 MIG workerId:3 MIG MIG MIG MIG workerId:4 workerId:5 workerId:6 workerId:7 MIG MIG workerId:8 workerId:9
  33. Our Approach: Message ID Generator (MIG) - Does everything work fine with MIG?

    - No!! We found critical issues - Case 1. Not monotonically increasing ID - Case 2. Duplicated message ID
  34. Case 1: Not monotonically increasing ID - Timestamp-based ID is not resistant to clock drift

    - Newer IDs use an older timestamp due to clock drift. Received ID from MIG (over time): 370481812746797057 370481829322686465 370481821655498753 370481838399160321 370481839170912257 370483746157363201 370483746144780289
  35. Case 1: Not monotonically increasing ID - Timestamp-based ID is not resistant to clock drift

    - Solution: 1. Keep the timestamp monotonically increasing 2. Correct the timestamp gradually. Diagram (Timestamp vs. Time): clock drift happens; with slew mode, the timestamp keeps increasing monotonically and is corrected gradually
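The first part of the solution, keeping the timestamp from going backwards, can be sketched in application code as below. This is one possible guard under stated assumptions, not the method from the talk: `MonotonicTimestamp` is a hypothetical wrapper that holds the last value whenever the underlying clock steps back. The gradual correction (slew mode) is handled by the clock synchronization layer and is not modelled here.

```python
class MonotonicTimestamp:
    """Hypothetical wrapper that never returns a smaller value than before."""

    def __init__(self, clock_ms):
        self._clock_ms = clock_ms
        self._last = None

    def now(self):
        current = self._clock_ms()
        if self._last is not None and current < self._last:
            # The clock stepped backwards: hold the last value until it catches up.
            current = self._last
        self._last = current
        return current


# Simulated clock readings with a backwards step after 1010.
readings = iter([1000, 1010, 1005, 1007, 1020])
clock = MonotonicTimestamp(lambda: next(readings))
observed = [clock.now() for _ in range(5)]
print(observed)  # [1000, 1010, 1010, 1010, 1020]
```

While the timestamp is held, a generator would have to rely on its sequence field to keep IDs distinct and increasing.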
  36. Case 2: Message lost/dup at Redis storage - Messaging platform

    uses Redis clusters as storage - Relies heavily on Lua scripts as stored procedures. Diagram: MIG; Redis Clusters (Master/Replica pairs); HBase Clusters … …
  37. Case 2: Message lost/dup at Redis storage - Message lost/dup

    were found during testing
    > GET userId:123:last-msg-id
    36646745286099999
    > GET 36646745286099999
    "message-object"
    > EVAL "local lastId = redis.call('GET', 'userId:123:last-msg-id'); return redis.call('GET', tonumber(lastId))" 0
  38. Case 2: Message lost/dup at Redis storage - Message lost/dup

    were found during testing
    > GET userId:123:last-msg-id
    36646745286099999
    > GET 36646745286099999
    "message-object"
    > EVAL "local lastId = redis.call('GET', 'userId:123:last-msg-id'); return redis.call('GET', tonumber(lastId))" 0
    (nil)
    ??? Expected to return message-object but returned nil
  39. Case 2: Message lost/dup at Redis storage - Message lost/dup

    were found during testing
    > EVAL "return 36646745286099999" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100000" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100001" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100002" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100003" 0
    (integer) 36646745286100000
  40. Case 2: Message lost/dup at Redis storage - Message lost/dup

    were found during testing
    > EVAL "return 36646745286099999" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100000" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100001" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100002" 0
    (integer) 36646745286100000
    > EVAL "return 36646745286100003" 0
    (integer) 36646745286100000
    Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
    > print(36646745286099999)
    3.66467452861e+16
    In Lua, all IDs are represented as Number, which can represent an integer exactly only if -2^53 < n < 2^53
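The rounding shown on this slide can be reproduced outside Redis. The sketch below assumes that a Lua 5.1 Number is an IEEE 754 double, which Python's `float` also is, so a round trip through `float` shows which IDs survive; the helper names `as_lua_number` and `is_exact` are illustrative.

```python
def as_lua_number(n):
    """Round-trip an integer through a 64-bit float, as a Lua 5.1 Number stores it."""
    return int(float(n))


def is_exact(n):
    """True if the integer survives conversion to a double unchanged."""
    return as_lua_number(n) == n


ids = [36646745286099999 + i for i in range(5)]
for message_id in ids:
    # All five neighbouring IDs collapse onto the same double.
    print(message_id, "->", as_lua_number(message_id))

# Lua 5.1 prints numbers with the "%.14g" format, matching the slide's output.
print("%.14g" % float(ids[0]))
```

At this magnitude (between 2^55 and 2^56) adjacent doubles are 8 apart, so `tonumber(lastId)` can turn a valid key into a neighbouring one. A common workaround, not stated in the slides, is to keep IDs as strings inside Lua scripts instead of converting them to Number.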
  41. Wrap up of this talk - Made MIG - Timestamp-based ID generator - Provides fast, scalable ID generation

    - Migrate without breaking changes - Test and evaluate client & server very carefully - Understand technology internals to notice hidden bugs