Slide 1

Slide 1 text

No content

Slide 2

Slide 2 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend

Slide 3

Slide 3 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend, with a "Hello!" message

Slide 4

Slide 4 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend, with a "Hello!" message
1. send message

Slide 5

Slide 5 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend, with a "Hello!" message
1. send message
2. recv message

Slide 6

Slide 6 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" now appears on both the sender's and the recipient's side
1. send message
2. recv message

Slide 7

Slide 7 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides
1. send message
2. recv message
3. read message

Slide 8

Slide 8 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides
1. send message
2. recv message
3. read message
4. notify message checked

Slide 9

Slide 9 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides, marked "Read"
1. send message
2. recv message
3. read message
4. notify message checked

Slide 10

Slide 10 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides, marked "Read"; reply "How are you?"
1. send message
2. recv message
3. read message
4. notify message checked

Slide 11

Slide 11 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides, marked "Read"; reply "How are you?"
1. send message
2. recv message
3. read message
4. notify message checked
1'. send message
2'. recv message

Slide 12

Slide 12 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides, marked "Read"; reply "How are you?" on both sides
1. send message
2. recv message
3. read message
4. notify message checked
1'. send message
2'. recv message

Slide 13

Slide 13 text

LINE Messaging Application
- Around 4 billion messages sent every day
- Approx. 200 million active users

Diagram: Messaging Application Backend; "Hello!" on both sides, marked "Read"; reply "How are you?" on both sides; further messages (… … … …)
1. send message
2. recv message
3. read message
4. notify message checked
1'. send message
2'. recv message

Slide 14

Slide 14 text

Message ID
- Each message is identified by its Message ID, which is:
  - Globally unique
  - Monotonically increasing
  - A 64-bit integer

Diagram: Messaging Application Backend with the message flow (1. send message, 2. recv message, 3. read message, 4. notify message checked, 1'. send message, 2'. recv message; "Hello!" marked "Read", "How are you?", …), annotated "Assign a Message ID to the message"

Slide 15

Slide 15 text

Generate Message ID
- Since we already run many Redis clusters internally, we use Redis INCRBY to implement this ID generator
- Requirements
  - Fast
  - Simple
  - Low maintenance cost
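A minimal Python sketch of this generator, assuming the redis-py client, whose `Redis.incrby(name, amount)` returns the counter value after the increment. The key name `message-id` and the `InMemoryCounter` stand-in are hypothetical and exist only so the example runs without a server; with a real deployment you would pass a `redis.Redis` instance instead.

```python
from typing import Protocol


class IncrByClient(Protocol):
    """Anything exposing Redis INCRBY semantics, e.g. redis.Redis from redis-py."""

    def incrby(self, name: str, amount: int = 1) -> int: ...


class InMemoryCounter:
    """Hypothetical stand-in for a Redis server: INCRBY adds and returns the new value."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}

    def incrby(self, name: str, amount: int = 1) -> int:
        self.data[name] = self.data.get(name, 0) + amount
        return self.data[name]


def next_message_id(client: IncrByClient, key: str = "message-id") -> int:
    # One atomic INCRBY on the single Redis master: the reply is the new
    # counter value, so every caller gets a distinct, increasing ID.
    return client.incrby(key, 1)


counter = InMemoryCounter()
ids = [next_message_id(counter) for _ in range(3)]
print(ids)  # [1, 2, 3]
```

Every call goes to one counter on one Redis master. That is what makes the design simple and fast, and also what ties all ID generation to the health of that one master.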

Slide 16

Slide 16 text

Generate Message ID using Redis
- Single master-replica setup
- But… does this setup have enough reliability, availability, and scalability?

Slide 17

Slide 17 text

Case 1: Sudden death of Redis hosts
- Let's consider what happens if the Redis master dies suddenly
- How do we recover from this situation? Fail over to the replica?
  - Was the last generated ID replicated to the replica correctly?
  - If not, the new master may generate duplicate IDs
- Users can't resume their conversations until we fix the Redis hosts
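To make the duplicate-ID risk concrete, here is a toy Python model (hypothetical, not LINE's code) of a counter with asynchronous replication, which is Redis's default. The replica only copies the master's counter when the illustrative `replicate()` function is called, so IDs issued after the last copy are unknown to it and get issued again once it is promoted.

```python
class Node:
    """Toy Redis node holding a single counter."""

    def __init__(self) -> None:
        self.counter = 0

    def incrby(self, amount: int = 1) -> int:
        self.counter += amount
        return self.counter


master, replica = Node(), Node()


def replicate() -> None:
    # Asynchronous replication: the replica catches up only occasionally.
    replica.counter = master.counter


issued = [master.incrby() for _ in range(3)]    # IDs 1, 2, 3
replicate()                                     # replica now at 3
issued += [master.incrby() for _ in range(2)]   # IDs 4, 5 not yet replicated

# The master dies suddenly; the replica is promoted and keeps generating IDs.
new_master = replica
issued += [new_master.incrby() for _ in range(2)]  # reissues 4, 5

duplicates = sorted({i for i in issued if issued.count(i) > 1})
print(issued)      # [1, 2, 3, 4, 5, 4, 5]
print(duplicates)  # [4, 5]
```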

Slide 18

Slide 18 text

Case 2: Network partition
- What if the network between the backends and Redis is partitioned?
- Can we wait until someone fixes the network partition?
- Can we block users' conversations in the meantime…?

Slide 19

Slide 19 text

Then, how do we fix it?
- Requirements
  - Globally unique, monotonically increasing IDs
    - Existing messaging features rely on this, e.g. message order
  - Scalable
  - No breaking changes at the storage/API level
    - Message IDs are already exposed to client apps, so we cannot change this

Slide 20

Slide 20 text

Globally unique, monotonically increasing IDs at high scale
- Twitter Snowflake
  - Timestamp-based ID format
  - Monotonically increasing 64-bit integer IDs
  - Scalable without sharing state
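For reference, Snowflake's publicly described layout is one unused sign bit, a 41-bit millisecond timestamp relative to a custom epoch, a 10-bit worker (machine) ID, and a 12-bit per-millisecond sequence. A minimal Python sketch of packing and unpacking that layout; `pack` and `unpack` are illustrative names, not Twitter's API.

```python
TIMESTAMP_BITS = 41  # milliseconds since a custom epoch
WORKER_BITS = 10     # machine / worker ID
SEQUENCE_BITS = 12   # counter within one millisecond on one worker


def pack(timestamp_ms: int, worker_id: int, sequence: int) -> int:
    """Compose a Snowflake-style ID; the top (sign) bit stays 0."""
    if not 0 <= timestamp_ms < 1 << TIMESTAMP_BITS:
        raise ValueError("timestamp out of range")
    if not 0 <= worker_id < 1 << WORKER_BITS:
        raise ValueError("worker_id out of range")
    if not 0 <= sequence < 1 << SEQUENCE_BITS:
        raise ValueError("sequence out of range")
    return (timestamp_ms << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def unpack(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (timestamp_ms, worker_id, sequence)."""
    sequence = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake_id >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    timestamp_ms = snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)
    return timestamp_ms, worker_id, sequence


# A later millisecond always wins, whichever worker produced the ID.
earlier = pack(1_000, worker_id=1023, sequence=4095)
later = pack(1_001, worker_id=0, sequence=0)
print(earlier < later)  # True
print(unpack(earlier))  # (1000, 1023, 4095)
```

Because the timestamp sits in the high bits, IDs from different workers sort by generation time at millisecond granularity without shared state; within the same millisecond, the order between workers is decided by worker ID rather than by real time.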

Slide 21

Slide 21 text

Our Approach: Message ID Generator (MIG)
- Reduce timestamp resolution from 1 msec to 10 msec
  - This format can be used until 2188/12/24
- More sequence bits (12 bits → 18 bits)
  - Prepare for big message bursts
- 64 workers
- 262,144 sequence numbers (2^18)
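The numbers on this slide can be checked with a little arithmetic. The talk states 64 workers (6 bits) and 18 sequence bits; assuming, as in Snowflake, one unused sign bit with the timestamp in the remaining bits, that leaves 39 timestamp bits counting 10 ms ticks. The Python sketch below treats every field width except those two as an assumption. It shows the format lasts about 174 years, versus about 70 for Snowflake's 41-bit millisecond timestamp, which would be consistent with the 2188/12/24 limit if the epoch is around 2014 (the epoch is not stated in the talk).

```python
SIGN_BITS = 1           # assumed, as in Snowflake
WORKER_BITS = 6         # 2**6 = 64 workers
SEQUENCE_BITS = 18      # 2**18 = 262,144 sequence numbers per worker per tick
TIMESTAMP_BITS = 64 - SIGN_BITS - WORKER_BITS - SEQUENCE_BITS  # whatever remains
TICKS_PER_SECOND = 100  # 10 ms resolution

SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # mean Gregorian year

mig_years = (1 << TIMESTAMP_BITS) / TICKS_PER_SECOND / SECONDS_PER_YEAR
snowflake_years = (1 << 41) / 1000 / SECONDS_PER_YEAR  # 41-bit timestamp, 1 ms ticks
peak_ids_per_second = (1 << WORKER_BITS) * (1 << SEQUENCE_BITS) * TICKS_PER_SECOND

print(TIMESTAMP_BITS)             # 39
print(round(mig_years, 1))        # 174.2
print(round(snowflake_years, 1))  # 69.7
print(peak_ids_per_second)        # 1677721600
```

`peak_ids_per_second` is only a theoretical ceiling for a full fleet of 64 workers; it illustrates the burst headroom the extra sequence bits buy, not measured throughput.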

Slide 22

Slide 22 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with two MIG instances (workerId:0, workerId:1) issuing MessageId values

Slide 23

Slide 23 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with two MIG instances (workerId:0, workerId:1) issuing MessageId values:
- 369341229940604929
- 369341242657734657
- 369341264921100289
- 369341276933586945

Slide 25

Slide 25 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with two MIG instances (workerId:0, workerId:1) issuing MessageId values:
- 369341229940604929
- 369341242657734657
- 369341264921100289
- 369341276933586945
- 369342083984785409

Slide 26

Slide 26 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with three MIG instances (workerId:0, workerId:1, workerId:2) issuing MessageId values:
- 369341229940604929
- 369341242657734657
- 369341264921100289
- 369341276933586945
- 369342083984785409

Slide 27

Slide 27 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with three MIG instances (workerId:0, workerId:1, workerId:2) issuing MessageId values:
- 369341229940604929
- 369341242657734657
- 369341264921100289
- 369341276933586945
- 369342083984785409
- 369342118327746561

Slide 29

Slide 29 text

Our Approach: Message ID Generator (MIG)

Diagram: iDC 1 with three MIG instances (workerId:0, workerId:1, workerId:2) issuing MessageId values:
- 369341229940604929
- 369341242657734657
- 369341264921100289
- 369341276933586945
- 369342083984785409
- 369342118327746561
- 369342149328531952

Slide 30

Slide 30 text

Our Approach: Message ID Generator (MIG)

Diagram: three MIG instances (workerId:0, workerId:1, workerId:2)

Slide 31

Slide 31 text

Our Approach: Message ID Generator (MIG)

Diagram: Datacenter 1; MIG instances workerId:0, workerId:1, workerId:2

Slide 33

Slide 33 text

Our Approach: Message ID Generator (MIG)

Diagram: Datacenter 1; MIG instances workerId:0, workerId:1, workerId:2, workerId:3

Slide 34

Slide 34 text

Our Approach: Message ID Generator (MIG)

Diagram: Datacenter 1 and Datacenter 3; MIG instances workerId:0, workerId:1, workerId:2, workerId:3

Slide 35

Slide 35 text

Our Approach: Message ID Generator (MIG)

Diagram: Datacenter 1 and Datacenter 3; MIG instances workerId:0 to workerId:3, plus workerId:4, workerId:5, workerId:6, workerId:7

Slide 37

Slide 37 text

Our Approach: Message ID Generator (MIG)

Diagram: Datacenter 1 and Datacenter 3; MIG instances workerId:0 to workerId:7, plus workerId:8 and workerId:9
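Because MIG workers share no state, the one thing that must be coordinated is that no two running instances use the same workerId. Here is a small Python sketch of checking a static assignment. It assumes 64 workers (workerId 0 to 63), as on the MIG format slide. The datacenter keys and host names are hypothetical, and the talk does not say how LINE actually assigns worker IDs.

```python
MAX_WORKERS = 64  # workerId has to fit in 6 bits: 0..63

# Hypothetical host names; the workerId split mirrors the slides
# (Datacenter 1: 0-3, Datacenter 3: 4-7, plus two more instances: 8, 9).
assignment = {
    "datacenter-1": {"mig-1a": 0, "mig-1b": 1, "mig-1c": 2, "mig-1d": 3},
    "datacenter-3": {"mig-3a": 4, "mig-3b": 5, "mig-3c": 6, "mig-3d": 7},
    "unlabeled": {"mig-xa": 8, "mig-xb": 9},
}


def validate(assignment: dict[str, dict[str, int]]) -> list[str]:
    """Return problems found; an empty list means every workerId is unique and in range."""
    problems: list[str] = []
    owner: dict[int, str] = {}
    for datacenter, hosts in assignment.items():
        for host, worker_id in hosts.items():
            name = f"{datacenter}/{host}"
            if not 0 <= worker_id < MAX_WORKERS:
                problems.append(f"{name}: workerId {worker_id} is out of range")
            elif worker_id in owner:
                problems.append(f"{name}: workerId {worker_id} is already used by {owner[worker_id]}")
            else:
                owner[worker_id] = name
    return problems


print(validate(assignment))  # []

clash = {"datacenter-1": {"mig-1a": 0}, "datacenter-3": {"mig-3a": 0, "mig-3b": 64}}
for problem in validate(clash):
    print(problem)
```

Adding capacity in another datacenter then only means picking unused workerIds, with no shared counter between sites.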

Slide 38

Slide 38 text

Our Approach: Message ID Generator (MIG)
- Does everything work fine with MIG?
- No!! We found critical issues:
  - Case 1. Not monotonically increasing IDs
  - Case 2. Duplicate message IDs

Slide 39

Slide 39 text

Case 1: Not monotonically increasing ID
- Timestamp-based IDs are not resistant to clock drift

Diagram: Received ID from MIG, over time:
- 370481812746797057
- 370481829322686465
- 370481821655498753
- 370481838399160321
- 370481839170912257
- 370483746157363201
- 370483746144780289

Callout: a newer ID uses an older timestamp due to clock drift
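Here is a toy Python reproduction. It assumes a hypothetical [timestamp | workerId | sequence] layout with 6 worker bits and 18 sequence bits, plus a wall clock that is pulled back by one tick. A generator that trusts the clock as-is emits a smaller ID after the correction, and here it even repeats an earlier one. The hypothetical `regressions` helper also flags the two out-of-order IDs in the list on this slide.

```python
WORKER_BITS, SEQUENCE_BITS = 6, 18  # assumed layout: [timestamp | workerId | sequence]


def naive_id(tick: int, worker_id: int, sequence: int) -> int:
    # Trusts the wall-clock tick as-is; nothing stops it from going backward.
    return (tick << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def regressions(ids: list[int]) -> list[int]:
    """Indices whose ID is smaller than the ID received just before it."""
    return [i for i in range(1, len(ids)) if ids[i] < ids[i - 1]]


# The wall clock reads tick 100, then 101, then is pulled back to 100.
generated = [naive_id(tick, worker_id=0, sequence=0) for tick in (100, 101, 100)]
print(regressions(generated))        # [2]
print(generated[2] == generated[0])  # True: same tick and sequence, so the ID repeats

# IDs received from MIG, as listed on the slide.
received = [
    370481812746797057, 370481829322686465, 370481821655498753,
    370481838399160321, 370481839170912257, 370483746157363201,
    370483746144780289,
]
print(regressions(received))  # [2, 6]
```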

Slide 40

Slide 40 text

Case 1: Not monotonically increasing ID
- Timestamp-based IDs are not resistant to clock drift
- Solution:
  1. Keep the timestamp monotonically increasing
  2. Correct the timestamp gradually

Diagram (timestamp vs. time): clock drift happens; with slew mode, the timestamp keeps increasing monotonically while it is corrected gradually
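Here is a minimal Python sketch of the first point (keep the timestamp monotonic). It assumes a hypothetical [timestamp | workerId | sequence] layout with 6 worker bits and 18 sequence bits, and an injectable clock so the behaviour can be tested. It is not the MIG source. If the clock reports a tick earlier than the last one used, the generator stays on the last tick and advances the sequence. If the sequence runs out, it moves on to the next tick.

```python
import threading
from typing import Callable

WORKER_BITS, SEQUENCE_BITS = 6, 18  # assumed layout: [timestamp | workerId | sequence]
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class MonotonicIdGenerator:
    def __init__(self, worker_id: int, clock: Callable[[], int]) -> None:
        if not 0 <= worker_id < 1 << WORKER_BITS:
            raise ValueError("worker_id must fit in 6 bits")
        self.worker_id = worker_id
        self.clock = clock  # returns the current tick, e.g. in 10 ms units
        self.last_tick = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            # Never move backward: if the clock is behind, stay on the last tick.
            tick = max(self.clock(), self.last_tick)
            if tick == self.last_tick:
                self.sequence += 1
                if self.sequence > MAX_SEQUENCE:
                    # Sequence exhausted on this tick: move on to the next one.
                    tick += 1
                    self.sequence = 0
            else:
                self.sequence = 0
            self.last_tick = tick
            return (tick << (WORKER_BITS + SEQUENCE_BITS)) | (self.worker_id << SEQUENCE_BITS) | self.sequence


# A fake clock that jumps back from tick 101 to 99 before recovering.
readings = iter([100, 101, 99, 99, 102])
gen = MonotonicIdGenerator(worker_id=1, clock=lambda: next(readings))
ids = [gen.next_id() for _ in range(5)]
print(all(a < b for a, b in zip(ids, ids[1:])))  # True
```

Moving on to the next tick when the sequence runs out lets the generator run slightly ahead of the clock; waiting for the clock to reach the next tick is the other common choice. Point 2 (gradual correction with slew mode) is handled by the host's time synchronization rather than by this code.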

Slide 41

Slide 41 text

Case 2: Message lost/dup at Redis storage
- The messaging platform uses Redis clusters as storage
- It relies heavily on Lua scripts as stored procedures

Diagram: MIG, Redis clusters (master/replica pairs), and HBase clusters

Slide 42

Slide 42 text

Case 2: Message lost/dup at Redis storage
- Message loss/duplication was found during testing

> GET userId:123:last-msg-id
36646745286099999
> GET 36646745286099999
"message-object"
> EVAL "local lastId = redis.call('GET', 'userId:123:last-msg-id'); return redis.call('GET', tonumber(lastId))" 0

Slide 43

Slide 43 text

Case 2: Message lost/dup at Redis storage
- Message loss/duplication was found during testing

> GET userId:123:last-msg-id
36646745286099999
> GET 36646745286099999
"message-object"
> EVAL "local lastId = redis.call('GET', 'userId:123:last-msg-id'); return redis.call('GET', tonumber(lastId))" 0
(nil)

??? Expected to return "message-object" but returned nil

Slide 44

Slide 44 text

Case 2: Message lost/dup at Redis storage
- Message loss/duplication was found during testing

> EVAL "return 36646745286099999" 0
(integer) 36646745286100000
> EVAL "return 36646745286100000" 0
(integer) 36646745286100000
> EVAL "return 36646745286100001" 0
(integer) 36646745286100000
> EVAL "return 36646745286100002" 0
(integer) 36646745286100000
> EVAL "return 36646745286100003" 0
(integer) 36646745286100000

Slide 45

Slide 45 text

Case 2: Message lost/dup at Redis storage
- Message loss/duplication was found during testing

> EVAL "return 36646745286099999" 0
(integer) 36646745286100000
> EVAL "return 36646745286100000" 0
(integer) 36646745286100000
> EVAL "return 36646745286100001" 0
(integer) 36646745286100000
> EVAL "return 36646745286100002" 0
(integer) 36646745286100000
> EVAL "return 36646745286100003" 0
(integer) 36646745286100000

Lua 5.1.5 Copyright (C) 1994-2012 Lua.org, PUC-Rio
> print(36646745286099999)
3.66467452861e+16

- In Lua, all IDs are represented as Number (a double-precision float), which can represent integers exactly only if -2^53 < n < 2^53
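The same rounding can be reproduced outside Redis. Python floats are IEEE 754 doubles, the representation Lua 5.1 uses for numbers by default, so converting the IDs to `float` shows exactly what the script saw. One way to avoid the bug, suggested here rather than taken from the talk, is to keep IDs as strings inside Lua scripts and never pass them through `tonumber`.

```python
ids = [
    36646745286099999,
    36646745286100000,
    36646745286100001,
    36646745286100002,
    36646745286100003,
]

# A double has a 53-bit significand. Between 2**55 and 2**56 adjacent doubles
# are 8 apart, so each of these IDs rounds to the nearest multiple of 8.
as_double = [int(float(i)) for i in ids]
print(as_double)  # 36646745286100000 five times, matching the EVAL results

# Lua 5.1 prints numbers with the "%.14g" format.
print("%.14g" % float(ids[0]))  # 3.66467452861e+16

# 2**53 + 1 is the smallest positive integer a double cannot represent.
print(float(2**53 + 1) == float(2**53))  # True
```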

Slide 46

Slide 46 text

Wrap-up of this talk
- Migrate without breaking changes
- Test and evaluate client & server very carefully
- Understand technology internals to notice hidden bugs
- Made MIG
  - Timestamp-based ID generator
  - Provides fast, scalable ID generation