> Ensure event delivery with robustness and resiliency
More flexibility
[Diagram: A blocks UserB, yet a message from UserB arrives ("Blocked UserB though?"); an unread badge of 1 is shown ("No unread message though?")]
issues to prevent the future growth
LINT TF was born against big technical debts
A TaskForce across multi-teams
> Perform systematically toward common goal
[Diagram: Technical PM, iOS, Android, Desktop, Chrome, front-end server, back-end server, auth server, storage server, QA]
LINE core process is Operation delivery
OP for Messaging
> sender
> receivers
OP for LINE Group
> request user
> members of the group
[Diagram: sendMessage produces SEND_MSG OP and RECV_MSG OP; mark-as-read produces SEND_CHAT_CHECKED OP and NOTIFIED_READ_MSG OP]
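The Operation delivery idea on this slide can be sketched roughly as follows. This is a minimal in-memory illustration, not LINE's implementation: the `OperationLog` class, the per-user revision counter, and the choice of which user receives which OP (SEND_MSG to the sender, RECV_MSG to each receiver, SEND_CHAT_CHECKED to the reader, NOTIFIED_READ_MSG to the original sender) are assumptions read off the OP names.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Operation:
    revision: int  # per-user, monotonically increasing (assumption)
    op_type: str   # e.g. "SEND_MSG", "RECV_MSG"
    payload: str


class OperationLog:
    """Hypothetical in-memory stand-in for a per-user Operation queue."""

    def __init__(self):
        self._ops = defaultdict(list)
        self._counters = defaultdict(lambda: count(1))

    def append(self, user, op_type, payload):
        op = Operation(next(self._counters[user]), op_type, payload)
        self._ops[user].append(op)
        return op

    def fetch(self, user, after_revision):
        # Return every Operation the user has not seen yet.
        return [op for op in self._ops[user] if op.revision > after_revision]


def send_message(log, sender, receivers, text):
    # One OP for the sender, one OP fanned out to each receiver.
    log.append(sender, "SEND_MSG", text)
    for receiver in receivers:
        log.append(receiver, "RECV_MSG", text)


def mark_as_read(log, reader, sender, message_id):
    # The reader records the check; the original sender is notified.
    log.append(reader, "SEND_CHAT_CHECKED", message_id)
    log.append(sender, "NOTIFIED_READ_MSG", message_id)
```

A client that last saw revision `n` would call `log.fetch(user, n)` to catch up, which is also why a long-inactive client has to replay many Operations in order.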
cache
LINE LEGY
In-house reverse proxy since 2012
> SPDY based protocol for multiplexing
> Request routing and Connection/Session management
> Protocol conversion between SPDY and HTTP
Own optimization for better messaging UX
• with OS native features like MPTCP, TLS 1.3, metrics
LEGY: Shift to HTTP/2 standard
Outdated SPDY
> No standard client library
> Complex own optimization without full documentation
In-house maintenance per kinds of devices
> Enable to replace own optimization with standard
• LEGY header cache => Header compression in HTTP/2
• LEGY encryption => 0-RTT by TLS 1.3
> No way to see the inconsistency
> Must fetch Operations sequentially
> Poorly cost-effective storage management: 226TB, +3TB/month, 0.005% usage
Inefficient way for inactive Apps
Not robust against partial data loss
Needs complex workaround
[Diagram: talk-server; JOIN_GROUP OP, RECV_MSG OP; load]
latest snapshot per each Categories
Store mutation in 2 kind of ways
[Diagram: App calls sendMessage, addFriend, createGroup, …; Operation Storage is read through the fetchOperations API; MessageBox Storage, UserSettings Storage, SocialGraph Storage, Group Storage, … are read through Snapshot APIs]
Utilize more
IO/network costs
sync() API
Cover fetchOperations and FullSync
FullSync (batch)
> Efficient sync for inactive clients
> On-demand partial sync
[Diagram: client makes an API call to talk-server with local revision; conditions]
to given server revision
[Diagram: Client and talk-server. The client revision is bumped up, rev = 100 => 100,100 (given server revision = 100,100); the server revision is rev = 100,200]
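One way to read the sync() slides is that a single call either returns the pending Operations or, when the client is too far behind, a snapshot plus a revision to jump to. The sketch below follows that reading. The `SyncResponse` type, the `FULL_SYNC_GAP` threshold, and the `snapshot_revision` parameter are hypothetical names introduced for illustration, and the revision numbers only mirror the sample values in the diagram.

```python
from dataclasses import dataclass, field

FULL_SYNC_GAP = 1000  # hypothetical threshold; the slides give no number


@dataclass
class SyncResponse:
    mode: str           # "operations" or "full_sync"
    next_revision: int  # revision the client should store afterwards
    operations: list = field(default_factory=list)
    snapshot: dict = field(default_factory=dict)


def sync(client_revision, server_ops, snapshot, snapshot_revision,
         gap=FULL_SYNC_GAP):
    """server_ops is a list of (revision, op_type) sorted by revision."""
    server_revision = server_ops[-1][0] if server_ops else client_revision
    if server_revision - client_revision > gap:
        # Too far behind: send the snapshot and bump the client revision
        # to the revision the snapshot corresponds to.
        return SyncResponse("full_sync", snapshot_revision,
                            snapshot=dict(snapshot))
    # Close enough: behave like fetchOperations.
    pending = [(rev, op) for rev, op in server_ops if rev > client_revision]
    return SyncResponse("operations", server_revision, operations=pending)
```

With a client at revision 100, a snapshot taken at 100,100, and a server at 100,200, the first call returns a full sync and the second call returns the 100 remaining Operations.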
Hard to maintain 100% consistency at LINE scale
> Unexpected mistakes and code bugs can always occur anywhere
Put Availability before Consistency
Resiliency enhancement
> ASIS: Ad hoc recovery after CS (customer support) reports come in
> TOBE: Repair mechanism to satisfy eventual consistency
[Diagram: talk-server]
between client and server
Dynamic period control based on data granularity & load
Tier1: O(1) data like Profile, Settings; data load Small; cycle per 1 day
Tier2: Digest of O(N) data like friends/groups; data load Medium; cycle per 1 week
Tier3: Digest of O(NxM) data like number of members per group; data load Large; cycle per 2 weeks
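The tiered cycle can be expressed as a small schedule check. This is only a sketch: the periods come from the slide, while the `load_factor` knob is an assumed way to model the "dynamic" part (stretching the period when load is high), which the slide does not spell out.

```python
from datetime import datetime, timedelta

# Periods taken from the slide; the tier keys are hypothetical names.
REPAIR_PERIODS = {
    "tier1": timedelta(days=1),   # O(1) data: Profile, Settings
    "tier2": timedelta(weeks=1),  # digest of O(N) data: friends/groups
    "tier3": timedelta(weeks=2),  # digest of O(NxM) data: members per group
}


def is_repair_due(tier, last_checked, now, load_factor=1.0):
    """Return True when the tier's repair check should run again.

    load_factor > 1.0 stretches the period (assumed behaviour under load).
    """
    period = REPAIR_PERIODS[tier] * load_factor
    return now - last_checked >= period
```

For example, a client last checked on 1 January is due for a Tier1 check on 2 January, but not yet for a Tier2 check.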
1. Call getRepairElements API weekly with local state
- numFriends = 2
- numBlocked = 0
- numRecommendation = 1
2. Compare with server state
- numFriends = 2
- numBlocked = 1
- numRecommendation = 0
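The comparison in step 2 amounts to diffing two small digests. The helper below is a hypothetical illustration of that step using the counts from the slide; it does not reflect the real getRepairElements request or response shape, which the slide does not show.

```python
def find_mismatches(local, server):
    """Return {field: (local_value, server_value)} for every differing field."""
    keys = sorted(set(local) | set(server))
    return {
        key: (local.get(key), server.get(key))
        for key in keys
        if local.get(key) != server.get(key)
    }


local_state = {"numFriends": 2, "numBlocked": 0, "numRecommendation": 1}
server_state = {"numFriends": 2, "numBlocked": 1, "numRecommendation": 0}

# Fields that differ are the candidates for a targeted repair.
mismatches = find_mismatches(local_state, server_state)
```

Here `mismatches` contains `numBlocked` and `numRecommendation`, so only the block list and recommendations would need to be re-synced, not the friend list.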
against technical debts to be resolved for platform under approx. 200 million users
> LINT is an organization/project of multiple teams to empower the future messaging platform
> Support multi-accounts features
> Support multi-devices features
> Bi-directional social-graph model
> Social-graph redesign to support more features
> More flexible Message metadata
> Idempotent event delivery
> Migration to async/non-blocking processing
> Release various system limitations
> Flexible fan-out/fan-in async mechanism
> Make monolithic talk-server MSA
> Multi-iDC aware 0-downtime reliable data store
> Multi-iDC aware messaging protocol renewal
> Bot broadcast/multi-cast architecture renewal
> and more technical challenges…
to manage unused Auth Token
> No way to manage multiple accounts / devices efficiently
Objective
> Enable to invalidate inactive/abnormal accounts’ Auth token
> Enable to renew Auth token for inactive accounts securely
[Diagram: a sleeping client ("Zzz…", "wake up") and the Auth server; labels: "don’t know token usage..", "Can renew token"]
of client local data that are required for multiple devices/accounts feature
> Server settings on legacy in-memory store on Redis cluster (space bounded)
> No proper storage to maintain such data flexibly
Objective
> Flexible setting storage & server to store local/server data per accounts/devices as an isolated Microservice
> Enable to utilize client/server integrated data via pipelining
> Enable to analyze data across client/server on Data platform
[Diagram: local theme, per-chats pin, options for A/B test, etc.; talk-server; Redis, Redis, Redis]