
A new scalable social relation for LINE users : `Follow`

LINE DevDay 2020

November 19, 2020

Transcript

1. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
2. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
3. Why did we need a new relationship? [Diagram: relationships plotted along Distance (close → distant) and Size (small → large); Friend sits at the close / small end]
4. Why did we need a new relationship? [Same diagram: Follow is added at the distant / large end]
5. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
6. Current follow relationship statistics: 6.5B follow relationships, 8 ms avg API response time, 200,000 follow events / min. Based on the design concepts, we are currently operating the service stably.
7. Optimized structure for high read performance: HBase Data Table (edge data), HBase Index Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache).
8. Optimized structure for read performance: HBase Data Table (edge data), HBase Index Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache). Q. How about consistency between the tables / cache…?
9. Optimized structure for read performance: HBase Data Table (edge data), HBase Index Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache). Q. How about concurrency issues…?
10. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
11. Kafka with Decaton › Responsible for event delivery and for maintaining data consistency between the HBase tables and the Redis cache › Powered by Decaton, an open-source framework maintained by LINE
12. What is Decaton? › Open-source project maintained by LINE (https://github.com/line/decaton) › Streaming task processing framework built on top of Apache Kafka › Various useful features (retry queueing, concurrent processing, task compaction, …)
13. Decaton features powering follow storage › Retrying failed tasks with back-off without blocking the flow of other tasks › Concurrent processing of records consumed from one partition › Dynamic property configuration at runtime using Central Dogma
14. Decaton features powering follow storage › Retrying failed tasks with back-off without blocking the flow of other tasks › Concurrent processing of records consumed from one partition › Dynamic property configuration at runtime using Central Dogma
15. Decaton features powering follow storage › Retrying failed tasks with back-off without blocking the flow of other tasks › Concurrent processing of records consumed from one partition › Dynamic property configuration at runtime using Central Dogma
16. Decaton features powering follow storage › Retrying failed tasks with back-off without blocking the flow of other tasks › Concurrent processing of records consumed from one partition › Dynamic property configuration at runtime using Central Dogma
17. Failure logging producer › There can be various failures while producing tasks (network issues, temporarily increased latency, …) › Even after Kafka's default retry attempts are exhausted, the producer serializes the task to the local file system and a daemon retries it automatically › Guarantees that the events are eventually produced
18. LINE IMF Kafka Cluster: reliable in-house Kafka cluster (LINE DevDay 2019 - Reliability Engineering Behind The Most Trusted Kafka Platform)
19. With reliable event delivery we can maintain data consistency between the HBase tables and the Redis cache, keep real-time following / follower counts, and other components can build their own data by consuming the events.
20. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow]
21. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow]
22. What if it fails in the middle? We can guarantee eventual consistency through reliable event delivery. [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow]
23. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow]
24. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow + Retry Event]
25. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow + Retry Event consumed by a Decaton Processor]
26. What if it fails in the middle? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Follow + the Decaton Processor retries the remaining flow]
27. What if the user attempts a new action during the retry? The next request (Unfollow again) may be processed faster than the previous retry event. [Diagram: HBase Data Table, HBase Index Table, Redis Cache, previous Retry Event, Decaton Processor]
28. What if the user attempts a new action during the retry? The next request may be processed faster than the previous retry event. Overwriting stale data?! [Diagram: HBase Data Table, HBase Index Table, Redis Cache, previous Retry Event, Decaton Processor]
29. What if the user makes a new action during the retry? The next request has been processed faster: the HBase table already holds [action: unfollow, timestamp: 2].
30. What if the user makes a new action during the retry? HBase sorts the versions of a cell from newest to oldest by timestamp, regardless of the put order. After the previous retry event is processed, the cell holds [unfollow, timestamp 2] and [follow, timestamp 1], so reads still return the newest action, unfollow.
31. How about the Redis cache? Set the cache only when the task is processed without retry. [Diagram: HBase Data Table, HBase Index Table, Redis Cache]
32. How about the Redis cache? [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Retry Event, Decaton Processor]
33. How about the Redis cache? On the retry path, invalidate the cache only (the cache will be set by the next read). [Diagram: HBase Data Table, HBase Index Table, Redis Cache, Retry Event, Decaton Processor]
34. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
35. Optimized structure for read performance: HBase Data Table (edge data), HBase Index Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache). Basic strategy: increment / decrement the count when follow actions occur.
36. Concurrency problem: what if one influencer gets very many followers at once? The influencer's follower count can become a hotspot.
37. Simple approach? If each request locks the influencer's count and increments it sequentially (lock & +1 while the others wait), the performance would not be scalable.
38. How about also using Kafka & Decaton here? [Diagram: +1 / -1 count events flowing through a Kafka partition to a Decaton Processor]
39. How about also using Kafka & Decaton here? (Kafka's characteristics)
40. How about also using Kafka & Decaton here? (Kafka's characteristics) 1. All messages with the same key will go to the same partition.
41. How about also using Kafka & Decaton here? (Kafka's characteristics) 1. All messages with the same key will go to the same partition. 2. Each Kafka partition will be assigned to exactly one consumer.
42. How about also using Kafka & Decaton here? = The influencer's follower-count events can be delivered to a dedicated consumer one by one, in the order that they are enqueued.
43. How about also using Kafka & Decaton here? When events are delivered to a dedicated consumer one by one, in the order that they are enqueued, we can avoid the lock contention issue.
44. But is it scalable? What would happen if an influencer's follower-count events were flooding into a partition? Other users' events produced to the same partition could also be affected, right…?
45. But is it scalable? Decaton supports concurrent processing of records consumed from one partition (multithreading), and we can change the thread count dynamically at runtime according to the traffic.
46. But is it scalable? Furthermore, Decaton also supports task compaction: if tasks are flooding, we can use the task compaction feature to combine the same user's tasks within a specific time window.
47. But is it scalable? Furthermore, Decaton also supports task compaction: if tasks are flooding, we can use the task compaction feature to combine the same user's tasks within a specific time window. [Diagram: the influencer's tasks +1, -1, +1, +1, -1, +1 are compacted into a single +2 within the window]
48. How about reliability? What if the server fails to produce the event? The Kafka producer will try to produce the event again using Kafka's built-in retry logic; however, the retry count could be exhausted due to various problems such as network issues.
49. How about reliability? What if the server fails to produce the event? Even after Kafka's default retry attempts are exhausted, the Failure Logging Producer serializes the task to the local file system and a daemon retries it automatically.
50. How about reliability? What if the processor fails to process the event? Sometimes the Decaton processor may fail to process events; thanks to Decaton's `Retry Queuing`, the event is eventually processed successfully.
51. Checking inconsistency: however, it is very difficult to guarantee that the retry logic applies the data exactly once. For example, under a network issue the Decaton processor may not receive a response from the DB even though the DB processed the request successfully.
52. Checking inconsistency: thanks to HBase's timestamp behavior, putting the same record with the same timestamp again does no harm; however, incrementing the count twice can cause an unexpected inconsistency.
53. Checking inconsistency: we check for count inconsistency when the follow-list API fetches the list data. When the follow list is fetched from HBase, counts are checked for sampled targets while minimizing additional storage I/O. [Diagram: another component queries the follow list via the API; the Platform Server reads HBase / Redis, returns the follow list, and checks for inconsistency]
54. Producing repair events: we periodically produce a repair event for each inconsistent count we find. When the Decaton processor consumes a repair event, it fetches and counts the user's whole follow list from the HBase Index Table and asynchronously stores the correct count in the HBase Count Table again.
55. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
56. Generalized follow event for multiple purposes: we can generalize the event's format for multiple purposes. [Diagram: +1 / -1 events in a Kafka partition consumed by a Decaton Processor]
57. Generalized follow event for multiple purposes: not just for real-time counts, but also for other components. [Diagram: follow / unfollow events in the IMF Kafka Cluster consumed by a Decaton Processor, a Timeline Consumer, …]
58. Platform produces generalized events: when user A follows user B (Kafka key: the from user's id).
59. Platform produces generalized events: when user A follows user B, the Platform Server produces them (Kafka key: the from user's id).
60. Platform produces generalized events: when user A follows user B. Follow Event (forward): Action: follow, Direction: forward, From: user A, To: user B, produced to the Kafka partition assigned to A's userId.
61. Platform produces generalized events: when user A follows user B. Follow Event (forward): Action: follow, Direction: forward, From: user A, To: user B, Kafka partition assigned to A's userId. Follow Event (reverse): Action: follow, Direction: reverse, From: user B, To: user A, Kafka partition assigned to B's userId.
62. Consumer chooses which events to consume: the follow-count processor consumes both events. Direction: forward => increment the following count of the from user; Direction: reverse => increment the follower count of the from user. [Follow Event (forward): Action: follow, Direction: forward, From: user A, To: user B / Follow Event (reverse): Action: follow, Direction: reverse, From: user B, To: user A]
63. Consumer chooses which events to consume: the Timeline Consumer builds data for timeline feeds and so on, and ignores the reverse event depending on the service's characteristics. [Follow Event (forward): Action: follow, Direction: forward, From: user A, To: user B / Follow Event (reverse): Action: follow, Direction: reverse, From: user B, To: user A]
64. Building own data for each component's purpose. [Diagram: follow / unfollow events flow from the IMF Kafka Cluster to another component, platform components, and LINE Channel]
65. Building own data for each component's purpose: the follow-list API can be used to get a specific user's follow list directly through the LINE Channel API (with authentication). [Diagram: IMF Kafka Cluster + follow-list API, platform components, LINE Channel]
66. Building own data for each component's purpose: by combining Kafka events and on-demand API calls, components can build data for their own purposes. [Diagram: IMF Kafka Cluster + follow-list API, platform components, LINE Channel]
67. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
68. Summary: 100M+ followers · Decaton + Failure Logging Producer + IMF Kafka => reliable event delivery · Scalability · Eventual consistency
69. Summary: 100M+ followers · Decaton + Failure Logging Producer + IMF Kafka => reliable event delivery · Scalability · Eventual consistency · … · Reliability
70. Current follow relationship statistics: 6.5B follow relationships, 8 ms avg API response time, 200,000 follow events / min. Based on the design concepts, we are currently operating the service stably.
71. Agenda › Background › Design for scalability › Maintaining data consistency › Real time count › Event delivery to various components › Summary › Future improvements
72. Current design's limitations. [Diagram: follow / unfollow events from the IMF Kafka Cluster + the follow-list API, consumed by another component and platform components]
73. Future improvement: rich APIs. [Diagram: the IMF Kafka Cluster + rich APIs with various parameters, for another component and platform components]
74. Other future improvements › More flexible secondary indexes for the various requirements › Enhanced inconsistency checker covering the whole data set (currently we check inconsistencies only for sampled targets at runtime) › Improvements for future hotspots related to influencers › More features for users (blocking unwanted follow requests, search, …) › Recommending accounts to follow based on user interests › Anti-abuse › and any improvements or challenges that arise in the future…