A new scalable social relation for LINE users : `Follow`

LINE DevDay 2020

November 19, 2020

Transcript

  1. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  3. Why did we need a new relationship? [Diagram: relationship types plotted by
    distance (close → distant) and size (small → large); Friend sits at the close / small end]
  4. Why did we need a new relationship? [Same diagram, with Follow added at the distant / large end]
  5. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  6. Current Follow Relationship Statistics: 6.5B follow relationships, 8ms avg API response
    time, 200,000 follow events / min. Based on the design concepts, we are currently
    operating the service stably.
  7. Optimized structure for high read performance: HBase Data Table (edge data), HBase Index
    Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache)
  8. Optimized structure for read performance. Q. How about consistency between the tables / cache…?
  9. Optimized structure for read performance. Q. How about the concurrency issue…?
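
As a concrete illustration of the read path this structure enables, here is a minimal Java sketch using the standard HBase client and Jedis: check the Redis cache first, fall back to HBase on a miss, and repopulate the cache. The table name, row-key layout, cache key format and TTL are assumptions for illustration, not the actual schema.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import redis.clients.jedis.Jedis;

public class FollowReadPath {
    private final Connection hbase; // shared HBase connection
    private final Jedis redis;      // Redis client for the cache layer

    public FollowReadPath(Connection hbase, Jedis redis) {
        this.hbase = hbase;
        this.redis = redis;
    }

    /** Returns whether `from` follows `to`, consulting the cache first. */
    public boolean isFollowing(String from, String to) throws java.io.IOException {
        String cacheKey = "follow:" + from + ":" + to; // hypothetical key layout
        String cached = redis.get(cacheKey);
        if (cached != null) {
            return Boolean.parseBoolean(cached);
        }
        // Cache miss: fall back to the HBase data table (edge data).
        try (Table table = hbase.getTable(TableName.valueOf("follow_data"))) { // hypothetical table name
            Get get = new Get(Bytes.toBytes(from + "#" + to));                 // hypothetical row key
            Result result = table.get(get);
            boolean following = !result.isEmpty();
            // Repopulate the cache with a TTL so stale entries eventually expire.
            redis.setex(cacheKey, 3600, Boolean.toString(following));
            return following;
        }
    }
}
```
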
  10. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  11. Kafka with Decaton › Responsible for event delivery & maintaining data consistency
    between HBase tables & the Redis cache › Powered by the open-source framework Decaton,
    maintained by LINE
  12. What is Decaton? › Open source maintained by LINE ( https://github.com/line/decaton )
    › Streaming task processing framework built on top of Apache Kafka › Various useful
    features (retry queueing, concurrent processing, task compaction, …)
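
To show what plugging into Decaton looks like, here is a minimal processor sketch. It assumes Decaton's `DecatonProcessor` / `ProcessingContext` interfaces as described in the project README; `FollowEvent` and the storage helpers are hypothetical placeholders.

```java
import com.linecorp.decaton.processor.DecatonProcessor;
import com.linecorp.decaton.processor.ProcessingContext;

public class FollowEventProcessor implements DecatonProcessor<FollowEvent> {

    @Override
    public void process(ProcessingContext<FollowEvent> context, FollowEvent task)
            throws InterruptedException {
        // Apply the event to storage; on failure, Decaton's retry queueing can
        // re-deliver the task without blocking the rest of the partition.
        applyToHBase(task);
        updateRedisCache(task);
    }

    private void applyToHBase(FollowEvent task) { /* put edge + index rows (omitted) */ }

    private void updateRedisCache(FollowEvent task) { /* set or invalidate cache (omitted) */ }
}

// Hypothetical task payload carried through Kafka.
class FollowEvent {
    String fromUserId;
    String toUserId;
    String action; // "follow" or "unfollow"
}
```
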
  13. Decaton features powering follow storage › Retrying failed tasks with back-off without
    blocking the flow of other tasks › Concurrent processing of records consumed from one
    partition › Dynamic property configuration at runtime using Central Dogma
  17. Failure logging producer › Various failures can occur while producing tasks (network
    issues, temporarily increased latency, …) › Even when Kafka's default retry attempts are
    exhausted, the producer serializes the task to the local file system and a daemon
    automatically retries it › Guarantees that the events are eventually produced
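
The failure logging producer is an in-house component, so the following is only an illustrative sketch of the idea, not the actual implementation: wrap the standard Kafka producer and, if a send still fails after Kafka's own retries, append the task to a local file that a replay daemon produces again later.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FailureLoggingProducer {
    private final Producer<String, String> producer;
    private final Path failureLog; // local file replayed by a separate daemon

    public FailureLoggingProducer(Producer<String, String> producer, Path failureLog) {
        this.producer = producer;
        this.failureLog = failureLog;
    }

    public void send(String topic, String key, String value) {
        producer.send(new ProducerRecord<>(topic, key, value), (metadata, exception) -> {
            if (exception != null) {
                // Kafka's built-in retries are already exhausted at this point;
                // persist the task so the replay daemon can produce it eventually.
                appendToFailureLog(topic, key, value);
            }
        });
    }

    private synchronized void appendToFailureLog(String topic, String key, String value) {
        try {
            String line = topic + "\t" + key + "\t" + value + System.lineSeparator();
            Files.write(failureLog, line.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (Exception e) {
            // Last resort: surface the error; in practice this would be alerted on.
            throw new RuntimeException("Failed to persist task to failure log", e);
        }
    }
}
```
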
  18. LINE IMF Kafka Cluster: a reliable in-house Kafka cluster (LINE DevDay 2019 -
    Reliability Engineering Behind The Most Trusted Kafka Platform)
  19. With reliable event delivery › We can maintain data consistency between HBase tables &
    Redis caches › Other components can build their own data by consuming events › Real-time
    following / follower counts
  20. What if it fails in the middle? HBase Data Table

    HBase Index Table Redis Cache Follow
  22. What if it fails in the middle? HBase Data Table

    HBase Index Table Redis Cache We can guarantee eventual consistency through reliable event delivery Follow
  24. What if it fails in the middle? HBase Data Table

    HBase Index Table Redis Cache Retry Event Follow
  25. What if it fails in the middle? HBase Data Table

    HBase Index Table Redis Cache Retry Event Decaton Processor Follow
  26. What if it fails in the middle? HBase Data Table

    HBase Index Table Redis Cache Retry Event Decaton Processor retry the remaining flow Follow
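
A hypothetical sketch of enqueuing such a retry event with the plain Kafka producer API; the topic name and payload format are assumptions.

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

/**
 * If the synchronous write path fails partway (e.g. the data table put succeeded
 * but the index/cache updates did not), the platform server can enqueue a retry
 * event; a Decaton processor later replays the remaining flow.
 */
public class RetryEventPublisher {
    private static final String RETRY_TOPIC = "follow-retry-events"; // assumed topic name

    private final Producer<String, String> producer;

    public RetryEventPublisher(Producer<String, String> producer) {
        this.producer = producer;
    }

    public void publishRetry(String fromUserId, String toUserId, String action) {
        // Keying by the acting user keeps that user's retry events on one partition.
        String payload = action + ":" + fromUserId + ":" + toUserId; // assumed encoding
        producer.send(new ProducerRecord<>(RETRY_TOPIC, fromUserId, payload));
    }
}
```
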
  27. What if the user attempts a new action during the

    retry? HBase Data Table HBase Index Table Redis Cache Previous Retry Event Decaton Processor Unfollow again The next request may be processed faster than the previous retry event
  28. What if the user attempts a new action during the retry? HBase Data Table HBase Index
    Table Redis Cache Previous Retry Event Decaton Processor Unfollow again The next request
    may be processed faster than the previous retry event. Could the previous retry event then
    overwrite it with stale data?!
  29. What if the user makes a new action during the retry? The next request has been
    processed faster, so the HBase table now holds: action = unfollow, timestamp = 2
  30. What if the user makes a new action during the retry? After the previous retry event is
    processed, the cell holds two versions: action = unfollow, timestamp = 2 and
    action = follow, timestamp = 1. HBase sorts the versions of a cell from newest to oldest
    by timestamp, regardless of the put order, so the newer unfollow still wins.
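
This behavior is leveraged by writing each action with its own event timestamp as the HBase cell timestamp. A minimal sketch with the standard HBase client follows; the table, column family, qualifier and row-key layout are assumptions.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FollowEdgeWriter {
    private static final byte[] CF = Bytes.toBytes("d");       // assumed column family
    private static final byte[] QUAL = Bytes.toBytes("action"); // assumed qualifier

    private final Connection hbase;

    public FollowEdgeWriter(Connection hbase) {
        this.hbase = hbase;
    }

    public void writeAction(String from, String to, String action, long actionTimestamp)
            throws java.io.IOException {
        try (Table table = hbase.getTable(TableName.valueOf("follow_data"))) { // assumed table
            Put put = new Put(Bytes.toBytes(from + "#" + to));
            // The event's own timestamp is used as the cell timestamp, so
            // "unfollow @ t=2" stays the newest version even if "follow @ t=1"
            // is retried and physically written afterwards.
            put.addColumn(CF, QUAL, actionTimestamp, Bytes.toBytes(action));
            table.put(put);
        }
    }
}
```
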
  31. How about the Redis cache? HBase Data Table HBase Index Table Redis Cache. Set the
    cache only when the request is processed without a retry.
  32. How about the Redis cache? HBase Data Table HBase Index Table Redis Cache. Retry Event,
    Decaton Processor.
  33. How about the Redis cache? HBase Data Table HBase Index Table Redis Cache. Decaton
    Processor: on retry, invalidate the cache only (the cache will be set by the next read).
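
A small Jedis-based sketch of this cache policy (the key layout and TTL are assumptions): the normal write path may set the cache, while the retry path only invalidates it and lets the next read repopulate it.

```java
import redis.clients.jedis.Jedis;

public class FollowCache {
    private final Jedis redis;

    public FollowCache(Jedis redis) {
        this.redis = redis;
    }

    private static String key(String from, String to) {
        return "follow:" + from + ":" + to; // hypothetical key layout
    }

    /** Called on the normal (non-retry) write path. */
    public void setOnSuccess(String from, String to, boolean following) {
        redis.setex(key(from, to), 3600, Boolean.toString(following));
    }

    /** Called by the retry processor: invalidate only, never install possibly stale data. */
    public void invalidate(String from, String to) {
        redis.del(key(from, to));
    }
}
```
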
  34. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  35. Optimized structure for read performance: HBase Data Table (edge data), HBase Index
    Table (optimized indexes), HBase Count Table (real-time count), Redis Cache (cache).
    Basic strategy: increment / decrement the count when follow actions occur.
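
A minimal sketch of that basic strategy using HBase's atomic increment; the count table, column family and qualifier names are assumptions.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FollowCounter {
    private static final byte[] CF = Bytes.toBytes("c");              // assumed column family
    private static final byte[] FOLLOWER_COUNT = Bytes.toBytes("follower"); // assumed qualifier

    private final Connection hbase;

    public FollowCounter(Connection hbase) {
        this.hbase = hbase;
    }

    public void applyDelta(String userId, long delta) throws java.io.IOException {
        try (Table table = hbase.getTable(TableName.valueOf("follow_count"))) { // assumed table
            // Atomic server-side increment; delta is +1 for follow, -1 for unfollow.
            table.incrementColumnValue(Bytes.toBytes(userId), CF, FOLLOWER_COUNT, delta);
        }
    }
}
```
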
  36. Concurrency problem: what if one influencer gets very many followers at once?
    [Table: user / follower count, with the influencer's row highlighted] The influencer's
    count can become a hotspot.
  37. Simple approach? [Same table] If each request tries to lock and increment the count
    sequentially (Lock & +1, Wait), the performance would not be scalable.
  38. How about also using Kafka & Decaton here? +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka
    partition
  39. How about also using Kafka & Decaton here? (Kafka's characteristics) +1 -1 +1 +1 -1 +1
    Decaton Processor A Kafka partition
  40. How about also using Kafka & Decaton here? (Kafka's characteristics) 1. All messages
    with the same key will go to the same partition. +1 -1 +1 +1 -1 +1 Decaton Processor A
    Kafka partition
  41. How about also using Kafka & Decaton here? (Kafka's characteristics) 1. All messages
    with the same key will go to the same partition. 2. Each Kafka partition will be assigned
    to exactly one consumer. +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition
  42. How about also using Kafka & Decaton here? = The influencer's follower count events can
    be delivered to a dedicated consumer one by one, in the order that they are enqueued.
    +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition
  43. How about also using Kafka & Decaton here? When events are delivered to a dedicated
    consumer one by one in the order that they are enqueued, we can avoid the lock contention
    issue. +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition
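
A sketch of the producing side of this idea with the plain Kafka client: count-delta events are keyed by the counted user's id, so they all land on one partition and reach a single processor in enqueue order. The topic name and payload encoding are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CountEventProducer {
    private final Producer<String, String> producer;

    public CountEventProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    public void publishDelta(String countedUserId, int delta) {
        // Key = the user whose counter changes (e.g. the influencer being followed).
        // Same key -> same partition -> one consumer sees the deltas in order.
        producer.send(new ProducerRecord<>("follow-count-events", countedUserId,
                                           Integer.toString(delta)));
    }
}
```
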
  44. But is it scalable? What would happen if an influencer's follower count events were
    flooding a partition? Other users' events produced to that partition could also be
    affected, right…? +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition
  45. But is it scalable? Decaton supports concurrent processing of records

    consumed from one partition. We can change the thread count dynamically during runtime according to the traffic. +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition Multithreading
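
To make this concrete without depending on Decaton's exact configuration API, here is a simplified, hypothetical illustration of per-partition concurrency: records from one partition are sharded by key across N worker threads, so throughput scales while each key keeps its ordering; N plays the role of the thread count that Decaton lets you tune at runtime.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class KeyShardedExecutor implements AutoCloseable {
    private final ExecutorService[] workers;

    public KeyShardedExecutor(int threads) {
        workers = new ExecutorService[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = Executors.newSingleThreadExecutor();
        }
    }

    public void submit(String key, Runnable task) {
        // Same key -> same worker thread -> in-order processing for that key,
        // while different keys from the same partition run in parallel.
        int shard = Math.floorMod(key.hashCode(), workers.length);
        workers[shard].submit(task);
    }

    @Override
    public void close() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
    }
}
```
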
  46. But is it scalable? Furthermore, Decaton also supports Task Compaction.

    If tasks are flooding, we can use the task compaction feature to combine the same user’s tasks within a specific time window. +1 -1 +1 +1 -1 +1 Decaton Processor Influencer’s tasks
  47. But is it scalable? Furthermore, Decaton also supports Task Compaction. If tasks are
    flooding, we can use the task compaction feature to combine the same user's tasks within a
    specific time window. +1 -1 +1 +1 -1 +1 Decaton Processor +2 Compact the tasks within a
    specific time window Influencer's tasks
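
A simplified, hypothetical illustration of the compaction idea (Decaton ships its own task-compaction support; this is not its API): deltas for the same user that arrive within a flush window are summed into a single task.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

public class CountDeltaCompactor {
    private final Map<String, Long> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public CountDeltaCompactor(long windowMillis, BiConsumer<String, Long> flusher) {
        // Flush the buffered deltas once per window.
        scheduler.scheduleAtFixedRate(() -> flush(flusher), windowMillis, windowMillis,
                                      TimeUnit.MILLISECONDS);
    }

    /** Called for every incoming +1 / -1 task. */
    public void add(String userId, long delta) {
        pending.merge(userId, delta, Long::sum);
    }

    private void flush(BiConsumer<String, Long> flusher) {
        // Drain the buffered deltas and emit one compacted task per user.
        for (String userId : pending.keySet()) {
            Long delta = pending.remove(userId);
            if (delta != null && delta != 0) {
                flusher.accept(userId, delta);
            }
        }
    }

    public void shutdown() {
        scheduler.shutdown();
    }
}
```
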
  48. How about reliability? What if the server fails to produce the event? The Kafka producer
    will try to produce the event again using Kafka's built-in retry logic. However, the retry
    count could be exhausted due to various problems such as network issues. +1 -1 +1 +1 -1 +1
    Decaton Processor IMF Kafka Cluster Platform Server
  49. How about reliability? What if the server fails to produce the event? Even when Kafka's
    default retry attempts are exhausted, the Failure Logging Producer serializes the task to
    the local file system, and a daemon automatically retries it. +1 -1 +1 +1 -1 +1 Decaton
    Processor IMF Kafka Cluster Platform Server Failure Logging Producer
  50. How about reliability? What if the Processor fails to process the event? Sometimes, the
    Decaton Processor may fail to process an event. Thanks to Decaton's `Retry Queuing`, the
    event can eventually be processed successfully. +1 -1 +1 +1 -1 +1 Decaton Processor IMF
    Kafka Cluster Platform Server HBase Redis
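
A sketch of a count processor that falls back to Decaton's retry queueing on failure, assuming `ProcessingContext#retry()` as described in the Decaton documentation (retry must be enabled on the subscription); `CountDelta` and the storage call are hypothetical placeholders.

```java
import com.linecorp.decaton.processor.DecatonProcessor;
import com.linecorp.decaton.processor.ProcessingContext;

public class CountDeltaProcessor implements DecatonProcessor<CountDelta> {

    @Override
    public void process(ProcessingContext<CountDelta> context, CountDelta task)
            throws InterruptedException {
        try {
            incrementInHBase(task.userId, task.delta);
        } catch (Exception e) {
            // Hand the task to the retry topic so it is re-delivered with back-off,
            // without blocking the other tasks of this partition.
            context.retry();
        }
    }

    private void incrementInHBase(String userId, long delta) {
        // apply the delta to the HBase count table (omitted)
    }
}

// Hypothetical task payload.
class CountDelta {
    String userId;
    long delta;
}
```
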
  51. Checking inconsistency: However, it is very difficult to make sure that the retry logic
    writes the data exactly once. For example, if there is a network issue, the Decaton
    Processor may not receive a response from the DB even though the DB processed the request
    successfully. +1 -1 +1 +1 -1 +1 Decaton Processor IMF Kafka Cluster Platform Server HBase
    Redis
  52. Checking inconsistency: Thanks to HBase's timestamp behavior, putting the same record
    with the same timestamp again is harmless. However, incrementing the count twice can cause
    an unexpected inconsistency. +1 -1 +1 +1 -1 +1 Decaton Processor IMF Kafka Cluster
    Platform Server HBase Redis
  53. Checking inconsistency: We check for count inconsistency when the follow list API fetches
    the list data. When the follow list is fetched from HBase, inconsistencies are checked for
    sampled targets while minimizing additional storage I/O. Another component, Platform
    Server, query follow list by API, HBase, Redis, follow list, checking inconsistency
  54. Producing repair events: We produce repair events for each inconsistent count
    periodically. When the Decaton processor consumes a repair event, it fetches & counts the
    user's whole follow list and asynchronously stores the correct count again. Platform
    Server, HBase Index Table (fetch & count the user's whole follow list), found
    inconsistency, Repair Event, Decaton Processor, HBase Count Table (repair follow count)
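
A sketch of what the repair step could look like with the HBase client: re-count the user's follow list from the index table and overwrite the stored count. Table names, the row-key prefix and column names are assumptions.

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class FollowCountRepairer {
    private final Connection hbase;

    public FollowCountRepairer(Connection hbase) {
        this.hbase = hbase;
    }

    public void repairFollowingCount(String userId) throws java.io.IOException {
        long actual = 0;
        // Re-count every edge in the user's follow list from the index table.
        try (Table index = hbase.getTable(TableName.valueOf("follow_index"));
             ResultScanner scanner = index.getScanner(
                     new Scan().setRowPrefixFilter(Bytes.toBytes(userId + "#")))) {
            for (Result ignored : scanner) {
                actual++;
            }
        }
        // Overwrite the stored count with the freshly computed value.
        try (Table count = hbase.getTable(TableName.valueOf("follow_count"))) {
            Put put = new Put(Bytes.toBytes(userId));
            put.addColumn(Bytes.toBytes("c"), Bytes.toBytes("following"), Bytes.toBytes(actual));
            count.put(put);
        }
    }
}
```
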
  55. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  56. Generalized follow event for multiple purposes We can generalize the

    event’s format for multiple purposes. +1 -1 +1 +1 -1 +1 Decaton Processor A Kafka partition
  57. Generalized follow event for multiple purposes: not just for real-time count, but also
    for other components. follow unfollow follow follow unfollow follow IMF Kafka Cluster
    Decaton Processor … Timeline Consumer
  58. Platform produces generalized events When user A follows user B

    Follows (Kafka key : from user’s id) A B
  59. Platform produces generalized events When user A follows user B

    Platform Server Follows (Kafka key : from user’s id) A B
  60. Platform produces generalized events When user A follows user B

    Platform Server Follows (Kafka key : from user’s id) A B Follow Event (forward) Action : follow Direction : forward From : user A To : user B Kafka partition assigned to A’s userId A B
  61. Platform produces generalized events When user A follows user B

    Platform Server Follows (Kafka key : from user’s id) A B Follow Event (forward) Action : follow Direction : forward From : user A To : user B Kafka partition assigned to A’s userId A B Follow Event (reverse) Action : follow Direction : reverse From : user B To : user A Kafka partition assigned to B’s userId A B
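
A sketch of emitting this event pair with the plain Kafka producer API. The topic name and payload encoding are assumptions, but the keying mirrors the slides: the forward event is keyed by user A and the reverse event by user B.

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class GeneralizedFollowEventProducer {
    private static final String TOPIC = "follow-events"; // assumed topic name

    private final Producer<String, String> producer;

    public GeneralizedFollowEventProducer(Producer<String, String> producer) {
        this.producer = producer;
    }

    public void publishFollow(String fromUserId, String toUserId) {
        // Forward event: key = A (the follower), "A started following B".
        producer.send(new ProducerRecord<>(TOPIC, fromUserId,
                event("follow", "forward", fromUserId, toUserId)));
        // Reverse event: key = B (the followee), "B gained follower A".
        producer.send(new ProducerRecord<>(TOPIC, toUserId,
                event("follow", "reverse", toUserId, fromUserId)));
    }

    private static String event(String action, String direction, String from, String to) {
        return String.join(",", action, direction, from, to); // assumed encoding
    }
}
```
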
  62. Consumer chooses which events to consume. The follow count processor consumes both
    events. Follow Event (forward): Action : follow, Direction : forward, From : user A,
    To : user B. Follow Event (reverse): Action : follow, Direction : reverse, From : user B,
    To : user A. Decaton Processor: Direction : forward => increment the following count of
    the "from" user; Direction : reverse => increment the follower count of the "from" user.
  63. Consumer chooses which events to consume. Follow Event (forward): Action : follow,
    Direction : forward, From : user A, To : user B. Follow Event (reverse): Action : follow,
    Direction : reverse, From : user B, To : user A. Timeline Consumer: builds data for
    timeline feeds and so on; it ignores the reverse event depending on the service's
    characteristics.
  64. Building own data for each component's purpose. Another component, follow unfollow
    follow follow unfollow follow, IMF Kafka Cluster, Platform components, LINE Channel
  65. Building own data for each component's purpose. The follow list API can be used to get
    a specific user's follow list directly via the LINE Channel API (with authentication).
    Another component, follow unfollow follow follow unfollow follow, IMF Kafka Cluster,
    Platform components, + Follow list API, LINE Channel
  66. Building own data for each component's purpose. By combining Kafka events and on-demand
    API calls, components can build data for their own purposes. Another component, follow
    unfollow follow follow unfollow follow, IMF Kafka Cluster, + Follow list API, Platform
    components, LINE Channel
  67. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  68. Summary 100M+ followers Decaton + Failure logging producer + IMF

    Kafka => Reliable event delivery Scalability Eventual Consistency
  69. Summary 100M+ followers Decaton + Failure logging producer + IMF

    Kafka => Reliable event delivery Scalability Eventual Consistency … Reliability
  70. Current Follow Relationship Statistics: 6.5B follow relationships, 8ms avg API response
    time, 200,000 follow events / min. Based on the design concepts, we are currently
    operating the service stably.
  71. Agenda › Background › Design for scalability › Maintaining data

    consistency › Real time count › Event delivery to various components › Summary › Future improvements
  72. Current design’s limitations Another component follow unfollow follow follow unfollow

    follow IMF Kakfa Cluster Platform components + Follow list api
  73. Future improvement: rich api Another component follow unfollow follow follow

    unfollow follow IMF Kakfa Cluster Platform components + rich apis with various parameters
  74. Other future improvements › More flexible secondary indexes for the various requirements
    › Enhanced inconsistency checker covering the whole data set (currently, we check
    inconsistencies only for sampled targets at runtime) › Mitigations for future hotspots
    related to influencers › More features for users (blocking unwanted follow requests,
    search, …) › Recommending accounts to follow based on user interests › Anti-abuse › and
    any improvements or challenges that arise in the future…