
HBase and Kafka data pipeline and applications for LINE Messaging Platform

Shinya Yoshida
LINE / LINE Platform Development Center 1, Messaging Platform Development Z Part team, HBase Unit / Server-side Engineer

https://linedevday.linecorp.com/2021/ja/sessions/166
https://linedevday.linecorp.com/2021/en/sessions/166
https://linedevday.linecorp.com/2021/ko/sessions/166

LINE DEVDAY 2021

November 10, 2021
Transcript

  1. Agenda - HBase at LINE Messaging Platform - HBase and

    Kafka data pipeline - HBase and Kafka data pipeline applications
  2. About me and HBase Unit - I joined in 2018

    as a new grad - Member of the HBase Unit for LINE Messaging Platform
  3. What data we store in our HBase: Friend, Message, Chat

    meta, RECEIVED_MESSAGE, SEND_MESSAGE
  4. HBase Architecture HMaster NameNode JournalNode ZKQuorum Client Controller nodes 3,5,7,...

    Worker nodes 4~ HRegionServer HRegionServer HRegionServer DataNode DataNode DataNode
  5. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ HRegionServer HRegionServer HRegionServer DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b
  6. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b HRegionServer HRegionServer HRegionServer Region 1 Region 2 Region 3 Region 4 Region 3 Region 4
  7. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b HRegionServer HRegionServer HRegionServer Region 1 Region 2 Region 3 Region 4 Region 3 Region 4 Region 1 Region 2
  8. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation RegionServer B
  9. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation Append to Write Ahead Log Update memstore RegionServer B
  10. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation Append to Write Ahead Log Update memstore Flush memstore to HFile RegionServer B
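
A minimal sketch of the client side of this write flow, using the standard HBase client API: the client sends a Put (a mutation) to the RegionServer hosting the row's region, which then appends it to the WAL and updates the memstore. The table, column family and values below are placeholders, not LINE's actual schema.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("message"))) {     // placeholder table
          Put put = new Put(Bytes.toBytes("user123"));                        // rowkey
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("hello"));
          // The mutation is sent to the RegionServer hosting the region for "user123";
          // that server appends it to its WAL and updates the region's memstore.
          table.put(put);
        }
      }
    }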
  11. Restore memstore from WAL on regionserver failure RegionServer A Region

    1 HDFS WALs memstore HFiles RegionServer B Region 1 memstore
  12. Restore memstore from WAL on regionserver failure RegionServer A Region

    1 HDFS WALs memstore HFiles RegionServer B Region 1 memstore Restore memstore
  13. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint RegionServers WALs of A
  14. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint RegionServers retries retries WALs of A
  15. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint Replication offset of A RegionServers retries retries WALs of A
  16. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint Replication offset of A RegionServers retries retries WALs of A
  17. HBase replication and reliability RegionServer B Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALs of A WALEntry Replication Endpoint Replication offset of A RegionServers retries retries
  18. Setup replication and use case $ hbase shell > add_peer '1',

    CLUSTER_KEY => "backup001.linecorp.com,...:2181:/hbase" User cluster Backup cluster DR cluster Tokyo region Osaka region
  19. Pluggable ReplicationEndpoint* Example: Logging WALs $ hbase shell > add_peer

    '1', ENDPOINT_CLASSNAME => "com.linecorp.hbase.LoggingReplicationEndpoint" * https://issues.apache.org/jira/browse/HBASE-11367
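
For illustration, a minimal sketch of such a pluggable ReplicationEndpoint that only logs WAL entries, assuming the HBase 1.x ReplicationEndpoint API introduced by HBASE-11367. The class name follows the slide, but the body is an assumption, not LINE's actual implementation.

    package com.linecorp.hbase;

    import java.util.UUID;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hbase.replication.BaseReplicationEndpoint;
    import org.apache.hadoop.hbase.wal.WAL;

    public class LoggingReplicationEndpoint extends BaseReplicationEndpoint {
      private static final Log LOG = LogFactory.getLog(LoggingReplicationEndpoint.class);

      @Override
      public UUID getPeerUUID() {
        // Illustrative only; a real endpoint would return the peer cluster's stable id.
        return UUID.randomUUID();
      }

      @Override
      public boolean replicate(ReplicateContext context) {
        // The ReplicationSource ships batches of WAL entries to this method.
        for (WAL.Entry entry : context.getEntries()) {
          LOG.info("table=" + entry.getKey().getTablename()
              + " cells=" + entry.getEdit().getCells().size());
        }
        // Returning true acknowledges the batch so the replication offset can advance.
        return true;
      }

      @Override
      protected void doStart() {
        notifyStarted();
      }

      @Override
      protected void doStop() {
        notifyStopped();
      }
    }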
  20. In 2017 - We were using HBase 0.90.6-cdh3u5, released in

    2012 and no longer supported by the community - Replicated to an HBase 0.94 cluster for statistical analysis Replication Server 0.90.6-cdh3u5 stats 0.94
  21. In 2017 - We were migrating from HBase 0.90.6-cdh3u5 to

    HBase 1.2.5 Replication Dual write Copy data Server stats 0.94 1.2.5 0.90.6-cdh3u5
  22. In 2017 - Needed to replicate to the stats cluster so

    that we could keep the statistical analysis Replication Replication Dual write Copy data Server stats 0.94 1.2.5 0.90.6-cdh3u5
  23. In 2017 - HBase 1.2.5's official replication doesn't support replication

    to HBase 0.94 Replication Replication Dual write Copy data Incompatible Server stats 0.94 1.2.5 0.90.6-cdh3u5
  24. Why we cannot replicate to 0.94 from 1.2.5 From "HBASE AT

    LINE 2017" by Tomu Tsuruhara at LINE DEVELOPER DAY 2017 Release Date Version 2011 2012 2013 2014 2015 2016 2017 ★0.90 ★0.92 ★0.94 ★0.90.6-cdh3u5 ★0.96 ★0.98 ★1.0 ★1.1 ★1.2 ★1.3 Wire Protocol Change API Clean Up Singularity
  25. The pipeline and the first application - It was difficult

    to migrate the stats cluster side for various reasons - Replicate from HBase 1.2.5 to HBase 0.94 through Kafka, changing the protocol Replication Custom Replication Endpoint Dual write Replayer Custom Protocol Use HBase 0.94 client and protocol Copy data Server 1.2.5 0.90.6-cdh3u5 stats 0.94
  26. Kafka Kafka brokers Topic Partition 1 Partition 2 Partition 3

    Producer Producer Producer Consumer Consumer key:value key:value key:partition a:3 b:1 c:2 d:3 ...
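
A small sketch of the producer side of this model, using the standard Kafka Java client: records are key/value pairs, and records with the same key are routed to the same partition, matching the a:3, b:1, c:2 example on the slide. The broker host and topic name are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerSketch {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka001.linecorp.com:9092");   // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");        // wait for all in-sync replicas
        props.put("retries", "100");     // retry transient broker failures

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          // Records with key "a" always hash to the same partition, so they stay ordered.
          producer.send(new ProducerRecord<>("example-topic", "a", "3"));
          producer.send(new ProducerRecord<>("example-topic", "b", "1"));
          producer.send(new ProducerRecord<>("example-topic", "c", "2"));
        }
      }
    }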
  27. Protocol for the pipeline - To avoid contamination by the HBase

    1.2.5 client at the replayer for HBase 0.94 - Defined with Protocol Buffers, contains - WAL metadata - Cells - Almost the same as HBase 1.2.5's protocol
  28. ReplicationEndpoint producing to Kafka - Use Pluggable ReplicationEndpoint - Topic

    per table - <topic-prefix>-<table-name>-<topic-suffix> - Kafka key - Encoded region name (Region identifier) - Rowkey Replication Source Kafka Replication Endpoint
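
An illustrative sketch of the mapping this slide describes, assuming the HBase 1.x WAL API: one topic per table named <topic-prefix>-<table-name>-<topic-suffix>, and a record key built from the encoded region name plus the rowkey so every mutation of a row lands in the same partition in order. Class, method and variable names are assumptions; the serialized WAL-compatible protobuf value is taken as given.

    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.wal.WAL;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class WalRecordMapper {
      private final String prefix;
      private final String suffix;

      public WalRecordMapper(String prefix, String suffix) {
        this.prefix = prefix;   // e.g. "linehbase-wal"
        this.suffix = suffix;   // e.g. "v1"
      }

      public ProducerRecord<byte[], byte[]> toRecord(WAL.Entry entry, byte[] rowkey,
                                                     byte[] serializedEntry) {
        // Topic per table: <topic-prefix>-<table-name>-<topic-suffix>
        String table = entry.getKey().getTablename().getNameAsString();
        String topic = prefix + "-" + table + "-" + suffix;
        // Key = encoded region name + rowkey, keeping a row's mutations in one partition.
        byte[] key = Bytes.add(entry.getKey().getEncodedRegionName(), rowkey);
        return new ProducerRecord<>(topic, key, serializedEntry);
      }
    }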
  29. Setup KafkaReplicationEndpoint $ hbase shell > add_peer '1', ENDPOINT_CLASSNAME => 'com.linecorp.hbase.KafkaReplicationEndpoint',

    CONFIG => { "kafka.config.bootstrap.servers" => "kafka001.linecorp.com,...", "kafka.config.linger.ms" => "1000", "kafka.config.acks" => "all", "kafka.config.retries" => "100", "kafka.config.client.id" => "linehbase-wal-replicator", "topic.name.prefix" => "linehbase-wal", "topic.name.suffix" => "v1" }
  30. The replayer for HBase 0.94 - Consume WAL-compatible protobuf

    data - Convert it to HBase 0.94's mutations (Put, Delete and so on) - Write them using HBase 0.94's library
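
A minimal sketch of the conversion step inside the replayer, assuming the HBase 0.94 client API (HTable, Put#add, Delete#deleteColumn). WalEntryProto and WalCellProto are hypothetical stand-ins for the custom WAL-compatible protobuf; the Kafka consumer loop that feeds this method is elided.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class Replayer094 {
      // entry: one decoded WAL-compatible protobuf message consumed from Kafka.
      void replay(WalEntryProto entry, HTable table) throws IOException {
        for (WalCellProto cell : entry.getCellsList()) {           // hypothetical accessors
          byte[] row = cell.getRow().toByteArray();
          byte[] family = cell.getFamily().toByteArray();
          byte[] qualifier = cell.getQualifier().toByteArray();
          if (cell.getType() == WalCellProto.Type.PUT) {
            // Rebuild the mutation with the original timestamp so replay is idempotent.
            Put put = new Put(row);
            put.add(family, qualifier, cell.getTimestamp(), cell.getValue().toByteArray());
            table.put(put);
          } else {
            Delete delete = new Delete(row);
            delete.deleteColumn(family, qualifier, cell.getTimestamp());
            table.delete(delete);
          }
        }
      }
    }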
  31. HBase and Kafka data pipeline - This kind of pipeline

    is called "change data capture" - Strengths - Easy to interact with the database mutations - High reliability thanks to the HBase replication implementation and Kafka - Weaknesses☹ - Asynchronous, so there can be a delay (100 ms or more) - Cannot read other rowkeys or columns as of the mutation - Need to aggregate or interact with the database on the consumer side
  32. Without the HBase and Kafka data pipeline Server Tables • Add

    a Kafka path for every HBase write path? • Retry on Kafka failure? • Won't it affect the service? • Durability when the server fails while sending to Kafka?
  33. HBase and Kafka data pipeline: Reliability Server Tables • Add

    a Kafka path for every HBase write path? → Yes, by adding a peer • Retry on Kafka failure? → Yes, Kafka client retries + retries in the replication source • Won't it affect the service? → No issue for short failures • Durability when the server fails while sending to Kafka? → No issue thanks to replication failover RegionServer ZooKeeper Replication Source WALEntry Replication Endpoint Replication offset retries retries
  34. Applications - We have used this pipeline for several years and

    developed applications on it - 20+ target tables - 1.2M WAL messages / sec at peak - Introducing 4 kinds of use cases and applications so far - Replication or data migration that the built-in HBase replication cannot handle - Applications running business logic that treat the WAL as an event stream - Near-realtime statistics analysis - Abuser detection at the storage side
  35. Replication or data migration Replayer HBase 0.94 client Kerberos authenticated

    Other middleware 1.2.5 non-secure Kerberos-secured 0.94
  36. Applications with WALs UserSettings - The user settings service manages settings

    for each user in a key-value format - Used not only in the Messaging Platform, but also in other services - Other services want to know about settings changes Family app service user-settings service user-settings Get latest settings
  37. Applications with WALs UserSettings user-settings service WAL Consumer Event Producer

    Service A consumer Service B consumer WAL WAL settings event settings event user-settings
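
An illustrative sketch of the WAL consumer / event producer on this slide: consume the WAL topic for the user-settings table and republish each change as a simple settings event for other services. Topic names, the WalEntryProto/WalCellProto types and their accessors are assumptions, not the actual implementation.

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserSettingsEventProducer {
      public static void run(KafkaConsumer<byte[], byte[]> consumer,
                             KafkaProducer<String, String> producer) throws Exception {
        // WAL topic for the user-settings table (placeholder name following the naming scheme).
        consumer.subscribe(Collections.singletonList("linehbase-wal-user-settings-v1"));
        while (true) {
          for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
            // Decode the WAL-compatible protobuf (hypothetical type) into a settings change.
            WalEntryProto entry = WalEntryProto.parseFrom(record.value());
            for (WalCellProto cell : entry.getCellsList()) {
              String userId = cell.getRow().toStringUtf8();
              String settingKey = cell.getQualifier().toStringUtf8();
              String settingValue = cell.getValue().toStringUtf8();
              // Publish a per-user settings event; Service A and Service B consume this topic.
              producer.send(new ProducerRecord<>("user-settings-events", userId,
                  settingKey + "=" + settingValue));
            }
          }
        }
      }
    }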
  38. Near-realtime statistics analysis - Traffic bursts to 3x~4x of the daily

    peak at 00:00 on New Year - For the New Year greeting: Akeome LINE - Monitoring various metrics during the New Year burst - Message count - An important metric because the load is proportional to the message count (and it's fun) - High resolution: every 1 sec or 100 ms - Near-realtime: <= 10 seconds delay
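
A rough sketch of how the near-realtime counting could look on the consumer side: bucket each WAL message by its wall-clock second and periodically flush finished buckets to wherever the dashboard reads from. The window size, the reporter callback and all names here are assumptions.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.LongAdder;
    import java.util.function.BiConsumer;

    public class MessageCountAggregator {
      // second-since-epoch -> number of messages written during that second
      private final ConcurrentSkipListMap<Long, LongAdder> buckets = new ConcurrentSkipListMap<>();
      private final BiConsumer<Long, Long> reporter;   // e.g. push (second, count) to a dashboard

      public MessageCountAggregator(BiConsumer<Long, Long> reporter) {
        this.reporter = reporter;
      }

      // Called once per WAL message consumed from the message table's topic.
      public void onWalMessage(long writeTimeMillis) {
        buckets.computeIfAbsent(writeTimeMillis / 1000, s -> new LongAdder()).increment();
      }

      // Called by a scheduler; flushes buckets older than nowSeconds so the end-to-end
      // delay stays within a few seconds of real time.
      public void flushUpTo(long nowSeconds) {
        for (Map.Entry<Long, LongAdder> e : buckets.headMap(nowSeconds).entrySet()) {
          reporter.accept(e.getKey(), e.getValue().sum());
          buckets.remove(e.getKey());
        }
      }
    }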
  39. Abuser detection - Various abusers in LINE Messaging Platform -

    Detecting them from various aspects - For persistent storage, HBase - Abuse patterns that store massive data over a long term are critical - Not only for disk usage, but also for HBase performance - Might affect many other users
  40. Abuser detection WAL Consumer Count aggregation PenaltyGateway WAL WAL 1m

    count 1d count 2w count Count changelog Ban abuser Store penalty Read penalty Block request Penalty rules Server Tables user penalties
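
A rough, in-memory sketch of the count-aggregation step in this flow: per-user mutation counts kept for a window (1 minute, 1 day and 2 weeks on the slide) and handed to a penalty gateway when a rule threshold is exceeded. Thresholds, names and the penalty callback are assumptions; the slide's actual pipeline also persists the counts and a count changelog.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Consumer;

    public class AbuseCounter {
      private final Map<String, Long> oneMinuteCounts = new ConcurrentHashMap<>();
      private final long threshold;
      private final Consumer<String> penaltyGateway;

      public AbuseCounter(long threshold, Consumer<String> penaltyGateway) {
        this.threshold = threshold;
        this.penaltyGateway = penaltyGateway;   // e.g. store a penalty and block requests
      }

      // Called for every WAL cell consumed from the pipeline.
      public void onMutation(String userId) {
        long count = oneMinuteCounts.merge(userId, 1L, Long::sum);
        if (count > threshold) {
          penaltyGateway.accept(userId);
        }
      }

      // Invoked by a scheduler at each window boundary (every minute for the 1m window);
      // the 1d and 2w windows would keep their own maps with longer reset intervals.
      public void resetWindow() {
        oneMinuteCounts.clear();
      }
    }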
  41. Future works - Expand usage of HBase and Kafka data

    pipeline - Secondary index (Materialized view) - Incremental backup
  42. Secondary index - HBase only supports indexing by row and column:

    Key → Value - For example, Alice becomes a friend of Bob - Store Alice → Bob in HBase - Lookup from Bob is not supported - Need a secondary index for reverse lookup: Value → Keys - Apache Phoenix provides an option for HBase with SQL - Overhead - Overkill for just a secondary index - Currently using Redis or Cassandra for such purposes - Want a secondary index in HBase for several reasons - Reliability - Performance - Consistency model - ...
  43. Secondary index server WAL Consumer Build secondary index Value →

    Keys Key → Value Value → Keys Tables Key → Value
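
A minimal sketch of the secondary-index builder in this diagram: for every Key → Value mutation seen in the WAL stream, write a reverse Value → Key row into an index table, so the Alice → Bob friendship from the previous slide can be looked up from Bob. Table names and the index rowkey layout are assumptions; the HBase client calls are standard.

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SecondaryIndexBuilder {
      private final Table indexTable;

      public SecondaryIndexBuilder(Connection connection) throws IOException {
        // Placeholder index table name.
        this.indexTable = connection.getTable(TableName.valueOf("friends-reverse-index"));
      }

      // key = Alice, value = Bob in the example on the previous slide.
      public void onPut(byte[] key, byte[] value) throws IOException {
        // Index rowkey: value + separator + key, so a prefix scan on "Bob" finds
        // everyone who added Bob.
        byte[] indexRow = Bytes.add(value, Bytes.toBytes("#"), key);
        Put put = new Put(indexRow);
        put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("k"), key);
        indexTable.put(put);
      }
    }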
  44. HBase's incremental backup HBase Take a full backup Time

    WALs HFiles Cron job t1 t2 t3 MR Job Storage (HDFS, Amazon S3, ...) Take incremental backups
  45. HBase's incremental backup: pain point HBase Time WALs HFiles Cron

    job t3 MR Job Bug released All WALs remain until the cron job runs Extra load on the cluster Restore from backup Sound data is lost t1 t2
  46. Incremental backup using pipeline HBase WAL Consumer Take a

    snapshot Time WALs No impact on HBase Storage (HDFS, Amazon S3, ...)
  47. Incremental backup using pipeline HBase WAL Consumer Time WALs

    t1 t2 t3 Make HFiles for fast restore Storage (HDFS, Amazon S3, ...)
  48. Conclusion - HBase and Kafka data pipeline for LINE Messaging

    Platform - Using HBase WAL and replication - A powerful and reliable way to interact with DB mutations - Our actual use cases of the pipeline - Replication or data migration that the built-in HBase replication cannot handle - Applications running business logic that treat the WAL as an event stream - Near-realtime statistics analysis - Abuser detection at the storage side - Possible use cases - Secondary index - Incremental backup - What's your idea?