
HBase and Kafka data pipeline and applications for LINE Messaging Platform

Shinya Yoshida
LINE / LINE Platform Development Center 1, Messaging Platform Development Z Part team, HBase Unit / Server-side Engineer

https://linedevday.linecorp.com/2021/ja/sessions/166
https://linedevday.linecorp.com/2021/en/sessions/166
https://linedevday.linecorp.com/2021/ko/sessions/166

LINE DEVDAY 2021

November 10, 2021
Transcript

  1. Agenda - HBase at LINE Messaging Platform - HBase and

    Kafka data pipeline - HBase and Kafka data pipeline applications
  2. About me and HBase Unit - I joined in 2018

    as a new grad - Member of the HBase Unit for LINE Messaging Platform
  3. What data we store in our HBase: Friend, Message, Chat

    meta, RECEIVED_MESSAGE, SEND_MESSAGE
  4. HBase Architecture HMaster NameNode JournalNode ZKQuorum Client Controller nodes 3,5,7,...

    Worker nodes 4~ HRegionServer HRegionServer HRegionServer DataNode DataNode DataNode
  5. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ HRegionServer HRegionServer HRegionServer DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b
  6. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b HRegionServer HRegionServer HRegionServer Region 1 Region 2 Region 3 Region 4 Region 3 Region 4
  7. HBase Architecture (note: 3 replicas) HMaster NameNode JournalNode ZKQuorum Client Controller

    nodes 3,5,7,... Worker nodes 4~ DataNode DataNode DataNode Block c Block a Block b Block b Block c Block a Block a Block c Block b HRegionServer HRegionServer HRegionServer Region 1 Region 2 Region 3 Region 4 Region 3 Region 4 Region 1 Region 2
  8. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation RegionServer B
  9. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation Append to Write Ahead Log Update memstore RegionServer B
  10. HBase internal write flow RegionServer A Region 1 Client HDFS

    WALs memstore HFiles Client sends mutation Append to Write Ahead Log Update memstore Flush memstore to HFile RegionServer B
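
A minimal sketch of the client side of this write flow, using the standard HBase client API: the client sends a Put (a mutation) to the RegionServer hosting the row's region, which then appends it to the WAL and updates the memstore. The table, column family and values below are placeholders, not LINE's actual schema.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutExample {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("message"))) {     // placeholder table
          Put put = new Put(Bytes.toBytes("user123"));                        // rowkey
          put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("body"), Bytes.toBytes("hello"));
          // The mutation is sent to the RegionServer hosting the region for "user123";
          // that server appends it to its WAL and updates the region's memstore.
          table.put(put);
        }
      }
    }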
  11. Restore memstore from WAL on regionserver failure RegionServer A Region

    1 HDFS WALs memstore HFiles RegionServer B Region 1 memstore
  12. Restore memstore from WAL on regionserver failure RegionServer A Region

    1 HDFS WALs memstore HFiles RegionServer B Region 1 memstore Restore memstore
  13. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint RegionServers WALs of A
  14. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint RegionServers retries retries WALs of A
  15. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint Replication offset of A RegionServers retries retries WALs of A
  16. HBase replication and reliability RegionServer A Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALEntry Replication Endpoint Replication offset of A RegionServers retries retries WALs of A
  17. HBase replication and reliability RegionServer B Source cluster HDFS ZooKeeper

    Destination cluster Replication Source WALs of A WALEntry Replication Endpoint Replication offset of A RegionServers retries retries
  18. Setup replication and use case $ hbase shell > add_peer '1',

    CLUSTER_KEY => "backup001.linecorp.com,...:2181:/hbase" User cluster Backup cluster DR cluster Tokyo region Osaka region
  19. Pluggable ReplicationEndpoint* Example: Logging WALs $ hbase shell > add_peer

    '1', ENDPOINT_CLASSNAME => "com.linecorp.hbase.LoggingReplicationEndpoint" * https://issues.apache.org/jira/browse/HBASE-11367
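
For illustration, a minimal sketch of such a pluggable ReplicationEndpoint that only logs WAL entries, assuming the HBase 1.x ReplicationEndpoint API introduced by HBASE-11367. The class name follows the slide, but the body is an assumption, not LINE's actual implementation.

    package com.linecorp.hbase;

    import java.util.UUID;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.hbase.replication.BaseReplicationEndpoint;
    import org.apache.hadoop.hbase.wal.WAL;

    public class LoggingReplicationEndpoint extends BaseReplicationEndpoint {
      private static final Log LOG = LogFactory.getLog(LoggingReplicationEndpoint.class);

      @Override
      public UUID getPeerUUID() {
        // Illustrative only; a real endpoint would return the peer cluster's stable id.
        return UUID.randomUUID();
      }

      @Override
      public boolean replicate(ReplicateContext context) {
        // The ReplicationSource ships batches of WAL entries to this method.
        for (WAL.Entry entry : context.getEntries()) {
          LOG.info("table=" + entry.getKey().getTablename()
              + " cells=" + entry.getEdit().getCells().size());
        }
        // Returning true acknowledges the batch so the replication offset can advance.
        return true;
      }

      @Override
      protected void doStart() {
        notifyStarted();
      }

      @Override
      protected void doStop() {
        notifyStopped();
      }
    }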
  20. In 2017 - We were using HBase 0.90.6-cdh3u5, released in

    2012 and no longer supported by the community - Replicated to an HBase 0.94 cluster for statistical analysis Replication Server 0.90.6-cdh3u5 stats 0.94
  21. In 2017 - We were migrating from HBase 0.90.6-cdh3u5 to

    HBase 1.2.5 Replication Dual write Copy data Server stats 0.94 1.2.5 0.90.6-cdh3u5
  22. In 2017 - Needed to replicate to the stats cluster so

    that we could keep the statistical analysis Replication Replication Dual write Copy data Server stats 0.94 1.2.5 0.90.6-cdh3u5
  23. In 2017 - HBase 1.2.5's official replication doesn't support replication

    to HBase 0.94 Replication Replication Dual write Copy data Incompatible Server stats 0.94 1.2.5 0.90.6-cdh3u5
  24. Why we cannot replicate to 0.94 from 1.2.5 From "HBASE AT

    LINE 2017" by Tomu Tsuruhara at LINE DEVELOPER DAY 2017 Release Date Version 2011 2012 2013 2014 2015 2016 2017 ★0.90 ★0.92 ★0.94 ★0.90.6-cdh3u5 ★0.96 ★0.98 ★1.0 ★1.1 ★1.2 ★1.3 Wire Protocol Change API Clean Up Singularity
  25. The pipeline and the first application - It was difficult

    to migrate the stats cluster side for various reasons - Replicate from HBase 1.2.5 to HBase 0.94 through Kafka, changing the protocol Replication Custom Replication Endpoint Dual write Replayer Custom Protocol Use HBase 0.94 client and protocol Copy data Server 1.2.5 0.90.6-cdh3u5 stats 0.94
  26. Kafka Kafka brokers Topic Partition 1 Partition 2 Partition 3

    Producer Producer Producer Consumer Consumer key:value key:value key:partition a:3 b:1 c:2 d:3 ...
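
A small sketch of the producer side of this model, using the standard Kafka Java client: records are key/value pairs, and records with the same key are routed to the same partition, matching the a:3, b:1, c:2 example on the slide. The broker host and topic name are placeholders.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class ProducerSketch {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka001.linecorp.com:9092");   // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");        // wait for all in-sync replicas
        props.put("retries", "100");     // retry transient broker failures

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
          // Records with key "a" always hash to the same partition, so they stay ordered.
          producer.send(new ProducerRecord<>("example-topic", "a", "3"));
          producer.send(new ProducerRecord<>("example-topic", "b", "1"));
          producer.send(new ProducerRecord<>("example-topic", "c", "2"));
        }
      }
    }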
  27. Protocol for the pipeline - To avoid contamination by the HBase

    1.2.5 client at the replayer for HBase 0.94 - Defined with Protocol Buffers, contains - WAL metadata - Cells - Almost the same as HBase 1.2.5's protocol
  28. ReplicationEndpoint producing to Kafka - Use Pluggable ReplicationEndpoint - Topic

    per table - <topic-prefix>-<table-name>-<topic-suffix> - Kafka key - Encoded region name (Region identifier) - Rowkey Replication Source Kafka Replication Endpoint
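
An illustrative sketch of the mapping this slide describes, assuming the HBase 1.x WAL API: one topic per table named <topic-prefix>-<table-name>-<topic-suffix>, and a record key built from the encoded region name plus the rowkey so every mutation of a row lands in the same partition in order. Class, method and variable names are assumptions; the serialized WAL-compatible protobuf value is taken as given.

    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.wal.WAL;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class WalRecordMapper {
      private final String prefix;
      private final String suffix;

      public WalRecordMapper(String prefix, String suffix) {
        this.prefix = prefix;   // e.g. "linehbase-wal"
        this.suffix = suffix;   // e.g. "v1"
      }

      public ProducerRecord<byte[], byte[]> toRecord(WAL.Entry entry, byte[] rowkey,
                                                     byte[] serializedEntry) {
        // Topic per table: <topic-prefix>-<table-name>-<topic-suffix>
        String table = entry.getKey().getTablename().getNameAsString();
        String topic = prefix + "-" + table + "-" + suffix;
        // Key = encoded region name + rowkey, keeping a row's mutations in one partition.
        byte[] key = Bytes.add(entry.getKey().getEncodedRegionName(), rowkey);
        return new ProducerRecord<>(topic, key, serializedEntry);
      }
    }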
  29. Setup KafkaReplicationEndpoint $ hbase shell > add_peer '1', ENDPOINT_CLASSNAME => 'com.linecorp.hbase.KafkaReplicationEndpoint',

    CONFIG => { "kafka.config.bootstrap.servers" => "kafka001.linecorp.com,...", "kafka.config.linger.ms" => "1000", "kafka.config.acks" => "all", "kafka.config.retries" => "100", "kafka.config.client.id" => "linehbase-wal-replicator", "topic.name.prefix" => "linehbase-wal", "topic.name.suffix" => "v1" }
  30. The replayer for HBase 0.94 - Consume WAL-compatible protobuf

    data - Convert it to HBase 0.94's mutations (Put, Delete and so on) - Write them using HBase 0.94's library
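
A minimal sketch of the conversion step inside the replayer, assuming the HBase 0.94 client API (HTable, Put#add, Delete#deleteColumn). WalEntryProto and WalCellProto are hypothetical stand-ins for the custom WAL-compatible protobuf; the Kafka consumer loop that feeds this method is elided.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class Replayer094 {
      // entry: one decoded WAL-compatible protobuf message consumed from Kafka.
      void replay(WalEntryProto entry, HTable table) throws IOException {
        for (WalCellProto cell : entry.getCellsList()) {           // hypothetical accessors
          byte[] row = cell.getRow().toByteArray();
          byte[] family = cell.getFamily().toByteArray();
          byte[] qualifier = cell.getQualifier().toByteArray();
          if (cell.getType() == WalCellProto.Type.PUT) {
            // Rebuild the mutation with the original timestamp so replay is idempotent.
            Put put = new Put(row);
            put.add(family, qualifier, cell.getTimestamp(), cell.getValue().toByteArray());
            table.put(put);
          } else {
            Delete delete = new Delete(row);
            delete.deleteColumn(family, qualifier, cell.getTimestamp());
            table.delete(delete);
          }
        }
      }
    }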
  31. HBase and Kafka data pipeline - This kind of pipeline

    is called "change data capture" - Strengths - Easy to interact with the database mutations - High reliability thanks to the HBase replication implementation and Kafka - Weaknesses☹ - Asynchronous, so there can be a delay (100 ms or more) - Cannot read other rowkeys or columns as of the mutation - Need to aggregate or interact with the database on the consumer side
  32. Without the HBase and Kafka data pipeline Server Tables • Add

    a Kafka path for every HBase write path? • Retry on Kafka failure? • Won't it affect the service? • Durability when the server fails while sending to Kafka?
  33. HBase and Kafka data pipeline: Reliability Server Tables • Add

    a Kafka path for every HBase write path? → Yes, by adding a peer • Retry on Kafka failure? → Yes, Kafka client retries + retries in the replication source • Won't it affect the service? → No issue for short failures • Durability when the server fails while sending to Kafka? → No issue thanks to replication failover RegionServer ZooKeeper Replication Source WALEntry Replication Endpoint Replication offset retries retries
  34. Applications - We have used this pipeline for several years and

    developed applications on it - 20+ target tables - 1.2M WAL messages / sec at peak - Introducing 4 kinds of use cases and applications so far - Replication or data migration that the built-in HBase replication cannot handle - Applications running business logic that treat the WAL as an event stream - Near-realtime statistics analysis - Abuser detection at the storage side
  35. Replication or data migration Replayer HBase 0.94 client Kerberos authenticated

    Other middleware 1.2.5 non-secure Kerberos-secured 0.94
  36. Applications with WALs UserSettings - The user settings service manages settings

    for each user in a key-value format - Used not only in the Messaging Platform, but also in other services - Other services want to know about settings changes Family app service user-settings service user-settings Get latest settings
  37. Applications with WALs UserSettings user-settings service WAL Consumer Event Producer

    Service A consumer Service B consumer WAL WAL settings event settings event user-settings
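
An illustrative sketch of the WAL consumer / event producer on this slide: consume the WAL topic for the user-settings table and republish each change as a simple settings event for other services. Topic names, the WalEntryProto/WalCellProto types and their accessors are assumptions, not the actual implementation.

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class UserSettingsEventProducer {
      public static void run(KafkaConsumer<byte[], byte[]> consumer,
                             KafkaProducer<String, String> producer) throws Exception {
        // WAL topic for the user-settings table (placeholder name following the naming scheme).
        consumer.subscribe(Collections.singletonList("linehbase-wal-user-settings-v1"));
        while (true) {
          for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
            // Decode the WAL-compatible protobuf (hypothetical type) into a settings change.
            WalEntryProto entry = WalEntryProto.parseFrom(record.value());
            for (WalCellProto cell : entry.getCellsList()) {
              String userId = cell.getRow().toStringUtf8();
              String settingKey = cell.getQualifier().toStringUtf8();
              String settingValue = cell.getValue().toStringUtf8();
              // Publish a per-user settings event; Service A and Service B consume this topic.
              producer.send(new ProducerRecord<>("user-settings-events", userId,
                  settingKey + "=" + settingValue));
            }
          }
        }
      }
    }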
  38. Near-realtime statistics analysis - Traffic bursts to 3x~4x of the daily

    peak at 00:00 on New Year - For the New Year greeting: Akeome LINE - Monitoring various metrics during the New Year burst - Message count - An important metric because the load is proportional to the message count (and it's fun) - High resolution: every 1 sec or 100 ms - Near-realtime: <= 10 seconds delay
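
A rough sketch of how the near-realtime counting could look on the consumer side: bucket each WAL message by its wall-clock second and periodically flush finished buckets to wherever the dashboard reads from. The window size, the reporter callback and all names here are assumptions.

    import java.util.Map;
    import java.util.concurrent.ConcurrentSkipListMap;
    import java.util.concurrent.atomic.LongAdder;
    import java.util.function.BiConsumer;

    public class MessageCountAggregator {
      // second-since-epoch -> number of messages written during that second
      private final ConcurrentSkipListMap<Long, LongAdder> buckets = new ConcurrentSkipListMap<>();
      private final BiConsumer<Long, Long> reporter;   // e.g. push (second, count) to a dashboard

      public MessageCountAggregator(BiConsumer<Long, Long> reporter) {
        this.reporter = reporter;
      }

      // Called once per WAL message consumed from the message table's topic.
      public void onWalMessage(long writeTimeMillis) {
        buckets.computeIfAbsent(writeTimeMillis / 1000, s -> new LongAdder()).increment();
      }

      // Called by a scheduler; flushes buckets older than nowSeconds so the end-to-end
      // delay stays within a few seconds of real time.
      public void flushUpTo(long nowSeconds) {
        for (Map.Entry<Long, LongAdder> e : buckets.headMap(nowSeconds).entrySet()) {
          reporter.accept(e.getKey(), e.getValue().sum());
          buckets.remove(e.getKey());
        }
      }
    }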
  39. Abuser detection - Various abusers in LINE Messaging Platform -

    Detecting them from various aspects - For persistent storage, HBase - Abuse patterns that store massive data over a long term are critical - Not only for disk usage, but also for HBase performance - Might affect many other users
  40. Abuser detection WAL Consumer Count aggregation PenaltyGateway WAL WAL 1m

    count 1d count 2w count Count changelog Ban abuser Store penalty Read penalty Block request Penalty rules Server Tables user penalties
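
A rough, in-memory sketch of the count-aggregation step in this flow: per-user mutation counts kept for a window (1 minute, 1 day and 2 weeks on the slide) and handed to a penalty gateway when a rule threshold is exceeded. Thresholds, names and the penalty callback are assumptions; the slide's actual pipeline also persists the counts and a count changelog.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Consumer;

    public class AbuseCounter {
      private final Map<String, Long> oneMinuteCounts = new ConcurrentHashMap<>();
      private final long threshold;
      private final Consumer<String> penaltyGateway;

      public AbuseCounter(long threshold, Consumer<String> penaltyGateway) {
        this.threshold = threshold;
        this.penaltyGateway = penaltyGateway;   // e.g. store a penalty and block requests
      }

      // Called for every WAL cell consumed from the pipeline.
      public void onMutation(String userId) {
        long count = oneMinuteCounts.merge(userId, 1L, Long::sum);
        if (count > threshold) {
          penaltyGateway.accept(userId);
        }
      }

      // Invoked by a scheduler at each window boundary (every minute for the 1m window);
      // the 1d and 2w windows would keep their own maps with longer reset intervals.
      public void resetWindow() {
        oneMinuteCounts.clear();
      }
    }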
  41. Future works - Expand usage of HBase and Kafka data

    pipeline - Secondary index (Materialized view) - Incremental backup
  42. Secondary index - HBase only supports indexing by row and column:

    Key → Value - For example, Alice becomes a friend of Bob - Store Alice → Bob in HBase - Lookup from Bob is not supported - Need a secondary index for reverse lookup: Value → Keys - Apache Phoenix provides an option for HBase with SQL - Overhead - Overkill for just a secondary index - Currently using Redis or Cassandra for such purposes - Want a secondary index in HBase for several reasons - Reliability - Performance - Consistency model - ...
  43. Secondary index server WAL Consumer Build secondary index Value →

    Keys Key → Value Value → Keys Tables Key → Value
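
A minimal sketch of the secondary-index builder in this diagram: for every Key → Value mutation seen in the WAL stream, write a reverse Value → Key row into an index table, so the Alice → Bob friendship from the previous slide can be looked up from Bob. Table names and the index rowkey layout are assumptions; the HBase client calls are standard.

    import java.io.IOException;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SecondaryIndexBuilder {
      private final Table indexTable;

      public SecondaryIndexBuilder(Connection connection) throws IOException {
        // Placeholder index table name.
        this.indexTable = connection.getTable(TableName.valueOf("friends-reverse-index"));
      }

      // key = Alice, value = Bob in the example on the previous slide.
      public void onPut(byte[] key, byte[] value) throws IOException {
        // Index rowkey: value + separator + key, so a prefix scan on "Bob" finds
        // everyone who added Bob.
        byte[] indexRow = Bytes.add(value, Bytes.toBytes("#"), key);
        Put put = new Put(indexRow);
        put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("k"), key);
        indexTable.put(put);
      }
    }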
  44. HBase's incremental backup HBase Take a full backup Time

    WALs HFiles Cron job t1 t2 t3 MR Job Storage (HDFS, Amazon S3, ...) Take incremental backups
  45. HBase's incremental backup: pain point HBase Time WALs HFiles Cron

    job t3 MR Job Bug released All WALs remain until the cron job runs Extra load on the cluster Restore from backup Sound data is lost t1 t2
  46. Incremental backup using pipeline HBase WAL Consumer Take a

    snapshot Time WALs No impact on HBase Storage (HDFS, Amazon S3, ...)
  47. Incremental backup using pipeline HBase WAL Consumer Time WALs

    t1 t2 t3 Make HFiles for fast restore Storage (HDFS, Amazon S3, ...)
  48. Conclusion - HBase and Kafka data pipeline for LINE Messaging

    Platform - Using HBase WAL and replication - A powerful and reliable way to interact with DB mutations - Our actual use cases of the pipeline - Replication or data migration that the built-in HBase replication cannot handle - Applications running business logic that treat the WAL as an event stream - Near-realtime statistics analysis - Abuser detection at the storage side - Possible use cases - Secondary index - Incremental backup - What's your idea?