Overview of LINE Messaging storages

› Redis: in-memory data structure store, used as both a database and a cache.
› Apache HBase: distributed key/value store that provides random, realtime read/write access, with HDFS (Hadoop Distributed File System) as the persistence layer.
› Apache Kafka: asynchronous task processor.
› Stored data: messages, events, user data, social graph, …

[Diagram: an HBase client looks up the Master and the meta table, then talks to RegionServers hosting table regions (tableA region 1, tableA region 2); RegionServers persist data to HDFS.]
Storage requirements

› Maintainable and scalable; avoid unnecessary cost.
› Performant, reliable, and highly available.
› Data consistency.

Scale:
› 1.6 petabytes of data for service purposes.
› Approx. 2.7 trillion requests / day.
› Approx. 3x traffic at peak on New Year (100 million requests / sec).
Redis in-house sharded cluster:
› No dynamic resizing.
› No way to upgrade versions.
› High operational cost.

Redis official cluster (from Redis 3.x):
› Dynamic resizing.
› Consumes more memory.
› Performance degrades as the cluster grows.
› Gossip traffic eats a lot of network bandwidth in a big cluster.

In general, the cost of memory > the cost of disk.

Apache HBase:
› Horizontal scalability.
› Rolling version upgrades.
› Easy to operate (plus in-house automation tools we developed).

[Diagram: background migration from the old cluster to the new Apache HBase cluster while the application keeps serving traffic.]
Highly Available and fault tolerant

[Diagram: under high traffic and a high volume of data, backend application threads pile up waiting on a busy storage node.]
› Clients use short timeouts, but sometimes that is not enough: keep recovery time at its minimum.
› Redis: a short circuit breaker for fast failure (see the sketch below):
  › Redis is single-threaded, so one slow instance can make the application wait on thousands of Redis responses.
  › When response times increase, temporarily mark the shard as failed and stop sending requests to it.
› HBase: dual clusters:
  › Both clusters store the same data.
  › Requests pick the fastest response, the higher-priority cluster, or merge results.
  › Mostly for immutable data, or where occasional eventual consistency is tolerated.

Used in critical places with high traffic and a high volume of data.
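A minimal sketch of the kind of per-shard circuit breaker described above. The class name, thresholds, and structure are illustrative assumptions, not LINE's actual implementation:

import java.util.concurrent.atomic.AtomicLong;

// Illustrative per-shard short circuit breaker (not LINE's actual code).
// When a response is slow, the shard is marked failed for a short cool-down
// window, so callers fail fast instead of queueing behind the busy instance.
public class ShardBreaker {
    private static final long SLOW_THRESHOLD_MS = 50;   // assumed tuning value
    private static final long COOL_DOWN_MS = 1_000;     // assumed tuning value

    private final AtomicLong failedUntil = new AtomicLong(0);

    public boolean allowRequest() {
        return System.currentTimeMillis() >= failedUntil.get();
    }

    public void recordResponseTime(long elapsedMs) {
        if (elapsedMs > SLOW_THRESHOLD_MS) {
            // Temporarily mark the shard as failed; requests skip it until this expires.
            failedUntil.set(System.currentTimeMillis() + COOL_DOWN_MS);
        }
    }
}

A caller would check allowRequest() before issuing a Redis command and, on fast failure, fall back to the primary storage instead of blocking a thread.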
Redis as primary storage

[Diagram: the backend application writes to the Redis storage cluster (primary storage) and syncs data asynchronously to HBase through an asynchronous task processor on Apache Kafka, with retries.]

› No transactions between Redis and HBase.
› Suffers from race conditions: both storages must be kept consistent over time.
› Still everything in memory (expensive).
› Big technical debt.
HBase as primary storage

[Diagram: the backend application writes to the Apache HBase cluster (primary storage) and syncs data asynchronously to the Redis cache cluster through an asynchronous task processor on Apache Kafka, with retries; a sketch of such a worker follows.]

› Single source of truth.
› Reduced memory usage.
› No data loss, thanks to HBase persistence and HDFS data replication.
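A minimal sketch of the asynchronous task processor under the assumptions above: cache-sync tasks flow through Kafka and are retried on failure. The topic name, CacheClient interface, and retry-by-re-enqueue strategy are assumptions for illustration; only the Kafka client calls are real API:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Illustrative cache-sync worker (not LINE's actual code): consume sync tasks
// from Kafka, apply them to the Redis cache, re-enqueue failed tasks for retry.
public class CacheSyncWorker {
    interface CacheClient { void apply(String key, String value) throws Exception; }

    public static void run(Properties props, CacheClient redis) {
        // props is assumed to carry bootstrap servers and (de)serializer settings
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            consumer.subscribe(List.of("cache-sync-tasks"));
            while (true) {
                for (ConsumerRecord<String, String> task : consumer.poll(Duration.ofSeconds(1))) {
                    try {
                        redis.apply(task.key(), task.value()); // mirror the HBase write into Redis
                    } catch (Exception e) {
                        // Failed tasks go back onto the topic and are retried later.
                        producer.send(new ProducerRecord<>("cache-sync-tasks", task.key(), task.value()));
                    }
                }
            }
        }
    }
}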
HBase requirements and standards

› HBase became the primary storage for our core features, with a bigger impact on user experience.
› New features are built on top of HBase.
› It started to be used not only for messaging but also for new services and modules (approx. 200M active users).
› Redis, as a cache, still serves most of the reads.
LINE Messaging service storage requirements

› Data consistency, and reduced risk of data loss.
› High availability, by dualizing clusters in critical areas.
› Make our cluster more reliable and performant: evaluate every version, setting, and feature we use.
› Performance depends on how we use it: how we model and access data plays an important role.
Evaluate new versions and features

[Diagram: the backend application's RPCs (Put, Get, Scan, …) against the production HBase cluster are mirrored by a Replayer consumer and replayed against a shadow HBase cluster in a test environment; a sketch of such a replayer follows.]

› Replay RPCs.
› Deploy the new version or feature.
› Detect and fix bugs, backport, etc.
› Contribute to the open source project.
› Safely apply on production!
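A sketch of what a replayer consumer could look like. The LoggedRpc record and the way RPCs are captured and transported are assumptions; only the HBase client calls are real API:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

// Illustrative replayer (not LINE's actual code): mirrored production RPCs are
// re-issued against the shadow (test-env) cluster to exercise the new version.
public class ShadowReplayer {
    record LoggedRpc(String type, String table, byte[] row,
                     byte[] family, byte[] qualifier, byte[] value) {}

    public static void replay(Connection shadowConn, LoggedRpc rpc) throws Exception {
        try (Table table = shadowConn.getTable(TableName.valueOf(rpc.table()))) {
            switch (rpc.type()) {
                case "Put" -> table.put(new Put(rpc.row())
                        .addColumn(rpc.family(), rpc.qualifier(), rpc.value()));
                case "Get" -> table.get(new Get(rpc.row()));
                case "Scan" -> {
                    try (ResultScanner rs = table.getScanner(new Scan().withStartRow(rpc.row()))) {
                        rs.forEach(r -> {}); // drain; we only care about shadow-cluster behavior
                    }
                }
                default -> { /* other RPC types omitted in this sketch */ }
            }
        }
    }
}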
Evaluate new versions and features

[Diagram: the same RPC replay setup (Put, Get, Scan, …), mirroring RPCs from a production HBase RegionServer to a test-env RegionServer via the Replayer consumer.]

Recent work: improve cluster reliability by reducing performance spikes:
› Access to disk is more unpredictable than access to memory.
› Networks can be occasionally unstable.
Testing Hedged Reads

› Hedged Reads may not trigger under usual traffic: we need to simulate a slow network or a flaky disk.
› We can use LD_PRELOAD, which tells the Unix dynamic linker to load our code before any other library.
› In our case, we want to sleep whenever read(2) is called.

#define _GNU_SOURCE
#include <dlfcn.h>
#include <unistd.h>

// http://man7.org/linux/man-pages/man2/read.2.html
typedef ssize_t (*real_read_t)(int, void *, size_t);

ssize_t read(int fd, void *data, size_t size) {
    // Our malicious code: simulate a slow disk / network
    sleep(3);
    // Behave just like the regular syscall would
    return ((real_read_t)dlsym(RTLD_NEXT, "read"))(fd, data, size);
}

Compile it as a shared library:

gcc -shared -fPIC -o inject_read.so inject_read.c -ldl

Then add to your hadoop-env.sh:

export LD_PRELOAD=${path_to_file}/inject_read.so
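To check the injection before pointing it at Hadoop, one can preload the library into any dynamically linked binary that calls read(2); this quick smoke test is our suggestion, not from the original slides:

LD_PRELOAD=./inject_read.so cat /etc/hostname

The output should appear only after the injected 3-second delay.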
Thread dump on the RegionServer (abridged):

   - parking to wait for <0x00007f5395078198> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:724)
   …

"RpcServer.default.RWQ.Fifo.read.handler=309,queue=26,port=11471" …
   java.lang.Thread.State: WAITING (parking)
   - parking to wait for <0x00007f5395078198> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
   …

"RpcServer.default.RWQ.Fifo.read.handler=330,queue=34,port=11471" #378 daemon prio=5 os_prio=0 tid=0x00007f63afa57000 nid=0xce06 waiting on condition [0x00007f52bbf01000]
   java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for <0x00007f55a244c520> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
   at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
   at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:193)
   at org.apache.hadoop.hdfs.DFSInputStream.getFirstToComplete(DFSInputStream.java:1435)
   at org.apache.hadoop.hdfs.DFSInputStream.hedgedFetchBlockByteRange(DFSInputStream.java:1400)
   at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1538)
   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1507)
   …
Testing Hedged Reads: deadlock

1. HBase acquires the read lock: lock.readLock().lock();
2. The HDFS client submits a task to read from a Datanode.
3. No actual result comes back from the Datanode.
4. The results blocking queue is empty, so the read lock is never released!

Simplified read path (a runnable reproduction follows):

hedgedService.submit(readTask);
future = hedgedService.poll(timeout);  // consumes the (failed) task, if any
try {
    if (future != null) future.get();
} catch (ExecutionException e) {
    // Ignore
}
result = hedgedService.take();  // BlockingQueue.take()!!
// hangs forever because there is no completed task in the BlockingQueue

We were affected by HDFS-11303 "Hedged read might hang infinitely if read data from all DN failed", and backported HDFS-11303 to our internal LINE Hadoop branch.
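A self-contained reproduction of the hang pattern, under the simplification above (this models the buggy logic, it is not the actual HDFS source):

import java.util.concurrent.*;

// The only submitted task fails; poll() consumes its failed Future, the failure
// is ignored, and the subsequent take() blocks forever on an empty queue.
public class HedgedReadHang {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        ExecutorCompletionService<byte[]> hedgedService = new ExecutorCompletionService<>(pool);

        hedgedService.submit(() -> { throw new java.io.IOException("all DN reads failed"); });

        Future<byte[]> failed = hedgedService.poll(100, TimeUnit.MILLISECONDS);
        try {
            if (failed != null) failed.get();   // surfaces the IOException...
        } catch (ExecutionException e) {
            // ...which is ignored, just like in the buggy read path
        }

        System.out.println("calling take() with nothing left in the queue...");
        hedgedService.take();                   // hangs forever: no task will ever complete
    }
}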
› Let the community know about it: HBASE-24469 Hedged read might hang infinitely if read data from all DN failed.
› Fixed metrics: the hedged-read metrics are mentioned in the official book, so they should be there! We spent lots of time figuring out whether our testing process or our settings were wrong.
› Added the metrics back to HBase 1.x: HBASE-24435 Bring back hedged reads metrics to branch-1.
› Exposed a new metric on the main branch: HBASE-24994 Add hedgedReadOpsInCurThread metric.
Evaluate new versions and features

Made our clusters stronger by:

› Fixed:
  › HBASE-23205 Correctly update the position of WALs currently being replicated
  › HBASE-22715 All scan requests should be handled by scan handler threads in RWQueueRpcExecutor
  › HBASE-21418 Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row
  › HBASE-24994 Add hedgedReadOpsInCurThread metric
  › HBASE-24435 Bring back hedged reads metrics to branch-1
  › HBASE-24402 Moving the meta region causes MetricsException when using above 2.6.0 hadoop version
  › …
› Reported:
  › HBASE-24903 'scandetail' log message is missing when responseTooSlow happens in the rpc that closes the scanner
  › HBASE-21738 Remove all the CSLM#size operation in our memstore because it's an quite time consuming
  › HBASE-24469 Hedged read might hang infinitely if read data from all DN failed
  › …
› Backports to our branches:
  › HBASE-24742 Improve performance of SKIP vs SEEK logic
  › HBASE-24282 'scandetail' log message is missing when responseTooSlow happens on the first scan rpc call
  › HBASE-21748 Remove all the CSLM#size operation in our memstore because it's an quite time consuming
  › HDFS-11303 Hedged read might hang infinitely if read data from all DN failed
  › …
The importance of a good schema

› The cluster can be healthy, yet we could still see bad performance.
› Table schema design plays an important role in performance.
› It is important to understand the technology's internals.
Study case: Message Id list table

MessageIdList table: a list of message Ids per chat.

[Chat screenshot: users A and B exchange messages ("WOW", "Awesome!"), each marked "Read 17:23 PM".]

› Key: userId1 : userId2; values: messageId 7, messageId 11, …
› Event "mark as read": user A reads up to messageId 11.
› We want to scan by messageId range.
Study case: Message Id list table

MessageIdList table schema:

› Row key: hash(userIdA):userIdB, with userId A < userId B; e.g. hash(111):222, hash(333):555. The hash prefix spreads rows across regions (e.g. RegionServer1, regionB).
› One column family, with one column per messageId (col: messageId).
› Cell value: senderUserId; cell version: messageId.
› Using the messageId as the version allows us to do range scans, e.g. scan from version 123 to version 245 (see the sketch below).
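A sketch of the range scan this schema enables (table and row names are illustrative, taken from the example above). Because the cell version is the messageId, setTimeRange() turns a "messageId range" query into a plain versioned read on a single row:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class MessageIdRangeScan {
    public static void scanRange(Connection conn) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("MessageIdList"))) {
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("hash(333):555"))
                    .withStopRow(Bytes.toBytes("hash(333):555"), true) // single row, inclusive
                    .setTimeRange(123L, 246L);  // versions 123..245 (max is exclusive)
            try (ResultScanner results = table.getScanner(scan)) {
                results.forEach(r -> System.out.println(r)); // each cell: sender, messageId
            }
        }
    }
}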
Study case: Message Id list table

Sorted MemStore: an in-memory sorted store implemented as a SkipList.

[Diagram: a skip list with several levels of forward pointers ending in ∞ sentinels, shown looking up key 8.]

› Insert / Delete / Read: O(log N) time complexity.
› Scales better than a simple linked list!
Study case: Message Id list table

Memstore: SkipList. Scan row key "hash(333):555" from version 1000 to version 1010.

[Diagram: the SkipList holds cells keyed by (RowKey, Column, Version = messageId); whenever a column's version falls outside 1000 <= version <= 1010, the scanner seeks to the next column: "hope you have more luck in the next column".]
Study case: Message Id list table

Memstore: SkipList. Scan row key "hash(333):555" from version 1000 to version 1010.

› Each seek to the next column is implemented with ConcurrentSkipListMap.tailMap, which needs to traverse the SkipList: O(log N).
› With M columns to check, the scan costs M reseeks: O(M * log N) (see the toy model below).
› Tried to fix: HBASE-21418 Reduce a number of reseek operations in MemstoreScanner when seek point is close to the current row.
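A toy model of that cost, assuming the MemStore is a ConcurrentSkipListMap keyed by "column/version" strings (this is a simplification, not HBase code). Each reseek is a fresh tailMap() lookup, i.e. a full skip-list descent, even when the target is the very next entry:

import java.util.concurrent.ConcurrentSkipListMap;

public class ReseekCost {
    public static void main(String[] args) {
        ConcurrentSkipListMap<String, Long> memstore = new ConcurrentSkipListMap<>();
        for (int col = 0; col < 1000; col++) {
            memstore.put(String.format("col%04d/v%d", col, 1000 + col), (long) col);
        }
        // Reseek-style access: one O(log N) skip-list descent per column.
        int reseeks = 0;
        for (String key = memstore.firstKey(); key != null; reseeks++) {
            var next = memstore.tailMap(key, false);   // descends from the top every call
            key = next.isEmpty() ? null : next.firstKey();
        }
        System.out.println("reseeks: " + reseeks);     // M descents, O(M * log N) overall
    }
}

HBASE-21418's idea is essentially to skip the descent when the seek point is close to the current position and just step forward instead.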
The importance of a good schema

Study case: Message Id list table

On disk (HFile), the same scan (row key "hash(333):555", versions 1000 to 1010) works like this:

› Load the data block from disk into a ByteBuffer, starting at position 0.
› blockSeek then advances cell by cell from position 0 until it reaches the first matching cell (e.g. position 780: include; position 790: include; …).
› This advance is linear within the block: O(N) (see the sketch below).
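A toy model of that linear walk, using a simplified cell layout of [keyLen][key bytes] rather than the real HFile format (the format details here are assumptions for illustration):

import java.nio.ByteBuffer;

public class BlockSeekSketch {
    // Cells sit back to back in the block, so finding the first matching key
    // can only advance cell by cell from position 0: an O(N) walk in the block.
    static int blockSeek(ByteBuffer block, byte[] targetKey) {
        block.position(0);
        while (block.hasRemaining()) {
            int pos = block.position();
            int keyLen = block.getInt();            // length-prefixed cell key
            byte[] key = new byte[keyLen];
            block.get(key);                         // advance past this cell
            if (java.util.Arrays.compare(key, targetKey) >= 0) {
                return pos;                         // e.g. position 780 in the slide
            }
        }
        return -1;                                  // key not in this block
    }

    public static void main(String[] args) {
        ByteBuffer block = ByteBuffer.allocate(64);
        for (String k : new String[] {"a", "b", "c"}) {
            byte[] kb = k.getBytes();
            block.putInt(kb.length).put(kb);
        }
        block.flip();
        System.out.println(blockSeek(block, "b".getBytes())); // prints 5: offset of cell "b"
    }
}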
› We want to offer the best to our users while avoiding unnecessary cost.
› Storage needs to be performant, reliable, highly available, and scalable.
› We need to protect our data against inconsistencies.
› Make our clusters reliable:
  › Test and evaluate every version or feature carefully.
  › Build a safe testing environment as similar to production as possible.
› Good data schema design is key for performance:
  › You must understand the technology's internals to make a good design.
Multi Data Center architecture:

[Diagram: Apache HBase in the JP1 DC replicating asynchronously, with some latency, to Apache HBase in the JP2 DC; eventual consistency.]

› Disaster recovery only, for now: machines are underutilized.
› The nature of a messaging service makes active-active multi-DC very challenging.

Adapt better to projects with different needs; key/value storage limitations:
› Some projects have no such high performance requirements, but need better consistency and transactional features.
› They still require scalability.
› We need to consider and explore new storages.