Handle a huge traffic of messages with HBase

LINE Thailand Developer Conference 2019
https://www.facebook.com/events/410021356453349/

LINE Developers

June 04, 2019

Transcript

  1. HANDLE A HUGE TRAFFIC OF MESSAGES WITH HBASE. Pichet Itngam, Software Engineer, LINE Thailand
  2. (Diagram) Old Platform: messages flow between users and businesses.
  3. (Diagram) Old Platform: ~44 million users exchange messages with businesses; the messages are stored in an RDBMS.
  4. (Diagram) Old Platform: Thailand. OA+ Platform: Thailand plus several "Other" entries.
  5. (Diagram) OA+ Platform: > 200 million users exchange messages with businesses. RDBMS?
  6. 2V PROBLEMS: VOLUME, VELOCITY
  7. RDBMS - VOLUME? RDBMS limitations:
      Maximum size for a table? 32 TB
      Maximum size for a row? 400 GB
  8. RDBMS - VELOCITY? (Diagram) Replication to slaves and multi-master setups.
      Read (scalable), Write (bottleneck)
      Scalability problem
      Replication lag problem
  9. RDBMS? VOLUME, VELOCITY
  10. 2V PROBLEMS: VOLUME, VELOCITY
  11. TECHNICAL CHALLENGES
      High volume of messages
      High velocity of messages
      Read and write instantly
      Horizontal scalability
      High availability
  12. HBase
  13. HBase Data Model
  14. HBase Data Model (table diagram): each row has a Row Key (ra, rb, rc); each Column Family (CF1) groups Column Qualifiers (colA, colB, colC); cells hold values (val), and not every row fills every column. Flexible schema.
  15. Let's see an example
  16. HBase Data Model. Table: users, with column families i and s.
      create 'users', 'i', 's'
      (arguments: table name, column family, column family)
  17. HBase Data Model. Table: users. Row key 11 gets i:first = Pichet and i:last = Itngam.
      put 'users', '11', 'i:first', 'Pichet'
      put 'users', '11', 'i:last', 'Itngam'
      (arguments: table name, row key, column family:column qualifier, value)
  18. HBase Data Model. Table: users (table diagram). Row keys: 11, 12, 13. Column family i: first, middle, last. Column family s: friends, relatives, wives. Values shown: Pichet, Itngam, 50, 1, Geza, 4, Chris, Evans, 102, 20, Ben, Affleck.
  19. How about the messages table?
  20. HBase Data Model. Table: messages, with column families m and o.
      create 'messages', 'm', 'o'
      (arguments: table name, column family, column family)
  21. HBase Data Model. Table: messages. Row key bucket01 gets m:m1 = Hello.
      put 'messages', 'bucket01', 'm:m1', 'Hello'
      (arguments: table name, row key, column family:column qualifier, value)
  22. HBase Data Model. Table: messages. Row bucket01 now holds m1, m2 and m3 (shown as Hello, I want to.., How mu..).
      put 'messages', 'bucket01', 'm:m2', 'Hi'
      put 'messages', 'bucket01', 'm:m3', 'Hey'
  23. HBase Data Model. Table: messages. A second row, bucket02, is added (shown as Here is .., Thank y..).
      put 'messages', 'bucket02', 'm:m1', 'How'
      put 'messages', 'bucket02', 'm:m2', 'Why'
  24. HBase Data Model. Table: messages (table diagram). Row keys: bucket01, bucket02, bucket03. Column family m: m1, m2, m3. Column family o: view, click, mark. Values shown: Hello, How mu.., 2450, yes, Thank y.., 10, Shipping.., I got it.., 415, 30, I want to.., Here is .., Do you.. Messages are separated into buckets, like pagination.
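The row key / column family / column qualifier layout on the slides above can be sketched with nested dictionaries. This is only a toy in-memory model for illustration, not the HBase client API: the `Table` class and its `put`/`get` methods are hypothetical names, real HBase stores raw bytes and versions each cell, and the `o:view` value is illustrative.

```python
from collections import defaultdict


class Table:
    """Toy in-memory stand-in for an HBase table (hypothetical, not the real API)."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)  # column families are fixed at creation
        # row key -> column family -> column qualifier -> value
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value):
        # "m:m1" -> family "m", qualifier "m1"; qualifiers need no declaration
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def get(self, row_key):
        # Return a plain-dict copy of the row, or {} when the row is absent
        if row_key not in self.rows:
            return {}
        return {cf: dict(cols) for cf, cols in self.rows[row_key].items()}


# Mirrors: create 'messages', 'm', 'o' and the put commands on the slides
messages = Table("messages", "m", "o")
messages.put("bucket01", "m:m1", "Hello")
messages.put("bucket01", "m:m2", "Hi")
messages.put("bucket01", "m:m3", "Hey")
messages.put("bucket02", "m:m1", "How")
messages.put("bucket01", "o:view", 2450)  # illustrative counter value
```

Only the column families are declared up front; each row is free to carry its own set of qualifiers, which is what the slides call a flexible schema.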
  25. HBase Architecture
  26. HBase Architecture (diagram). Master Servers: three ZooKeeper nodes, two HMasters (labeled active), Name Node. Slave Servers: many Region Servers, each paired with a Data Node (HDFS).
  27. HIGH VOLUME OF MESSAGES; HIGH VELOCITY OF MESSAGES; READ AND WRITE INSTANTLY; HORIZONTAL SCALABILITY; HIGH AVAILABILITY
  28. HBase Architecture (diagram, as on slide 26). Master Servers: ZooKeeper, HMaster, Name Node. Slave Servers: Region Servers, each paired with a Data Node (HDFS).
  29. (Diagram) ZooKeeper and HMaster above two Region Servers. The first Region Server holds one Region (1GB) containing bucket01, bucket02, bucket03; the second holds no regions yet. Each Region Server: max 1,000 regions.
  30. (Diagram) The Region grows: it now contains bucket01 to bucket06 (1GB). Each Region Server: max 1,000 regions.
  31. (Diagram) The Region splits: the first Region Server now holds two Regions (1GB each), one with bucket01 to bucket03 and one with bucket04 to bucket06.
  32. (Diagram) The Region with bucket04 to bucket06 now sits on the second Region Server; the first keeps the Region with bucket01 to bucket03.
  33. (Diagram) In general: each Region Server holds several Regions (1GB each), each Region holds a set of row keys, and each Region Server holds max 1,000 regions.
  34. HBase Architecture (diagram). Master Servers: ZooKeeper, HMaster, Name Node. Slave Servers: Region Servers paired with Data Nodes (HDFS), each Region Server now holding multiple Regions.
  35. HIGH VOLUME OF MESSAGES: solved by region splits when data grows too large. Remaining list: HIGH VELOCITY OF MESSAGES; READ AND WRITE INSTANTLY; HORIZONTAL SCALABILITY; HIGH AVAILABILITY
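The region split shown on slides 29 to 33 can be sketched as follows. This is a simplified model under stated assumptions: `MAX_REGION_BYTES` is a toy threshold standing in for the 1GB label on the slides, the `Region` class and `put` helper are hypothetical, and the split point is simply the middle row key. Real HBase chooses split points and thresholds differently.

```python
MAX_REGION_BYTES = 1_000  # toy threshold standing in for the 1GB on the slides


class Region:
    """Hypothetical region: a contiguous, sorted range of row keys."""

    def __init__(self, rows=None):
        self.rows = dict(rows or {})  # row key -> payload (bytes)

    def size(self):
        return sum(len(payload) for payload in self.rows.values())

    def split(self):
        """Split into two regions at the middle row key (in sorted order)."""
        keys = sorted(self.rows)
        mid = len(keys) // 2
        left = Region({k: self.rows[k] for k in keys[:mid]})
        right = Region({k: self.rows[k] for k in keys[mid:]})
        return left, right


def put(regions, row_key, payload):
    """Write to the region owning row_key, then split it if it grew too large."""
    # Regions stay sorted by key range: pick the last region whose first key
    # is <= row_key (the first region also takes any smaller keys).
    idx = 0
    for i, region in enumerate(regions):
        if region.rows and min(region.rows) <= row_key:
            idx = i
    regions[idx].rows[row_key] = payload
    if regions[idx].size() > MAX_REGION_BYTES and len(regions[idx].rows) > 1:
        regions[idx:idx + 1] = regions[idx].split()  # one region becomes two


regions = [Region()]
for n in range(1, 7):
    put(regions, f"bucket{n:02d}", b"x" * 300)  # six buckets of 300 bytes each
```

After the loop the single starting region has become three smaller ones, each under the threshold, which is the sense in which splitting keeps any one region from growing without bound.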
  36. HIGH VELOCITY OF MESSAGES; READ AND WRITE INSTANTLY; HORIZONTAL SCALABILITY; HIGH AVAILABILITY
  37. HBase Architecture (diagram). Master Servers: ZooKeeper, HMaster, Name Node. Slave Servers: Region Servers paired with Data Nodes (HDFS), each Region Server holding multiple Regions.
  38. (Diagram) ZooKeeper and HMaster above the Region Servers and their Regions. The HMaster assigns regions to region servers. The Client:
      1) Asks where .META is located
      2) Gets the Region Server from .META (then cached). In .META, the row key is table,key,region and the value is the Region Server.
      3) Puts or gets the row from the Region Server directly
  39. (Diagram) Clients put messages directly to the Region Servers holding the target Regions.
  40. (Diagram) Write path. A Region Server holds a Region with its Memstores, plus a WAL (Write Ahead Log); HFiles live on the HDFS Data Node.
      1) The Client puts messages directly to the Region Server
      2) Write: to the WAL (for recovery) and to the Memstore (all changes written in memory)
      3) Acknowledge
  41. (Diagram) One Memstore per column family; Memstores flush messages to disk as HFiles. Each entry is a key and a value.
      Memstore for column family m:
        Key       CF:CQ    Value  Version
        bucket01  m:m1     hello  v1
        bucket01  m:m2     hi     v1
        bucket01  m:m3     hey    v1
      Memstore for column family o:
        Key       CF:CQ    Value  Version
        bucket01  o:view   243    v1
        bucket01  o:click  1      v1
        bucket01  o:mark   true   v1
      Short column family names reduce the size of each key.
  42. HIGH VELOCITY OF MESSAGES: solved by reading and writing directly against the Region Server, combined with the power of the Memstore. Remaining list: HIGH VOLUME OF MESSAGES; READ AND WRITE INSTANTLY; HORIZONTAL SCALABILITY; HIGH AVAILABILITY
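The write path from slides 40 and 41 (append to the WAL, write to the per-column-family Memstore, acknowledge, flush to an HFile later) can be sketched like this. It is a toy model, not HBase code: the `RegionServer` class is hypothetical, lists and dicts stand in for the log and for files on HDFS, and the count-based `flush_threshold` is an assumption made purely to trigger a flush in a small example.

```python
class RegionServer:
    """Hypothetical region server showing WAL + Memstore + flush."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # append-only log, kept for recovery
        self.memstore = {}   # column family -> {(row, qualifier): value}
        self.hfiles = []     # immutable sorted snapshots, standing in for disk
        self.flush_threshold = flush_threshold

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        # 2) Write: first the WAL, then the in-memory store for this family
        self.wal.append((row, column, value))
        store = self.memstore.setdefault(family, {})
        store[(row, qualifier)] = value
        if len(store) >= self.flush_threshold:
            self.flush(family)
        # 3) Acknowledge without waiting for any HFile write
        return "ack"

    def flush(self, family):
        # Move the family's cells out of memory into a sorted, immutable file
        cells = self.memstore.pop(family, {})
        self.hfiles.append((family, sorted(cells.items())))


rs = RegionServer(flush_threshold=3)
acks = [
    rs.put("bucket01", "m:m1", "Hello"),
    rs.put("bucket01", "m:m2", "Hi"),
    rs.put("bucket01", "o:view", 243),
    rs.put("bucket01", "m:m3", "Hey"),  # third "m" cell triggers a flush
]
```

The client gets its acknowledgement as soon as the edit is in the log and in memory; the slower write of a sorted file happens in bulk, off the request path.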
  43. READ AND WRITE INSTANTLY; HORIZONTAL SCALABILITY; HIGH AVAILABILITY; HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES
  44. (Diagram) Read path. A Region Server holds a Region with a BlockCache and a Memstore; HFiles live on the HDFS Data Node. When the Client gets messages:
      1) Look up the row cell in the BlockCache
      2) Look in the Memstore for recent changes
      3) If not found in either the BlockCache or the Memstore, get the messages from the HFile
      4) Cache the key and value in the BlockCache (LRU eviction)
  45. READ AND WRITE INSTANTLY: solved by BlockCache and Memstore. Remaining list: HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES; HORIZONTAL SCALABILITY; HIGH AVAILABILITY
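The four read steps on slide 44 can be sketched with an `OrderedDict` acting as an LRU cache. This is an illustrative model only: the `ReadPath` class, the `disk_reads` counter and the string keys are hypothetical, and dropping a cached entry on `put` is an assumption added so the toy never serves a stale value. Real HBase merges results from its cache, memory and files rather than checking them strictly in turn.

```python
from collections import OrderedDict


class ReadPath:
    """Hypothetical read path: BlockCache, then Memstore, then HFile."""

    def __init__(self, hfile, cache_size=2):
        self.hfile = dict(hfile)          # data already flushed to disk
        self.memstore = {}                # recent writes, not yet flushed
        self.block_cache = OrderedDict()  # LRU order: oldest entry first
        self.cache_size = cache_size
        self.disk_reads = 0

    def put(self, key, value):
        self.memstore[key] = value
        self.block_cache.pop(key, None)   # assumption: drop any stale cached copy

    def get(self, key):
        if key in self.block_cache:                 # 1) BlockCache
            self.block_cache.move_to_end(key)       # mark as recently used
            return self.block_cache[key]
        if key in self.memstore:                    # 2) Memstore
            return self.memstore[key]
        if key not in self.hfile:                   # 3) HFile (miss everywhere)
            return None
        self.disk_reads += 1
        value = self.hfile[key]
        self.block_cache[key] = value               # 4) cache key and value
        if len(self.block_cache) > self.cache_size:
            self.block_cache.popitem(last=False)    # evict least recently used
        return value


store = ReadPath({"k1": "Hello", "k2": "Hi", "k3": "Hey"}, cache_size=2)
store.put("k4", "Why")
first = store.get("k1")    # read from the HFile, then cached
second = store.get("k1")   # served from the BlockCache
store.get("k2")
store.get("k3")            # cache is full, so k1 is evicted
recent = store.get("k4")   # served from the Memstore
```

Repeated reads of hot keys and reads of freshly written keys both avoid the disk, which is the point the "Solved by BlockCache and Memstore" slide makes.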
  46. HORIZONTAL SCALABILITY; READ AND WRITE INSTANTLY; HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES; HIGH AVAILABILITY
  47. HBase Architecture (diagram). Master Servers: ZooKeeper, HMaster, Name Node. Slave Servers: Region Servers paired with Data Nodes (HDFS), each Region Server holding multiple Regions.
  48. (Diagram) Clients get/put messages directly from/to the Region Servers holding the target Regions.
  49. Horizontal scalability (linear). (Diagram) A second, identical set of Region Servers and Data Nodes (HDFS) is added; clients get/put messages directly against both sets.
  50. HORIZONTAL SCALABILITY: solved by distributed computing in the Hadoop ecosystem. Remaining list: READ AND WRITE INSTANTLY; HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES; HIGH AVAILABILITY
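Direct get/put scales with the number of Region Servers because each client routes a row key to the server owning it, using the .META lookup and client-side cache described on slide 38. A minimal sketch, assuming regions are identified by their start key: the `Meta` and `Client` classes, the server names `rs1` to `rs3` and the `lookups` counter are all hypothetical and do not reflect the real .META schema or client API.

```python
import bisect


class Meta:
    """Toy stand-in for .META: maps each region start key to a region server."""

    def __init__(self, assignments):
        self.entries = sorted(assignments)  # list of (start_key, server)
        self.starts = [start for start, _ in self.entries]
        self.lookups = 0

    def locate(self, row_key):
        """Return (start_key, end_key, server) for the region owning row_key."""
        self.lookups += 1
        # The owning region is the last one whose start key is <= row_key
        idx = max(bisect.bisect_right(self.starts, row_key) - 1, 0)
        start, server = self.entries[idx]
        end = self.starts[idx + 1] if idx + 1 < len(self.starts) else None
        return start, end, server


class Client:
    def __init__(self, meta):
        self.meta = meta
        self.cache = []  # region locations already fetched from Meta

    def server_for(self, row_key):
        for start, end, server in self.cache:
            if start <= row_key and (end is None or row_key < end):
                return server                  # cache hit: no Meta lookup
        location = self.meta.locate(row_key)   # cache miss: ask Meta once
        self.cache.append(location)
        return location[2]


# Three regions spread over three servers; "" is the lowest possible start key
meta = Meta([("", "rs1"), ("bucket03", "rs2"), ("bucket05", "rs3")])
client = Client(meta)
routes = [client.server_for(f"bucket{n:02d}") for n in range(1, 7)]
```

In this sketch, six requests need only three lookups, and traffic for different key ranges lands on different servers. Adding a server would amount to assigning some regions to it in `Meta`.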
  51. HIGH AVAILABILITY; HORIZONTAL SCALABILITY; READ AND WRITE INSTANTLY; HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES
  52. HBase Architecture (diagram). Master Servers: ZooKeeper, HMaster, Name Node. Slave Servers: Region Servers paired with Data Nodes (HDFS), each Region Server holding multiple Regions.
  53. (Diagram) Three ZooKeeper nodes and two HMasters (labeled active) above the Region Servers and their Regions.
  54. (Diagram) A Region Server holds a Region (BlockCache, Memstore) and a WAL; its HFile is on the HDFS primary Data Node. The HFile is replicated to the HDFS secondary and tertiary Data Nodes, which sit with other Region Servers.
  55. (Diagram) Two Region Servers, each with a Region (BlockCache, Memstore), a WAL and an HFile on an HDFS Data Node. ZooKeeper and HMaster: No Heartbeat, then Start recovery process.
  56. HIGH AVAILABILITY: solved by ZooKeeper, HDFS replication and WAL. Remaining list: HORIZONTAL SCALABILITY; READ AND WRITE INSTANTLY; HIGH VELOCITY OF MESSAGES; HIGH VOLUME OF MESSAGES
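The "No Heartbeat, Start recovery process" step on slide 55 can be sketched in two parts: detecting a server whose heartbeat has gone quiet, and replaying WAL entries to rebuild the Memstore contents that were lost with it. This is a sketch under stated assumptions, not HBase's recovery code: the function names, the sequence numbers, the `flushed_seq` marker and the timeout values are all hypothetical.

```python
def check_heartbeats(last_seen, now, timeout):
    """Return the servers whose last heartbeat is older than `timeout` seconds."""
    return sorted(server for server, t in last_seen.items() if now - t > timeout)


def replay_wal(wal, flushed_seq):
    """Rebuild a lost Memstore from WAL entries newer than the last flush."""
    memstore = {}
    for seq, row, column, value in wal:
        if seq > flushed_seq:  # older edits are already safe in HFiles
            memstore[(row, column)] = value  # later entries overwrite earlier ones
    return memstore


# Hypothetical WAL of (sequence number, row key, column, value)
wal = [
    (1, "bucket01", "m:m1", "Hello"),
    (2, "bucket01", "m:m2", "Hi"),
    (3, "bucket01", "m:m3", "Hey"),
    (4, "bucket01", "m:m3", "Hey!"),
]

# rs2 last reported 10 seconds ago, beyond the 5 second timeout
dead = check_heartbeats({"rs1": 100.0, "rs2": 91.0}, now=101.0, timeout=5.0)
recovered = replay_wal(wal, flushed_seq=1)
```

Flushed data survives because HFiles are replicated across Data Nodes, and unflushed data survives because every edit was logged before it was acknowledged, which is why the slide names ZooKeeper, HDFS replication and the WAL together.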
  57. TECHNICAL CHALLENGES
      High volume of messages
      High velocity of messages
      Read and write instantly
      Horizontal scalability
      High availability
  58. (Diagram) OA+ Platform: > 200 million users exchange messages with businesses; the messages are stored in HBase.
  59. HBASE
      Pros: designed for scale; scales automatically; built-in recovery; real-time read and write.
      Cons: requires resources; WAL recovery is slow; no fancy SQL, only CRUD; has "Scan", but with poor performance.
  60. RDBMS vs. HBase
      RDBMS: size of data < limit; read > write.
      HBase: size of data >= TB, PB; heavy read and/or write.
      Also listed: has resources; finalised business requirements.
  61. ACKNOWLEDGMENTS. LINE THAILAND DEVELOPER CONFERENCE 2019. ENDLESS_POSSIBILITIES_WITH_LINE_API
  62. THANK_YOU