Handle a huge traffic of messages with HBase

LINE Thailand Developer Conference 2019
https://www.facebook.com/events/410021356453349/

LINE Developers

June 04, 2019

Transcript

  1. RDBMS - VOLUME?
    RDBMS limitations:
    - Maximum size for a table? 32 TB
    - Maximum size for a row? 400 GB
  2. TECHNICAL CHALLENGES
    - High volume of messages
    - High velocity of messages
    - Read and write instantly
    - Horizontal scalability
    - High availability
  3. HBase Data Model
    [Diagram: a table with rows keyed ra, rb, rc; each row has column families (labeled CF1) containing column qualifiers colA, colB, colC, with one value per cell.]
    Labels: Row Key, Column Family, Column Qualifier. Flexible schema.
  4. HBase Data Model. Table: users
    create 'users', 'i', 's'
    Arguments: table name, column family, column family.
    [Diagram: Key column with column families i and s.]
  5. HBase Data Model. Table: users
    put 'users', '11', 'i:first', 'Pichet'
    put 'users', '11', 'i:last', 'Itngam'
    Arguments: table name, row key, column family:column qualifier, value.
    [Diagram: row key 11 with i:first = Pichet and i:last = Itngam; column family s is also shown.]
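  6. HBase Data Model. Table: users
    [Table diagram: row keys 11, 12, 13; column family i with qualifiers first, middle, last; column family s with qualifiers friends, relatives, wives. Cell values shown: Pichet, Itngam, 50, 1, Geza, 4, Chris, Evans, 102, 20, Ben, Affleck.]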
  7. HBase Data Model. Table: messages
    create 'messages', 'm', 'o'
    Arguments: table name, column family, column family.
    [Diagram: Key column with column families m and o.]
  8. HBase Data Model. Table: messages
    put 'messages', 'bucket01', 'm:m1', 'Hello'
    Arguments: table name, row key, column family:column qualifier, value.
    [Diagram: row key bucket01 with m:m1 = Hello; column family o is also shown.]
  9. HBase Data Model. Table: messages
    put 'messages', 'bucket01', 'm:m2', 'Hi'
    put 'messages', 'bucket01', 'm:m3', 'Hey'
    [Diagram: row key bucket01; column family m with m1 = Hello, m2 = I want to.., m3 = How mu..; column family o is also shown.]
  10. HBase Data Model. Table: messages
    put 'messages', 'bucket02', 'm:m1', 'How'
    put 'messages', 'bucket02', 'm:m2', 'Why'
    [Diagram: row key bucket01 with m1 = Hello, m2 = I want to.., m3 = How mu..; row key bucket02 with Here is .., Thank y..; column family o is also shown.]
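  11. HBase Data Model. Table: messages
    [Table diagram: row keys bucket01, bucket02, bucket03; column family m with qualifiers m1, m2, m3; column family o with qualifiers view, click, mark. Cell values shown: Hello, How mu.., 2450, yes, Thank y.., 10, Shipping.., I got it.., 415, 30, I want to.., Here is .., Do you..]
    Messages are separated into buckets, similar to pagination.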
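
The bucketing idea on slide 11 could be sketched as follows. This is only an illustration: the fixed bucket size of three (chosen to match the m1..m3 columns in the example) and the helper name `message_cell` are assumptions, and the talk does not say how buckets are actually sized or assigned.

```python
from typing import Tuple

BUCKET_SIZE = 3  # assumption: three messages per bucket, as in the m1..m3 example


def message_cell(seq: int, bucket_size: int = BUCKET_SIZE) -> Tuple[str, str]:
    """Map a 1-based message sequence number to (row key, column)."""
    if seq < 1:
        raise ValueError("seq must be >= 1")
    bucket = (seq - 1) // bucket_size + 1  # bucket number: 1, 2, 3, ...
    slot = (seq - 1) % bucket_size + 1     # position of the message inside its bucket
    return f"bucket{bucket:02d}", f"m:m{slot}"


if __name__ == "__main__":
    for seq in range(1, 6):
        print(seq, message_cell(seq))
```

Reading one "page" of a conversation then becomes a single-row Get on the bucket's row key instead of a Scan over many rows.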
  12. HBase Architecture
    [Diagram: Master Servers: three ZooKeeper nodes, two HMaster nodes (each labeled "active" in the transcript), and a Name Node. Slave Servers: a grid of Region Server + Data Node (HDFS) pairs.]
  13. Challenges to solve:
    - HIGH VOLUME OF MESSAGES
    - HIGH VELOCITY OF MESSAGES
    - READ AND WRITE INSTANTLY
    - HORIZONTAL SCALABILITY
    - HIGH AVAILABILITY
  14. HBase Architecture
    [Diagram repeated: Master Servers (three ZooKeeper nodes, two HMaster nodes, a Name Node) and Slave Servers (a grid of Region Server + Data Node (HDFS) pairs).]
  15. [Diagram: three ZooKeeper nodes and two HMaster nodes above two REGION SERVERs. The first REGION SERVER holds one REGION (labeled 1GB) containing bucket01, bucket02, bucket03. Each REGION SERVER: max 1,000 regions.]
  16. [Diagram: same layout. The REGION (labeled 1GB) on the first REGION SERVER has grown to contain bucket01 through bucket06. Each REGION SERVER: max 1,000 regions.]
  17. [Diagram: same layout. The first REGION SERVER now holds two REGIONs (each labeled 1GB): one with bucket01, bucket02, bucket03 and one with bucket04, bucket05, bucket06. Each REGION SERVER: max 1,000 regions.]
  18. [Diagram: same layout. The first REGION SERVER keeps the REGION (labeled 1GB) with bucket01, bucket02, bucket03; the REGION (labeled 1GB) with bucket04, bucket05, bucket06 now sits on the second REGION SERVER. Each REGION SERVER: max 1,000 regions.]
  19. [Diagram: the same layout, generalized. Each of the two REGION SERVERs holds two REGIONs (each labeled 1GB), and each REGION contains a range of {row key} entries. Each REGION SERVER: max 1,000 regions.]
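
The split sequence on slides 15 to 19 can be imitated with a toy model. In this sketch a region splits once it holds `SPLIT_AT` row keys; that row-count threshold is an assumption standing in for the 1GB size label on the slides, and the class and function names are hypothetical. Moving the new region to another region server (slide 18) is not modeled.

```python
from bisect import insort
from typing import List

SPLIT_AT = 6  # assumption: a row-count stand-in for the 1GB region size on the slides


class Region:
    """A contiguous, sorted range of row keys."""

    def __init__(self, keys: List[str]):
        self.keys = sorted(keys)

    def split(self) -> "Region":
        """Keep the lower half of the keys here and return the upper half as a new region."""
        mid = len(self.keys) // 2
        upper = Region(self.keys[mid:])
        self.keys = self.keys[:mid]
        return upper


def put(regions: List[Region], key: str) -> None:
    """Insert key into the region that covers it, splitting that region if it grows too large."""
    # Regions are kept in key order: pick the last region whose first key is <= key.
    target = regions[0]
    for region in regions:
        if region.keys and region.keys[0] <= key:
            target = region
    insort(target.keys, key)
    if len(target.keys) >= SPLIT_AT:
        regions.insert(regions.index(target) + 1, target.split())


if __name__ == "__main__":
    regions = [Region([])]
    for i in range(1, 7):
        put(regions, f"bucket{i:02d}")
    print([r.keys for r in regions])
```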
  20. HBase Architecture
    [Diagram: Master Servers (three ZooKeeper nodes, two HMaster nodes, a Name Node) and Slave Servers (Region Server + Data Node (HDFS) pairs), now with Regions drawn inside the Region Servers.]
  21. HIGH VOLUME OF MESSAGES
    Solved by region splits when data grows too large
    - HIGH VELOCITY OF MESSAGES
    - READ AND WRITE INSTANTLY
    - HORIZONTAL SCALABILITY
    - HIGH AVAILABILITY
  22. HIGH VELOCITY OF MESSAGES
    - READ AND WRITE INSTANTLY
    - HORIZONTAL SCALABILITY
    - HIGH AVAILABILITY
  23. HBase Architecture
    [Diagram repeated: Master Servers (three ZooKeeper nodes, two HMaster nodes, a Name Node) and Slave Servers (Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers).]
  24. [Diagram: three ZooKeeper nodes, two HMaster nodes, a Client, and Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers. Label on the master side: assign regions to region servers.]
    1) The client asks where .META is located.
    2) The client gets the region server for the row from .META (then caches it). In .META the row key has the form table,key,region and the value is the Region Server.
    3) The client Puts or Gets the row from that region server directly.
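
The three lookup steps above can be sketched as a client that consults a catalog once per region and then talks to the region server from its cache. This is a simplified model, not HBase client code: the `META` contents, the server names, and the `Client` class are all hypothetical, and the real .META row key (table,key,region) is reduced here to a region start key.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumption: a toy catalog of (region start key, region server), sorted by start key.
META: List[Tuple[str, str]] = [
    ("", "regionserver-1"),
    ("bucket04", "regionserver-2"),
    ("bucket07", "regionserver-3"),
]


class Client:
    def __init__(self, meta: List[Tuple[str, str]]):
        self._meta = meta
        # Cached region locations: (start key, end key or None if open-ended, server).
        self._cache: List[Tuple[str, Optional[str], str]] = []
        self.meta_lookups = 0

    def locate(self, row_key: str) -> str:
        """Return the region server responsible for row_key."""
        for start, end, server in self._cache:
            if start <= row_key and (end is None or row_key < end):
                return server  # cache hit: no catalog round trip
        # Cache miss: look up the catalog and remember the region's key range.
        self.meta_lookups += 1
        starts = [start for start, _ in self._meta]
        i = bisect_right(starts, row_key) - 1
        start, server = self._meta[i]
        end = self._meta[i + 1][0] if i + 1 < len(self._meta) else None
        self._cache.append((start, end, server))
        return server


if __name__ == "__main__":
    client = Client(META)
    print(client.locate("bucket01"), client.locate("bucket02"), client.meta_lookups)
```

Because the location is cached per region, repeated Puts and Gets for nearby row keys skip the catalog entirely, which is what lets clients go to region servers directly.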
  25. [Diagram: Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers. Four clients each "Put messages directly" to a region server.]
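  26. [Diagram: a Client, a REGION SERVER with a REGION holding two Memstores and a WAL (Write Ahead Log), and an HDFS DATA NODE holding HFiles.]
    1) The client puts messages directly to the region server.
    2) Write: the change goes to the WAL (for recovery) and to the Memstore (all changes are written in memory).
    3) Acknowledge.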
  27. [Diagram: a REGION SERVER with a REGION holding two Memstores and a WAL (Write Ahead Log), and an HDFS DATA NODE holding HFiles. Messages are flushed from the Memstores to disk.]
    Memstore for column family m (columns: Key, CF:CQ, version, value):
    - bucket01, m:m1, v1, hello
    - bucket01, m:m2, v1, hi
    - bucket01, m:m3, v1, hey
    Memstore for column family o (columns: Key, CF:CQ, version, value):
    - bucket01, o:view, v1, 243
    - bucket01, o:click, v1, 1
    - bucket01, o:mark, v1, true
    There is one Memstore per column family, holding key/value entries.
    A short column family name reduces the size of each key.
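
The write path on slides 26 and 27 can be sketched as below: append to the WAL, write to the Memstore of the cell's column family, acknowledge, and flush a Memstore to an immutable sorted file when it fills up. The class name `RegionStore` and the cell-count `FLUSH_THRESHOLD` are assumptions made for the example; the slides do not state a flush trigger.

```python
from typing import Dict, List, Tuple

FLUSH_THRESHOLD = 3  # assumption: flush after 3 cells per family, purely for illustration

Cell = Tuple[Tuple[str, str], str]  # ((row key, qualifier), value)


class RegionStore:
    def __init__(self) -> None:
        self.wal: List[Tuple[str, str, str]] = []                 # append-only log, used for recovery
        self.memstores: Dict[str, Dict[Tuple[str, str], str]] = {}  # one Memstore per column family
        self.hfiles: Dict[str, List[List[Cell]]] = {}             # flushed, sorted, immutable files

    def put(self, row: str, column: str, value: str) -> bool:
        family, qualifier = column.split(":", 1)
        self.wal.append((row, column, value))           # 2) write to the WAL ...
        store = self.memstores.setdefault(family, {})
        store[(row, qualifier)] = value                 # ... and to the family's Memstore
        if len(store) >= FLUSH_THRESHOLD:
            self._flush(family)
        return True                                     # 3) acknowledge

    def _flush(self, family: str) -> None:
        """Write the family's Memstore out as one sorted file and start a fresh Memstore."""
        cells = sorted(self.memstores[family].items())
        self.hfiles.setdefault(family, []).append(cells)
        self.memstores[family] = {}


if __name__ == "__main__":
    region = RegionStore()
    region.put("bucket01", "m:m1", "hello")
    region.put("bucket01", "o:view", "243")
    print(region.memstores)
```

Since every stored key repeats the family name, keeping it to a single letter such as m or o is a cheap way to shrink each entry, as the slide notes.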
  28. HIGH VELOCITY OF MESSAGES
    Solved by reading and writing directly against the region servers, combined with the power of the Memstore
    - HIGH VOLUME OF MESSAGES
    - READ AND WRITE INSTANTLY
    - HORIZONTAL SCALABILITY
    - HIGH AVAILABILITY
  29. [Diagram: a Client getting messages from a REGION SERVER with a REGION holding a BlockCache and a Memstore, and an HDFS DATA NODE holding an HFile.]
    1) Look up the row cell in the BlockCache.
    2) Look in the Memstore for recent changes.
    3) If it is found in neither the BlockCache nor the Memstore, read it from the HFile.
    4) Cache the key and value in the BlockCache (LRU eviction).
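
The read order on slide 29 can be sketched with an LRU cache in front of a Memstore and an HFile. This follows the slide's four steps literally and is not how a real region server is implemented (real HBase merges cell versions across these sources rather than returning the first hit). The `ReadPath` class, the dict standing in for an HFile, and the `cache_size` parameter are assumptions for the example.

```python
from collections import OrderedDict
from typing import Dict, Optional


class ReadPath:
    def __init__(self, hfile: Dict[str, str], cache_size: int = 2):
        self.block_cache: "OrderedDict[str, str]" = OrderedDict()  # least recently used first
        self.memstore: Dict[str, str] = {}
        self.hfile = hfile            # stand-in for data on disk
        self.cache_size = cache_size
        self.hfile_reads = 0

    def get(self, cell: str) -> Optional[str]:
        # 1) Look up the cell in the BlockCache.
        if cell in self.block_cache:
            self.block_cache.move_to_end(cell)  # mark as most recently used
            return self.block_cache[cell]
        # 2) Look in the Memstore for recent changes.
        if cell in self.memstore:
            return self.memstore[cell]
        # 3) Found in neither: read from the HFile.
        self.hfile_reads += 1
        value = self.hfile.get(cell)
        if value is not None:
            # 4) Cache the key and value, evicting the least recently used entry if full.
            self.block_cache[cell] = value
            if len(self.block_cache) > self.cache_size:
                self.block_cache.popitem(last=False)
        return value


if __name__ == "__main__":
    reader = ReadPath({"bucket01/m:m1": "hello"})
    print(reader.get("bucket01/m:m1"), reader.get("bucket01/m:m1"), reader.hfile_reads)
```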
  30. READ AND WRITE INSTANTLY
    Solved by BlockCache and Memstore
    - HIGH VELOCITY OF MESSAGES
    - HIGH VOLUME OF MESSAGES
    - HORIZONTAL SCALABILITY
    - HIGH AVAILABILITY
  31. HBase Architecture
    [Diagram repeated: Master Servers (three ZooKeeper nodes, two HMaster nodes, a Name Node) and Slave Servers (Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers).]
  32. [Diagram: Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers. Four clients each "Get/Put messages directly" on a region server.]
  33. Horizontal scalability (Linear)
    [Diagram: a second group of Region Server + Data Node (HDFS) pairs is added (+) next to the existing group. In both groups, clients "Get/Put messages directly" on the region servers.]
  34. HORIZONTAL SCALABILITY
    Solved by distributed computing in the Hadoop ecosystem
    - READ AND WRITE INSTANTLY
    - HIGH VELOCITY OF MESSAGES
    - HIGH VOLUME OF MESSAGES
    - HIGH AVAILABILITY
  35. HBase Architecture
    [Diagram repeated: Master Servers (three ZooKeeper nodes, two HMaster nodes, a Name Node) and Slave Servers (Region Server + Data Node (HDFS) pairs with Regions inside the Region Servers).]
  36. [Diagram: three ZooKeeper nodes and two HMaster nodes above a set of Region Servers, each containing several Regions.]
  37. [Diagram: a REGION SERVER with a REGION (BlockCache, Memstore), a WAL, and an HFile on its HDFS PRIMARY DATA NODE. The HFile is replicated to the HDFS SECONDARY DATA NODE and the HDFS TERTIARY DATA NODE on two other REGION SERVERs.]
  38. [Diagram: two REGION SERVERs, each with a REGION (BlockCache, Memstore), a WAL, and an HFile on an HDFS DATA NODE, plus ZooKeeper and two HMaster nodes.]
    No heartbeat: start recovery process.
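
A rough sketch of the recovery idea on slide 38: a server whose heartbeat is overdue is treated as failed, and the writes that existed only in its Memstore are rebuilt by replaying its WAL in order. The function names, the timeout value, and the plain dict of heartbeat timestamps are hypothetical; the slide only says "No Heartbeat" and "Start recovery process" and does not describe the mechanism in detail.

```python
from typing import Dict, List, Tuple

WalEntry = Tuple[str, str, str]  # (row key, column, value)


def find_dead_servers(last_heartbeat: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return servers whose last heartbeat is older than the timeout."""
    return sorted(name for name, seen in last_heartbeat.items() if now - seen > timeout)


def replay_wal(wal: List[WalEntry]) -> Dict[Tuple[str, str], str]:
    """Rebuild the in-memory state lost with a failed server by replaying its WAL in write order."""
    memstore: Dict[Tuple[str, str], str] = {}
    for row, column, value in wal:
        memstore[(row, column)] = value  # later entries overwrite earlier ones
    return memstore


if __name__ == "__main__":
    heartbeats = {"regionserver-1": 100.0, "regionserver-2": 70.0}
    print(find_dead_servers(heartbeats, now=105.0, timeout=30.0))
    print(replay_wal([("bucket01", "o:view", "1"), ("bucket01", "o:view", "2")]))
```

Replaying every entry one by one hints at why the talk later lists slow WAL recovery among the cons.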
  39. HIGH AVAILABILITY
    Solved by ZooKeeper, HDFS replication and WAL
    - HORIZONTAL SCALABILITY
    - READ AND WRITE INSTANTLY
    - HIGH VELOCITY OF MESSAGES
    - HIGH VOLUME OF MESSAGES
  40. TECHNICAL CHALLENGES
    - High volume of messages
    - High velocity of messages
    - Read and write instantly
    - Horizontal scalability
    - High availability
  41. HBASE
    Pros
    - Designed for scale
    - Scales automatically
    - Built-in recovery
    - Real-time reads and writes
    Cons
    - Requires resources
    - WAL recovery is slow
    - No fancy SQL, only CRUD
    - Has "Scan", but with poor performance
  42. RDBMS vs. HBase
    RDBMS
    - Size of data < limit
    - Read > Write
    HBase
    - Size of data >= TB, PB
    - Heavy read and/or write
    Also listed (column not recoverable from the transcript):
    - Has resources
    - Finalised business requirements