Slide 1

Slide 1 text

Log-Structured Merge-Tree (LSM Tree)

Slide 2

Slide 2 text

Target and Learning Outcome
Target: web developers who know RDBMSs but have not used NoSQL/NewSQL databases.
Learning outcome: understand the technology behind NoSQL/NewSQL databases well enough to judge how and in what situations to use them.

Slide 3

Slide 3 text

Introduction
Have you heard of NoSQL or NewSQL databases?
NoSQL: Elasticsearch, MongoDB (WiredTiger), HBase, Cassandra
Embedded DB: LevelDB, RocksDB, SQLite (as an extension)
NewSQL: Google Spanner, TiDB (MySQL compatible), YugabyteDB (PostgreSQL compatible)
All of these databases use, or can be configured to use, an LSM tree or a closely related log-structured design to store data.

Slide 4

Slide 4 text

History of LSM Tree
Log-Structured Merge-Tree (1996): introduced in a paper by O'Neil et al.
Bigtable (2006): the first well-known product to use an LSM tree
LevelDB (2011): written by Jeff Dean and Sanjay Ghemawat at Google; used by Chrome as an internal database
RocksDB (2013): forked from LevelDB by Facebook and used in many database products
Google Cloud Spanner (2017): public release of Spanner, one of the best-known globally distributed RDBMSs

Slide 5

Slide 5 text

LSM Tree Data Structure
A data structure for implementing a key-value store.
Consists of a Memtable, a WAL (Write-Ahead Log), and SSTables.

Slide 6

Slide 6 text

Before explaining LSM Tree

Slide 7

Slide 7 text

Log-Structured Data Store
A naive way to implement a key-value store:
Append each new key-value pair to the end of a file -> O(1)
Get the latest value of a key by scanning the file -> O(n)
Demo
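The naive idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the demo from the talk: the class name `AppendOnlyStore` and the one-record-per-line `key,value` format are assumptions, and keys are assumed to contain no commas or newlines.

```python
class AppendOnlyStore:
    """Naive key-value store: every write is appended to a single file."""

    def __init__(self, path):
        self.path = path
        open(self.path, "a", encoding="utf-8").close()  # make sure the file exists

    def put(self, key, value):
        # O(1): append one "key,value" line to the end of the file.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key},{value}\n")

    def get(self, key):
        # O(n): scan the whole file; the last matching record wins.
        latest = None
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    latest = v
        return latest
```

Old values are never overwritten in place, so the file keeps growing until something cleans it up.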

Slide 8

Slide 8 text

Log-Structured Data Store
An index is needed to find a key efficiently.
Use a hash map that stores each key and the byte offset of its latest record in the file.
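A minimal sketch of the hash index, assuming the same hypothetical `key,value` line format (keys without commas or newlines). The name `IndexedLogStore` is made up for illustration, and the index here is not rebuilt after a restart.

```python
import os


class IndexedLogStore:
    """Append-only log plus an in-memory hash map of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # make sure the file exists

    def put(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # the record will start at the current end
            f.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None  # key was never written
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan
            line = f.readline().decode("utf-8")
        _, _, value = line.rstrip("\n").partition(",")
        return value
```

A read is now one hash lookup and one seek, at the cost of keeping every key in memory.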

Slide 9

Slide 9 text

Log-Structured Data Store
Segmentation and compaction
Split the log into segment files and compact each one, keeping only the latest value for each key.
The hash index still needs to hold every key in memory.

Slide 10

Slide 10 text

Log-Structured Data Store
Merge
Compacting and merging segments saves disk space and makes reads faster.
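Compaction and merge can be sketched on in-memory lists. This is a simplification: real segments are files, and the function names `compact` and `merge` are illustrative only. A segment here is a list of `(key, value)` records in write order.

```python
def compact(segment):
    """Keep only the latest value for each key within one segment."""
    latest = {}
    for key, value in segment:
        latest[key] = value  # a later record overwrites an earlier one
    return list(latest.items())


def merge(segments):
    """Merge segments, given oldest first, into one compacted segment."""
    combined = [record for segment in segments for record in segment]
    return compact(combined)  # records from newer segments come last, so they win
```

For example, merging `[("a", 1), ("b", 2), ("a", 3)]` with a newer segment `[("b", 20)]` leaves one record per key: `a` with value 3 and `b` with value 20.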

Slide 11

Slide 11 text

LSM Tree Data Structure: SSTable (Sorted String Table)

Slide 12

Slide 12 text

SSTable
What if the keys in a file are sorted?
A sparse index can be used.
Not every key needs to be stored in memory.
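A sketch of a sparse index over sorted records. To keep it short, the "file" is a Python list and list positions stand in for byte offsets; the class name `SSTable` and the `every` parameter are assumptions made for illustration.

```python
import bisect


class SSTable:
    """Key-sorted records with a sparse index on every `every`-th key."""

    def __init__(self, records, every=3):
        self.records = sorted(records, key=lambda kv: kv[0])
        self.every = every
        # Only every `every`-th key is kept in memory.
        self.positions = list(range(0, len(self.records), every))
        self.index_keys = [self.records[p][0] for p in self.positions]

    def get(self, key):
        # Find the last indexed key that is <= the wanted key.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None  # smaller than the first key in the table
        start = self.positions[i]
        # Scan at most `every` records starting from that position.
        for k, v in self.records[start:start + self.every]:
            if k == key:
                return v
            if k > key:
                break  # sorted order: the key cannot appear later
        return None
```

Because the records are sorted, a lookup only needs the nearest indexed key at or before the target, followed by a short scan.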

Slide 13

Slide 13 text

SSTable
Keys stay sorted when segment files are merged and compacted.
Two segment files can be merged like the merge step of merge sort.
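The merge-sort-style merge of two sorted segments might look like the sketch below. It assumes each segment is a key-sorted list of `(key, value)` pairs with unique keys (already compacted); `merge_sorted` is an illustrative name.

```python
def merge_sorted(older, newer):
    """Merge two key-sorted segments; on equal keys the newer value wins."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif old_key > new_key:
            merged.append(newer[j])
            j += 1
        else:
            merged.append(newer[j])  # same key: drop the older record
            i += 1
            j += 1
    merged.extend(older[i:])  # at most one of these two tails is non-empty
    merged.extend(newer[j:])
    return merged
```

Both inputs are read front to back exactly once, so the merge can stream from disk without loading whole files into memory.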

Slide 14

Slide 14 text

Memtable
When a write comes in, store it in an in-memory skip list or balanced tree.

Slide 15

Slide 15 text

Memtable e.g. SkipList
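A compact skip list sketch, to show why it suits a memtable: search and insert take O(log n) on average, and the bottom level is a sorted linked list. The names (`SkipList`, `MAX_LEVEL`) and the promotion probability of 0.5 are assumptions chosen for illustration, not taken from any particular database.

```python
import random


class _Node:
    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] is the next node on level i


class SkipList:
    MAX_LEVEL = 8

    def __init__(self):
        self.head = _Node(None, None, self.MAX_LEVEL)  # sentinel, holds no data
        self.level = 1  # number of levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and random.random() < 0.5:
            level += 1
        return level

    def put(self, key, value):
        update = [self.head] * self.MAX_LEVEL  # last node before `key` on each level
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        candidate = node.forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # existing key: overwrite in place
            return
        level = self._random_level()
        self.level = max(self.level, level)
        new = _Node(key, value, level)
        for i in range(level):  # splice the new node into each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def get(self, key):
        node = self.head
        for i in reversed(range(self.level)):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node.value if node is not None and node.key == key else None

    def items(self):
        node = self.head.forward[0]  # level 0 links every node in key order
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```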

Slide 16

Slide 16 text

Memtable e.g. Red-Black Tree

Slide 17

Slide 17 text

Memtable
Important properties:
Fast search and insertion
Can retrieve all values in sorted order in O(n)
Dump the memtable to an SSTable when memory usage reaches a threshold
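These properties can be sketched with a small class. Two simplifications are assumed here: a sorted Python list stands in for the skip list or balanced tree (so insertion is O(n) rather than O(log n)), and the threshold counts entries rather than bytes. The names `Memtable`, `max_entries`, and `flush` are illustrative.

```python
import bisect


class Memtable:
    def __init__(self, max_entries=4):
        self.keys = []    # always kept in sorted order
        self.values = {}  # key -> latest value
        self.max_entries = max_entries

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # insert while keeping keys sorted
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)

    def is_full(self):
        return len(self.keys) >= self.max_entries

    def flush(self):
        """Return all entries sorted by key (the new SSTable) and reset."""
        sstable = [(k, self.values[k]) for k in self.keys]  # O(n), already sorted
        self.keys, self.values = [], {}
        return sstable
```

Because the keys are already sorted in memory, the flush is a single sequential write, which is what makes LSM writes cheap.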

Slide 18

Slide 18 text

LSM Tree Data Structure
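Putting the pieces together, the write and read paths could be sketched as follows. Everything is kept in memory for brevity: the `wal` list stands in for an on-disk log, a dict stands in for the memtable, and SSTables are sorted lists. `TinyLSM` and `memtable_limit` are hypothetical names, and compaction is left out.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.wal = []       # stand-in for the on-disk write-ahead log
        self.memtable = {}  # stand-in for a skip list or balanced tree
        self.sstables = []  # oldest first; each one is a key-sorted list
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.wal.append((key, value))  # 1. log the write first, for crash recovery
        self.memtable[key] = value     # 2. then apply it to the memtable
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # 3. dump the memtable as a new sorted, immutable SSTable
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.wal = []  # flushed data no longer needs the log

    def get(self, key):
        if key in self.memtable:  # newest data lives in the memtable
            return self.memtable[key]
        for table in reversed(self.sstables):  # then newest SSTable first
            for k, v in table:  # linear scan; a sparse index would narrow this
                if k == key:
                    return v
        return None
```

A read may touch several SSTables before it finds the key, which is the cost the later slides address.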

Slide 19

Slide 19 text

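Additional Data Structure (Bloom Filter)
Finding a key takes time, especially when it does not exist, because every SSTable may have to be checked.
A Bloom filter can tell us that a key definitely does not exist or that it may exist, so most lookups for missing keys can skip the SSTables entirely.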
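A minimal Bloom filter sketch. The sizes, the use of SHA-256 with a seed prefix to derive several hash positions, and the name `might_contain` are all choices made for illustration; production filters use faster hashes and pack the bits tightly.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive `num_hashes` positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))
```

A key that was added always returns `True`. A key that was never added usually returns `False`, but can occasionally return `True` (a false positive), in which case the SSTable is searched as usual.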

Slide 20

Slide 20 text

Comparison with B+ Tree

Slide 21

Slide 21 text

B+ Tree
A B+ tree is a generalization of a binary search tree in which a node can have more than two children. Values are stored only in the leaf nodes.

Slide 22

Slide 22 text

Comparison with B+ Tree
Fast read -> B+ Tree
Fast write -> LSM Tree
Low disk usage -> LSM Tree

Slide 23

Slide 23 text

Comparison with B+ Tree
https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

Slide 24

Slide 24 text

Extra
HBase implementation
My implementation