Study of Log Structured Merge Tree

nakaly

June 01, 2021

Transcript

  1. Log-Structured Merge-Tree (LSM Tree)

  2. Target and Learning Outcome

    Target: Web developers who know RDBMS but don't use NoSQL/NewSQL databases
    Learning outcome: Learn the technology behind NoSQL/NewSQL databases so that we have an idea of how and in what situations to use them
  3. Introduction

    Have you heard about NoSQL or NewSQL databases?
    NoSQL: Elasticsearch, MongoDB (WiredTiger), HBase, Cassandra
    Embedded DB: LevelDB, RocksDB, SQLite (as an extension)
    NewSQL: Google Spanner, TiDB (MySQL compatible), YugabyteDB (PostgreSQL compatible)
    All of these databases use, or can use, an LSM Tree to store data
  4. History of LSM Tree

    Log-Structured Merge-Tree (1996): introduced in a paper
    Bigtable (2006): the first famous product to use an LSM Tree
    LevelDB (2011): made by Jeff Dean for Chrome's internal database
    RocksDB (2013): forked from LevelDB by Facebook and used in many database products
    Google Spanner (2017): the most famous globally distributed RDBMS
  5. LSM Tree Data Structure

    An LSM Tree is a data structure for implementing a key-value store
    It consists of a Memtable, a WAL (Write-Ahead Log), and SSTables
  6. Before explaining LSM Tree

  7. Log Structured Data Store

    A naive idea for implementing a key-value store:
    Append each new key-value pair to the end of a file -> O(1)
    Get the latest value of a key -> O(n)
    Demo
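
The naive idea on this slide can be sketched as follows. This is only an illustration, not the demo from the talk: the class name `AppendOnlyLog`, the `key,value` line format, and the file name are assumptions, and keys are assumed to contain no commas or newlines.

```python
import os
import tempfile


class AppendOnlyLog:
    """Naive key-value store: every write is appended to a single file."""

    def __init__(self, path):
        self.path = path
        # Create the file if it does not exist yet.
        open(self.path, "a", encoding="utf-8").close()

    def set(self, key, value):
        # Appending to the end of the file is O(1); old values are never overwritten.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(f"{key},{value}\n")

    def get(self, key):
        # Scan the whole file; the last matching record wins, so a read is O(n).
        latest = None
        with open(self.path, "r", encoding="utf-8") as f:
            for line in f:
                k, _, v = line.rstrip("\n").partition(",")
                if k == key:
                    latest = v
        return latest


log_path = os.path.join(tempfile.mkdtemp(), "data.log")
db = AppendOnlyLog(log_path)
db.set("user:1", "alice")
db.set("user:2", "bob")
db.set("user:1", "carol")  # a newer value for the same key
print(db.get("user:1"))
```

Writes stay cheap because nothing is rewritten in place. The cost moves to reads, which must scan every record.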
  8. Log Structured Data Store

    We need an index to find a key efficiently
    Use a hash map to store each key and the byte offset of its latest record in the file
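
A minimal sketch of the hash-map index, assuming the same hypothetical `key,value` line format (keys without commas or newlines). The class name `HashIndexedLog` is illustrative and not taken from any library.

```python
import os
import tempfile


class HashIndexedLog:
    """Append-only log plus an in-memory hash map of key -> byte offset."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # byte offset where this record starts
            f.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record instead of scanning
            line = f.readline().decode("utf-8")
        _, _, value = line.rstrip("\n").partition(",")
        return value


indexed_db = HashIndexedLog(os.path.join(tempfile.mkdtemp(), "data.log"))
indexed_db.set("a", "1")
indexed_db.set("b", "2")
indexed_db.set("a", "3")  # the index now points at the newer record
print(indexed_db.get("a"))
```

A read is now one hash lookup plus one seek. The trade-off is that every key has to fit in memory.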
  9. Log Structured Data Store: Segmentation and Compaction

    Split the log into segments and compact them, keeping only the latest value for each key
    The hash index still needs to hold all keys in memory
  10. Log Structured Data Store: Merge

    Compaction and merging save disk space and make reads faster
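
Compaction and merging of unsorted log segments could look roughly like this. It is a sketch under simplifying assumptions: segments are held in memory as lists of `(key, value)` pairs in write order, and the function names are hypothetical.

```python
def merge_segments(segments):
    """Merge segments (ordered oldest to newest) into one compacted segment."""
    latest = {}
    for segment in segments:
        for key, value in segment:  # records are in write order
            latest[key] = value     # a later record overwrites an earlier one
    return list(latest.items())


def compact(segment):
    """Keep only the latest value for each key within a single segment."""
    return merge_segments([segment])


old_segment = [("a", "1"), ("b", "2"), ("a", "3")]
new_segment = [("b", "4"), ("c", "5"), ("c", "6")]

print(compact(old_segment))
print(merge_segments([old_segment, new_segment]))
```

Six records shrink to three after the merge, which is where the disk savings and faster reads come from.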
  11. LSM Tree Data Structure SSTable

  12. SSTable

    What if the keys in a file are sorted?
    We can use a sparse index
    We don't need to store all keys in memory
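
One way to picture a sparse index is shown below. The sample data, the `every=3` sampling interval, and the function names are assumptions for illustration. A real SSTable would index byte offsets of blocks on disk rather than list positions.

```python
import bisect

# A segment with records sorted by key (held in memory here for brevity).
segment = [
    ("apple", "1"), ("banana", "2"), ("cherry", "3"), ("grape", "4"),
    ("kiwi", "5"), ("lemon", "6"), ("mango", "7"), ("peach", "8"),
]


def build_sparse_index(records, every=3):
    """Index only every `every`-th key as (key, position in the segment)."""
    return [(records[i][0], i) for i in range(0, len(records), every)]


def lookup(records, sparse_index, key):
    index_keys = [k for k, _ in sparse_index]
    # Find the last indexed key that is <= the search key.
    slot = bisect.bisect_right(index_keys, key) - 1
    if slot < 0:
        return None  # the key sorts before the first record
    start = sparse_index[slot][1]
    if slot + 1 < len(sparse_index):
        end = sparse_index[slot + 1][1]
    else:
        end = len(records)
    # Scan only the small block between two indexed keys.
    for k, v in records[start:end]:
        if k == key:
            return v
    return None


sparse_index = build_sparse_index(segment)
print(sparse_index)
print(lookup(segment, sparse_index, "kiwi"))
```

Only three of the eight keys are kept in the index, and a lookup scans at most one block of three records.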
  13. SSTable

    Keys can be kept sorted when merging and compacting files
    Two segment files can be merged like a merge sort
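
The merge-sort-style merge can be sketched as a two-pointer walk over both segments. The sketch assumes each input segment is sorted by key and already has at most one record per key. The function name is hypothetical.

```python
def merge_sorted_segments(older, newer):
    """Merge two key-sorted segments; on duplicate keys the newer value wins."""
    merged = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            merged.append(older[i])
            i += 1
        elif old_key > new_key:
            merged.append(newer[j])
            j += 1
        else:
            merged.append(newer[j])  # same key: keep only the newer value
            i += 1
            j += 1
    # At most one of these still has records left.
    merged.extend(older[i:])
    merged.extend(newer[j:])
    return merged


older = [("apple", "1"), ("cherry", "3"), ("mango", "7")]
newer = [("banana", "2"), ("cherry", "30"), ("peach", "8")]
print(merge_sorted_segments(older, newer))
```

Both inputs are read sequentially and the output is written in key order, so the merge works on files larger than memory.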
  14. Memtable

    When a write comes in, store it in an in-memory skip list or balanced tree
  15. Memtable e.g. SkipList

  16. Memtable e.g. Red-Black Tree

  17. Memtable

    Important properties:
    Fast search and insertion
    Can retrieve all values in sorted order in O(n)
    Dump the memtable into an SSTable when memory usage reaches a threshold
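
The memtable's behavior can be sketched as below. Two substitutions are made for brevity and should not be read as how real engines work: a sorted Python list maintained with `bisect` stands in for the skip list or balanced tree (so insertion is O(n) here, not O(log n)), and an entry count (`max_entries`, a hypothetical parameter) stands in for the memory-usage threshold.

```python
import bisect


class Memtable:
    """In-memory sorted write buffer that is flushed when it grows too large."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries  # hypothetical flush threshold
        self.keys = []      # kept sorted at all times
        self.values = {}    # key -> latest value
        self.sstables = []  # flushed segments, oldest first (in memory for brevity)

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # keep the keys in sorted order
        self.values[key] = value
        if len(self.keys) >= self.max_entries:
            self.flush()

    def get(self, key):
        return self.values.get(key)

    def sorted_items(self):
        # One in-order pass over the keys: O(n).
        return [(k, self.values[k]) for k in self.keys]

    def flush(self):
        # Dump the sorted contents as a new SSTable and start again empty.
        self.sstables.append(self.sorted_items())
        self.keys = []
        self.values = {}


memtable = Memtable(max_entries=3)
memtable.put("b", "2")
memtable.put("a", "1")
memtable.put("b", "20")  # overwrite: still two distinct keys, no flush yet
memtable.put("c", "3")   # third distinct key triggers a flush
memtable.put("d", "4")
print(memtable.sstables)
```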
  18. LSM Tree Data Structure
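
A toy sketch of how the Memtable, WAL, and SSTables fit together on the write and read paths. Everything here is a simplifying assumption rather than the design of any real engine: the class name `TinyLSM`, a plain dict as the memtable, SSTables kept in memory as sorted lists, a linear scan within each SSTable, and an entry-count flush threshold.

```python
import os
import tempfile


class TinyLSM:
    def __init__(self, wal_path, max_entries=2):
        self.wal_path = wal_path
        self.max_entries = max_entries  # hypothetical flush threshold
        self.memtable = {}
        self.sstables = []  # oldest first; each is a key-sorted list of (key, value)

    def put(self, key, value):
        # 1. Record the write in the WAL first so it could be replayed after a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(f"{key},{value}\n")
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush to a new SSTable once the memtable is full.
        if len(self.memtable) >= self.max_entries:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            # Flushed data no longer needs the WAL, so truncate it.
            open(self.wal_path, "w", encoding="utf-8").close()

    def get(self, key):
        # Newest data first: the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None


lsm = TinyLSM(os.path.join(tempfile.mkdtemp(), "wal.log"))
lsm.put("a", "1")
lsm.put("b", "2")   # second entry triggers a flush to the first SSTable
lsm.put("a", "10")  # the newer value lives in the memtable
print(lsm.get("a"), lsm.get("b"))
```

Reading from newest to oldest is what lets an old value stay on disk without ever being returned.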

  19. Additional Data Structure (Bloom Filter)

    Finding a key takes time, especially when it doesn't exist
    We can use a Bloom filter to check cheaply whether a key may exist: it answers either "definitely not present" or "possibly present"
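
A minimal Bloom filter sketch. The bit-array size, the number of hashes, and the use of salted SHA-256 to derive bit positions are illustrative assumptions. Production filters typically use faster non-cryptographic hashes.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter()
for stored_key in ["apple", "banana", "cherry"]:
    bloom.add(stored_key)

print(bloom.might_contain("apple"))
```

If `might_contain` returns False, the SSTable does not need to be read at all. A True result still requires a real lookup, because false positives are possible.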
  20. Comparison with B+ Tree

  21. B+ Tree

    A B+ tree is a generalization of a binary search tree in which a node can have more than two children
  22. Comparison with B+ Tree

    Fast read -> B+ Tree
    Fast write -> LSM Tree
    Low disk usage -> LSM Tree
  23. Comparison with B+ Tree https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM

  24. Extra: HBase Implementation, My Implementation