Study of Log Structured Merge Tree

nakaly
June 01, 2021

Transcript

  1. Target and Learning Outcome

    Target: web developers who know RDBMSs but don't use NoSQL/NewSQL databases. Learning outcome: learn the technology behind NoSQL/NewSQL databases so that we have an idea of how and in what situations to use them.
  2. Introduction

    Have you heard of NoSQL or NewSQL databases? NoSQL: Elasticsearch, MongoDB (WiredTiger), HBase, Cassandra. Embedded DB: LevelDB, RocksDB, SQLite (as an extension). NewSQL: Google Spanner, TiDB (MySQL compatible), YugabyteDB (PostgreSQL compatible). All of these databases use an LSM tree (or an LSM-style storage design) to store data.
  3. History of LSM Tree

    Log-Structured Merge-Tree (1996): introduced in a paper. Bigtable (2006): the first famous product to use an LSM tree. LevelDB (2011): made by Jeff Dean and Sanjay Ghemawat; used in Chrome as an internal database. RocksDB (2013): forked from LevelDB by Facebook and used in many database products. Google Spanner (2017): the most famous globally distributed RDBMS.
  4. LSM Tree Data Structure

    An LSM tree is a data structure for implementing a key-value store. It consists of a memtable, a WAL (write-ahead log), and SSTables.
  5. Log Structured Data Store

    A naive idea for implementing a key-value store: append each new key-value pair to the end of a file -> O(1). Getting the latest value for a key requires scanning the file -> O(n). Demo.
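The naive store above can be sketched in a few lines. This is a minimal illustration, not the demo from the talk: the class name `LogStore`, the `key,value` line format, and the assumption that keys contain no commas or newlines are all made up for the example.

```python
import os
import tempfile


class LogStore:
    """Naive key-value store: every write is appended to one file."""

    def __init__(self, path):
        self.path = path
        open(self.path, "ab").close()  # create the file if it is missing

    def set(self, key, value):
        # O(1): append "key,value\n" at the end of the file.
        with open(self.path, "ab") as f:
            f.write(f"{key},{value}\n".encode("utf-8"))

    def get(self, key):
        # O(n): scan every record; the last match is the latest value.
        latest = None
        with open(self.path, "rb") as f:
            for line in f:
                k, _, v = line.decode("utf-8").rstrip("\n").partition(",")
                if k == key:
                    latest = v
        return latest


store = LogStore(os.path.join(tempfile.mkdtemp(), "data.log"))
store.set("user:1", "alice")
store.set("user:2", "bob")
store.set("user:1", "carol")  # an update is just another appended record
```

Old values are never overwritten in place, which is why a read has to look at the whole file to be sure it has the newest record.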
  6. Log Structured Data Store

    We need an index to find a key efficiently. Use a hash map that stores each key and the byte offset of its record in the file.
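A hash index over the log might look like the sketch below. The class name `IndexedLogStore` and the `key,value` record format are assumptions for illustration; the index is rebuilt only from writes made through this object.

```python
import os
import tempfile


class IndexedLogStore:
    """Append-only log plus an in-memory hash map of byte offsets."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of its latest record
        open(self.path, "ab").close()

    def set(self, key, value):
        record = f"{key},{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()  # where this record will start
            f.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)  # jump straight to the record, no scan
            line = f.readline().decode("utf-8").rstrip("\n")
        return line.partition(",")[2]


indexed = IndexedLogStore(os.path.join(tempfile.mkdtemp(), "data.log"))
indexed.set("a", "1")
indexed.set("b", "2")
indexed.set("a", "3")
```

A read is now one hash lookup and one seek, at the cost of holding every key in memory.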
  7. Log Structured Data Store: Segmentation and Compaction

    Split the log into segments and compact them, keeping only the latest value for each key. The hash index still needs to store all keys in memory.
  8. Log Structured Data Store: Merge

    Compaction and merging save disk space and make reads faster.
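Segmentation, compaction, and merging can be sketched with in-memory lists standing in for segment files. The names `SEGMENT_SIZE`, `append`, and `compact_and_merge` are hypothetical, and a real store would cap segments by bytes rather than by record count.

```python
SEGMENT_SIZE = 3  # hypothetical limit: records per segment


def append(segments, key, value):
    """Append a record, starting a new segment when the current one is full."""
    if not segments or len(segments[-1]) >= SEGMENT_SIZE:
        segments.append([])
    segments[-1].append((key, value))


def compact_and_merge(segments):
    """Merge segments (oldest first) into one, keeping the latest value per key."""
    latest = {}
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # newer records overwrite older ones
    return list(latest.items())


segments = []
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("a", 6)]:
    append(segments, key, value)

merged = compact_and_merge(segments)
```

Six records spread over two segments shrink to three, one per key, so there is less data on disk and fewer segments to check on a read.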
  9. SSTable

    What if the keys in a file are sorted? We can use a sparse index, so we don't need to store all keys in memory.
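A sparse index could work roughly as follows. The sorted segment is modelled as a Python list rather than a file, and `STRIDE`, `sparse_keys`, and `lookup` are names invented for this sketch.

```python
import bisect

# A sorted segment (SSTable) modelled as a list of (key, value) pairs.
sstable = [(f"key{i:03d}", i) for i in range(100)]

STRIDE = 10  # hypothetical: index one key out of every 10
sparse_keys = [sstable[i][0] for i in range(0, len(sstable), STRIDE)]
sparse_pos = list(range(0, len(sstable), STRIDE))


def lookup(key):
    # Find the last indexed key <= key, then scan only that block.
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None  # key sorts before the first entry
    start = sparse_pos[i]
    for k, v in sstable[start:start + STRIDE]:
        if k == key:
            return v
        if k > key:
            break  # sorted order: the key cannot appear later
    return None
```

Only 10 of the 100 keys are held in the index here; each lookup does a binary search over those and then scans at most `STRIDE` records.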
  10. SSTable

    Keys stay sorted when files are merged and compacted: two segment files can be merged the same way as in merge sort.
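The merge step is the same two-pointer walk used in merge sort. In this sketch, `merge_sorted` is a hypothetical helper, segments are lists of `(key, value)` pairs, and each segment is assumed to be already compacted (no duplicate keys within it).

```python
def merge_sorted(older, newer):
    """Merge two key-sorted segments; on duplicate keys the newer value wins."""
    out = []
    i = j = 0
    while i < len(older) and j < len(newer):
        old_key, new_key = older[i][0], newer[j][0]
        if old_key < new_key:
            out.append(older[i])
            i += 1
        elif old_key > new_key:
            out.append(newer[j])
            j += 1
        else:
            out.append(newer[j])  # same key: keep the newer record only
            i += 1
            j += 1
    out.extend(older[i:])  # at most one of these two tails is non-empty
    out.extend(newer[j:])
    return out


older = [("a", 1), ("c", 3), ("e", 5)]
newer = [("b", 20), ("c", 30)]
merged_sorted = merge_sorted(older, newer)
```

Both inputs are read front to back exactly once, so the merge can stream through files much larger than memory.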
  11. Memtable

    When a write comes in, store it in an in-memory skip list or balanced tree.
  12. Memtable

    Important properties: fast search and insertion; can retrieve all values in sorted order in O(n). Dump the memtable into an SSTable when memory usage reaches a threshold.
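A memtable with a flush threshold might be sketched like this. Python's standard library has no skip list or balanced tree, so a sorted list maintained with `bisect` stands in for one (insertion is O(n) here, not O(log n)). The `Memtable` class and the entry-count threshold `max_entries` are assumptions; the slide's threshold is memory usage.

```python
import bisect


class Memtable:
    def __init__(self, max_entries=4):
        self.keys = []    # kept in sorted order
        self.values = {}  # key -> latest value
        self.max_entries = max_entries

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # stand-in for a tree insert
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)

    def is_full(self):
        return len(self.keys) >= self.max_entries

    def flush(self):
        """Return all entries in key order (an SSTable) and reset."""
        sstable = [(k, self.values[k]) for k in self.keys]  # O(n), already sorted
        self.keys, self.values = [], {}
        return sstable


memtable = Memtable(max_entries=3)
sstables = []
for key, value in [("d", 1), ("a", 2), ("d", 3), ("c", 4), ("b", 5)]:
    memtable.put(key, value)
    if memtable.is_full():
        sstables.append(memtable.flush())
```

Writes arrive in arbitrary order, but each flushed SSTable comes out sorted without a separate sort step.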
  13. Additional Data Structure (Bloom Filter)

    It takes time to find a key, especially when it doesn't exist. A Bloom filter can tell us cheaply that a key definitely does not exist; it may return false positives but never false negatives.
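A Bloom filter can be sketched as a bit array plus a few hash functions. The sizes (`num_bits`, `num_hashes`) and the use of seeded SHA-256 are arbitrary choices for this example, not what any particular database does.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("apple")
bloom.add("banana")
```

If `might_contain` returns False, the read can skip the SSTable entirely; if it returns True, the SSTable still has to be checked.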
  14. B+ Tree

    A B+ tree is a generalization of a binary search tree in which a node can have more than two children.
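To illustrate "more than two children", here is a sketch of a lookup in a tiny hand-built B+ tree. The `Node` layout and `search` function are hypothetical; insertion and node splitting are left out.

```python
import bisect


class Node:
    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted: separators (internal) or record keys (leaf)
        self.children = children  # child nodes for an internal node, else None
        self.values = values      # values for a leaf node, else None


def search(node, key):
    # Descend: keys[i] is the smallest key stored under children[i + 1].
    while node.children is not None:
        node = node.children[bisect.bisect_right(node.keys, key)]
    # At a leaf: binary search among its keys.
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


leaves = [
    Node([1, 4], values=["a", "b"]),
    Node([7, 10], values=["c", "d"]),
    Node([15, 20], values=["e", "f"]),
]
root = Node([7, 15], children=leaves)  # one node, three children
```

With many keys per node the tree stays shallow, so a lookup touches only a few nodes.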
  15. Comparison with B+ Tree

    Fast read -> B+ tree. Fast write -> LSM tree. Low disk usage -> LSM tree.