HanoiDB - and Other Riak Hacks

Keynote presentation by Kresten Krab Thorup at the Erlang Workshop 2012, co-located with ICFP in Copenhagen. Code at http://github.com/krestenkrab/hanoidb

Kresten Krab Thorup

September 14, 2012

Transcript

  1. Riak ▪ HanoiDB ▪ Erjang / Triq ▪ Common Medicine Card ▪ Riak Java Client ▪ DeltaZip ▪ Relai ▪ Sync ▪ iOS/Android ▪ RiakMongo Adapter ▪ BTree Backend
  2. Riak ▪ HanoiDB ▪ Erjang / Triq ▪ Common Medicine Card ▪ Riak Java Client ▪ DeltaZip ▪ Relai ▪ iOS/Android ▪ RiakMongo Adapter ▪ BTree Backend ▪ Sync
  3. Distributed Architecture
    ▪ Availability: Run in multiple data centers
    ▪ Scalability: Prepare the system for expected growth
  4. Data Kinds
    ▪ Person Data: ~5M entities, read/write (Bitcask)
    ▪ Audit Log: ~5B entries, mostly write (LevelDB)
  5. Riak Backends
    ▪ Bitcask: keys in memory; pure Erlang (almost)
    ▪ LevelDB: keys on disk; C++ (mostly)
    Both use log-structured (append-only) files, and thus need to "garbage collect" old data files occasionally.
  6. Spinning Disk
    ▪ Random I/O: 2.7 MB/s
    ▪ Sequential I/O: 213 MB/s
    ▪ Price: 0.3 $/MB
    ▪ Sequential vs. random I/O: ~100x performance!
  7. Solid-State Disk
    ▪ Random I/O: 60-300 MB/s
    ▪ Sequential I/O: 293-366 MB/s
    ▪ Price: 1 $/MB
    ▪ Sequential I/O is not much faster than random I/O, but the disk is about 3x as expensive.
  8. HanoiDB
    ▪ "LevelDB re-implemented in Erlang"
    ▪ Log-structured storage
    ▪ Write-once B-trees
    ▪ Organized in "doubling size" levels
    ▪ Incremental garbage collection
    ▪ log(N) worst-case insert/lookup
  9. HanoiDB level file (diagram): compressed blocks, plus metadata + bloom filter.
    Design from "Key-Sequence Datasets on Indelible Storage" [Malcolm C. Easton, IBM J. Res. Develop. Vol. 30, No. 3, May 1986].
  10. HanoiDB: merge, merge, merge
    ▪ For every insert, do one merge step at each level => guarantees "room": a level finishes its merge before it receives more data.
    [1] J. L. Bentley and J. B. Saxe. Decomposable searching problems I: Static-to-dynamic transformation. J. Algorithms 1(4):301–358, 1980.
    [2] M. H. Overmars and J. van Leeuwen. Worst-case optimal insertion and deletion methods for decomposable searching problems. Inform. Process. Lett. 12:168–173, 1981.
  11. HanoiDB: the nursery "emulates" the top X levels with an in-memory structure and a transaction log.
    Diagram: writes to the nursery are appended to the tx log; when the nursery is full, it is written out as a level file.
  12. Diagram (lookup): the hanoidb API process («gen_server») holds the nursery and a top pointer to hanoidb_level #8 («plain_fsm»), which has A-8.data and B-8.data open for [random, read]. Each level has a next pointer: #8 → #9 → #10.
    Legend: API, process, data structure, file, lookup.
  13. Read Caching
    ▪ Hard to cache in Erlang!
    ▪ Live terms add GC load
    ▪ ETS tables impose a copying cost
    ▪ Binaries as a data structure
    ▪ Currently: no caching
    ▪ Spawn a process for each lookup
  14. Diagram (insert and merge): the hanoidb API process («gen_server») with the nursery (nursery.log and nursery.data, both [append]) and levels #8, #9, #10 linked by top/next. Each hanoidb_level («plain_fsm») has a hanoidb_merger process (merge_pid) that reads A-8.data and B-8.data [seq, read] and writes X-8.data [seq, write], while the level itself has A-8.data and B-8.data open for [random, read]: the same A+B files are opened twice!
  15. Diagram (range fold): for a range fold, the level files are hard-linked (AF-8.data, BF-8.data) and each link is read [seq, read] by its own hanoidb_folder process. A hanoidb_fold_merger, given {source_order, [pid()]}, receives {level_results, pid(), [{Key,Value}*100]} batches from the folders and emits {fold_result, Key, Value}; both message flows apply back pressure. This runs alongside the regular structure (hanoidb «gen_server», nursery, hanoidb_level #8 «plain_fsm» with A-8.data and B-8.data [random, read], next #9).
  16. HanoiDB
    ▪ Pure Erlang = easy to improve & fix
    ▪ Low memory usage
    ▪ 2,000 lines of Erlang vs. 30,000 lines of C++
    ▪ CRC/corruption strategies; full Riak 2i support; expiry support
  17. Erlang
    ▪ Erlang's I/O system does all the heavy lifting; writing this with "async callbacks" would have been a pain.
    ▪ Many small processes governing files/resources
    ▪ Error handling for process composites
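Slide 5 says both Riak backends write log-structured, append-only files and must occasionally "garbage collect" old data. The Python sketch below illustrates that idea in the Bitcask style (keys in memory, values on disk): every put appends a record, an in-memory keydir points at the latest version of each key, and compact() copies only the live records into a fresh file. The class name, the single-file design and the record layout (4-byte key length, 4-byte value length, key, value) are assumptions for illustration; this is not Bitcask's actual file format.

```python
import os
import tempfile

HEADER = 8  # assumed layout: 4-byte key length + 4-byte value length


def _append(f, key: bytes, value: bytes) -> int:
    """Append one record to f and return the file offset of its value bytes."""
    f.seek(0, os.SEEK_END)
    start = f.tell()
    f.write(len(key).to_bytes(4, "big") + len(value).to_bytes(4, "big") + key + value)
    return start + HEADER + len(key)


class AppendOnlyStore:
    """Toy Bitcask-style store: values on disk, keydir (key -> location) in memory."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.keydir = {}  # key -> (value offset, value length) of the latest version
        self.f = open(path, "ab+")

    def put(self, key: bytes, value: bytes) -> None:
        offset = _append(self.f, key, value)
        self.f.flush()
        self.keydir[key] = (offset, len(value))  # any older version is now garbage

    def get(self, key: bytes):
        if key not in self.keydir:
            return None
        offset, length = self.keydir[key]
        self.f.seek(offset)
        return self.f.read(length)

    def file_size(self) -> int:
        return os.path.getsize(self.path)

    def compact(self) -> None:
        """Garbage-collect: copy live records to a new file, then rename it over the old one."""
        tmp_path = self.path + ".compact"
        new_keydir = {}
        with open(tmp_path, "wb") as out:
            for key in self.keydir:
                value = self.get(key)
                new_keydir[key] = (_append(out, key, value), len(value))
        self.f.close()
        os.replace(tmp_path, self.path)
        self.f = open(self.path, "ab+")
        self.keydir = new_keydir

    def close(self) -> None:
        self.f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = AppendOnlyStore(os.path.join(d, "data.log"))
        for i in range(100):
            store.put(b"k", str(i).encode())  # 100 versions of the same key
        before = store.file_size()
        store.compact()
        print(store.get(b"k"), before, store.file_size())  # b'99' 1090 11
        store.close()
```

Compaction rewrites only what is still reachable from the keydir, which is why a write-heavy workload with many overwrites needs it periodically.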
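Slide 8 describes write-once sorted runs organized in "doubling size" levels. A minimal sketch of that layout, assuming the "binary counter" form of the Bentley and Saxe transformation cited on slide 10: level k holds at most one sorted run of at most 2^k entries, and an insert that finds a level occupied merges with it and carries the result upward. Lower levels always hold newer data, so a lookup checks them first. This simplified version merges eagerly, so its O(log N) insert cost is amortized rather than worst-case; the names DoublingLevels and merge_runs are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Run = List[Tuple[str, str]]  # sorted (key, value) pairs, written once and never modified


def merge_runs(newer: Run, older: Run) -> Run:
    """Merge two sorted runs into one; on equal keys the newer value wins."""
    out: Run = []
    i = j = 0
    while i < len(newer) and j < len(older):
        if newer[i][0] < older[j][0]:
            out.append(newer[i])
            i += 1
        elif older[j][0] < newer[i][0]:
            out.append(older[j])
            j += 1
        else:  # same key: keep the newer value, drop the older one
            out.append(newer[i])
            i += 1
            j += 1
    out.extend(newer[i:])
    out.extend(older[j:])
    return out


class DoublingLevels:
    def __init__(self) -> None:
        self.levels: Dict[int, Run] = {}  # level k -> one run of at most 2**k entries

    def insert(self, key: str, value: str) -> None:
        run, k = [(key, value)], 0
        while k in self.levels:  # occupied: merge and carry upward, like binary addition
            run = merge_runs(run, self.levels.pop(k))
            k += 1
        self.levels[k] = run

    def lookup(self, key: str) -> Optional[str]:
        for k in sorted(self.levels):  # lower levels hold newer data, so check them first
            run = self.levels[k]
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

After N inserts of distinct keys, the occupied levels correspond to the 1-bits of N, so there are at most about log2(N) runs to search.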
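Slide 9 shows each HanoiDB file carrying metadata plus a bloom filter, which lets a lookup skip files that definitely do not contain a key. The sketch below is a generic bloom filter, not HanoiDB's implementation: the bit-array size and hash count follow the standard formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2, and the k bit positions are derived by double hashing from one SHA-256 digest, an assumption made for simplicity.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with possible false positives but no false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    @classmethod
    def for_capacity(cls, n: int, fp_rate: float) -> "BloomFilter":
        """Size the filter for n keys at the target false-positive rate."""
        m = math.ceil(-n * math.log(fp_rate) / (math.log(2) ** 2))
        k = max(1, round(m / n * math.log(2)))
        return cls(m, k)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        """False means definitely absent, so a reader can skip this file."""
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

At a 1% target rate the filter costs roughly 9.6 bits per key, which is small next to the data blocks it lets a lookup avoid reading.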
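Slide 10's rule, "for every insert, do one merge step at each level", can be sketched as follows. Assumptions, all hypothetical rather than HanoiDB's code: each level holds an older run A and a newer run B; when B arrives, a merge of A+B into one run for the next level begins; and one "step" emits up to STEP = 4 merged entries. In this sketch that budget is enough for a level's merge (at most 2^(k+1) entries) to finish before the next run reaches the level, so inject() never finds the level busy; the RuntimeError only documents the invariant. While a merge runs, lookups still read A and B directly.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Run = List[Tuple[str, str]]  # sorted (key, value) pairs
STEP = 4  # assumed budget: merged entries emitted per level per insert


class Level:
    def __init__(self) -> None:
        self.a: Optional[Run] = None  # older run
        self.b: Optional[Run] = None  # newer run; its presence means a merge is in progress
        self.out: Run = []            # merged output produced so far
        self.i = self.j = 0           # merge cursors into a and b

    def inject(self, run: Run) -> None:
        if self.b is not None:
            raise RuntimeError("no room: previous merge not finished")
        if self.a is None:
            self.a = run
        else:
            self.b, self.out, self.i, self.j = run, [], 0, 0

    def step(self, budget: int) -> Optional[Run]:
        """Emit up to `budget` merged entries; return the finished run, else None."""
        a, b = self.a, self.b
        while budget > 0 and (self.i < len(a) or self.j < len(b)):
            if self.j >= len(b) or (self.i < len(a) and a[self.i][0] < b[self.j][0]):
                self.out.append(a[self.i])
                self.i += 1
            elif self.i >= len(a) or b[self.j][0] < a[self.i][0]:
                self.out.append(b[self.j])
                self.j += 1
            else:  # same key in both runs: the newer value from B wins
                self.out.append(b[self.j])
                self.i += 1
                self.j += 1
            budget -= 1
        if self.i < len(a) or self.j < len(b):
            return None
        done, self.a, self.b, self.out = self.out, None, None, []
        return done


class IncrementalTree:
    def __init__(self) -> None:
        self.levels: List[Level] = []

    def _level(self, k: int) -> Level:
        while len(self.levels) <= k:
            self.levels.append(Level())
        return self.levels[k]

    def insert(self, key: str, value: str) -> None:
        self._level(0).inject([(key, value)])
        k = 0
        while k < len(self.levels):  # one merge step at each level, bottom up
            level = self.levels[k]
            if level.b is not None:
                done = level.step(STEP)
                if done is not None:
                    self._level(k + 1).inject(done)  # lands in the next (doubled) level
            k += 1

    def lookup(self, key: str) -> Optional[str]:
        for level in self.levels:  # lower levels hold newer data
            for run in (level.b, level.a):  # within a level, B is newer than A
                if run:
                    i = bisect_left(run, (key,))
                    if i < len(run) and run[i][0] == key:
                        return run[i][1]
        return None
```

Because merge work is spread over inserts instead of done all at once, no single insert has to wait for a large merge to complete.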
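Slide 11's nursery can be sketched as an in-memory map plus an append-only transaction log: each write is appended (and fsynced) to the log before it is applied in memory, a restart replays the log, and when the nursery is full its contents are written out, sorted by key, as a level file and the log is truncated. The file names, the tab-separated text format and the small default capacity are illustrative assumptions; keys and values must not contain tabs or newlines.

```python
import os
import tempfile
from typing import Dict, Optional


class Nursery:
    """In-memory buffer of recent writes, made durable by an append-only transaction log."""

    def __init__(self, directory: str, capacity: int = 4) -> None:
        self.dir = directory
        self.capacity = capacity
        self.entries: Dict[str, str] = {}
        self.log_path = os.path.join(directory, "nursery.log")
        self.flushed = sum(name.startswith("level-file-") for name in os.listdir(directory))
        if os.path.exists(self.log_path):  # crash recovery: replay the log
            with open(self.log_path, encoding="utf-8") as f:
                for line in f:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.entries[key] = value
        self.log = open(self.log_path, "a", encoding="utf-8")

    def put(self, key: str, value: str) -> Optional[str]:
        """Log the write, then apply it; return the level file path if the nursery filled up."""
        self.log.write(f"{key}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.entries[key] = value
        return self._flush() if len(self.entries) >= self.capacity else None

    def _flush(self) -> str:
        path = os.path.join(self.dir, f"level-file-{self.flushed}.data")
        with open(path, "w", encoding="utf-8") as f:
            for key in sorted(self.entries):  # level files are sorted by key
                f.write(f"{key}\t{self.entries[key]}\n")
            f.flush()
            os.fsync(f.fileno())
        self.flushed += 1
        self.entries.clear()
        self.log.close()
        self.log = open(self.log_path, "w", encoding="utf-8")  # truncate: data now lives in the level file
        return path

    def close(self) -> None:
        self.log.close()
```

Writing the level file before truncating the log means a crash in between only causes entries to be replayed twice, never lost.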
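Slide 15 shows a range fold in which one hanoidb_folder per level file streams {level_results, pid(), [{Key,Value}*100]} batches to a hanoidb_fold_merger, which emits {fold_result, Key, Value} in key order. Below is a single-process Python analogue, assuming the sources are listed newest first (standing in for {source_order, [pid()]}): heapq.merge orders entries by key and then by source rank, so the first entry seen for each key comes from the newest source. Lazy generators stand in for back pressure, since a batch is only produced when the merger pulls it. The function names are hypothetical.

```python
import heapq
from bisect import bisect_left
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

Pair = Tuple[str, str]
Acc = TypeVar("Acc")


def chunked(pairs: Iterable[Pair], size: int = 100) -> Iterator[List[Pair]]:
    """Yield pairs in batches of `size`, like {level_results, pid(), [{Key,Value}*100]}."""
    it = iter(pairs)
    while batch := list(islice(it, size)):
        yield batch


def fold_merge(sources: List[Iterator[List[Pair]]]) -> Iterator[Pair]:
    """Merge key-sorted batch streams; sources[0] is assumed newest and wins on equal keys."""
    def entries(rank: int, batches: Iterator[List[Pair]]) -> Iterator[Tuple[str, int, str]]:
        for batch in batches:  # the next batch is only requested once this one is consumed
            for key, value in batch:
                yield key, rank, value

    last = None
    for key, _rank, value in heapq.merge(*(entries(r, s) for r, s in enumerate(sources))):
        if key != last:  # equal keys arrive newest-first, so keep only the first
            yield key, value
            last = key


def range_fold(levels: List[List[Pair]], lo: str, hi: str,
               fun: Callable[[str, str, Acc], Acc], acc: Acc) -> Acc:
    """Fold fun over keys in [lo, hi); levels are sorted runs listed newest first."""
    sources = []
    for run in levels:
        start, end = bisect_left(run, (lo,)), bisect_left(run, (hi,))
        sources.append(chunked(run[start:end]))
    for key, value in fold_merge(sources):
        acc = fun(key, value, acc)
    return acc
```

Usage: range_fold([newest_run, older_run], "a", "d", lambda k, v, acc: acc + [(k, v)], []) collects every key in ["a", "d") exactly once, with the value from the newest run that contains it.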