Building a High-Performance Key/Value Store in Go

In this talk we explore the internals of a high-performance key/value store written in Go. The audience will learn the basic design used to store and retrieve data, as well as the techniques used to achieve high performance.

Marty Schoch

July 14, 2017

Transcript

  1. General Purpose Key-Value Stores: Get / Set / Delete Values by Key

    Key → Value: hatter → mad, head → attached, rabbit → early
  2. General Purpose Key-Value Stores: Get / Set / Delete Values by Key

    Key → Value: hatter → mad, head → attached, rabbit → early. GET hatter → mad
  3. General Purpose Key-Value Stores: Get / Set / Delete Values by Key

    Key → Value: hatter → mad, head → attached, rabbit → late (replaces early). SET rabbit late
  4. General Purpose Key-Value Stores: Get / Set / Delete Values by Key

    Key → Value: hatter → mad, rabbit → late, head → attached (removed). DELETE head
  5. General Purpose Key-Value Stores: Atomic Batch Updates

    Key → Value: hatter → mad, rabbit → late. Batch: tea → party, cat → grinning, hare → march
  6. General Purpose Key-Value Stores: Atomic Batch Updates

    Key → Value: cat → grinning, hare → march, hatter → mad, rabbit → late, tea → party
  7. General Purpose Key-Value Stores: Isolated Read Snapshots

    Iterator Started. Key → Value: cat → grinning, hare → march, hatter → mad, rabbit → late, tea → party
  8. General Purpose Key-Value Stores: Isolated Read Snapshots

    Iterator Started. Key → Value: cat → grinning, caterpillar → smoking (Concurrent Mutation), hare → march, hatter → mad, rabbit → late, tea → party
  9. General Purpose Key-Value Stores: Isolated Read Snapshots

    Iterator Started. Key → Value: cat → grinning, caterpillar → smoking (Concurrent Mutation, not seen), hare → march, hatter → mad, rabbit → late, tea → party
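
Slides 5 through 9 describe two guarantees: a batch becomes visible all at once, and a reader that has started does not see later mutations. One way to get both is sketched below, assuming a stack of immutable segments where each batch is published as a new segment and a snapshot is just a capped view of the stack. The names `store`, `segment`, `snapshot` and `ExecuteBatch` are illustrative and not taken from the talk; deletions and ordered iteration are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// segment holds the result of one batch and is never modified once published.
type segment map[string]string

// store keeps a stack of immutable segments; newer segments shadow older ones.
type store struct {
	mu    sync.Mutex
	stack []segment
}

// ExecuteBatch publishes all entries of a batch in one step, so readers
// see either none of them or all of them.
func (s *store) ExecuteBatch(batch map[string]string) {
	seg := make(segment, len(batch))
	for k, v := range batch {
		seg[k] = v
	}
	s.mu.Lock()
	s.stack = append(s.stack, seg)
	s.mu.Unlock()
}

// snapshot is a frozen view of the stack at the moment it was taken.
type snapshot struct {
	stack []segment
}

// Snapshot captures the current stack. The three-index slice caps the view,
// so segments appended later are not reachable through it.
func (s *store) Snapshot() snapshot {
	s.mu.Lock()
	defer s.mu.Unlock()
	return snapshot{stack: s.stack[:len(s.stack):len(s.stack)]}
}

// Get searches from the newest segment down to the oldest.
func (sn snapshot) Get(key string) (string, bool) {
	for i := len(sn.stack) - 1; i >= 0; i-- {
		if v, ok := sn.stack[i][key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	s := &store{}
	s.ExecuteBatch(map[string]string{"hatter": "mad", "rabbit": "late"})
	s.ExecuteBatch(map[string]string{"tea": "party", "cat": "grinning", "hare": "march"})

	snap := s.Snapshot()                                        // an iterator would start here
	s.ExecuteBatch(map[string]string{"caterpillar": "smoking"}) // concurrent mutation

	_, seen := snap.Get("caterpillar")
	fmt.Println("old snapshot sees caterpillar:", seen)
	_, seen = s.Snapshot().Get("caterpillar")
	fmt.Println("new snapshot sees caterpillar:", seen)
}
```

Because published segments are never modified, a snapshot needs no lock while reading; the lock only guards the stack itself.
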
  10. Off-The-Shelf Solutions: BoltDB?

    • Go • b+tree • Great Read Performance. https://github.com/boltdb/bolt
  11. Off-The-Shelf Solutions: RocksDB?

    • C++ • LSM (leveldb) • Better Read/Write Perf Balance • cgo. https://github.com/facebook/rocksdb/
  12. Off-The-Shelf Solutions: GoLevelDB?

    • Go • LSM (leveldb) • Unable to Tune Adequately. https://github.com/syndtr/goleveldb
  13. Off-The-Shelf Solutions: Badger?

    • Go • WiscKey • Not Available at the Time. https://github.com/dgraph-io/badger
  14. Simplicity

    Damian Gryski - Slices: Performance through cache friendliness. https://go-talks.appspot.com/github.com/dgryski/talks/dotgo-2016/slices.slide

    "Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures." "Notes on C Programming" (Rob Pike, 1989)
  15. Special Purpose Key-Value Store

    Index Write Throughput • Read after Write Latency • Persistence Decoupled from Read and Write
  16. Special Purpose Key-Value Store

    Index Write Throughput • Read after Write Latency • Persistence Decoupled from Read and Write • Willing to use All System RAM* (* Bounded by some Quota)
  17. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. Build 2 uint64s of metadata: Data Offset = 20 (64 bits)
  18. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. Build 2 uint64s of metadata: Data Offset = 20 (64 bits); Operation = s (4 bits)
  19. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. Build 2 uint64s of metadata: Data Offset = 20 (64 bits); Operation = s (4 bits), Key Len = 4 (24 bits)
  20. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. Build 2 uint64s of metadata: Data Offset = 20 (64 bits); Operation = s (4 bits), Key Len = 4 (24 bits), Val Len = 5 (28 bits)
  21. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. Build 2 uint64s of metadata: Data Offset = 20 (64 bits); Operation = s (4 bits), Key Len = 4 (24 bits), Val Len = 5 (28 bits), unused (8 bits)
  22. Set( []byte("name"), []byte("marty") )

    data []byte (offsets marked 0, 20, 24, 29): "name" at 20-24, "marty" at 24-29. meta []uint64: 20, then s / 4 / 5. Append uint64s to meta slice
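
The build-up on slides 17 through 22 can be sketched as follows. The field widths (4-bit operation, 24-bit key length, 28-bit value length, 8 unused bits) come from the slides; the exact bit positions, the numeric operation codes and the method names `set` and `entry` are assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Field widths are from the slides; bit positions and op codes are assumptions.
const (
	keyLenBits = 24
	valLenBits = 28

	valLenShift = 8                        // low 8 bits unused
	keyLenShift = valLenShift + valLenBits // 36
	opShift     = keyLenShift + keyLenBits // 60, leaving 4 bits for the operation

	opSet uint64 = 1 // hypothetical operation codes
	opDel uint64 = 2
)

type segment struct {
	data []byte   // key and value bytes, back to back
	meta []uint64 // two words per entry: data offset, then op and lengths
}

// set appends the key and value bytes to data and two words to meta.
func (s *segment) set(key, val []byte) error {
	if len(key) >= 1<<keyLenBits || len(val) >= 1<<valLenBits {
		return errors.New("key or value too large for the packed lengths")
	}
	offset := uint64(len(s.data))
	s.data = append(s.data, key...)
	s.data = append(s.data, val...)
	packed := opSet<<opShift |
		uint64(len(key))<<keyLenShift |
		uint64(len(val))<<valLenShift
	s.meta = append(s.meta, offset, packed)
	return nil
}

// entry decodes the i-th entry back into operation, key and value.
func (s *segment) entry(i int) (op uint64, key, val []byte) {
	offset, packed := s.meta[2*i], s.meta[2*i+1]
	op = packed >> opShift
	keyLen := (packed >> keyLenShift) & (1<<keyLenBits - 1)
	valLen := (packed >> valLenShift) & (1<<valLenBits - 1)
	key = s.data[offset : offset+keyLen]
	val = s.data[offset+keyLen : offset+keyLen+valLen]
	return op, key, val
}

func main() {
	// Start with 20 bytes already in data, matching the offsets on the slide.
	s := &segment{data: make([]byte, 20)}
	if err := s.set([]byte("name"), []byte("marty")); err != nil {
		panic(err)
	}
	op, key, val := s.entry(0)
	fmt.Println(s.meta[0], op, len(key), len(val)) // offset, op, key len, val len
	fmt.Printf("%s=%s\n", key, val)
}
```
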
  23. type segment struct { data []byte; meta []uint64 }

    Sorting only shuffles integers, not bytes!
  24. Persist each Segment in the Segment Stack

    Header, meta []uint64, data []byte. func (f *File) Write(b []byte) (n int, err error)
  25. func Uint64SliceToByteSlice(in []uint64) ([]byte, error) {
          inHeader := (*reflect.SliceHeader)(unsafe.Pointer(&in))
          var out []byte
          outHeader := (*reflect.SliceHeader)(unsafe.Pointer(&out))
          outHeader.Data = inHeader.Data
          outHeader.Len = inHeader.Len * 8
          outHeader.Cap = inHeader.Cap * 8
          return out, nil
      }
  26. Persist the Segment Stack

    Header, meta []uint64 → []byte, data []byte. func (f *File) Write(b []byte) (n int, err error)
  27. Persist the Segment Stack

    Header, meta []uint64 → []byte (not portable), data []byte. func (f *File) Write(b []byte) (n int, err error)
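
For contrast with the zero-copy cast, here is a sketch of the portable alternative using `encoding/binary` with a fixed little-endian byte order. It is not what the slides use: it copies every word, which is the cost the talk avoids by accepting the loss of portability. The function names `encodeMeta` and `decodeMeta` are made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeMeta copies each uint64 into an 8-byte little-endian slot, so the
// resulting bytes mean the same thing on every architecture.
func encodeMeta(meta []uint64) []byte {
	out := make([]byte, 8*len(meta))
	for i, m := range meta {
		binary.LittleEndian.PutUint64(out[8*i:], m)
	}
	return out
}

// decodeMeta is the inverse of encodeMeta.
func decodeMeta(b []byte) ([]uint64, error) {
	if len(b)%8 != 0 {
		return nil, fmt.Errorf("meta length %d is not a multiple of 8", len(b))
	}
	meta := make([]uint64, len(b)/8)
	for i := range meta {
		meta[i] = binary.LittleEndian.Uint64(b[8*i:])
	}
	return meta, nil
}

func main() {
	raw := encodeMeta([]uint64{20})
	fmt.Println(raw) // [20 0 0 0 0 0 0 0]

	back, err := decodeMeta(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(back) // [20]
}
```

On recent Go versions `reflect.SliceHeader` is deprecated in favor of `unsafe.Slice`, but that only changes how the cast is spelled; the bytes written still follow the host's byte order.
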
  28. In-memory and Disk Segments

    Get/Iterator implementations the same, no new code. collection, segmentStack, []byte, []byte, mmap
  29. High Performance - Minimize GC Impact

    • Slices of uint64 and byte
    type segment struct { data []byte; meta []uint64 }
  30. High Performance - Minimize GC Impact

    • Slices of uint64 and byte • Integer offsets into slices, not pointers
    type segment struct { data []byte; meta []uint64 }
    data []byte: name, marty. meta []uint64: 20, then s / 4 / 5 / unused
  31. High Performance - Memory Allocation

    • 2 uints of meta per op
    type segment struct { data []byte; meta []uint64 }
  32. High Performance - Memory Allocation

    • 2 uints of meta per op • len(data) = len(key) + len(val)
    type segment struct { data []byte; meta []uint64 }
  33. High Performance - Memory Allocation

    type Batch interface { AllocSet(key, val []byte) error; AllocDel(key []byte) error }
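
The `Alloc`-prefixed methods suggest that keys and values are written directly into memory owned by the batch, so recording an operation needs no copy. A sketch of that idea follows, under stated assumptions: the slide shows only `AllocSet` and `AllocDel`, so the `Alloc` method, the `newBatch` constructor, the `offsetOf` helper, the bit positions and the operation codes are all hypothetical here.

```go
package main

import (
	"errors"
	"fmt"
)

// Bit positions and op codes are assumptions made for this sketch.
const (
	valLenShift        = 8
	keyLenShift        = 36
	opShift            = 60
	opSet       uint64 = 1
	opDel       uint64 = 2
)

type batch struct {
	data []byte   // allocated once up front; its capacity never changes
	meta []uint64 // two words per operation: offset, then op and lengths
}

// newBatch sizes both slices up front so later calls do not reallocate.
func newBatch(totalBytes, totalOps int) *batch {
	return &batch{
		data: make([]byte, 0, totalBytes),
		meta: make([]uint64, 0, 2*totalOps),
	}
}

// Alloc (hypothetical) hands out the next n bytes of the batch's buffer.
func (b *batch) Alloc(n int) ([]byte, error) {
	start := len(b.data)
	if n < 0 || start+n > cap(b.data) {
		return nil, errors.New("batch buffer exhausted")
	}
	b.data = b.data[:start+n]
	return b.data[start:], nil
}

// offsetOf recovers where a slice from Alloc starts: slicing keeps the end of
// the backing array fixed, so the capacities differ by exactly the offset.
func (b *batch) offsetOf(p []byte) int { return cap(b.data) - cap(p) }

// AllocSet records a set without copying; val must directly follow key.
func (b *batch) AllocSet(key, val []byte) error {
	keyStart, valStart := b.offsetOf(key), b.offsetOf(val)
	if keyStart < 0 || valStart != keyStart+len(key) || valStart+len(val) > len(b.data) {
		return errors.New("key and val must be adjacent slices from Alloc")
	}
	b.meta = append(b.meta, uint64(keyStart),
		opSet<<opShift|uint64(len(key))<<keyLenShift|uint64(len(val))<<valLenShift)
	return nil
}

// AllocDel records a deletion for a key obtained from Alloc.
func (b *batch) AllocDel(key []byte) error {
	keyStart := b.offsetOf(key)
	if keyStart < 0 || keyStart+len(key) > len(b.data) {
		return errors.New("key must come from Alloc")
	}
	b.meta = append(b.meta, uint64(keyStart), opDel<<opShift|uint64(len(key))<<keyLenShift)
	return nil
}

func main() {
	b := newBatch(9, 1)
	buf, err := b.Alloc(9) // one allocation for key and value together
	if err != nil {
		panic(err)
	}
	key, val := buf[:4], buf[4:]
	copy(key, "name")
	copy(val, "marty")
	if err := b.AllocSet(key, val); err != nil {
		panic(err)
	}
	fmt.Printf("%s %d\n", b.data, b.meta[0])
}
```

The caller fills the bytes in place, and the batch only appends two integers per operation, which matches the "2 uints of meta per op" point on the preceding slides.
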
  34. High Performance - Unsafe

    • Faster serialization, but loss of portability • Deeply unsatisfying

    Rob Pike, The byte order fallacy: "Whenever I see code that asks what the native byte order is, it's almost certain the code is either wrong or misguided." https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html
  35. Moss Competitor @mossbro Moss Fanboy Moss is the best, fastest!

    #winning @mossbro Moss Fanboy Competition is slow. Sad!