Writing a High Performance Database in Go

My talk from GopherCon 2014.

benbjohnson

April 24, 2014

Transcript

  1. Writing a
    High Performance
    Database in Go

  2. Two Meanings of
    “Database”

  3. Database
    Server

  4. Database
    Library

  5. You may never write
    a database but...

  6. HOW WE
    ACCESS DATA
    AFFECTS US ALL!

  7. Why write a
    database in Go?

  8. Things that need to
    be really f*cking fast
    Things that need to
    be pretty fast

  9. Things that need to
    be really f*cking fast
    Things that need to
    be pretty fast
    User Management
    Schema Management
    Query Parsing
    Backup / Recovery
    Bulk Data Insertion
    etc...

  10. Things that need to be really f*cking fast:
      Query Execution
    Things that need to be pretty fast:
      User Management
      Schema Management
      Query Parsing
      Backup / Recovery
      Bulk Data Insertion
      etc...

  11. There’s more to databases than speed

  12. There’s more to databases than speed
    Easy Deployment

  13. There’s more to databases than speed
    Easy Deployment
    User friendly API

  14. There’s more to databases than speed
    Easy Deployment
    User friendly API
    Simple debugging

  15. How do you make
    the fast parts fast?

  16. Option #1: CGO

  17. Pro: Integrate with
    tons of existing libraries

  18. Con: Overhead incurred
    with each C function call

  19. LuaJIT
    Easy to integrate, good community
    Half the speed of C, weird caveats

  20. LLVM
    Really, really fast
    Really, really complicated

  21. The point isn’t to just use C

  22. The point is that C is an option

  23. Option #2: Pure Go

  24. Bolt

  25. Basics of Bolt
    Pure Go port of LMDB
    Memory-mapped B+tree
    MVCC, ACID transactions
    Zero copy reads

  26. Batch Work
    Together

  27. Bolt Batch Benchmarks
    Batch Size    Performance
    1             Baseline
    10            9x Baseline
    100           45x Baseline
    1000          90x Baseline
    Disclaimer: YMMV

  28. Transaction Coalescing
    Use a channel to stream changes
    Group changes into single transaction
    Either all changes commit or rollback
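
The coalescing idea on this slide can be sketched without Bolt at all. What follows is a minimal illustration, not the speaker's implementation: the `store`, `change`, and `coalesce` names are hypothetical, the "transaction" is an in-memory map that is staged and swapped, and the batch limit of 100 is an arbitrary assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// change is a hypothetical single write streamed over a channel.
type change struct {
	key, value string
}

// store is a stand-in for a transactional database (this is not Bolt's API).
type store struct {
	mu      sync.Mutex
	data    map[string]string
	commits int // number of transactions committed so far
}

// update applies a batch as one all-or-nothing transaction: changes are
// staged in a copy, and the copy replaces the live data only if every
// change is valid.
func (s *store) update(batch []change) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	staged := make(map[string]string, len(s.data)+len(batch))
	for k, v := range s.data {
		staged[k] = v
	}
	for _, c := range batch {
		if c.key == "" {
			return errors.New("empty key") // rollback: the staged copy is dropped
		}
		staged[c.key] = c.value
	}
	s.data = staged
	s.commits++
	return nil
}

// coalesce drains ch and groups up to maxBatch changes into a single
// transaction. It returns once ch is closed and the last batch is flushed.
func coalesce(s *store, ch <-chan change, maxBatch int) {
	batch := make([]change, 0, maxBatch)
	flush := func() {
		if len(batch) == 0 {
			return
		}
		if err := s.update(batch); err != nil {
			fmt.Println("batch rolled back:", err)
		}
		batch = batch[:0]
	}
	for c := range ch {
		batch = append(batch, c)
		if len(batch) == maxBatch {
			flush()
		}
	}
	flush()
}

func main() {
	s := &store{data: map[string]string{}}
	ch := make(chan change)
	done := make(chan struct{})
	go func() {
		coalesce(s, ch, 100)
		close(done)
	}()
	for i := 0; i < 1000; i++ {
		ch <- change{key: fmt.Sprintf("k%04d", i), value: "v"}
	}
	close(ch)
	<-done
	fmt.Println(len(s.data), "keys in", s.commits, "transactions")
}
```

A real coalescer would probably also flush on a timer so that a slow trickle of writes is not held back waiting for a full batch; that is left out here to keep the sketch short.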

  29. Encoding
    Matters!

  30. Encoding Performance
    JSON            Baseline
    gogoprotobuf    20x JSON
    Cap’n Proto     60x JSON
    Disclaimer: YMMV
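
To make the contrast between a text encoding and a fixed binary layout concrete, here is a small standard-library sketch. It does not use gogoprotobuf or Cap’n Proto and does not try to reproduce the numbers above; `encoding/binary` stands in as a simple fixed-layout encoder, and the `record` type and its fields are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"log"
)

// record is a hypothetical value made only of fixed-size fields.
type record struct {
	ID    uint64
	Value int64
}

// encodeJSON produces a self-describing text form that must be parsed on read.
func encodeJSON(r record) ([]byte, error) {
	return json.Marshal(r)
}

// encodeBinary writes the fields back to back in little-endian order.
func encodeBinary(r record) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeBinary reverses encodeBinary.
func decodeBinary(b []byte) (record, error) {
	var r record
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &r)
	return r, err
}

func main() {
	r := record{ID: 123, Value: 20}

	j, err := encodeJSON(r)
	if err != nil {
		log.Fatal(err)
	}
	b, err := encodeBinary(r)
	if err != nil {
		log.Fatal(err)
	}
	back, err := decodeBinary(b)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("JSON:   %d bytes  %s\n", len(j), j)
	fmt.Printf("binary: %d bytes  % x\n", len(b), b)
	fmt.Println("round trip ok:", back == r)
}
```

The binary form has no field names or digits to parse, which is one reason schema-based encoders tend to outperform JSON; actual speedups depend on the data and the library.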

  31. Encoding Performance
    See also: Albert Strasheim’s
    “Serialization in Go” Talk
    http://www.slideshare.net/albertstrasheim/serialization-in-go
    https://github.com/cloudflare/goser

  32. Here’s a crazy
    idea...

  33. Direct map to
    your data file

  34. Map a struct to a []byte
    // Create a byte slice with the same size as type T.
    var value = make([]byte, unsafe.Sizeof(T{}))
    // Map a typed pointer from the byte slice and update it.
    var t = (*T)(unsafe.Pointer(&value[0]))
    t.ID = 123
    t.MyIntValue = 20
    // Insert value into database.
    db.Update(func(tx *bolt.Tx) error {
        return tx.Bucket([]byte("T")).Put([]byte("123"), value)
    })

  35. Map a []byte to a struct
    // Start a read transaction.
    db.View(func(tx *bolt.Tx) error {
        c := tx.Bucket([]byte("T")).Cursor()
        // Iterate over each value in the bucket.
        for k, v := c.First(); k != nil; k, v = c.Next() {
            // Map a typed pointer onto the stored bytes without copying.
            var t = (*T)(unsafe.Pointer(&v[0]))
            // ... do something with "t" ...
        }
        return nil
    })

  36. Pros:
    No encoding/decoding
    Insert 100k values/sec
    Read 20M values/sec

  37. Cons:
    Fixed struct layout
    Machine specific endianness
    People will think you’re crazy
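
The endianness con is easy to see in practice. This short sketch, which is not from the talk, checks the byte order of the machine it runs on and shows one portable alternative: writing an integer in an explicitly chosen byte order with `encoding/binary`. The `putID` and `getID` helpers are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// nativeIsLittleEndian reports whether this machine stores the least
// significant byte of an integer first. A struct mapped directly onto a
// data file inherits whatever order this returns.
func nativeIsLittleEndian() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

// putID writes id into b in big-endian order regardless of the host, so the
// bytes mean the same thing on every machine. b must have at least 8 bytes.
func putID(b []byte, id uint64) {
	binary.BigEndian.PutUint64(b, id)
}

// getID reads back a value written by putID.
func getID(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

func main() {
	fmt.Println("native little-endian:", nativeIsLittleEndian())

	b := make([]byte, 8)
	putID(b, 123)
	fmt.Println(b, getID(b))
}
```

The trade-off is that an explicit byte order reintroduces a small encode and decode step, which is exactly what the direct mapping approach avoids.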

  38. Your CPU can do 3 billion
    operations per second so
    USE IT!

  39. How to think about
    performance optimization

  40. Hierarchy of Need
    Self-actualization
    Esteem
    Love/Belonging
    Safety
    Physiological

  41. Hierarchy of Need
    Self-actualization
    Esteem
    Love/Belonging
    Safety
    Physiological

  42. Hierarchy of SPEED
    Memory Access
    Mutexes
    Memory Allocation
    Disk I/O
    Network I/O
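
One level of this hierarchy, memory allocation, can be measured with the standard library alone. The sketch below is an illustration added here rather than material from the talk: it uses `testing.AllocsPerRun` to count heap allocations for a function that allocates a fresh buffer on every call versus one that reuses a preallocated buffer. The function names and the 4096-byte scratch size are assumptions.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the compiler cannot keep the buffers on the stack.
var sink []byte

// scratch is a reusable buffer; 4096 bytes is an arbitrary capacity.
var scratch = make([]byte, 0, 4096)

// allocEachTime copies src into a newly allocated buffer on every call.
func allocEachTime(src []byte) {
	buf := make([]byte, len(src))
	copy(buf, src)
	sink = buf
}

// reuse copies src into the shared scratch buffer. It allocates only if src
// is larger than the scratch capacity. It is not safe for concurrent use.
func reuse(src []byte) {
	scratch = append(scratch[:0], src...)
	sink = scratch
}

func main() {
	src := make([]byte, 1024)

	fresh := testing.AllocsPerRun(1000, func() { allocEachTime(src) })
	reused := testing.AllocsPerRun(1000, func() { reuse(src) })

	fmt.Printf("allocations per call: fresh=%.0f reused=%.0f\n", fresh, reused)
}
```

Fewer allocations also mean less garbage collector work, which is usually where the bigger saving comes from in a long-running server.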

  43. Go can be extremely
    fast... if you know
    how to optimize it!

  44. Questions
    @benbjohnson
