Slide 1

Slide 1 text

Writing a High Performance Database in Go

Slide 2

Slide 2 text

Two Meanings of “Database”

Slide 3

Slide 3 text

Database Server

Slide 4

Slide 4 text

Database Library

Slide 5

Slide 5 text

You may never write a database but...

Slide 6

Slide 6 text

HOW WE ACCESS DATA AFFECTS US ALL!

Slide 7

Slide 7 text

Why write a database in Go?

Slide 8

Slide 8 text

Things that need to be really f*cking fast
Things that need to be pretty fast

Slide 9

Slide 9 text

Things that need to be really f*cking fast
Things that need to be pretty fast: User Management, Schema Management, Query Parsing, Backup / Recovery, Bulk Data Insertion, etc.

Slide 10

Slide 10 text

Things that need to be really f*cking fast: Query Execution
Things that need to be pretty fast: User Management, Schema Management, Query Parsing, Backup / Recovery, Bulk Data Insertion, etc.

Slide 11

Slide 11 text

There’s more to databases than speed

Slide 12

Slide 12 text

There’s more to databases than speed
Easy deployment

Slide 13

Slide 13 text

There’s more to databases than speed
Easy deployment
User-friendly API

Slide 14

Slide 14 text

There’s more to databases than speed
Easy deployment
User-friendly API
Simple debugging

Slide 15

Slide 15 text

How do you make the fast parts fast?

Slide 16

Slide 16 text

Option #1: CGO

Slide 17

Slide 17 text

Pro: Integrate with tons of existing libraries

Slide 18

Slide 18 text

Con: Overhead incurred with each C function call

Slide 19

Slide 19 text

LuaJIT
Pro: Easy to integrate, good community
Con: Half the speed of C, weird caveats

Slide 20

Slide 20 text

LLVM
Pro: Really, really fast
Con: Really, really complicated

Slide 21

Slide 21 text

The point isn’t to just use C

Slide 22

Slide 22 text

The point is that C is an option

Slide 23

Slide 23 text

Option #2: Pure Go

Slide 24

Slide 24 text

Bolt

Slide 25

Slide 25 text

Basics of Bolt
Pure Go port of LMDB
Memory-mapped B+tree
MVCC, ACID transactions
Zero-copy reads

Slide 26

Slide 26 text

Batch Work Together

Slide 27

Slide 27 text

Bolt Batch Benchmarks
Batch Size    Performance
1             Baseline
10            9x Baseline
100           45x Baseline
1000          90x Baseline
Disclaimer: YMMV

Slide 28

Slide 28 text

Transaction Coalescing
Use a channel to stream changes
Group changes into a single transaction
Either all changes commit or all roll back
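A minimal sketch of the coalescing idea, using only the standard library. The `change` and `store` types, the `coalesce` function, and the batch size are all hypothetical stand-ins, not part of Bolt; in a real program the `commit` method would wrap one database write transaction, which is also where the all-or-nothing rollback behavior would come from. This in-memory version does not model rollback.

```go
package main

import "fmt"

// change is a hypothetical unit of work: one key/value write.
type change struct {
	key, value string
}

// store stands in for a database. Each call to commit represents
// one write transaction, and commits counts how many were needed.
type store struct {
	data    map[string]string
	commits int
}

// commit applies a whole batch as a single "transaction".
func (s *store) commit(batch []change) {
	for _, c := range batch {
		s.data[c.key] = c.value
	}
	s.commits++
}

// coalesce drains changes from ch and commits them in groups of up to
// maxBatch. It returns once ch is closed and the final partial batch
// has been committed.
func coalesce(s *store, ch <-chan change, maxBatch int) {
	batch := make([]change, 0, maxBatch)
	for c := range ch {
		batch = append(batch, c)
		if len(batch) == maxBatch {
			s.commit(batch)
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		s.commit(batch)
	}
}

func main() {
	s := &store{data: make(map[string]string)}
	ch := make(chan change)
	done := make(chan struct{})

	// A single goroutine owns the store and groups incoming changes.
	go func() {
		coalesce(s, ch, 100)
		close(done)
	}()

	for i := 0; i < 250; i++ {
		ch <- change{key: fmt.Sprintf("k%03d", i), value: "v"}
	}
	close(ch)
	<-done

	fmt.Println(len(s.data), "keys written in", s.commits, "commits")
}
```

With 250 changes and a batch size of 100, the writes are grouped into three commits instead of 250. A production version would probably also flush on a timer so that a slow trickle of changes is not held back waiting for a full batch.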

Slide 29

Slide 29 text

Encoding Matters!

Slide 30

Slide 30 text

Encoding Performance
JSON            Baseline
gogoprotobuf    20x JSON
Cap’n Proto     60x JSON
Disclaimer: YMMV
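To make the comparison concrete without pulling in gogoprotobuf or Cap’n Proto, here is a small standard-library sketch that encodes the same record as JSON and as a fixed-width binary layout via `encoding/binary`. The `event` type and helper names are hypothetical, and `encoding/binary` is a substitute for the libraries named on the slide, so this only illustrates why a text format carries more overhead; it says nothing about the 20x and 60x figures.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
)

// event is a hypothetical record with only fixed-size fields,
// which is what binary.Write and binary.Read require.
type event struct {
	ID    uint64 `json:"id"`
	Value int64  `json:"value"`
}

// encodeJSON produces a self-describing text encoding.
func encodeJSON(e event) ([]byte, error) {
	return json.Marshal(e)
}

// encodeBinary writes the fields back to back in little-endian order.
func encodeBinary(e event) ([]byte, error) {
	var buf bytes.Buffer
	err := binary.Write(&buf, binary.LittleEndian, e)
	return buf.Bytes(), err
}

// decodeBinary is the inverse of encodeBinary.
func decodeBinary(b []byte) (event, error) {
	var e event
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &e)
	return e, err
}

func main() {
	e := event{ID: 123, Value: 20}

	j, err := encodeJSON(e)
	if err != nil {
		panic(err)
	}
	b, err := encodeBinary(e)
	if err != nil {
		panic(err)
	}
	back, err := decodeBinary(b)
	if err != nil {
		panic(err)
	}

	fmt.Printf("json: %d bytes, binary: %d bytes, round trip ok: %v\n",
		len(j), len(b), back == e)
}
```

The JSON form has to spell out field names and render numbers as text, and the decoder has to parse all of that back; the binary form is just the field bytes. Actual speed differences depend on the library and the data, so benchmark with your own records.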

Slide 31

Slide 31 text

Encoding Performance
See also: Albert Strasheim’s “Serialization in Go” talk
http://www.slideshare.net/albertstrasheim/serialization-in-go
https://github.com/cloudflare/goser

Slide 32

Slide 32 text

Here’s a crazy idea...

Slide 33

Slide 33 text

Direct map to your data file

Slide 34

Slide 34 text

Map a struct to a []byte

    // T is assumed to be a fixed-size struct (no pointers, strings or slices)
    // and db an open *bolt.DB.

    // Create a byte slice with the same size as type T.
    var value = make([]byte, unsafe.Sizeof(T{}))

    // Map a typed pointer from the byte slice and update it.
    var t = (*T)(unsafe.Pointer(&value[0]))
    t.ID = 123
    t.MyIntValue = 20

    // Insert value into database.
    db.Update(func(tx *bolt.Tx) error {
        return tx.Bucket([]byte("T")).Put([]byte("123"), value)
    })

Slide 35

Slide 35 text

Map a []byte to a struct

    // Start a read transaction.
    db.View(func(tx *bolt.Tx) error {
        c := tx.Bucket([]byte("T")).Cursor()

        // Iterate over each value in the bucket.
        for k, v := c.First(); k != nil; k, v = c.Next() {
            // Reinterpret the stored bytes as a *T without copying.
            var t = (*T)(unsafe.Pointer(&v[0]))
            // ... do something with "t" ...
        }
        return nil
    })
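The two mappings can be tried without a database. The sketch below round-trips a struct through a plain byte slice; `toBytes` and `fromBytes` are hypothetical helper names, `T` is an assumed fixed-size struct, and `unsafe.Slice` requires Go 1.17 or later. It assumes the byte slice is suitably aligned for `T`, which holds for the freshly allocated slice used here but is something to check for memory-mapped data.

```go
package main

import (
	"fmt"
	"unsafe"
)

// T is a hypothetical fixed-size record. It must not contain
// pointers, strings, slices or maps for this trick to be safe.
type T struct {
	ID         int64
	MyIntValue int64
}

// toBytes views the memory of *t as a byte slice without copying.
func toBytes(t *T) []byte {
	return unsafe.Slice((*byte)(unsafe.Pointer(t)), unsafe.Sizeof(*t))
}

// fromBytes views b as a *T without copying. It returns nil if b is
// too short to hold a T. The result aliases b, so writes through the
// returned pointer change the bytes in b.
func fromBytes(b []byte) *T {
	if len(b) < int(unsafe.Sizeof(T{})) {
		return nil
	}
	return (*T)(unsafe.Pointer(&b[0]))
}

func main() {
	src := T{ID: 123, MyIntValue: 20}

	// Copy the raw bytes, as a database Put would store them.
	stored := append([]byte(nil), toBytes(&src)...)

	// Later, reinterpret the stored bytes as a struct again.
	got := fromBytes(stored)
	fmt.Println(got.ID, got.MyIntValue)
}
```

Because `fromBytes` returns a pointer into the slice itself, the struct is only valid for as long as the underlying bytes are. With Bolt that means values read inside a transaction should not be used after the transaction ends.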

Slide 36

Slide 36 text

Pros:
No encoding/decoding
Insert 100k values/sec
Read 20M values/sec

Slide 37

Slide 37 text

Cons:
Fixed struct layout
Machine-specific endianness
People will think you’re crazy

Slide 38

Slide 38 text

Your CPU can do 3 billion operations per second so USE IT!

Slide 39

Slide 39 text

How to think about performance optimization

Slide 40

Slide 40 text

Hierarchy of Need
Self-actualization
Esteem
Love/Belonging
Safety
Physiological

Slide 41

Slide 41 text

Hierarchy of Need
Self-actualization
Esteem
Love/Belonging
Safety
Physiological

Slide 42

Slide 42 text

Hierarchy of SPEED
Memory Access
Mutexes
Memory Allocation
Disk I/O
Network I/O

Slide 43

Slide 43 text

Go can be extremely fast... if you know how to optimize it!

Slide 44

Slide 44 text

Questions?
@benbjohnson