Slide 1

Slide 1 text

Experiences Building InfluxDB with #golang
Paul Dix
CEO of InfluxDB
http://influxdb.com
paul@influxdb.com
@pauldix

Slide 3

Slide 3 text

An open source time series, metrics, and events database.

Slide 4

Slide 4 text

Written in Go

Slide 5

Slide 5 text

Self-contained binary. Install and it’s ready to run with no additional services required.

Slide 6

Slide 6 text

Uses an embedded storage engine: LevelDB, RocksDB, HyperLevelDB, or LMDB.

Slide 7

Slide 7 text

Single server or distributed in a cluster*
*Clustering is still experimental

Slide 8

Slide 8 text

First commit on September 26th, 2013

Slide 9

Slide 9 text

But it’s based off work that started here

Slide 10

Slide 10 text

We’ve been running #golang in production since January 2013

Slide 11

Slide 11 text

Our experience with Go has been extremely positive

Slide 12

Slide 12 text

Go has many strengths; here are some favorites

Slide 13

Slide 13 text

Simplicity
Go is easy to understand, and the basics can be learned in a day

Slide 14

Slide 14 text

Simplicity leads to more readable code

Slide 15

Slide 15 text

Readable code is easier to understand

Slide 16

Slide 16 text

Scala

    def binarySearch[A <% Ordered[A]](
        a: IndexedSeq[A], v: A) = {

      def recurse(low: Int,
          high: Int): Option[Int] = (low + high) / 2 match {

        case _ if high < low => None
        case mid if a(mid) > v => recurse(low, mid - 1)
        case mid if a(mid) < v => recurse(mid + 1, high)
        case mid => Some(mid)
      }
      recurse(0, a.size - 1)
    }

Slide 17

Slide 17 text

Go

    func binarySearch(
        a []float64, value float64, low int, high int) int {

        if high < low {
            return -1
        }
        mid := (low + high) / 2
        if a[mid] > value {
            return binarySearch(a, value, low, mid-1)
        } else if a[mid] < value {
            return binarySearch(a, value, mid+1, high)
        }
        return mid
    }

Slide 18

Slide 18 text

I find the Go version much easier to read and understand

Slide 19

Slide 19 text

You end up reading much more code than you write

Slide 20

Slide 20 text

Understandable code is easier to maintain

Slide 21

Slide 21 text

Performance
Go has great performance for network services

Slide 22

Slide 22 text

Robustness
Go’s implementation is very sturdy

Slide 23

Slide 23 text

Simple Deployment
Self-contained binaries mean you can copy a single file

Slide 24

Slide 24 text

No external dependencies is a big win for InfluxDB

Slide 25

Slide 25 text

Zero dependency deployment is a big win for server software and DevOps tools

Slide 26

Slide 26 text

Here are a few of the lessons we’ve learned

Slide 27

Slide 27 text

Vendor your dependencies

Slide 28

Slide 28 text

We use goraft. There were many times we wasted a day troubleshooting a bug introduced from “go get” against a dependency when building on a new server.

Slide 29

Slide 29 text

Create a vendor directory in your project and import from there. Read more here: https://bitly.com/greetech06_vendoring

Slide 30

Slide 30 text

Learn the idioms*
*This is something I’m still working on!

Slide 31

Slide 31 text

Channels of Channels

        // Execute event loop for the given state.
        l.ch = make(chan chan struct{})

        switch state {
        case Follower:
            go l.followerLoop(l.ch)
        case Candidate:
            go l.candidateLoop(l.ch)
        case Leader:
            go l.leaderLoop(l.ch)
        }
    }

From Ben Johnson’s Streaming Raft: https://github.com/influxdb/influxdb/blob/streaming-raft/raft/log.go

Slide 32

Slide 32 text

    // in the loop
    for {
        // Check if the loop has been closed.
        select {
        case ch := <-done:
            close(ch)
            return
        default:
        }

        // do stuff
    }

Slide 33

Slide 33 text

    func (l *Log) setState(state State) {
        // Stop previous state.
        if l.ch != nil {
            ch := make(chan struct{})
            l.ch <- ch
            <-ch
            l.ch = nil
        }

        // restart loop
    }
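The three fragments above fit together as a stop handshake: the caller sends a fresh reply channel into the loop, and the loop closes it to confirm that it has exited. Below is a minimal, self-contained sketch of that pattern. The `worker` type, its `ticks` counter, and the `start`/`stop` names are hypothetical stand-ins for illustration and are not taken from the Raft code.

```go
package main

import (
	"fmt"
	"time"
)

// worker is a hypothetical stand-in for the Log type on the slides.
type worker struct {
	ch    chan chan struct{} // stop requests; each one carries a reply channel
	ticks int                // work done by the loop; safe to read after stop returns
}

// start launches the loop in its own goroutine.
func (w *worker) start() {
	w.ch = make(chan chan struct{})
	go w.loop(w.ch)
}

// loop does work until a stop request arrives, then acknowledges it.
func (w *worker) loop(done <-chan chan struct{}) {
	for {
		// Check if the loop has been asked to stop.
		select {
		case ack := <-done:
			close(ack) // tell the caller that the loop has exited
			return
		default:
		}

		w.ticks++ // do stuff
		time.Sleep(time.Millisecond)
	}
}

// stop asks the loop to exit and blocks until the loop confirms.
func (w *worker) stop() {
	if w.ch == nil {
		return
	}
	ack := make(chan struct{})
	w.ch <- ack
	<-ack
	w.ch = nil
}

func main() {
	w := &worker{}
	w.start()
	time.Sleep(10 * time.Millisecond)
	w.stop()
	fmt.Println("loop stopped, did some work:", w.ticks > 0)
}
```

Because `stop` only returns after the loop has closed the reply channel, the caller knows the old loop is gone before it starts a new one.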

Slide 34

Slide 34 text

Use sync.Mutex for caches or state
Read more here: https://bitly.com/greetech06_mutex
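As a rough illustration of this advice, here is a small cache guarded by a `sync.Mutex`. The `cache` type and its methods are hypothetical names for this sketch, not InfluxDB code.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical in-memory cache guarded by a sync.Mutex.
type cache struct {
	mu     sync.Mutex
	values map[string]float64
}

func newCache() *cache {
	return &cache{values: make(map[string]float64)}
}

// set stores a value; the mutex makes concurrent calls safe.
func (c *cache) set(key string, v float64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[key] = v
}

// get returns the value for key and whether it was present.
func (c *cache) get(key string) (float64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.values[key]
	return v, ok
}

func main() {
	c := newCache()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c.set(fmt.Sprintf("series-%d", i), float64(i))
		}(i)
	}
	wg.Wait()

	v, ok := c.get("series-42")
	fmt.Println(v, ok) // 42 true
}
```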

Slide 35

Slide 35 text

Declare Directionality of Channels in Function Definitions
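A short sketch of what this looks like in practice: `chan<- int` marks a parameter as send-only and `<-chan int` marks it as receive-only, so the compiler rejects misuse. The function names are made up for this example.

```go
package main

import "fmt"

// produce may only send on out; receiving from it would not compile.
func produce(out chan<- int, n int) {
	for i := 1; i <= n; i++ {
		out <- i
	}
	close(out)
}

// sum may only receive from in; sending on it would not compile.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	ch := make(chan int)
	go produce(ch, 10)
	fmt.Println(sum(ch)) // 55
}
```

The signatures also document intent: a reader can tell which side owns the channel without reading the function bodies.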

Slide 36

Slide 36 text

There are many more

Slide 37

Slide 37 text

Go is still a new language that we’re all learning

Slide 38

Slide 38 text

Do pull requests and code reviews

Slide 39

Slide 39 text

Garbage collection can be your enemy

Slide 40

Slide 40 text

Time series data is many small objects. In testing we saw significant GC pauses with heaps as small as 2GB.

Slide 41

Slide 41 text

How do we cache the most recent data at scale?

Slide 42

Slide 42 text

Go 1.4+ GC plan may help
Read more here: http://bit.ly/greetech06_gc

Slide 43

Slide 43 text

I have doubts that any GC’d section of memory can be fast on > 10GB heaps

Slide 44

Slide 44 text

We’re thinking about hacking around it

Slide 45

Slide 45 text

    // allocate slab
    slabSize := 1024
    b, _ := syscall.Mmap(-1, 0, slabSize,
        syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
        syscall.MAP_ANON|syscall.MAP_PRIVATE)

    // store values
    someVals := []float64{23, 92, 1, 0, 3}
    offset := 0
    for _, v := range someVals {
        x := (*float64)(unsafe.Pointer(&b[offset]))
        *x = v
        offset += 8
    }

    // read them out
    offset = 0
    for offset < len(b) {
        x := (*float64)(unsafe.Pointer(&b[offset]))
        offset += 8
        fmt.Println("n: ", *x)
    }

Slide 46

Slide 46 text

We can hide entire sections of memory from the GC

Slide 47

Slide 47 text

Bad idea?

Slide 48

Slide 48 text

Go has great concurrency primitives

Slide 49

Slide 49 text

Go has great networking libraries

Slide 50

Slide 50 text

But network and distributed programming is still hard

Slide 51

Slide 51 text

Example: Influx kept crashing after running out of file descriptors

Slide 52

Slide 52 text

Client libraries were opening thousands of connections with keep-alives

Slide 53

Slide 53 text

No keep alive timeouts so we do this

    type HttpServer struct {
        // ...
        readTimeout time.Duration
        // ...
    }

    // later we create the server with timeout
    srv := &http.Server{
        Handler: p, ReadTimeout: h.readTimeout}

Slide 54

Slide 54 text

But timeout prevents large requests from finishing

Slide 55

Slide 55 text

Still a problem

Slide 56

Slide 56 text

Example: Influx cluster

Slide 57

Slide 57 text

How do we detect when a server is slow or unresponsive?

    _, err = conn.Write(data)

Slide 58

Slide 58 text

We’ll timeout!

    conn.SetWriteDeadline(
        time.Now().Add(writeTimeout))

    _, err = conn.Write(data)
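To see the deadline fire, the sketch below writes to a peer that never reads. It uses `net.Pipe` as an in-memory stand-in for a TCP connection to an unresponsive server, which is an assumption made for this example. The `writeWithTimeout` and `demoTimeout` helpers are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// writeWithTimeout writes data with a deadline and reports whether
// the write failed because the deadline passed.
func writeWithTimeout(conn net.Conn, data []byte, timeout time.Duration) (bool, error) {
	if err := conn.SetWriteDeadline(time.Now().Add(timeout)); err != nil {
		return false, err
	}
	_, err := conn.Write(data)

	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		return true, err
	}
	return false, err
}

// demoTimeout writes to a peer that never reads, so the write
// blocks until the deadline expires.
func demoTimeout() bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	timedOut, _ := writeWithTimeout(client, []byte("points"), 50*time.Millisecond)
	return timedOut
}

func main() {
	fmt.Println("write timed out:", demoTimeout())
}
```

Checking `Timeout()` separates a slow peer from other write failures, but it does not say why the peer is slow.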

Slide 59

Slide 59 text

What should timeout be?

    conn.SetWriteDeadline(
        time.Now().Add(writeTimeout))

    _, err = conn.Write(data)

Slide 65

Slide 65 text

Timeout Problems
• GC Pauses
• Large Requests
• Overloaded network
• Slow server
• All occur under load
• All lead to connection flapping

Slide 66

Slide 66 text

No single solution, still hard in Go

Slide 67

Slide 67 text

Don’t be afraid to re-invent libraries

Slide 68

Slide 68 text

We started out using goraft

Slide 69

Slide 69 text

But now we’re implementing a new Raft library for Influx

Slide 70

Slide 70 text

Many community written libraries in Go are very young

Slide 71

Slide 71 text

Sometimes they won’t meet your needs

Slide 72

Slide 72 text

Sometimes you need to start over and create your own libraries!

Slide 73

Slide 73 text

A few performance tips from Ben Johnson

Slide 74

Slide 74 text

Use pprof memprofile early and often
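One way to capture a memory profile from inside a program is `runtime/pprof`. The sketch below writes a heap profile to a file. The file name `mem.prof` and the helper `writeHeapProfile` are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile writes a heap profile to path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // run a collection so the profile reflects live objects
	return pprof.WriteHeapProfile(f)
}

func main() {
	// Allocate something so the profile is not empty.
	points := make([][]float64, 0, 1000)
	for i := 0; i < 1000; i++ {
		points = append(points, make([]float64, 1024))
	}

	path := "mem.prof" // hypothetical output file
	if err := writeHeapProfile(path); err != nil {
		fmt.Println("profile failed:", err)
		return
	}
	fmt.Println("wrote", path, "while holding", len(points), "slices")
}
```

The resulting file can be inspected with `go tool pprof mem.prof`. For tests and benchmarks, `go test -memprofile mem.prof` produces the same kind of profile without extra code.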

Slide 75

Slide 75 text

Read more here: http://bit.ly/greetech06_slice

    type point struct {
        val float64
        time time.Time
    }

    func leakMemory() []*point {
        points := make([]*point, 0)
        // make some points
        for i := 0; i < 1000; i++ {
            points = append(
                points, &point{float64(i), time.Now()})
        }
        // all points still kept in memory!
        return points[10:20]
    }
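The sub-slice returned above shares the backing array of the original slice, so all 1000 pointers stay reachable. One common fix, sketched here under the assumption that only the selected points are needed afterwards, is to copy them into a fresh slice. `keepOnly` and `makePoints` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"time"
)

type point struct {
	val  float64
	time time.Time
}

// makePoints builds n points in a slice with capacity exactly n.
func makePoints(n int) []*point {
	points := make([]*point, 0, n)
	for i := 0; i < n; i++ {
		points = append(points, &point{float64(i), time.Now()})
	}
	return points
}

// keepOnly copies points[from:to] into a new backing array, so the
// rest of the original slice can be collected once it is unreferenced.
func keepOnly(points []*point, from, to int) []*point {
	kept := make([]*point, to-from)
	copy(kept, points[from:to])
	return kept
}

func main() {
	points := makePoints(1000)

	leaky := points[10:20]           // still backed by the 1000-element array
	kept := keepOnly(points, 10, 20) // backed by its own 10-element array

	fmt.Println(len(leaky), cap(leaky)) // 10 990
	fmt.Println(len(kept), cap(kept))   // 10 10
}
```

The capacity of 990 on the sub-slice is the visible sign that it still pins the original array.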

Slide 76

Slide 76 text

Write benchmarks for multiple layers of your application so you can narrow down bottlenecks

Slide 77

Slide 77 text

Slices of structs are fast to allocate and have lower GC overhead. Slices of pointers are faster to move around and sort.
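The allocation side of this trade-off can be checked with `testing.AllocsPerRun`. The sketch below counts heap allocations when building the same points as `[]point` and as `[]*point`. The helper names and the package-level sinks are assumptions added for the measurement.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

type point struct {
	val  float64
	time time.Time
}

// Package-level sinks force the slices to escape to the heap, so the
// allocations being counted are not optimized away.
var (
	sinkStructs  []point
	sinkPointers []*point
)

// buildStructs stores n points by value: one allocation for the slice.
func buildStructs(n int) {
	s := make([]point, 0, n)
	now := time.Now()
	for i := 0; i < n; i++ {
		s = append(s, point{float64(i), now})
	}
	sinkStructs = s
}

// buildPointers stores n pointers: one allocation per point plus the slice.
func buildPointers(n int) {
	p := make([]*point, 0, n)
	now := time.Now()
	for i := 0; i < n; i++ {
		p = append(p, &point{float64(i), now})
	}
	sinkPointers = p
}

// countAllocs reports the average heap allocations per build of n points.
func countAllocs(n int) (structs, pointers float64) {
	structs = testing.AllocsPerRun(10, func() { buildStructs(n) })
	pointers = testing.AllocsPerRun(10, func() { buildPointers(n) })
	return structs, pointers
}

func main() {
	s, p := countAllocs(1000)
	fmt.Printf("[]point: %.0f allocs, []*point: %.0f allocs\n", s, p)
}
```

Fewer, larger allocations also mean fewer objects for the garbage collector to track, which is where the lower GC overhead comes from.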

Slide 78

Slide 78 text

Benchmark Against Go Source for Reference

    go test -bench=.

For example:
- 20ns for Mutex lock/unlock
- 70ns for RWMutex lock/unlock
- json encodes at 100MB/s
- json decodes at 25MB/s
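Reference numbers like these can be reproduced on your own hardware. The sketch below times uncontended lock/unlock pairs with `testing.Benchmark`, which runs a benchmark function outside of `go test`. The function names are made up for this example, and the results will differ from the figures on the slide depending on machine and Go version.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// benchmarkMutex times an uncontended Lock/Unlock pair.
func benchmarkMutex(b *testing.B) {
	var mu sync.Mutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}

// benchmarkRWMutex times an uncontended write Lock/Unlock pair.
func benchmarkRWMutex(b *testing.B) {
	var mu sync.RWMutex
	for i := 0; i < b.N; i++ {
		mu.Lock()
		mu.Unlock()
	}
}

// measure runs f as a benchmark and returns the iteration count
// and the time per operation in nanoseconds.
func measure(f func(*testing.B)) (int, int64) {
	r := testing.Benchmark(f)
	return r.N, r.NsPerOp()
}

func main() {
	n, ns := measure(benchmarkMutex)
	fmt.Printf("Mutex:   %d iterations, %d ns/op\n", n, ns)

	n, ns = measure(benchmarkRWMutex)
	fmt.Printf("RWMutex: %d iterations, %d ns/op\n", n, ns)
}
```

In a real project the same functions would normally live in a `_test.go` file as `BenchmarkMutex` and `BenchmarkRWMutex` and run through `go test -bench=.`.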

Slide 79

Slide 79 text

There are many other tips and tricks, but those are a few

Slide 80

Slide 80 text

InfluxDB still has work to do to use what we have learned since September 2013

Slide 81

Slide 81 text

Thank you
Paul Dix
paul@influxdb.com
@pauldix