
Experiences building InfluxDB with #golang

Paul Dix
September 24, 2014

Tech talk given at #greetech06 in Tokyo

Transcript

  1. Experiences Building InfluxDB with #golang. Paul Dix, CEO of InfluxDB.
     http://influxdb.com paul@influxdb.com @pauldix

  2. None
  3. An open source time series, metrics, and events database.

  4. Written in Go

  5. Self-contained binary. Install and it’s ready to run with no additional services required.

  6. Uses an embedded storage engine: LevelDB, RocksDB, HyperLevelDB, or LMDB.

  7. Single server or distributed in a cluster* (*clustering still experimental)

  8. First commit on September 26th, 2013

  9. But it’s based off work that started here

  10. We’ve been running #golang in production since January 2013

  11. Our experience with Go has been extremely positive

  12. Go has many strengths; here are some favorites

  13. Simplicity: Go is easy to understand and the basics can be learned in a day

  14. Simplicity leads to more readable code

  15. Readable code is easier to understand

  16. Scala

      def binarySearch[A <% Ordered[A]](a: IndexedSeq[A], v: A) = {
        def recurse(low: Int, high: Int): Option[Int] = (low + high) / 2 match {
          case _ if high < low => None
          case mid if a(mid) > v => recurse(low, mid - 1)
          case mid if a(mid) < v => recurse(mid + 1, high)
          case mid => Some(mid)
        }
        recurse(0, a.size - 1)
      }

  17. Go

      func binarySearch(a []float64, value float64, low int, high int) int {
        if high < low {
          return -1
        }
        mid := (low + high) / 2
        if a[mid] > value {
          return binarySearch(a, value, low, mid-1)
        } else if a[mid] < value {
          return binarySearch(a, value, mid+1, high)
        }
        return mid
      }

  18. I find the Go version much easier to read and understand

  19. You end up reading much more code than you write

  20. Understandable code is easier to maintain

  21. Performance: Go has great performance for network services

  22. Robustness: Go’s implementation is very sturdy

  23. Simple Deployment: self-contained binaries mean you can copy a single file

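A minimal sketch of what slide 23 describes (the package path and host below are illustrative, not the actual InfluxDB build): cross-compile on a development machine, and the whole deployment is copying one file.

      # build a Linux binary (the ./cmd/influxd path is hypothetical)
      GOOS=linux GOARCH=amd64 go build -o influxd ./cmd/influxd

      # deploy by copying the single self-contained file
      scp influxd deploy@db-host:/usr/local/bin/influxd
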
  24. No external dependencies is a big win for InfluxDB

  25. Zero-dependency deployment is a big win for server software and DevOps tools

  26. Here are a few of the lessons we’ve learned

  27. Vendor your dependencies

  28. We use goraft. There were many times we wasted a day troubleshooting a bug introduced by running “go get” against a dependency when building on a new server.

  29. Create a vendor directory in your project and import from there. Read more here: https://bitly.com/greetech06_vendoring

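As a rough illustration of the vendoring idea on slide 29 (the layout and import path below are hypothetical, not the exact InfluxDB tree): the dependency’s source is copied into the repository and imported by its in-repo path, so building on a new server never pulls a different version with “go get”.

      myproject/
        main.go
        vendor/
          raft/        // committed copy of the goraft sources

      // in main.go: import the vendored copy by its in-repo path
      // instead of the upstream github.com/goraft/raft path
      import "github.com/influxdb/myproject/vendor/raft"
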
  30. Learn the idioms* *This is something I’m still working on!

  31. Channels of Channels

      // Execute event loop for the given state.
      l.ch = make(chan chan struct{})

      switch state {
      case Follower:
        go l.followerLoop(l.ch)
      case Candidate:
        go l.candidateLoop(l.ch)
      case Leader:
        go l.leaderLoop(l.ch)
      }
      }

      From Ben Johnson’s Streaming Raft:
      https://github.com/influxdb/influxdb/blob/streaming-raft/raft/log.go

  32. // in the loop
      for {
        // Check if the loop has been closed.
        select {
        case ch := <-done:
          close(ch)
          return
        default:
        }

        // do stuff
      }

  33. func (l *Log) setState(state State) {
        // Stop previous state.
        if l.ch != nil {
          ch := make(chan struct{})
          l.ch <- ch
          <-ch
          l.ch = nil
        }

        // restart loop
      }

  34. Use sync.Mutex for caches or state. Read more here: https://bitly.com/greetech06_mutex
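
A minimal sketch of the mutex-guarded cache pattern from slide 34 (the type and field names are just illustrative):

      import "sync"

      type cache struct {
        mu     sync.Mutex
        points map[string]float64
      }

      func (c *cache) set(key string, val float64) {
        c.mu.Lock()
        defer c.mu.Unlock()
        c.points[key] = val
      }

      func (c *cache) get(key string) (float64, bool) {
        c.mu.Lock()
        defer c.mu.Unlock()
        v, ok := c.points[key]
        return v, ok
      }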

  35. Declare Directionality of Channels in Function Definitions
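
For example (a generic sketch, not code from InfluxDB): a send-only (chan<-) or receive-only (<-chan) parameter lets the compiler reject misuse, such as a producer accidentally reading from its output channel.

      // producer can only send on out; receiving from it won’t compile
      func producer(out chan<- float64) {
        for i := 0; i < 10; i++ {
          out <- float64(i)
        }
        close(out)
      }

      // consumer can only receive from in; sending on it won’t compile
      func consumer(in <-chan float64) {
        for v := range in {
          fmt.Println(v)
        }
      }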

  36. There are many more

  37. Go is still a new language that we’re all learning

  38. Do pull requests and code reviews

  39. Garbage collection can be your enemy

  40. Time series data means many small objects. In testing we saw significant GC pauses with heaps as small as 2GB.

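A rough way to observe this effect (a synthetic sketch, not the actual InfluxDB test): hold a few million small heap objects, similar in shape to time series points, and read the pause times the runtime records.

      package main

      import (
        "fmt"
        "runtime"
        "time"
      )

      type point struct {
        val  float64
        time time.Time
      }

      func main() {
        // allocate many small pointer-backed objects
        points := make([]*point, 0, 5000000)
        for i := 0; i < 5000000; i++ {
          points = append(points, &point{val: float64(i), time: time.Now()})
        }

        runtime.GC()
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
        // PauseNs is a circular buffer; the most recent pause is at (NumGC+255)%256
        fmt.Println("heap MB:", ms.HeapAlloc>>20,
          "last GC pause:", time.Duration(ms.PauseNs[(ms.NumGC+255)%256]))
        _ = points
      }
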
  41. How do we cache the most recent data at scale?

  42. The Go 1.4+ GC plan may help. Read more here: http://bit.ly/greetech06_gc

  43. I have doubts that any GC’d section of memory can be fast on > 10GB heaps

  44. We’re thinking about hacking around it

  45. // allocate slab
      slabSize := 1024
      b, _ := syscall.Mmap(-1, 0, slabSize,
        syscall.PROT_READ|syscall.PROT_WRITE|syscall.PROT_EXEC,
        syscall.MAP_ANON|syscall.MAP_PRIVATE)

      // store values
      someVals := []float64{23, 92, 1, 0, 3}
      offset := 0
      for _, v := range someVals {
        x := (*float64)(unsafe.Pointer(&b[offset]))
        *x = v
        offset += 8
      }

      // read them out
      offset = 0
      for offset < len(b) {
        x := (*float64)(unsafe.Pointer(&b[offset]))
        offset += 8
        fmt.Println("n: ", *x)
      }

  46. We can hide entire sections of memory from the GC

  47. Bad idea?

  48. Go has great concurrency primitives

  49. Go has great networking libraries

  50. But network and distributed programming is still hard

  51. Example: Influx kept crashing after filling up file descriptors

  52. Client libraries were opening thousands of connections with keep-alives

  53. No keep-alive timeouts, so we do this:

      type HttpServer struct {
        // ...
        readTimeout time.Duration
        // ...
      }

      // later we create the server with timeout
      srv := &http.Server{
        Handler: p, ReadTimeout: h.readTimeout}

  54. But timeout prevents large requests from finishing

  55. Still a problem

  56. Example: Influx cluster

  57. How do we detect when it’s slow or unresponsive?

      _, err = conn.Write(data)

  58. We’ll time out:

      conn.SetWriteDeadline(
        time.Now().Add(writeTimeout))

      _, err = conn.Write(data)

  59. What should the timeout be?

      conn.SetWriteDeadline(
        time.Now().Add(writeTimeout))

      _, err = conn.Write(data)

  60. Timeout Problems • GC Pauses

  61. Timeout Problems • GC Pauses • Large Requests

  62. Timeout Problems • GC Pauses • Large Requests • Overloaded network

  63. Timeout Problems • GC Pauses • Large Requests • Overloaded network • Slow server

  64. Timeout Problems • GC Pauses • Large Requests • Overloaded network • Slow server • All occur under load

  65. Timeout Problems • GC Pauses • Large Requests • Overloaded network • Slow server • All occur under load • All lead to connection flapping

  66. No single solution, still hard in Go

  67. Don’t be afraid to re-invent libraries

  68. We started out using goraft

  69. But now we’re implementing a new Raft library for Influx

  70. Many community-written libraries in Go are very young

  71. Sometimes they won’t meet your needs

  72. Sometimes you need to start over and create your own libraries!

  73. A few performance tips from Ben Johnson

  74. Use pprof memprofile early and often
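
One way to do what slide 74 suggests (a generic example, not InfluxDB-specific commands; the ./mypkg path is illustrative): have the benchmarks write a heap profile, then inspect it with go tool pprof.

      # run benchmarks and write a memory profile
      go test -bench=. -memprofile=mem.out ./mypkg

      # inspect the biggest allocators (pass the test binary plus the profile)
      go tool pprof mypkg.test mem.out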

  75. Read more here: http://bit.ly/greetech06_slice

      type point struct {
        val  float64
        time time.Time
      }

      func leakMemory() []*point {
        points := make([]*point, 0)
        // make some points
        for i := 0; i < 1000; i++ {
          points = append(
            points, &point{float64(i), time.Now()})
        }
        // all points still kept in memory!
        return points[10:20]
      }

  76. Write benchmarks for multiple layers of your application so you can narrow down bottlenecks

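A minimal sketch of one such benchmark (the names are hypothetical; in practice there would be one per layer, e.g. parsing, the storage engine, the HTTP API):

      // in a _test.go file, with import "testing"
      func BenchmarkParseQuery(b *testing.B) {
        for i := 0; i < b.N; i++ {
          // parseQuery is a stand-in for whatever layer you are measuring
          parseQuery("select value from cpu where time > now() - 1h")
        }
      }
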
  77. Slices of structs are fast to allocate and have lower GC overhead. Slices of pointers are faster to move around and sort.

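To make the distinction concrete (a sketch, reusing the point struct from slide 75):

      // values stored inline in one contiguous allocation:
      // cheap to allocate, far fewer objects for the GC to track
      structs := make([]point, 0, 1000)

      // one heap allocation per element plus the slice of pointers:
      // more GC work, but sorting or reslicing only moves 8-byte pointers
      pointers := make([]*point, 0, 1000)
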
  78. Benchmark Against Go Source for Reference

      go test -bench=. <pkg>

      For example:
      - 20ns for Mutex lock/unlock
      - 70ns for RWMutex lock/unlock
      - json encodes at 100MB/s
      - json decodes at 25MB/s

  79. There are many other tips and tricks, but those are a few

  80. InfluxDB still has work to do to apply what we have learned since September 2013

  81. Thank you Paul Dix paul@influxdb.com @pauldix