So You Wanna Go Fast?

Tyler Treat
September 29, 2017

Go's simplicity and concurrency model make it an appealing choice for backend systems, but how does it fare for latency-sensitive applications? In this talk, we explore the other side of the coin by providing some tips on writing high-performance Go and lessons learned in the process. We do a deep dive on low-level performance optimizations in order to make Go a more compelling option in the world of systems programming, but we also consider the trade-offs involved.

Transcript

  1. @tyler_treat Tyler Treat
     - Messaging Nerd @ Apcera
     - Working on nats.io
     - Distributed systems
     - bravenewgeek.com
  2. @tyler_treat Measurement Techniques
     - pprof
       - memory
       - cpu
       - blocking
     - GODEBUG
       - gctrace
       - schedtrace
       - allocfreetrace
     - Benchmarking
       - Code-level: testing.B
       - System-level: HdrHistogram (https://github.com/codahale/hdrhistogram), bench (https://github.com/tylertreat/bench)
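A minimal sketch of the code-level technique slide 2 lists, assuming a hypothetical stringify function as the code under test: a testing.B benchmark, from which CPU and memory profiles can then be collected with go test -bench=. -cpuprofile=cpu.out -memprofile=mem.out and inspected with go tool pprof.

package perf

import (
	"strconv"
	"testing"
)

// stringify stands in for whatever code path is being measured.
func stringify(i uint64) string {
	return strconv.FormatUint(i, 2)
}

func BenchmarkStringify(b *testing.B) {
	b.ReportAllocs() // report allocations per operation alongside ns/op
	for i := 0; i < b.N; i++ {
		_ = stringify(uint64(i))
	}
}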
  3. @tyler_treat The only way to get good at something is to be really fucking bad at it for a long time.
  4. @tyler_treat “Instead of explicitly using locks to mediate access to shared data, Go encourages the use of channels to pass references to data between goroutines.” https://blog.golang.org/share-memory-by-communicating
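A minimal sketch of the pattern the quote describes (the result type and channel name are illustrative, not from the deck): one goroutine at a time owns the data, and ownership is handed over a channel instead of being guarded by a lock.

package main

import "fmt"

type result struct {
	id    int
	value string
}

func main() {
	results := make(chan *result)

	// The producer builds the value and passes the reference over the
	// channel; after the send it no longer touches it.
	go func() {
		results <- &result{id: 1, value: "hello"}
	}()

	// The receiver now owns the data, so no mutex is needed.
	r := <-results
	fmt.Println(r.id, r.value)
}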
  5. @tyler_treat
     type Stringer interface {
         String() string
     }

     type Binary uint64

     b := Binary(200)   // value: 200

     https://research.swtch.com/interfaces
  6. @tyler_treat
     type Stringer interface {
         String() string
     }

     type Binary uint64

     func (i Binary) String() string {
         return strconv.FormatUint(uint64(i), 2)
     }

     b := Binary(200)   // value: 200

     https://research.swtch.com/interfaces
  7. @tyler_treat
     s := Stringer(b)
     [diagram: the interface value holds two words: tab, pointing at itable(Stringer, Binary) (type: type(Binary), fun[0]: (*Binary).String), and data]

     type Stringer interface {
         String() string
     }

     https://research.swtch.com/interfaces
  8. @tyler_treat
     s := Stringer(b)
     [diagram: the same picture, with the data word now referring to the Binary value 200]

     type Stringer interface {
         String() string
     }

     https://research.swtch.com/interfaces
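For reference, a runnable version of the snippet on slides 5-8 (only the main function and the print are added here): converting b to the interface builds the two-word interface value, itable pointer plus data word, that the diagrams depict.

package main

import (
	"fmt"
	"strconv"
)

type Stringer interface {
	String() string
}

type Binary uint64

func (i Binary) String() string {
	return strconv.FormatUint(uint64(i), 2)
}

func main() {
	b := Binary(200)
	// The conversion pairs itable(Stringer, Binary) with a data word for b.
	s := Stringer(b)
	fmt.Println(s.String()) // "11001000"
}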
  9. @tyler_treat “We generally don’t want sync/atomic to be used at all…Experience has shown us again and again that very very few people are capable of writing correct code that uses atomic operations…” —Ian Lance Taylor
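For context on the warning (not from the deck): the kind of code it refers to is lock-free bookkeeping built directly on sync/atomic, such as this counter, which is easy to write in isolation but easy to compose incorrectly.

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var ops uint64
	var wg sync.WaitGroup

	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				atomic.AddUint64(&ops, 1) // atomic increment, no mutex
			}
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadUint64(&ops)) // 8000
}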
  10. @tyler_treat Ctrie
      1. Assign a generation, G1, to each I-node (empty struct).
      2. Add a new node by copying the I-node with the updated branch and generation, then GCAS, i.e. atomically:
         - compare I-nodes to detect tree mutations.
         - compare root generations to detect snapshots.
      [diagram: generations G1, G2]
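A rough sketch of step 2's copy-then-CAS idea, heavily simplified, with type and field names assumed here rather than taken from the deck or the Ctrie paper (uses Go 1.19+ atomic.Pointer): the updated node is installed only if the I-node is unchanged and the root generation still matches.

package ctrie

import "sync/atomic"

// generation is an identity marker; a fresh one is allocated per snapshot.
// It carries a dummy byte because distinct zero-size allocations are not
// guaranteed distinct addresses in Go.
type generation struct{ _ byte }

// mainNode holds the branch data (elided in this sketch).
type mainNode struct{}

// iNode is an indirection node: an atomically swappable pointer to the
// current mainNode, tagged with the generation it was created in.
type iNode struct {
	main atomic.Pointer[mainNode]
	gen  *generation
}

// gcasSketch installs updated only if the I-node's contents are unchanged
// (no concurrent mutation) and the I-node still belongs to the root's
// generation (no snapshot has been taken since it was read).
func gcasSketch(in *iNode, old, updated *mainNode, rootGen *generation) bool {
	if in.gen != rootGen {
		// Generation mismatch: the caller must copy the I-node into the
		// new generation and retry.
		return false
	}
	return in.main.CompareAndSwap(old, updated)
}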
  11. @tyler_treat “Packages that import unsafe may depend on internal properties of the Go implementation. We reserve the right to make changes to the implementation that may break such programs.” https://golang.org/doc/go1compat
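One example of the kind of code this caveat covers (an illustration, not necessarily from these slides): reinterpreting a byte slice as a string without a copy, which leans on the internal layout of string and slice headers.

package zerocopy

import "unsafe"

// bytesToString reinterprets b as a string with no allocation or copy. It
// depends on string and slice headers sharing their first two words, an
// implementation detail the Go 1 compatibility promise does not cover, and
// the caller must never mutate b afterwards.
func bytesToString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}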
  12.-18. @tyler_treat RWMutex
      [diagram sequence: a single RWMutex shared by readers across several CPUs; once a writer arrives, readers on every CPU contend on that one lock]
  19.-25. @tyler_treat RWMutex
      [diagram sequence: one RWMutex per CPU; readers lock only their local RWMutex, while the writer must acquire all of them]
  26. @tyler_treat
      [diagram: the per-CPU RWMutex picture repeated across all CPUs, readers and writer together]
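A minimal sketch of where the diagrams are headed (type and method names are assumptions, not the deck's code): one RWMutex per slot, ideally one per CPU or per reader, so readers take only a local read lock while a writer must acquire every slot.

package sharded

import "sync"

// shardedRWMutex spreads read-lock traffic across several RWMutexes.
// Readers lock only their own slot; a writer locks all slots, so reads
// scale across CPUs while writes pay the full cost.
type shardedRWMutex struct {
	mus []sync.RWMutex
}

func newShardedRWMutex(n int) *shardedRWMutex {
	return &shardedRWMutex{mus: make([]sync.RWMutex, n)}
}

// slot should be a stable non-negative index such as a per-goroutine or
// per-CPU id.
func (m *shardedRWMutex) RLock(slot int)   { m.mus[slot%len(m.mus)].RLock() }
func (m *shardedRWMutex) RUnlock(slot int) { m.mus[slot%len(m.mus)].RUnlock() }

func (m *shardedRWMutex) Lock() {
	for i := range m.mus {
		m.mus[i].Lock()
	}
}

func (m *shardedRWMutex) Unlock() {
	for i := range m.mus {
		m.mus[i].Unlock()
	}
}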
  27. @tyler_treat Cache rules everything around me
      RWMutex1 RWMutex2 RWMutex3 … RWMutexN
      [diagram: the RWMutexes laid out contiguously in memory; each RWMutex is 24 bytes but a cache line is 64 bytes, so neighboring locks end up on the same cache line]
  28. @tyler_treat Cache rules everything around me
      [diagram: RWMutex1 (24 bytes) followed by padding out to 64 bytes (cache line size), so each lock gets a cache line to itself]
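A sketch of the padding slide 28 illustrates (the type name is mine): sync.RWMutex is 24 bytes on 64-bit platforms, so padding each element out to 64 bytes, the typical cache line size, keeps neighboring locks from being packed into the same line (full isolation also assumes the array itself is cache-line aligned).

package padded

import "sync"

// paddedRWMutex pads sync.RWMutex (24 bytes on 64-bit platforms) out to
// 64 bytes so that neighboring elements of an array of locks are not
// packed together on one cache line, which would cause false sharing.
type paddedRWMutex struct {
	mu sync.RWMutex
	_  [40]byte
}

// Each element of this array spans a full cache line's worth of memory.
var locks [32]paddedRWMutex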