Profiling in Production (Alexander Else 2019)

GopherConAU

October 31, 2019

Transcript

  1. Using pprof to diagnose a production issue
     • Identify and fix a specific performance problem we had
     • What I learned about Go from this experience

     Setting yourself up for success with pprof
     • Why you might want to profile your production services
     • How you should prepare to do it
  2. How can we dance when our [CPU fan] is turning?
     How can we sleep when our [CPU cores] are burning?
     - Midnight Oil
  3. Don’t use http.DefaultServeMux
     • Magically exposes things you don’t know about
     • https://mmcloughlin.com/posts/your-pprof-is-showing
     • Anything can register a route!

     Instead
     • Start with a clean mux and server
         mux := http.NewServeMux()
         srv := &http.Server{Addr: ":8080", Handler: mux}
     • Import net/http/pprof and expose what you need, or use middleware to do the lifting
     • e.g. https://github.com/go-chi/chi/blob/master/middleware/profiler.go
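The "clean mux" advice on this slide might look roughly like the sketch below. The choice of which pprof handlers to expose and the helper names (`newProfilingMux`, `statusOf`) are assumptions for illustration, not taken from the talk.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newProfilingMux returns a fresh mux that serves only the pprof handlers
// registered here. Importing net/http/pprof still registers its handlers on
// http.DefaultServeMux as a side effect, so this only helps if nothing in
// the process serves the default mux.
func newProfilingMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)          // index page and named profiles (heap, goroutine, ...)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile) // CPU profile
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)     // execution trace
	return mux
}

// statusOf returns the status code h gives for GET path, without opening a socket.
func statusOf(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newProfilingMux()
	fmt.Println("pprof index:", statusOf(mux, "/debug/pprof/")) // registered explicitly above
	fmt.Println("expvar:     ", statusOf(mux, "/debug/vars"))   // never registered on this mux

	// In a real service the mux would be handed to a server, e.g.:
	//   srv := &http.Server{Addr: ":8080", Handler: mux}
	//   log.Fatal(srv.ListenAndServe())
}
```

Because every route is registered by hand, a dependency that quietly adds handlers to the default mux cannot widen what this server exposes.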
  4. Make it private
     • Reveals a lot about your code and internal state
     • Increased surface area

     Instead
     • Restrict access to it
     • Put your pprof endpoints behind auth
     • Run on a non-public IP and port
         srv := &http.Server{Addr: "127.0.0.1:9090", Handler: mux}
     • Respond to a signal
         $ kill -USR1 $PID
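The "respond to a signal" option could be sketched as follows. This assumes a Unix-like system and assumes a heap profile is what you want written on SIGUSR1; the talk does not say which profile its handler produced, and all function names and the output directory here are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"os/signal"
	"path/filepath"
	"runtime/pprof"
	"syscall"
	"time"
)

// captureHeapProfile returns a heap profile in the format `go tool pprof` reads.
func captureHeapProfile() ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// profileOnSignal writes a heap profile into dir each time the process
// receives SIGUSR1. The returned channel reports the written path; it exists
// for the demo below, and a real service would probably just log the path.
func profileOnSignal(dir string) <-chan string {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	paths := make(chan string, 1)
	go func() {
		for range sigs {
			data, err := captureHeapProfile()
			if err != nil {
				log.Printf("capture heap profile: %v", err)
				continue
			}
			path := filepath.Join(dir, fmt.Sprintf("heap-%d.pprof", time.Now().UnixNano()))
			if err := os.WriteFile(path, data, 0o600); err != nil {
				log.Printf("write heap profile: %v", err)
				continue
			}
			paths <- path
		}
	}()
	return paths
}

// demo installs the handler, signals the current process (a stand-in for
// running `kill -USR1 $PID` from another shell), and waits for the file.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "pprof-demo")
	if err != nil {
		return "", err
	}
	paths := profileOnSignal(dir)
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		return "", err
	}
	select {
	case p := <-paths:
		return p, nil
	case <-time.After(5 * time.Second):
		return "", fmt.Errorf("no profile written within 5s")
	}
}

func main() {
	path, err := demo()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("wrote", path)
}
```

Nothing listens on the network in this variant, so there is no endpoint to protect; the trade-off is that you need shell access to the host to trigger and collect the profile.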
  5. Performance impact
     • There’s non-zero performance impact to your running process
     • Negligible in my cases - what about yours?

     Instead
     • Look at the perf impact under controlled conditions
     • Gain confidence that you can use it when you need it
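One rough way to look at the impact under controlled conditions is to time the same workload with and without CPU profiling switched on. The sketch below uses a synthetic hashing loop as the workload, which is an assumption; a meaningful measurement would replay your service's real traffic instead.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"runtime/pprof"
	"time"
)

// sink keeps the compiler from optimizing the workload away.
var sink uint64

// work is a stand-in CPU-bound workload (an FNV-style mixing loop).
func work(n int) uint64 {
	var h uint64 = 14695981039346656037
	for i := 0; i < n; i++ {
		h ^= uint64(i)
		h *= 1099511628211
	}
	return h
}

// timeWork runs the workload `rounds` times and returns the elapsed time.
func timeWork(rounds, n int) time.Duration {
	start := time.Now()
	for i := 0; i < rounds; i++ {
		sink += work(n)
	}
	return time.Since(start)
}

// compare times the workload once without profiling and once while a CPU
// profile is being collected (the profile data itself is discarded).
func compare(rounds, n int) (base, profiled time.Duration, err error) {
	base = timeWork(rounds, n)
	if err = pprof.StartCPUProfile(io.Discard); err != nil {
		return 0, 0, err
	}
	profiled = timeWork(rounds, n)
	pprof.StopCPUProfile()
	return base, profiled, nil
}

func main() {
	base, profiled, err := compare(20, 5_000_000)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("baseline: %v\nprofiled: %v\n", base, profiled)
}
```

A single run like this is noisy, so treat the two numbers as a sanity check and repeat the comparison several times before drawing conclusions.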
  6. Convenience
     • People don’t like cumbersome tools and processes
     • More than 1 command is too much

     Instead
     • curl (with authentication!)
     • SSM
     • chatops
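To make the "curl (with authentication!)" item concrete, here is a minimal sketch of both sides: a middleware that guards the pprof handlers and a one-call client that fetches a profile. The bearer-token scheme, the token value, and the function names are all assumptions for illustration; the talk does not specify an auth mechanism.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// requireToken rejects requests whose Authorization header is not
// "Bearer <want>". The comparison is constant-time.
func requireToken(want string, next http.Handler) http.Handler {
	expected := []byte("Bearer " + want)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		got := []byte(r.Header.Get("Authorization"))
		if subtle.ConstantTimeCompare(got, expected) != 1 {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// fetchProfile performs one authenticated GET, much like
// `curl -H "Authorization: Bearer $TOKEN" $URL`.
func fetchProfile(url, token string) (int, []byte, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return 0, nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// demo starts a throwaway local server guarded by a hypothetical token and
// fetches the goroutine profile using the token supplied by the caller.
func demo(token string) (status, size int, err error) {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	ts := httptest.NewServer(requireToken("example-token", mux))
	defer ts.Close()

	status, body, err := fetchProfile(ts.URL+"/debug/pprof/goroutine?debug=1", token)
	return status, len(body), err
}

func main() {
	for _, token := range []string{"example-token", "wrong-token"} {
		status, size, err := demo(token)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("token %q: status %d, %d bytes\n", token, status, size)
	}
}
```

Wrapping the fetch in a single function (or a single script) keeps the procedure to one command, which is the point of the slide.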
  7. Muscle memory
     • Tools are only useful if you can use them when you need to
     • You don’t want to read StackOverflow at 3am

     Instead
     • Regularly use your tools. It’s a dress rehearsal
     • Use your alerts and incident runbooks to tell you what to do
     • Do it the same way across all your systems