Building Resilient Services in Go

If your service is automatically monitored, then the answer is “yes!”. But what if your service isn’t monitored yet? Or what if your monitors alert you when the server is offline, but not on subtler problems like latency spikes or CPU load?

Fortunately, there’s a quick and easy way to get high-resolution metrics for observing your services and building scalable, resilient services in Go. When you combine the tools from the Go standard library with an intelligent observability pipeline, you can easily answer the questions you care about, like “Which servers are currently running near maximum capacity?”, or “Can our infrastructure handle tomorrow’s product launch?”.

Aditya Mukerjee

January 30, 2019

Transcript

  1. Building Resilient Services in Go
     Aditya Mukerjee, Observability Engineer at Stripe
     GoDays Berlin
  2. Observability measures how well internal states of a system can be inferred from knowledge of its external outputs @chimeracoder
  3. Go is used to build…
     • Distributed systems
     • Reliable software
     • “The Cloud™”
  4. Outline
     1. What should I monitor?
     2. How do I monitor those things in Go?
     3. What does the future of Go observability look like?
  5. Let’s Create an API
     • Return a list of all Twitter followers
     • Record a copy to the database
     • Distributed!
     [Diagram: API, API, API, DB]
  6. Service-Level Agreement: What we promise our clients
     Service-Level Indicators: Data used to evaluate the SLA
  7. Service-Level Agreement: What we promise our clients
     Service-Level Indicators: Data used to evaluate the SLA
     Service-Level Objective: What we target internally
  8. Service Indicators
     • Rate: Number of requests received
     • Errors: Number of responses written, broken down by HTTP status
     • Duration: Distribution of response latency
  9. Every monitor involves a service-level indicator*
     *for sufficiently broad definitions of “service”
  10. Metrics, logs, and request traces are used to provide greater visibility beyond our service indicators
  11. Tool #1: Logs

  12. Logging in Go
      • Use structured logging (e.g. logrus) instead of standard library
  13. Logging in Go
      • Preserve contextual data – don’t just “check, log, and return”
  14. @chimeracoder

  15. Tool #2: Metrics

  16. Statsd protocol
      • Local service listening for metrics over UDP
      • Metric aggregation
  17. @chimeracoder

  18. Aggregation Caveats
      • Cardinality: No aggregation by IP address (or even /24 subnets)
      • Host-local or fault tolerant: pick one!
  19. https://veneur.org

  20. • Distributed statsd
      • Global metric aggregation (cross-server analysis)
      • Horizontally scalable
      • Fault-tolerant
      • Written in Go
      • Higher throughput
      • Tunable
  21. Tool #3: Request Traces

  22. [Diagram: API, API, API, DB]

  23. @chimeracoder

  24. Tracing Your Context
      • Like profiling, but across servers
      • Take a snapshot of a request and inspect each function
  25. Putting it all together: Logs, Metrics, and Traces

  26. @chimeracoder

  27. Does it really have to be so complicated?

  28. [Diagram: Application, logs, metrics, traces]

  29. What’s the difference?
      • If you squint, it’s hard to tell them apart
      • A log is a metric with “longer” information
      • A trace is a metric that allows “inner joins”
  30. Standard Sensor Format

  31. @chimeracoder

  32. Define and measure your service indicator metrics

  33. The future of distributed systems is being written in Go
      The future of observability will be written in Go, too
  34. What does the future of observability, written in Go, look like?
  35. Thank you! Aditya Mukerjee @chimeracoder https://veneur.org #veneur on Freenode