Building Resilient Services in Go

If your service is automatically monitored, then the answer is “yes!” But what if your service isn’t monitored yet? Or what if your monitors alert you when the server is offline, but not when subtler problems arise, like latency spikes or high CPU load?

Fortunately, there’s a quick way to get high-resolution metrics for observing your services and for building scalable, resilient services in Go. When you combine the tools in the Go standard library with an intelligent observability pipeline, you can answer the questions you care about, such as “Which servers are currently running near maximum capacity?” or “Can our infrastructure handle tomorrow’s product launch?”

Aditya Mukerjee

January 30, 2019

Transcript

  1. Building Resilient Services in Go
    Aditya Mukerjee
    Observability Engineer at Stripe
    GoDays
    Berlin

  2. Observability measures how well internal states of a system
    can be inferred from knowledge of its external outputs
    @chimeracoder

  3. Go is used to build…
    •Distributed systems
    •Reliable software
    •“The Cloud™”

  4. 1. What should I monitor?
    2. How do I monitor those things in Go?
    3. What does the future of Go observability look like?

  5. Let’s Create an API
    •Return a list of all Twitter followers
    •Record a copy to the database
    •Distributed!
    [Diagram labels: API, API, API, DB]

  6. Service-Level Agreement: What we promise our clients
    Service-Level Indicators: Data used to evaluate the SLA

  7. Service-Level Agreement: What we promise our clients
    Service-Level Indicators: Data used to evaluate the SLA
    Service-Level Objective: What we target internally

  8. Service Indicators
    •Rate: Number of requests received
    •Errors: Number of responses written, broken down by HTTP status
    •Duration: Distribution of response latency
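
The three indicators above can be sketched as a small `net/http` middleware. This is a minimal illustration rather than the talk's implementation: `REDStats`, `NewREDStats`, and `Instrument` are hypothetical names, and the recorder keeps everything in memory where a real service would forward the values to a metrics pipeline.

```go
package main

import (
	"net/http"
	"sync"
	"time"
)

// REDStats is a hypothetical in-memory recorder for the three indicators.
type REDStats struct {
	mu        sync.Mutex
	Requests  int             // rate: number of requests received
	ByStatus  map[int]int     // errors: responses written, keyed by HTTP status
	Durations []time.Duration // duration: raw latencies, to be summarized later
}

// NewREDStats returns an empty recorder.
func NewREDStats() *REDStats {
	return &REDStats{ByStatus: make(map[int]int)}
}

// Observe records one completed request.
func (s *REDStats) Observe(status int, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Requests++
	s.ByStatus[status]++
	s.Durations = append(s.Durations, d)
}

// statusWriter captures the status code written by the wrapped handler.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// Instrument wraps next so that every request is recorded in stats.
func Instrument(stats *REDStats, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Handlers that never call WriteHeader implicitly respond with 200.
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(sw, r)
		stats.Observe(sw.status, time.Since(start))
	})
}
```

A handler would be registered as, for example, `http.Handle("/followers", Instrument(stats, handler))`, where the path and `handler` are placeholders.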

  9. Every monitor involves a service-level indicator*
    *for sufficiently broad definitions of “service”

  10. Metrics, logs, and request traces are used to provide greater
    visibility beyond our service indicators

  11. Tool #1: Logs

  12. Logging in Go
    •Use structured logging (e.g. logrus) instead of the standard library’s log package
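
As a rough illustration of structured logging, the sketch below emits one JSON object per event with named fields. It deliberately swaps in the standard library's `log/slog` package (available from Go 1.21, which postdates this talk) instead of logrus, only to stay dependency-free. The function names and field names are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log/slog"
)

// newLogger returns a logger that writes one JSON object per line to w.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, nil))
}

// logFollowerFetch is a hypothetical call site. Each value is attached as a
// named field, so it can be filtered and aggregated later without parsing text.
func logFollowerFetch(l *slog.Logger, user string, count, status int) {
	l.Info("fetched followers",
		"user", user,
		"follower_count", count,
		"status", status,
	)
}

// decodeExample logs one event into a buffer and decodes it again, to show
// that the output is machine-readable.
func decodeExample() (map[string]any, error) {
	var buf bytes.Buffer
	logFollowerFetch(newLogger(&buf), "example_user", 42, 200)
	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}
```

The same call with an unstructured logger would produce a single formatted string, which downstream tools would have to parse with regular expressions.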

  13. Logging in Go
    •Preserve contextual data – don’t just “check, log, and return”
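
One possible reading of this advice is to attach the identifiers a debugger would need before passing an error up the stack. The sketch below assumes a hypothetical `lookupUser` stub and a hypothetical `saveFollowers` caller; it wraps the underlying error with `%w` so the context is added without losing the original cause.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is a hypothetical sentinel error returned by the data layer.
var ErrNotFound = errors.New("not found")

// lookupUser is a stub standing in for a database call; only user 1 exists.
func lookupUser(id int) error {
	if id != 1 {
		return ErrNotFound
	}
	return nil
}

// saveFollowers adds the user ID and follower count to any failure, instead
// of logging the bare error and returning it unchanged.
func saveFollowers(userID int, followers []string) error {
	if err := lookupUser(userID); err != nil {
		return fmt.Errorf("save %d followers for user %d: %w", len(followers), userID, err)
	}
	return nil
}

// isNotFound shows that the wrapped error can still be matched by callers.
func isNotFound(err error) bool {
	return errors.Is(err, ErrNotFound)
}
```

Because the cause is wrapped rather than flattened into a string, the top-level handler can log the full message once and still branch on the error type.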

  15. Tool #2: Metrics

  16. Statsd protocol
    •Local service listening for metrics over UDP
    •Metric aggregation
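
To make the protocol concrete, here is a minimal sketch of a client that formats metrics in the plain-text statsd line format (`name:value|type`) and sends them over UDP. The helper names are hypothetical, and a production service would normally use an existing client library instead.

```go
package main

import (
	"fmt"
	"net"
)

// counterLine formats a counter increment: name:value|c
func counterLine(name string, n int) string {
	return fmt.Sprintf("%s:%d|c", name, n)
}

// timingLine formats a timer sample in milliseconds: name:value|ms
func timingLine(name string, ms int) string {
	return fmt.Sprintf("%s:%d|ms", name, ms)
}

// send writes one metric line to a statsd listener over UDP. UDP is
// fire-and-forget: the write can succeed even when nothing is listening,
// so a nil error does not prove that the metric was received.
func send(addr, line string) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(line))
	return err
}
```

A call such as `send("127.0.0.1:8125", counterLine("api.requests", 1))` assumes a listener on the conventional statsd port 8125; the address and metric name are placeholders. The local listener then aggregates the samples before forwarding them.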

  18. Aggregation Caveats
    •Cardinality: No aggregation by IP address (or even /24 subnets)
    •Host-local or fault tolerant: pick one!
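
The cardinality caveat comes down to arithmetic: if every combination of tag values can occur, the number of distinct time series is the product of the number of values per tag. The sketch below uses a hypothetical `seriesCount` helper with made-up tag sizes to show why even a /24 subnet tag is too much (IPv4 has 2^24 = 16,777,216 possible /24 prefixes).

```go
package main

// seriesCount returns the worst-case number of distinct time series for one
// metric, given how many values each tag can take.
func seriesCount(valuesPerTag ...int) int {
	total := 1
	for _, n := range valuesPerTag {
		total *= n
	}
	return total
}
```

With 5 status classes and 3 endpoints, `seriesCount(5, 3)` is 15 series. Adding a /24 subnet tag gives `seriesCount(5, 3, 16777216)`, roughly 250 million series in the worst case.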

  19. https://veneur.org

  20. •Distributed statsd
    •Global metric aggregation (cross-server analysis)
    •Horizontally scalable
    •Fault-tolerant
    •Written in Go
    •Higher throughput
    •Tunable

  21. Tool #3: Request Traces

  22. [Diagram labels: API, API, API, DB]

  24. Tracing Your Context
    •Like profiling, but across servers
    •Take a snapshot of a request and inspect each function
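
A minimal sketch of the idea, assuming spans are carried through `context.Context`: each timed operation becomes a span, and child spans inherit the trace ID of their parent so they can be reassembled into one request. The `Span` type and the `StartSpan` and `handleRequest` functions are hypothetical and do not represent any particular tracing library.

```go
package main

import (
	"context"
	"time"
)

// Span is a hypothetical record of one timed operation within a request.
type Span struct {
	TraceID  string // shared by every span in the same request
	Name     string
	Parent   *Span // nil for the root span
	Start    time.Time
	Duration time.Duration
}

// spanKey is an unexported key type, so other packages cannot collide with it.
type spanKey struct{}

// StartSpan begins a span. If ctx already carries a span, the new span becomes
// its child and inherits its trace ID; otherwise traceID starts a new trace.
func StartSpan(ctx context.Context, traceID, name string) (context.Context, *Span) {
	s := &Span{TraceID: traceID, Name: name, Start: time.Now()}
	if parent, ok := ctx.Value(spanKey{}).(*Span); ok {
		s.Parent = parent
		s.TraceID = parent.TraceID
	}
	return context.WithValue(ctx, spanKey{}, s), s
}

// Finish records how long the span took.
func (s *Span) Finish() {
	s.Duration = time.Since(s.Start)
}

// handleRequest sketches a handler that times itself and one database call.
func handleRequest(traceID string) (*Span, *Span) {
	ctx, root := StartSpan(context.Background(), traceID, "GET /followers")
	_, child := StartSpan(ctx, "", "db.insert")
	child.Finish()
	root.Finish()
	return root, child
}
```

To follow a request across servers, the trace ID would also have to travel with outgoing calls, for example in a request header; that part is omitted here.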

  25. Putting it all together: Logs, Metrics, and Traces

  27. Does it really have to be so complicated?

  28. [Diagram labels: Application, logs, metrics, traces]

  29. What’s the difference?
    •If you squint, it’s hard to tell them apart
    •A log is a metric with “longer” information
    •A trace is a metric that allows “inner joins”
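
One way to picture the comparison above is a single record type with optional fields. The `Sample` struct below is purely hypothetical and is not the schema of any real format: a metric uses only the name, value, and tags; a log adds a longer message; a trace adds a shared ID that samples can be joined on.

```go
package main

// Sample is a hypothetical unified record, for illustration only.
type Sample struct {
	Name    string
	Value   float64
	Tags    map[string]string
	Message string // "longer" information: set when the sample is read as a log
	TraceID string // join key: set when the sample belongs to a trace
}

// joinByTrace groups samples that share a trace ID, which is the "inner join"
// that distinguishes a trace from a plain metric.
func joinByTrace(samples []Sample) map[string][]Sample {
	out := make(map[string][]Sample)
	for _, s := range samples {
		if s.TraceID == "" {
			continue // plain metric or log: nothing to join on
		}
		out[s.TraceID] = append(out[s.TraceID], s)
	}
	return out
}
```

Samples without a trace ID are simply left out of the join, just as unmatched rows are dropped by an inner join.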

  30. Standard Sensor Format

  32. Define and measure your service indicator metrics

  33. The future of distributed systems is being written in Go
    The future of observability will be written in Go, too

  34. What does the future of observability, written in Go, look like?

  35. Thank you!
    Aditya Mukerjee
    @chimeracoder
    https://veneur.org
    #veneur on Freenode
