Building Resilient Services in Go
Aditya Mukerjee
Observability Engineer at Stripe
GoDays
Berlin
Slide 2
Slide 2 text
Observability measures how well internal states of a system
can be inferred from knowledge of its external outputs
@chimeracoder
Slide 3
Slide 3 text
Go is used to build:
•Distributed systems
•Reliable software
•“The Cloud™”
Slide 4
Slide 4 text
1. What should I monitor?
2. How do I monitor those things in Go?
3. What does the future of Go observability look like?
Slide 5
Slide 5 text
Let’s Create an API
•Return a list of all Twitter followers
•Record a copy to the database
•Distributed!
[Diagram: three API nodes and one DB]
Slide 6
Slide 6 text
Service-Level Agreement: What we promise our clients
Service-Level Indicators: Data used to evaluate the SLA
Slide 7
Slide 7 text
Service-Level Agreement: What we promise our clients
Service-Level Indicators: Data used to evaluate the SLA
Service-Level Objective: What we target internally
Slide 8
Slide 8 text
Service Indicators
•Rate: Number of requests received per unit of time
•Errors: Number of responses written, broken down by HTTP status
•Duration: Distribution of response latency
Slide 9
Slide 9 text
Every monitor involves a service-level indicator*
*for sufficiently broad definitions of “service”
Slide 10
Slide 10 text
Metrics, logs, and request traces are used to provide greater
visibility beyond our service indicators
Slide 11
Slide 11 text
Tool #1: Logs
Slide 12
Slide 12 text
Logging in Go
•Use structured logging (e.g. logrus) instead of the standard library's log package
Slide 13
Slide 13 text
Logging in Go
•Preserve contextual data – don’t just “check, log, and return”
Slide 14
Slide 14 text
Slide 15
Slide 15 text
Tool #2: Metrics
Slide 16
Slide 16 text
Statsd protocol
•Local service listening for metrics over UDP
•Metric aggregation
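As a rough illustration of the protocol, a statsd metric is a short text datagram of the form `name:value|type`, sent over UDP to a local daemon that aggregates it before forwarding. The sketch below assumes plain statsd counters (`c`) and timers (`ms`); the metric names and helper functions are hypothetical, and a local UDP listener stands in for the daemon so the example runs on its own.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// counterLine formats a statsd counter, e.g. "api.requests:1|c".
func counterLine(name string, n int64) string {
	return fmt.Sprintf("%s:%d|c", name, n)
}

// timingLine formats a statsd timer in milliseconds, e.g. "api.latency:25|ms".
func timingLine(name string, ms int64) string {
	return fmt.Sprintf("%s:%d|ms", name, ms)
}

// send writes one metric line to addr over UDP. UDP is fire-and-forget:
// a nil error does not mean the daemon received the datagram.
func send(addr, line string) error {
	conn, err := net.Dial("udp", addr)
	if err != nil {
		return fmt.Errorf("dial statsd: %w", err)
	}
	defer conn.Close()
	_, err = conn.Write([]byte(line))
	return err
}

func main() {
	// Local stand-in for the statsd daemon, on an OS-assigned port.
	pc, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer pc.Close()

	if err := send(pc.LocalAddr().String(), counterLine("api.requests", 1)); err != nil {
		fmt.Println("send:", err)
		return
	}

	// Read what the "daemon" received; give up after one second.
	buf := make([]byte, 512)
	pc.SetReadDeadline(time.Now().Add(time.Second))
	n, _, err := pc.ReadFrom(buf)
	if err != nil {
		fmt.Println("read:", err)
		return
	}
	fmt.Println(string(buf[:n]))
}
```

Because the application only emits datagrams, aggregation cost stays in the daemon rather than in the request path.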
Slide 17
Slide 17 text
Slide 18
Slide 18 text
Aggregation Caveats
•Cardinality: No aggregation by IP address (or even /24 subnets)
•Host-local or fault-tolerant: pick one!
Slide 19
Slide 19 text
https://veneur.org
Slide 20
Slide 20 text
•Distributed statsd
•Global metric aggregation (cross-server analysis)
•Horizontally scalable
•Fault-tolerant
•Written in Go
•Higher throughput
•Tunable
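One way to see why global aggregation matters: percentiles computed on each host cannot simply be averaged into a fleet-wide percentile. The sketch below uses made-up latency samples from two hypothetical hosts to show the gap. It is an illustration of the problem only, not a description of how Veneur computes its aggregates.

```go
package main

import (
	"fmt"
	"sort"
)

// median returns the middle sample (the upper middle for even counts).
// It assumes samples is non-empty.
func median(samples []float64) float64 {
	s := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	return s[len(s)/2]
}

// mean returns the arithmetic mean. It assumes xs is non-empty.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	// Made-up latencies in milliseconds.
	hostA := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10} // nine fast requests
	hostB := []float64{500}                                // one slow request

	// Host-local aggregation: each host reports its own median, then we average.
	averaged := mean([]float64{median(hostA), median(hostB)})

	// Global aggregation: compute the median over every sample.
	all := append(append([]float64(nil), hostA...), hostB...)
	global := median(all)

	fmt.Println("average of per-host medians:", averaged) // 255
	fmt.Println("global median:", global)                 // 10
}
```

The host with a single slow request pulls the averaged figure far from what a typical request experienced.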
Slide 21
Slide 21 text
Tool #3: Request Traces
Slide 22
Slide 22 text
[Diagram: three API nodes and one DB]
Slide 23
Slide 23 text
Slide 24
Slide 24 text
Tracing Your Context
•Like profiling, but across servers
•Take a snapshot of a request and inspect each function
Slide 25
Slide 25 text
Putting it all together: Logs, Metrics, and Traces
Slide 26
Slide 26 text
Slide 27
Slide 27 text
Does it really have to be so complicated?
Slide 28
Slide 28 text
[Diagram: Application, with logs, metrics, and traces]
Slide 29
Slide 29 text
What’s the difference?
•If you squint, it’s hard to tell them apart
•A log is a metric with “longer” information
•A trace is a metric that allows “inner joins”
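The comparison above can be made concrete with a single record type in which a log is a sample carrying a message and a trace is a sample carrying a join key. The `sample` type below is a hypothetical illustration of that framing only. It is not the Standard Sensor Format, and its field names are assumptions.

```go
package main

import "fmt"

// sample is a hypothetical unified record; it is NOT the Standard Sensor Format.
type sample struct {
	Name    string
	Value   float64
	Message string // "longer" information: set on log-like samples
	TraceID string // join key: set on trace-like samples
}

// kind classifies a sample by which optional fields are set.
func (s sample) kind() string {
	switch {
	case s.TraceID != "":
		return "trace"
	case s.Message != "":
		return "log"
	default:
		return "metric"
	}
}

// joinByTrace groups samples that share a trace ID, a rough analogue of
// the "inner join" mentioned above. Samples without a trace ID are skipped.
func joinByTrace(samples []sample) map[string][]sample {
	out := make(map[string][]sample)
	for _, s := range samples {
		if s.TraceID != "" {
			out[s.TraceID] = append(out[s.TraceID], s)
		}
	}
	return out
}

func main() {
	samples := []sample{
		{Name: "api.requests", Value: 1},
		{Name: "api.error", Value: 1, Message: "twitter returned 503 for user 42"},
		{Name: "twitter.fetch", Value: 120, TraceID: "abc"},
		{Name: "db.insert", Value: 8, TraceID: "abc"},
	}
	for _, s := range samples {
		fmt.Println(s.Name, "->", s.kind())
	}
	fmt.Println("samples in trace abc:", len(joinByTrace(samples)["abc"]))
}
```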
Slide 30
Slide 30 text
Standard Sensor Format
Slide 31
Slide 31 text
Slide 32
Slide 32 text
Define and measure your service indicator metrics
Slide 33
Slide 33 text
The future of distributed systems is being written in Go
The future of observability will be written in Go, too
Slide 34
Slide 34 text
What does the future of observability, written in Go, look like?
Slide 35
Slide 35 text
Thank you!
Aditya Mukerjee
@chimeracoder
https://veneur.org
#veneur on Freenode