Slide 1

Building Resilient Services in Go
Aditya Mukerjee, Observability Engineer at Stripe
GoDays Berlin

Slide 2

Observability measures how well the internal states of a system can be inferred from knowledge of its external outputs

Slide 3

Go is used to build…
• Distributed systems
• Reliable software
• “The Cloud™”

Slide 4

1. What should I monitor?
2. How do I monitor those things in Go?
3. What does the future of Go observability look like?

Slide 5

Let’s Create an API
• Return a list of all Twitter followers
• Record a copy to the database
• Distributed!
(Diagram: three API instances in front of a database)
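A rough sketch of the service these bullets describe, assuming a plain net/http server; the helpers fetchFollowers and saveFollowers are hypothetical stand-ins for the Twitter call and the database write, not code from the talk:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// fetchFollowers is a hypothetical stand-in for the Twitter API call.
func fetchFollowers(user string) ([]string, error) {
	return []string{"alice", "bob"}, nil
}

// saveFollowers is a hypothetical stand-in for the database write.
func saveFollowers(user string, followers []string) error {
	return nil
}

func followersHandler(w http.ResponseWriter, r *http.Request) {
	user := r.URL.Query().Get("user")

	followers, err := fetchFollowers(user)
	if err != nil {
		http.Error(w, "upstream error", http.StatusBadGateway)
		return
	}

	// Record a copy to the database before responding.
	if err := saveFollowers(user, followers); err != nil {
		http.Error(w, "storage error", http.StatusInternalServerError)
		return
	}

	json.NewEncoder(w).Encode(followers)
}

func main() {
	http.HandleFunc("/followers", followersHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The observability tools in the rest of the talk all hook into a handler like this one.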

Slide 6

Service-Level Agreement: What we promise our clients
Service-Level Indicators: Data used to evaluate the SLA

Slide 7

Service-Level Agreement: What we promise our clients
Service-Level Indicators: Data used to evaluate the SLA
Service-Level Objective: What we target internally

Slide 8

Service Indicators
• Rate: Number of requests received
• Errors: Number of responses written, broken down by HTTP status
• Duration: Distribution of response latency
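These three indicators (often called the RED method) map naturally onto an HTTP middleware. The sketch below is an illustration rather than code from the talk; emitCount and emitTiming are hypothetical hooks for whatever metrics client is in use:

```go
package middleware

import (
	"net/http"
	"strconv"
	"time"
)

// statusRecorder captures the status code written by the wrapped handler
// so responses can be broken down by HTTP status.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (rec *statusRecorder) WriteHeader(code int) {
	rec.status = code
	rec.ResponseWriter.WriteHeader(code)
}

// RED reports rate, errors, and duration for every request.
// emitCount and emitTiming are hypothetical placeholders for a metrics client.
func RED(next http.Handler,
	emitCount func(name string, tags map[string]string),
	emitTiming func(name string, d time.Duration)) http.Handler {

	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}

		emitCount("requests.received", nil) // Rate

		next.ServeHTTP(rec, r)

		emitCount("responses.written", map[string]string{ // Errors, by HTTP status
			"status": strconv.Itoa(rec.status),
		})
		emitTiming("request.duration", time.Since(start)) // Duration
	})
}
```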

Slide 9

Every monitor involves a service-level indicator*
*for sufficiently broad definitions of “service”

Slide 10

Metrics, logs, and request traces are used to provide greater visibility beyond our service indicators

Slide 11

Tool #1: Logs

Slide 12

Logging in Go
• Use structured logging (e.g. logrus) instead of the standard library logger
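A minimal sketch of the logrus suggestion; the field names and values here are illustrative, not from the talk:

```go
package main

import (
	log "github.com/sirupsen/logrus"
)

func main() {
	// Emit machine-parseable JSON instead of free-form text.
	log.SetFormatter(&log.JSONFormatter{})

	// Structured logging: context travels as discrete key/value fields
	// rather than being interpolated into the message string.
	log.WithFields(log.Fields{
		"user_id":  "12345", // illustrative fields
		"endpoint": "/followers",
		"status":   502,
	}).Error("upstream request failed")
}
```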

Slide 13

Logging in Go
• Preserve contextual data – don’t just “check, log, and return”
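One way to read that advice is sketched below; the function names are hypothetical, and the use of Go 1.13 %w error wrapping is an assumption rather than something the slide specifies:

```go
package followers

import (
	"fmt"

	log "github.com/sirupsen/logrus"
)

// queryDB is a hypothetical stand-in for the real database call.
func queryDB(user string) error { return nil }

// loadBad shows the "check, log, and return" anti-pattern: the error is
// logged where it occurs, with no request context, and the caller still
// receives a bare error it will likely log again.
func loadBad(user string) error {
	if err := queryDB(user); err != nil {
		log.Error(err)
		return err
	}
	return nil
}

// loadGood preserves the contextual data it has by wrapping the error,
// letting the caller (which knows about the whole request) log it once
// with full context.
func loadGood(user string) error {
	if err := queryDB(user); err != nil {
		return fmt.Errorf("loading followers for %q: %w", user, err)
	}
	return nil
}
```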

Slide 14


Slide 15

Tool #2: Metrics

Slide 16

Statsd protocol
• Local service listening for metrics over UDP
• Metric aggregation
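Under the hood the statsd protocol is newline-delimited text over UDP, so a sketch can hand-write the wire format. The localhost:8125 address is only the conventional daemon default (an assumption here); in practice you would use a client library, or point this at Veneur:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// The local statsd daemon (or Veneur) conventionally listens on UDP :8125.
	conn, err := net.Dial("udp", "127.0.0.1:8125")
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Counter: "one more request received" -> name:value|c
	fmt.Fprint(conn, "api.requests.received:1|c\n")

	// Timer: request latency in milliseconds -> name:value|ms
	latency := 42 * time.Millisecond
	fmt.Fprintf(conn, "api.request.duration:%d|ms\n", latency.Milliseconds())
}
```

The daemon aggregates these fire-and-forget packets before forwarding them upstream, which is the “metric aggregation” bullet above.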

Slide 17


Slide 18

Aggregation Caveats
• Cardinality: No aggregation by IP address (or even /24 subnets)
• Host-local or fault-tolerant: pick one!

Slide 19

https://veneur.org

Slide 20

• Distributed statsd
• Global metric aggregation (cross-server analysis)
• Horizontally scalable
• Fault-tolerant
• Written in Go
• Higher throughput
• Tunable

Slide 21

Tool #3: Request Traces

Slide 22

(Diagram: three API instances in front of a database, as on slide 5)

Slide 23


Slide 24

Tracing Your Context
• Like profiling, but across servers
• Take a snapshot of a request and inspect each function
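The slide doesn’t name a tracing library; as one illustration of span-per-function tracing with context propagation, here is a sketch using the OpenTelemetry Go API (provider and exporter setup omitted; the span names, attribute, and callTwitterAPI helper are made up):

```go
package followers

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

var tracer = otel.Tracer("followers-api")

// FetchFollowers opens a span so the cross-server trace records this
// function's latency alongside every other hop the request touches.
func FetchFollowers(ctx context.Context, user string) ([]string, error) {
	ctx, span := tracer.Start(ctx, "FetchFollowers")
	defer span.End()

	span.SetAttributes(attribute.String("user", user))

	// The context carries the trace downstream, like a profile that
	// spans servers instead of a single process.
	return callTwitterAPI(ctx, user)
}

// callTwitterAPI is a hypothetical downstream call with its own child span.
func callTwitterAPI(ctx context.Context, user string) ([]string, error) {
	_, span := tracer.Start(ctx, "callTwitterAPI")
	defer span.End()
	return nil, nil
}
```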

Slide 25

Putting it all together: Logs, Metrics, and Traces

Slide 26


Slide 27

Does it really have to be so complicated?

Slide 28

(Diagram: a single application emitting logs, metrics, and traces)

Slide 29

What’s the difference?
• If you squint, it’s hard to tell them apart
• A log is a metric with “longer” information
• A trace is a metric that allows “inner joins”

Slide 30

Standard Sensor Format

Slide 31


Slide 32

Define and measure your service indicator metrics

Slide 33

The future of distributed systems is being written in Go
The future of observability will be written in Go, too

Slide 34

What does the future of observability, written in Go, look like?

Slide 35

Thank you!
Aditya Mukerjee
@chimeracoder
https://veneur.org
#veneur on Freenode