@rakyll
tracing?
What is tracing and why do we trace?
Slide 4
Slide 4 text
@rakyll
Slide 5
Slide 5 text
@rakyll
clogged?
Slide 6
Slide 6 text
@rakyll
leaking?
Slide 7
Slide 7 text
@rakyll
path and direction?
Slide 8
Slide 8 text
@rakyll
100% availability
(is a lie)
Slide 9
Slide 9 text
“
@rakyll
A service is available if users
cannot tell there was an outage.
Slide 10
Slide 10 text
@rakyll
Without an SLO, your team has no principled way of saying what level of
downtime is acceptable.
● Error rate
● Latency or throughput expectations
Service Level Objectives (SLOs)
Slide 11
Slide 11 text
@rakyll
28 ms
100 ms
172 ms
56 ms
356 ms
what user sees
what else we can see
sec.Check
auth.AccessToken
cache.Lookup
spanner.Query
GET /messages
Slide 12
Slide 12 text
@rakyll
182 ms
56 ms
245 ms
what user sees
what else we can see
sec.Check
auth.AccessToken
GET /messages
7 ms
cache.Lookup
Slide 13
Slide 13 text
@rakyll
latency...
Slide 14
Slide 14 text
@rakyll
Go is the language to write servers.
Many runtime activities occur during the program execution:
● scheduling
● memory allocation
● garbage collection
Hard to associate a request with its impact on the runtime.
Slide 15
Slide 15 text
@rakyll
clogged?
Slide 16
Slide 16 text
“
@rakyll
There is no easy way to tell why
latency is high for certain
requests. Is it due to GC,
scheduler or syscalls? Can you
review the code and tell us why?
-SRE
Slide 17
Slide 17 text
@rakyll
Execution tracer
$ go tool trace
● Reports fine-grained runtime events in the lifetime of a goroutine.
● Reports utilization of CPU cores.
But cannot easily tell how handling a request impacts the runtime.
Slide 18
Slide 18 text
@rakyll
28 ms
100 ms
172 ms
56 ms
356 ms
GET /messages
auth.AccessToken
cache.Lookup
spanner.Query
GET /messages
Slide 19
Slide 19 text
@rakyll
5
68µs 8
123µs
networking
serialization +
deserialization
garbage
collection
blocking
syscall
what actually happens
172 ms
auth.AccessToken
@rakyll
How?
● Mark sections in code using runtime/trace.
● Enable execution tracer temporarily and record data.
● Examine the recorded data.
Slide 22
Slide 22 text
@rakyll
Go 1.11 introduces...
● User regions, tasks and annotations.
● Association between user code and runtime.
● Association with distributed traces.
@rakyll
region #1
task #1
Go 1.11 runtime/trace
region #2 region #3
region #4
region #5
goroutine #1
goroutine #4
goroutine #5
Slide 25
Slide 25 text
@rakyll
import _ "net/http/pprof"
go func() {
log.Println(http.ListenAndServe("localhost:6060", nil))
}()
Slide 26
Slide 26 text
@rakyll
$ curl http://server:6060/debug/pprof/trace?seconds=5 -o
trace.out
$ go tool trace trace.out
2018/05/04 10:39:59 Parsing trace...
2018/05/04 10:39:59 Splitting trace...
2018/05/04 10:39:59 Opening browser. Trace viewer is listening on
http://127.0.0.1:51803
Slide 27
Slide 27 text
Execution tracer tasks for RPCs (/usertasks)
Slide 28
Slide 28 text
Execution tracer tasks for RPCs (/usertasks)
Slide 29
Slide 29 text
RPCs overlapping with garbage collection
Slide 30
Slide 30 text
Execution tracer regions (/userregions)
Slide 31
Slide 31 text
Region summary for conn.ready
Slide 32
Slide 32 text
@rakyll
Record in production
$ curl http://server/debug/pprof/trace?seconds=5 -o trace.out
$ go tool trace trace.out
Slide 33
Slide 33 text
@rakyll
Try It!
Install the Go 1.11 beta1!
golang.org/dl
Slide 34
Slide 34 text
@rakyll
$ go get go.opencensus.io/trace
import rt “runtime/trace”
ctx, span := trace.StartSpan(ctx, “/messages”)
defer span.End()
rt.WithRegion(ctx, “foo”, func(ctx) {
// Do something...
})
Slide 35
Slide 35 text
@rakyll
Limitations
● Execution tracer cannot do accounting for cross-goroutine
operations automatically.
● Exposition format is hard to parse if `go trace tool` is not used.