Slide 1

Slide 1 text

1 Exploring Go Runtime Metrics Chin-Ming Huang & Mohit Pokharna

Slide 2

Slide 2 text

2
Software engineer at Mercari since 2020, primarily focusing on ML-assisted Listing and Search backend.
Chin-Ming Huang
Software engineer at Mercari since 2019, primarily focusing on ML price and Search backend.
Mohit Pokharna

Slide 3

Slide 3 text

3 Agenda
01 What’s Runtime?
02 The runtime/metrics Module

Slide 4

Slide 4 text

4 What’s Runtime?

Slide 5

Slide 5 text

5 Runtime
● The Go runtime is the library that supports the execution of Go programs.
● It contains the goroutine scheduler, memory allocator, garbage collector, and more.

Slide 6

Slide 6 text

6 Runtime
[Diagram labels: executable, "Your program", "Go runtime", threads (T), syscall, OS]

Slide 7

Slide 7 text

7 A Glimpse - Get nCPU before Scheduling src/runtime/asm_arm64.s

Slide 8

Slide 8 text

8 src/runtime/os_darwin.go

Slide 9

Slide 9 text

9 src/runtime/os_darwin.go

Slide 10

Slide 10 text

10
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/sysctl.h
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/sys/syscall.h
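
The CPU count that the runtime determines during startup is visible to user code through the public API. A minimal sketch, assuming runtime.NumCPU reports the value gathered at startup; the helper name cpuInfo is hypothetical:

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuInfo is a hypothetical helper: it returns the number of logical CPUs
// reported by the runtime and the current GOMAXPROCS setting.
func cpuInfo() (ncpu, maxprocs int) {
	// GOMAXPROCS(0) only queries the setting; it does not change it.
	return runtime.NumCPU(), runtime.GOMAXPROCS(0)
}

func main() {
	ncpu, maxprocs := cpuInfo()
	fmt.Printf("NumCPU=%d GOMAXPROCS=%d\n", ncpu, maxprocs)
}
```

By default GOMAXPROCS is derived from the detected CPU count, which is why the runtime needs this number before the scheduler starts.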

Slide 11

Slide 11 text

11 The runtime/metrics Module

Slide 12

Slide 12 text

12 Proposal: API for unstable runtime metrics (#37112, design doc)
Before runtime/metrics, runtime metrics were exposed in two ways:
1. By struct-based sampling APIs, e.g. runtime.ReadMemStats and runtime/debug.GCStats.
2. Via GODEBUG flags, which emit strings containing metrics to standard error (e.g. gctrace, gcpacertrace, scavtrace).
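
For reference, the struct-based route looks roughly like this. A minimal sketch using the standard library calls runtime.ReadMemStats and debug.ReadGCStats; the helper name legacyStats is hypothetical:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// legacyStats is a hypothetical helper that samples both struct-based APIs.
func legacyStats() (runtime.MemStats, debug.GCStats) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // fills the whole struct in one call

	var gs debug.GCStats
	debug.ReadGCStats(&gs)
	return ms, gs
}

func main() {
	runtime.GC() // force one cycle so the GC-related fields are non-zero
	ms, gs := legacyStats()
	fmt.Printf("MemStats: HeapAlloc=%d bytes, NumGC=%d\n", ms.HeapAlloc, ms.NumGC)
	fmt.Printf("GCStats:  NumGC=%d, PauseTotal=%v\n", gs.NumGC, gs.PauseTotal)
}
```

The GODEBUG route needs no code: running a program with GODEBUG=gctrace=1 set in the environment prints one line per GC cycle to standard error.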

Slide 13

Slide 13 text

13 By struct-based sampling APIs
MemStats is hard to evolve because it must obey the Go 1 compatibility rules. The existing metrics are confusing, but we can’t change them. Some of the metrics are now meaningless (like EnableGC and DebugGC), and several have aged poorly (like hard-coding the number of size classes at 61, or only having a single pause duration per GC cycle). Hence, we tend to shy away from adding anything to this because we’ll have to maintain it for the rest of time.

Slide 14

Slide 14 text

14 Via GODEBUG flags
The gctrace format is unspecified, which means we can evolve it (and have completely changed it several times). But it’s a pain to collect programmatically because it only comes out on stderr and, even if you can capture that, you have to parse a text format that changes. Hence, automated metric collection systems ignore gctrace.

Slide 15

Slide 15 text

15 New Design
● The API is sampling-based, as opposed to stream-based or event-based.
● The unit of the API is a "set of metrics" rather than an individual metric.
● A metric name is built from two components: a forward-slash-separated path, where each path component is lowercase words separated by hyphens, and a unit. E.g. /memory/classes/heap/free:bytes
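
The naming rule can be checked against the metrics a given Go version exposes. A minimal sketch, assuming Go 1.18 or later for strings.Cut; splitName is a hypothetical helper, while metrics.All is the standard library call that lists every supported metric:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// splitName is a hypothetical helper that separates a metric name into its
// path and unit, following the "path:unit" convention.
func splitName(name string) (path, unit string, ok bool) {
	path, unit, ok = strings.Cut(name, ":")
	return path, unit, ok && strings.HasPrefix(path, "/") && unit != ""
}

func main() {
	path, unit, _ := splitName("/memory/classes/heap/free:bytes")
	fmt.Println(path, unit) // prints "/memory/classes/heap/free bytes"

	// Report any metric in this Go version that does not follow the rule.
	for _, d := range metrics.All() {
		if _, _, ok := splitName(d.Name); !ok {
			fmt.Println("unexpected name:", d.Name)
		}
	}
}
```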

Slide 16

Slide 16 text

16 Example
Take one heap metric as an example:
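
A minimal sketch of reading a single heap metric with runtime/metrics, using the metric name from the New Design slide. This is an illustrative reconstruction rather than the code shown in the talk; readHeapFree is a hypothetical helper:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readHeapFree is a hypothetical helper that samples one metric.
// The bool result is false if this Go version does not support it.
func readHeapFree() (uint64, bool) {
	const name = "/memory/classes/heap/free:bytes"

	// Read takes a slice, so sampling one metric is a set of size one.
	sample := []metrics.Sample{{Name: name}}
	metrics.Read(sample)

	// An unsupported metric comes back as KindBad instead of failing.
	if sample[0].Value.Kind() != metrics.KindUint64 {
		return 0, false
	}
	return sample[0].Value.Uint64(), true
}

func main() {
	if free, ok := readHeapFree(); ok {
		fmt.Printf("free heap memory: %d bytes\n", free)
	} else {
		fmt.Println("metric not supported by this Go version")
	}
}
```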

Slide 17

Slide 17 text

17

Slide 18

Slide 18 text

18 Take a Rest...

Slide 19

Slide 19 text

19 Trace Read Method
src/runtime/metrics.go
src/runtime/metrics/sample.go

Slide 20

Slide 20 text

20 src/runtime/metrics.go

Slide 21

Slide 21 text

21 statAggregate

Slide 22

Slide 22 text

22 metricData
src/runtime/metrics.go

Slide 23

Slide 23 text

23 agg.ensure(...)
src/runtime/metrics.go

Slide 24

Slide 24 text

24 Remarks
● runtime/metrics provides a unified interface to access different metrics, including heap, sys, cpu and gc stats.
● Underneath, each metric is collected from a different part of the runtime, so reading a metric takes two steps: ensure and compute.
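
The two-step pattern can be modeled outside the runtime. A simplified sketch: every type, name, and value below is a hypothetical stand-in chosen for illustration, not the runtime's actual code. It only shows why the split helps: metrics that share a dependency trigger one collection, not one per metric.

```go
package main

import "fmt"

// dep identifies a part of the (pretend) runtime that statistics come from.
type dep int

const (
	heapDep dep = iota
	gcDep
	numDeps
)

// aggregate caches the statistics gathered during a single read.
type aggregate struct {
	ensured  [numDeps]bool
	heapFree uint64
	gcCycles uint64
	collects int // number of times a subsystem was actually queried
}

// ensure gathers each requested dependency at most once per read.
func (a *aggregate) ensure(deps ...dep) {
	for _, d := range deps {
		if a.ensured[d] {
			continue
		}
		switch d {
		case heapDep:
			a.heapFree = 4096 // placeholder value
		case gcDep:
			a.gcCycles = 7 // placeholder value
		}
		a.ensured[d] = true
		a.collects++
	}
}

// metric pairs the dependencies a metric needs with a pure compute step.
type metric struct {
	deps    []dep
	compute func(a *aggregate) uint64
}

var table = map[string]metric{
	"/demo/heap/free:bytes":     {deps: []dep{heapDep}, compute: func(a *aggregate) uint64 { return a.heapFree }},
	"/demo/heap/free:kib":       {deps: []dep{heapDep}, compute: func(a *aggregate) uint64 { return a.heapFree / 1024 }},
	"/demo/gc/cycles:gc-cycles": {deps: []dep{gcDep}, compute: func(a *aggregate) uint64 { return a.gcCycles }},
}

// read resolves each name with ensure followed by compute, and also
// returns how many collections were needed in total.
func read(names ...string) (map[string]uint64, int) {
	agg := &aggregate{}
	out := make(map[string]uint64, len(names))
	for _, n := range names {
		m, ok := table[n]
		if !ok {
			continue // unknown names are skipped in this sketch
		}
		agg.ensure(m.deps...)
		out[n] = m.compute(agg)
	}
	return out, agg.collects
}

func main() {
	values, collects := read("/demo/heap/free:bytes", "/demo/heap/free:kib", "/demo/gc/cycles:gc-cycles")
	fmt.Println(values)
	fmt.Println("collections:", collects) // three metrics, two collections
}
```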

Slide 25

Slide 25 text

25 Supported Metrics

Metric Prefix   Count   Metrics
/cgo            1       go-to-c-calls
/cpu            11      gc, idle, scavenge, total, user
/gc             22      cycles, scan, heap, gomemlimit, stack
/godebug        23      GODEBUG
/memory         14      heap, mcache, mspan, os-stack
/sched          7       gomaxprocs, goroutines, latencies, pauses, mutex
/sync           1       mutex wait
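
The per-prefix counts depend on the Go version, so they are worth recomputing locally. A minimal sketch built on metrics.All; countByPrefix is a hypothetical helper:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sort"
	"strings"
)

// countByPrefix is a hypothetical helper that groups supported metrics
// by the first element of their path, e.g. "/gc" or "/memory".
func countByPrefix() map[string]int {
	counts := make(map[string]int)
	for _, d := range metrics.All() {
		// "/gc/heap/allocs:bytes" splits into "", "gc", "heap/allocs:bytes".
		parts := strings.SplitN(d.Name, "/", 3)
		if len(parts) >= 2 && parts[1] != "" {
			counts["/"+parts[1]]++
		}
	}
	return counts
}

func main() {
	counts := countByPrefix()
	prefixes := make([]string, 0, len(counts))
	for p := range counts {
		prefixes = append(prefixes, p)
	}
	sort.Strings(prefixes)
	for _, p := range prefixes {
		fmt.Printf("%-10s %d\n", p, counts[p])
	}
}
```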

Slide 26

Slide 26 text

26 Metrics: /cpu

Metric                            Use Case
/cpu/classes/total:cpu-seconds    Measure CPU utilization for Go code and/or the Go runtime.
/cpu/classes/user:cpu-seconds
/cpu/classes/idle:cpu-seconds
/cpu/classes/gc/…                 Monitor CPU used for garbage collection.
/cpu/classes/scavenge/…           Monitor CPU used for returning unused memory to the underlying platform.

** IMPORTANT: These metrics are overestimates and are not directly comparable to system CPU time measurements. Compare them only with other /cpu/classes metrics.
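
Because these values are only comparable with each other, a ratio is the natural way to use them. A minimal sketch computing the share of CPU time attributed to the GC, assuming a Go version that provides the /cpu/classes metrics; gcCPUFraction is a hypothetical helper:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// gcCPUFraction is a hypothetical helper returning GC CPU time divided by
// total CPU time. The bool result is false when the metrics are missing
// or no CPU time has been recorded yet.
func gcCPUFraction() (float64, bool) {
	samples := []metrics.Sample{
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
		{Name: "/cpu/classes/total:cpu-seconds"},
	}
	metrics.Read(samples)
	for _, s := range samples {
		if s.Value.Kind() != metrics.KindFloat64 {
			return 0, false // not supported by this Go version
		}
	}
	gc, total := samples[0].Value.Float64(), samples[1].Value.Float64()
	if total <= 0 {
		return 0, false
	}
	return gc / total, true
}

func main() {
	// Assumption: the CPU classes may read as zero until a GC cycle has
	// completed, so force one before sampling.
	runtime.GC()
	if f, ok := gcCPUFraction(); ok {
		fmt.Printf("GC share of CPU time: %.2f%%\n", f*100)
	} else {
		fmt.Println("CPU metrics unavailable")
	}
}
```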

Slide 27

Slide 27 text

27 Metrics: /gc

Metric                               Use Case
/gc/cycles/…                         Counts of GC cycles.
/gc/heap/…                           Heap allocations.
/gc/scan/…                           The amount of memory that is scannable by the GC.
/gc/limiter/last-enabled:gc-cycle    GC cycle the last time the GC CPU limiter was enabled. (OOM)
/gc/gogc:percent                     Heap size target percentage. Set by: GOGC, runtime/debug.SetGCPercent
/gc/gomemlimit:bytes                 Go runtime memory limit. Set by: GOMEMLIMIT, runtime/debug.SetMemoryLimit
/gc/stack/starting-size:bytes        The stack size of new goroutines.
/gc/pauses:seconds                   Deprecated.
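
The two configuration metrics can be observed reacting to their setters. A minimal sketch, assuming a Go version that exposes /gc/gogc:percent and /gc/gomemlimit:bytes; applyAndRead is a hypothetical helper, and the values 50 and 1 GiB are arbitrary:

```go
package main

import (
	"fmt"
	"runtime/debug"
	"runtime/metrics"
)

// applyAndRead is a hypothetical helper: it applies the given GC settings
// and reads them back through runtime/metrics. The bool result is false
// when this Go version lacks either metric.
func applyAndRead(percent int, limit int64) (gogc, memlimit uint64, ok bool) {
	debug.SetGCPercent(percent)
	debug.SetMemoryLimit(limit)

	samples := []metrics.Sample{
		{Name: "/gc/gogc:percent"},
		{Name: "/gc/gomemlimit:bytes"},
	}
	metrics.Read(samples)
	for _, s := range samples {
		if s.Value.Kind() != metrics.KindUint64 {
			return 0, 0, false
		}
	}
	return samples[0].Value.Uint64(), samples[1].Value.Uint64(), true
}

func main() {
	gogc, memlimit, ok := applyAndRead(50, 1<<30)
	if !ok {
		fmt.Println("GC configuration metrics unavailable")
		return
	}
	fmt.Printf("GOGC=%d%% memory limit=%d bytes\n", gogc, memlimit)
}
```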

Slide 28

Slide 28 text

28 Metrics: /memory

Metric                                     Use Case
/memory/classes/total:bytes                All memory mapped by the Go runtime into the current process.
/memory/classes/heap/…                     Details of memory management, including objects, free, released, unused, stack, etc. (example)
/memory/classes/metadata/…                 Memory that is used for runtime mcache, mspan, etc.
/memory/classes/os-stacks:bytes            Stack memory allocated by the OS. (unstable)
/memory/classes/profiling/buckets:bytes    Memory that is used by the stack trace for profiling.
/memory/classes/other:bytes                Memory used for trace, debugging, profiling, etc.
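
A breakdown of the total by class can be produced by sampling the whole /memory/classes/ family in one call. A minimal sketch; memoryClasses is a hypothetical helper, and the set of classes it finds depends on the Go version:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"sort"
	"strings"
)

// memoryClasses is a hypothetical helper that reads every uint64 metric
// under /memory/classes/ and returns the total separately from the rest.
func memoryClasses() (total uint64, classes map[string]uint64) {
	const totalName = "/memory/classes/total:bytes"

	var samples []metrics.Sample
	for _, d := range metrics.All() {
		if strings.HasPrefix(d.Name, "/memory/classes/") && d.Kind == metrics.KindUint64 {
			samples = append(samples, metrics.Sample{Name: d.Name})
		}
	}
	metrics.Read(samples) // one call samples the whole set

	classes = make(map[string]uint64, len(samples))
	for _, s := range samples {
		if s.Name == totalName {
			total = s.Value.Uint64()
			continue
		}
		classes[s.Name] = s.Value.Uint64()
	}
	return total, classes
}

func main() {
	total, classes := memoryClasses()
	names := make([]string, 0, len(classes))
	for n := range classes {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%-45s %d\n", n, classes[n])
	}
	fmt.Printf("%-45s %d\n", "total", total)
}
```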

Slide 29

Slide 29 text

29 Metrics: /sched

Metric                          Use Case
/sched/latencies:seconds        Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running. (example)
/sched/goroutines:goroutines    Count of live goroutines.
/sched/gomaxprocs:threads       The current runtime.GOMAXPROCS setting, or the number of operating system threads that can execute user-level Go code simultaneously.
/sched/pauses/…                 GC-related and non-GC-related stop-the-world pause latencies.
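
Unlike the scalar metrics, /sched/latencies:seconds is a histogram. A minimal sketch that reads it alongside the goroutine count; schedSnapshot is a hypothetical helper, and the histogram is nil if the running Go version does not provide the metric:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// schedSnapshot is a hypothetical helper returning the live goroutine
// count and the scheduling latency histogram.
func schedSnapshot() (goroutines uint64, hist *metrics.Float64Histogram) {
	samples := []metrics.Sample{
		{Name: "/sched/goroutines:goroutines"},
		{Name: "/sched/latencies:seconds"},
	}
	metrics.Read(samples)
	if samples[0].Value.Kind() == metrics.KindUint64 {
		goroutines = samples[0].Value.Uint64()
	}
	if samples[1].Value.Kind() == metrics.KindFloat64Histogram {
		hist = samples[1].Value.Float64Histogram()
	}
	return goroutines, hist
}

func main() {
	n, hist := schedSnapshot()
	fmt.Println("live goroutines:", n)
	if hist == nil {
		fmt.Println("latency histogram unavailable")
		return
	}
	// Counts[i] covers the range from Buckets[i] up to Buckets[i+1].
	var events uint64
	for _, c := range hist.Counts {
		events += c
	}
	fmt.Printf("scheduling events recorded: %d across %d buckets\n", events, len(hist.Counts))
}
```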

Slide 30

Slide 30 text

30 Metrics: Others
● /sync/mutex/wait/total:seconds Approximate cumulative time goroutines have spent blocked on a sync.Mutex, sync.RWMutex, or runtime-internal lock.
● /godebug/... Counts of non-default behaviors due to GODEBUG settings.
● /cgo/go-to-c-calls:calls Count of calls made from Go to C by the current process.

Slide 31

Slide 31 text

31 Q&A