
TEST - How a Failed Experiment Helped Me Understand the Go Runtime in More Depth

This is a test description.


ScyllaDB

October 09, 2024


Transcript

  1. A ScyllaDB Community How a failed experiment helped me understand

    the Go runtime in more depth Aadhav Vignesh Software Engineer
  2. Aadhav Vignesh (he/him) Software Engineer ▪ Love everything about Go!

    ~1.5 years of experience with the language ▪ Interested in distributed systems, databases, and performance ▪ I work on random experiments out of curiosity and for fun! ▪ Love watching sports and playing video games
  3. Agenda ▪ The experiment ▪ Getting information out of the

    Go runtime ▪ What worked, what didn’t ▪ Ongoing work ▪ A bit about Go runtime observability ▪ Understanding the existing ecosystem (pprof, etc.) ▪ Going beyond pprof
  4. The experiment ▪ Wanted to understand how the GC works

    as a Go newbie ▪ Found an animation of the Go GC’s tri-color abstraction, but wanted to try something out with real data ▪ Goal: build a tiny real-time GC viewer for Go (for fun) ▪ Started exploring the Go runtime to get more ideas ▪ Worked on a small prototype
  5. A high-level overview of the Go GC ▪ Go employs

    a concurrent, tri-color, mark-and-sweep GC • Two actors: mutators and collectors • Mutators are responsible for allocations/mutations in the heap; collectors work on finding and freeing garbage • Two phases: mark and sweep • Mark phase identifies accessible objects and marks them as active/in-use; sweep phase finds unmarked objects and determines them as garbage
  6. A high-level overview of the Go GC (..contd) ▪ What’s

    “tri-color”? • The GC separates objects into three different colored sets: white, grey, black • Black = object has been scanned completely and its descendants have been identified by the collector • Grey = roots discovered by the collector, but descendants not identified yet • White = hasn’t been discovered by the collector yet; may or may not be reachable from the roots
  7. Getting information out of the Go runtime ▪ What information

    do I want? • At a high-level: “objects” ▪ Anything which would help me in getting an object graph representation ▪ Where to get this information from? • Option 1: runtime package? • Option 2: Through existing Go tooling • Option 3: ??
  8. Option 1: The runtime package ▪ The runtime package is

    a great place to get information about the Go runtime ▪ One candidate worth checking out is MemStats, a structure that keeps track of statistics related to heap allocs, GC times, etc. ▪ MemStats is neat, but it provides memory allocator statistics as (mostly) numeric data • Could be useful in most tools, but not suitable for this experiment • Spoiler: The MemStats documentation points in the right direction, but I realized that later
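To make the "mostly numeric data" point concrete, here is a minimal sketch of reading MemStats via runtime.ReadMemStats (the printed fields are just a sample):

```go
package main

import (
	"fmt"
	"runtime"
)

// heapStats returns a snapshot of the runtime's memory statistics.
func heapStats() runtime.MemStats {
	var m runtime.MemStats
	runtime.ReadMemStats(&m) // briefly stops the world for a consistent snapshot
	return m
}

func main() {
	m := heapStats()
	// Mostly numeric allocator/GC counters -- handy for dashboards,
	// but not enough to reconstruct an object graph.
	fmt.Println("HeapAlloc bytes:", m.HeapAlloc)
	fmt.Println("GC cycles:", m.NumGC)
	fmt.Println("Total GC pause ns:", m.PauseTotalNs)
}
```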
  9. Option 2: Existing Go tooling ▪ viewcore is a command-line

    tool for exploring core dumps of a Go process provided under golang.org/x/debug/cmd/viewcore ▪ However, viewcore has a few issues with parsing heap dumps and panics in a few places (hasn’t been updated for the Go 1.22 heap layout) ▪ During my experimentation, viewcore broke in a lot of places ▪ The Go team is working on fixing these issues; Proposal: #57447
  10. Option 3: Heap dumps ▪ Go provides a way to

    get a heap dump using debug.WriteHeapDump ▪ The heap dump is a sequence of records; Go writes 17 types of records (object, itab, etc.) to the heap dump ▪ A heap dump is useful for extracting “objects” ▪ Possibly the only correct option for this project; a heap dump provides a view into the program’s memory at a point in time
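Capturing a dump with debug.WriteHeapDump is a one-liner once you have a file descriptor; a minimal sketch (the "heap.dump" file name is arbitrary):

```go
package main

import (
	"os"
	"runtime/debug"
)

// writeHeapDump stops the world and writes a full heap dump to path.
func writeHeapDump(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	// WriteHeapDump takes a raw file descriptor and writes the
	// sequence of records (object, itab, etc.) described above.
	debug.WriteHeapDump(f.Fd())
	return nil
}

func main() {
	if err := writeHeapDump("heap.dump"); err != nil {
		panic(err)
	}
}
```

A parser (such as the heaputil library mentioned later) can then walk the records to extract objects.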
  11. Other options ▪ Forking Go • Tempting, but the runtime

    moves at a high velocity • Lots of tweaks need to be made to extract information out neatly • I wanted to have the visualization happen on normal Go installations too
  12. Implementation ▪ Read heap dumps generated by debug.WriteHeapDump ▪ Parse

    the heap dump based on the documented format • Wrote a small library for reading Go heap dumps: https://github.com/burntcarrot/heaputil ▪ Generate an object graph by utilizing records parsed from the heap dump ▪ In-progress: Color nodes in the object graph and animate using multiple heap dumps captured at intervals
  13. Ongoing work ▪ https://github.com/golang/go/issues/57447 • Proposal for making x/debug/internal/gocore and

    x/debug/internal/core work with Go at tip. • There has been some progress made by the Go team ▪ https://github.com/cloudwego/goref • Heap analysis tool from ByteDance, uses Delve as a medium to retrieve type info ▪ https://github.com/adamroach/heapspurs • Provides a small toolkit to work with Go heap dumps
  14. The components ▪ The Go runtime can be broken down

    into these “components”: • GMP ▪ G = goroutines ▪ M = threads ▪ P = processors • Scheduler • Garbage Collector • netpoller • Concurrency primitives (channels, etc.) • etc.
  15. What’s available right now ▪ pprof • CPU • Memory

    • Goroutine • Block • Mutex ▪ Execution tracing ▪ GC traces ▪ and much more..
  16. CPU ▪ Helps in identifying which parts consume a lot

    of CPU time. ▪ Can be enabled through: • pprof.StartCPUProfile(w) • import _ "net/http/pprof" (GET /debug/pprof/profile?seconds=60) ▪ Observable parts: All Go code (spending time on CPU) ▪ Non-observable parts: I/O
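A minimal sketch of the in-process route via pprof.StartCPUProfile (the file name and busy-loop workload are hypothetical stand-ins):

```go
package main

import (
	"os"
	"runtime/pprof"
)

// profileCPU samples on-CPU time spent inside work and writes
// the resulting profile to path.
func profileCPU(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return err
	}
	defer pprof.StopCPUProfile()

	work() // only on-CPU samples are recorded; time blocked on I/O is invisible
	return nil
}

func main() {
	err := profileCPU("cpu.prof", func() {
		sum := 0
		for i := 0; i < 10_000_000; i++ { // stand-in for real work
			sum += i
		}
		_ = sum
	})
	if err != nil {
		panic(err)
	}
}
```

Inspect the result with `go tool pprof cpu.prof`.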
  17. Memory ▪ Helps in identifying which parts perform more heap

    allocations. ▪ Can be enabled through: • pprof.WriteHeapProfile(w) • import _ "net/http/pprof" (GET /debug/pprof/allocs?seconds=60) ▪ Observable parts: Reachable objects ▪ Non-observable parts: CGO allocs, object lifetimes, unreachable objects
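In code, a heap profile snapshot is written with pprof.WriteHeapProfile; a minimal sketch (the file name is arbitrary):

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile snapshots live heap allocations to path.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // flush up-to-date allocation statistics first
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("heap.prof"); err != nil {
		panic(err)
	}
}
```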
  18. Goroutines ▪ Used for getting information about active goroutines ▪

    Can be enabled through: • pprof.Lookup("goroutine").WriteTo(w, 0) • import _ "net/http/pprof" (GET /debug/pprof/goroutine) ▪ Observable parts: All goroutines present in the internal allg struct ▪ Non-observable parts: Short-lived goroutines (~1μs)
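A minimal sketch of the programmatic route (file name is arbitrary; the debug parameter controls the output format):

```go
package main

import (
	"os"
	"runtime/pprof"
)

// dumpGoroutines writes the goroutine profile to path.
// debug=0 emits the binary pprof format; debug=2 would emit
// human-readable stack traces for every goroutine instead.
func dumpGoroutines(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.Lookup("goroutine").WriteTo(f, 0)
}

func main() {
	if err := dumpGoroutines("goroutine.prof"); err != nil {
		panic(err)
	}
}
```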
  19. Block profile ▪ The block profiler helps in identifying time

    spent by goroutines waiting for channel operations, mutex operations, etc. ▪ Can be enabled through: • pprof.Lookup("block").WriteTo(w, 0) ▪ Need to set runtime.SetBlockProfileRate(rate) before profiling • import _ "net/http/pprof" (GET /debug/pprof/block?seconds=60) ▪ Observable parts: Synchronization primitives (channels, mutexes, etc.) ▪ Non-observable parts: in-progress blocking events, I/O, sleep, etc.
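A minimal sketch tying the two steps together: set the rate first, generate a (hypothetical) blocking event, then dump the profile:

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// captureBlockProfile records blocking events during a small
// channel wait and writes the profile to path.
func captureBlockProfile(path string) error {
	runtime.SetBlockProfileRate(1) // sample every blocking event; must be set before blocking happens

	done := make(chan struct{})
	go func() {
		time.Sleep(10 * time.Millisecond)
		close(done)
	}()
	<-done // this channel wait is what the profile will show

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.Lookup("block").WriteTo(f, 0)
}

func main() {
	if err := captureBlockProfile("block.prof"); err != nil {
		panic(err)
	}
}
```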
  20. Mutex profile ▪ Useful for identifying time spent on lock

    contentions ▪ Can be enabled through: • pprof.Lookup("mutex").WriteTo(w, 0) ▪ Need to set runtime.SetMutexProfileFraction(rate) before profiling • import _ "net/http/pprof" (GET /debug/pprof/mutex) ▪ Observable parts: Mutexes (sync.Mutex/sync.RWMutex), etc. ▪ Non-observable parts: Spinning (locks), sync.WaitGroup, sync.Cond, etc.
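The same pattern applies here: enable the fraction first, produce some (hypothetical) contention, then dump the profile:

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

// captureMutexProfile records lock contention from a small
// artificial workload and writes the profile to path.
func captureMutexProfile(path string) error {
	runtime.SetMutexProfileFraction(1) // report every contention event; must be set first

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock() // contended across four goroutines
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return pprof.Lookup("mutex").WriteTo(f, 0)
}

func main() {
	if err := captureMutexProfile("mutex.prof"); err != nil {
		panic(err)
	}
}
```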
  21. But why? ▪ pprof should be able to provide a

    sufficient/clear picture of what’s happening in your application ▪ In case you find yourself stuck in a situation where pprof can’t help, then the next slides might help ▪ The last option would be to fork the Go source and add manual debug logs or print statements; but it’s unlikely that you’d reach this point
  22. Execution tracer ▪ Helpful for investigating gaps (“voids”) observed

    in normal profiling ▪ Captures a ton of information including: • Goroutine events (creation, blocking, etc.) • GC events (STW, mark assist, etc.) • Syscalls • Etc. ▪ Can be enabled through: trace.Start(w); defer trace.Stop() ▪ Reduced overhead (#60773) (20-30% prior to Go 1.21, post Go 1.21 overhead = 1-2%) ▪ Alternative frontend: https://github.com/dominikh/gotraceui
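A minimal sketch of the trace.Start/trace.Stop pattern around a (hypothetical) workload:

```go
package main

import (
	"os"
	"runtime/trace"
)

// runTraced records an execution trace of work into path.
func runTraced(path string, work func()) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		return err
	}
	defer trace.Stop()

	work()
	return nil
}

func main() {
	err := runTraced("trace.out", func() {
		done := make(chan struct{})
		go func() { close(done) }() // goroutine create/block/unblock events land in the trace
		<-done
	})
	if err != nil {
		panic(err)
	}
}
```

View the result with `go tool trace trace.out`, or with the gotraceui frontend linked above.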
  23. GC traces ▪ Helpful for getting granular information about GC cycles

    ▪ Slightly less overhead than the execution tracer ▪ Can be enabled through: GODEBUG=gctrace=1 ▪ You can also get information about the GC pacer: GODEBUG=gctrace=1,gcpacertrace=1
  24. Scheduler traces ▪ Helpful for getting granular data about the

    scheduler like: • Processor information (idle Ps, etc.) • Thread information (idle threads, spinning threads, etc.) • Global and local runqueues for Gs (goroutines) • etc. ▪ Slightly less overhead than the execution tracer ▪ Can be enabled through: GODEBUG=schedtrace=1000 • Emits information every 1000 milliseconds, usage: GODEBUG=schedtrace=X, where X is the emission interval in milliseconds
  25. fgprof ▪ Sampling Go profiler which captures information about both

    on-CPU and off-CPU time. ▪ CPU profiling provided by pprof can’t observe off-CPU time ▪ Useful for observing off-CPU time (I/O, etc.) ▪ Limitations: • fgprof is implemented as a background goroutine, so it relies on the Go scheduler, and any scheduler delays might result in slightly inaccurate data • Scales O(N) with number of goroutines, slightly less efficient on goroutine-heavy applications
  26. runtime/metrics ▪ Source for programmatically getting metrics exported by the

    Go runtime ▪ 88 metrics exposed (Go 1.23.2), 56 (excluding /godebug metrics) ▪ Open proposal to define a recommended set of metrics: #67120
  27. Future opportunities ▪ eBPF is pretty promising, and some tooling

    utilizing it could aid in more Go runtime observability ▪ The gocore and core packages, once made public, could be really useful too.
  28. Conclusion ▪ Ending up in a rabbit hole is a

    great way to learn a lot of things! ▪ Go has a great set of tooling, and provides some really nice ways to understand more about the Go runtime ▪ The contributors and the Go team deserve a huge kudos for supporting these tools!
  29. Further reading ▪ Go docs: • runtime/pprof: https://pkg.go.dev/runtime/pprof • runtime/metrics:

    https://pkg.go.dev/runtime/metrics • runtime/trace: https://pkg.go.dev/runtime/trace • Go environment variables: https://pkg.go.dev/runtime#hdr-Environment_Variables ▪ Felix Geisendörfer’s Go Profiler Notes: https://github.com/DataDog/go-profiler-notes