
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope

cncf-canada-meetups

April 17, 2024

Transcript

  1. Image credit: Oliver The Mighty Pig, Penguin Publishing Group, ISBN 0803728867

     Mighty P was using profiling ...and spewing out flame graphs
  2. Logs: error logs pinpoint the user issue
     Metrics: unexpected CPU spike
     Traces: anomalous span reveals an error cluster
     Profiles: code-level root cause

     Profiling completes the story of why something went wrong and how to fix it
  3. What is Profiling? “Profiling” is a way to analyze how a program uses resources like CPU or memory at code-level granularity. It makes use of flame graphs to help you pinpoint the parts of your application that use the most resources. It is commonly used during application development and is built into popular IDEs.

     Challenges:
     • The overhead of conventional profilers doesn’t allow for profiling in production
     • On-demand profiling is a reactive approach
     • Development environments don’t accurately mimic production
  4. What changed? Profiling technology has advanced: today’s profilers are lightweight enough to run in production with minimal overhead. This enables “continuous profiling”, a more powerful form of profiling that profiles applications periodically, adding the dimension of time. By understanding your system’s resource usage over time, you can locate, debug, and fix performance issues.
  5. The value of Continuous Profiling

     Cost cutting: getting a line-level breakdown of where resource hotspots are allows you to optimize them.
     Latency reduction: for many businesses performance impacts revenue - e-commerce, ads - gaming, streaming - HFT, fintech - rideshare.
     Incident resolution: pinpoint memory leaks to specific parts of the code, see the root cause of CPU spikes, see code-level details when debugging services.
  6. How to gather a profile?

     • Instrumenting the code base
       ◦ Tooling and formats depend on each language ecosystem
       ◦ Access to more detailed runtime information
       ◦ More flexibility:
         ▪ selectively profile and label specific sections of code (see the sketch after this list)
         ▪ send profiles at different intervals
       (further read: eBPF pros/cons)
     • eBPF based collection
       ◦ No insight into stack-trace runtime information for interpreted languages (better fit for compiled languages)
       ◦ Focus on CPU profiling
       ◦ Live profiling: doesn’t require code changes or even restarts
       ◦ Kernel dependencies (v4.9 or more recent) and requires root access
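     As an illustration of labeling specific sections of code, here is a minimal sketch using the Go standard library’s runtime/pprof labels: samples collected while the labeled callback runs carry the "section=checkout" label and can be filtered on later. The handleCheckout function and the label values are hypothetical, not from the talk.

     package main

     import (
         "context"
         "runtime/pprof"
     )

     // handleCheckout stands in for a section of code whose samples we want
     // to see attributed separately in the profile.
     func handleCheckout(ctx context.Context) {
         // work
     }

     func main() {
         ctx := context.Background()
         // Everything executed inside the callback is sampled with the
         // "section=checkout" label attached to its stack traces.
         pprof.Do(ctx, pprof.Labels("section", "checkout"), func(ctx context.Context) {
             handleCheckout(ctx)
         })
     }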
  7. How to gather a profile? Let’s take a look at Go

     • Standard library includes CPU, memory, goroutine, mutex and block profiles
     • Provides profiles using an HTTP interface
       ◦ Profiling data is returned using a protobuf definition
     • Data is meant to be consumed by the pprof CLI
       ◦ # Collect a CPU profile for 2 seconds
         $ pprof "http://localhost:6060/debug/pprof/profile?seconds=2"
         # Get the heap memory allocations
         $ pprof "http://localhost:6060/debug/pprof/allocs"
       ◦ Common to use the -http parameter to view profiles using the web interface
     • Find more on profiling in Go at https://pkg.go.dev/runtime/pprof#Profile
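     If you would rather write profiles to files than fetch them over HTTP, the same runtime/pprof package can do that directly; a minimal sketch (file names are arbitrary):

     package main

     import (
         "log"
         "os"
         "runtime/pprof"
     )

     func main() {
         cpuFile, err := os.Create("cpu.pprof")
         if err != nil {
             log.Fatal(err)
         }
         defer cpuFile.Close()

         // Sample the CPU for the lifetime of the program.
         if err := pprof.StartCPUProfile(cpuFile); err != nil {
             log.Fatal(err)
         }
         defer pprof.StopCPUProfile()

         // ... run the workload being profiled ...

         heapFile, err := os.Create("heap.pprof")
         if err != nil {
             log.Fatal(err)
         }
         defer heapFile.Close()

         // Snapshot of the current heap allocations.
         if err := pprof.WriteHeapProfile(heapFile); err != nil {
             log.Fatal(err)
         }
     }

     The resulting files can then be opened in the web UI with, for example, go tool pprof -http=:8080 cpu.pprof.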
  8. Instrumentation of Go code

     package main

     import (
         "log"
         "net/http"
         _ "net/http/pprof"
         "time"
     )

     func main() {
         go func() {
             log.Println(http.ListenAndServe("localhost:6060", nil))
         }()

         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     [...]
  9. What is measured in a profile?

     package main

     func main() {
         // work
         doALot()
         doLittle()
     }

     func prepare() {
         // work
     }

     func doALot() {
         prepare()
         // work
     }

     func doLittle() {
         prepare()
         // work
     }
  10. What is measured in a profile? Time on CPU. Each measurement gets recorded on a stack-trace level.

     package main

     func main() {
         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     func prepare() {
         // spend 5 cpu cycles
     }

     func doALot() {
         prepare()
         // spend 20 cpu cycles
     }

     func doLittle() {
         prepare()
         // spend 5 cpu cycles
     }

     main()                            3
     main() > doALot() > prepare()     5
     main() > doALot()                 20
     main() > doLittle() > prepare()   5
     main() > doLittle()               5
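     The same measurements can also be written in the commonly used “folded stacks” text format (one line per stack trace followed by its value), which many flame graph tools accept as input; this is an equivalent rendering of the numbers above, not something shown on the slide:

     main 3
     main;doALot;prepare 5
     main;doALot 20
     main;doLittle;prepare 5
     main;doLittle 5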
  11. Visualization of Profiles (try it yourself: flamegraph.com)

     Flamegraph
     • Whole width represents the total resources used (over the whole measurement duration)
     • Ability to spot higher-usage nodes
     • Colours are grouped based on package

     package main

     func main() {
         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     func prepare(x) {
         // spend 5 cpu cycles
     }

     func doALot(65) {
         prepare(65)
         // spend 20 cpu cycles
     }

     func doLittle(26) {
         prepare(26)
         // spend 5 cpu cycles
     }
  12. What is our product today

     Open Source Project
     • An open source continuous profiling platform
     • ~10,000 combined GitHub ⭐

     Commercial Managed Offering
     • Grafana Cloud Profiles, available in Grafana Cloud (available with free tier)
     • Fully managed Grafana and observability solution