
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope

cncf-canada-meetups

April 17, 2024

Transcript

  1. Image credit: Oliver The Mighty Pig, Penguin Publishing Group, ISBN 0803728867

     Mighty P was using profiling ...and spewing out flame graphs
  2. Logs: error logs pinpoint the user issue
     Metrics: unexpected CPU spike
     Traces: anomalous span reveals an error cluster
     Profiles: code-level root cause

     Profiling completes the story of why something went wrong and how to fix it
  3. What is Profiling? “Profiling” is a way to analyze how a program uses resources like CPU or memory at code-level granularity. It makes use of flame graphs to help you pinpoint the parts of your application that use the most resources. It is commonly used during application development and is built into popular IDEs.

     Challenges:
     • The overhead of conventional profilers doesn’t allow for profiling in production
     • On-demand profiling is a reactive approach
     • Development environments don’t accurately mimic production
  4. What changed? Profiling technology has advanced: today’s profilers are lightweight enough to run in production with minimal overhead. This enables “continuous profiling”, a more powerful form of profiling that profiles applications periodically, adding the dimension of time. By understanding your system’s resource usage over time, you can locate, debug, and fix performance issues.
  5. The value of Continuous Profiling

     Cost cutting: getting a line-level breakdown of where resource hotspots are allows you to optimize them.
     Latency reduction: for many businesses performance impacts revenue - e-commerce, ads - gaming, streaming - HFT, fintech - rideshare.
     Incident resolution: pinpoint memory leaks to specific parts of the code, see the root cause of CPU spikes, see code-level details when debugging services.
  6. How to gather a profile?

     • Instrumenting the code base
       ◦ Tooling and formats depend on each language ecosystem
       ◦ Access to more detailed runtime information
       ◦ More flexibility:
         ▪ selectively profile and label specific sections of code (see the sketch after this list)
         ▪ send profiles at different intervals
       (further read: eBPF pros/cons)
     • eBPF based collection
       ◦ No insight into stack-trace runtime information for interpreted languages (better fit for compiled languages)
       ◦ Focus on CPU profiling
       ◦ Live profiling: doesn’t require code changes or even restarts
       ◦ Kernel dependencies (v4.9 or more recent) and requires root access
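     As an illustration of labeling specific sections of code, here is a minimal sketch using the Go standard library’s runtime/pprof labels: samples collected while the labeled callback runs carry the "section=checkout" label and can be filtered on later. The handleCheckout function and the label values are hypothetical, not from the talk.

     package main

     import (
         "context"
         "runtime/pprof"
     )

     // handleCheckout stands in for a section of code whose samples we want
     // to see attributed separately in the profile.
     func handleCheckout(ctx context.Context) {
         // work
     }

     func main() {
         ctx := context.Background()
         // Everything executed inside the callback is sampled with the
         // "section=checkout" label attached to its stack traces.
         pprof.Do(ctx, pprof.Labels("section", "checkout"), func(ctx context.Context) {
             handleCheckout(ctx)
         })
     }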
  7. How to gather a profile? Let’s take a look at Go

     • Standard library includes CPU, memory, goroutine, mutex and block profiles
     • Provides profiles using an HTTP interface
       ◦ Profiling data is returned using a protobuf definition
     • Data is meant to be consumed by the pprof CLI
       ◦ # Collect a CPU profile for 2 seconds
         $ pprof "http://localhost:6060/debug/pprof/profile?seconds=2"
         # Get the heap memory allocations
         $ pprof "http://localhost:6060/debug/pprof/allocs"
       ◦ Common to use the -http parameter to view profiles using the web interface
     • Find more on profiling in Go at https://pkg.go.dev/runtime/pprof#Profile
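     If you would rather write profiles to files than fetch them over HTTP, the same runtime/pprof package can do that directly; a minimal sketch (file names are arbitrary):

     package main

     import (
         "log"
         "os"
         "runtime/pprof"
     )

     func main() {
         cpuFile, err := os.Create("cpu.pprof")
         if err != nil {
             log.Fatal(err)
         }
         defer cpuFile.Close()

         // Sample the CPU for the lifetime of the program.
         if err := pprof.StartCPUProfile(cpuFile); err != nil {
             log.Fatal(err)
         }
         defer pprof.StopCPUProfile()

         // ... run the workload being profiled ...

         heapFile, err := os.Create("heap.pprof")
         if err != nil {
             log.Fatal(err)
         }
         defer heapFile.Close()

         // Snapshot of the current heap allocations.
         if err := pprof.WriteHeapProfile(heapFile); err != nil {
             log.Fatal(err)
         }
     }

     The resulting files can then be opened in the web UI with, for example, go tool pprof -http=:8080 cpu.pprof.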
  8. Instrumentation of Go code

     package main

     import (
         "log"
         "net/http"
         _ "net/http/pprof"
         "time"
     )

     func main() {
         go func() {
             log.Println(http.ListenAndServe("localhost:6060", nil))
         }()

         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     [...]
  9. What is measured in a profile?

     package main

     func main() {
         // work
         doALot()
         doLittle()
     }

     func prepare() {
         // work
     }

     func doALot() {
         prepare()
         // work
     }

     func doLittle() {
         prepare()
         // work
     }
  10. What is measured in a profile? Time on CPU. Each measurement gets recorded on a stack-trace level.

     package main

     func main() {
         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     func prepare() {
         // spend 5 cpu cycles
     }

     func doALot() {
         prepare()
         // spend 20 cpu cycles
     }

     func doLittle() {
         prepare()
         // spend 5 cpu cycles
     }

     main()                            3
     main() > doALot() > prepare()     5
     main() > doALot()                 20
     main() > doLittle() > prepare()   5
     main() > doLittle()               5
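     The same measurements can also be written in the commonly used “folded stacks” text format (one line per stack trace followed by its value), which many flame graph tools accept as input; this is an equivalent rendering of the numbers above, not something shown on the slide:

     main 3
     main;doALot;prepare 5
     main;doALot 20
     main;doLittle;prepare 5
     main;doLittle 5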
  11. Visualization of Profiles (try it yourself: flamegraph.com)

     Flamegraph
     • Whole width represents the total resources used (over the whole measurement duration)
     • Ability to spot higher-usage nodes
     • Colours are grouped based on package

     package main

     func main() {
         // spend 3 cpu cycles
         doALot()
         doLittle()
     }

     func prepare(x) {
         // spend 5 cpu cycles
     }

     func doALot(65) {
         prepare(65)
         // spend 20 cpu cycles
     }

     func doLittle(26) {
         prepare(26)
         // spend 5 cpu cycles
     }
  12. What is our product today

     Open Source Project
     • An open source continuous profiling platform
     • ~10,000 combined GitHub ⭐

     Commercial Managed Offering
     • Grafana Cloud Profiles, available in Grafana Cloud (available with free tier)
     • Fully managed Grafana and observability solution