Taming performance issues into the wild: a practical guide to JVM profiling

Agenda
➢ (Java) Profiling Introduction
➢ Flame-graphs and table (tree) views with examples
  ◦ Mixed-Mode Flame Graphs
➢ Challenges of Java Sampling Profiling
  ◦ Safepoints biasing
  ◦ Observer effect
  ◦ Native events (garbage collection, JNI, operating system calls, JIT)
  ◦ Skid native perf events
  ◦ Method invocations count
➢ Tools setup
➢ Introduction to application to be profiled
  ◦ Quarkus and its threading model
➢ Load generation tooling
➢ Profiling and investigation sessions

Resource consumption

Y NOT?

Y NOT: LAZINESS?

Y NOT: BETTER NOT?

Y NOT: NOT MY THING?

Y NOT: TOO SMART?

Y YES? IT’S FUN TO KNOW

Our code: start here

CPU Flamegraph By Example
• x-axis: alphabetical stack sort, i.e. NOT A TIME SERIES!
• Top edge shows who is running on-CPU
• Top down shows ancestry, e.g. b() is called by a()
• Width is proportional to sample presence, e.g. c() has ~ twice the samples of d()
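
The properties listed above follow from how stack samples are aggregated before rendering. The sketch below is a minimal illustration, not the code of any real flame graph tool: the class `FoldedStacks`, its methods, and the sample data are hypothetical. It collapses stack samples into `a;b;c`-style keys, keeps them alphabetically sorted, and derives a frame's width from the number of samples that pass through it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldedStacks {

    // Collapses raw stack samples (outermost caller first) into "a;b;c" keys
    // with a sample count. The TreeMap keeps keys alphabetically sorted, which
    // is why the x-axis of a flame graph is not a time series.
    public static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    // Width of a frame: the number of samples whose stack starts with the
    // given path, i.e. the frame itself plus everything it called.
    public static int width(Map<String, Integer> folded, String path) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : folded.entrySet()) {
            String key = entry.getKey();
            if (key.equals(path) || key.startsWith(path + ";")) {
                total += entry.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical samples: a() calls b(), which calls c() or d()
        List<List<String>> samples = List.of(
                List.of("a", "b", "d"),
                List.of("a", "b", "c"),
                List.of("a", "b"),
                List.of("a", "b", "c"));
        Map<String, Integer> folded = fold(samples);
        folded.forEach((stack, count) -> System.out.println(stack + " " + count));
        System.out.println("width of a;b = " + width(folded, "a;b"));
    }
}
```

With this data, `c()` ends up with twice the samples of `d()`, and the order in which the samples arrived is lost, matching the slide.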

Mixed-Mode Flame Graphs
• Colors
  ◦ green - Java
  ◦ aqua - Java Inlined(!!!!)
  ◦ orange - Kernel code
  ◦ red - C libraries (e.g. JNI)
  ◦ yellow - C++ (e.g. JVM)
• Color intensity is randomized to differentiate frames
• Thread ID is added as a base frame

INLINING is... THE OPTIMIZATION.

Inlining is a way to optimize compiled code at runtime by replacing invocations of the most frequently executed methods with their bodies. It is the responsibility of the Just-In-Time (JIT) compiler, which tries to inline the methods we call most often so that the overhead of a method invocation is avoided.
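
As a minimal sketch of what an inlining candidate looks like (the class `InliningDemo` and its methods are made up for illustration), the small `square` method below is called from a hot loop. Once the JIT compiler has compiled `sumOfSquares`, it can replace the call with `x * x` directly. Whether and when that happens depends on the JVM version and its heuristics.

```java
public class InliningDemo {

    // Small, hot method: a typical inlining candidate for the JIT compiler
    public static int square(int x) {
        return x * x;
    }

    // Calls square() in a loop; after inlining, no call overhead remains
    public static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        long result = 0;
        // Repeat the work so the methods become hot enough to be JIT-compiled
        for (int round = 0; round < 20_000; round++) {
            result = sumOfSquares(1_000);
        }
        System.out.println(result);
    }
}
```

On HotSpot, running it with `java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining InliningDemo` prints the compiler's inlining decisions; the exact output format varies between JDK versions.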

Native frames...on a JVM?!
• JNI code
  ◦ I/O wrapper operations, user code calling native libs, etc.
• Intrinsics
  ◦ System::arraycopy, Arrays::equals, ...vmSymbols.hpp
• SIMD opportunity
  ◦ Arrays::fill
• JVM C++ code
  ◦ JIT Compiler, GC, etc.
• OS/Kernel C code
  ◦ I/O OS/Kernel calls, page faults, interrupt handlers, etc.

Profiling Challenges

Safewhat?

safepoint

A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code can be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.

- HotSpot Glossary of Terms -

Safepoint operations, i.e. operations that require a safepoint
• GC
• Deoptimization
• PrintThreads
• PrintJNI
• FindDeadlock
• ThreadDump
• EnableBiasedLocking
• RevokeBias
• HeapDumper
• GetAllStackTraces
• GetStackTrace
• [-XX:GuaranteedSafepointInterval=1000]
• ...

Observer effects
-Xlog:safepoint makes it easier to spot how much safepoint-biased profilers can affect a profiled program
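
To make the idea of a safepoint-biased profiler concrete, here is a deliberately naive sampler built on `Thread.getAllStackTraces()`. It is a sketch for illustration only: the class `NaiveSampler` and the `busy-worker` thread are hypothetical, and the assumption (in line with the stack trace and thread dump operations listed among the safepoint operations) is that HotSpot serves this call through a VM operation at a safepoint, so every sample stops the application threads and only observes them where they can reach a safepoint.

```java
import java.util.HashMap;
import java.util.Map;

public class NaiveSampler {

    // Takes up to `rounds` samples, `intervalMillis` apart, and counts the
    // top frame of every RUNNABLE thread.
    public static Map<String, Integer> sample(int rounds, long intervalMillis) {
        Map<String, Integer> topFrames = new HashMap<>();
        for (int i = 0; i < rounds; i++) {
            // Assumption: on HotSpot this call is served at a safepoint,
            // which is the source of both the bias and the observer effect.
            Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
            for (Map.Entry<Thread, StackTraceElement[]> entry : traces.entrySet()) {
                StackTraceElement[] stack = entry.getValue();
                if (entry.getKey().getState() == Thread.State.RUNNABLE && stack.length > 0) {
                    String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                    topFrames.merge(frame, 1, Integer::sum);
                }
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag and stop sampling
                break;
            }
        }
        return topFrames;
    }

    public static void main(String[] args) {
        // Hypothetical workload: one thread burning CPU until interrupted
        Thread busy = new Thread(() -> {
            long x = 0;
            while (!Thread.currentThread().isInterrupted()) {
                x += System.nanoTime() % 7;
            }
        }, "busy-worker");
        busy.setDaemon(true);
        busy.start();

        sample(50, 10).forEach((frame, count) -> System.out.println(count + " " + frame));
        busy.interrupt();
    }
}
```

Running it with `java -Xlog:safepoint NaiveSampler` on JDK 9 or later should log the safepoints that occur while the sampler runs, which gives a rough picture of how much such a profiler disturbs the program it observes.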

async-profiler: the hybrid

Why choose it?
• AsyncGetCallTrace (no safepoint bias, but can collect “corrupted” Java frames)
• Linux Timer + Signal Handler (ITIMER_PROF/SIGPROF)
• Native using perf_events
• Out-of-the-box Flame-Graphs support
• Open-Source
• Very low Observer effect
• Java 6+
• Can profile Java Monitor/ReentrantLock, allocations*, Cache Misses...

Open source ROCKS!

Hardware Event Skid

Event skid is the recording of an event not exactly on the code line that caused the event. It may even result in a caller function event being recorded in the callee function. Event skid is caused by a number of factors:
• The delay in propagating the event out of the processor's microcode through the interrupt controller (APIC) and back into the processor.
• The current instruction retirement cycle must be completed.
• When the interrupt is received, the processor must serialize its instruction stream which causes a flushing of the execution pipeline.

Sampling vs. Wall-clock profiling

Quarkus threading model

Let’s get our hands dirty!!!

Demo project + Tools: https://github.com/franz1981/quarkus-profiling-workshop

Mario Fusco
