Slide 1

Slide 1 text

Taming performance issues into the wild: a practical guide to JVM profiling

Slide 2

Slide 2 text

Agenda
➢ (Java) Profiling Introduction
➢ Flame graphs and table (tree) views with examples
  ○ Mixed-Mode Flame Graphs
➢ Challenges of Java Sampling Profiling
  ○ Safepoint bias
  ○ Observer effect
  ○ Native events (garbage collection, JNI, operating system calls, JIT)
  ○ Skid in native perf events
  ○ Method invocation counts
➢ Tools setup
➢ Introduction to the application to be profiled
  ○ Quarkus and its threading model
➢ Load generation tooling
➢ Profiling and investigation sessions

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

Resource consumption

Slide 5

Slide 5 text

Resource consumption

Slide 6

Slide 6 text

Y NOT?

Slide 7

Slide 7 text

Y NOT: LAZINESS?

Slide 8

Slide 8 text

Y NOT: BETTER NOT?

Slide 9

Slide 9 text

Y NOT: NOT MY THING?

Slide 10

Slide 10 text

Y NOT: TOO SMART?

Slide 11

Slide 11 text

Y YES? IT’S FUN TO KNOW

Slide 12

Slide 12 text

No content

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

Our code: start here

Slide 16

Slide 16 text

CPU Flame Graph by Example
● The x-axis is an alphabetical sort of the stacks, i.e. NOT A TIME SERIES!
● The top edge shows who is running on-CPU
● Top-down shows ancestry, e.g. b() is called by a()
● Width is proportional to presence in samples, e.g. c() has ~twice the samples of d()
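To make these reading rules concrete, here is a minimal sketch of how sampled stacks could be folded into the counts that determine frame widths. The class `FoldedStacks`, the method `fold`, and the sample data are hypothetical and not taken from the talk; the `a;b;c count` lines mirror the collapsed-stack text format commonly fed to flame graph tools.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FoldedStacks {

    // Folds raw samples (each a root-to-leaf list of frames) into "a;b;c" -> count.
    // The TreeMap keeps stacks sorted alphabetically, which is why the x-axis
    // of a flame graph carries no time information.
    static Map<String, Integer> fold(List<List<String>> samples) {
        Map<String, Integer> folded = new TreeMap<>();
        for (List<String> stack : samples) {
            folded.merge(String.join(";", stack), 1, Integer::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        // Hypothetical samples, listed in the order they were taken.
        List<List<String>> samples = List.of(
                List.of("a", "b", "d"),
                List.of("a", "b", "c"),
                List.of("a", "b", "c"),
                List.of("a", "b", "d"),
                List.of("a", "b", "c"),
                List.of("a", "b", "c"));
        // One line per unique stack; the count is the width of the leaf frame.
        fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
    }
}
```

With this input, c() collects 4 samples and d() collects 2, so c() would be drawn twice as wide as d(), regardless of the order in which the samples arrived.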

Slide 17

Slide 17 text

Mixed-Mode Flame Graphs
● Colors
  ○ green - Java
  ○ aqua - Java, inlined (!!!!)
  ○ orange - kernel code
  ○ red - C libraries (e.g. JNI)
  ○ yellow - C++ (e.g. JVM)
● Color intensity is randomized to differentiate frames
● Thread ID is added as a base frame

Slide 18

Slide 18 text

INLINING is... THE OPTIMIZATION.

Slide 19

Slide 19 text

Inlining is a way to optimize compiled source code at runtime by replacing invocations of the most frequently executed methods with their bodies. It is the responsibility of the Just-In-Time (JIT) compiler, which tries to inline the methods we call most often so that we can avoid the overhead of a method invocation.
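As an illustration of the definition above, the following sketch shows a small hot method next to a hand-written picture of what the code is roughly equivalent to once the call has been inlined. The class and method names are hypothetical, and the "inlined" variant is written by hand for illustration; it is not actual JIT output.

```java
public class InliningSketch {

    // Small, frequently called method: a typical candidate for JIT inlining.
    static int square(int x) {
        return x * x;
    }

    // As written in the source: one call to square() per iteration.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    // Hand-written equivalent of the loop after inlining: the call is
    // replaced by the body of square(), so no invocation overhead remains.
    static long sumOfSquaresInlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long result = 0;
        // Repeat the call so the method has a chance to become hot and get compiled.
        for (int round = 0; round < 20_000; round++) {
            result = sumOfSquares(1_000);
        }
        System.out.println(result == sumOfSquaresInlined(1_000));
    }
}
```

On HotSpot, running this with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` should print the compiler's inlining decisions, which is one way to check whether `square` was actually inlined on a given JVM.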

Slide 20

Slide 20 text

Native frames... on a JVM?!
● JNI code
  ○ I/O wrapper operations, user code calling native libs, etc.
● Intrinsics
  ○ System::arraycopy, Arrays::equals, ... (vmSymbols.hpp)
● SIMD opportunity
  ○ Arrays::fill
● JVM C++ code
  ○ JIT compiler, GC, etc.
● OS/Kernel C code
  ○ I/O OS/Kernel calls, page faults, interrupt handlers, etc.
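The sketch below shows plain Java code that can still produce non-Java frames in a mixed-mode profile, using the methods named on the slide. The class name and the `copyOf` helper are hypothetical, and whether a given call is really compiled as an intrinsic or vectorized depends on the JVM version, flags, and CPU.

```java
import java.util.Arrays;

public class NativeFramesSketch {

    static int[] copyOf(int[] src) {
        int[] dst = new int[src.length];
        // Declared as a native method in java.lang.System; HotSpot usually
        // substitutes an intrinsic instead of performing a JNI transition.
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] data = new int[1024];
        // A simple fill loop that the JIT may turn into vectorized (SIMD) code.
        Arrays.fill(data, 42);
        int[] copy = copyOf(data);
        // Arrays.equals is another method commonly handled as an intrinsic.
        System.out.println(Arrays.equals(data, copy));
    }
}
```

In a profile of code like this, time may show up under stub or runtime frames rather than under the Java method names, which is what the slide's "Native frames" heading is pointing at.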

Slide 21

Slide 21 text

Profiling Challenges

Slide 22

Slide 22 text

Safewhat?
safepoint: A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code can be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.
- HotSpot Glossary of Terms -

Slide 23

Slide 23 text

Safepoint operations, i.e. operations that require a safepoint
● GC
● Deoptimization
● PrintThreads
● PrintJNI
● FindDeadlock
● ThreadDump
● EnableBiasedLocking
● RevokeBias
● HeapDumper
● GetAllStackTraces
● GetStackTrace
● [-XX:GuaranteedSafepointInterval=1000]
● ...

Slide 24

Slide 24 text

Observer effects: -Xlog:safepoint makes it easier to spot how much safepoint-biased profilers can affect a profiled program
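To show what a safepoint-biased profiler looks like in its simplest form, here is a minimal sketch of a sampler built on `Thread.getStackTrace()`. The class, the `sample` and `burn` methods, and the workload are hypothetical. The sketch assumes a HotSpot-style JVM where a thread's stack can only be read once that thread has reached a safepoint poll (a global safepoint or a per-thread handshake, depending on the JDK version).

```java
import java.util.Map;
import java.util.TreeMap;

public class SafepointBiasedSampler {

    // Keeps the workload result observable so the JIT cannot drop the loop.
    static volatile double sink;

    // Samples the top frame of `target` every `intervalMs` until it terminates.
    static Map<String, Integer> sample(Thread target, long intervalMs) {
        Map<String, Integer> hits = new TreeMap<>();
        while (target.isAlive()) {
            // The target is only observed at a safepoint poll, so the sample is
            // attributed to that poll rather than to the instruction that was
            // actually executing when the request was made.
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(frame, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    // Hypothetical CPU-bound workload to be sampled.
    static double burn(int iterations) {
        double acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> sink = burn(300_000_000), "worker");
        worker.start();
        sample(worker, 10).forEach((frame, count) -> System.out.println(frame + " " + count));
    }
}
```

Running the sketch with `-Xlog:safepoint` should make any safepoints triggered by the sampler visible in the log, which gives a rough feel for the observer effect such a profiler adds; the exact log output varies between JDK versions.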

Slide 25

Slide 25 text

Source

Slide 26

Slide 26 text

async-profiler: the hybrid

Slide 27

Slide 27 text

Why choose it?
● AsyncGetCallTrace (no safepoint bias, but can collect “corrupted” Java frames)
  ○ Linux timer + signal handler (ITIMER_PROF/SIGPROF)
● Native frames using perf_events
● Out-of-the-box flame graph support
● Open source
● Very low observer effect
● Java 6+
● Can profile Java monitors/ReentrantLock, allocations*, cache misses...

Slide 28

Slide 28 text

Open source ROCKS!

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Hardware Event Skid
Event skid is the recording of an event not exactly on the code line that caused the event. It may even result in a caller function event being recorded in the callee function. Event skid is caused by a number of factors:
● The delay in propagating the event out of the processor's microcode through the interrupt controller (APIC) and back into the processor.
● The current instruction retirement cycle must be completed.
● When the interrupt is received, the processor must serialize its instruction stream, which causes a flushing of the execution pipeline.

Slide 32

Slide 32 text

Sampling vs. Wall-clock profiling
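The difference between the two views can be sketched by measuring the same blocking task with both clocks: CPU time only advances while the thread is running on a core, while wall-clock time also covers the time it spends waiting. The class and method names below are hypothetical, and the sketch assumes the JVM supports per-thread CPU time through `ThreadMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuVsWallClock {

    // Returns {wallNanos, cpuNanos} consumed by the current thread while running the task.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        long wallNanos = System.nanoTime() - wallStart;
        return new long[] {wallNanos, cpuNanos};
    }

    // Hypothetical blocking task: the thread is off-CPU while it sleeps.
    static void blockFor(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(() -> blockFor(200));
        System.out.printf("wall=%d ms, cpu=%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A CPU profile of this task would show almost nothing, because the thread is rarely on-CPU, whereas a wall-clock profile would attribute most of the elapsed time to the sleep. That is the kind of gap to keep in mind when profiling code that blocks on I/O or locks.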

Slide 33

Slide 33 text

No content

Slide 34

Slide 34 text

No content

Slide 35

Slide 35 text

Quarkus threading model

Slide 36

Slide 36 text

Let’s get our hands dirty!!!

Slide 37

Slide 37 text

Demo project + tools: https://github.com/franz1981/quarkus-profiling-workshop