Voyeurs in the JVM land

Slide 1

Slide 1 text

Slide 2

Slide 2 text

about me
Jarek Pałka
Allegro.tech, doing stuff, back to coding, hell yeah!!!
JDD, 4Developers and one more conference (still under development) where I serve as a dictator for life
JVM, bytecode, parsers, graphs and other cool things (like ponies)
owner at Symentis trainings, former chief architect, development manager, head of development, backend developer and performance guy

Slide 3

Slide 3 text

You are all invited!

Slide 4

Slide 4 text

agenda
JDK with batteries included
JVM logging and tracking
Linux tools for curious
other tools for weirdos

Slide 5

Slide 5 text

JDK with batteries included
jps
jmap
jstack
jstat
jcmd

Slide 6

Slide 6 text

how it works
JVM stores metrics in memory-mapped files
/tmp/hsperfdata_[username]/[pid]
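The directory layout above can be explored from Java as well. This is a minimal sketch, assuming the default Linux location `/tmp/hsperfdata_[username]`; the class name `HsPerfDataLister` and its methods are made up for illustration. It only lists file names (which are pids) and does not parse the counters themselves.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class HsPerfDataLister {

    // Assumption: the default Linux location, /tmp/hsperfdata_<user>.
    static Path perfDataDir() {
        return Paths.get("/tmp", "hsperfdata_" + System.getProperty("user.name"));
    }

    // Each file in the directory is named after the pid of a JVM
    // that exports its performance counters.
    static List<String> listJvmPids() {
        List<String> pids = new ArrayList<>();
        Path dir = perfDataDir();
        if (!Files.isDirectory(dir)) {
            return pids; // perf data disabled, or a different platform layout
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                pids.add(file.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return pids;
    }

    public static void main(String[] args) {
        listJvmPids().forEach(System.out::println);
    }
}
```

Tools such as jps and jstat read these files, which is why they can see JVMs without attaching to them.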

Slide 7

Slide 7 text

test
lsof +d /tmp/hsperfdata_jarek

Slide 8

Slide 8 text

jps lists all running JVM processes

Slide 9

Slide 9 text

jstack
dumps stacks of all JVM threads (in a selected process)
jstack -l [pid] # to include locks info
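When attaching jstack from outside is not an option, a similar dump can be taken from inside the process. The sketch below uses the standard `java.lang.management` API; the class name `InProcessThreadDump` is made up for illustration. Note that `ThreadInfo.toString()` prints only a limited number of stack frames per thread, so this is a rough stand-in for jstack, not a replacement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the JDK.
class InProcessThreadDump {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only if this JVM supports them,
        // roughly what jstack -l adds to a plain dump.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            if (info != null) {
                out.append(info); // name, state, locks and the top stack frames
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```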

Slide 10

Slide 10 text

jmap
prints heap information or a histogram, or dumps heap content to a file

Slide 11

Slide 11 text

jmap -heap [pid] # to print heap usage
jmap -histo [pid] # to print histogram
jmap -dump:file=jvm.dump [pid] # to dump heap

Slide 12

Slide 12 text

jstat
samples running JVM for selected metrics
jstat -gc [pid] 1000 # sample every 1000 ms
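Some of the numbers jstat samples are also exposed inside the process through the standard management beans. A minimal sketch, assuming you only want collection counts and accumulated pause times per collector; the class name `GcSampler` and the one-second interval are illustrative choices.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the JDK.
class GcSampler {

    // One line per collector: how many collections ran and how long they took in total.
    static String sample() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(gc.getName())
               .append(": count=").append(gc.getCollectionCount())
               .append(" timeMs=").append(gc.getCollectionTime())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print three samples, one second apart, similar in spirit to jstat -gc [pid] 1000
        for (int i = 0; i < 3; i++) {
            System.out.print(sample());
            Thread.sleep(1000);
        }
    }
}
```

This gives far less detail than jstat -gc (no per-generation capacities), but it needs no external tool.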

Slide 13

Slide 13 text

jcmd

Slide 14

Slide 14 text

one tool to rule them all, one stop shop for all commands available in JVM

Slide 15

Slide 15 text

let’s play with it
jcmd [pid] help

Slide 16

Slide 16 text

JVM logging and tracking

Slide 17

Slide 17 text

JVM has tons of diagnostic options

Slide 18

Slide 18 text

garbage collection

Slide 19

Slide 19 text

jstat -gc [pid] [interval] or

Slide 20

Slide 20 text

-Xloggc:gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintClassHistogramAfterFullGC
-XX:+PrintClassHistogramBeforeFullGC
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=10M

Slide 21

Slide 21 text

safepoint

Slide 22

Slide 22 text

what?
Nitsan Wakart: Imagine if you will a JVM full of mutator threads, all busy, sweating, mutating the heap. Some of them have shared mutable state. They’re mutating each others state, concurrently, like animals. Some stand in corners mutating their own state (go blind they will). Suddenly a neon sign flashes the word PINEAPPLES. One by one the mutators stop their rampant heap romping and wait, sweat dripping. When the last mutator stops, a bunch of elves come in, empty the ashtrays, fill up all the drinks, mop up the puddles, and quickly as they can they vanish back to the north pole. The sign is turned off and the threads go back to it

Slide 23

Slide 23 text

Nitsan Wakart: At a safepoint the mutator thread is at a known and well defined point in its interaction with the heap. This means that all the references on the stack are mapped (at known locations) and the JVM can account for all of them. As long as the thread remains at a safepoint we can safely manipulate the heap + stack such that the thread’s view of the world remains consistent when it leaves the safepoint.

Slide 24

Slide 24 text

-XX:+PrintSafepointStatistics
-XX:PrintSafepointStatisticsCount=1
Debugging JVM safepoint pauses

Slide 25

Slide 25 text

just in time compilation

Slide 26

Slide 26 text

-XX:+UnlockDiagnosticVMOptions
-XX:+PrintCompilation
-XX:+PrintInlining
-XX:+UnlockDiagnosticVMOptions
-XX:+TraceClassLoading
-XX:+LogCompilation
-XX:LogFile=mylogfile.log
-XX:+PrintAssembly

Slide 27

Slide 27 text

TLAB

Slide 28

Slide 28 text

what?
Ross K: A Thread Local Allocation Buffer (TLAB) is a region of Eden that is used for allocation by a single thread. It enables a thread to do object allocation using thread local top and limit pointers, which is faster than doing an atomic operation on a top pointer that is shared across threads. A thread acquires a TLAB at its first object allocation after a GC scavenge. The size of the TLAB is computed via a somewhat complex process described below. The TLAB is released when it is full (or nearly so), or the next GC scavenge occurs. TLABs are allocated only in Eden, never from From-Space or the OldGen.

Slide 29

Slide 29 text

should I care?
you want as many allocations as possible to happen in TLABs, period

Slide 30

Slide 30 text

-XX:+PrintTLAB
The Real Thing

Slide 31

Slide 31 text

native memory tracking

Slide 32

Slide 32 text

Stack Overflow: Java process taking more memory than its max heap size

Slide 33

Slide 33 text

java -XX:NativeMemoryTracking=[off|summary|detail]
jcmd [pid] VM.native_memory summary

Slide 34

Slide 34 text

a weapon of mass destruction

Slide 35

Slide 35 text

or pair made in heaven

Slide 36

Slide 36 text

FlightRecorder
Oracle Help Center: Java Flight Recorder (JFR) is a tool for collecting diagnostic and profiling data about a running Java application. It is integrated into the Java Virtual Machine (JVM) and causes almost no performance overhead, so it can be used even in heavily loaded production environments.

Slide 37

Slide 37 text

java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=duration=60s,filename=myrecording.jfr

Slide 38

Slide 38 text

warning as of now, you can’t use it to analyze production systems

Slide 39

Slide 39 text

until JDK 10 comes out, this is the official statement for now

Slide 40

Slide 40 text

java -XX:+UnlockCommercialFeatures -XX:+FlightRecorder
jcmd [pid] JFR.start name=recording
jcmd [pid] JFR.start name=recording filename=recording.jfr

Slide 41

Slide 41 text

Java Mission Control

Slide 42

Slide 42 text

Linux tools for curious
sysstat
sysdig
perf

Slide 43

Slide 43 text

sysstat
pidstat -t -d -p [pid] 1 # IO usage per thread
pidstat -t -w -p [pid] 1 # task switching per thread
pidstat -r -p [pid] 1 # page faults per process

Slide 44

Slide 44 text

warning forget about strace, ptrace syscall is not what you want :)

Slide 45

Slide 45 text

tracing syscalls

Slide 46

Slide 46 text

sysdig
sysdig proc.pid=[pid] -w [pid].scap # record events
csysdig -r [pid].scap # analyze

Slide 47

Slide 47 text

perf
perf record -p [pid] -o [pid].perf # record events
perf report -i [pid].perf # analyze

Slide 48

Slide 48 text

tools for weirdos
honest profiler
flamegraphs

Slide 49

Slide 49 text

honest profiler
it uses the unofficial JVM API call AsyncGetCallTrace, as opposed to other profilers which use JVMTI (JVM tool interface)

Slide 50

Slide 50 text

here goes long boring discussion about complexity of OpenJDK global safepoint mechanism

Slide 51

Slide 51 text

Honest profiler wiki: It accurately profiles applications, avoiding an inherent bias towards places that have safepoints. It profiles applications with significantly lower overhead than traditional profiling techniques, making it suitable for use in production.

Slide 52

Slide 52 text

The Pros and Cons of AGCT

Slide 53

Slide 53 text

java -agentpath:../honest-profiler/liblagent.so=logPath=honest.logs Main

Slide 54

Slide 54 text

tools I didn’t mention
GCviewer
JITWatch
PrintAssembly
Solaris Studio
Censum
Memory Analyzer Tool

Slide 55

Slide 55 text

Q&A

Slide 56

Slide 56 text

links
JVM Anatomy Park
Nitsan’s blog
Chris Newland blog, JITWatch author
Marcus Hirt blog, all stuff JMC
System calls in the Linux kernel
sysdig
perf: Linux profiling with performance counters

Slide 57

Slide 57 text

Java Microbenchmark Harness

Slide 58

Slide 58 text

— Wes Dyer Make it correct, make it clear, make it concise, make it fast. In that order.

Slide 59

Slide 59 text

— JMH wiki JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.

Slide 60

Slide 60 text

mvn archetype:generate \
    -DinteractiveMode=false \
    -DarchetypeGroupId=org.openjdk.jmh \
    -DarchetypeArtifactId=jmh-java-benchmark-archetype \
    -DgroupId=org.sample \
    -DartifactId=test \
    -Dversion=1.0
http://openjdk.java.net/projects/code-tools/jmh/

Slide 61

Slide 61 text

benchmarks
these are public non-static methods annotated with @Benchmark

import org.openjdk.jmh.annotations.Benchmark;

public class CodeBenchmark {

    @Benchmark
    public void testMethod() {
    }
}

Slide 62

Slide 62 text

managing state & life cycle
more complex examples will need to work with some data (state); this is what state objects are for

Slide 63

Slide 63 text

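import java.util.ArrayList;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class CodeBenchmarkState {
    // element type restored from the add(0) call on the next slide
    public final ArrayList<Integer> list = new ArrayList<>();
}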

Slide 64

Slide 64 text

public class CodeBenchmark {

    @Benchmark
    public void testMethod(CodeBenchmarkState state) {
        // the state object is injected by JMH; the list lives inside it
        state.list.add(0);
    }
}

Slide 65

Slide 65 text

note on scopes

Slide 66

Slide 66 text

Scope.Benchmark With benchmark scope, all instances of the same type will be shared across all worker threads

Slide 67

Slide 67 text

Scope.Group With group scope, all instances of the same type will be shared across all threads within the same group. Each thread group will be supplied with its own state object

Slide 68

Slide 68 text

Scope.Thread With thread scope, all instances of the same type are distinct, even if multiple state objects are injected in the same benchmark

Slide 69

Slide 69 text

lifecycle
every state object can have @Setup and @TearDown fixture methods

Slide 70

Slide 70 text

time for first benchmark
let’s compare iteration speed over primitive array, ArrayList and LinkedList
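The three candidates for this exercise can be sketched in plain Java first. This is only the code under test, assuming a summing loop as the "iteration" being measured; the class name `IterationCandidates` and the size of 10,000 elements are made up for illustration. In a JMH project the collections would live in a state object and each sum method would be called from a method annotated with @Benchmark, returning the sum so it is not optimized away.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical code under test; JMH annotations are deliberately left out.
class IterationCandidates {

    static final int SIZE = 10_000; // assumed data size

    static int[] newArray() {
        int[] values = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            values[i] = i;
        }
        return values;
    }

    // Fills any List implementation with the same values as the array.
    static List<Integer> fill(List<Integer> target) {
        for (int i = 0; i < SIZE; i++) {
            target.add(i);
        }
        return target;
    }

    // Iteration over a primitive array: no boxing, contiguous memory.
    static long sumArray(int[] values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Iteration over a List: goes through an Iterator and unboxes each element.
    static long sumList(List<Integer> values) {
        long sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumArray(newArray()));
        System.out.println(sumList(fill(new ArrayList<>())));
        System.out.println(sumList(fill(new LinkedList<>())));
    }
}
```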

Slide 71

Slide 71 text

running benchmarks
mvn package
java -jar target/benchmarks.jar

Slide 72

Slide 72 text

forks, warm ups and iterations
by default JMH forks a JVM for each run of a benchmark; within each fork you have two phases:
warm up
iteration
the number of repetitions of each phase can be controlled from the command line

Slide 73

Slide 73 text

command line
-f - number of forks
-wi - number of warm ups
-i - number of iterations

Slide 74

Slide 74 text

java -jar target/benchmarks.jar -f 1 -i 5 -wi 5

Slide 75

Slide 75 text

parameterized tests
JMH supports parameterized tests through the @Param annotation
test parameters should be public non-final fields on state objects
they are injected right before the call to setup fixture methods

Slide 76

Slide 76 text

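@State(Scope.Benchmark)
public class CodeBenchmark {

    // each value is tried in turn; JMH converts the strings to float
    @Param({"0.1", "0.2", "0.5", "0.75", "1.0"})
    public float loadFactor;

    private Map<Integer, Integer> map; // key and value types are assumed

    @Setup
    public void setUp() {
        map = new HashMap<>(16, loadFactor);
    }
}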

Slide 77

Slide 77 text

controlling parameters
you can override the values of the parameters with command line options
java -jar target/benchmarks.jar -p loadFactor=0.8,0.9

Slide 78

Slide 78 text

dead code

Slide 79

Slide 79 text

… and black holes

Slide 80

Slide 80 text

one of the dangers JMH tries to mitigate is dead code elimination by the JIT; to avoid it, consume return values from functions with black holes

@Benchmark
public void testMethod(Blackhole blackhole) {
    blackhole.consume(codeBenchmark());
}

Slide 81

Slide 81 text

asymmetric tests

Slide 82

Slide 82 text

sometimes you want to benchmark your concurrent code, like the performance of read and write paths; this is where @Group and @GroupThreads come in

Slide 83

Slide 83 text

@State(Scope.Benchmark)
public class CodeBenchmark {

    @Benchmark
    @Group("benchmarkGroup")
    @GroupThreads(1)
    public void testWrites() {
    }

    // same group name as testWrites, so both run concurrently
    @Benchmark
    @Group("benchmarkGroup")
    @GroupThreads(1)
    public void testReads(Blackhole blackhole) {
    }
}

Slide 84

Slide 84 text

time for third benchmark
compare performance of various thread-safe counter implementations

public class Counter {

    private long counter;

    public void inc() {
        ++counter;
    }

    public long counter() {
        return counter;
    }
}
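The Counter above is not thread-safe as written. Three possible thread-safe variants to compare are sketched below, using only the standard library; the interface `SafeCounter`, the class names and the `hammer` helper are made up for illustration, and which variant wins depends heavily on contention. The helper only checks correctness under concurrent increments; measuring speed is the job of the JMH benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical common interface for the candidates.
interface SafeCounter {
    void inc();
    long counter();
}

// Variant 1: intrinsic lock around the plain long.
class SynchronizedCounter implements SafeCounter {
    private long counter;
    public synchronized void inc() { ++counter; }
    public synchronized long counter() { return counter; }
}

// Variant 2: a single atomically updated cell.
class AtomicCounter implements SafeCounter {
    private final AtomicLong counter = new AtomicLong();
    public void inc() { counter.incrementAndGet(); }
    public long counter() { return counter.get(); }
}

// Variant 3: striped cells, summed on read.
class AdderCounter implements SafeCounter {
    private final LongAdder counter = new LongAdder();
    public void inc() { counter.increment(); }
    public long counter() { return counter.sum(); }
}

class CounterCheck {

    // Increments the counter from several threads and returns the final value.
    static long hammer(SafeCounter counter, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.inc();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join makes the workers' writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.counter();
    }

    public static void main(String[] args) {
        System.out.println(hammer(new SynchronizedCounter(), 4, 10_000));
        System.out.println(hammer(new AtomicCounter(), 4, 10_000));
        System.out.println(hammer(new AdderCounter(), 4, 10_000));
    }
}
```

In the benchmark itself, the @Group and @GroupThreads annotations shown earlier would let you run inc() and counter() concurrently against one shared instance.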

Slide 85

Slide 85 text

profilers
they can provide some insights into your code
java -jar benchmarks.jar -lprof
java -jar benchmarks.jar -prof hs_gc

Slide 86

Slide 86 text

reporters
and last but not least, writing test results to files
java -jar benchmarks.jar -lr
java -jar benchmarks.jar -rf csv -rff results.csv

Slide 87

Slide 87 text

tips and tricks
on laptops, CPU frequency governors can trick you; on Linux it’s easy to control them with cpufreq-set