Slide 1

Slide 1 text

The (not so) Dark Art of Performance Tuning
From Newts to Newton
Kirk Pepperdine @kcpeppe
Aleksey Shipilev @shipilev
Copyright 2014 Kodewerk Ltd. All rights reserved

Slide 2

Slide 2 text

Kodewerk Java Performance Services tm
About Me
• Consultant (www.kodewerk.com)
• performance tuning and training seminar
• Co-author www.javaperformancetuning.com
• Member of Java Champion program
• Other stuff... (google if you care to)

Slide 3

Slide 3 text

Disclaimer
The resemblance of any opinion, recommendation or comment made during this presentation to performance tuning advice is merely coincidental.

Slide 4

Slide 4 text

Our Typical Customer
• Application isn’t performing to project sponsor’s expectations
• Development team has been tuning for weeks
  - some improvements but...
• Different team experts come to the table with different opinions
  - finger pointing

Slide 5

Slide 5 text

Where is the problem?

Slide 6

Slide 6 text

Commonly Heard
• I see StringBuffer is being used all over, let’s change it to StringBuilder
• I think our DBMS is the problem, we need to migrate to [buzzword]
• Not sure where the problem is but we’ve been changing code to make things better

Slide 7

Slide 7 text

Better Action
• Understand your performance requirements
  - typically response time or throughput
• Your goal is to improve this metric

Slide 8

Slide 8 text

What if the code you’ve changed
• is not used at all?
• accounts for just a few microseconds of time?
Maybe a good idea if changes are small, isolated and painless to make

Slide 9

Slide 9 text

“I can see that method bar() accounts for 5% of time, let’s remove it”
What if
• CPU utilization is 6.25%?
• the method precomputes something reused later?

Slide 10

Slide 10 text

“I think our database is the problem! Let’s migrate to [buzzword]”
What if
• you just depleted disk bandwidth?
• IT had reshaped the network connection?
• the database just needs a cleanup?

Slide 11

Slide 11 text

Perf tuning 101
• The space can be humongous
  - you can’t traverse it all
  - assume something is or isn’t part of the problem
• Prefer hypothesis-free investigation
  - be methodical
  - stepwise process to arrive at a conclusion

Slide 12

Slide 12 text

Measure Don’t Guess®

Slide 13

Slide 13 text

Typical Dev Teams
• Development teams are often very skilled but don’t understand performance testing
• Performance data is often
  - taken out of context
  - misunderstood
  - completely missed

Slide 14

Slide 14 text

As developers we tend to be too focused on the code!!!!

Slide 15

Slide 15 text

public class Software {
    public static void main(String[] args) {
        System.out.println("Software is abstract");
    }
}

Slide 16

Slide 16 text

Hardware is Real!

Slide 17

Slide 17 text

Physical Limits
• CPU
  - capacity
  - throughput: clock speed
  - granularity: cache line
• Memory
  - capacity: volume
• Bus
  - throughput: clock speed
  - width: 32 bits

Slide 18

Slide 18 text

Limits of Hardware
• Disk
  - ~1 Gbit/sec (SATA ~3 Gbit/sec)
  - granularity: disk sector (512 bytes)
• Network
  - frame buffer/packet
  - frequency (clock)

Slide 19

Slide 19 text

Other Limits
• other hardware devices
  - video/sound
• heat
• battery
• time

Slide 20

Slide 20 text

public class Software {
    public static void main(String[] args) {
        System.out.println("Software is abstract");
    }
}

Slide 21

Slide 21 text

But we already have measurements!

Slide 22

Slide 22 text

Yeah but….
do you have a context to understand the measurements?

Slide 23

Slide 23 text

Are they the right measurements?

Slide 24

Slide 24 text

Are you blind to the real problem?

Slide 25

Slide 25 text

Application Developers Live Here
business logic, non-shareable soft resources

Slide 26

Slide 26 text

Question? Which is faster?
a) Bubble sort
b) Quick sort
In Big O notation...
- Bubble sort is N^2
- Quick sort is N log(N)

Slide 27

Slide 27 text


Slide 28

Slide 28 text

However!
[chart comparing bubble sort and quick sort]
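One way to see why the Big O answer alone can mislead is to count comparisons on a concrete input. The sketch below is an illustration only, not the experiment behind the slide: it assumes an early-exit bubble sort, a quick sort that always picks the first element as pivot, and input that is already sorted. The class and method names are made up for this example.

```java
public class SortComparisons {

    /** Bubble sort with early exit; returns the number of comparisons made. */
    static long bubbleSort(int[] a) {
        long comparisons = 0;
        boolean swapped = true;
        for (int end = a.length - 1; swapped && end > 0; end--) {
            swapped = false;
            for (int j = 0; j < end; j++) {
                comparisons++;
                if (a[j] > a[j + 1]) {
                    int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                    swapped = true;
                }
            }
        }
        return comparisons;
    }

    /** Quick sort with the first element as pivot (an assumption); returns comparisons made. */
    static long quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return 0;
        int pivot = a[lo];
        int i = lo;                       // a[lo+1..i] holds elements smaller than the pivot
        long comparisons = 0;
        for (int j = lo + 1; j <= hi; j++) {
            comparisons++;
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;   // move pivot to its final position
        return comparisons + quickSort(a, lo, i - 1) + quickSort(a, i + 1, hi);
    }

    public static void main(String[] args) {
        int n = 100;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = i;
        System.out.println("bubble: " + bubbleSort(sorted.clone()));
        System.out.println("quick:  " + quickSort(sorted.clone(), 0, n - 1));
    }
}
```

On 100 already-sorted elements this bubble sort makes 99 comparisons and this quick sort makes 4950. The input, not only the algorithm, decides the outcome.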

Slide 29

Slide 29 text

Application Performance is rooted in dynamics
[diagram labels: dynamics, algorithmic strength]

Slide 30

Slide 30 text

Application Runs Here
[diagram: Application (dynamics, algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)]

Slide 31

Slide 31 text

Process Diagnostic Model
[diagram: Actors (usage patterns), Application (algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)]

Slide 32

Slide 32 text

Mixing in the dynamics
[diagram: Actors (usage patterns), Application (algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)]
• Actors drive application
• Application drives JVM
• JVM drives OS/Hardware
• Hardware consumed
  - function of actors
  - function of application
  - cannot exceed capacity

Slide 33

Slide 33 text

Hardware Patterns
• Hardware counters show how the hardware is being consumed
  - subscription rates to non-shareable components
  - tells us what our application is doing
• Hardware consumption is a function of load

Slide 34

Slide 34 text

Dominating Consumer
[diagram: Actors (usage patterns), Application (algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)]
• Activity that dominates how the CPU is utilized
• Determine by analyzing
  - breakout of CPU counters
  - garbage collection logs

Slide 35

Slide 35 text

Dominating Consumer Choices
[diagram: Actors (usage patterns), Application (algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)]
• Application
• JVM
• Liveliness
• System

Slide 36

Slide 36 text

Dominating Consumer Conditions
1. sys cpu > 10% of user cpu?
   - yes: System (system profiling: netstat, mpstat, iostat, sar, strace, gc logs, etc…)
   - no: go to 2
2. user CPU ~= 100%?
   - no: Liveliness (thread starvation; thread dump)
   - yes: go to 3
3. memory efficient? (GC logs)
   - yes: Application (app/CPU profiling)
   - no: JVM (GC tuning, pool sizes, collectors, ...; memory profiling, size, frequency, life span, ...)
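The decision flow on this slide can be sketched as a small function. This is a reading of the flattened flowchart, not code from the talk. The 90% cut-off for "user CPU ~= 100%" is an assumption, and `memoryEfficient` is a hypothetical flag the caller would derive from GC logs.

```java
public class DominatingConsumer {

    enum Kind { SYSTEM, LIVELINESS, APPLICATION, JVM }

    // Assumption: "user CPU ~= 100%" is taken to mean at least 90%.
    static final double NEAR_FULL_USER_CPU_PCT = 90.0;

    /**
     * @param userCpuPct      user CPU utilization in percent
     * @param sysCpuPct       system (kernel) CPU utilization in percent
     * @param memoryEfficient hypothetical flag derived from GC logs
     */
    static Kind classify(double userCpuPct, double sysCpuPct, boolean memoryEfficient) {
        if (sysCpuPct > 0.10 * userCpuPct) {
            return Kind.SYSTEM;          // kernel time is out of proportion to user time
        }
        if (userCpuPct < NEAR_FULL_USER_CPU_PCT) {
            return Kind.LIVELINESS;      // CPU is available but threads are not using it
        }
        return memoryEfficient ? Kind.APPLICATION : Kind.JVM;
    }

    public static void main(String[] args) {
        System.out.println(classify(15, 40, true));
        System.out.println(classify(50, 2, true));
        System.out.println(classify(97, 3, true));
        System.out.println(classify(97, 3, false));
    }
}
```

The result only says where to point the next tool (system profiling, thread dumps, a CPU profiler, or GC analysis). It is not a diagnosis by itself.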

Slide 37

Slide 37 text

Expression of Consumption
[diagram labels: Application, JVM, Liveliness, System; passively dominant, aggressively dominant]

Slide 38

Slide 38 text

Measuring Consumption
[diagram labels: Application, JVM, Liveliness, System; Kernel time, Idle, User time]

Slide 39

Slide 39 text

 r  b swpd  free  buff    cache si so bi  bo     in     cs us sy id wa
 3  9  100 24496 11096 13267036  0  0  0   5      0      1  2  1 96  1
 3  2  100 23420 11088 13268328  0  0  0   0  77330 175352 17 26 39 17
 3  9  100 20836 11088 13270628  0  0  0  68 105118 227382 14 40 21 25
 8  4  100 23356 11080 13268272  0  0  0   0  80062 164387 12 30 29 30
 7  7  100 23180 11084 13267068  0  0  0  72  98353 234851 15 43 28 15
11  2  100 25820 11088 13263676  0  0  0 120 100749 214921 11 42 17 30
13  1  100 22316 11088 13267176  0  0  0   0 103878 246723 16 56 19  9
 4  3  100 21824 11088 13269140  0  0  0   0  48625  97288 15 16  9 60
11  2  100 20932 11080 13269808  0  0  0   0 110760 236774 14 41 24 20
 1 12  100 23624 11084 13267488  0  0  0 204  69117 148611 15 27 25 33
 7  5  100 24996 11096 13267476  0  0  0 164  24495  48552 13 10 30 48
 1 12  100 20792 11096 13271872  0  0  0   0  25659  54331  8  9 26 56
 6  8  100 21984 11080 13269920  0  0  0  20  46309 101404 16 18 51 15
 4  9  100 22764 11080 13268956  0  0 16   0  88553 229557 17 35 38 11

Dominating Consumer???
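A rough way to answer the question is to compare the `sy` and `us` columns against the "sys cpu > 10% of user cpu" condition from the earlier slide. The sketch below assumes the 16-column vmstat layout shown here, with `us` and `sy` as the 13th and 14th fields. The class and method names are made up, and only three of the sample rows are included.

```java
public class VmstatRatio {

    /** Returns {sum of us, sum of sy} over vmstat data rows (header excluded). */
    static long[] sumUserAndSys(String[] rows) {
        long us = 0;
        long sy = 0;
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            us += Long.parseLong(fields[12]);   // us column
            sy += Long.parseLong(fields[13]);   // sy column
        }
        return new long[] { us, sy };
    }

    public static void main(String[] args) {
        // Rows 2 to 4 of the slide; the first vmstat row reports averages since boot.
        String[] rows = {
            "3 2 100 23420 11088 13268328 0 0 0 0 77330 175352 17 26 39 17",
            "3 9 100 20836 11088 13270628 0 0 0 68 105118 227382 14 40 21 25",
            "8 4 100 23356 11080 13268272 0 0 0 0 80062 164387 12 30 29 30"
        };
        long[] totals = sumUserAndSys(rows);
        boolean sysAboveTenPercentOfUser = totals[1] > 0.10 * totals[0];
        System.out.println("us=" + totals[0] + " sy=" + totals[1]
                + " sys > 10% of user: " + sysAboveTenPercentOfUser);
    }
}
```

For these three rows `sy` is far above 10% of `us`, which under that condition points at the system rather than the application.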

Slide 40

Slide 40 text

Performance benchmarking

Slide 41

Slide 41 text


Slide 42

Slide 42 text

Experimental Setup
• Relevant: reproduces the phenomena
• Isolated: leaves out unwanted effects
• Measurable: provides the needed metrics
• Reliable: produces consistent results
You can’t go anywhere without a proper test environment

Slide 43

Slide 43 text

Experimental Setup
Define:
• Unit of Test
  - the thing you are testing
• Test Harness
  - everything else!!!!

Slide 44

Slide 44 text

Relevant
• Hardware
  - production like configuration
  - hidden bottlenecks
  - phantom bottlenecks
• Software
  - test harness
  - load injector

Slide 45

Slide 45 text

Relevant
• Software
  - test harness
  - load injector
• Data-Lite performance testing anti-pattern
  - production like in volume and veracity
• beware of the effects of caching
  - it is all over the place and it works!!!!!

Slide 46

Slide 46 text

Isolated
• Quiet
• All activity counts against your hardware budget
  - affects performance counters (global)
  - introduces noise which will corrupt the diagnosis
  - phantom bottlenecks

Slide 47

Slide 47 text

Measurable and Reliable
• Usage Patterns
  - defines workload
  - use case + tx rate + velocity
• Performance requirements
• Validation
  - test the test
  - make sure bottleneck isn’t in the test harness

Slide 48

Slide 48 text

Setup
• Script usage patterns into a load test
• Install/configure app to the same as prod
• Set up monitoring
  - OS performance counters and GC logging
• Kill everything else running on your system
• Spike test to ensure correctness
  - validate to ensure test tests what needs to be tested

Slide 49

Slide 49 text

Process

Benchmark benchmark = new Benchmark();
benchmark.configure();
performance = benchmark.baseline(application);
user.setHappy(performance.meets(requirements));
while (!user.isHappy() && user.hasMoney()) {
    // pick the profiler that matches the dominating consumer
    Profiler profiler = performance.identifyDominatingConsumer();
    profilingResults = benchmark.profile(profiler);
    application.fixUsing(profilingResults);
    while (application.failsQA())
        application.debug();
    // re-baseline after every fix
    performance = benchmark.baseline(application);
    user.setHappy(performance.meets(requirements));
}

Slide 50

Slide 50 text

Time for a demo

Slide 51

Slide 51 text

Metrics
• Throughput (Bandwidth)
  - rate at which operations are retired
  - ops/sec, MB/sec, frags/sec
  - easy to measure and interpret
• Time (latency)
  - how much time one operation took
  - generally hard to measure reliably

Slide 52

Slide 52 text

What is in a response time
• execution
• dead time
response time = ∑ execution + ∑ dead time

Slide 53

Slide 53 text

Metrics
• Many things affect dead time and consequently execution time
• these can change from request to request

Slide 54

Slide 54 text

Bandwidth vs Latency

Slide 55

Slide 55 text

Reliability

Slide 56

Slide 56 text

Ensure Test is Reliable
• Any errors in the test environment will affect results
  - throwing exceptions
  - running out of memory
  - other failures that short-circuit
• Coordinated omission
  - will back pressure from the server interfere with the test harness’s ability to apply load?
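Coordinated omission is easier to see with numbers. The sketch below simulates a single-threaded load injector that intends to send one request every `intervalMs` but cannot send while a response is outstanding. It measures each latency from the intended send time rather than the actual one. The class name, method name and sample values are assumptions made for illustration.

```java
import java.util.Arrays;

public class CoordinatedOmission {

    /**
     * Simulates an injector that wants to send request i at time i * intervalMs
     * but has to wait for the previous response. Returns latency per request,
     * measured from the intended send time.
     */
    static long[] latenciesFromIntendedStart(long[] serviceTimesMs, long intervalMs) {
        long[] latencies = new long[serviceTimesMs.length];
        long freeAt = 0;   // simulated time at which the injector can send again
        for (int i = 0; i < serviceTimesMs.length; i++) {
            long intendedStart = i * intervalMs;
            long actualStart = Math.max(freeAt, intendedStart);
            long end = actualStart + serviceTimesMs[i];
            latencies[i] = end - intendedStart;   // includes time spent waiting to send
            freeAt = end;
        }
        return latencies;
    }

    public static void main(String[] args) {
        long[] serviceTimes = { 10, 1000, 10, 10 };   // one stalled request
        System.out.println("service times: " + Arrays.toString(serviceTimes));
        System.out.println("from intended: "
                + Arrays.toString(latenciesFromIntendedStart(serviceTimes, 100)));
    }
}
```

With a 100 ms interval, the two requests behind the stall show 910 ms and 820 ms when measured from their intended send times. A harness that records only service time would report 10 ms for both.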

Slide 57

Slide 57 text

Generational Counts Quiz
how to identify a memory leak

Slide 58

Slide 58 text

Generational Counts Answer

Slide 59

Slide 59 text

Speeding up the Application

Slide 60

Slide 60 text

How to Speed up the application
Change something in some specific way!
• response to some localized condition

Slide 61

Slide 61 text

How to Speed up the application
Change something in some specific way!
• What prevents the application from performing?
• Where does it reside?
• How can we stop it from messing with performance?

Slide 62

Slide 62 text

Top-Down
• Identify transaction not meeting functional requirement
  - pinpoint area of concern
• Identify dominating consumer
  - determine the nature of the problem
• System -> Application -> Microarchitecture

Slide 63

Slide 63 text

System Dominant/Liveliness
• Entry point is CPU utilization
• use tools to expose performance counters

Slide 64

Slide 64 text

System Dominant
Consider network, scheduling, swapping or other kernel activities

Slide 65

Slide 65 text

Other contributors
• In some cases hardware/OS configuration is needed
• May need to change the application

Slide 66

Slide 66 text

Scheduling
• Unbalanced threading
  - lots of voluntary context switches (thrashing)
  - lots of involuntary context switches (saturation)

Slide 67

Slide 67 text

Swapping
• Swapping is a performance killer for Java
  - avoid swapping at all costs
  - kill other processes to save memory

Slide 68

Slide 68 text

Kernel
• Sometimes the kernel is your enemy
  - unusual API choices from the JVM and/or application
  - (un)known bugs

Slide 69

Slide 69 text

IRQ%, SOFT%
• Usual things when interacting with devices
• IRQ balancing
  - sometimes this is expensive

Slide 70

Slide 70 text

IOWait%
Watch for disk activity, throughput and IOPS

Slide 71

Slide 71 text

IOWait%
• Is all that disk activity necessary?
  - caching, buffering are your friends
  - faster disk can solve throughput/IOPS problems

Slide 72

Slide 72 text

IOWait%
• More caching helps?
  - free up memory for caches
  - trade performance for consistency

Slide 73

Slide 73 text

Idle%
• CPUs are free and no one is using them
• easy to diagnose but commonly missed

Slide 74

Slide 74 text

Idle%
• Running low-threaded application on multicore
  - improve parallelization of algorithms
  - throw away CPUs

Slide 75

Slide 75 text

Idle%
• There are not enough threads ready to run
  - Locking?
  - Waiting for something else? Socket?

Slide 76

Slide 76 text

Idle%
• Surprising case
  - application is highly threaded
  - misconfigured garbage collector

Slide 77

Slide 77 text

Application/JVM

Slide 78

Slide 78 text

user%
• Application/JVM is finally busy
• This is where most people start and profilers start to be actually useful

Slide 79

Slide 79 text

user%
Memory
• The gem and the curse of von Neumann architectures
• cache misses dominate in most applications

Slide 80

Slide 80 text

user%
TLB
• very important for memory-bound workloads
• “Invisible” artifact of virtual memory system

Slide 81

Slide 81 text

user%
CPU caches: capacity
• Important to hide memory latency (and bandwidth) issues
• Virtually all applications today are memory/cache-bound

Slide 82

Slide 82 text

user%
CPU caches: coherence
• Inter-CPU communication is managed via cache coherence
• Understanding this is the road to master the communication

Slide 83

Slide 83 text

Application level (Bandwidth)
Memory Bandwidth
• once caches run out, you face the memory
• dominates the cache miss performance
• faster memory, multiple channels help

Slide 84

Slide 84 text

Little’s Law
• Primitive of queuing theory
  - L = λτ
• Implications
  - if you know the arrival rate and service time you know the length of the queue
  - for a given L, throughput is inversely proportional to service time
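Both implications follow directly from L = λτ, so a worked example is short. The sketch below uses made-up numbers and hypothetical method names; it is arithmetic on the formula, not a queueing simulation.

```java
public class LittlesLaw {

    /** L = lambda * tau: average number of requests in the system. */
    static double queueLength(double arrivalRatePerSec, double serviceTimeSec) {
        return arrivalRatePerSec * serviceTimeSec;
    }

    /** lambda = L / tau: throughput sustained by a fixed number of in-flight requests. */
    static double throughput(double inFlight, double serviceTimeSec) {
        return inFlight / serviceTimeSec;
    }

    public static void main(String[] args) {
        // 200 requests/sec arriving, 50 ms spent in the system by each request
        System.out.println("L = " + queueLength(200.0, 0.05));
        // With L fixed at 10, doubling the service time halves the throughput
        System.out.println("throughput at 50 ms:  " + throughput(10.0, 0.05));
        System.out.println("throughput at 100 ms: " + throughput(10.0, 0.10));
    }
}
```

With 200 requests/sec and 50 ms per request, about 10 requests are in the system at any time. Holding L at 10, going from 50 ms to 100 ms drops the sustainable throughput from about 200 to about 100 requests/sec.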

Slide 85

Slide 85 text

Coherency Primitives
• Plain unshared memory
• Plain shared memory: provide communication
• Volatile: all above, plus visibility
• Atomics: all above, plus atomicity
• Atomic sections: all above, plus group atomicity
• Spin-locks: all above, plus mutual exclusion
• Wait-locks: all above, plus blocking
[axis labels: power, speed]

Slide 86

Slide 86 text

Coherency: Optimistic Checks
• It is possible at times to make an optimistic check
• Fallback to pessimistic version on failure
• The optimistic check has less power, but is more performant

AtomicBoolean isSet = ...;
if (!isSet.get() && isSet.compareAndSet(false, true)) {
    // one-shot action
}

Slide 87

Slide 87 text

Coherency: Optimistic Checks
• It is possible at times to make an optimistic check
• Fallback to pessimistic version on failure
• The optimistic check has less power, but is more performant

ReentrantLock lock = ...;
int count = LIMIT;
while (!lock.tryLock()) {
    // out of optimistic attempts: fall back to the blocking lock
    if (count-- <= 0) {
        lock.lock();
        break;
    }
}

Slide 88

Slide 88 text

Coherency: Striping
• It is possible at times to split the shared state
• Much less contention on modifying the local state
• The total state is the superposition of local states
Example: thread-safe counter
    synchronized (lock) { i++; }
    atomicInteger.incrementAndGet();
    threadLocal.set(threadLocal.get() + 1);
    atomicIntegers[random.nextInt(count)].incrementAndGet();
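The last variant on the slide, an array of atomic counters indexed at random, can be fleshed out as below. This is a minimal sketch with made-up class and method names. It makes no attempt to keep the stripes on separate cache lines, so neighbouring stripes may still contend.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

public class StripedCounter {

    private final AtomicInteger[] stripes;

    StripedCounter(int stripeCount) {
        stripes = new AtomicInteger[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new AtomicInteger();
        }
    }

    /** Increments one randomly chosen stripe, spreading contention across stripes. */
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(stripes.length);
        stripes[index].incrementAndGet();
    }

    /** The total is the sum of all stripes; it is not an atomic snapshot under concurrent updates. */
    int sum() {
        int total = 0;
        for (AtomicInteger stripe : stripes) {
            total += stripe.get();
        }
        return total;
    }

    public static void main(String[] args) {
        StripedCounter counter = new StripedCounter(8);
        for (int i = 0; i < 1000; i++) {
            counter.increment();
        }
        System.out.println(counter.sum());
    }
}
```

The trade-off is that reads become more expensive and only approximately consistent while writers are active. JDK 8's `java.util.concurrent.atomic.LongAdder` applies a similar idea.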

Slide 89

Slide 89 text

Coherence: No-coherence
• If you can remove the communication, do that
  - Immutability to enforce
  - Thread local states
• Example: ThreadLocalRandom @ JDK7
  - Random: use CAS to maintain the state
  - ThreadLocalRandom: essentially, ThreadLocal
  - Can use plain memory ops to maintain the state
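At the call site the difference between the two generators is small, which is part of the appeal. The sketch below contrasts a shared `Random` with `ThreadLocalRandom`; the wrapper class and method names are hypothetical and exist only for illustration.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

public class RandomChoice {

    // One instance shared by all threads: every call updates the same seed.
    static final Random SHARED = new Random();

    static int sharedNext(int bound) {
        return SHARED.nextInt(bound);
    }

    // Each thread gets its own generator, so no state is shared between threads.
    static int localNext(int bound) {
        return ThreadLocalRandom.current().nextInt(bound);
    }

    public static void main(String[] args) {
        System.out.println(sharedNext(100));
        System.out.println(localNext(100));
    }
}
```

`ThreadLocalRandom.current()` should be called at the point of use rather than cached in a field shared across threads.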

Slide 90

Slide 90 text

Coherence: False Sharing
• Communication quanta = cache line
  - 32 – 128 bytes long
  - Helps with bulk memory transfers, cache architecture
  - Coherence protocols working on cache line
• False Sharing
  - CPUs updating the adjacent fields?
  - Cache line ping-pong!
...][AABB][...
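A common mitigation is to pad each hot field so that two counters updated by different threads cannot land on the same cache line. The sketch below assumes 64-byte cache lines and relies on the JVM keeping the padding fields in the object, which is typical but not guaranteed by the language. All names are made up for the example.

```java
public class PaddedCounters {

    /** A counter padded with seven extra longs, assuming 64-byte cache lines. */
    static final class PaddedLong {
        volatile long value;
        long p1, p2, p3, p4, p5, p6, p7;   // padding only, never read
    }

    /** Two threads each increment their own counter; returns the combined total. */
    static long incrementInParallel(int iterationsPerThread) {
        final PaddedLong a = new PaddedLong();
        final PaddedLong b = new PaddedLong();
        // Each counter has exactly one writer, so value++ is safe here.
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterationsPerThread; i++) a.value++;
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < iterationsPerThread; i++) b.value++;
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return a.value + b.value;
    }

    public static void main(String[] args) {
        System.out.println(incrementInParallel(1_000_000));
    }
}
```

Whether the padding helps on a given machine has to be measured. Removing the padding fields and timing both versions under a proper benchmark harness is the way to find out.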

Slide 91

Slide 91 text

JVM Level
• JVM is the new abstraction level
• Interacts with the application, mangles into application
• JVM performance affects application performance

Slide 92

Slide 92 text

JVM Level GC
• Most usual contender in JVM layer
• Lots of things to try fixing (not covered here, see elsewhere)

Slide 93

Slide 93 text

JVM Level JIT
• Very cool to have your code compiled
• Sometimes it's even cooler to get the code compiled better

Slide 94

Slide 94 text

JVM Level (Classloading)
• Important for startup metrics; not really relevant for others
• Removing obstacles is the road to awe

Slide 95

Slide 95 text

What about the Customer?
• Need to gather clear requirements
• Develop a sound benchmarking/testing environment
  - get better measurements
• Identify dominating consumer
  - refocus team on problems that matter
  - not always easy

Slide 96

Slide 96 text

Questions