The (not so) dark art of performance tuning (from Newts to Newton)

The talk describes the Java Performance Diagnostic Model, a proven performance tuning methodology developed by Kirk Pepperdine and currently used by many who have attended his performance tuning seminar. The methodology categorizes and characterizes the different types of underlying problems responsible for poor performance in Java applications. Doing so accelerates the diagnostic process, allowing practitioners to identify the root cause, often in a fraction of the time.

Kirk Pepperdine

June 11, 2014

Transcript

  1. The (not so) Dark Art of Performance Tuning: From Newts to Newton

    Kirk Pepperdine (@kcpeppe), Aleksey Shipilev (@shipilev). Copyright 2014 Kodewerk Ltd. All rights reserved.
  2. About Me (Kodewerk Java Performance Services)

    • Consultant (www.kodewerk.com)
    • Performance tuning and training seminar
    • Co-author of www.javaperformancetuning.com
    • Member of the Java Champion program
    • Other stuff... (google if you care to)
  3. Disclaimer

    The resemblance of any opinion, recommendation or comment made during this presentation to performance tuning advice is merely coincidental.
  4. Our Typical Customer

    • Application isn’t performing to the project sponsors’ expectations
    • Development team has been tuning for weeks: some improvements, but...
    • Different team experts come to the table with different opinions: finger pointing
  5. Commonly Heard

    • “I see StringBuffer is being used all over, let’s change it to StringBuilder”
    • “I think our DBMS is the problem, we need to migrate to [buzzword]”
    • “Not sure where the problem is, but we’ve been changing code to make things better”
  6. Better Action

    • Understand your performance requirements: typically response time or throughput
    • Your goal is to improve this metric
  7. What if the code you’ve changed...

    • is not used at all?
    • accounts for just a few microseconds of time?
    Changing it may still be a good idea if the changes are small, isolated and painless to make.
  8. “I can see that method bar() accounts for 5% of time, let’s remove it”

    What if...
    • CPU utilization is 6.25%?
    • the method precomputes something reused later?
  9. “I think our database is the problem! Let’s migrate to [buzzword]”

    What if...
    • you just depleted disk bandwidth?
    • IT had reshaped the network connection?
    • the database just needs a cleanup?
  10. Perf tuning 101

    • The space can be humongous: you can’t traverse it all; assume something is or isn’t part of the problem
    • Prefer hypothesis-free investigation: be methodical; use a step-wise process to arrive at a conclusion
  11. Typical Dev Teams

    • Development teams are often very skilled but don’t understand performance testing
    • Performance data is often taken out of context, misunderstood, or completely missed
  12. As developers we tend to be too focused on the code!!!!
  13. Software is abstract

    public class Software {
        public static void main(String[] args) {
            System.out.println("Software is abstract");
        }
    }
  14. Physical Limits

    • CPU capacity: throughput (clock speed), granularity (cache line)
    • Memory capacity: volume
    • Bus: throughput (clock speed), width (32 bits)
  15. Limits of Hardware

    • Disk: ~1 Gbit/sec (SATA ~3 Gbits/sec); granularity: disk sector (512 bytes)
    • Network: frame buffer/packet; frequency (clock)
  16. Other Limits

    • other hardware devices: video/sound
    • heat
    • battery
    • time
  17. Software is abstract (again)

    public class Software {
        public static void main(String[] args) {
            System.out.println("Software is abstract");
        }
    }
  18. But we already have measurements!
  19. Yeah, but... do you have a context to understand the measurements?
  20. Are they the right measurements?
  21. Are you blind to the real problem?
  22. Application Developers Live Here

    business logic, non-shareable soft resources
  23. Question: which is faster?

    a) Bubble sort
    b) Quick sort
    In Big O notation: bubble sort is O(N^2), quick sort is O(N log N)
  24. Application performance is rooted in dynamics

    Diagram labels: dynamics, algorithmic strength
  25. Application Runs Here

    • Application: algorithmic strength, dynamics
    • JVM: manage memory, execution
    • OS/Hardware: CPU, memory, disk I/O, network I/O, locks
  26. Process Diagnostic Model

    • Actors: usage patterns
    • Application: algorithmic strength
    • JVM: manage memory, execution
    • OS/Hardware: CPU, memory, disk I/O, network I/O, locks
  27. Mixing in the dynamics

    • Layers: Actors (usage patterns), Application (algorithmic strength), JVM (manage memory, execution), OS/Hardware (CPU, memory, disk I/O, network I/O, locks)
    • Actors drive the application
    • Application drives the JVM
    • JVM drives the OS/Hardware
    • Hardware consumed: a function of actors, a function of the application, cannot exceed capacity
  28. Hardware Patterns

    • Hardware counters show how the hardware is being consumed: subscription rates to non-shareable components
    • This tells us what our application is doing
    • Hardware consumption is a function of load
  29. Dominating Consumer

    • The activity that dominates how the CPU is utilized
    • Determine it by analyzing the breakout of CPU counters and the garbage collection logs
  30. Dominating Consumer Choices

    Application, JVM, Liveliness, System
  31. Dominating Consumer Conditions

    Decision tree (edges reconstructed from the flattened slide):
    • sys CPU > 10% of user CPU? yes: System (system profiling: netstat, mpstat, iostat, sar, strace, gc logs, etc.)
    • no: user CPU ~= 100%? no: Liveliness (thread starvation; take a thread dump)
    • yes: memory efficient, according to the GC logs? yes: Application (app/CPU profiling)
    • no: JVM (GC tuning, pool sizes, collectors, ...; memory profiling: size, frequency, life span, ...)
  32. Expression of Consumption

    Application, JVM, Liveliness, System: passively dominant or aggressively dominant
  33. Dominating Consumer???

     r  b swpd  free  buff    cache si so bi  bo     in     cs us sy id wa
     3  9  100 24496 11096 13267036  0  0  0   5      0      1  2  1 96  1
     3  2  100 23420 11088 13268328  0  0  0   0  77330 175352 17 26 39 17
     3  9  100 20836 11088 13270628  0  0  0  68 105118 227382 14 40 21 25
     8  4  100 23356 11080 13268272  0  0  0   0  80062 164387 12 30 29 30
     7  7  100 23180 11084 13267068  0  0  0  72  98353 234851 15 43 28 15
    11  2  100 25820 11088 13263676  0  0  0 120 100749 214921 11 42 17 30
    13  1  100 22316 11088 13267176  0  0  0   0 103878 246723 16 56 19  9
     4  3  100 21824 11088 13269140  0  0  0   0  48625  97288 15 16  9 60
    11  2  100 20932 11080 13269808  0  0  0   0 110760 236774 14 41 24 20
     1 12  100 23624 11084 13267488  0  0  0 204  69117 148611 15 27 25 33
     7  5  100 24996 11096 13267476  0  0  0 164  24495  48552 13 10 30 48
     1 12  100 20792 11096 13271872  0  0  0   0  25659  54331  8  9 26 56
     6  8  100 21984 11080 13269920  0  0  0  20  46309 101404 16 18 51 15
     4  9  100 22764 11080 13268956  0  0 16   0  88553 229557 17 35 38 11
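One way to answer the question on this slide is to apply the conditions from slide 31 mechanically to the `us` and `sy` columns. The following is a minimal sketch, not the authors' tooling: the class name, the enum, and the 90% cut-off used to interpret "user CPU ~= 100%" are all assumptions made for illustration, and the final Application-versus-JVM split is left out because it needs GC logs rather than vmstat.

```java
import java.util.Arrays;
import java.util.List;

public class DominatingConsumer {

    enum Dominator { SYSTEM, LIVELINESS, APPLICATION_OR_JVM }

    // Assumption: "user CPU ~= 100%" is read here as "at least 90%".
    static final double USER_SATURATED_PCT = 90.0;

    /** Applies the first two checks of the slide-31 conditions to CPU percentages. */
    static Dominator classify(double userPct, double sysPct) {
        if (sysPct > 0.10 * userPct) {
            return Dominator.SYSTEM;          // kernel time is out of proportion to user time
        }
        if (userPct < USER_SATURATED_PCT) {
            return Dominator.LIVELINESS;      // CPU is available but threads are not using it
        }
        return Dominator.APPLICATION_OR_JVM;  // GC logs are needed to separate these two
    }

    /** Averages the us and sy columns (13th and 14th fields) of vmstat data rows. */
    static Dominator classifyVmstat(List<String> rows) {
        double user = 0;
        double sys = 0;
        for (String row : rows) {
            String[] fields = row.trim().split("\\s+");
            user += Double.parseDouble(fields[12]);
            sys += Double.parseDouble(fields[13]);
        }
        return classify(user / rows.size(), sys / rows.size());
    }

    public static void main(String[] args) {
        // Three rows copied from the vmstat output on the slide.
        List<String> rows = Arrays.asList(
            "3 2 100 23420 11088 13268328 0 0 0 0 77330 175352 17 26 39 17",
            "3 9 100 20836 11088 13270628 0 0 0 68 105118 227382 14 40 21 25",
            "8 4 100 23356 11080 13268272 0 0 0 0 80062 164387 12 30 29 30");
        System.out.println(classifyVmstat(rows)); // prints SYSTEM
    }
}
```

In these rows system time exceeds user time outright, so the first condition already fires and the investigation would start with system-level tools.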
  34. Experimental Setup

    • Relevant: reproduces the phenomena
    • Isolated: leaves out unwanted effects
    • Measurable: provides the needed metrics
    • Reliable: produces consistent results
    You can’t go anywhere without a proper test environment
  35. Experimental Setup

    Define:
    • Unit of Test: the thing you are testing
    • Test Harness: everything else!!!!
  36. Relevant

    • Hardware: production-like configuration; hidden bottlenecks; phantom bottlenecks
    • Software: test harness; load injector
  37. Relevant

    • Software: test harness; load injector
    • Data: "Data-Lite" performance testing anti-pattern; production-like in volume and veracity
    • Beware of the effects of caching: it is all over the place and it works!!!!!
  38. Isolated

    Quiet: all activity counts against your hardware budget
    • affects performance counters (global)
    • introduces noise which will corrupt the diagnosis
    • phantom bottlenecks
  39. Measurable and Reliable

    • Usage patterns: define the workload (use case + tx rate + velocity)
    • Performance requirements
    • Validation: test the test; make sure the bottleneck isn’t in the test harness
  40. Setup

    • Script usage patterns into a load test
    • Install/configure the app the same as prod
    • Set up monitoring: OS performance counters and GC logging
    • Kill everything else running on your system
    • Spike test to ensure correctness
    • Validate to ensure the test tests what needs to be tested
  41. Process

    // Pseudocode: Benchmark, Performance, Profiler and ProfilingResults are illustrative types, not a real API.
    Benchmark benchmark = new Benchmark();
    benchmark.configure();
    Performance performance = benchmark.baseline(application);
    user.setHappy(performance.meets(requirements));
    while (!user.isHappy() && user.hasMoney()) {
        // Let the measurements, not a guess, pick the profiler.
        Profiler profiler = performance.identifyDominatingConsumer();
        ProfilingResults profilingResults = benchmark.profile(profiler);
        application.fixUsing(profilingResults);
        while (application.failsQA()) {
            application.debug();
        }
        // Re-baseline after every fix.
        performance = benchmark.baseline(application);
        user.setHappy(performance.meets(requirements));
    }
  42. Metrics

    • Throughput (bandwidth): rate at which operations are retired; ops/sec, MB/sec, frags/sec; easy to measure and interpret
    • Time (latency): how much time one operation took; generally hard to measure reliably
  43. What is in a response time

    • execution
    • dead time
    response time = ∑ execution + ∑ dead time
  44. Metrics

    • Many things affect dead time and consequently execution time
    • These can change from request to request
  45. Ensure Test is Reliable

    • Any errors in the test environment will affect results: throwing exceptions, running out of memory, other failures that short-circuit
    • Coordinated omission: will back pressure from the server interfere with the test harness’s ability to apply load?
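The coordinated omission question can be made concrete with a small deterministic simulation. This is a sketch under stated assumptions, not a real load injector: the harness is assumed to be closed-loop (it sends the next request only after the previous one returns), requests are intended to start every 10 ms, and all names and numbers are hypothetical. It counts slow requests twice, once measured from the actual send time and once from the intended send time.

```java
public class CoordinatedOmission {

    /**
     * Simulates a closed-loop harness that intends to send one request every intervalMs.
     * Returns { slow requests measured from actual send, slow requests measured from intended send }.
     */
    static int[] countSlow(long[] serviceMs, long intervalMs, long thresholdMs) {
        long clock = 0;      // time at which the harness is free to send again
        int naive = 0;
        int corrected = 0;
        for (int i = 0; i < serviceMs.length; i++) {
            long intended = i * intervalMs;
            long sent = Math.max(clock, intended);  // a stalled server delays the send itself
            long done = sent + serviceMs[i];
            if (done - sent > thresholdMs) {
                naive++;                            // what a naive harness would report
            }
            if (done - intended > thresholdMs) {
                corrected++;                        // what a user arriving on schedule would see
            }
            clock = done;
        }
        return new int[] { naive, corrected };
    }

    public static void main(String[] args) {
        long[] serviceMs = new long[13];
        java.util.Arrays.fill(serviceMs, 1);
        serviceMs[2] = 100;                         // a single 100 ms stall
        int[] slow = countSlow(serviceMs, 10, 20);
        System.out.println("naive=" + slow[0] + " corrected=" + slow[1]); // prints naive=1 corrected=9
    }
}
```

The single stall is reported as one slow request by the naive measurement, while eight further requests that were held back behind it also missed the 20 ms threshold when measured from their intended start.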
  46. Generational Counts Quiz

    how to identify a memory leak
  47. How to Speed up the application

    Change something in some specific way! (in response to some localized condition)
  48. How to Speed up the application

    Change something in some specific way!
    • What prevents the application from performing?
    • Where does it reside?
    • How can we stop it from messing with performance?
  49. Top-Down

    • Identify the transaction not meeting the functional requirement: pinpoint the area of concern
    • Identify the dominating consumer: determine the nature of the problem
    • System -> Application -> Microarchitecture
  50. System Dominant/Liveliness

    Entry point is CPU utilization; use tools to expose performance counters
  51. System Dominant

    Consider network, scheduling, swapping or other kernel activities
  52. Other contributors

    • In some cases hardware/OS configuration is needed
    • May need to change the application
  53. Scheduling

    Unbalanced threading:
    • lots of voluntary context switches (thrashing)
    • lots of involuntary context switches (saturation)
  54. Swapping

    • Swapping is a performance killer for Java
    • Avoid swapping at all costs
    • Kill other processes to save memory
  55. Kernel

    • Sometimes the kernel is your enemy
    • Unusual API choices from the JVM and/or application
    • (un)known bugs
  56. IRQ%, SOFT%

    • Usual things when interacting with devices
    • IRQ balancing
    • Sometimes this is expensive
  57. IOWait%

    Watch for disk activity, throughput and IOPS
  58. IOWait%

    • Is all that disk activity necessary?
    • Caching and buffering are your friends
    • A faster disk can solve throughput/IOPS problems
  59. IOWait%

    • More caching helps?
    • Free up memory for caches
    • Trade performance for consistency
  60. Idle%

    • CPUs are free and no one is using them
    • Easy to diagnose but commonly missed
  61. Idle%

    • Running a low-threaded application on multicore
    • Improve parallelization of algorithms
    • Throw away CPUs
  62. Idle%

    • There are not enough threads ready to run
    • Locking?
    • Waiting for something else? Socket?
  63. Idle%

    • Surprising case: the application is highly threaded
    • Misconfigured garbage collector
  64. user%

    • Application/JVM is finally busy
    • This is where most people start, and where profilers start to be actually useful
  65. user%: Memory

    • The gem and the curse of von Neumann architectures
    • Cache misses dominate in most applications
  66. user%: TLB

    • Very important for memory-bound workloads
    • “Invisible” artifact of the virtual memory system
  67. user%: CPU caches: capacity

    • Important to hide memory latency (and bandwidth) issues
    • Virtually all applications today are memory/cache-bound
  68. user%: CPU caches: coherence

    • Inter-CPU communication is managed via cache coherence
    • Understanding this is the road to mastering the communication
  69. Application level (Bandwidth)

    Memory bandwidth:
    • once caches run out, you face the memory
    • dominates the cache miss performance
    • faster memory, multiple channels help
  70. Little’s Law

    Primitive of queuing theory: L = λτ
    Implications:
    • if you know the arrival rate and service time, you know the length of the queue
    • for a given L, throughput is inversely proportional to service time
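A worked instance of L = λτ may help here. The sketch below uses hypothetical numbers (200 requests per second, 50 ms per request) and hypothetical method names; it simply evaluates the formula in both directions named on the slide.

```java
public class LittlesLaw {

    /** L = lambda * tau: average number of requests in the system. */
    static double queueLength(double arrivalRatePerSec, double serviceTimeSec) {
        return arrivalRatePerSec * serviceTimeSec;
    }

    /** lambda = L / tau: sustainable throughput for a fixed number of requests in flight. */
    static double throughput(double queueLength, double serviceTimeSec) {
        return queueLength / serviceTimeSec;
    }

    public static void main(String[] args) {
        // 200 requests/sec arriving, each taking 0.05 sec: about 10 requests in flight.
        System.out.println(queueLength(200, 0.05));
        // With L fixed at 10 (for example, 10 worker threads), doubling the
        // service time from 0.05 sec to 0.1 sec halves the throughput.
        System.out.println(throughput(10, 0.05));
        System.out.println(throughput(10, 0.1));
    }
}
```

Read the second half as a sizing rule: when the number of requests in flight is capped, the only way to raise throughput is to reduce the time each request spends in the system.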
  71. Coherency Primitives

    Listed from least power and most speed to most power and least speed:
    • Plain unshared memory
    • Plain shared memory: provides communication
    • Volatile: all above, plus visibility
    • Atomics: all above, plus atomicity
    • Atomic sections: all above, plus group atomicity
    • Spin-locks: all above, plus mutual exclusion
    • Wait-locks: all above, plus blocking
  72. Coherency: Optimistic Checks

    • It is possible at times to make an optimistic check
    • Fall back to the pessimistic version on failure
    • The optimistic check has less power, but is more performant

    AtomicBoolean isSet = ...;
    // Cheap read first; attempt the CAS only if the flag still looks unset.
    if (!isSet.get() && isSet.compareAndSet(false, true)) {
        // one-shot action
    }
  73. Coherency: Optimistic Checks

    • It is possible at times to make an optimistic check
    • Fall back to the pessimistic version on failure
    • The optimistic check has less power, but is more performant

    ReentrantLock lock = ...;
    int count = LIMIT;
    while (!lock.tryLock()) {
        if (count-- <= 0) {   // optimistic attempts exhausted
            lock.lock();      // fall back to the blocking acquire
            break;
        }
    }
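The two fragments on the optimistic-check slides can be put into one compilable class. This is a minimal sketch, not the speakers' code: the class and method names are hypothetical, and the retry limit is a plain parameter. It uses only `AtomicBoolean.get`, `AtomicBoolean.compareAndSet`, `ReentrantLock.tryLock`, `lock` and `unlock`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class OptimisticChecks {

    private final AtomicBoolean isSet = new AtomicBoolean(false);
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the action at most once across all callers; returns true for the caller that ran it. */
    boolean runOnce(Runnable action) {
        // Optimistic plain read first: once the flag is set, no caller pays for a CAS.
        if (!isSet.get() && isSet.compareAndSet(false, true)) {
            action.run();
            return true;
        }
        return false;
    }

    /** Tries the non-blocking acquire a few times, then falls back to the blocking one. */
    void withLock(int spinLimit, Runnable criticalSection) {
        int attempts = spinLimit;
        while (!lock.tryLock()) {
            if (--attempts <= 0) {
                lock.lock();   // pessimistic fallback: park until the lock is free
                break;
            }
        }
        try {
            criticalSection.run();
        } finally {
            lock.unlock();     // the lock is held on every path that reaches here
        }
    }

    public static void main(String[] args) {
        OptimisticChecks checks = new OptimisticChecks();
        System.out.println(checks.runOnce(() -> System.out.println("initialised"))); // runs, then prints true
        System.out.println(checks.runOnce(() -> System.out.println("initialised"))); // prints false only
        checks.withLock(3, () -> System.out.println("in critical section"));
    }
}
```

In both methods the cheap path is tried first and the more powerful primitive is used only when the cheap one fails, which is the trade described on the slides.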
  74. Coherency: Striping

    • It is possible at times to split the shared state
    • Much less contention on modifying the local state
    • The total state is the superposition of local states

    Example: thread-safe counter, from most to least contended
    synchronized (lock) { i++; }
    atomicInteger.incrementAndGet();
    threadLocal.set(threadLocal.get() + 1);
    atomicIntegers[random.nextInt(count)].incrementAndGet();
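The last variant on the striping slide, an array of atomics indexed at random, can be sketched as a complete class. The names below are hypothetical and the stripe count is chosen arbitrarily; `AtomicLongArray` and `ThreadLocalRandom` are standard JDK classes. The JDK's `java.util.concurrent.atomic.LongAdder` (JDK 8) applies a similar idea and would normally be preferred over a hand-rolled version.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

public class StripedCounter {

    private final AtomicLongArray stripes;

    StripedCounter(int stripeCount) {
        stripes = new AtomicLongArray(stripeCount);
    }

    /** Increments a randomly chosen stripe, spreading contention across slots. */
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(stripes.length());
        stripes.incrementAndGet(index);
    }

    /** The total is the sum of all stripes; exact only once writers have finished. */
    long sum() {
        long total = 0;
        for (int i = 0; i < stripes.length(); i++) {
            total += stripes.get(i);
        }
        return total;
    }

    /** Runs the given number of threads against one counter and returns the final total. */
    static long countWithThreads(int threads, int incrementsPerThread) {
        StripedCounter counter = new StripedCounter(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000)); // prints 400000
    }
}
```

Reads become more expensive because `sum()` must visit every stripe, so the technique pays off when increments are much more frequent than reads.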
  75. Coherence: No-coherence

    • If you can remove the communication, do that
    • Immutability to enforce
    • Thread-local states

    Example: ThreadLocalRandom @ JDK7
    • Random: uses CAS to maintain the state
    • ThreadLocalRandom: essentially ThreadLocal<Random>; can use plain memory ops to maintain the state
  76. Coherence: False Sharing

    Communication quanta = cache line
    • 32 – 128 bytes long
    • Helps with bulk memory transfers, cache architecture
    • Coherence protocols work on cache lines

    False sharing
    • CPUs updating adjacent fields? Cache line ping-pong!
    • ...][AABB][...
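A common manual mitigation for false sharing is to pad a hot field so that it is unlikely to share a cache line with another hot field. The sketch below assumes a 64-byte cache line and uses hypothetical class names. The JVM is free to lay out fields as it sees fit, so this padding is a heuristic rather than a guarantee; it illustrates the idea and is not a benchmark.

```java
public class PaddedCounters {

    /** A counter with unused longs on both sides (assumes 64-byte cache lines). */
    static class PaddedCounter {
        long p1, p2, p3, p4, p5, p6, p7;   // padding before the hot field
        volatile long value;               // written by exactly one thread
        long q1, q2, q3, q4, q5, q6, q7;   // padding after the hot field
    }

    static void bump(PaddedCounter counter, int times) {
        for (int i = 0; i < times; i++) {
            counter.value++;               // safe only because each counter has a single writer
        }
    }

    /** Two threads each update their own counter; returns both final values. */
    static long[] run(int incrementsPerThread) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread ta = new Thread(() -> bump(a, incrementsPerThread));
        Thread tb = new Thread(() -> bump(b, incrementsPerThread));
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.value, b.value };
    }

    public static void main(String[] args) {
        long[] totals = run(1_000_000);
        System.out.println(totals[0] + " " + totals[1]); // prints 1000000 1000000
    }
}
```

The two threads never touch each other's data, yet without padding they could still slow each other down if the two `value` fields landed on the same cache line; that is the ping-pong the slide refers to.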
  77. JVM Level

    • JVM is the new abstraction level
    • Interacts with the application, mangles into application
    • JVM performance affects application performance
  78. JVM Level (GC)

    • Most usual contender in the JVM layer
    • Lots of things to try fixing (not covered here, see elsewhere)
  79. JVM Level (JIT)

    • Very cool to have your code compiled
    • Sometimes it's even cooler to get the code compiled better
  80. JVM Level (Classloading)

    • Important for startup metrics; not really relevant for others
    • Removing obstacles is the road to awe
  81. What about the Customer?

    • Need to gather clear requirements
    • Develop a sound benchmarking/testing environment: get better measurements
    • Identify the dominating consumer: refocus the team on problems that matter (not always easy)