Lessons from Capacity Planning a Java Enterprise Application: How to Keep Capacity Predictions on Target and Cut CPU Usage by 5x

Managing the capacity and efficiency of business-critical Java applications is challenging. In this work, I propose actionable methodologies and key metrics that enable you to:

- Highlight the hidden bottleneck of many Java applications
- Devise a business-oriented capacity model that represents Java memory bottlenecks
- Detect unsound memory usage patterns and anticipate memory leaks
- Uncover a well-kept secret: the garbage collector (not your business!) drives the CPU usage of your servers, and learn how to fix it
- Show how the garbage collector might be your first scalability bottleneck

This work was first presented at the Computer Measurement Group’s 41st International Conference, Performance & Capacity 2015, in San Antonio, Texas, and received the 2015 Computer Measurement Group Best Paper Award.

Stefano Doni

March 01, 2024

  1. Lessons from capacity planning a Java enterprise application: How to keep capacity predictions on target and cut CPU usage by 5x

    San Antonio, TX – November 2015. Stefano Doni – [email protected] @stef3a linkedin.com/in/stefanodoni
  2. The Usual Way to Do It: Business-oriented Capacity Modeling

    [Chart: CPU Utilization % vs. Business KPI (e.g. Payments / hour), showing the Residual Capacity and the Predicted Business Capacity of the System]
  3. What’s the Problem with Java Applications?

    Application CRASH! HW resources were healthy… So where is the bottleneck? [Chart: CPU Utilization %] The Bottleneck Was… Java Heap Memory!
  4. Java Memory Bottlenecks Defeat Your Capacity Plans

    [Chart: CPU Utilization % vs. Business KPI (e.g. Payments / hour). Labels: Predicted Business Capacity of the System; Actual Max Capacity due to Java Bottlenecks; Overestimated Capacity]
  5. Is the Widely Used Java Heap Utilization a Good Metric for Capacity Planning?

    [Chart: Heap Utilization (all app. servers) and Live Sessions]
  6. Is Heap Utilization Correlated with Workload?

    [Chart: Heap Utilization vs. Live Sessions] Heap Utilization stays flat, irrespective of the workload increase. Heap Utilization is a poor metric for Business-Aware Capacity Planning Models!
  7. Why Is Heap Utilization Poor, and How to Come Up with a Better Metric?

    [Chart: Heap Utilization over Time, with Garbage Collection Events and Garbage marked] Live Data Size is the amount of memory consumed by the set of live objects required to run the application. How about using the Live Data Size for capacity planning models?
  8. Live Data Size Is Not Available in Many Java Monitoring Tools! How Can You Measure It?

    Example of GC log fragment on Oracle JVM w/ Concurrent Mark-Sweep (-XX:+PrintGCDetails): [log fragment not included in this transcript] Live Data Size can be gathered by looking at the memory consumed (old gen) just after a major or full GC event.
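The measurement described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: it assumes JDK 7/8-style CMS log lines in which a full collection reports the old generation as `[CMS: <before>K-><after>K(<capacity>K)`, and the sample log lines are made up for the example. Concurrent CMS cycles, which do not print this pattern, are not handled here.

```python
import re

# Assumed line shape (illustrative): "... [CMS: 1572864K->524288K(2097152K), 2.12 secs] ..."
# Group 2 is the old-gen occupancy right after the collection.
OLD_GEN_AFTER_GC = re.compile(r"\[CMS: (\d+)K->(\d+)K\((\d+)K\)")


def live_data_size_kb(log_lines):
    """Return old-gen occupancy (KB) observed just after each full GC."""
    samples = []
    for line in log_lines:
        match = OLD_GEN_AFTER_GC.search(line)
        if match:
            samples.append(int(match.group(2)))
    return samples


# Hypothetical log lines, written only to exercise the parser.
sample_log = [
    "100.000: [GC 100.000: [ParNew: 279616K->34944K(314560K), 0.0500000 secs] "
    "1279616K->1040000K(2411712K), 0.0501000 secs]",
    "200.000: [Full GC 200.000: [CMS: 1572864K->524288K(2097152K), 2.1200000 secs] "
    "1800000K->524288K(2411712K), 2.1210000 secs]",
    "900.000: [Full GC 900.000: [CMS: 1600000K->530000K(2097152K), 2.0000000 secs] "
    "1850000K->530000K(2411712K), 2.0010000 secs]",
]

print(live_data_size_kb(sample_log))
```

Young-generation (ParNew) lines are skipped on purpose, since only the occupancy left after an old-gen collection approximates the live data.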
  9. The Result of the Data Collection: Live Data Size Looks Promising!

    [Chart: Heap Utilization and Live Data Size vs. Live Sessions]
  10. The Final Test: Is Live Data Size Correlated with Live Sessions?

    [Chart: Live Data Size vs. Live Sessions] R-squared = 91%. YES!
  11. The End Result: The Java Memory-aware Capacity Model

    [Chart: Live Data Size vs. Live Sessions, showing the Predicted Business Capacity of the System, Considering Java Memory] This methodology can be used to build Business-Aware Capacity Planning Models that include Java Memory!
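One way to turn the correlation into a capacity model is an ordinary least-squares line through (Live Sessions, Live Data Size) samples, extrapolated to the memory you are willing to fill. The sketch below assumes a linear relationship; the data points, the 1500 MB budget, and the function names are hypothetical and do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x; also returns R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


def max_sessions(slope, intercept, usable_old_gen_mb):
    """Sessions at which the fitted Live Data Size reaches the memory budget."""
    return (usable_old_gen_mb - intercept) / slope


# Hypothetical samples: live sessions and Live Data Size (MB) after full GCs.
sessions = [100, 200, 300, 400, 500]
live_mb = [520, 610, 730, 790, 900]

slope, intercept, r2 = fit_line(sessions, live_mb)
capacity = max_sessions(slope, intercept, usable_old_gen_mb=1500)
print(f"MB per session: {slope:.2f}, baseline MB: {intercept:.0f}, R^2: {r2:.3f}")
print(f"Estimated max live sessions: {capacity:.0f}")
```

A high R-squared, as reported on the previous slide, is what justifies extrapolating the line; with a weak fit the predicted capacity would not be trustworthy.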
  12. A New Memory Usage Pattern Emerged after a New Application Release…

    [Chart: Live Data Size vs. Live Sessions] What is causing this?
  13. Another Live Data Size Benefit: Anticipating Mem. Leaks

    [Charts: Live Data Size vs. Live Sessions, highlighting High Mem Usage @ Low Load] Based on this evidence, Devs investigated the app and found the actual memory leak. They later asked us to include this analysis as part of the release cycle.
  14. Efficiency: Are Your CPUs Used for the Business, or by the Garbage Collector?

    Stop the guessing and start measuring!
  15. All of a Sudden, Something Really Weird Happened…

    [Charts: CPU Utilization and Server Call Rate] CPU Utilization cut by 5x while doing the same amount of work! No variation in business volumes, no new application release, no changes in physical infrastructure. The Change: +2 GB Java Heap!
  16. GC CPU Utilization Is Not Available in Many Java Monitoring Tools. How Can You Measure It?

    Example of GC log fragment on Oracle JVM (-XX:+PrintGCDetails): [log fragment not included in this transcript] The GC CPU times reported in the log are summed over the interval (e.g. 300 secs, i.e. 5 min).
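A minimal sketch of this calculation, assuming the HotSpot `[Times: user=… sys=…, real=… secs]` suffix that `-XX:+PrintGCDetails` appends to GC events. Dividing by the interval length times the number of CPUs is an assumption made here to express the result as a percentage of total machine capacity; the log lines and the 4-CPU figure are hypothetical.

```python
import re

# Suffix printed by HotSpot after a GC event when -XX:+PrintGCDetails is set.
GC_TIMES = re.compile(r"\[Times: user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs\]")


def gc_cpu_utilization_pct(log_lines, interval_secs, cpu_count):
    """Sum GC user+sys CPU seconds and normalize by available CPU seconds."""
    gc_cpu_secs = 0.0
    for line in log_lines:
        match = GC_TIMES.search(line)
        if match:
            gc_cpu_secs += float(match.group(1)) + float(match.group(2))
    return 100.0 * gc_cpu_secs / (interval_secs * cpu_count)


# Hypothetical GC events falling within one 300-second interval.
interval_log = [
    "10.0: [GC ... 0.90 secs] [Times: user=12.00 sys=0.50, real=3.20 secs]",
    "95.0: [Full GC ... 8.10 secs] [Times: user=30.00 sys=1.00, real=8.10 secs]",
    "250.0: [GC ... 1.10 secs] [Times: user=16.00 sys=0.50, real=4.30 secs]",
]

gc_pct = gc_cpu_utilization_pct(interval_log, interval_secs=300, cpu_count=4)
print(f"GC CPU utilization: {gc_pct:.1f}%")
```

Using `user + sys` rather than `real` matters for parallel collectors: several GC threads can burn more CPU seconds than the wall-clock duration of the pause.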
  17. After Data Collection: GC Was the First Consumer of CPU!

    [Chart: Total CPU Utilization and Garbage Collector CPU Util %] Almost all of the CPU cycles used by GC! After cluster expansion: Total CPU cut in half, GC CPU cut by 5x! The Garbage Collector might be the first consumer of your CPUs, well ahead of the actual application code. Stop the guessing, start measuring it!
  18. Unexplained CPU Utilization Patterns During Memory-Stressful Conditions

    [Chart: CPU Utilization and Server Call Rate] High CPU Utilization during the night, even though workload is zero after 9 PM. What drives CPU Utilization during the night?
  19. Let’s Find It Out! Linux ‘top’ During the Anomaly

    Example of Linux ‘top’ output, thread view (press ‘H’ once in top): [output not included in this transcript] One software thread consuming all of its CPU cycles? Example of Java Thread Dump (jstack <PID>): [dump not included in this transcript] This is the background thread used by the GC!
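Linking the two outputs relies on a detail worth spelling out: in thread view, `top` shows each thread's ID in decimal, while a HotSpot thread dump shows the same native ID in hexadecimal as `nid=0x…`. The sketch below does that conversion and lookup; the thread dump text, the IDs, and the function name are hypothetical, written only to illustrate the matching step.

```python
def find_thread_name(thread_dump, top_thread_id):
    """Return the name of the dump entry whose nid matches a decimal thread ID."""
    wanted = f"nid={top_thread_id:#x}"  # e.g. 6703 -> "nid=0x1a2f"
    for line in thread_dump.splitlines():
        if line.startswith('"') and wanted in line.split():
            return line.split('"')[1]  # thread name sits between the first quotes
    return None


# Hypothetical jstack excerpt (header lines only).
dump = '''"main" #1 prio=5 os_prio=0 tid=0x00007f3c2c009800 nid=0x1a2c waiting on condition
"Concurrent Mark-Sweep GC Thread" os_prio=0 tid=0x00007f3c2c0a1000 nid=0x1a2f runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f3c2c0f4000 nid=0x1a3b waiting on condition'''

# 6703 is the (hypothetical) busy thread ID read from top's thread view.
print(find_thread_name(dump, 6703))
```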
  20. Can Java Garbage Collector Be a Scalability Bottleneck?

    Java Concurrent Mark and Sweep Garbage Collector (CMS) is concurrent and parallel. ✔ Concurrent = performs work without stopping the application threads ✔ Parallel = it is multi-threaded, scales with number of CPUs. But we discovered that: 1. Just one CMS background thread is configured by default with up to 4 CPUs (this can be increased via a specific option, but watch out for excessive GC CPU Utilization) 2. CMS might «fail» and be forced to single-threaded operation 3. Even best-in-class GCs still need to stop the application: Amdahl's law applies!
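To make the Amdahl's law remark concrete: if a fraction p of the work can run in parallel and the rest is serial (for instance, time spent with application threads stopped, or in a single-threaded GC phase), the speedup on n CPUs is 1 / ((1 - p) + p / n). The 90% parallel fraction below is a hypothetical figure chosen for illustration, not a measurement from the slides.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Speedup on `cpus` processors when only `parallel_fraction` scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


# Hypothetical workload: 90% parallelizable, 10% serialized.
for cpus in (1, 4, 16, 64):
    print(f"{cpus:>3} CPUs -> speedup {amdahl_speedup(0.9, cpus):.2f}x")
```

However many CPUs are added, the speedup in this example stays below 1 / 0.1 = 10x, which is why a serialized GC phase can cap scalability even on large servers.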
  21. Key Takeaways

    What have we discovered?
    • Traditional capacity models might severely overestimate the business capacity of Java applications
    • The major consumer of your infrastructure resources might be the garbage collector
    • Java memory management can have an impact on your application scalability
    • Common monitoring tools might not provide all the metrics you need

    Our contribution to close the gap
    • An enhanced Capacity model that takes into account Java memory and supports what-if analyses, using innovative KPIs
    • The need to get visibility into real garbage collection CPU utilization and how to gather it
    • How to control the problem by keeping track of single-threaded problems
    • Be sure to enable detailed GC logging on all your Java enterprise apps and integrate the KPIs in your CM solution!
  22. Contacts

    Headquarters: Via Schiaffino 11C, 20158 Milan, Italy, T +39-024951-7001. USA East: 283 Franklin Street, Boston, MA 02110, T +1-617-936-0212. USA West: 425 Broadway Street, Redwood City, CA 94063, T +1-650-226-4274. @moviri moviricorp moviri +moviri