Guerrilla Techniques for Robust Performance Engineering

Slide 1

Slide 1 text

Guerrilla Techniques for Robust Performance Engineering
Dr. Neil J. Gunther
Performance Dynamics Company
5th Workshop on Education and Practice of Performance Engineering
Toronto, Canada
May 5, 2025
© 2025 Performance Dynamics Company Guerrilla Techniques for Robust Performance Engineering May 4, 2025 1 / 19

Slide 2

Slide 2 text

The Guerrilla Genesis

1. Stanford Summer school on queueing models (c. 1990)
2. Became the lecturer at the Stanford Summer school (1995 – 2001)
3. “Guerrilla Capacity Planning”: offhand quip by me (2000)
4. Stanford cancelled all Summer schools (2001)
5. First private GCAP class in a hotel (2002)
6. Gaphorisms online: Guerrilla Aphorisms [5]
7. Guerrilla Capacity Planning book (2007) [6]
8. COVID-19 pandemic: first GCAP online classes (2020)
9. All Guerrilla classes online (2020 – present)

Definition 1 (What is Guerrilla Capacity Planning?)
“The planning horizon is now about 3 months (1 financial quarter), thanks to the gnomes on Wall Street. Only Guerrilla-style tactical planning is crazy enough to be compatible with that kind of insanity.”
- NJG (2003)

Slide 3

Slide 3 text

Data Comes from The Devil

Why are we still doing TTY in the 21st century FFS? (Linus is the Devil’s handmaiden)
Makes you think these are THE numbers, but all measurements are wrong.
You should be able to click on a value to drill down and see the standard error:
CPU usage: 19.43% ± 0.97% user
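The "value ± error" display asked for on this slide can be sketched as a sample mean with its standard error. This is a minimal illustration only: the function name and the sample values are hypothetical, and it assumes the samples are independent readings of the same quantity.

```python
import math
import statistics


def mean_with_standard_error(samples):
    """Return (mean, standard error of the mean) for a list of readings."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate the standard error")
    mean = statistics.fmean(samples)
    # Standard error = sample standard deviation / sqrt(number of samples)
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, sem


# Hypothetical CPU %user readings (not taken from the slides)
cpu_user_samples = [18.2, 21.0, 19.7, 17.9, 20.4, 19.1, 20.6, 18.5]
mean, sem = mean_with_standard_error(cpu_user_samples)
print(f"CPU usage: {mean:.2f}% ± {sem:.2f}% user")
```

Reporting the pair rather than the bare mean makes it visible that the displayed number is an estimate.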

Slide 4

Slide 4 text

Models Come from God

Figure 1: Charlton Heston laying down the laws of queueing theory

Queueing theory books [1, 2, 10, 11, 14] are written by mathematicians for mathematicians.
That led to the development of the PDQ (Pretty Damn Quick) queueing analyzer software.
The Universal Scalability Law [3, 9, 7, 12] is also based on queueing theory, but you don’t need to understand that to use it.

Slide 5

Slide 5 text

Performance Metrics

Definition 2
There are only three performance metrics:
1. Time (the “zeroth” metric): T
2. Number (count, size, but no time): N
3. Rate (counts per unit time): N/T
Everything else is a derived metric.

Example: IOPS = (Number of) IOs per second = N/T = Rate metric
Question: Which metric is CPU % user? (see Slide 3)

All performance metrics must boil down to Definition 2.
Performance is primarily about Time (how fast).
Capacity is primarily about Number (how big).

Slide 6

Slide 6 text

Measurement Meets Model

$$
\underbrace{\left(\frac{C}{T}\right)_{N}}_{\text{Measurements}}
\;\longrightarrow\;
\underbrace{X(N)}_{\text{System metric}}
\;\longleftarrow\;
\underbrace{\frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}}_{\text{USL model}}
$$

Completion rate C/T data is one definition of system throughput X(N).
The USL model is another definition of throughput X(N) [3, 9, 7, 12].
Theorem (2008): X(N) must conform to the universal USL model.

The three Cs:
1. Concurrency (γ): ideal parallelism, linear scaling
2. Contention (α): queueing, buffering, Amdahl’s scaling
3. Coherency (β): data/state exchange, messaging, memory paging
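The USL formula on this slide can be evaluated directly. The sketch below is illustrative only: the function names and the coefficient values are hypothetical, not fitted to any data in the talk. The peak-load expression follows from setting dX/dN = 0 in the formula above, which gives N* = sqrt((1 − α)/β).

```python
import math


def usl_throughput(n, gamma, alpha, beta):
    """USL throughput X(N) = gamma*N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load N* where X(N) is maximal; unbounded if there is no coherency cost."""
    if beta <= 0:
        return math.inf
    return math.sqrt((1.0 - alpha) / beta)


# Hypothetical coefficients chosen only for illustration
gamma, alpha, beta = 100.0, 0.05, 0.0001
for n in (1, 50, 97, 200):
    print(n, round(usl_throughput(n, gamma, alpha, beta), 1))
print("peak near N =", round(usl_peak_load(alpha, beta), 1))
```

With α = β = 0 the function reduces to linear scaling γN. With β = 0 alone it reduces to Amdahl-style scaling, matching the roles the slide assigns to the three Cs.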

Slide 7

Slide 7 text

Data Disasters

[Figure panels: Throughput (CPS) vs. Vuser load; Latency (ms) vs. Vuser load]
Load-test data comparing the X and R performance of several HTTP servers

Slide 8

Slide 8 text

Java Geniuses

From a Java performance book [fn 1]:
Fig. 1a “isn’t scaling well” because response time is increasing “exponentially” [fn 2] with increasing user load.
Fig. 1b “scales in a more desirable manner” because response time degradation is more gradual with increasing user load. [fn 3]

Footnotes:
[fn 1] S. Wilson and J. Kesselman, Java Platform Performance: Strategies and Tactics, Addison-Wesley (2000)
[fn 2] Wrong. It’s correctly scaling linearly. AKA the queueing theory hockey-stick.
[fn 3] If you can produce this kind of scaling in prod ... ship it!
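The "hockey-stick" in the second footnote can be reproduced with a small queueing calculation. The sketch below uses exact Mean Value Analysis for a closed system with one queueing center and a think-time delay. That choice of model, the function name, and the service and think times are assumptions made for illustration, not details taken from the book or the slide.

```python
def closed_queue_profile(max_users, service_time, think_time):
    """Exact MVA for one queueing center plus think time.

    Returns a list of (N, throughput X(N), response time R(N)).
    """
    queue_length = 0.0
    profile = []
    for n in range(1, max_users + 1):
        # An arriving request waits for the queue seen with one fewer user
        response = service_time * (1.0 + queue_length)
        throughput = n / (response + think_time)
        queue_length = throughput * response  # Little's law
        profile.append((n, throughput, response))
    return profile


# Hypothetical parameters: 10 ms service time, 1 s think time
profile = closed_queue_profile(300, service_time=0.01, think_time=1.0)
for n in (1, 50, 100, 150, 200, 300):
    _, x, r = profile[n - 1]
    print(f"N={n:3d}  X={x:7.2f}/s  R={r * 1000:8.1f} ms")
```

Below saturation the response time stays nearly flat. Above it, each extra user adds about one service time, so R(N) grows along a straight line rather than exponentially.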

Slide 9

Slide 9 text

Queueing Cues

[Plot: X(N) vs. N]
Queueing theory dictates: Measured throughput profiles X(N) must be CONCAVE wrt the load axis (N). N is the stimulus variable and X(N) is the response variable.

[Plot: R(N) vs. N]
Queueing theory dictates: Measured latency profiles R(N) must be CONVEX wrt the load axis (N). N is the stimulus variable and R(N) is the response variable.

Any perf measurements that do not conform to these queueing rules ... are wrong!
The more complex the test rig, the more likely the measurements will be wrong.
Remember Chuck ... and Einstein: “If the data don’t fit the model, change the data.”
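The concave/convex rule can be turned into a quick sanity check on load-test data by comparing successive slopes. This is a rough sketch under stated assumptions: the function names, the tolerance parameter, and the sample data are all hypothetical, and noisy measurements would usually need smoothing or a nonzero tolerance before a check like this is meaningful.

```python
def _slopes(loads, values):
    """Slopes between consecutive (load, value) points; loads must increase."""
    if len(loads) != len(values) or len(loads) < 3:
        raise ValueError("need at least three matching (load, value) points")
    pairs = list(zip(loads, values))
    if any(n1 <= n0 for (n0, _), (n1, _) in zip(pairs, pairs[1:])):
        raise ValueError("loads must be strictly increasing")
    return [(v1 - v0) / (n1 - n0) for (n0, v0), (n1, v1) in zip(pairs, pairs[1:])]


def is_concave(loads, values, tol=0.0):
    """True if slopes never increase (within tol), as expected of X(N)."""
    s = _slopes(loads, values)
    return all(b <= a + tol for a, b in zip(s, s[1:]))


def is_convex(loads, values, tol=0.0):
    """True if slopes never decrease (within tol), as expected of R(N)."""
    s = _slopes(loads, values)
    return all(b >= a - tol for a, b in zip(s, s[1:]))


# Hypothetical load-test results
loads = [1, 2, 4, 8, 16]
throughput = [10.0, 19.0, 35.0, 58.0, 80.0]
latency = [0.100, 0.105, 0.116, 0.140, 0.200]
print("X(N) concave:", is_concave(loads, throughput))
print("R(N) convex: ", is_convex(loads, latency))
```

A failed check does not say what went wrong, only that the measurements or the test rig deserve a second look before anyone draws conclusions from them.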

Slide 10

Slide 10 text

Example 1: Production Data

Sample from 24 hr dailies
- Ad nauseam time-series plot
- Plotting X(t) vs. t hurts the brain
- Transform to steady-state coords

Sample from 24 hr dailies
- Steady-state plot X(N) vs. N
- Time t is now implicit
- But what is the message?
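One plausible reading of the time-series to steady-state transform is to pair each sample's concurrency N with its throughput X and drop the timestamp. The sketch below does that and averages the samples that share a load level. The averaging step, the function name, and the sample records are assumptions for illustration, since the slide does not spell out the procedure.

```python
from collections import defaultdict
from statistics import fmean


def to_steady_state(samples):
    """Convert (timestamp, n_users, throughput) records to sorted (N, mean X) pairs."""
    by_load = defaultdict(list)
    for _timestamp, n_users, throughput in samples:
        # Time becomes implicit: only the load level is kept as the key
        by_load[n_users].append(throughput)
    return [(n, fmean(xs)) for n, xs in sorted(by_load.items())]


# Hypothetical monitoring records: (time of day, concurrent users, requests/s)
samples = [
    ("09:00", 120, 410.0),
    ("09:05", 200, 610.0),
    ("09:10", 120, 430.0),
    ("09:15", 300, 760.0),
    ("09:20", 200, 590.0),
]
for n, x in to_steady_state(samples):
    print(n, x)
```

The result is a set of X(N) vs. N points that can be compared against a queueing model or the USL, which a raw X(t) trace cannot.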

Slide 11

Slide 11 text

Example 1: Performance model

- Throughput data must be concave wrt concurrent users (N)
- LOESS fit (green line) confirms this for 100 < N < 500
- Only the PDQ model shows the complete curve (blue dotted line)
- Throughput starts to saturate at N ∼ 175 user processes
- Throughput maxes out at X ∼ 800 requests per second

Slide 12

Slide 12 text

Example 2: Statistical Forecasting (Trending)

Figure 2: Procurement forecast for spinning up more JVM servers

- Holt-Winters model [13] is the blue curve inside the circles.
- Blue line segment (inside the red funnel) is the forecast average.
- Very different from a queueing model or the USL.
- Red funnel is 90% CI, yellow funnel is 95% CI.
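To show what trending on already measured data looks like, here is a sketch of Holt's linear-trend exponential smoothing. It is a simpler relative of Holt-Winters with no seasonal component and no confidence funnels, so it is not the model behind Figure 2, which cites the R forecast package [13]. The function name, smoothing weights, and data are hypothetical.

```python
def holt_linear_forecast(series, level_weight, trend_weight, horizon):
    """Forecast `horizon` steps ahead with Holt's linear-trend smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for observed in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step forecast
        level = level_weight * observed + (1 - level_weight) * (level + trend)
        # Blend the latest change in level with the previous trend estimate
        trend = trend_weight * (level - previous_level) + (1 - trend_weight) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical monthly server counts
servers = [10, 12, 13, 15, 18, 19, 22]
print(holt_linear_forecast(servers, level_weight=0.5, trend_weight=0.3, horizon=3))
```

The forecast extrapolates the pattern already present in the data. Unlike a queueing model or the USL, it carries no notion of load, contention, or saturation.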

Slide 13

Slide 13 text

Example 3: AWS Application Performance

Figure 3: PDQ model of Tomcat application on AWS [8]

- App is scaling as well as it possibly can (modulo statistical noise)
- Just from following AWS autoscaling guidelines
- No dramatic performance improvements
- But capacity planning cost reduction on $10 MM/yr AWS chargeback

Slide 14

Slide 14 text

Example 4: GenAI LLM Efficient Compute Frontier

Figure 4: GPT-3 pre-training ECF across successively larger LLMs

OpenAI GPT-3 is a multi-layer neural network with 175 BILLION connections (“parameters”).
Guerrilla analysis of the ECF [4] used a combination of queueing theory [2] and the USL [7].

Slide 15

Slide 15 text

Example 4: ECF Error Loss in 3D

[Figure panels: Bistable queue; Error loss transition]

- All LLM training computations start in the upper metastable valley.
- The queue transitions from short length in the stable upper valley to long length in the stable lower valley, like a piece of string.
- This explains the common sigmoidal shape.
- Larger LLMs have deeper valleys that lie on the ECF bound.

Slide 16

Slide 16 text

What is Guerrilla Perf Eng?

Measurements: All data are wrong (by definition)
- Measurement is a process that inherently produces errors
- Standard error is the conventional quantification of errors
- How much error is acceptable?

Models: All models are wrong (approximations)
- Data transformer from time-series to steady-state plots
- Statistical models CAN only do trending on already measured data
- Queueing models can predict what CANNOT be measured

Business:
- Need performance models to quantify ROI (e.g., AWS chargeback)
- Need performance models to predict procurement cycles
- Just ask the Finance Department

Guerrilla Approach
Measurements + Models = Information (need both Ms)

Slide 17

Slide 17 text

References I

[1] Arnold Allen. Probability, Statistics, and Queueing Theory with Computer Science Applications. 2nd ed. San Diego, CA: Academic Press, 1990.
[2] Neil Gunther. Analyzing Computer System Performance Using Perl::PDQ. 2nd ed. Heidelberg, DE: Springer, 2011.
[3] Neil Gunther. “Applying The Universal Scalability Law to Distributed Systems”. In: Distributed Systems Conference. Pune, India: Distributed Systems Meetup, 2019. URL: https://speakerdeck.com/drqz/applying-the-universal-scalability-law-to-distributed-systems.
[4] Neil Gunther. “Does the Efficiency Compute Frontier Represent New Physics?” In: APS Global Physics Summit. Anaheim, CA: American Physical Society, 2025. URL: https://summit.aps.org.
[5] Neil Gunther. Gaphorisms: Guerrilla Aphorisms. Performance Dynamics. Mar. 2021. URL: http://www.perfdynamics.com/Manifesto/gcaprules.html.
[6] Neil Gunther. Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services. Heidelberg, DE: Springer, 2007.
[7] Neil Gunther. How to Quantify Scalability. Performance Dynamics. Feb. 2020. URL: http://www.perfdynamics.com/Manifesto/USLscalability.html.

Slide 18

Slide 18 text

References II

[8] Neil Gunther and Mohit Chawla. “Tomcat-Applikationsperformance in der Amazon-Cloud unter Linux modelliert”. In: Linux Magazin 08 (2019), pp. 38–49. English version: https://arxiv.org/pdf/1811.12341.
[9] Neil Gunther, Paul Puglia, and Kristofer Tomasette. “Hadoop Superlinear Scalability: The perpetual motion of parallel performance”. In: Comm. ACM 58.4 (2015), pp. 46–55. DOI: 10.1145/2719919.
[10] Mor Harchol-Balter. Performance Modelling and Design of Computer Systems: Queueing Theory in Action. Cambridge, UK: Cambridge University Press, 2013.
[11] Peter Harrison and Naresh Patel. Performance Modelling of Communication Networks and Computer Architectures. Wokingham, UK: Addison-Wesley, 1993.
[12] James Holtman and Neil Gunther. “Getting in the Zone for Successful Scalability”. In: International Conference of the Computer Measurement Group. December 7-12, Las Vegas, Nevada, USA: CMG Inc., 2008. URL: https://arxiv.org/abs/0809.2541.
[13] Rob Hyndman and George Athanasopoulos et al. forecast: Forecasting Functions for Time Series and Linear Models. Comprehensive R Archive Network (CRAN). June 2024. URL: https://cran.r-project.org/web/packages/forecast/index.html.
[14] Leonard Kleinrock. Queueing Systems. Vol. I: Theory. New York, NY: John Wiley, 1976.

Slide 19

Slide 19 text

Questions?

Thank you for attending

www.perfdynamics.com
Castro Valley, California
Twitter: twitter.com/DrQz
LinkedIn: Performance Dynamics
Facebook: Performance Dynamics
Blog: The Pith of Performance
Training: PerfDynamics.com/Classes
Email: [email protected]