Slide 1

Slide 1 text

The Data Analytics of Application Scaling and Why There Are No Giants
Neil J. Gunther (@DrQz), Performance Dynamics
SF Bay ACM Meetup, Walmart Labs, Sunnyvale, California
Wed, Nov 14, 2018
© 2018 Performance Dynamics. The Data Analytics of Application Scaling, November 15, 2018

Slide 3

Slide 3 text

The Topic: Scalability
Performance Analysis

What is performance analysis?

What it isn’t:
  Not monitoring computer systems
  Not debugging code
  Not “performance testing” (load testing)
  Not on anyone’s business card

What it is:
  All of the above!
  Multidisciplinary: maths, stats, coding, skepticism, architecture, critical thinking, market trends, etc.
  The art of knowing what data to throw away

How to quantify application scalability:
  Data analytics applied to computer efficiency
  Data is only “half” the story (at best)
  Other “half” requires performance models

Working motto: Trust nothing and verify

Slide 4

Slide 4 text

I’ve written some things

Slide 5

Slide 5 text

Guerrilla performance classes
Performance training is a very serious business (so, lunches need to be long)
Guerrilla training classes local, online, and in-house (textbooks included)
Guerrilla Data Analytics class: linear regression to machine learning in R

Slide 6

Slide 6 text

Outline
1 Tall Tales About Giants
2 Computational Scaling
3 Universal Scalability Law (USL)
4 Determining USL Coefficients
5 Application Scalability
6 Using Production Data
7 Wrap Up

Outline (section 1 of 7): Tall Tales About Giants

Slide 7

Slide 7 text

Mythical giants (and beanstalks)
The Jack and the Beanstalk (J&B) giant was reputedly about 30 feet tall in some accounts (no data)
Let’s not even get into beanstalks in the clouds!

Slide 8

Slide 8 text

Galileo was onto this
1 Discorsi e Dimostrazioni Matematiche Intorno a Due Nuove Scienze [1], Galileo Galilei (1638)
2 On Being the Right Size, J. B. S. Haldane (1928)
[1] The two new sciences: (1) materials science and (2) kinematics.

Slide 9

Slide 9 text

Allometric scaling
Robert P. Wadlow
Tallest human giant
b. Alton, Illinois in 1918
Reached 8.925 feet (2.72 meters)
Guinness world record
Died in 1940 at 22 yo
Father was 5.958 feet (1.82 meters)
[Photo labels: Leg braces; Dear old Dad]

Slide 10

Slide 10 text

Why there are no 30 ft giants
[Figure: “Weight Scaling” plot with a load line and a support line crossing at a critical point; stable region below it, unstable region above it]
1 Weight (mass) grows ∼ L^3 with volume but support grows ∼ L^2 with cross-sectional area
2 Above critical point things break!!!
3 Going to use a similar approach to computer software scalability
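The cube-versus-square argument can be tried on the Wadlow figures from the previous slide. This is only a rough sketch: it assumes father and son are geometrically similar (every length scaled by the same factor k), which real bodies only approximate.

```latex
\begin{align*}
k &= \frac{8.925\ \text{ft}}{5.958\ \text{ft}} \approx 1.50 \\
\text{weight} &\propto k^3 \approx 3.36 \\
\text{support (cross-sectional area)} &\propto k^2 \approx 2.24 \\
\frac{\text{weight}}{\text{support}} &\propto k \approx 1.50
\end{align*}
```

Under that assumption, each unit of supporting bone area carries about 50% more load, and the ratio keeps growing linearly with height until the load line crosses the support line.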

Slide 11

Slide 11 text

Outline (section 2 of 7): Computational Scaling

Slide 12

Slide 12 text

Goggle up: real science ahead

Slide 13

Slide 13 text

Scaling property 1: Equal bang for the buck
[Figure: Capacity vs. Processes, α = 0, β = 0]
Everybody wants this

Slide 14

Slide 14 text

Scaling property 2: Diminishing returns
[Figure: Capacity vs. Processes, α > 0, β = 0]
Everybody usually gets this

Slide 15

Slide 15 text

Scaling property 3: Bottleneck limit
[Figure: Capacity vs. Processes, α >> 0, β = 0; capacity levels off at a ceiling of 1/α]
Everybody hates this

Slide 16

Slide 16 text

Scaling property 4: Retrograde throughput
[Figure: Capacity vs. Processes, α >> 0, β > 0; curve labels 1/α and 1/N]
Everybody thinks this never happens

Slide 17

Slide 17 text

Outline (section 3 of 7): Universal Scalability Law (USL)

Slide 26

Slide 26 text

How to quantify computational scalability
N: processes provide system stimulus or load
C_N: response function or relative capacity
Question: What kind of function?
Answer: Nonlinear rational function [2]

$$C_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

Three Cs:
1 Concurrency (0 < γ < ∞)
2 Contention (0 < α < 1)
3 Coherency (0 < β < 1)

[2] NJG. “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conference (1993)
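As a minimal sketch, the USL formula can be written as a one-line R function and evaluated for the scaling regimes described earlier. The function name `usl_capacity` and the coefficient values are illustrative assumptions, not numbers from the talk.

```r
# Relative capacity C(N) under the USL (hypothetical helper name).
usl_capacity <- function(N, alpha, beta, gamma = 1) {
  gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))
}

N <- c(1, 2, 4, 8, 16, 32, 64)

# Illustrative coefficient choices (assumed, not measured)
linear     <- usl_capacity(N, alpha = 0,    beta = 0)      # equal bang for the buck
amdahl     <- usl_capacity(N, alpha = 0.05, beta = 0)      # diminishing returns, ceiling 1/alpha
retrograde <- usl_capacity(N, alpha = 0.05, beta = 0.001)  # rises to a peak, then declines

print(round(data.frame(N, linear, amdahl, retrograde), 2))
```

With β = 0 the curve can only flatten toward 1/α; any β > 0 eventually bends it downward, which is the retrograde case.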

Slide 30

Slide 30 text

Measurement meets model

$$\underbrace{\frac{X(N)}{X(1)}}_{\text{Throughput data}} \;\longrightarrow\; \underbrace{C_N}_{\text{Scalability metric}} \;\longleftarrow\; \underbrace{\frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}}_{\text{USL model}}$$

[Figure: Relative capacity C(N) vs. Processes (N), showing linear scaling, Amdahl-like scaling, and retrograde scaling]

Slide 31

Slide 31 text

Outline (section 4 of 7): Determining USL Coefficients

Slide 32

Slide 32 text

Finding α, β, and γ
Throughput measurements X_N at various process loads N sourced from:
1 Load testing platform
2 Production monitoring
Want to determine the α, β, γ that best model the X_N data

$$X_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

X_N is a nonlinear rational function (tricky)
Brute force (ugh!)
Clever ways: Solve for α, β, γ coefficients in one swell foop
1 nls() nonlinear regression in R
2 Solver optimizer Excel Add-in
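A rough sketch of the nls() route is shown below. The data are synthetic (a known USL curve with a small deterministic perturbation), and the helper name `usl`, the start values, and the "true" coefficients are all assumptions made for illustration; real measurements may need different start values or bounds.

```r
# Hypothetical USL model function; coefficient names follow the slide.
usl <- function(N, alpha, beta, gamma) {
  gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))
}

# Synthetic "measurements" (assumed values, for illustration only):
# a known USL curve with an alternating +/-1% perturbation.
N <- c(1, 2, 4, 6, 8, 10, 12, 16, 20, 24, 32)
X <- usl(N, alpha = 0.05, beta = 0.002, gamma = 100) *
  (1 + 0.01 * rep(c(1, -1), length.out = length(N)))

# Nonlinear least squares; start values are rough guesses,
# with gamma started at the measured single-process throughput.
fit <- nls(X ~ usl(N, alpha, beta, gamma),
           start = list(alpha = 0.1, beta = 0.001, gamma = X[1]))

print(coef(fit))
```

The fitted coefficients should land close to the values used to generate the data; `summary(fit)` additionally reports standard errors for each coefficient.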

Slide 33

Slide 33 text

Everybody’s data scientist but ...
NASA Dawn spacecraft is orbiting the dwarf planet Ceres (gone dark)

Slide 34

Slide 34 text

A little least-squares history
Ceres was first observed over 200 years ago
Gauss accurately estimated the (then unknown) orbit of Ceres c. 1801
Already developed least squares statistical regression at 18 yo
He used little data when everyone else assumed big data was necessary
Data errors represented by Gaussian distribution

Slide 35

Slide 35 text

Tips at different restaurants
[Figure: Tip ($) vs. Restaurant]

Slide 36

Slide 36 text

Tip deviations from mean
[Figure: Tip ($) vs. Restaurant, with deviations from the mean of −5, 7, 1, −2, 4, −5]

Slide 37

Slide 37 text

Are tips related to the bill?
[Figure: Tip ($) vs. Bill ($)]

Slide 38

Slide 38 text

Least squares are real
[Figure: Tip ($) vs. Bill ($)]

Slide 39

Slide 39 text

Relative areas: R² = 0.7494
[Figure: Tip ($) vs. Bill ($), with areas labeled Tip error and Model error]
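The "relative areas" reading of R² can be made concrete with the six tip deviations listed on the "Tip deviations from mean" slide. This sketch assumes the R² = 0.7494 fit uses those same six tips, which the slides do not state explicitly.

```latex
\begin{align*}
SS_{\text{tot}} &= \sum_i (y_i - \bar{y})^2
  = (-5)^2 + 7^2 + 1^2 + (-2)^2 + 4^2 + (-5)^2 = 120 \\
R^2 &= 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 0.7494
  \;\Longrightarrow\;
  SS_{\text{res}} \approx (1 - 0.7494) \times 120 \approx 30.1
\end{align*}
```

In other words, under that assumption the squared residuals around the regression line occupy about a quarter of the area of the squared deviations around the mean.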

Slide 40

Slide 40 text

Confidence bands
See GCaP book Chap. 5 & App. B
[Figure: USL Analysis of SGI Origin 2000; Benchmark throughput (Krays/s) vs. Processors; α = 0.04979728, β = 1.143438e−05, R² = 0.9883438]

Slide 41

Slide 41 text

Outline (section 5 of 7): Application Scalability

Slide 42

Slide 42 text

Varnish Scalability
Data provided by Darius Popa, DigitAir and Stefan Parvu, Nokia

Slide 44

Slide 44 text

Varnish architecture
HTTP accelerator
Reverse web proxy caching system
Sits in front of classic web server
Caching handled by virtual memory
Claim: Highly scalable (read: linear)
... but is it?

Slide 45

Slide 45 text

JMeter measurements
Example (Read in raw data and plot it in R)

    # fname: path to the tab-separated JMeter results (not given on the slide)
    df.test <- read.table(fname, header=TRUE, sep="\t")
    # Throughput X_N against the number of load generators N
    plot(df.test$N, df.test$X_N, type="b")
    print(df.test)

[Figure USL 1: Relative capacity C(N) vs. Load generators (N)]

          N   X_N   Capacity Efficiency
     1    1   1.4   1.000000  1.0000000
     2    2   2.7   1.928571  0.9642857
     3    5   6.4   4.571429  0.9142857
     4   10  12.8   9.142857  0.9142857
     5   25  32.0  22.857143  0.9142857
     6   50  64.0  45.714286  0.9142857
     7   75  98.0  70.000000  0.9333333
     8  100 131.0  93.571429  0.9357143
     9  150 197.0 140.714286  0.9380952
    10  250 320.0 228.571429  0.9142857
    11  300 392.0 280.000000  0.9333333
    12  400 518.0 370.000000  0.9250000
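The Capacity and Efficiency columns appear to be derived from the throughput column as C(N) = X(N)/X(1) and C(N)/N; the slide does not say so, but the numbers agree. A minimal sketch that recomputes them from the N and X_N values in the table:

```r
# Throughput data transcribed from the table above
N   <- c(1, 2, 5, 10, 25, 50, 75, 100, 150, 250, 300, 400)
X_N <- c(1.4, 2.7, 6.4, 12.8, 32, 64, 98, 131, 197, 320, 392, 518)

capacity   <- X_N / X_N[1]  # relative capacity, C(N) = X(N) / X(1)
efficiency <- capacity / N  # capacity per load generator, C(N) / N

print(data.frame(N, X_N, Capacity = capacity, Efficiency = efficiency))
```

Efficiency staying above 0.9 all the way to N = 400 is what "near linear" means quantitatively for this data set.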

Slide 46

Slide 46 text

Near linear scaling
[Figure USL 2: Relative capacity C(N) vs. Load generators (N)]

Slide 47

Slide 47 text

Varnish meets the USL
[Figure USL 3: Relative capacity C(N) vs. Load generators (N), 0 to 400; fit: α = 1e−04, β = 0, γ = 0.955364, pmax = NaN, Cmax = NaN, Croof = 10000]

Slide 48

Slide 48 text

USL beyond the data
[Figure USL 4: Relative capacity C(N) vs. Load generators (N), extrapolated to N = 5000; fit: α = 1e−04, β = 0, γ = 0.955364, pmax = NaN, Cmax = NaN, Croof = 10000]

Slide 49

Slide 49 text

Linux Network Driver
XDP: eXpress Data Path
“XDP: A new fast and programmable network layer”
Jesper Brouer, Red Hat

Slide 50

Slide 50 text

Red Hat (IBM?) benchmark data

Slide 51

Slide 51 text

USL sees beyond the data
Projected from 6 to 20 cores

Slide 52

Slide 52 text

Memcached
“Hidden Scalability Gotchas in Memcached and Friends”
NJG, Shanti Subramanyam, and Stefan Parvu, Sun Microsystems
Presented at Velocity 2010

Slide 53

Slide 53 text

Memcached scalability
[Figure panels: Scaleup; Scaleout]

Slide 54

Slide 54 text

Memcache scaleout strategy
Distributed cache of key-value pairs
Data pre-loaded from RDBMS backend
Deploy memcache on cheaper older CPUs (but not multicore)
Single worker thread ok, until next hardware roll (multicore)

Slide 55

Slide 55 text

Memcache scalability data
[Figure: Throughput KOPS X(N) vs. Worker threads (N)]

Slide 56

Slide 56 text

Explains these configuration warnings

Configuring the memcached server
Threading is used to scale memcached across CPU’s. The model is by "worker threads", meaning that each thread handles concurrent connections. ... By default 4 threads are allocated. ... Setting it to very large values (80+) will make it run considerably slower.

Linux man pages - memcached (1)
-t Number of threads to use to process incoming requests. ... It is typically not useful to set this higher than the number of CPU cores on the memcached server. Setting a high number (64 or more) of worker threads is not recommended. The default is 4.

Slide 57

Slide 57 text

Memcached load-test data

Slide 58

Slide 58 text

Memcached regression analysis

Slide 59

Slide 59 text

Memcache scalability model
[Figure: Throughput KOPS X(N) vs. Worker threads (N) with USL fit; α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82]

Slide 60

Slide 60 text

Concurrency parameter
[Figure: Throughput KOPS X(N) vs. Worker threads (N) with USL fit; α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82; inset: linear scaling, α = 0, β = 0]
1 γ = 84.89
2 Slope of linear bound as Kops/thread
3 Estimate of throughput X(1) = 84.89 Kops at N = 1 thread

Slide 61

Slide 61 text

Contention parameter
[Figure: Throughput KOPS X(N) vs. Worker threads (N) with USL fit; α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82; inset: bottleneck limit, α >> 0, β = 0, ceiling 1/α]
α = 0.0468
Waiting or queueing for resources about 4.7% of the time
Max possible throughput is X(1)/α = 1814.78 Kops (Xroof)

Slide 62

Slide 62 text

Coherency parameter
[Figure: Throughput KOPS X(N) vs. Worker threads (N) with USL fit; α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82; inset: retrograde throughput, α >> 0, β > 0, curve labels 1/α and 1/N]
β = 0.0210 corresponds to retrograde throughput
Distributed copies of data (e.g., caches) have to be exchanged/updated about 2.1% of the time to be consistent
Peak occurs at $N_{\max} = \sqrt{(1 - \alpha)/\beta} = 6.73$ threads
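The derived quantities quoted for memcached follow directly from the three fitted coefficients. The sketch below recomputes them from the rounded values printed on the slide, so the results differ slightly from the plot legend; the variable and function names are illustrative.

```r
# Fitted coefficients as printed on the slide (rounded)
alpha <- 0.0468
beta  <- 0.021016
gamma <- 84.89

# USL throughput at load N (hypothetical helper name)
usl_x <- function(N) gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))

N_max  <- sqrt((1 - alpha) / beta)  # load at which throughput peaks
X_max  <- usl_x(N_max)              # peak throughput (Kops)
X_roof <- gamma / alpha             # ceiling that would apply if beta were zero (Kops)

print(round(c(N_max = N_max, X_max = X_max, X_roof = X_roof), 2))
```

The peak location and height come out very close to the legend's Nmax = 6.73 and Xmax = 274.87, and the ceiling within about 1 Kops of Xroof = 1814.82.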

Slide 63

Slide 63 text

Improving scalability performance
[Figure: Speedup S(N) vs. Threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch]

Slide 64

Slide 64 text

Sirius and Zookeeper
“Sirius: Distributing and Coordinating Application”
Michael Bevilacqua-Linn, Maulan Byron, Peter Cline, Jon Moore, and Steve Muir, Comcast
Presented at USENIX Annual Technical Conf. 2014

Slide 65

Slide 65 text

Distributed voting throughput data
All downhill ... which looked crazy! (to me)

Slide 66

Slide 66 text

USL scalability model

Slide 67

Slide 67 text

Concurrency parameter
[Figure inset: linear scaling, α = 0, β = 0]
1 γ = 1024.98
2 Single node is meaningless (need N ≥ 3 for majority)
3 Interpret γ as N = 1 virtual throughput
4 USL estimates X(1) = 1024.98 WPS (black square)

Slide 68

Slide 68 text

Contention parameter
[Figure inset: bottleneck limit, α >> 0, β = 0, ceiling 1/α]
α = 0.05
Queueing for resources about 5% of the time
Max possible throughput is X(1)/α = 20499.54 WPS (Xroof)
But Xroof not feasible in these systems

Slide 69

Slide 69 text

Coherency parameter
[Figure inset: retrograde throughput, α >> 0, β > 0, curve labels 1/α and 1/N]
β = 0.1651 says retrograde throughput dominates!
Distributed data being exchanged (compared?) about 16.5% of the time (virtual)
Peak at $N_{\max} = \sqrt{(1 - \alpha)/\beta} = 2.4$ cluster nodes
Shocking but that’s exactly how it’s supposed to work!
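The location of the peak follows from setting the derivative of the USL throughput to zero. A short worked version, using the rounded Sirius coefficients quoted on these slides:

```latex
\begin{align*}
\frac{dX_N}{dN}
  &= \gamma\,\frac{1 - \alpha - \beta N^2}{\bigl[1 + \alpha (N - 1) + \beta N (N - 1)\bigr]^2} = 0
  \;\Longrightarrow\;
  N_{\max} = \sqrt{\frac{1 - \alpha}{\beta}} \\
N_{\max} &= \sqrt{\frac{1 - 0.05}{0.1651}} \approx \sqrt{5.754} \approx 2.4
\end{align*}
```

Since a majority vote needs at least three nodes, every measured configuration already sits past the peak, which is consistent with the throughput data being "all downhill".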

Slide 70

Slide 70 text

Outline (section 6 of 7): Using Production Data

Slide 71

Slide 71 text

AWS Cloud Application
“Exposing the Cost of Performance Hidden in the Cloud”
NJG and Mohit Chawla, Germany
Presented at CMG cloudXchange 2018

Slide 72

Slide 72 text

Production data
Previously measured X and R directly on test rig

Table 1: Converting data to performance metrics

    Data   Meaning           Metrics            Meaning
    T      Elapsed time      X = C/T            Throughput
    Tp     Processing time   R = (Tp/T)(T/C)    Response time
    C      Completed work    N = X × R          Concurrent threads
    Ucpu   CPU utilization   S = Ucpu/X         Service time

Example (Coalesced metrics)
Linux epoch Timestamp interval between rows is 300 seconds

    Timestamp, X, N, S, R, U_cpu
    1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
    1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
    1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
    1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
    1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
    ...
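Two of the conversions in Table 1 can be checked against the sample rows: S = Ucpu/X, and N = X × R (Little's law). The sketch below uses the first three rows; the column and variable names are illustrative. The recomputed S matches the logged value to the printed precision, while the recomputed N agrees with the logged N only to within a few percent, for reasons the slides do not explain.

```r
# First three rows of the coalesced metrics shown above
d <- data.frame(
  X     = c(502.171674, 494.403035, 509.541751),
  N     = c(170.266663, 175.375000, 188.866669),
  S     = c(0.000912, 0.001043, 0.000885),
  R     = c(0.336740, 0.355975, 0.360924),
  U_cpu = c(0.458120, 0.515420, 0.450980)
)

S_check <- d$U_cpu / d$X  # service time, S = U_cpu / X
N_check <- d$X * d$R      # Little's law, N = X * R

print(data.frame(S = d$S, S_check = signif(S_check, 3),
                 N = d$N, N_check = round(N_check, 1)))
```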

Slide 73

Slide 73 text

Tomcat data from AWS
[Figure: Throughput (RPS) vs. Tomcat threads]

Slide 74

Slide 74 text

USL nonlinear analysis
[Figure: Throughput (RPS) vs. Tomcat threads with USL fit; α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8]

Slide 75

Slide 75 text

Concurrency parameter
[Figure: Throughput (RPS) vs. Tomcat threads with USL fit; α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8; inset: linear scaling, α = 0, β = 0]
1 γ = 3.0
2 Smallest number of threads during 24 hr sample is N > 100
3 Nonetheless USL estimates throughput X(1) = 3 RPS

Slide 76

Slide 76 text

Contention parameter
[Figure: Throughput (RPS) vs. Tomcat threads with USL fit; α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8; inset: bottleneck limit, α >> 0, β = 0, ceiling 1/α]
α = 0
No significant waiting or queueing
Max possible throughput Xroof not defined

Slide 77

Slide 77 text

Coherency parameter
[Figure: Throughput (RPS) vs. Tomcat threads with USL fit; α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8; inset: retrograde throughput, α >> 0, β > 0, curve labels 1/α and 1/N]
β = 3 × 10^−6 implies very weak retrograde throughput
Extremely little data exchange
But entirely responsible for sublinearity
And peak throughput Xmax = 809.55 RPS
Peak occurs at Nmax = 539.2 threads

Slide 78

Slide 78 text

Revised USL analysis
Parallel threads implies linear scaling
Linear slope γ ∼ 3: γ = 2.65
Should be no contention, i.e., α = 0
Discontinuity at N ∼ 275 threads
Throughput plateaus, i.e., β = 0
Saturation occurs at processor utilization UCPU ≥ 75%
Linux OS can’t do that!
Pseudo-saturation due to AWS Auto Scaling policy (hypervisor?)
Many EC2 instances spun up and down during 24 hrs

Slide 79

Slide 79 text

Corrected USL linear model
[Figure: Throughput (RPS) vs. Tomcat threads; linear segment labeled Parallel threads, flat segment labeled Pseudo-saturation; α = 0, β = 0, γ = 2.65, Nmax = NaN, Xmax = 727.03, Nopt = 274.8]
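One way to read the corrected model is as a piecewise curve: linear with slope γ up to Nopt, then flat. The sketch below encodes that reading; the function name `x_tomcat` and the piecewise form are assumptions based on the plot labels, not a formula given in the talk.

```r
gamma <- 2.65   # slope in RPS per thread, from the slide
N_opt <- 274.8  # thread count where the plateau starts, from the slide

# Hypothetical helper: linear up to N_opt, flat afterwards
x_tomcat <- function(N) gamma * pmin(N, N_opt)

print(x_tomcat(c(100, 200, 274.8, 400, 500)))
```

With the rounded γ shown on the slide, the plateau comes out near 728 RPS, close to the quoted Xmax = 727.03.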

Slide 80

Slide 80 text

MySQL Big Data
2,000 production database logs
500,000 data points
NJG with Baron Schwartz and Preetam Jinka

Slide 81

Slide 81 text

How do you comprehend big data?

Slide 82

Slide 82 text

MySQL big data: the (horror) movie

Slide 83

Slide 83 text

USL analysis
Analyzed some 2,000 production database logs
I resorted to animation as a visualization tool
About 500,000 data points in aggregate

[Figure USL3 [917], mysql 10.0.24-MariaDB: Queries per second vs. Concurrent processes; α = 0.0469166, β = 0.0067516, γ = 5444.43, X1 = 5196.11, Npeak = 11.88, Xpeak = 25910.07, Nopt = 21.31, Xmax = 116044.9]
[Figure USL3 [1403], mysql 5.5.52-0ubuntu0.12.04.1-log ...: Queries per second vs. Concurrent processes; α = 0.0456189, β = 0.00118014, γ = 2271.33, X1 = 3195.96, Npeak = 28.44, Xpeak = 24074.24, Nopt = 21.92, Xmax = 49789.25]

Comparison of USL parameters
USL found unexpected progressive changes in scalability

Slide 84

Slide 84 text

Outline (section 7 of 7): Wrap Up

Slide 85

Slide 85 text

R packages for the USL

1 SATK on R-Forge
Author: Paul Puglia (Guerrilla graduate)
Applies multiple USL coefficient models for best fit

    install.packages("SATK", repos="http://R-Forge.R-project.org")
    library(SATK)
    data(USLcalc)
    uslcalc.zones <- zones(USLcalc)
    plot(uslcalc.zones)

2 usl on CRAN
Author: Stefan Möding
Uses both nls() from base and nlxb() from nlsr package

    install.packages("usl")
    library(usl)
    data(specsdm91)
    usl.model <- usl(throughput ~ load, specsdm91)
    summary(usl.model)
    peak.scalability(usl.model)
    plot(specsdm91, pch=16)
    plot(usl.model, col="red", add=TRUE)

Slide 86

Slide 86 text

Response time scalability
Presented throughput scalability
Response time scalability?
Brooks’ law (too many cooks)
Queueing theory foundations
Queueing simulations

$$X_N = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

$$R_N = \frac{N}{X_N} - Z$$

[Figures: Months vs. People with fixed delay due to meetings; Months vs. People with growing delay due to 1 on 1 mtgs; Months vs. People; Output vs. People comparing Parallel, Amdahl, Brooks, and USL]
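Substituting the throughput model into the response time formula gives a closed form. This is a sketch that assumes Z denotes think time, as in the standard interactive response time law; the slide itself does not define Z.

```latex
\begin{align*}
R_N &= \frac{N}{X_N} - Z
     = \frac{1 + \alpha (N - 1) + \beta N (N - 1)}{\gamma} - Z
\end{align*}
```

Read this way, contention (α) makes response time grow linearly with N, while coherency (β) adds a quadratic term, which is one way to see why delays can grow faster than headcount in the Brooks' law plots.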

Slide 87

Slide 87 text

Questions?
www.perfdynamics.com
Castro Valley, California
Twitter: twitter.com/DrQz
Facebook: facebook.com/PerformanceDynamics
Blog: perfdynamics.blogspot.com
Training classes: perfdynamics.com/Classes
+1-510-537-5758