The Data Analytics of Application Scaling and Why There Are No Giants

Like the 30 ft. giant in 'Jack and the Beanstalk,' myths and fallacies abound
regarding application scaling. Many blog posts show some performance data as
time-series charts, but otherwise only offer a qualitative analysis. This talk
is intended to remedy that by showing you how to QUANTIFY scalability. Time
series alone are not sufficient to assess the cost-benefit of cloud services and
other scalability trade-offs. After reviewing the nonlinear constraints on the
scalability of giants, we apply similar nonlinear data-analytics techniques to
determine the universal scalability constraints on such well-known applications
as MySQL, Memcached, Varnish, Zookeeper, and Amazon AWS.

Dr. Neil Gunther

November 15, 2018

Transcript

  1. The Data Analytics of Application Scaling and Why There Are No Giants

    Neil J. Gunther (@DrQz), Performance Dynamics. SF Bay ACM Meetup, Walmart Labs, Sunnyvale, California, Wed, Nov 14, 2018. © 2018 Performance Dynamics, The Data Analytics of Application Scaling, November 15, 2018, 1 / 74
  2. The Topic: Scalability Performance Analysis. What is performance analysis?

    What it isn’t:
    - Not monitoring computer systems
    - Not debugging code
    - Not “performance testing” (load testing)
    - Not on anyone’s business card

    What it is:
    - All of the above!
    - Multidisciplinary: maths, stats, coding, skepticism, architecture, critical thinking, market trends, etc.
    - The art of knowing what data to throw away
    - How to quantify application scalability
    - Data analytics applied to computer efficiency
    - Data is only “half” the story (at best)
    - Other “half” requires performance models

    Working motto: Trust nothing and verify
  4. I’ve written some things
  5. Guerrilla performance classes

    - Performance training is a very serious business (so, lunches need to be long)
    - Guerrilla training classes: local, online, and in-house (textbooks included)
    - Guerrilla Data Analytics class: linear regression to machine learning in R
  6. Tall Tales About Giants: Outline

    1. Tall Tales About Giants
    2. Computational Scaling
    3. Universal Scalability Law (USL)
    4. Determining USL Coefficients
    5. Application Scalability
    6. Using Production Data
    7. Wrap Up
  7. Tall Tales About Giants: Mythical giants (and beanstalks)

    - The J&B giant was reputedly about 30 ft tall in some accounts (no data)
    - Let’s not even get into beanstalks in the clouds!
  8. Tall Tales About Giants: Galileo was onto this

    1. Discorsi e Dimostrazioni Matematiche Intorno a Due Nuove Scienze, Galileo Galilei (1638). The two new sciences: (1) materials science and (2) kinematics.
    2. On Being the Right Size, J. B. S. Haldane (1928)
  9. Tall Tales About Giants: Allometric scaling

    Robert P. Wadlow, tallest human giant:
    - Born in Alton, Illinois, in 1918
    - Reached 8.925 feet (2.72 meters), a Guinness world record
    - Died in 1940 at 22 years old
    - Father was 5.958 feet (1.82 meters)

    Photo annotations: leg braces; dear old Dad
  10. Tall Tales About Giants: Why there are no 30 ft giants

    [Figure: "Weight Scaling" plot. A load line and a support line cross at a critical point that separates a stable region from an unstable region.]

    1. Weight (mass) grows $\sim L^3$ with volume, but support grows $\sim L^2$ with cross-sectional area
    2. Above the critical point things break!!!
    3. Going to use a similar approach to computer software scalability
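
The square-cube argument on this slide can be sketched numerically. The snippet below is only an illustration and is not taken from the talk: it assumes load and support are both normalized to 1 at a reference height of 6 ft (an assumed value), and the function names are hypothetical.

```python
# Square-cube sketch: all quantities are relative to an assumed reference height.
REFERENCE_HEIGHT_FT = 6.0  # assumption: an ordinary adult, not a figure from the talk


def load(scale: float) -> float:
    """Relative weight, which grows with volume (~ L^3)."""
    return scale ** 3


def support(scale: float) -> float:
    """Relative supporting strength, which grows with cross-sectional area (~ L^2)."""
    return scale ** 2


def stress(scale: float) -> float:
    """Load carried per unit of support; this reduces to the scale factor itself."""
    return load(scale) / support(scale)


for height_ft in (6.0, 8.925, 30.0):
    scale = height_ft / REFERENCE_HEIGHT_FT
    print(f"{height_ft:6.3f} ft: load x{load(scale):7.2f}, "
          f"support x{support(scale):6.2f}, stress x{stress(scale):5.2f}")
```

Under these assumptions a 30 ft giant carries 125 times the reference load on only 25 times the support, so each unit of support bears five times the stress. Where the actual breaking point lies depends on material strength, which this sketch does not model.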
  11. Computational Scaling: Outline (section 2 of 7)
  12. Computational Scaling: Goggle up, real science ahead
  13. Computational Scaling: Scaling property 1: Equal bang for the buck

    [Plot: capacity vs. processes with α = 0, β = 0.] Everybody wants this.
  14. Computational Scaling: Scaling property 2: Diminishing returns

    [Plot: capacity vs. processes with α > 0, β = 0.] Everybody usually gets this.
  15. Computational Scaling: Scaling property 3: Bottleneck limit

    [Plot: capacity vs. processes with α >> 0, β = 0; the curve is labeled with the ceiling 1/α.] Everybody hates this.
  16. Computational Scaling: Scaling property 4: Retrograde throughput

    [Plot: capacity vs. processes with α >> 0, β > 0; the curve is labeled with 1/α and 1/N.] Everybody thinks this never happens.
  17. Universal Scalability Law (USL): Outline (section 3 of 7)
  18. Universal Scalability Law (USL): How to quantify computational scalability

    - $N$: processes provide system stimulus or load
    - $C_N$: response function or relative capacity
    - Question: What kind of function?
    - Answer: Nonlinear rational function [2]

    $$C_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

    Three Cs:
    1. Concurrency ($0 < \gamma < \infty$)
    2. Contention ($0 < \alpha < 1$)
    3. Coherency ($0 < \beta < 1$)

    [2] NJG. “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conference (1993)
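
As a minimal sketch, the USL rational function can be written directly as a Python function and evaluated for the scaling regimes described on the earlier slides. The coefficient values below are assumed for illustration only and are not fitted to any data in the talk; the name `usl_capacity` is hypothetical.

```python
def usl_capacity(n: float, alpha: float, beta: float, gamma: float = 1.0) -> float:
    """Universal Scalability Law: capacity at load n."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# Illustrative coefficient pairs (assumed values, not fitted to any data set).
regimes = {
    "linear (alpha=0, beta=0)": (0.0, 0.0),
    "diminishing returns (alpha>0, beta=0)": (0.05, 0.0),
    "retrograde (alpha>0, beta>0)": (0.05, 0.001),
}

for name, (alpha, beta) in regimes.items():
    row = ", ".join(
        f"C({n})={usl_capacity(n, alpha, beta):6.2f}" for n in (1, 10, 30, 100)
    )
    print(f"{name:40s} {row}")
```

With β = 0 the capacity approaches the ceiling 1/α as N grows; with β > 0 it rises to a peak and then declines, which is the retrograde case.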
  27. Universal Scalability Law (USL): Measurement meets model

    $$\underbrace{\frac{X(N)}{X(1)}}_{\text{Thruput data}} \;\longrightarrow\; \underbrace{C_N}_{\text{Scalability metric}} \;\longleftarrow\; \underbrace{\frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}}_{\text{USL model}}$$

    [Figure: relative capacity C(N) vs. processes (N), showing linear scaling, Amdahl-like scaling, and retrograde scaling curves.]
  31. Determining USL Coefficients: Outline (section 4 of 7)
  32. Determining USL Coefficients: Finding α, β, and γ

    Throughput measurements $X_N$ at various process loads $N$, sourced from:
    1. Load testing platform
    2. Production monitoring

    Want to determine the α, β, γ that best model the $X_N$ data:

    $$X_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

    - $X_N$ is a nonlinear rational function (tricky)
    - Brute force (ugh!)
    - Clever ways: solve for the α, β, γ coefficients in one swell foop
      1. nls() nonlinear regression in R
      2. Solver optimizer Excel Add-in
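
The slide recommends nonlinear regression with nls() in R or the Excel Solver. The sketch below deliberately swaps in a simpler technique so that it runs with the Python standard library alone: it pins γ to the measured X(1), forms the relative capacity C_N = X_N / X(1), and uses the identity N/C_N − 1 = α(N − 1) + βN(N − 1), which is linear in α and β, to solve an ordinary least-squares problem. This is not the method used in the talk, it weights measurement errors differently from nls(), and it requires a measurement at N = 1. The data are synthetic, generated from assumed coefficients, and the function names are hypothetical.

```python
def usl_throughput(n, alpha, beta, gamma):
    """USL throughput X_N at load n."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl_linearized(loads, throughputs):
    """Estimate (alpha, beta, gamma) by linearized least squares.

    Assumes a measurement at N = 1 exists; gamma is pinned to that value.
    """
    x1 = throughputs[loads.index(1)]
    suu = suv = svv = suy = svy = 0.0
    for n, x in zip(loads, throughputs):
        u = n - 1                  # regressor multiplying alpha
        v = n * (n - 1)            # regressor multiplying beta
        y = n / (x / x1) - 1.0     # N / C_N - 1
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    det = suu * svv - suv * suv    # 2x2 normal equations solved by Cramer's rule
    alpha = (suy * svv - suv * svy) / det
    beta = (suu * svy - suv * suy) / det
    return alpha, beta, x1


# Synthetic, noise-free data from assumed coefficients (not from the talk).
true_alpha, true_beta, true_gamma = 0.05, 0.002, 100.0
loads = [1, 2, 4, 8, 16, 32]
throughputs = [usl_throughput(n, true_alpha, true_beta, true_gamma) for n in loads]

alpha, beta, gamma = fit_usl_linearized(loads, throughputs)
print(f"alpha={alpha:.6f} beta={beta:.6f} gamma={gamma:.2f}")
```

On noise-free data the assumed coefficients are recovered; on real measurements a proper nonlinear fit that also estimates γ is generally preferable.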
  33. Determining USL Coefficients: Everybody’s data scientist but ...

    NASA Dawn spacecraft is orbiting the dwarf planet Ceres (gone dark)
  34. Determining USL Coefficients: A little least-squares history

    - Ceres was first observed over 200 years ago
    - Gauss accurately estimated the (then unknown) orbit of Ceres c. 1801
    - Already developed least squares statistical regression at 18 years old
    - He used little data when everyone else assumed big data was necessary
    - Data errors represented by Gaussian distribution
  35. Determining USL Coefficients: Tips at different restaurants

    [Figure: Tip ($) for each restaurant.]
  36. Determining USL Coefficients: Tip deviations from mean

    [Figure: Tip ($) for each restaurant, with deviations from the mean of −5, 7, 1, −2, 4, −5.]
  37. Determining USL Coefficients: Are tips related to the bill?

    [Figure: Tip ($) vs. Bill ($).]
  38. Determining USL Coefficients: Least squares are real

    [Figure: Tip ($) vs. Bill ($).]
  39. Determining USL Coefficients: Relative areas: R² = 0.7494

    [Figure: Tip ($) vs. Bill ($), comparing tip error and model error.]
  40. Determining USL Coefficients: Confidence bands

    See GCaP book Chap. 5 & App. B

    [Figure: "USL Analysis of SGI Origin 2000", benchmark throughput (Krays/s) vs. processors, with fitted α = 0.04979728, β = 1.143438e−05, R² = 0.9883438.]
  41. Application Scalability: Outline (section 5 of 7)
  42. Application Scalability: Varnish

    Scalability data provided by Darius Popa, DigitAir, and Stefan Parvu, Nokia
  43. Application Scalability: Varnish architecture

    - HTTP accelerator
    - Reverse web proxy caching system
    - Sits in front of classic web server
    - Caching handled by virtual memory
    - Claim: Highly scalable (read: linear) ... but is it?
  45. Application Scalability: JMeter measurements

    Example (Read in raw data and plot it in R):

        # fname: path to the tab-separated results file (not shown on the slide)
        df.test <- read.table(fname, header=TRUE, sep="\t")
        plot(df.test$N, df.test$X_N, type="b")
        print(df.test)

    [Figure: "USL 1", relative capacity C(N) vs. load generators (N).]

    Output of print(df.test):

             N   X_N   Capacity Efficiency
        1    1   1.4   1.000000  1.0000000
        2    2   2.7   1.928571  0.9642857
        3    5   6.4   4.571429  0.9142857
        4   10  12.8   9.142857  0.9142857
        5   25  32.0  22.857143  0.9142857
        6   50  64.0  45.714286  0.9142857
        7   75  98.0  70.000000  0.9333333
        8  100 131.0  93.571429  0.9357143
        9  150 197.0 140.714286  0.9380952
        10 250 320.0 228.571429  0.9142857
        11 300 392.0 280.000000  0.9333333
        12 400 518.0 370.000000  0.9250000
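
The Capacity and Efficiency columns in this table appear to be derived from the N and X_N columns: dividing each throughput by the N = 1 throughput and then dividing by N reproduces the printed values. The following sketch shows that arithmetic in Python; it is an inference from the numbers and not code from the talk.

```python
# (N, X_N) pairs copied from the JMeter table on this slide.
measurements = [
    (1, 1.4), (2, 2.7), (5, 6.4), (10, 12.8), (25, 32.0), (50, 64.0),
    (75, 98.0), (100, 131.0), (150, 197.0), (250, 320.0), (300, 392.0),
    (400, 518.0),
]

x1 = measurements[0][1]  # throughput at N = 1, used as the normalization

rows = []
for n, x_n in measurements:
    capacity = x_n / x1        # relative capacity C(N) = X(N) / X(1)
    efficiency = capacity / n  # C(N) / N; 1.0 would be perfectly linear scaling
    rows.append((n, x_n, capacity, efficiency))
    print(f"{n:4d} {x_n:7.1f} {capacity:11.6f} {efficiency:10.7f}")
```

The efficiency staying above 0.91 all the way to N = 400 is what the next slide calls near-linear scaling.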
  46. Application Scalability: Near linear scaling

    [Figure: "USL 2", relative capacity C(N) vs. load generators (N).]
  47. Application Scalability: Varnish meets the USL

    [Figure: "USL 3", relative capacity C(N) vs. load generators (N), with fitted α = 1e−04, β = 0, γ = 0.955364, pmax = NaN, Cmax = NaN, Croof = 10000.]
  48. Application Scalability: USL beyond the data

    [Figure: "USL 4", relative capacity C(N) vs. load generators (N), with the load axis extended to 5000 and the same fitted values: α = 1e−04, β = 0, γ = 0.955364, pmax = NaN, Cmax = NaN, Croof = 10000.]
  49. Application Scalability: Linux Network Driver

    XDP: eXpress Data Path. “XDP: A new fast and programmable network layer,” Jesper Brouer, Red Hat
  50. Application Scalability: RedHat (IBM?) benchmark data
  51. Application Scalability: USL sees beyond the data

    Projected from 6 to 20 cores
  52. Application Scalability: Memcached

    “Hidden Scalability Gotchas in Memcached and Friends,” NJG, Shanti Subramanyam, and Stefan Parvu, Sun Microsystems. Presented at Velocity 2010
  53. Application Scalability: Memcached scalability

    Scaleup; Scaleout
  54. Application Scalability: Memcache scaleout strategy

    - Distributed cache of key-value pairs
    - Data pre-loaded from RDBMS backend
    - Deploy memcache on cheaper older CPUs (but not multicore)
    - Single worker thread ok, until next hardware roll (multicore)
  55. Application Scalability: Memcache scalability data

    [Figure: throughput X(N) in KOPS vs. worker threads (N).]
  56. Application Scalability: Explains these configuration warnings

    Configuring the memcached server:
    “Threading is used to scale memcached across CPU’s. The model is by "worker threads", meaning that each thread handles concurrent connections. ... By default 4 threads are allocated. ... Setting it to very large values (80+) will make it run considerably slower.”

    Linux man pages, memcached (1):
    “-t <threads> Number of threads to use to process incoming requests. ... It is typically not useful to set this higher than the number of CPU cores on the memcached server. Setting a high number (64 or more) of worker threads is not recommended. The default is 4.”
  57. Application Scalability: Memcached load-test data
  58. Application Scalability: Memcached regression analysis
  59. Application Scalability: Memcache scalability model

    [Figure: throughput X(N) in KOPS vs. worker threads (N), with fitted α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82.]
  60. Application Scalability: Concurrency parameter

    [Figure: memcached throughput X(N) in KOPS vs. worker threads (N), with fitted α = 0.0468, β = 0.021016, γ = 84.89, Nmax = 6.73, Xmax = 274.87, Xroof = 1814.82. Inset: the linear-scaling case α = 0, β = 0.]

    1. γ = 84.89
    2. Slope of linear bound as Kops/thread
    3. Estimate of throughput X(1) = 84.89 Kops at N = 1 thread
  61. Application Scalability: Contention parameter

    [Figure: same plot and fitted values as the previous slide. Inset: the bottleneck-limit case α >> 0, β = 0, with ceiling 1/α.]

    - α = 0.0468
    - Waiting or queueing for resources about 4.7% of the time
    - Max possible throughput is X(1)/α = 1814.78 Kops (Xroof)
  62. Application Scalability: Coherency parameter

    [Figure: same plot and fitted values as the previous slide. Inset: the retrograde case α >> 0, β > 0, labeled with 1/N and 1/α.]

    - β = 0.0210 corresponds to retrograde throughput
    - Distributed copies of data (e.g., caches) have to be exchanged/updated about 2.1% of the time to be consistent
    - Peak occurs at $N_{max} = \sqrt{(1 - \alpha)/\beta} = 6.73$ threads
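
The derived quantities quoted for memcached can be recomputed from the three fitted coefficients. In the USL the throughput peak lies at N_max = sqrt((1 − α)/β), the peak throughput is the model evaluated at that load, and the contention-only ceiling is γ/α. The sketch below uses the rounded coefficients printed on the slides; the function names are hypothetical.

```python
import math


def usl_throughput(n, alpha, beta, gamma):
    """USL throughput X(N) at load n."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak_load(alpha, beta):
    """Load at which USL throughput peaks; only defined for beta > 0."""
    return math.sqrt((1.0 - alpha) / beta)


# Rounded memcached coefficients as printed on the slides.
alpha, beta, gamma = 0.0468, 0.021016, 84.89

n_max = usl_peak_load(alpha, beta)                  # peak location in threads
x_max = usl_throughput(n_max, alpha, beta, gamma)   # peak throughput in Kops
x_roof = gamma / alpha                              # ceiling if beta were zero

print(f"Nmax  = {n_max:.2f} threads")
print(f"Xmax  = {x_max:.2f} Kops")
print(f"Xroof = {x_roof:.2f} Kops")
```

The results land close to the slide's Nmax = 6.73, Xmax = 274.87, and Xroof = 1814.82; the small differences presumably come from the coefficients being rounded on the slide.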
  63. Application Scalability: Improving scalability performance

    [Figure: speedup S(N) vs. threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch.]
  64. Application Scalability: Sirius and Zookeeper

    “Sirius: Distributing and Coordinating Application,” Michael Bevilacqua-Linn, Maulan Byron, Peter Cline, Jon Moore, and Steve Muir, Comcast. Presented at USENIX Annual Technical Conf. 2014
  65. Application Scalability: Distributed voting throughput data

    All downhill ... which looked crazy! (to me)
  66. Application Scalability: USL scalability model
  67. Application Scalability: Concurrency parameter

    [Inset: capacity vs. processes for the linear-scaling case α = 0, β = 0.]

    1. γ = 1024.98
    2. Single node is meaningless (need N ≥ 3 for majority)
    3. Interpret γ as N = 1 virtual throughput
    4. USL estimates X(1) = 1024.98 WPS (black square)
  68. Application Scalability: Contention parameter

    [Inset: capacity vs. processes for the bottleneck-limit case α >> 0, β = 0, with ceiling 1/α.]

    - α = 0.05
    - Queueing for resources about 5% of the time
    - Max possible throughput is X(1)/α = 20499.54 WPS (Xroof)
    - But Xroof not feasible in these systems
  69. Application Scalability: Coherency parameter

    [Inset: capacity vs. processes for the retrograde case α >> 0, β > 0, labeled with 1/N and 1/α.]

    - β = 0.1651 says retrograde throughput dominates!
    - Distributed data being exchanged (compared?) about 16.5% of the time (virtual)
    - Peak at $N_{max} = \sqrt{(1 - \alpha)/\beta} = 2.4$ cluster nodes
    - Shocking but that’s exactly how it’s supposed to work!
  70. Using Production Data: Outline (section 6 of 7)
  71. Using Production Data: AWS Cloud Application

    “Exposing the Cost of Performance Hidden in the Cloud,” NJG and Mohit Chawla, Germany. Presented at CMG cloudXchange 2018
  72. Using Production Data: Production data

    Previously measured X and R directly on test rig

    Table 1: Converting data to performance metrics

    | Data | Meaning         | Metric          | Meaning            |
    |------|-----------------|-----------------|--------------------|
    | T    | Elapsed time    | X = C/T         | Throughput         |
    | Tp   | Processing time | R = (Tp/T)(T/C) | Response time      |
    | C    | Completed work  | N = X × R       | Concurrent threads |
    | Ucpu | CPU utilization | S = Ucpu/X      | Service time       |

    Example (Coalesced metrics). Timestamps are Linux epoch values; the interval between rows is 300 seconds.

        Timestamp, X, N, S, R, U_cpu
        1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
        1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
        1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
        1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
        1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
        ...
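
The conversions in Table 1 can be sketched as a small function. The raw inputs in the usage line (elapsed time, processing time, completed work, CPU utilization) are hypothetical round numbers chosen for illustration, since the slide shows only the already-converted metrics; the function and variable names are likewise assumptions. The last line checks one relation against the first data row printed on the slide.

```python
def to_metrics(elapsed_s, processing_s, completed, cpu_util):
    """Convert raw interval data into the performance metrics of Table 1."""
    x = completed / elapsed_s                                  # X = C/T
    r = (processing_s / elapsed_s) * (elapsed_s / completed)   # R = (Tp/T)(T/C)
    n = x * r                                                  # N = X * R (Little's law)
    s = cpu_util / x                                           # S = Ucpu/X
    return {"X": x, "R": r, "N": n, "S": s}


# Hypothetical raw data for one 300-second interval (not from the talk).
sample = to_metrics(elapsed_s=300.0, processing_s=50000.0,
                    completed=150000.0, cpu_util=0.45)
print(sample)

# Consistency check on the slide's first row: S = U_cpu / X.
service_time_row1 = 0.458120 / 502.171674
print(f"S (row 1) = {service_time_row1:.6f}")
```

The service time computed from the first row agrees with the S column printed on the slide (0.000912).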
  73. Using Production Data: Tomcat data from AWS

    [Figure: throughput (RPS) vs. Tomcat threads.]
  74. Using Production Data: USL nonlinear analysis

    [Figure: throughput (RPS) vs. Tomcat threads, with fitted α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8.]
  75. Using Production Data: Concurrency parameter

    [Figure: throughput (RPS) vs. Tomcat threads, with fitted α = 0, β = 3e−06, γ = 3, Nmax = 539.2, Xmax = 809.55, Nopt = 274.8. Inset: the linear-scaling case α = 0, β = 0.]

    1. γ = 3.0
    2. Smallest number of threads during 24 hr sample is N > 100
    3. Nonetheless USL estimates throughput X(1) = 3 RPS
  76. Using Production Data: Contention parameter

    [Figure: same plot and fitted values as the previous slide. Inset: the bottleneck-limit case α >> 0, β = 0, with ceiling 1/α.]

    - α = 0
    - No significant waiting or queueing
    - Max possible throughput Xroof not defined
  77. Using Production Data: Coherency parameter

    [Figure: same plot and fitted values as the previous slide. Inset: the retrograde case α >> 0, β > 0, labeled with 1/N and 1/α.]

    - $\beta = 3 \times 10^{-6}$ implies very weak retrograde throughput
    - Extremely little data exchange
    - But entirely responsible for sublinearity
    - And peak throughput Xmax = 809.55 RPS
    - Peak occurs at Nmax = 539.2 threads
  78. Using Production Data: Revised USL analysis

    - Parallel threads implies linear scaling
    - Linear slope γ ∼ 3: γ = 2.65
    - Should be no contention, i.e., α = 0
    - Discontinuity at N ∼ 275 threads
    - Throughput plateaus, i.e., β = 0
    - Saturation occurs at processor utilization UCPU ≥ 75%
    - Linux OS can’t do that!
    - Pseudo-saturation due to AWS Auto Scaling policy (hypervisor?)
    - Many EC2 instances spun up and down during 24 hrs
  79. Using Production Data: Corrected USL linear model

    [Figure: throughput (RPS) vs. Tomcat threads, with regions labeled "Parallel threads" and "Pseudo−saturation"; α = 0, β = 0, γ = 2.65, Nmax = NaN, Xmax = 727.03, Nopt = 274.8.]
  80. Using Production Data: MySQL Big Data

    - 2,000 production database logs
    - 500,000 data points
    - NJG with Baron Schwartz and Preetam Jinka
  81. Using Production Data: How do you comprehend big data?
  82. Using Production Data: MySQL big data: the (horror) movie
  83. Using Production Data: USL analysis

    - Analyzed some 2,000 production database logs
    - I resorted to animation as a visualization tool
    - About 500,000 data points in aggregate

    [Figure: two animation frames of queries per second vs. concurrent processes.]
    - USL3: [ 917 ] mysql 10.0.24−MariaDB: α = 0.0469166, β = 0.0067516, γ = 5444.43, X1 = 5196.11, Npeak = 11.88, Xpeak = 25910.07, Nopt = 21.31, Xmax = 116044.9
    - USL3: [ 1403 ] mysql 5.5.52−0ubuntu0.12.04.1−log ...: α = 0.0456189, β = 0.00118014, γ = 2271.33, X1 = 3195.96, Npeak = 28.44, Xpeak = 24074.24, Nopt = 21.92, Xmax = 49789.25

    Comparison of USL parameters: USL found unexpected progressive changes in scalability
  84. Wrap Up: Outline (section 7 of 7)
  85. Wrap Up: R packages for the USL

    1. SATK on R-Forge. Author: Paul Puglia (Guerrilla graduate). Applies multiple USL coefficient models for best fit.

        install.packages("SATK", repos="http://R-Forge.R-project.org")
        library(SATK)
        data(USLcalc)
        uslcalc.zones <- zones(USLcalc)
        plot(uslcalc.zones)

    2. usl on CRAN. Author: Stefan Möding. Uses both nls() from base and nlxb() from nlsr package.

        install.packages("usl")
        library(usl)
        data(specsdm91)
        usl.model <- usl(throughput ~ load, specsdm91)
        summary(usl.model)
        peak.scalability(usl.model)
        plot(specsdm91, pch=16)
        plot(usl.model, col="red", add=TRUE)
  86. Wrap Up: Response time scalability

    - Presented throughput scalability
    - Response time scalability?
    - Brooks’ law (too many cooks)
    - Queueing theory foundations
    - Queueing simulations

    $$X_N = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}, \qquad R_N = \frac{N}{X_N} - Z$$

    [Figure: four panels plotted against number of people. Months with fixed delay due to meetings; months with growing delay due to 1-on-1 meetings; months (untitled panel); output. Legend: Parallel, Amdahl, Brooks, USL.]
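
The response-time relation on this slide, R_N = N/X_N − Z, follows from combining the USL throughput with the think time Z. The sketch below evaluates it for a few loads. The coefficients are assumed values for illustration, Z is set to zero, and the function names are hypothetical.

```python
def usl_throughput(n, alpha, beta, gamma):
    """USL throughput X_N at load n."""
    return gamma * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def response_time(n, alpha, beta, gamma, think_time=0.0):
    """R_N = N / X_N - Z, where Z is the think time."""
    return n / usl_throughput(n, alpha, beta, gamma) - think_time


# Illustrative coefficients (assumed, not fitted to any data in the talk).
alpha, beta, gamma = 0.05, 0.01, 100.0

for n in (1, 2, 5, 10, 20):
    print(f"N={n:3d}  X_N={usl_throughput(n, alpha, beta, gamma):8.2f}  "
          f"R_N={response_time(n, alpha, beta, gamma):.4f}")
```

With Z = 0 the expression reduces to R_N = (1 + α(N − 1) + βN(N − 1))/γ, so the coherency term makes response time grow quadratically with N, which is the "too many cooks" effect.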
  87. Wrap Up: Questions?

    - www.perfdynamics.com
    - Castro Valley, California
    - Twitter: twitter.com/DrQz
    - Facebook: facebook.com/PerformanceDynamics
    - Blog: perfdynamics.blogspot.com
    - Training classes: perfdynamics.com/Classes
    - info@perfdynamics.com
    - +1-510-537-5758