
The Data Analytics of Application Scaling and Why There Are No Giants

Dr. Neil Gunther
November 15, 2018

Like the 30 ft. giant in 'Jack and the Beanstalk,' myths and fallacies abound
regarding application scaling. Many blog posts show some performance data as
time-series charts but otherwise offer only a qualitative analysis. This talk
is intended to remedy that by showing you how to QUANTIFY scalability. Time
series are not sufficient to assess the cost-benefit of cloud services and other
scalability trade-offs. After reviewing the nonlinear constraints on the
scalability of giants, we apply similar nonlinear data-analytics techniques to
determine the universal scalability constraints on well-known applications
such as MySQL, Memcached, Varnish, Zookeeper, and Amazon AWS.

Transcript

  1. The Data Analytics of Application Scaling
    and Why There Are No Giants
    Neil J. Gunther @DrQz
    Performance Dynamics℠
    SF Bay ACM Meetup
    Walmart Labs, Sunnyvale, California
    Wed, Nov 14, 2018
    © 2018 Performance Dynamics, The Data Analytics of Application Scaling, November 15, 2018, 1 / 74

  3. The Topic—Scalability Performance Analysis
    What is performance analysis?
    What it isn’t: Not monitoring computer systems
    Not debugging code
    Not “performance testing” (load testing)
    Not on anyone’s business card
    What it is: All of the above!
    Multidisciplinary: maths, stats, coding, skepticism,
    architecture, critical thinking, market trends, etc.
    The art of knowing what data to throw away
    How to quantify application scalability
    Data analytics applied to computer efficiency
    Data is only “half” the story (at best)
    Other “half” requires performance models
    Working motto: Trust nothing and verify

  4. I’ve written some things

  5. Guerrilla performance classes
    Performance training is a very serious business (so, lunches need to be long)
    Guerrilla training classes: local, online, and in-house (textbooks included)
    Guerrilla Data Analytics class — linear regression to machine learning in R

  6. Tall Tales About Giants
    Outline
    1 Tall Tales About Giants
    2 Computational Scaling
    3 Universal Scalability Law (USL)
    4 Determining USL Coefficients
    5 Application Scalability
    6 Using Production Data
    7 Wrap Up

  7. Tall Tales About Giants
    Mythical giants (and beanstalks)
    The J&B giant was reputedly about 30 ft tall in some accounts (no data)
    Let’s not even get into beanstalks in the clouds!

  8. Tall Tales About Giants
    Galileo was onto this
    1 Discorsi e Dimostrazioni Matematiche Intorno a Due Nuove Scienze [1] (Discourses and Mathematical Demonstrations Relating to Two New Sciences), Galileo Galilei (1638)
    2 On Being the Right Size, J. B. S. Haldane (1928)
    [1] The two new sciences: (1) materials science and (2) kinematics.

  9. Tall Tales About Giants
    Allometric scaling
    Robert P. Wadlow
    Tallest human giant
    Born in Alton, Illinois, in 1918
    Reached 8.925 feet (2.72 meters)
    Guinness world record
    Died in 1940 at age 22
    Father was 5.958 feet (1.82 meters)
    [Photo labels: Leg braces; Dear old Dad]

  10. Tall Tales About Giants
    Why there are no 30 ft giants
    [Plot: Weight vs. Scaling, with a load line and a support line crossing at a critical point that separates the stable region from the unstable region]
    1 Weight (mass) grows as L³ (with volume), but support grows only as L² (with cross-sectional area)
    2 Above the critical point, things break!!!
    3 We will use a similar approach for computer software scalability
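The square-cube argument can be checked with a few lines of R, the language used for the analyses later in this talk. A minimal sketch, assuming an ordinary 6 ft human as the baseline; the helper name square_cube is hypothetical:

```r
# Square-cube law: scale a body by a linear factor k (hypothetical helper)
square_cube <- function(k) {
  weight  <- k^3   # weight grows with volume, ~ L^3
  support <- k^2   # bone strength grows with cross-sectional area, ~ L^2
  c(weight = weight, support = support, stress = weight / support)
}

# Assumed baseline: a 6 ft human scaled up to a 30 ft giant (k = 5)
square_cube(30 / 6)
```

Scaling by k = 5 multiplies weight by 125 but support by only 25, so each unit of bone cross-section carries 5 times the stress.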

  11. Computational Scaling

  12. Computational Scaling
    Goggle up — real science ahead

  13. Computational Scaling
    Scaling property 1: Equal bang for the buck
    α = 0, β = 0
    [Plot: Capacity vs. Processes]
    Everybody wants this

  14. Computational Scaling
    Scaling property 2: Diminishing returns
    α > 0, β = 0
    [Plot: Capacity vs. Processes]
    Everybody usually gets this

  15. Computational Scaling
    Scaling property 3: Bottleneck limit
    α ≫ 0, β = 0
    [Plot: Capacity vs. Processes, leveling off at the ceiling 1/α]
    Everybody hates this

  16. Computational Scaling
    Scaling property 4: Retrograde throughput
    α ≫ 0, β > 0
    [Plot: Capacity vs. Processes, rising toward 1/α and then falling off as 1/N]
    Everybody thinks this never happens

  17. Universal Scalability Law (USL)

  26. Universal Scalability Law (USL)
    How to quantify computational scalability
    N: number of processes, which provide the system stimulus or load
    C_N: response function, or relative capacity
    Question: What kind of function?
    Answer: A nonlinear rational function [2]
    $C_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$
    Three Cs:
    1 Concurrency (0 < γ < ∞)
    2 Contention (0 < α < 1)
    3 Coherency (0 < β < 1)
    [2] NJG, “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conference (1993)
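A minimal R sketch of the USL formula above, showing how the three Cs reproduce the four scaling properties from the Computational Scaling slides. The function name usl_capacity and all coefficient values are illustrative assumptions; γ defaults to 1 so that C(1) = 1:

```r
# USL relative capacity C(N): gamma = concurrency, alpha = contention,
# beta = coherency. gamma defaults to 1 so that C(1) = 1.
usl_capacity <- function(N, alpha, beta, gamma = 1) {
  gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))
}

N <- c(1, 10, 50, 100, 200)

# Made-up coefficients, one set per scaling property
linear     <- usl_capacity(N, alpha = 0,    beta = 0)      # property 1: C(N) = N
amdahl     <- usl_capacity(N, alpha = 0.01, beta = 0)      # property 2: diminishing returns
bottleneck <- usl_capacity(N, alpha = 0.1,  beta = 0)      # property 3: ceiling at 1/alpha = 10
retrograde <- usl_capacity(N, alpha = 0.1,  beta = 0.001)  # property 4: peaks, then declines

round(rbind(linear, amdahl, bottleneck, retrograde), 2)
```

With β = 0, C(N) approaches the ceiling 1/α as N grows; any β > 0 eventually makes the βN(N − 1) term dominate the denominator, which is what produces the retrograde drop.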

  30. Universal Scalability Law (USL)
    Measurement meets model
    Throughput data $X(N)/X(1)$ → scalability metric $C_N$ ← USL model $\frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$
    [Plot: Relative capacity C(N) vs. Processes (N), contrasting linear scaling, Amdahl-like scaling, and retrograde scaling]

  31. Determining USL Coefficients

  32. Determining USL Coefficients
    Finding α, β, and γ
    Throughput measurements X_N at various process loads N, sourced from:
    1 Load testing platform
    2 Production monitoring
    Want to determine the α, β, γ that best model the X_N data
    $X_N(\alpha, \beta, \gamma) = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$
    X_N is a nonlinear rational function (tricky)
    Brute force (ugh!)
    Clever ways: Solve for the α, β, γ coefficients in one swell foop
    1 nls() nonlinear regression in R
    2 Solver optimizer Excel Add-in
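One way to get all three coefficients at once is nls() in R, as listed above. This sketch assumes synthetic data: the coefficients (α = 0.05, β = 1e−4, γ = 100) and the small perturbations are invented for illustration, not measurements. Starting γ at the measured X(1) is a reasonable guess because γ is the N = 1 throughput:

```r
# Hypothetical throughput data: generated from assumed coefficients
# (alpha = 0.05, beta = 1e-4, gamma = 100) plus small fixed perturbations
N <- c(1, 2, 4, 8, 12, 16, 24, 32, 48, 64)
noise <- c(0.004, -0.003, 0.002, -0.004, 0.003, -0.002, 0.001, 0.003, -0.002, 0.001)
X <- 100 * N / (1 + 0.05 * (N - 1) + 1e-4 * N * (N - 1)) * (1 + noise)
df <- data.frame(N = N, X = X)

# Fit alpha, beta, gamma together with nonlinear least squares.
# algorithm = "port" allows box constraints on the parameters.
fit <- nls(X ~ gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1)),
           data = df,
           start = list(alpha = 0.01, beta = 1e-5, gamma = df$X[1]),
           algorithm = "port",
           lower = c(0, 0, 0), upper = c(1, 1, Inf))
coef(fit)   # should land close to the assumed coefficients
```

The "port" algorithm is used here only so that α and β can be bounded to [0, 1], matching the ranges on the USL slide; other optimizers would also work.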

  33. Determining USL Coefficients
    Everybody’s a data scientist, but ...
    NASA’s Dawn spacecraft was orbiting the dwarf planet Ceres (now gone dark)

  34. Determining USL Coefficients
    A little least-squares history
    Ceres was first observed over 200 years ago
    Gauss accurately estimated the (then unknown) orbit of Ceres c.1801
    He had already developed least-squares statistical regression at age 18
    He used little data when everyone else assumed big data was necessary
    Data errors represented by a Gaussian distribution

  35. Determining USL Coefficients
    Tips at different restaurants
    [Plot: Tip ($) at each of six restaurants]

  36. Determining USL Coefficients
    Tip deviations from mean
    [Plot: Tip ($) by restaurant, annotated with each tip's deviation from the mean: −5, 7, 1, −2, 4, −5]
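The deviations on this slide already contain the key quantity behind least squares. A minimal R sketch, taking the six printed deviations at face value:

```r
# Deviations of each restaurant's tip from the mean tip, as printed on the slide
dev <- c(-5, 7, 1, -2, 4, -5)

sum(dev)     # 0: deviations from a mean always cancel
sum(dev^2)   # 120: total sum of squares (SST), the baseline for R^2
```

The signed deviations cancel out, which is why least squares works with their squares instead.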

  37. Determining USL Coefficients
    Are tips related to the bill?
    [Plot: Tip ($) vs. Bill ($)]

  38. Determining USL Coefficients
    Least squares are real
    [Plot: Tip ($) vs. Bill ($)]

  39. Determining USL Coefficients
    Relative areas: R² = 0.7494
    [Plot: Tip ($) vs. Bill ($), comparing the squared areas labeled "Tip error" and "Model error"]
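R² compares the two kinds of squared error as areas: R² = 1 − SSE/SST. A minimal R sketch follows. The transcript does not give the bill amounts, so the bills below are an assumption; the tips are chosen so that their deviations from the mean match the previous slide (−5, 7, 1, −2, 4, −5):

```r
# Assumed data: bills are hypothetical; tips reproduce the slide's deviations
bill <- c(34, 108, 64, 88, 99, 51)
tip  <- c(5, 17, 11, 8, 14, 5)

fit <- lm(tip ~ bill)
sst <- sum((tip - mean(tip))^2)   # "tip error": squares about the mean
sse <- sum(residuals(fit)^2)      # "model error": squares about the fitted line
r2  <- 1 - sse / sst

c(manual = r2, lm = summary(fit)$r.squared)   # the two values agree
```

With these assumed values, both calculations give R² ≈ 0.7494, the same figure as on the slide.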

  40. Determining USL Coefficients
    Confidence bands
    See GCaP book Chap. 5 & App. B
    α = 0.04979728, β = 1.143438e−05, R² = 0.9883438
    [Plot: USL Analysis of SGI Origin 2000: Benchmark throughput (Krays/s) vs. Processors, with confidence bands]

  41. Application Scalability

  42. Application Scalability
    Varnish Scalability
    Data provided by
    Darius Popa, DigitAir and Stefan Parvu, Nokia

  44. Application Scalability
    Varnish architecture
    HTTP accelerator
    Reverse web proxy caching system
    Sits in front of classic web server
    Caching handled by virtual memory
    Claim: Highly scalable (read: linear) ... but is it?

  45. Application Scalability
    JMeter measurements
    Example (Read in raw data and plot it in R)
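    df.test <- read.table(fname, header=TRUE, sep="\t")  # fname: TSV file with columns N and X_N
    df.test$Capacity <- df.test$X_N / df.test$X_N[1]     # relative capacity C(N) = X(N)/X(1)
    df.test$Efficiency <- df.test$Capacity / df.test$N   # efficiency C(N)/N
    plot(df.test$N, df.test$Capacity, type="b", xlab="Load generators (N)", ylab="Relative capacity C(N)")
    print(df.test)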
    [Plot (USL 1): Relative capacity C(N) vs. Load generators (N)]
    N X_N Capacity Efficiency
    1 1 1.4 1.000000 1.0000000
    2 2 2.7 1.928571 0.9642857
    3 5 6.4 4.571429 0.9142857
    4 10 12.8 9.142857 0.9142857
    5 25 32.0 22.857143 0.9142857
    6 50 64.0 45.714286 0.9142857
    7 75 98.0 70.000000 0.9333333
    8 100 131.0 93.571429 0.9357143
    9 150 197.0 140.714286 0.9380952
    10 250 320.0 228.571429 0.9142857
    11 300 392.0 280.000000 0.9333333
    12 400 518.0 370.000000 0.9250000

  46. Application Scalability
    Near linear scaling
    [Plot (USL 2): Relative capacity C(N) vs. Load generators (N)]

  47. Application Scalability
    Varnish meets the USL
    [Plot (USL 3): Relative capacity C(N) vs. Load generators (N), with USL fit]
    α = 1e−04
    β = 0
    γ = 0.955364
    pmax = NaN
    Cmax = NaN
    Croof = 10000

  48. Application Scalability
    USL beyond the data
    [Plot (USL 4): Relative capacity C(N) vs. Load generators (N), extrapolated out to N = 5000]
    α = 1e−04
    β = 0
    γ = 0.955364
    pmax = NaN
    Cmax = NaN
    Croof = 10000
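Because β = 0, the Varnish fit is Amdahl-like: it keeps rising but bends away from linear. A minimal R sketch that evaluates the fitted curve at and beyond the measured range (the helper usl_varnish and the choice of N values are illustrative):

```r
# Varnish USL fit from the slide: alpha = 1e-4, beta = 0, gamma = 0.955364
usl_varnish <- function(N, alpha = 1e-4, gamma = 0.955364) {
  gamma * N / (1 + alpha * (N - 1))   # beta = 0, so no N(N - 1) term
}

N <- c(400, 1000, 5000)   # 400 is the largest measured load
data.frame(N = N,
           capacity   = round(usl_varnish(N), 1),
           efficiency = round(usl_varnish(N) / N, 3))
```

At N = 400 the model gives C ≈ 367.5, close to the measured 370, while by N = 5000 the efficiency C(N)/N has dropped to about 0.64.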

  49. Application Scalability
    Linux Network Driver
    XDP: eXpress Data Path
    “XDP: A new fast and programmable network layer”
    Jesper Brouer, Red Hat

  50. Application Scalability
    Red Hat (IBM?) benchmark data

  51. Application Scalability
    USL sees beyond the data
    Projected from 6 to 20 cores
    c 2018 Performance Dynamics The Data Analytics of Application Scaling November 15, 2018 38 / 74

    View Slide

  52. Application Scalability
    Memcached
    “Hidden Scalability Gotchas in Memcached and Friends”
    NJG, Shanti Subramanyam, and Stefan Parvu, Sun Microsystems
    Presented at Velocity 2010

  53. Application Scalability
    Memcached scalability
    [Figure panels: Scaleup; Scaleout]

  54. Application Scalability
    Memcache scaleout strategy
    Distributed cache of key-value pairs
    Data pre-loaded from RDBMS backend
    Deploy memcache on cheaper, older CPUs (not multicore)
    A single worker thread is OK until the next hardware roll (multicore)

  55. Application Scalability
    Memcache scalability data
    [Plot: Throughput X(N) (KOPS) vs. Worker threads (N)]

  56. Application Scalability
    Explains these configuration warnings
    Configuring the memcached server
    Threading is used to scale memcached across CPU’s. The model
    is by "worker threads", meaning that each thread handles concurrent
    connections. ... By default 4 threads are allocated. ... Setting it
    to very large values (80+) will make it run considerably slower.
    Linux man pages - memcached (1)
    -t
    Number of threads to use to process incoming requests. ... It is
    typically not useful to set this higher than the number of CPU cores
    on the memcached server. Setting a high number (64 or more) of worker
    threads is not recommended. The default is 4.

  57. Application Scalability
    Memcached load-test data

  58. Application Scalability
    Memcached regression analysis

  59. Application Scalability
    Memcache scalability model
    [Plot: Throughput X(N) (KOPS) vs. Worker threads (N), with USL fit]
    α = 0.0468
    β = 0.021016
    γ = 84.89
    Nmax = 6.73
    Xmax = 274.87
    Xroof = 1814.82

  60. Application Scalability
    Concurrency parameter
    [Plot: Throughput X(N) (KOPS) vs. Worker threads (N), USL fit (α = 0.0468, β = 0.021016, γ = 84.89); inset: linear scaling, α = 0, β = 0]
    1 γ = 84.89
    2 Slope of the linear bound, in Kops/thread
    3 Estimate of throughput X(1) = 84.89 Kops at N = 1 thread

  61. Application Scalability
    Contention parameter
    [Plot: Throughput X(N) (KOPS) vs. Worker threads (N), USL fit (α = 0.0468, β = 0.021016, γ = 84.89); inset: α ≫ 0, β = 0, ceiling at 1/α]
    α = 0.0468
    Waiting or queueing for resources about 4.7% of the time
    Max possible throughput is X(1)/α = 1814.78 Kops (X_roof)

  62. Application Scalability
    Coherency parameter
    [Plot: Throughput X(N) (KOPS) vs. Worker threads (N), USL fit (α = 0.0468, β = 0.021016, γ = 84.89); inset: α ≫ 0, β > 0, rising toward 1/α and then falling off as 1/N]
    β = 0.0210 corresponds to retrograde throughput
    Distributed copies of data (e.g., caches) have to be exchanged or updated about 2.1% of the time to stay consistent
    Peak occurs at $N_{max} = \sqrt{(1 - \alpha)/\beta} = 6.73$ threads
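The derived quantities on these Memcached slides follow directly from the three coefficients: N_max = √((1 − α)/β), X_max = X(N_max), and X_roof = γ/α. A minimal R sketch (the helper usl_metrics is hypothetical). Because the coefficients shown on the slides are rounded, the results match N_max = 6.73 and X_max = 274.87 only to within rounding, and X_roof comes out near 1814 rather than exactly 1814.82:

```r
# Derived USL metrics from fitted coefficients (hypothetical helper)
usl_metrics <- function(alpha, beta, gamma) {
  Nmax  <- sqrt((1 - alpha) / beta)   # location of peak throughput
  Xmax  <- gamma * Nmax / (1 + alpha * (Nmax - 1) + beta * Nmax * (Nmax - 1))
  Xroof <- gamma / alpha              # contention-only ceiling
  c(Nmax = Nmax, Xmax = Xmax, Xroof = Xroof)
}

# Memcached coefficients as shown on the slides
round(usl_metrics(alpha = 0.0468, beta = 0.021016, gamma = 84.89), 2)
```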

  63. Application Scalability
    Improving scalability performance
    [Plot: Speedup S(N) vs. Threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch]

  64. Application Scalability
    Sirius and Zookeeper
    “Sirius: Distributing and Coordinating Application”
    Michael Bevilacqua-Linn, Maulan Byron, Peter Cline,
    Jon Moore, and Steve Muir, Comcast
    Presented at USENIX Annual Technical Conf. 2014

  65. Application Scalability
    Distributed voting throughput data
    All downhill ... which looked crazy! (to me)

  66. Application Scalability
    USL scalability model

  67. Application Scalability
    Concurrency parameter
    [Inset: Capacity vs. Processes, linear scaling, α = 0, β = 0]
    1 γ = 1024.98
    2 A single node is meaningless (need N ≥ 3 for a majority)
    3 Interpret γ as the virtual throughput at N = 1
    4 USL estimates X(1) = 1024.98 WPS (black square)

  68. Application Scalability
    Contention parameter
    [Inset: Capacity vs. Processes, α ≫ 0, β = 0, ceiling at 1/α]
    α = 0.05
    Queueing for resources about 5% of the time
    Max possible throughput is X(1)/α = 20499.54 WPS (X_roof)
    But X_roof is not feasible in these systems

  69. Application Scalability
    Coherency parameter
    [Inset: Capacity vs. Processes, α ≫ 0, β > 0, rising toward 1/α and then falling off as 1/N]
    β = 0.1651 says retrograde throughput dominates!
    Distributed data being exchanged (compared?) about 16.5% of the time
    (Virtual) peak at $N_{max} = \sqrt{(1 - \alpha)/\beta} = 2.4$ cluster nodes
    Shocking, but that’s exactly how it’s supposed to work!
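Since the virtual peak at N ≈ 2.4 lies below the smallest usable cluster (N = 3), every real configuration is already on the retrograde side of the curve. A minimal R sketch using the coefficients on these slides; the cluster sizes are chosen for illustration:

```r
# USL throughput for the distributed-voting data, coefficients from the slides
x_vote <- function(N, alpha = 0.05, beta = 0.1651, gamma = 1024.98) {
  gamma * N / (1 + alpha * (N - 1) + beta * N * (N - 1))
}

sqrt((1 - 0.05) / 0.1651)   # virtual peak, about 2.4 nodes
N <- c(3, 5, 7, 9)          # odd cluster sizes (illustrative)
round(x_vote(N))
all(diff(x_vote(N)) < 0)    # TRUE: each added node lowers throughput
```

Throughput drops from about 1471 WPS at N = 3 to about 694 WPS at N = 9, which is the "all downhill" pattern in the measured data.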

  70. Using Production Data

  71. Using Production Data
    AWS Cloud Application
    “Exposing the Cost of Performance Hidden in the Cloud”
    NJG and Mohit Chawla, Germany
    Presented at CMG cloudXchange 2018

  72. Using Production Data
    Production data
    Previously, X and R were measured directly on a test rig
    Table 1: Converting data to performance metrics
    Data    Meaning            Metric              Meaning
    T       Elapsed time       X = C/T             Throughput
    Tp      Processing time    R = (Tp/T)(T/C)     Response time
    C       Completed work     N = X × R           Concurrent threads
    Ucpu    CPU utilization    S = Ucpu/X          Service time
    Example (Coalesced metrics)
    Timestamps are Unix epoch milliseconds; the interval between rows is 300 seconds
    Timestamp, X, N, S, R, U_cpu
    1486771200000, 502.171674, 170.266663, 0.000912, 0.336740, 0.458120
    1486771500000, 494.403035, 175.375000, 0.001043, 0.355975, 0.515420
    1486771800000, 509.541751, 188.866669, 0.000885, 0.360924, 0.450980
    1486772100000, 507.089094, 188.437500, 0.000910, 0.367479, 0.461700
    1486772400000, 532.803039, 191.466660, 0.000880, 0.362905, 0.468860
    ...
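The conversions in Table 1 are simple enough to script. A minimal R sketch, assuming one row of raw data per sample interval with columns named T, Tp, C, and Ucpu (the column names, the helper to_metrics, and the sample values are all assumptions):

```r
# Convert raw interval data into performance metrics, following Table 1.
# Column names T, Tp, C, Ucpu are assumed, not taken from the original data.
to_metrics <- function(raw) {
  X <- raw$C / raw$T                        # throughput
  R <- (raw$Tp / raw$T) * (raw$T / raw$C)   # response time (equals Tp / C)
  N <- X * R                                # concurrent threads (Little's law)
  S <- raw$Ucpu / X                         # service time
  data.frame(X = X, N = N, S = S, R = R, U_cpu = raw$Ucpu)
}

# One hypothetical 300-second interval: 150,000 completions,
# 51,000 s of total processing time, CPU 46% busy
raw <- data.frame(T = 300, Tp = 51000, C = 150000, Ucpu = 0.46)
to_metrics(raw)
```

Since R = (Tp/T)(T/C) = Tp/C, the concurrency N = X × R reduces to Tp/T, which is Little's law applied to each interval.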

  73. Using Production Data
    Tomcat data from AWS
    [Plot: Throughput (RPS) vs. Tomcat threads]

  74. Using Production Data
    USL nonlinear analysis
    [Plot: Throughput (RPS) vs. Tomcat threads, with USL fit]
    α = 0
    β = 3e−06
    γ = 3
    Nmax = 539.2
    Xmax = 809.55
    Nopt = 274.8

  75. Using Production Data
    Concurrency parameter
    [Plot: Throughput (RPS) vs. Tomcat threads, USL fit (α = 0, β = 3e−06, γ = 3); inset: linear scaling, α = 0, β = 0]
    1 γ = 3.0
    2 The smallest thread count in the 24-hr sample is above N = 100
    3 Nonetheless, the USL estimates the throughput X(1) = 3 RPS

  76. Using Production Data
    Contention parameter
    [Plot: Throughput (RPS) vs. Tomcat threads, USL fit (α = 0, β = 3e−06, γ = 3); inset: α ≫ 0, β = 0, ceiling at 1/α]
    α = 0
    No significant waiting or queueing
    Max possible throughput X_roof is not defined

  77. Using Production Data
    Coherency parameter
    [Plot: Throughput (RPS) vs. Tomcat threads, USL fit (α = 0, β = 3e−06, γ = 3); inset: α ≫ 0, β > 0, rising toward 1/α and then falling off as 1/N]
    β = 3 × 10⁻⁶ implies very weak retrograde throughput
    Extremely little data exchange
    But it is entirely responsible for the sublinearity
    And for the peak throughput Xmax = 809.55 RPS
    Peak occurs at Nmax = 539.2 threads

  78. Using Production Data
    Revised USL analysis
    Parallel threads imply linear scaling
    Linear slope γ ∼ 3: γ = 2.65
    Should be no contention, i.e., α = 0
    Discontinuity at N ∼ 275 threads
    Throughput plateaus, i.e., β = 0
    Saturation occurs at processor utilization U_CPU ≥ 75%
    Linux OS can’t do that!
    Pseudo-saturation due to AWS Auto Scaling policy (hypervisor?)
    Many EC2 instances spun up and down during 24 hrs

  79. Using Production Data
    Corrected USL linear model
    Figure: Throughput (RPS) vs. Tomcat threads; corrected USL linear model with α = 0, β = 0, γ = 2.65, Nmax = NaN, Xmax = 727.03, Nopt = 274.8; regions labeled "Parallel threads" and "Pseudo-saturation"
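A minimal sketch of the corrected model, assuming it is a straight line X = γN (α = β = 0) capped at the plateau value Xmax = 727.03 RPS. Both numbers come from the slide; capping with `min()` and the function name are this sketch's choices. The implied knee, Xmax/γ ≈ 274.4 threads, lies close to the Nopt = 274.8 shown on the slide.

```python
def corrected_linear_model(n, gamma=2.65, x_plateau=727.03):
    """Piecewise sketch: linear growth X = gamma*N (alpha = beta = 0)
    until throughput reaches the pseudo-saturation plateau."""
    return min(gamma * n, x_plateau)


# Thread count where the linear segment meets the plateau.
knee = 727.03 / 2.65

for n in (100, 200, 274, 300, 500):
    print(n, round(corrected_linear_model(n), 2))
print(f"knee = {knee:.1f} threads")
```

Because β = 0 there is no retrograde region, which is why Nmax is reported as NaN.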

  80. Using Production Data
    MySQL Big Data
    2,000 production database logs
    500,000 data points
    NJG with Baron Schwartz and Preetam Jinka

  81. Using Production Data
    How do you comprehend big data?

  82. Using Production Data
    MySQL big data — the (horror) movie

  83. Using Production Data
    USL analysis
    Analyzed some 2,000 production database logs
    I resorted to animation as a visualization tool
    About 500,000 data points in aggregate
    Figure (first panel): USL3: [ 917 ] mysql 10.0.24-MariaDB; queries per second vs. concurrent processes; α = 0.0469166, β = 0.0067516, γ = 5444.43, X1 = 5196.11, Npeak = 11.88, Xpeak = 25910.07, Nopt = 21.31, Xmax = 116044.9
    Figure (second panel): USL3: [ 1403 ] mysql 5.5.52-0ubuntu0.12.04.1-log ...; queries per second vs. concurrent processes; α = 0.0456189, β = 0.00118014, γ = 2271.33, X1 = 3195.96, Npeak = 28.44, Xpeak = 24074.24, Nopt = 21.92, Xmax = 49789.25
    Comparison of USL parameters
    USL found unexpected progressive changes in scalability
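Several landmarks in the two MySQL panels can be recomputed from the fitted coefficients alone: Npeak = √((1 − α)/β), and in these panels the reported Nopt and Xmax match 1/α and γ/α to printed precision. The sketch below reproduces only those three quantities; it does not attempt Xpeak or X1. Dictionary keys are the log IDs from the panel titles.

```python
import math

# (alpha, beta, gamma) as printed in the two USL3 panels.
fits = {
    "917": (0.0469166, 0.0067516, 5444.43),    # mysql 10.0.24-MariaDB
    "1403": (0.0456189, 0.00118014, 2271.33),  # mysql 5.5.52-0ubuntu0.12.04.1-log
}


def derived_quantities(alpha, beta, gamma):
    """Landmarks implied by a set of USL coefficients."""
    return {
        "Npeak": math.sqrt((1 - alpha) / beta),  # concurrency at peak throughput
        "Nopt": 1 / alpha,                       # matches the panels' Nopt
        "Xmax": gamma / alpha,                   # throughput ceiling gamma/alpha
    }


results = {log_id: derived_quantities(*coeffs) for log_id, coeffs in fits.items()}
for log_id, d in results.items():
    print(log_id, {k: round(v, 2) for k, v in d.items()})
```

Comparing such derived quantities across logs is one way to see the progressive changes in scalability that the slide mentions.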

  84. Wrap Up
    Outline
    1 Tall Tales About Giants
    2 Computational Scaling
    3 Universal Scalability Law (USL)
    4 Determining USL Coefficients
    5 Application Scalability
    6 Using Production Data
    7 Wrap Up

  85. Wrap Up
    R packages for the USL
    1 SATK on R-Forge
    Author: Paul Puglia (Guerrilla graduate)
    Applies multiple USL coefficient models for best fit
    install.packages("SATK", repos="http://R-Forge.R-project.org")
    library(SATK)
    data(USLcalc)
    uslcalc.zones <- zones(USLcalc)
    plot(uslcalc.zones)
    2 usl on CRAN
    Author: Stefan Möding
    Uses both nls() from base R (stats package) and nlxb() from the nlsr package
    install.packages("usl")
    library(usl)
    data(specsdm91)
    usl.model <- usl(throughput ~ load, specsdm91)
    summary(usl.model)
    peak.scalability(usl.model)
    plot(specsdm91, pch=16)
    plot(usl.model, col="red", add=TRUE)

  86. Wrap Up
    Response time scalability
    Presented throughput scalability
    Response time scalability?
    Brooks’ law (too many cooks)
    Queueing theory foundations
    Queueing simulations
    $$X_N = \frac{\gamma N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
    $$R_N = \frac{N}{X_N} - Z$$
    where Z is the think time
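Substituting the USL for X_N into the response time relation gives R_N = [1 + α(N − 1) + βN(N − 1)]/γ − Z, so response time grows linearly with contention and quadratically once β > 0. A minimal sketch; the coefficients below are hypothetical, not taken from any dataset in this talk, and Z = 0 is assumed for simplicity.

```python
def usl_throughput(n, alpha, beta, gamma):
    """USL: X(N) = gamma*N / (1 + alpha*(N - 1) + beta*N*(N - 1))."""
    return gamma * n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_response_time(n, alpha, beta, gamma, z=0.0):
    """Response time R(N) = N / X(N) - Z, with X(N) given by the USL."""
    return n / usl_throughput(n, alpha, beta, gamma) - z


# Hypothetical coefficients, for illustration only.
alpha, beta, gamma, z = 0.05, 0.001, 2.0, 0.0
for n in (1, 10, 50, 100):
    print(n, round(usl_response_time(n, alpha, beta, gamma, z), 3))
```

At N = 1 the response time is simply 1/γ (minus Z), the uncontended service time.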
    Figure: four panels plotted against team size (People, 0 to 10): Months, "Fixed delay due to meetings"; Months, "Growing delay due to 1 on 1 mtgs"; a third Months panel; and Output, comparing Parallel, Amdahl, Brooks, and USL curves

  87. Wrap Up
    Questions?
    www.perfdynamics.com
    Castro Valley, California
    Twitter twitter.com/DrQz
    Facebook facebook.com/PerformanceDynamics
    Blog perfdynamics.blogspot.com
    Training classes perfdynamics.com/Classes
    [email protected]
    +1-510-537-5758