Quantifying Scalability FTW

Presented at SURGE 2010 Conference

Dr. Neil Gunther

October 01, 2010

Transcript

  1. Quantifying Scalability FTW
    How to do a scalability surge in ∆t < 1 hour
    Dr. Neil J. Gunther
    Performance Dynamics
    SURGE 2010
    Sept 30 – Oct 1
    © 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 1 / 45

  2. Scaling vs. Scalability
    Outline
    1 Scaling vs. Scalability
    2 Components of Scalability
    3 Problem: Bad scalability data
    4 Problem: eBay 1.0 scalability
    5 Problem: memcache scalability
    6 Summary and Review
    7 Resources and Coordinates

  3. Scaling vs. Scalability
    Motivation for This Talk
    Practical methodology for assessing the cost-benefit of a given
    scalability strategy
    quantify system scalability
    scalability is not a single number (it’s a function)
    all measurements are wrong by definition
    need a framework to validate data
    measurement + model == information
    Scalability: sustainable performance under increasing load (size N)

  4. Scaling vs. Scalability
    Jack and the Beanstalk
    Jack climbs a magic beanstalk up into the clouds (10,000 ft?)
    Guarded by a giant who is 10 times bigger than Jack
    “Fee-fie-foe-fum!” and all that

  5. Scaling vs. Scalability
    Where Are All the Giants?
    Can giants exist?
    Can a 10,000 ft beanstalk exist?
    Guinness world record
    Robert P. Wadlow (USA)
    Height: 8’11” (2.72 m)
    Jack
    Height: 1.8 m tall (L)
    Weight: 90 kg
    Giant (10x bigger)
    Height: 18 m tall (10 × L)
    L³ × 90 kg = 10³ × 90 kg
    Weight: 90,000 kg
    A bone-crushing 100 tons!

  6. Scaling vs. Scalability
    Scaling vs. Scalability
    Natural scaling
    Inherent critical limits to sustainable loads
    When the load (volume) exceeds the material strength (supporting area), things tend to snap
    Load ∼ L³ (volume), but strength ∼ L² (cross-section area)
    Computer scalability
    No critical limit
    Point of diminishing returns
    Scalability is about sustainable size
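The square-cube argument on slides 5 and 6 reduces to a few lines of arithmetic. Below is a minimal Python sketch, assuming Jack's 1.8 m and 90 kg from slide 5 and treating bone strength as simply proportional to cross-sectional area; the function names are illustrative, not from the talk.

```python
# Square-cube law sketch. Assumption: weight scales with volume (L^3) and
# bone strength with cross-sectional area (L^2); nothing else is modeled.

JACK_HEIGHT_M = 1.8   # Jack's height (slide 5)
JACK_WEIGHT_KG = 90   # Jack's weight (slide 5)


def scaled_weight(scale: float, base_weight: float = JACK_WEIGHT_KG) -> float:
    """Weight of a body scaled by `scale` in every linear dimension (~ L^3)."""
    return base_weight * scale ** 3


def relative_strength(scale: float) -> float:
    """Bone strength relative to Jack's (~ L^2, cross-sectional area)."""
    return scale ** 2


def relative_stress(scale: float) -> float:
    """Load per unit of bone area relative to Jack's: L^3 / L^2 = L."""
    return scale ** 3 / relative_strength(scale)


if __name__ == "__main__":
    giant = 10  # the giant is 10x Jack in every linear dimension
    print(f"Giant height: {JACK_HEIGHT_M * giant:.0f} m")
    print(f"Giant weight: {scaled_weight(giant):,.0f} kg")
    print(f"Bone stress: {relative_stress(giant):.0f}x Jack's")
```

Running it reports an 18 m, 90,000 kg giant whose bones carry 10 times the load per unit area that Jack's do: weight grows as L³, but strength only as L².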

  7. Scaling vs. Scalability
    Natural System Scaling
    [Plot: Weight and Strength curves (x-axis: Weight; y-axis: Scalability)]
    Giant’s legs, beanstalks, and bridges collapse where the curves cross

  8. Scaling vs. Scalability
    Computer System Scaling
    [Plot: Scalability vs. Users (0 to 1000), with a Scaling region followed by a Degradation region]
    The critical point is the maximum in the throughput curve
    Beyond the maximum: performance degradation, or retrograde scalability

  9. Scaling vs. Scalability
    Web 2.0 Scalability Fails
    Twitter.com
    Amazon EC2
    Cuil.com
    Apple iStore
    Google Gmail
    WolframAlpha

  10. Scaling vs. Scalability
    Scalability is Not a Number
    Google 2005 paper “Parallel Analysis with Sawzall”
    “If scaling were perfect, performance would be proportional to the number of machines... In
    our test, the effect is to contribute 0.98 machines.”
    Translation: Not 100% linear but 98% of linear
    [Image: page 3 of the Sawzall paper, including Figure 2: “The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous.”]
    Theo Schlossnagle: “Linear scaling is simply a falsehood” p.71
    Scalability is a function
    Not a number
    Always limits, e.g., throughput capacity
    Want to quantify such limits

  11. Components of Scalability
    Outline (section 2 of 7)

  12. Components of Scalability
    Math Phobes Can Relax
    Proper quantification involves math
    Quantifying scalability requires some math
    But nothing as complicated as this¹
    $$\Pr\{\text{Murphy}\} = \frac{(U + C + I) \times (10 - S)}{20} \times \frac{A}{1 - \sin(F/10)}$$
    I have no idea what this equation is (ask Theo)
    ¹Source: Theo Schlossnagle, Scalable Internet Architectures, p. 12

  13. Components of Scalability
    Equal Bang for the Buck (Concurrency)

  14. Components of Scalability
    Cost of Sharing Resources (Contention)

  15. Components of Scalability
    Diminishing Returns (Saturation)

  16. Components of Scalability
    Negative ROI (Inconsistency Delays)

  17–24. Components of Scalability
    Universal Scalability Law (USL)
    Pulling it all together:
    N users or processes
    C is the scalability function of N
    $$C(N, \alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
    Three Cs:
    1 Concurrency
    2 Contention (amount α)
    3 Consistency, as in ACID and the CAP theorem (amount β)
    Theorem (Universality)
    Only the α and β coefficients are needed to determine the maximum in C(N)
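The USL formula and the location of its maximum can be evaluated directly. Below is a minimal Python sketch: setting dC/dN = 0 gives N* = √((1 − α)/β), the same expression the R script on slide 42 uses for Nmax. The α and β values in the usage lines are hypothetical.

```python
import math


def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / (1 + alpha (N - 1) + beta N (N - 1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha: float, beta: float) -> float:
    """Load N* at which C(N) is maximal: sqrt((1 - alpha) / beta).

    With beta == 0, C(N) never turns over (it rises toward 1/alpha, or
    linearly if alpha == 0 too), so infinity is returned.
    """
    if beta <= 0:
        return math.inf
    return math.sqrt((1 - alpha) / beta)


if __name__ == "__main__":
    alpha, beta = 0.03, 0.0001  # hypothetical coefficients
    n_star = usl_peak(alpha, beta)
    print(f"N* = {n_star:.1f}, C(N*) = {usl_capacity(n_star, alpha, beta):.1f}")
    for n in (1, 16, 64, 256):
        print(n, round(usl_capacity(n, alpha, beta), 2))
```

Past N*, adding users or processes lowers C(N): the retrograde region of the throughput curve on slide 8.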

  25. Problem: Bad scalability data
    Outline (section 3 of 7)

  26. Problem: Bad scalability data
    Data Are Not Divine
    Data come from the Devil; models come from God
    Skepticism should rule!
    Theorem
    Data + Models ≡ Insight
    Data needs to be put in prison (a model) and made to confess the truth
    Corollary
    Waterboarding your data is ok

  27. Problem: Bad scalability data
    Scalability Measurements
    J2EE web application
    Throughput measurements using Apache JMeter

  28. Problem: Bad scalability data
    Bad Data
    The Problem
    Monotonically increasing, looks ok visually
    but some data are > 100% efficient
    Can’t haz
    Or you have some very serious explaining to do

  29. Problem: Bad scalability data
    Put Your Data in Prison
    Excel table of various USL quantities.
    Column F is scaling efficiency: C/N. Between N = 5 and 150 vusers,
    efficiencies exceed 1.0. You can’t have more than 100% of anything. Need to explain?
    Data + Model == Information
    Merely attempting to set up the USL model in Excel or R shows that the
    measurement data (not the model) are wrong.
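The same sanity check can be scripted outside Excel. Below is a minimal Python sketch, assuming throughput X(N) was measured at N = 1 so that C(N) = X(N)/X(1). The measurements are invented for illustration; they are not the J2EE data from slide 27.

```python
# Hypothetical measurements: virtual users N -> throughput X(N) in requests/s.
measurements = {1: 20.0, 5: 110.0, 10: 205.0, 50: 900.0, 150: 2400.0, 200: 3000.0}


def efficiency_table(data: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (N, C(N), C(N)/N) rows, with C(N) = X(N) / X(1)."""
    x1 = data[1]  # assumes a measurement at N = 1 exists
    rows = []
    for n in sorted(data):
        c = data[n] / x1
        rows.append((n, c, c / n))
    return rows


def superlinear_points(data: dict[int, float], tol: float = 1e-9) -> list[int]:
    """Loads whose efficiency exceeds 100%: these need explaining before any fit."""
    return [n for n, _, eff in efficiency_table(data) if eff > 1 + tol]


if __name__ == "__main__":
    for n, c, eff in efficiency_table(measurements):
        flag = "  <-- over 100%?" if eff > 1 else ""
        print(f"N={n:4d}  C={c:7.2f}  C/N={eff:5.3f}{flag}")
```

With these made-up numbers, N = 5 and N = 10 are flagged, the same kind of anomaly as column F on the slide.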

  30. Problem: eBay 1.0 scalability
    Outline (section 4 of 7)

  31. Problem: eBay 1.0 scalability
    eBay 1.0 Capacity Upgrades
    The Problem
    Want to compare capacity upgrades for Sun E10K backend
    Running ORA DBMS for both OLTP and DSS
    eBay 1.0 had no performance measurements of their app
    eBay Inc. was just hiring for a QA/load-test group
    No scalability measurements
    Sun PS provided me with their M-values
    M-values ⇒ α ≈ 0.005
    But that’s only α = ½% contention ... WTF!?
    ORA DBMS is more typically α ≈ 3%
    But at least we have things quantified

  32. Problem: eBay 1.0 scalability
    eBay 1.0 Optimistic Projections
    [Plot: “Python: CPU Upgrade Scenarios - Optimistic”. Total Utilization (%) vs. Weeks since 8/5/99, with projections for 1 E10K and 2 E10Ks]

  33. Problem: eBay 1.0 scalability
    How to keep the peace?
    The Solution
    Apply the USL model to Sun’s M-values
    $$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
    ORA backend ⇒ α ≈ 0.03
    Simply re-run the USL curves with that value. Voila!
    Creates a scalability envelope
    eBay mileage will vary within this scalability envelope
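Building the envelope amounts to evaluating C(N) twice, once for each α. Below is a minimal Python sketch, assuming β = 0 for both bounds (the slides quote only α) and sweeping N up to 64 CPUs, the most an E10K holds.

```python
OPTIMISTIC_ALPHA = 0.005  # from Sun's M-values (slide 31)
REALISTIC_ALPHA = 0.03    # typical ORA contention (slides 31 and 33)


def usl_capacity(n: int, alpha: float, beta: float = 0.0) -> float:
    """USL relative capacity; beta defaults to 0 (contention only, an assumption)."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def envelope(cpus, beta: float = 0.0):
    """(N, optimistic C(N), realistic C(N)) for each CPU count in `cpus`."""
    return [
        (n, usl_capacity(n, OPTIMISTIC_ALPHA, beta), usl_capacity(n, REALISTIC_ALPHA, beta))
        for n in cpus
    ]


if __name__ == "__main__":
    for n, upper, lower in envelope([1, 8, 16, 32, 64]):
        print(f"N={n:2d}  optimistic C={upper:5.1f}  realistic C={lower:5.1f}")
```

Any measured capacity for the real workload should fall between the two columns; a nonzero β would pull both bounds lower.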

  34. Problem: eBay 1.0 scalability
    eBay 1.0 Realistic Projections
    [Plot: “CPU Upgrade Scenarios - Realistic”. Total Utilization (%) vs. Weeks since 8/5/99, with projections for 1 E10K and 2 E10Ks]

  35. Problem: memcache scalability
    Outline (section 5 of 7)

  36. Problem: memcache scalability
    Velocity 2010, June 24
    [Image: slide 2 of the Velocity 2010 talk, titled “Scalability”]

  37. Problem: memcache scalability
    Scale out with memcache
    Tiers of older servers
    Servers often blades
    Mostly single processor
    Single threading ok

  38. Problem: memcache scalability
    But ...
    Datacenter HW gets refreshed
    Single-CPU blades will be replaced with multicores
    Multicores will be the only game in town (HW vendor decision)
    The Problem
    memcached is thread limited on multicores

  39. Problem: memcache scalability
    The evidence
    [Image: slide 12 of the Velocity 2010 (June 24) talk: “Memcached scaling is thread limited”]

  40. Problem: memcache scalability
    Performance measurement rig
    Sun Fire X4170 with 64 GB RAM²
    Intel Nehalem multicores
    2 processor sockets ⇒ 2 quad-cores == 8 cores
    Intel SMT enabled ⇒ 2 threads/core
    16 virtual CPUs seen by Solaris
    Load generators ←→ 10Gbps link ←→ SUT
    ²Joint work with Sun, pre-Oracle acquisition

  41. Problem: memcache scalability
    USL regression in Excel

  42. Problem: memcache scalability
    USL regression in R
    # Standard non-linear least squares (NLS) fit using the USL model.
    # Assumes a data frame 'input' with columns N (load), X_N (measured
    # throughput, first row at N = 1) and Norm (relative capacity, X_N / X_N[1]).
    usl <- nls(Norm ~ N/(1 + alpha * (N-1) + beta * N * (N-1)),
               data=input, start=c(alpha=0.1, beta=0.01))
    # Get alpha & beta parameters for use in plot legend
    x.coef <- coef(usl)
    # Determine sum-of-squares for R-squared coeff from NLS fit
    sse <- sum((input$Norm - predict(usl))^2)
    sst <- sum((input$Norm - mean(input$Norm))^2)
    r.sq <- 1 - sse/sst
    # Calculate Nmax and X(Nmax)
    Nmax <- sqrt((1 - x.coef['alpha']) / x.coef['beta'])
    Xmax <- input$X_N[1] * Nmax / (1 + x.coef['alpha'] * (Nmax-1) +
            x.coef['beta'] * Nmax * (Nmax-1))
    # Plot all the results
    plot(x<-c(0:max(input$N)), input$X_N[1] ...)
    title("USL Scalability")
    points(input$N, input$X_N)
    legend("bottom", legend=eval(parse(text=sprintf(...))))
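For readers without R, α and β can also be estimated without a non-linear solver. This is a different technique from the nls() call above: a minimal Python sketch that rewrites the USL as y = (α + β)x + βx², with x = N − 1 and y = N/C(N) − 1, and fits that quadratic (no intercept) by ordinary least squares. Because residuals are measured in the transformed space, its estimates can differ somewhat from an NLS fit on noisy data. The synthetic coefficients are hypothetical.

```python
def usl_capacity(n: float, alpha: float, beta: float) -> float:
    """USL relative capacity C(N)."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl_linearized(points):
    """Estimate (alpha, beta) from (N, C(N)) pairs via the quadratic transform.

    x = N - 1, y = N / C(N) - 1, and y = a*x + b*x**2 with a = alpha + beta, b = beta.
    The 2x2 least-squares normal equations for (a, b) are solved by Cramer's rule.
    """
    s2 = s3 = s4 = sxy = sx2y = 0.0
    for n, c in points:
        x = n - 1
        y = n / c - 1
        s2 += x ** 2
        s3 += x ** 3
        s4 += x ** 4
        sxy += x * y
        sx2y += x ** 2 * y
    det = s2 * s4 - s3 ** 2
    if det == 0:
        raise ValueError("need measurements at two or more distinct loads N > 1")
    a = (sxy * s4 - s3 * sx2y) / det
    b = (s2 * sx2y - s3 * sxy) / det
    return a - b, b  # (alpha, beta)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from known (hypothetical) coefficients.
    true_alpha, true_beta = 0.05, 0.002
    data = [(n, usl_capacity(n, true_alpha, true_beta)) for n in (1, 2, 4, 8, 16, 32, 64)]
    alpha, beta = fit_usl_linearized(data)
    print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

On noise-free synthetic data the fit recovers the generating coefficients; on real measurements it makes a quick cross-check on the nls() result.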

  43. Problem: memcache scalability
    [Four plots of throughput Xp vs. p: Raw bench data; Data smoother; USL fit; USL fit + CI bands]

  44. Problem: memcache scalability
    The envelope please!
    Table: Intel 2-socket duo core + SMT
    Version α β Nmax
    1.2.8 0.0255 0.0210 7
    1.4.1 0.0821 0.0207 6
    1.4.5 0.0988 0.0209 6
    Little’s law³: N = X(R + Z) threads
    We also know R is on the order of ms (10⁻³ s), so R + Z is dominated by
    the client-side “think time” Z = 5 s used in the tests
    Avg X ≈ 350 KOPS on Intel quad-core
    Therefore: N ≈ 350 × 10³ × 5 = 1,750,000 threads
    Same as users, assuming 1 user process per thread
    ³See, e.g., Scalable Internet Architectures, p. 127
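The Little's law estimate is easy to reproduce. Below is a minimal Python sketch using the slide's X ≈ 350 KOPS and Z = 5 s. R = 1 ms is an assumption standing in for "on the order of ms", included only to show how little it changes N.

```python
def littles_law_population(throughput_per_s: float, response_s: float, think_s: float) -> float:
    """Little's law for a closed system: N = X (R + Z)."""
    return throughput_per_s * (response_s + think_s)


X = 350e3  # ops/s: average throughput quoted on the slide
Z = 5.0    # s: client think time used in the tests
R = 1e-3   # s: assumed 1 ms response time ("on the order of ms")

if __name__ == "__main__":
    print(f"N ignoring R:    {littles_law_population(X, 0.0, Z):,.0f}")
    print(f"N with R = 1 ms: {littles_law_population(X, R, Z):,.0f}")
```

The two results differ by only 350 threads out of 1.75 million, which is why R can be neglected here.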

  45. Problem: memcache scalability
    SPARC Solaris mods
    Table: SPARC T2 + Solaris
    Version α β Nmax
    Vanilla 0.0041 0.0092 22
    Modified 0.0000 0.0004 48
    The Solution
    Partitioned memcached (mcd) hash table
    Contention on the single hash table avoided by partitioning it
    Solaris patches improve scalability to ≈ 40 threads
    Throughput X increases from 200 → 400 KOPS on SPARC CMT
    Can’t assume the same 2x win on x86
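Partitioning a hash table is a general pattern, often called lock striping: split one table guarded by one lock into several sub-tables, each with its own lock, so threads touching different keys rarely wait on each other. Below is a minimal Python sketch of the pattern, not the actual memcached or Solaris change; the class name and partition count are illustrative. CPython's global interpreter lock means this shows the structure, not a real speedup.

```python
import threading
from typing import Any, Hashable, Optional


class PartitionedTable:
    """A hash table split into independently locked partitions (lock striping)."""

    def __init__(self, partitions: int = 16) -> None:
        if partitions < 1:
            raise ValueError("partitions must be >= 1")
        self._maps: list[dict] = [{} for _ in range(partitions)]
        self._locks = [threading.Lock() for _ in range(partitions)]

    def _index(self, key: Hashable) -> int:
        # A given key always maps to the same partition, so one lock guards it.
        return hash(key) % len(self._maps)

    def set(self, key: Hashable, value: Any) -> None:
        i = self._index(key)
        with self._locks[i]:  # only this partition is blocked
            self._maps[i][key] = value

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        i = self._index(key)
        with self._locks[i]:
            return self._maps[i].get(key, default)

    def __len__(self) -> int:
        # Locks partitions one at a time: not an atomic snapshot while writers run.
        total = 0
        for lock, part in zip(self._locks, self._maps):
            with lock:
                total += len(part)
        return total


def _writer(table: PartitionedTable, start: int, count: int) -> None:
    for k in range(start, start + count):
        table.set(k, k * k)


# Usage: four threads write disjoint key ranges concurrently.
table = PartitionedTable(partitions=8)
workers = [threading.Thread(target=_writer, args=(table, i * 1000, 1000)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(table), table.get(1234))
```

With one lock per partition, the contention term α drops roughly in proportion to how evenly keys spread across partitions.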

  46. Summary and Review
    Outline (section 6 of 7)

  47. Summary and Review
    Why Should You Care?
    Werner Vogels, Amazon CTO
    “Scalability is hard because it cannot be an after-thought. Good scalability is
    possible, but only if we architect and engineer our systems to take scalability
    into account.”
    Old reason: Concurrent programming was hard on SMPs
    New reason: Multicores are SMPs on a chip (HW vendor decision)
    More threads enable higher concurrency and shorter user latencies
    But it’s hard: beware the 3rd C in the USL (β coefficient)
    Theo Schlossnagle, OmniTI CEO
    “Simply having a solution that scales horizontally doesn’t mean that you are
    safe.”

  48. Summary and Review
    Where is Your Application?
    Class A: Ideal concurrency (α, β = 0)
      Shared-nothing platform
      Google text search
      LexisNexis search
      Read-only queries
    Class B: Contention-only (α > 0, β = 0)
      Message-based queueing (e.g., MQSeries)
      Message Passing Interface (MPI) applications
      Transaction monitors (e.g., Tuxedo)
      Polling service (e.g., VMware)
      Peer-to-peer (e.g., Skype)
    Class C: Incoherent-only (α = 0, β > 0)
      Scientific HPC computations
      Online analytic processing (OLAP)
      Data mining
      Decision support software (DSS)
    Class D: Worst case (α, β > 0)
      Anything with shared writes
      Hotel reservation system
      Banking online transaction processing (OLTP)
      Java database connectivity (JDBC)
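Once α and β have been fitted, placing an application in one of the four classes is mechanical. Below is a minimal Python sketch; the tolerance for treating a fitted coefficient as zero is an assumption, since regression rarely returns an exact 0.

```python
def usl_class(alpha: float, beta: float, tol: float = 1e-6) -> str:
    """Map fitted USL coefficients to scalability class A, B, C, or D."""
    contention = alpha > tol  # alpha > 0: queueing for shared resources
    coherency = beta > tol    # beta > 0: consistency (coherency) delays
    if not contention and not coherency:
        return "A"  # ideal concurrency
    if contention and not coherency:
        return "B"  # contention-only
    if coherency and not contention:
        return "C"  # incoherent-only
    return "D"      # worst case: both effects present


if __name__ == "__main__":
    # Coefficients from the memcached tables (slides 44 and 45).
    for label, a, b in [("memcached 1.2.8", 0.0255, 0.0210), ("SPARC modified", 0.0, 0.0004)]:
        print(label, usl_class(a, b))
```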

  49. Summary and Review
    USL Scalability Zones
    Think scalability zones rather than scalability curves
    [Plot: throughput X(N) vs. N (0 to 120), divided into zones A, B, and C]
    Websphere measurements (dots)
    A Asynchronous messaging (average queue lengths)
    B Synchronous messaging (worst queue lengths)
    C Synchronous messaging + pairwise exchanges

  50. Resources and Coordinates
    Outline (section 7 of 7)

  51. Resources and Coordinates
    Resources and Coordinates
    Castro Valley, California, 94552
    Resources:
    Books
    Training
    USL tools
    Coordinates:
    www.perfdynamics.com
    perfdynamics.blogspot.com
    twitter.com/DrQz
    [email protected]
    +1-510-537-5758
    Chapters 4–6

  52. Resources and Coordinates