Hadoop Super Scaling

Invited Tech Talk at Salesforce HQ in San Francisco

Dr. Neil Gunther

August 09, 2016

Transcript

  1. Hadoop Super Scaling
    Dr. Neil Gunther — @DrQz
    Performance Dynamics Labs
    Salesforce Tech Talk
    August 8, San Francisco
    c 2016 Performance Dynamics Labs Hadoop Super Scaling August 8, 2016 1 / 55

  3. What this talk is about
    [Figure: Speedup versus Processors, showing superlinear, linear, and sublinear scaling curves]
    Scalability: Performance gain due to increasing resource capacity

  6. Qualitative Scalability
    Qualitative means:
    Scalability from an operational view
    Scalability as configuration recipes
    Lots of words (on blogs), but no numbers
    Cost-benefit analysis demands numbers, not words.
    Need to measure scalability appropriately in order to quantify it.
    But how?
    Need controlled measurements (e.g., Apache JMeter)
    Cannot understand scalability by monitoring Prod systems. The human brain
    is not built for that. Need to transform time-series data to informational
    performance metrics.

  7. Quantitative Scalability
    Google 2005 on MapReduce: “If scaling were perfect, performance would be
    proportional to the number of machines. In our test, it was 0.98 of the machines.”
    Since the data records we wish to process do live on many machines, it would be fruitful to exploit
    the combined computing power to perform these analyses. In particular, if the individual steps
    can be expressed as query operations that can be evaluated one record at a time, we can distribute
    the calculation across all the machines and achieve very high throughput. The results of these
    operations will then require an aggregation phase. For example, if we are counting records, we
    need to gather the counts from the individual machines before we can report the total count.
    We therefore break our calculations into two phases. The first phase evaluates the analysis on
    each record individually, while the second phase aggregates the results (Figure 2). The system
    described in this paper goes even further, however. The analysis in the first phase is expressed in a
    new procedural programming language that executes one record at a time, in isolation, to calculate
    query results for each record. The second phase is restricted to a set of predefined aggregators
    that process the intermediate results generated by the first phase. By restricting the calculations
    to this model, we can achieve very high throughput. Although not all calculations fit this model
    well, the ability to harness a thousand or more machines with a few lines of code provides some
    compensation.
    Figure 2: The overall flow of filtering, aggregating, and collating. Each stage typically
    involves less data than the previous.
    Of course, there are still many subproblems that remain to be solved. The calculation must be
    divided into pieces and distributed across the machines holding the data, keeping the computation
    as near the data as possible to avoid network bottlenecks. And when there are many machines
    there is a high probability of some of them failing during the analysis, so the system must be
    Translation: MR scalability is 98% of ideal linear scaling
    Scalability is a function, not a single number
    Diminishing returns due to increasing overhead
    Want to express overhead loss quantitatively
    But what (mathematical) function?

  8. The Speedup Metric
    Commonly used in the context of parallel processing performance
    I’ll denote it by the symbol $S_p$ in this talk
    Expect $S_p = p$ if linear with p parallel processors
    Superlinear if $S_p > p$
    Example (MIT Swarm processor)
    Some of these speedup profiles look superlinear(?)
    [Figure: Speedup from 1 to 64 cores (1c, 32c, 64c) for six benchmarks (bfs, sssp, astar, msf, des, silo), comparing Swarm with software-only parallel versions; the bfs panel is annotated 117x]
    Figure 9. Speedup of Swarm and state-of-the-art software-parallel implementations from 1 to 64 cores, relative to a tuned
    serial implementation running on a system of the same size. At 64 cores, Swarm programs are 43 to 117 times faster than the
    serial versions and 2.7 to 18 times faster than software-parallel versions.
    “Unlocking Ordered Parallelism with the Swarm Architecture,” IEEE Micro, vol. 36, no. 3, pp. 105–117 (2016)

  9. How to Quantify Scalability
    Outline
    1. How to Quantify Scalability
        Components of Scalability
        Universal Scalability Law (USL)
    2. Applying the USL
        Varnish
        Memcached
        Tomcat Java Application
        Sirius (Zookeeper)
    3. Superlinear Scaling
        What it looks like
        Perpetual motion
        Hunting the Superlinear Snark
    4. Superlinear Payback Trap

  10. How to Quantify Scalability Components of Scalability
    Equal bang for your buck: Concurrency

  11. How to Quantify Scalability Components of Scalability
    Diminishing returns: Contention cost

  12. How to Quantify Scalability Components of Scalability
    Resource saturation: More contention cost

  13. How to Quantify Scalability Components of Scalability
    Negative returns: Coherency of non-local data

  22. How to Quantify Scalability Universal Scalability Law (USL)
    Universal Scalability Law (USL)
    p processors or processes provide system load
    $S_p$: speedup performance function ≡ normalized thruput
    Question: What kind of function is $S_p$?
    Answer: A rational function [1]
    $$S_p(\sigma, \kappa) = \frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}$$
    The three Cs:
    1. Concurrency
    2. Contention (0 < σ < 1)
    3. Coherency (0 < κ < 1)
    [1] NJG. “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conf. 1993
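
    The USL formula on this slide translates directly into a short R function. This is a minimal sketch rather than code from the talk: the function name `usl_speedup` and the coefficient values (σ = 0.05, κ = 0.001) are assumptions, chosen only to show how each of the three Cs changes the shape of the curve.

    ```r
    # Hypothetical helper (not from the talk): USL speedup at load p.
    usl_speedup <- function(p, sigma, kappa) {
      p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
    }

    p <- c(1, 2, 4, 8, 16, 32, 64)

    # Illustrative coefficients only; real values come from fitting measured data.
    curves <- data.frame(
      p           = p,
      concurrency = usl_speedup(p, sigma = 0,    kappa = 0),      # ideal linear scaling
      contention  = usl_speedup(p, sigma = 0.05, kappa = 0),      # diminishing returns
      coherency   = usl_speedup(p, sigma = 0.05, kappa = 0.001)   # peaks, then declines
    )
    print(curves)
    ```

    With these assumed values the coherency column is lower at p = 64 than at p = 32, which is the negative-returns behavior described on the earlier Components of Scalability slides.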

  26. How to Quantify Scalability Universal Scalability Law (USL)
    Measurement meets Model
    $$\underbrace{\frac{X(p)}{X(1)}}_{\text{Thruput data}} \longrightarrow \underbrace{S_p(\sigma, \kappa)}_{\text{Speedup}} \longleftarrow \underbrace{\frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}}_{\text{USL model}}$$
    [Figure: Speedup S(p) versus Processors (p)]

  30. How to Quantify Scalability Universal Scalability Law (USL)
    How do we determine σ and κ?
    $$S(p, \sigma, \kappa) = \frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}$$
    Brute force measurements (good luck!)
    Data from controlled measurements, e.g., JMeter
    Clever way: Apply statistical regression
    I’ll use R stats tools throughout this talk:
    FOSS with 40 yr history since S at Bell Labs
    GDAT: Guerrilla Data Analysis Techniques
    Magic functions in R:
    nls() nonlinear regression → σ, κ in one swell foop
    optimize() to estimate $X_{\text{data}}(1)$ if missing
    predict() smooth interpolation/extrapolation from data
    plot() with various bells & whistles

  31. Applying the USL

  32. Applying the USL Varnish
    Varnish
    Data provided by
    D. Popa (DigitAir, RO)

  34. Applying the USL Varnish
    Varnish Architecture
    HTTP accelerator
    Reverse web proxy caching system
    Sits in front of classic web server
    Caching handled by virtual memory
    Claim: Highly scalable (read: linear) ... but is it?

  35. Applying the USL Varnish
    Varnish JMeter Measurements
    Example (Read raw data and plot in R)
    data <- read.table(fname,header=TRUE,sep="\t")
    print(data)
    plot(data$N,data$X_N,type="b")
    [Figure: Varnish JMeter Speedup Data; Speedup S(N) versus Load generators (N)]
    By typing data into R console:
    > data
    N X_N Speed Effcy
    1 1 1.4 1.000000 1.0000000
    2 2 2.7 1.928571 0.9642857
    3 5 6.4 4.571429 0.9142857
    4 10 12.8 9.142857 0.9142857
    5 25 32.0 22.857143 0.9142857
    6 50 64.0 45.714286 0.9142857
    7 75 98.0 70.000000 0.9333333
    8 100 131.0 93.571429 0.9357143
    9 150 197.0 140.714286 0.9380952
    10 250 320.0 228.571429 0.9142857
    11 300 392.0 280.000000 0.9333333
    12 400 518.0 370.000000 0.9250000
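
    The Speed and Effcy columns in the console output above follow from the raw throughput: speedup is X(N)/X(1), and efficiency is speedup divided by N. A minimal R sketch, with the N and X_N values retyped from the slide and the data frame name `varnish` chosen here for illustration:

    ```r
    # N and X_N retyped from the console output on the slide.
    varnish <- data.frame(
      N   = c(1, 2, 5, 10, 25, 50, 75, 100, 150, 250, 300, 400),
      X_N = c(1.4, 2.7, 6.4, 12.8, 32, 64, 98, 131, 197, 320, 392, 518)
    )

    # Normalize throughput by the single-generator value X(1) to get speedup.
    varnish$Speed <- varnish$X_N / varnish$X_N[varnish$N == 1]

    # Efficiency is the fraction of ideal linear scaling achieved at each load.
    varnish$Effcy <- varnish$Speed / varnish$N

    print(varnish)
    ```

    An efficiency that stays above 0.9 all the way to N = 400, as it does here, is what near-linear scaling looks like in tabular form.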

  36. Applying the USL Varnish
    Varnish Comparison with Linear Scaling
    [Figure: Varnish JMeter Speedup Data; Speedup S(N) versus Load generators (N)]

  37. Applying the USL Varnish
    Varnish meets the USL
    [Figure: USL Fit to Varnish Speedup Data; Speedup S(N) versus Load generators (N)]
    Fit annotations: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141521

  38. Applying the USL Varnish
    USL Scalability Projection
    [Figure: USL Projection for Varnish; Speedup S(N) versus Load generators (N), extended to N = 5000]
    Fit annotations: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141530

  39. Applying the USL Memcached
    Memcached
    Joint work with
    S. Subramanyam (Sun, USA) and S. Parvu (Nokia, FI)
    Presented at Velocity 2010

  40. Applying the USL Memcached
    Memcached Scalability
    Scaleup Scaleout

  41. Applying the USL Memcached
    Memcached Scaleout Strategy
    Distributed cache of key-value pairs
    Pre-loaded from RDBMS
    Deploy mcd on tier of cheap, older CPUs (but not multicores)
    Single threaded mcd ok — until next hardware roll (i.e., multicores)

  42. Applying the USL Memcached
    Memcached Measurements
    Example (Read in raw data and plot it)
    data <- read.table(fname,header=TRUE,sep="\t")
    print(data)
    plot(data$N,data$X_N,type="b")
    [Figure: Raw Speedup Data; Thruput X(N) versus Threads (N)]
    Typing data into R console:
    > data
    N X_N
    1 1 89
    2 2 160
    3 4 272
    4 8 333
    5 10 352
    6 12 339
    7 14 315

  43. Applying the USL Memcached
    Memcached Regression Analysis
    Example (Normalize, check efficiencies, USL fit)
    > data
    p X_p Speed Effcy
    1 1 89 1.000000 1.0000000
    2 2 160 1.797753 0.8988764
    3 4 272 3.056180 0.7640449
    4 8 333 3.741573 0.4676966
    5 10 352 3.955056 0.3955056
    6 12 339 3.808989 0.3174157
    7 14 315 3.539326 0.2528090
    > summary(usl)
    Formula: Speed ~ p/(1 + sigma * (p - 1) + kappa * p * (p - 1))
    Parameters:
          Estimate Std. Error t value Pr(>|t|)
    sigma 0.025517   0.014830   1.721    0.146
    kappa 0.020958   0.001746  12.003 7.08e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: 0.08918 on 5 degrees of freedom
    Algorithm "port", convergence message: relative convergence (4)
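
    A fit of this form can be set up with nls() roughly as sketched below. The throughput values are retyped from the slide; the object names, starting values, and zero lower bounds are assumptions made for illustration, so the estimates it prints are not claimed to reproduce the summary above.

    ```r
    # Thruput data retyped from the slide.
    mcd <- data.frame(
      p   = c(1, 2, 4, 8, 10, 12, 14),
      X_p = c(89, 160, 272, 333, 352, 339, 315)
    )
    mcd$Speed <- mcd$X_p / mcd$X_p[1]   # normalize by X(1)
    mcd$Effcy <- mcd$Speed / mcd$p      # efficiency check

    # Nonlinear least-squares fit of the USL; start values and bounds are assumed.
    usl <- nls(
      Speed ~ p / (1 + sigma * (p - 1) + kappa * p * (p - 1)),
      data      = mcd,
      start     = list(sigma = 0.1, kappa = 0.01),
      algorithm = "port",
      lower     = c(sigma = 0, kappa = 0)
    )
    print(summary(usl))

    # Smooth curve from the fitted model for plotting or extrapolation.
    grid <- data.frame(p = seq(1, 20, by = 0.5))
    grid$Speed <- predict(usl, newdata = grid)
    ```

    Fitting the normalized speedup rather than raw throughput keeps σ and κ dimensionless, which is what allows them to be compared across systems.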

  44. Applying the USL Memcached
    Memcached USL Scalability Fit
    [Figure: USL Analysis of Memcached; Speedup S(N) versus Threads (N)]
    Fit annotations: σ = 0.0255, κ = 0.020958, R² = 0.9925, pmax = 6.82, Smax = 3.44, Sroof = 39.19, Z(sec) = 0, TS = 1111141517
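
    The derived quantities in the plot annotation can be checked from σ and κ. The sketch below assumes the standard USL relations, pmax = sqrt((1 − σ)/κ) for the load at peak speedup and Sroof = 1/σ for the contention-only ceiling, which the slide itself does not spell out; the variable and function names are illustrative.

    ```r
    sigma <- 0.025517   # estimates from the nls summary on the previous slide
    kappa <- 0.020958

    # Hypothetical helper: USL speedup at load p.
    usl_speedup <- function(p, sigma, kappa) {
      p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
    }

    p_max  <- sqrt((1 - sigma) / kappa)         # load where the USL curve peaks
    S_max  <- usl_speedup(p_max, sigma, kappa)  # speedup at that peak
    S_roof <- 1 / sigma                         # asymptote if kappa were zero

    print(round(c(pmax = p_max, Smax = S_max, Sroof = S_roof), 2))
    ```

    Rounded to two decimals these agree with the annotated pmax = 6.82, Smax = 3.44, and Sroof = 39.19.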

  45. Applying the USL Memcached
    MCD Scalability Improvements (Sun SPARC patch)
    [Figure: Speedup S(N) versus Threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch]

  46. Applying the USL Tomcat Java Application
    Tomcat Scalability
    Data provided by
    M. Chawla (Germany)

  47. Applying the USL Tomcat Java Application
    USL Fit to Initial Production Data

  48. Applying the USL Tomcat Java Application
    Extended Production Data

  49. Applying the USL Tomcat Java Application
    USL Fit to Extended Production Data

  50. Applying the USL Sirius (Zookeeper)
    Comcast Sirius
    and
    Apache Zookeeper
    “Sirius: Distributing and Coordinating Application Reference Data”
    USENIX ;login: Oct 2014, Figure 4 (PDF)
    “ZooKeeper: Wait-free coordination for Internet-scale systems”
    USENIX Annual Tech. Conf., 2010 (PDF)

  51. Applying the USL Sirius (Zookeeper)
    Sirius: Distributed coordination by voting
    [Figure: USL Model of Sirius Scalability; Writes per second versus Cluster size; series: Sirius, Sirius-NoBrain, Sirius-NoDisk, USL model]
    Fit annotations: σ = 0.037, κ = 0.1649, R² = 0.9591 (NJG, Wednesday, November 12, 2014)

  52. Superlinear Scaling

  53. Superlinear Scaling
    Superlinear Scaling
    Joint work with
    P. Puglia (BofA) and K. Tomasette (Comcast)
    Published in journal
    Comm. ACM, Vol.58 No.4, April 2015
    and online at
    ACM Queue (unabridged)

  55. Superlinear Scaling
    Remember this?
    [Figure: Speedup versus Processors, showing superlinear, linear, and sublinear scaling curves]
    More than 100% efficient!(???)

  56. Superlinear Scaling
    “Speedup in Parallel Contexts.”
    http://en.wikipedia.org/wiki/Speedup#Speedup_in_Parallel_Contexts
    “Where does super-linear speedup come from?”
    http://stackoverflow.com/questions/4332967/where-does-super-linear-speedup-come-from
    “Sun Fire X2270 M2 Super-Linear Scaling of Hadoop TeraSort and CloudBurst
    Benchmarks.”
    https://blogs.oracle.com/BestPerf/entry/20090920_x2270m2_hadoop
    Haas, R. “Scalability, in Graphical Form, Analyzed.”
    http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html
    Sutter, H. 2008. “Going Superlinear.”
    Dr. Dobb’s Journal 33(3), March.
    http://www.drdobbs.com/cpp/going-superlinear/206100542
    Sutter, H. 2008. “Super Linearity and the Bigger Machine.”
    Dr. Dobb’s Journal 33(4), April.
    http://www.drdobbs.com/parallel/super-linearity-and-the-bigger-machine/206903306
    “SDN analytics and control using sFlow standard — Superlinear.”
    http://blog.sflow.com/2010/09/superlinear.html
    Eijkhout, V. 2014. Introduction to High Performance Scientific Computing.
    Lulu.com

  57. Superlinear Scaling What it looks like

  60. Superlinear Scaling What it looks like
    Oracle plot of Hadoop on SunFire cluster
    Superlinear speedup on 16-node SunFire (158% linear scaling)
    Linear superlinearity ??? ← Ship it !!!

  61. Superlinear Scaling Perpetual motion
    Reminiscent of perpetual motion

  62. Superlinear Scaling Perpetual motion
    Perpetual motion
    Perpetual motion machines:
    Perpetual motion contraptions violate conservation of energy law.
    Super efficiency is tantamount to more than 100% output.
    Even if you know it’s wrong, proving it is the hard part.
    Superlinear scalability:
    Superlinearity exceeds 100% of total capacity.
    Violates the Universal Scalability Law (USL) bounds.
    Again, proving it wrong is the hard part.
    Requires serious analysis and debugging.

  63. Superlinear Scaling Perpetual motion
    TeraSort Cluster Simulations
    Need controlled environment to study superlinearity
    TeraSort workload sorts 1 TB of data in parallel
    TeraSort has benchmarked Hadoop MapReduce performance
    We used just 100 GB data input (not benchmarking anything)
    Simulate in AWS cloud (more flexible and much cheaper)
    Many test runs, some done in parallel
    Table 1: Amazon EC2 Configurations
              Optimized for  Processor Arch  vCPU number  Memory (GiB)  Instance Storage (GB)  Network Performance
    BigMem    Memory         64-bit          4            34.2          1 x 850                Moderate
    BigDisk   Compute        64-bit          8            7             4 x 420                High

  64. Superlinear Scaling Perpetual motion
    Hadoop MapReduce Architecture
    [Figure: Hadoop MapReduce architecture diagram. Labels: Job Client, JobTracker, NameNode; Node 1, Node 2, ..., Node p, each with Mapper tasks, Reducer tasks, and a DataNode; Shuffle exchange; Load from HDFS, Input, Map(k,v), Sort, Partition; Input, Merge, Reduce(k,[v]), Output, Write to HDFS]
    100 GB input data
    840 Mappers
    3 Reducers/node
    EC2 nodes: p = 1 . . . 200
    p = 1 runtimes ∼ 5 hrs
    Cloudera 4.7.0 dsn
    Apache Whirr
    Linux perf tools

  65. Superlinear Scaling Hunting the Superlinear Snark
    USL Model of BigMem p = 50 Speedup
    [Figure: USL Model of BigMem Hadoop TS Data; Speedup S(p) versus EC2 m2 nodes (p)]
    Fit annotations: σ = −0.0288, κ = 0.000447, R² = 0.9974, pmax = 47.96, Smax = 73.48, Sroof = NA, pcross = 64.5, TS = 1311140942

  66. Superlinear Scaling Hunting the Superlinear Snark
    USL Model of BigMem p = 150 Speedup
    [Figure: USL Model of BigMem Hadoop TS Data; Speedup S(p) versus EC2 m2 nodes (p)]
    Fit annotations: σ = −0.0089, κ = 9e-05, R² = 0.977, pmax = 105.72, Smax = 99.53, Sroof = NA, pcross = 99.14, TS = 1311140942

  67. Superlinear Scaling Hunting the Superlinear Snark
    A Sign of Superlinearity
    USL contention coefficient is negative:
    σ = −0.0288
    σ = −0.0089
    The sign that superlinear scaling is really there (get it?)
    Positive σ means capacity loss due to overhead
    Negative σ therefore implies capacity gain or credit
    But what could provide such credit?
    And like a credit card, do you have to pay for it later?
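
    One way to see why a negative σ signals superlinearity is to write the USL efficiency in factored form. The derivation below is a sketch based only on the USL formula used in this talk; the symbol $p_\times$ for the crossover load is introduced here, and reading it as the pcross value annotated on the BigMem plots is an assumption.

    $$\frac{S_p}{p} = \frac{1}{1 + \sigma (p - 1) + \kappa\, p (p - 1)} = \frac{1}{1 + (p - 1)(\sigma + \kappa p)}$$

    $$\frac{S_p}{p} > 1 \iff \sigma + \kappa p < 0 \iff p < p_\times = -\frac{\sigma}{\kappa} \qquad (p > 1,\ \kappa > 0,\ \text{denominator positive})$$

    With σ = −0.0288 and κ = 0.000447 this gives $p_\times \approx 64.4$, close to the plotted pcross = 64.5; with σ = −0.0089 and κ = 9e-05 it gives about 98.9 against the plotted 99.14. The small differences are consistent with the coefficients being rounded on the slides. If σ ≥ 0 there is no such load, so efficiency never exceeds 100%.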

  68. Superlinear Scaling Hunting the Superlinear Snark
    Recall from AMZ EC2 Configs
              Optimized for  Processor Arch  vCPU number  Memory (GiB)  Instance Storage (GB)  Network Performance
    BigMem    Memory         64-bit          4            34.2          1 x 850                Moderate
    BigDisk   Compute        64-bit          8            7             4 x 420                High
    From Table 1:
    1. BigMem has 1 disk per EC2 node
    2. BigDisk has 4 disks per EC2 node

  69. Superlinear Scaling Hunting the Superlinear Snark
    Speedup on p = 10 BigMem Nodes (1 disk)
    [Figure: BigMem Hadoop Terasort Speedup Data; Speedup S(p) versus EC2 m2 nodes (p), with a superlinear region followed by a sublinear region]

  70. Superlinear Scaling Hunting the Superlinear Snark
    Speedup on p = 10 BigDisk Nodes (4 disks)
    [Figure: BigDisk Hadoop Terasort Speedup Data; Speedup S(p) versus EC2 c1 nodes (p), with a superlinear region followed by a sublinear region]

  72. Superlinear Scaling Hunting the Superlinear Snark
    Brief Explanation
    1. Apparent capacity credit produced by IO bottleneck:
        Credit = Gradual reduction in IO constraint
        Relaxation of the latent IO bandwidth constraint.
        Constraint decreases with cluster size p = 1, 2, 3, . . .
    2. IO bottleneck induces random Reducer retries:
        Up to 10% variation in runtimes
        Stretches measured runtimes
        Distorts normalization of the speedup data
    Details are discussed in the unabridged ACM Queue article

  73. Superlinear Payback Trap

  75. Superlinear Payback Trap
    Summary
    USL requires σ, κ > 0 for S(p) to be a concave function
    Convex efficiencies S(p)/p > 100% do (appear to) exist
    Data → σ < 0 in USL model is a superlinear detector
    Super efficiency is not free
    Like perpetual motion, it’s an illusion
    You will pay the piper eventually
    Debugging latent capacity credit can be very tricky
    Theorem (Gunther 2012)
    USL Payback Trap: Superlinearity is always followed by severe loss of speedup in the payback region
    Verified by Kris Tomasette on April 15, 2014
    [Figure: Speedup versus Processors, showing a superlinear region followed by a payback region]

  77. Superlinear Payback Trap
    The Visual Takeaway
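    People think it’s this
    [Figure: Speedup versus Processors, showing superlinear, linear, and sublinear scaling curves]
    But it’s really this
    [Figure: Speedup versus Processors, showing a superlinear region followed by a payback region]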
    USL explains Terasort superlinearity on Hadoop
    Superlinear effects do appear in other guises (see the cited links)
    More and more apps becoming massively distributed
    Look for negative USL σ in your performance data

  78. Superlinear Payback Trap
    More about USL Modeling
    Chapters 6 and 14 Chapters 4–6
    Also check out:
    USL web page
    Guerrilla classes

  79. Superlinear Payback Trap
    Thank you!
    Castro Valley, California
    www.perfdynamics.com
    perfdynamics.blogspot.com
    Twitter/DrQz
    Facebook
    [email protected]
    +1-510-537-5758