Hadoop Super Scaling

Invited Tech Talk at Salesforce HQ in San Francisco

Dr. Neil Gunther

August 09, 2016

Transcript

  1. Hadoop Super Scaling

    Dr. Neil Gunther (@DrQz), Performance Dynamics Labs
    Salesforce Tech Talk, August 8, San Francisco
    © 2016 Performance Dynamics Labs. Hadoop Super Scaling, August 8, 2016. 1 / 55
  3. What this talk is about

    [Figure: speedup versus processors, with superlinear, linear, and sublinear curves]
    Scalability: performance gain due to increasing resource capacity.
  6. Qualitative Scalability

    Qualitative means:
    Scalability from an operational view.
    Scalability as configuration recipes.
    Lots of words (on blogs), but no numbers.
    Cost-benefit analysis demands numbers, not words.
    Need to measure scalability appropriately in order to quantify it.

    But how?
    Need controlled measurements (e.g., Apache JMeter).
    Cannot understand scalability by monitoring production systems. The human brain is not built for that.
    Need to transform time-series data to informational performance metrics.
  7. Quantitative Scalability

    Google 2005 on MapReduce: “If scaling were perfect, performance would be proportional to the number of machines. In our test, it was 0.98 of the machines.”

    [Excerpt from the paper page shown on the slide:] Since the data records we wish to process do live on many machines, it would be fruitful to exploit the combined computing power to perform these analyses. In particular, if the individual steps can be expressed as query operations that can be evaluated one record at a time, we can distribute the calculation across all the machines and achieve very high throughput. The results of these operations will then require an aggregation phase. For example, if we are counting records, we need to gather the counts from the individual machines before we can report the total count. We therefore break our calculations into two phases. The first phase evaluates the analysis on each record individually, while the second phase aggregates the results (Figure 2). The system described in this paper goes even further, however. The analysis in the first phase is expressed in a new procedural programming language that executes one record at a time, in isolation, to calculate query results for each record. The second phase is restricted to a set of predefined aggregators that process the intermediate results generated by the first phase. By restricting the calculations to this model, we can achieve very high throughput. Although not all calculations fit this model well, the ability to harness a thousand or more machines with a few lines of code provides some compensation.

    Figure 2: The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous.

    Of course, there are still many subproblems that remain to be solved. The calculation must be divided into pieces and distributed across the machines holding the data, keeping the computation as near the data as possible to avoid network bottlenecks. And when there are many machines there is a high probability of some of them failing during the analysis, so the system must be [excerpt ends mid-sentence]

    Translation: MR scalability is 98% of ideal linear scaling.
    Scalability is a function, not a single number.
    Diminishing returns due to increasing overhead.
    Want to express overhead loss quantitatively.
    But what (mathematical) function?
  8. The Speedup Metric

    Commonly used in the context of parallel processing performance.
    I’ll denote it by the symbol S_p in this talk.
    Expect S_p = p if linear with p parallel processors.
    Superlinear if S_p > p.

    Example (MIT Swarm processor): Some of these speedup profiles look superlinear(?)

    [Figure 9 of the cited paper, panels bfs, sssp, astar, msf, des, silo; series Swarm and Software-only parallel; x-axis 1c to 64c, y-axis Speedup] Figure 9. Speedup of Swarm and state-of-the-art software-parallel implementations from 1 to 64 cores, relative to a tuned serial implementation running on a system of the same size. At 64 cores, Swarm programs are 43 to 117 times faster than the serial versions and 2.7 to 18 times faster than software-parallel versions.

    “Unlocking Ordered Parallelism with the Swarm Architecture,” IEEE Micro, vol. 36, no. 3, 105–117 (2016)
  9. Outline

    1. How to Quantify Scalability: Components of Scalability; Universal Scalability Law (USL)
    2. Applying the USL: Varnish; Memcached; Tomcat Java Application; Sirius (Zookeeper)
    3. Superlinear Scaling: What it looks like; Perpetual motion; Hunting the Superlinear Snark
    4. Superlinear Payback Trap
  10. How to Quantify Scalability: Components of Scalability

    Equal bang for your buck: Concurrency
  11. How to Quantify Scalability: Components of Scalability

    Diminishing returns: Contention cost
  12. How to Quantify Scalability: Components of Scalability

    Resource saturation: More contention cost
  13. How to Quantify Scalability: Components of Scalability

    Negative returns: Coherency of non-local data
  22. How to Quantify Scalability: Universal Scalability Law (USL)

    Universal Scalability Law (USL)

    p processors or processes provide system load.
    S_p is the speedup performance function ≡ normalized throughput.
    Question: What kind of function is S_p?
    Answer: A rational function [1]

    $$S_p(\sigma, \kappa) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\, p\,(p - 1)}$$

    The three Cs:
    1. Concurrency
    2. Contention (0 < σ < 1)
    3. Coherency (0 < κ < 1)

    [1] NJG. “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conf. 1993
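As a minimal sketch of the formula on this slide, the USL can be written as a small R function. The function name `usl` and the sample coefficient values are illustrative assumptions, not taken from the talk. With both coefficients set to zero the function returns linear scaling, S_p = p; with κ = 0 it reduces to contention-only (Amdahl-type) scaling.

```r
# USL speedup for p processors with contention sigma and coherency kappa.
# The function name and the sample coefficients are illustrative assumptions.
usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

p <- c(1, 2, 4, 8, 16, 32)

linear     <- usl(p, sigma = 0,    kappa = 0)      # pure concurrency: S_p = p
contention <- usl(p, sigma = 0.05, kappa = 0)      # diminishing returns
coherency  <- usl(p, sigma = 0.05, kappa = 0.002)  # extra loss that grows with p

print(data.frame(p, linear, contention, coherency))
```

Comparing the three columns shows how each of the three Cs changes the shape of the curve.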
  26. How to Quantify Scalability: Universal Scalability Law (USL)

    Measurement meets Model

    $$\underbrace{\frac{X(p)}{X(1)}}_{\text{Throughput data}} \;\longrightarrow\; \underbrace{S_p(\sigma, \kappa)}_{\text{Speedup}} \;\longleftarrow\; \underbrace{\frac{p}{1 + \sigma\,(p - 1) + \kappa\, p\,(p - 1)}}_{\text{USL model}}$$

    [Plot: Speedup S(p) versus Processors (p)]
  30. How to Quantify Scalability: Universal Scalability Law (USL)

    How do we determine σ and κ?

    $$S(p, \sigma, \kappa) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\, p\,(p - 1)}$$

    Brute force measurements (good luck!).
    Data from controlled measurements, e.g., JMeter.
    Clever way: Apply statistical regression.
    I’ll use R stats tools throughout this talk: FOSS with a 40-year history since S at Bell Labs.
    GDAT: Guerrilla Data Analysis Techniques.

    Magic functions in R:
    nls(): nonlinear regression → σ, κ in one swell foop.
    optimize(): to estimate X_data(1) if missing.
    predict(): smooth interpolation/extrapolation from data.
    plot(): with various bells & whistles.
  32. Applying the USL: Varnish

    Varnish. Data provided by D. Popa (DigitAir, RO).
  34. Applying the USL: Varnish

    Varnish Architecture

    HTTP accelerator.
    Reverse web proxy caching system.
    Sits in front of classic web server.
    Caching handled by virtual memory.
    Claim: Highly scalable (read: linear) ... but is it?
  35. Applying the USL: Varnish

    Varnish JMeter Measurements

    Example (Read raw data and plot in R)

        data <- read.table(fname, header=TRUE, sep="\t")
        print(data)
        plot(data$N, data$X_N, type="b")

    [Plot: Varnish JMeter Speedup Data; x-axis Load generators (N), y-axis Speedup S(N)]

    By typing data into R console:

        > data
             N   X_N      Speed     Effcy
        1    1   1.4   1.000000 1.0000000
        2    2   2.7   1.928571 0.9642857
        3    5   6.4   4.571429 0.9142857
        4   10  12.8   9.142857 0.9142857
        5   25  32.0  22.857143 0.9142857
        6   50  64.0  45.714286 0.9142857
        7   75  98.0  70.000000 0.9333333
        8  100 131.0  93.571429 0.9357143
        9  150 197.0 140.714286 0.9380952
        10 250 320.0 228.571429 0.9142857
        11 300 392.0 280.000000 0.9333333
        12 400 518.0 370.000000 0.9250000
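The Speed and Effcy columns in the listing can be derived from the two raw columns. The sketch below assumes Speed = X_N / X_1 and Effcy = Speed / N, which is consistent with the values printed on the slide; the data frame name `varnish` is illustrative and the values are typed in by hand rather than read from the original file.

```r
# Varnish JMeter data as listed on the slide: load generators N, throughput X_N.
varnish <- data.frame(
  N   = c(1, 2, 5, 10, 25, 50, 75, 100, 150, 250, 300, 400),
  X_N = c(1.4, 2.7, 6.4, 12.8, 32, 64, 98, 131, 197, 320, 392, 518)
)

# Speedup: throughput relative to the single-generator measurement X(1).
varnish$Speed <- varnish$X_N / varnish$X_N[varnish$N == 1]

# Efficiency: speedup per load generator (1.0 would be ideal linear scaling).
varnish$Effcy <- varnish$Speed / varnish$N

print(varnish)
```

Efficiencies that stay a little above 0.9 across the whole load range are what near-linear scaling looks like in tabular form.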
  36. Applying the USL: Varnish

    Varnish Comparison with Linear Scaling

    [Plot: Varnish JMeter Speedup Data; x-axis Load generators (N), y-axis Speedup S(N)]
  37. Applying the USL: Varnish

    Varnish meets the USL

    [Plot: USL Fit to Varnish Speedup Data; x-axis Load generators (N), y-axis Speedup S(N)]
    Legend: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141521
  38. Applying the USL: Varnish

    USL Scalability Projection

    [Plot: USL Projection for Varnish; x-axis Load generators (N) from 0 to 5000, y-axis Speedup S(N)]
    Legend: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141530
  39. Applying the USL: Memcached

    Memcached. Joint work with S. Subramanyam (Sun, USA) and S. Parvu (Nokia, FI). Presented at Velocity 2010.
  40. Applying the USL: Memcached

    Memcached Scalability: Scaleup, Scaleout
  41. Applying the USL: Memcached

    Memcached Scaleout Strategy

    Distributed cache of key-value pairs.
    Pre-loaded from RDBMS.
    Deploy mcd on tier of cheap, older CPUs (but not multicores).
    Single threaded mcd ok, until next hardware roll (i.e., multicores).
  42. Applying the USL: Memcached

    Memcached Measurements

    Example (Read in raw data and plot it)

        data <- read.table(fname, header=TRUE, sep="\t")
        print(data)
        plot(data$N, data$X_N, type="b")

    [Plot: Raw Speedup Data; x-axis Threads (N), y-axis Thruput X(N)]

    Typing data into R console:

        > data
           N X_N
        1  1  89
        2  2 160
        3  4 272
        4  8 333
        5 10 352
        6 12 339
        7 14 315
  43. Applying the USL: Memcached

    Memcached Regression Analysis

    Example (Normalize, check efficiencies, USL fit)

        > data
           p X_p    Speed     Effcy
        1  1  89 1.000000 1.0000000
        2  2 160 1.797753 0.8988764
        3  4 272 3.056180 0.7640449
        4  8 333 3.741573 0.4676966
        5 10 352 3.955056 0.3955056
        6 12 339 3.808989 0.3174157
        7 14 315 3.539326 0.2528090
        > summary(usl)

        Formula: Speed ~ p/(1 + sigma * (p - 1) + kappa * p * (p - 1))

        Parameters:
              Estimate Std. Error t value Pr(>|t|)
        sigma 0.025517   0.014830   1.721    0.146
        kappa 0.020958   0.001746  12.003 7.08e-05 ***
        ---
        Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        Residual standard error: 0.08918 on 5 degrees of freedom

        Algorithm "port", convergence message: relative convergence (4)
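The slide shows the output of the fit but not the call that produced it. The following is a hedged sketch of how a USL fit of this form could be set up with `nls()`, using the formula and the "port" algorithm named in the output. The starting values, the zero lower bounds, and the names `mcd` and `grid` are assumptions for illustration, and the estimates obtained this way depend on the data and starting values, so they are not guaranteed to match the ones printed on the slide.

```r
# Memcached data as listed on the slides: threads p and throughput X_p.
mcd <- data.frame(p   = c(1, 2, 4, 8, 10, 12, 14),
                  X_p = c(89, 160, 272, 333, 352, 339, 315))

# Normalize to speedup and per-thread efficiency.
mcd$Speed <- mcd$X_p / mcd$X_p[mcd$p == 1]
mcd$Effcy <- mcd$Speed / mcd$p

# Nonlinear least-squares fit of the USL.
# Assumed, not from the talk: the starting values and the zero lower bounds.
usl <- nls(Speed ~ p / (1 + sigma * (p - 1) + kappa * p * (p - 1)),
           data      = mcd,
           start     = c(sigma = 0.1, kappa = 0.01),
           algorithm = "port",
           lower     = c(sigma = 0, kappa = 0))

print(summary(usl))

# Smooth curve from the fitted model, for plotting or extrapolation.
grid <- data.frame(p = seq(1, 20, by = 0.5))
grid$Speed <- predict(usl, newdata = grid)
```

The `grid` data frame can then be passed to `lines()` on top of a `plot()` of the measured speedup to compare model and data.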
  44. Applying the USL: Memcached

    Memcached USL Scalability Fit

    [Plot: USL Analysis of Memcached; x-axis Threads (N), y-axis Speedup S(N)]
    Legend: σ = 0.0255, κ = 0.020958, R² = 0.9925, pmax = 6.82, Smax = 3.44, Sroof = 39.19, Z(sec) = 0, TS = 1111141517
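The legend quantities can be related back to the fitted coefficients. The following is a sketch, assuming that pmax denotes the load at which the USL curve peaks, Smax the speedup at that peak, and Sroof the contention-only ceiling; the slide itself does not define these labels.

```latex
\begin{align*}
S(p) &= \frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}, \\
\frac{dS}{dp} &= \frac{1 - \sigma - \kappa p^{2}}{\bigl[\,1 + \sigma (p - 1) + \kappa\, p (p - 1)\bigr]^{2}} = 0
  \;\Longrightarrow\; p_{\max} = \sqrt{\frac{1 - \sigma}{\kappa}}, \\
S_{\text{roof}} &= \lim_{p \to \infty} S(p)\Big|_{\kappa = 0} = \frac{1}{\sigma}.
\end{align*}
% With sigma = 0.025517 and kappa = 0.020958 from the regression summary:
%   p_max  = sqrt(0.974483 / 0.020958) ~ 6.82
%   S_max  = S(6.82)                   ~ 3.44
%   S_roof = 1 / 0.025517              ~ 39.19
```

These values agree with the legend, which supports the assumed reading of the labels.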
  45. Applying the USL: Memcached

    MCD Scalability Improvements (Sun SPARC patch)

    [Plot: Speedup S(N) versus Threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch]
  46. Applying the USL: Tomcat Java Application

    Tomcat Scalability. Data provided by M. Chawla (Germany).
  47. Applying the USL: Tomcat Java Application

    USL Fit to Initial Production Data
  48. Applying the USL: Tomcat Java Application

    Extended Production Data
  49. Applying the USL: Tomcat Java Application

    USL Fit to Extended Production Data
  50. Applying the USL: Sirius (Zookeeper)

    Comcast Sirius and Apache Zookeeper

    “Sirius: Distributing and Coordinating Application Reference Data,” USENIX ;login:, Oct 2014, Figure 4 (PDF)
    “ZooKeeper: Wait-free coordination for Internet-scale systems,” USENIX Annual Tech. Conf., 2010 (PDF)
  51. Applying the USL: Sirius (Zookeeper)

    Sirius: Distributed coordination by voting

    [Plot: USL Model of Sirius Scalability; x-axis Cluster size, y-axis Writes per second; series Sirius, Sirius-NoBrain, Sirius-NoDisk, USL model; NJG, Wednesday, November 12, 2014]
    Legend: σ = 0.037, κ = 0.1649, R² = 0.9591
  53. Superlinear Scaling

    Superlinear Scaling. Joint work with P. Puglia (BofA) and K. Tomasette (Comcast). Published in Comm. ACM, Vol. 58, No. 4, April 2015, and online at ACM Queue (unabridged).
  55. Superlinear Scaling

    Remember this?

    [Figure: speedup versus processors, with superlinear, linear, and sublinear curves]
    More than 100% efficient!(???)
  56. Superlinear Scaling

    “Speedup in Parallel Contexts.” http://en.wikipedia.org/wiki/Speedup#Speedup_in_Parallel_Contexts
    “Where does super-linear speedup come from?” http://stackoverflow.com/questions/4332967/where-does-super-linear-speedup-come-from
    “Sun Fire X2270 M2 Super-Linear Scaling of Hadoop TeraSort and CloudBurst Benchmarks.” https://blogs.oracle.com/BestPerf/entry/20090920_x2270m2_hadoop
    Haas, R. “Scalability, in Graphical Form, Analyzed.” http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html
    Sutter, H. 2008. “Going Superlinear.” Dr. Dobb’s Journal 33(3), March. http://www.drdobbs.com/cpp/going-superlinear/206100542
    Sutter, H. 2008. “Super Linearity and the Bigger Machine.” Dr. Dobb’s Journal 33(4), April. http://www.drdobbs.com/parallel/super-linearity-and-the-bigger-machine/206903306
    “SDN analytics and control using sFlow standard - Superlinear.” http://blog.sflow.com/2010/09/superlinear.html
    Eijkhout, V. 2014. Introduction to High Performance Scientific Computing. Lulu.com
  57. Superlinear Scaling: What it looks like

    What it looks like ...
  60. Superlinear Scaling: What it looks like

    Oracle plot of Hadoop on SunFire cluster

    Superlinear speedup on 16-node SunFire (158% linear scaling).
    Linear superlinearity ???
    ← Ship it !!!
  61. Superlinear Scaling: Perpetual motion

    Reminiscent of perpetual motion
  62. Superlinear Scaling: Perpetual motion

    Perpetual motion

    Perpetual motion machines:
    Perpetual motion contraptions violate conservation of energy law.
    Super efficiency is tantamount to more than 100% output.
    Even if you know it’s wrong, proving it is the hard part.

    Superlinear scalability:
    Superlinearity exceeds 100% of total capacity.
    Violates the Universal Scalability Law (USL) bounds.
    Again, proving it wrong is the hard part.
    Requires serious analysis and debugging.
  63. Superlinear Scaling: Perpetual motion

    TeraSort Cluster Simulations

    Need controlled environment to study superlinearity.
    TeraSort workload sorts 1 TB of data in parallel.
    TeraSort has benchmarked Hadoop MapReduce performance.
    We used just 100 GB data input (not benchmarking anything).
    Simulate in AWS cloud (more flexible and much cheaper).
    Many test runs, some done in parallel.

    Table 1: Amazon EC2 Configurations

                 Optimized  Processor  vCPU    Memory  Instance      Network
                 for        Arch       number  (GiB)   Storage (GB)  Performance
        BigMem   Memory     64-bit     4       34.2    1 x 850       Moderate
        BigDisk  Compute    64-bit     8       7       4 x 420       High
  64. Superlinear Scaling: Perpetual motion

    Hadoop MapReduce Architecture

    [Diagram labels: Job Client, JobTracker, NameNode; Node 1, Node 2, ..., Node p, each with a DataNode, Mapper tasks, and Reducer tasks; Load from HDFS, Input, Map(k,v), Sort, Partition, Shuffle exchange, Input, Merge, Reduce(k,[v]), Output, Write to HDFS]

    100 GB input data.
    840 Mappers.
    3 Reducers/node.
    EC2 nodes: p = 1 . . . 200.
    p = 1 runtimes ∼ 5 hrs.
    Cloudera 4.7.0 dsn.
    Apache Whirr.
    Linux perf tools.
  65. Superlinear Scaling: Hunting the Superlinear Snark

    USL Model of BigMem p = 50 Speedup

    [Plot: USL Model of BigMem Hadoop TS Data; x-axis EC2 m2 nodes (p), y-axis Speedup S(p)]
    Legend: σ = −0.0288, κ = 0.000447, R² = 0.9974, pmax = 47.96, Smax = 73.48, Sroof = NA, pcross = 64.5, TS = 1311140942
  66. Superlinear Scaling: Hunting the Superlinear Snark

    USL Model of BigMem p = 150 Speedup

    [Plot: USL Model of BigMem Hadoop TS Data; x-axis EC2 m2 nodes (p), y-axis Speedup S(p)]
    Legend: σ = −0.0089, κ = 9e-05, R² = 0.977, pmax = 105.72, Smax = 99.53, Sroof = NA, pcross = 99.14, TS = 1311140942
  67. Superlinear Scaling: Hunting the Superlinear Snark

    A Sign of Superlinearity

    USL contention coefficient is negative: σ = −0.0288 and σ = −0.0089.
    The sign that superlinear scaling is really there (get it?).
    Positive σ means capacity loss due to overhead.
    Negative σ therefore implies capacity gain or credit.
    But what could provide such credit?
    And like a credit card, do you have to pay for it later?
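To see what a negative σ does to the model, the sketch below evaluates the USL with the coefficients printed in the first BigMem legend (σ = −0.0288, κ = 0.000447) at a few cluster sizes. The function name `usl` and the chosen values of p are illustrative assumptions. An efficiency S(p)/p above 1 marks the superlinear region.

```r
# USL speedup; the function name is an illustrative assumption.
usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

# Coefficients as printed in the BigMem plot legend (fit up to p = 50).
sigma <- -0.0288
kappa <- 0.000447

p     <- c(1, 10, 25, 50, 100, 150)
S     <- usl(p, sigma, kappa)
effcy <- S / p   # efficiency above 1 marks the superlinear region

print(data.frame(p, Speedup = round(S, 2), Effcy = round(effcy, 3),
                 Superlinear = effcy > 1))
```

With these coefficients the model is superlinear at p = 10, 25, and 50 and clearly sublinear by p = 100, so the capacity credit does not last.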
  68. Superlinear Scaling: Hunting the Superlinear Snark

    Recall from AMZ EC2 Configs

                 Optimized  Processor  vCPU    Memory  Instance      Network
                 for        Arch       number  (GiB)   Storage (GB)  Performance
        BigMem   Memory     64-bit     4       34.2    1 x 850       Moderate
        BigDisk  Compute    64-bit     8       7       4 x 420       High

    From Table 1:
    1. BigMem has 1 disk per EC2 node.
    2. BigDisk has 4 disks per EC2 node.
  69. Superlinear Scaling: Hunting the Superlinear Snark

    Speedup on p = 10 BigMem Nodes (1 disk)

    [Plot: BigMem Hadoop Terasort Speedup Data; x-axis EC2 m2 nodes (p), y-axis Speedup S(p); regions marked Superlinear and Sublinear]
  70. Superlinear Scaling: Hunting the Superlinear Snark

    Speedup on p = 10 BigDisk Nodes (4 disks)

    [Plot: BigDisk Hadoop Terasort Speedup Data; x-axis EC2 c1 nodes (p), y-axis Speedup S(p); regions marked Superlinear and Sublinear]
  72. Superlinear Scaling: Hunting the Superlinear Snark

    Brief Explanation

    1. Apparent capacity credit produced by IO bottleneck:
       Credit = gradual reduction in IO constraint.
       Relaxation of the latent IO bandwidth constraint.
       Constraint decreases with cluster size p = 1, 2, 3, . . .
    2. IO bottleneck induces random Reducer retries:
       Up to 10% variation in runtimes.
       Stretches measured runtimes.
       Distorts normalization of the speedup data.

    Details are discussed in the unabridged ACM Queue article.
  75. Superlinear Payback Trap

    Summary

    USL requires σ, κ > 0 for S(p) to be a concave function.
    Convex efficiencies S(p)/p > 100% do (appear to) exist.
    Data → σ < 0 in USL model is a superlinear detector.
    Super efficiency is not free.
    Like perpetual motion, it’s an illusion.
    You will pay the piper eventually.
    Debugging latent capacity credit can be very tricky.

    Theorem (Gunther 2012), USL Payback Trap: Superlinearity is always followed by severe loss of speedup in the payback region.
    Verified by Kris Tomasette on April 15, 2014.

    [Figure: speedup versus processors, with a superlinear region followed by a payback region]
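A short calculation shows why a negative σ in the USL model cannot stay superlinear. This is a sketch of the algebra only, not the author's proof, and it assumes that the pcross value in the BigMem plot legends denotes the point where the fitted curve crosses the linear bound S(p) = p.

```latex
\begin{align*}
S(p) = p
  &\iff \sigma (p - 1) + \kappa\, p (p - 1) = 0 \\
  &\iff (p - 1)(\sigma + \kappa p) = 0
  \;\Longrightarrow\; p_{\text{cross}} = -\frac{\sigma}{\kappa}
  \qquad (\sigma < 0 < \kappa).
\end{align*}
% For p > p_cross the factor (sigma + kappa p) is positive, so the
% denominator of S(p) exceeds 1 and S(p) < p: the payback region.
%   sigma = -0.0288, kappa = 0.000447  ->  p_cross ~ 64.4  (legend: 64.5)
%   sigma = -0.0089, kappa = 9e-05     ->  p_cross ~ 98.9  (legend: 99.14)
```

The small differences from the legend values are what one would expect if the legend was computed from unrounded coefficients.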
  77. Superlinear Payback Trap

    The Visual Takeaway

    People think it’s this: [Figure: speedup versus processors, with superlinear, linear, and sublinear curves]
    But it’s really this: [Figure: speedup versus processors, with a superlinear region followed by a payback region]

    USL explains Terasort superlinearity on Hadoop.
    Superlinear effects do appear in other guises (see the cited links).
    More and more apps becoming massively distributed.
    Look for negative USL σ in your performance data.
  78. Superlinear Payback Trap

    More about USL Modeling

    [Book cover] Chapters 6 and 14
    [Book cover] Chapters 4–6
    Also check out: USL web page, Guerrilla classes
  79. Superlinear Payback Trap

    Thank you!

    Castro Valley, California
    www.perfdynamics.com
    perfdynamics.blogspot.com
    Twitter/DrQz
    Facebook
    njgunther@perfdynamics.com
    +1-510-537-5758