Slide 1

Slide 1 text

Hadoop Super Scaling
Dr. Neil Gunther (@DrQz)
Performance Dynamics Labs
Salesforce Tech Talk, August 8, San Francisco
© 2016 Performance Dynamics Labs, Hadoop Super Scaling, August 8, 2016

Slide 3

Slide 3 text

What this talk is about
[Figure: speedup versus processors, showing superlinear, linear, and sublinear curves]
Scalability: Performance gain due to increasing resource capacity

Slide 6

Slide 6 text

Qualitative Scalability
Qualitative means:
- Scalability from an operational view
- Scalability as configuration recipes
- Lots of words (on blogs), but no numbers
Cost-benefit analysis demands numbers, not words.
Need to measure scalability appropriately to quantify it. But how?
- Need controlled measurements (e.g., Apache JMeter)
- Cannot understand scalability by monitoring Prod systems. The human brain is not built for that.
- Need to transform time-series data to informational performance metrics.

Slide 7

Slide 7 text

Quantitative Scalability
Google 2005 on MapReduce: “If scaling were perfect, performance would be proportional to the number of machines. In our test, it was 0.98 of the machines.”

Paper excerpt:
Since the data records we wish to process do live on many machines, it would be fruitful to exploit the combined computing power to perform these analyses. In particular, if the individual steps can be expressed as query operations that can be evaluated one record at a time, we can distribute the calculation across all the machines and achieve very high throughput. The results of these operations will then require an aggregation phase. For example, if we are counting records, we need to gather the counts from the individual machines before we can report the total count.
We therefore break our calculations into two phases. The first phase evaluates the analysis on each record individually, while the second phase aggregates the results (Figure 2). The system described in this paper goes even further, however. The analysis in the first phase is expressed in a new procedural programming language that executes one record at a time, in isolation, to calculate query results for each record. The second phase is restricted to a set of predefined aggregators that process the intermediate results generated by the first phase. By restricting the calculations to this model, we can achieve very high throughput. Although not all calculations fit this model well, the ability to harness a thousand or more machines with a few lines of code provides some compensation.
Figure 2: The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous.
Of course, there are still many subproblems that remain to be solved. The calculation must be divided into pieces and distributed across the machines holding the data, keeping the computation as near the data as possible to avoid network bottlenecks. And when there are many machines there is a high probability of some of them failing during the analysis, so the system must be ...

Translation: MR scalability is 98% of ideal linear scaling
- Scalability is a function, not a single number
- Diminishing returns due to increasing overhead
- Want to express overhead loss quantitatively
- But what (mathematical) function?

Slide 8

Slide 8 text

The Speedup Metric
- Commonly used in the context of parallel processing performance
- I’ll denote it by the symbol S_p in this talk
- Expect S_p = p if linear with p parallel processors
- Superlinear if S_p > p
Example (MIT Swarm processor)
Some of these speedup profiles look superlinear(?)
[Figure 9 of the cited paper: speedup from 1 to 64 cores for bfs, sssp, astar, msf, des, and silo; series: Swarm and software-only parallel]
Figure 9. Speedup of Swarm and state-of-the-art software-parallel implementations from 1 to 64 cores, relative to a tuned serial implementation running on a system of the same size. At 64 cores, Swarm programs are 43 to 117 times faster than the serial versions and 2.7 to 18 times faster than software-parallel versions.
Source: “Unlocking Ordered Parallelism with the Swarm Architecture,” IEEE Micro, vol. 36, no. 3, pp. 105–117 (2016)

Slide 9

Slide 9 text

Outline
1 How to Quantify Scalability
    Components of Scalability
    Universal Scalability Law (USL)
2 Applying the USL
    Varnish
    Memcached
    Tomcat Java Application
    Sirius (Zookeeper)
3 Superlinear Scaling
    What it looks like
    Perpetual motion
    Hunting the Superlinear Snark
4 Superlinear Payback Trap

Slide 10

Slide 10 text

Components of Scalability
Equal bang for your buck: Concurrency

Slide 11

Slide 11 text

Components of Scalability
Diminishing returns: Contention cost

Slide 12

Slide 12 text

Components of Scalability
Resource saturation: More contention cost

Slide 13

Slide 13 text

Components of Scalability
Negative returns: Coherency of non-local data

Slide 22

Slide 22 text

Universal Scalability Law (USL)
- p processors or processes provide system load
- S_p speedup performance function ≡ normalized thruput
- Question: What kind of function is S_p?
- Answer: A rational function [1]
$$S_p(\sigma, \kappa) = \frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}$$
The three Cs:
1. Concurrency
2. Contention (0 < σ < 1)
3. Coherency (0 < κ < 1)
[1] NJG, “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conf. 1993
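
As a minimal sketch, the USL formula on this slide can be written as a small R function. The function name `usl` and the sample coefficient values below are illustrative assumptions, not values from the talk; they are chosen only to show the linear, contention-limited, and coherency-limited regimes.

```r
# Universal Scalability Law: relative capacity (speedup) at load p.
# sigma = contention coefficient, kappa = coherency coefficient.
usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

# Assumed coefficients, for illustration only.
p <- c(1, 2, 4, 8, 16, 32)
linear     <- usl(p, sigma = 0,    kappa = 0)      # pure concurrency: S_p = p
contention <- usl(p, sigma = 0.05, kappa = 0)      # diminishing returns
coherency  <- usl(p, sigma = 0.05, kappa = 0.005)  # peaks, then negative returns

print(data.frame(p, linear, contention, coherency))
```

With σ = κ = 0 the function returns p itself, and any positive κ eventually pulls the curve back down as p grows.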

Slide 26

Slide 26 text

Measurement meets Model
$$\underbrace{\frac{X(p)}{X(1)}}_{\text{Thruput data}} \;\longrightarrow\; \underbrace{S_p(\sigma, \kappa)}_{\text{Speedup}} \;\longleftarrow\; \underbrace{\frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}}_{\text{USL model}}$$
[Figure: speedup S(p) versus processors (p)]

Slide 30

Slide 30 text

How do we determine σ and κ?
$$S(p, \sigma, \kappa) = \frac{p}{1 + \sigma (p - 1) + \kappa\, p (p - 1)}$$
- Brute force measurements (good luck!)
- Data from controlled measurements, e.g., JMeter
- Clever way: Apply statistical regression
I’ll use R stats tools throughout this talk:
- FOSS with 40 yr history since S at Bell Labs
- GDAT: Guerrilla Data Analysis Techniques
Magic functions in R:
- nls(): nonlinear regression → σ, κ in one swell foop
- optimize(): to estimate X_data(1) if missing
- predict(): smooth interpolation/extrapolation from data
- plot(): with various bells & whistles

Slide 31

Slide 31 text

Outline: 2 Applying the USL
- Varnish
- Memcached
- Tomcat Java Application
- Sirius (Zookeeper)

Slide 32

Slide 32 text

Varnish
Data provided by D. Popa (DigitAir, RO)

Slide 34

Slide 34 text

Varnish Architecture
- HTTP accelerator
- Reverse web proxy caching system
- Sits in front of classic web server
- Caching handled by virtual memory
- Claim: Highly scalable (read: linear) ... but is it?

Slide 35

Slide 35 text

Varnish JMeter Measurements
Example (Read raw data and plot in R)
    # fname: path to the tab-separated file of measurements
    data <- read.table(fname, header=TRUE, sep="\t")
    print(data)
    plot(data$N, data$X_N, type="b")
[Figure: Varnish JMeter Speedup Data; speedup S(N) versus load generators (N)]
By typing data into R console:
> data
     N   X_N      Speed     Effcy
1    1   1.4   1.000000 1.0000000
2    2   2.7   1.928571 0.9642857
3    5   6.4   4.571429 0.9142857
4   10  12.8   9.142857 0.9142857
5   25  32.0  22.857143 0.9142857
6   50  64.0  45.714286 0.9142857
7   75  98.0  70.000000 0.9333333
8  100 131.0  93.571429 0.9357143
9  150 197.0 140.714286 0.9380952
10 250 320.0 228.571429 0.9142857
11 300 392.0 280.000000 0.9333333
12 400 518.0 370.000000 0.9250000

Slide 36

Slide 36 text

Varnish Comparison with Linear Scaling
[Figure: Varnish JMeter Speedup Data; speedup S(N) versus load generators (N), compared with linear scaling]

Slide 37

Slide 37 text

Varnish meets the USL
[Figure: USL Fit to Varnish Speedup Data; speedup S(N) versus load generators (N)]
Fit: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141521

Slide 38

Slide 38 text

USL Scalability Projection
[Figure: USL Projection for Varnish; speedup S(N) versus load generators (N), projected to N = 5000]
Fit: σ = 2e-04, κ = 0, R² = 0.9992, pmax = NaN, Smax = NaN, Sroof = 4234.67, Z(sec) = NaN, TS = 1111141530
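
The projection can be sketched in a few lines of R. This assumes that Sroof is the κ = 0 ceiling 1/σ (which matches the Memcached fit, where 1/0.025517 ≈ 39.19), so the unrounded σ is recovered from the reported Sroof rather than from the rounded 2e-04. The function name `usl` and the chosen load points are illustrative.

```r
# Values reported on the slide (sigma is displayed rounded to 2e-04).
s_roof <- 4234.67
kappa  <- 0

# Assumption: with kappa = 0 the ceiling is 1/sigma, so invert it.
sigma <- 1 / s_roof
print(signif(sigma, 3))

usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

# Projected speedup at a few loads inside and beyond the measured range.
N <- c(400, 1000, 5000)
print(round(usl(N, sigma, kappa), 1))

# Far beyond the plotted range the curve approaches, but stays below, Sroof.
print(usl(1e9, sigma, kappa) < s_roof)
```

With κ = 0 there is no peak, only an asymptote, so the projection keeps rising toward Sroof instead of turning over.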

Slide 39

Slide 39 text

Memcached
Joint work with S. Subramanyam (Sun, USA) and S. Parvu (Nokia, FI)
Presented at Velocity 2010

Slide 40

Slide 40 text

Memcached Scalability
[Figures: Scaleup, Scaleout]

Slide 41

Slide 41 text

Memcached Scaleout Strategy
- Distributed cache of key-value pairs
- Pre-loaded from RDBMS
- Deploy mcd on tier of cheap, older CPUs (but not multicores)
- Single threaded mcd ok, until next hardware roll (i.e., multicores)

Slide 42

Slide 42 text

Memcached Measurements
Example (Read in raw data and plot it)
    # fname: path to the tab-separated file of measurements
    data <- read.table(fname, header=TRUE, sep="\t")
    print(data)
    plot(data$N, data$X_N, type="b")
[Figure: Raw Speedup Data; thruput X(N) versus threads (N)]
Typing data into R console:
> data
   N X_N
1  1  89
2  2 160
3  4 272
4  8 333
5 10 352
6 12 339
7 14 315
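
The normalization step between raw thruput and speedup can be sketched as follows. The data are the seven points printed on this slide, entered by hand so the block runs without the input file; the column names `Speed` and `Effcy` follow the deck's later table.

```r
# Raw Memcached measurements from the slide: N threads, X_N thruput.
data <- data.frame(N   = c(1, 2, 4, 8, 10, 12, 14),
                   X_N = c(89, 160, 272, 333, 352, 339, 315))

# Speedup is thruput normalized by the single-thread value X(1).
data$Speed <- data$X_N / data$X_N[data$N == 1]

# Efficiency is speedup per thread; 1.0 would be ideal linear scaling.
data$Effcy <- data$Speed / data$N

print(data)
```

Efficiency falling steadily below 1 as N grows is the first hint that contention and coherency costs are present.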

Slide 43

Slide 43 text

Memcached Regression Analysis
Example (Normalize, check efficiencies, USL fit)
> data
   p X_p    Speed     Effcy
1  1  89 1.000000 1.0000000
2  2 160 1.797753 0.8988764
3  4 272 3.056180 0.7640449
4  8 333 3.741573 0.4676966
5 10 352 3.955056 0.3955056
6 12 339 3.808989 0.3174157
7 14 315 3.539326 0.2528090
> summary(usl)
Formula: Speed ~ p/(1 + sigma * (p - 1) + kappa * p * (p - 1))
Parameters:
      Estimate Std. Error t value Pr(>|t|)
sigma 0.025517   0.014830   1.721    0.146
kappa 0.020958   0.001746  12.003 7.08e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.08918 on 5 degrees of freedom
Algorithm "port", convergence message: relative convergence (4)
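
A hedged sketch of how such a fit might be set up with nls() is shown here. The formula and the "port" algorithm come from the summary output above; the starting values, the bounds, and the variable names `d`, `fit`, and `grid` are assumptions. The estimates depend on the data and starting values, so they need not reproduce the numbers printed on the slide.

```r
# Normalized Memcached data, entered by hand from the slide.
d <- data.frame(p   = c(1, 2, 4, 8, 10, 12, 14),
                X_p = c(89, 160, 272, 333, 352, 339, 315))
d$Speed <- d$X_p / d$X_p[1]

# Nonlinear least-squares fit of the USL.
# Start values and bounds are assumed, not taken from the talk.
fit <- nls(Speed ~ p / (1 + sigma * (p - 1) + kappa * p * (p - 1)),
           data      = d,
           start     = c(sigma = 0.1, kappa = 0.01),
           algorithm = "port",
           lower     = c(0, 0),
           upper     = c(1, 1))

print(coef(fit))

# Smooth curve for plotting or extrapolating beyond the measured loads.
grid <- data.frame(p = seq(1, 20, by = 0.5))
grid$Speed <- predict(fit, newdata = grid)
print(head(grid))
```

Bounding both coefficients to [0, 1] keeps the fit inside the range the USL definition allows; dropping the lower bound is what lets a negative σ show up later in the talk.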

Slide 44

Slide 44 text

Memcached USL Scalability Fit
[Figure: USL Analysis of Memcached; speedup S(N) versus threads (N)]
Fit: σ = 0.0255, κ = 0.020958, R² = 0.9925, pmax = 6.82, Smax = 3.44, Sroof = 39.19, Z(sec) = 0, TS = 1111141517
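
The derived quantities in the plot legend follow from the two fitted coefficients. Setting dS/dp = 0 in the USL gives pmax = sqrt((1 − σ)/κ), and with κ = 0 the ceiling is 1/σ. The sketch below uses the estimates from the regression summary; the variable and function names are illustrative.

```r
# Fitted coefficients from the regression summary.
sigma <- 0.025517
kappa <- 0.020958

usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

p_max  <- sqrt((1 - sigma) / kappa)  # load at which the speedup peaks
s_max  <- usl(p_max, sigma, kappa)   # peak speedup
s_roof <- 1 / sigma                  # ceiling if coherency cost were absent

print(round(c(pmax = p_max, Smax = s_max, Sroof = s_roof), 2))
```

Rounded to two decimals these agree with the pmax, Smax, and Sroof values reported in the legend.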

Slide 45

Slide 45 text

MCD Scalability Improvements (Sun SPARC patch)
[Figure: speedup S(N) versus threads (N) for mcd 1.2.8, mcd 1.3.2, and mcd 1.3.2 + patch]

Slide 46

Slide 46 text

Tomcat Scalability
Data provided by M. Chawla (Germany)

Slide 47

Slide 47 text

Tomcat Java Application
USL Fit to Initial Production Data

Slide 48

Slide 48 text

Tomcat Java Application
Extended Production Data

Slide 49

Slide 49 text

Tomcat Java Application
USL Fit to Extended Production Data

Slide 50

Slide 50 text

Comcast Sirius and Apache Zookeeper
- “Sirius: Distributing and Coordinating Application Reference Data,” USENIX ;login: Oct 2014, Figure 4 (PDF)
- “ZooKeeper: Wait-free coordination for Internet-scale systems,” USENIX Annual Tech. Conf., 2010 (PDF)

Slide 51

Slide 51 text

Sirius: Distributed coordination by voting
[Figure: USL Model of Sirius Scalability; writes per second versus cluster size; series: Sirius, Sirius-NoBrain, Sirius-NoDisk, USL model; NJG, Wednesday, November 12, 2014]
Fit: σ = 0.037, κ = 0.1649, R² = 0.9591

Slide 52

Slide 52 text

Outline: 3 Superlinear Scaling
- What it looks like
- Perpetual motion
- Hunting the Superlinear Snark

Slide 53

Slide 53 text

Superlinear Scaling
Joint work with P. Puglia (BofA) and K. Tomasette (Comcast)
Published in journal Comm. ACM, Vol. 58, No. 4, April 2015 and online at ACM Queue (unabridged)

Slide 55

Slide 55 text

Remember this?
[Figure: speedup versus processors, showing superlinear, linear, and sublinear curves]
More than 100% efficient!(???)

Slide 56

Slide 56 text

Superlinear Scaling
- “Speedup in Parallel Contexts.” http://en.wikipedia.org/wiki/Speedup#Speedup_in_Parallel_Contexts
- “Where does super-linear speedup come from?” http://stackoverflow.com/questions/4332967/where-does-super-linear-speedup-come-from
- “Sun Fire X2270 M2 Super-Linear Scaling of Hadoop TeraSort and CloudBurst Benchmarks.” https://blogs.oracle.com/BestPerf/entry/20090920_x2270m2_hadoop
- Haas, R. “Scalability, in Graphical Form, Analyzed.” http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html
- Sutter, H. 2008. “Going Superlinear.” Dr. Dobb’s Journal 33(3), March. http://www.drdobbs.com/cpp/going-superlinear/206100542
- Sutter, H. 2008. “Super Linearity and the Bigger Machine.” Dr. Dobb’s Journal 33(4), April. http://www.drdobbs.com/parallel/super-linearity-and-the-bigger-machine/206903306
- “SDN analytics and control using sFlow standard: Superlinear.” http://blog.sflow.com/2010/09/superlinear.html
- Eijkhout, V. 2014. Introduction to High Performance Scientific Computing. Lulu.com

Slide 57

Slide 57 text

What it looks like

Slide 60

Slide 60 text

Oracle plot of Hadoop on SunFire cluster
[Figure: superlinear speedup on 16-node SunFire (158% linear scaling)]
Linear superlinearity ??? ← Ship it !!!

Slide 61

Slide 61 text

Reminiscent of perpetual motion

Slide 62

Slide 62 text

Perpetual motion
Perpetual motion machines:
- Perpetual motion contraptions violate conservation of energy law.
- Super efficiency is tantamount to more than 100% output.
- Even if you know it’s wrong, proving it is the hard part.
Superlinear scalability:
- Superlinearity exceeds 100% of total capacity.
- Violates the Universal Scalability Law (USL) bounds.
- Again, proving it wrong is the hard part.
- Requires serious analysis and debugging.

Slide 63

Slide 63 text

TeraSort Cluster Simulations
- Need controlled environment to study superlinearity
- TeraSort workload sorts 1 TB of data in parallel
- TeraSort has benchmarked Hadoop MapReduce performance
- We used just 100 GB data input (not benchmarking anything)
- Simulate in AWS cloud (more flexible and much cheaper)
- Many test runs, some done in parallel
Table 1: Amazon EC2 Configurations
         Optimized for  Processor Arch  vCPU number  Memory (GiB)  Instance Storage (GB)  Network Performance
BigMem   Memory         64-bit          4            34.2          1 x 850                Moderate
BigDisk  Compute        64-bit          8            7             4 x 420                High

Slide 64

Slide 64 text

Hadoop MapReduce Architecture
[Figure: a Job Client submits to the JobTracker and NameNode; nodes 1 to p each run a DataNode with Mapper tasks and Reducer tasks. Map side: Load from HDFS, Input, Map(k,v), Sort, Partition. Shuffle exchange between nodes. Reduce side: Input, Merge, Reduce(k,[v]), Output, Write to HDFS.]
- 100 GB input data
- 840 Mappers
- 3 Reducers/node
- EC2 nodes: p = 1 ... 200
- p = 1 runtimes ∼ 5 hrs
- Cloudera 4.7.0 dsn
- Apache Whirr
- Linux perf tools

Slide 65

Slide 65 text

USL Model of BigMem, p = 50
[Figure: USL Model of BigMem Hadoop TS Data; speedup S(p) versus EC2 m2 nodes (p)]
Fit: σ = −0.0288, κ = 0.000447, R² = 0.9974, pmax = 47.96, Smax = 73.48, Sroof = NA, pcross = 64.5, TS = 1311140942

Slide 66

Slide 66 text

USL Model of BigMem, p = 150
[Figure: USL Model of BigMem Hadoop TS Data; speedup S(p) versus EC2 m2 nodes (p)]
Fit: σ = −0.0089, κ = 9e-05, R² = 0.977, pmax = 105.72, Smax = 99.53, Sroof = NA, pcross = 99.14, TS = 1311140942

Slide 67

Slide 67 text

A Sign of Superlinearity
USL contention coefficient is negative:
- σ = −0.0288
- σ = −0.0089
The sign that superlinear scaling is really there (get it?)
- Positive σ means capacity loss due to overhead
- Negative σ therefore implies capacity gain or credit
- But what could provide such credit?
- And like a credit card, do you have to pay for it later?
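
The negative sign also fixes where the credit runs out. Setting S(p) = p in the USL requires the denominator to equal 1, that is (p − 1)(σ + κp) = 0, so the curve recrosses the linear bound at p = −σ/κ. The sketch below applies this to the two BigMem fits; the data frame layout and column names are illustrative, and because the legends show rounded coefficients the results land near, not exactly on, the reported pcross values of 64.5 and 99.14.

```r
# Coefficients as displayed in the two BigMem fit legends (rounded).
fits <- data.frame(run   = c("BigMem p = 50", "BigMem p = 150"),
                   sigma = c(-0.0288, -0.0089),
                   kappa = c(0.000447, 9e-05))

# A negative contention coefficient flags superlinear scaling.
fits$superlinear <- fits$sigma < 0

# Load at which S(p) falls back to the linear bound S(p) = p.
fits$p_cross <- -fits$sigma / fits$kappa

print(fits)
```

Beyond p_cross the measured speedup is sublinear again, which is where the payback begins.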

Slide 68

Slide 68 text

Recall from AMZ EC2 Configs
         Optimized for  Processor Arch  vCPU number  Memory (GiB)  Instance Storage (GB)  Network Performance
BigMem   Memory         64-bit          4            34.2          1 x 850                Moderate
BigDisk  Compute        64-bit          8            7             4 x 420                High
From Table 1:
1. BigMem has 1 disk per EC2 node
2. BigDisk has 4 disks per EC2 node

Slide 69

Slide 69 text

Speedup on p = 10 BigMem Nodes (1 disk)
[Figure: BigMem Hadoop Terasort Speedup Data; speedup S(p) versus EC2 m2 nodes (p), with the superlinear and sublinear regions marked]

Slide 70

Slide 70 text

Speedup on p = 10 BigDisk Nodes (4 disks)
[Figure: BigDisk Hadoop Terasort Speedup Data; speedup S(p) versus EC2 c1 nodes (p), with the superlinear and sublinear regions marked]

Slide 72

Slide 72 text

Brief Explanation
1. Apparent capacity credit produced by IO bottleneck:
   - Credit = Gradual reduction in IO constraint
   - Relaxation of the latent IO bandwidth constraint.
   - Constraint decreases with cluster size p = 1, 2, 3, ...
2. IO bottleneck induces random Reducer retries:
   - Up to 10% variation in runtimes
   - Stretches measured runtimes
   - Distorts normalization of the speedup data
Details are discussed in the unabridged ACM Queue article
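
A toy calculation shows how a stretched single-node runtime alone can produce apparent superlinearity. Only the roughly 5 hour runtime at p = 1 and the "up to 10%" stretch come from the talk; the ideal linear baseline, the specific stretch factors, and all variable names are assumptions made for illustration, not measured data.

```r
# Hypothetical runtimes in hours.
p       <- c(1, 2, 4, 8)
t_true  <- 5 / p                       # assume ideal linear scaling underneath
stretch <- c(1.10, 1.05, 1.02, 1.00)   # assumed IO-induced stretch, shrinking with p
t_meas  <- t_true * stretch            # what the stopwatch would record

# Speedup normalized by the measured (stretched) p = 1 runtime.
speedup    <- t_meas[1] / t_meas
efficiency <- speedup / p

print(data.frame(p, t_meas,
                 speedup    = round(speedup, 3),
                 efficiency = round(efficiency, 3)))
```

Because the p = 1 baseline carries the largest stretch, every larger cluster looks better than linear even though the underlying scaling in this toy model is exactly linear.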

Slide 73

Slide 73 text

Outline: 4 Superlinear Payback Trap

Slide 75

Slide 75 text

Summary
- USL requires σ, κ > 0 for S(p) to be a concave function
- Convex efficiencies S(p)/p > 100% do (appear to) exist
- Data → σ < 0 in USL model is a superlinear detector
- Super efficiency is not free
- Like perpetual motion, it’s an illusion
- You will pay the piper eventually
- Debugging latent capacity credit can be very tricky
Theorem (Gunther 2012)
USL Payback Trap: Superlinearity is always followed by severe loss of speedup in the payback region
Verified by Kris Tomasette on April 15, 2014
[Figure: speedup versus processors, showing a superlinear region followed by a payback region]
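
The payback region can be illustrated with the BigMem coefficients from the p = 50 fit. This sketch evaluates the efficiency S(p)/p on either side of the crossover; the chosen load points, the function name `usl`, and the region labels are assumptions for illustration.

```r
usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

# Coefficients as displayed in the BigMem p = 50 fit legend.
sigma <- -0.0288
kappa <- 0.000447

p   <- c(1, 20, 40, 64, 65, 100, 150)
eff <- usl(p, sigma, kappa) / p   # efficiency; 1.0 is the linear bound

region <- ifelse(eff > 1, "superlinear",
                 ifelse(p == 1, "baseline", "payback"))

print(data.frame(p, efficiency = round(eff, 3), region))
```

Efficiency exceeds 1 only while σ + κp is negative; once p passes −σ/κ the coherency term dominates and efficiency drops quickly.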

Slide 77

Slide 77 text

The Visual Takeaway
People think it’s this:
[Figure: speedup versus processors, with superlinear, linear, and sublinear curves]
But it’s really this:
[Figure: speedup versus processors, with a superlinear region followed by a payback region]
- USL explains Terasort superlinearity on Hadoop
- Superlinear effects do appear in other guises (see the cited links)
- More and more apps becoming massively distributed
- Look for negative USL σ in your performance data

Slide 78

Slide 78 text

More about USL Modeling
- Chapters 6 and 14
- Chapters 4–6
Also check out:
- USL web page
- Guerrilla classes

Slide 79

Slide 79 text

Thank you!
Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
Twitter/DrQz
Facebook
[email protected]
+1-510-537-5758