Hadoop Super Scaling Dr. Neil Gunther — @DrQz Performance Dynamics Labs Salesforce Tech Talk August 8, San Francisco SM c 2016 Performance Dynamics Labs Hadoop Super Scaling August 8, 2016 1 / 55

What this talk is about
[Figure: speedup versus processors, showing superlinear, linear, and sublinear curves]
Scalability: Performance gain due to increasing resource capacity

Qualitative Scalability
Qualitative means:
  Scalability from an operational view
  Scalability as configuration recipes
  Lots of words (on blogs), but no numbers
Cost-benefit analysis demands numbers, not words.
Need to measure scalability appropriately in order to quantify it.
But how?
  Need controlled measurements (e.g., Apache JMeter)
  Cannot understand scalability by monitoring Prod systems. The human brain is not built for that.
  Need to transform time-series data into informational performance metrics.

Quantitative Scalability
Google 2005 on MapReduce:
“If scaling were perfect, performance would be proportional to the number of machines. In our test, it was 0.98 of the machines.”

[Paper excerpt shown on the slide:]
Since the data records we wish to process do live on many machines, it would be fruitful to exploit the combined computing power to perform these analyses. In particular, if the individual steps can be expressed as query operations that can be evaluated one record at a time, we can distribute the calculation across all the machines and achieve very high throughput. The results of these operations will then require an aggregation phase. For example, if we are counting records, we need to gather the counts from the individual machines before we can report the total count.
We therefore break our calculations into two phases. The first phase evaluates the analysis on each record individually, while the second phase aggregates the results (Figure 2). The system described in this paper goes even further, however. The analysis in the first phase is expressed in a new procedural programming language that executes one record at a time, in isolation, to calculate query results for each record. The second phase is restricted to a set of predefined aggregators that process the intermediate results generated by the first phase. By restricting the calculations to this model, we can achieve very high throughput. Although not all calculations fit this model well, the ability to harness a thousand or more machines with a few lines of code provides some compensation.
[Figure 2: The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous.]
Of course, there are still many subproblems that remain to be solved. The calculation must be divided into pieces and distributed across the machines holding the data, keeping the computation as near the data as possible to avoid network bottlenecks. And when there are many machines there is a high probability of some of them failing during the analysis, so the system must be [...]

Translation: MR scalability is 98% of ideal linear scaling
Scalability is a function, not a single number
Diminishing returns due to increasing overhead
Want to express overhead loss quantitatively
But what (mathematical) function?

The Speedup Metric
Commonly used in the context of parallel processing performance
I’ll denote it by the symbol $S_p$ in this talk
Expect $S_p = p$ if linear with p parallel processors
Superlinear if $S_p > p$

Example (MIT Swarm processor)
Some of these speedup profiles look superlinear(?)
[Figure 9: Speedup of Swarm and state-of-the-art software-parallel implementations from 1 to 64 cores, relative to a tuned serial implementation running on a system of the same size. At 64 cores, Swarm programs are 43 to 117 times faster than the serial versions and 2.7 to 18 times faster than software-parallel versions. Panels: bfs, sssp, astar, msf, des, silo.]
“Unlocking Ordered Parallelism with the Swarm Architecture,” IEEE Micro, vol. 36, no. 3, 105–117 (2016)

Outline
1. How to Quantify Scalability
     Components of Scalability
     Universal Scalability Law (USL)
2. Applying the USL
     Varnish
     Memcached
     Tomcat Java Application
     Sirius (Zookeeper)
3. Superlinear Scaling
     What it looks like
     Perpetual motion
     Hunting the Superlinear Snark
4. Superlinear Payback Trap

How to Quantify Scalability

Components of Scalability

Equal bang for your buck: Concurrency

Diminishing returns: Contention cost

Resource saturation: More contention cost

Negative returns: Coherency of non-local data

Universal Scalability Law (USL)
p processors or processes provide system load
$S_p$ speedup performance function ≡ normalized thruput
Question: What kind of function is $S_p$?
Answer: A rational function [1]

$$S_p(\sigma, \kappa) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\,p\,(p - 1)}$$

The three Cs:
1. Concurrency
2. Contention (0 < σ < 1)
3. Coherency (0 < κ < 1)

[1] NJG. “A Simple Capacity Model of Massively Parallel Transaction Systems,” CMG Conf. 1993
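As a minimal sketch, the USL formula above can be written as a one-line R function and evaluated for the three regimes. The function name `usl` and the coefficient values are illustrative assumptions, not values from the talk.

```r
# USL speedup for p processors with contention sigma and coherency kappa.
# The name `usl` and the sample coefficients are illustrative assumptions.
usl <- function(p, sigma, kappa) {
  p / (1 + sigma * (p - 1) + kappa * p * (p - 1))
}

p <- c(1, 2, 4, 8, 16, 32)

linear     <- usl(p, sigma = 0,    kappa = 0)      # concurrency only: S_p = p
contention <- usl(p, sigma = 0.05, kappa = 0)      # diminishing returns
coherency  <- usl(p, sigma = 0.05, kappa = 0.002)  # rises to a peak, then falls

print(data.frame(p, linear, contention, coherency))
```

With κ = 0 the expression reduces to Amdahl's law, whose speedup approaches 1/σ as p grows; a nonzero κ is what produces the maximum followed by negative returns.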

Measurement meets Model

$$\underbrace{\frac{X(p)}{X(1)}}_{\text{Thruput data}} \;\longrightarrow\; \underbrace{S_p(\sigma, \kappa)}_{\text{Speedup}} \;\longleftarrow\; \underbrace{\frac{p}{1 + \sigma\,(p - 1) + \kappa\,p\,(p - 1)}}_{\text{USL model}}$$
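The measurement side of this picture can be sketched in a few lines of R: divide each throughput value by the single-processor throughput to get speedup, then divide by p to get efficiency. The throughput numbers here are made up for illustration and are not data from the talk.

```r
# Hypothetical throughput measurements X(p) at each load level p
# (made-up numbers for illustration, not data from the talk).
p <- c(1, 2, 4, 8)
X <- c(100, 190, 340, 520)

speedup    <- X / X[1]     # S_p = X(p) / X(1), so S_1 = 1 by construction
efficiency <- speedup / p  # 1 means linear; above 1 would indicate superlinear

print(data.frame(p, X, speedup, efficiency))
```

The normalized speedup column is what gets compared against the USL model on the right-hand side.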

How do we determine σ and κ?

$$S(p, \sigma, \kappa) = \frac{p}{1 + \sigma\,(p - 1) + \kappa\,p\,(p - 1)}$$

Brute force measurements (good luck!)
Data from controlled measurements, e.g., JMeter
Clever way: Apply statistical regression
I’ll use R stats tools throughout this talk:
  FOSS with 40 yr history since S at Bell Labs
  GDAT: Guerrilla Data Analysis Techniques
Magic functions in R:
  nls() nonlinear regression → σ, κ in one swell foop
  optimize() to estimate $X_{\text{data}}(1)$ if missing
  predict() smooth interpolation/extrapolation from data
  plot() with various bells & whistles

Applying the USL

Varnish Architecture
HTTP accelerator
Reverse web proxy caching system
Sits in front of classic web server
Caching handled by virtual memory
Claim: Highly scalable (read: linear) ... but is it?

Memcached
Joint work with S. Subramanyam (Sun, USA) and S. Parvu (Nokia, FI)
Presented at Velocity 2010

Memcached Scaleout Strategy
Distributed cache of key-value pairs
Pre-loaded from RDBMS
Deploy mcd on tier of cheap, older CPUs (but not multicores)
Single-threaded mcd ok, until next hardware roll (i.e., multicores)

Memcached Measurements
Example (Read in raw data and plot it)

    # fname: path to a tab-separated file with header columns N and X_N
    data <- read.table(fname, header = TRUE, sep = "\t")
    print(data)
    plot(data$N, data$X_N, type = "b")

[Plot: Raw Speedup Data; x-axis Threads (N), y-axis Thruput X(N)]

Typing data into R console:

    > data
       N X_N
    1  1  89
    2  2 160
    3  4 272
    4  8 333
    5 10 352
    6 12 339
    7 14 315
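A hedged sketch of the next step, fitting the USL to these measurements with nls(): the data values are the ones listed on the slide, while the column name `S`, the start values, and the plotting details are assumptions made for this illustration rather than the talk's own script.

```r
# Measurements as listed on the slide: threads N and throughput X_N.
data <- data.frame(N   = c(1, 2, 4, 8, 10, 12, 14),
                   X_N = c(89, 160, 272, 333, 352, 339, 315))

# Normalize to speedup: S(N) = X(N) / X(1).
data$S <- data$X_N / data$X_N[data$N == 1]

# Nonlinear least-squares fit of the USL; the start values are guesses.
fit <- nls(S ~ N / (1 + sigma * (N - 1) + kappa * N * (N - 1)),
           data  = data,
           start = list(sigma = 0.1, kappa = 0.01))
print(coef(fit))

# Smooth model curve over the measured range, overlaid on the data.
grid <- data.frame(N = seq(1, 14, by = 0.5))
grid$S <- predict(fit, newdata = grid)
plot(data$N, data$S, type = "b", xlab = "Threads (N)", ylab = "Speedup S(N)")
lines(grid$N, grid$S, lty = 2)
```

The fitted coefficients are read off with coef(fit); extending the `grid` range beyond 14 threads would turn the same predict() call into an extrapolation.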

Tomcat Scalability
Data provided by M. Chawla (Germany)

USL Fit to Initial Production Data

USL Fit to Extended Production Data

Superlinear Scaling

Superlinear Scaling
Joint work with P. Puglia (BofA) and K. Tomasette (Comcast)
Published in Comm. ACM, Vol. 58, No. 4, April 2015, and online at ACM Queue (unabridged)

Remember this?
[Figure: speedup versus processors, showing superlinear, linear, and sublinear curves]
More than 100% efficient!(???)

“Speedup in Parallel Contexts.” http://en.wikipedia.org/wiki/Speedup#Speedup_in_Parallel_Contexts
“Where does super-linear speedup come from?” http://stackoverflow.com/questions/4332967/where-does-super-linear-speedup-come-from
“Sun Fire X2270 M2 Super-Linear Scaling of Hadoop TeraSort and CloudBurst Benchmarks.” https://blogs.oracle.com/BestPerf/entry/20090920_x2270m2_hadoop
Haas, R. “Scalability, in Graphical Form, Analyzed.” http://rhaas.blogspot.com/2011/09/scalability-in-graphical-form-analyzed.html
Sutter, H. 2008. “Going Superlinear.” Dr. Dobb’s Journal 33(3), March. http://www.drdobbs.com/cpp/going-superlinear/206100542
Sutter, H. 2008. “Super Linearity and the Bigger Machine.” Dr. Dobb’s Journal 33(4), April. http://www.drdobbs.com/parallel/super-linearity-and-the-bigger-machine/206903306
“SDN analytics and control using sFlow standard — Superlinear.” http://blog.sflow.com/2010/09/superlinear.html
Eijkhout, V. 2014. Introduction to High Performance Scientific Computing. Lulu.com

What it looks like
Oracle plot of Hadoop on SunFire cluster
Superlinear speedup on 16-node SunFire (158% linear scaling)
Linear superlinearity ???
← Ship it !!!

Perpetual motion
Perpetual motion machines:
  Perpetual motion contraptions violate the conservation of energy law.
  Super efficiency is tantamount to more than 100% output.
  Even if you know it’s wrong, proving it is the hard part.
Superlinear scalability:
  Superlinearity exceeds 100% of total capacity.
  Violates the Universal Scalability Law (USL) bounds.
  Again, proving it wrong is the hard part.
  Requires serious analysis and debugging.

TeraSort Cluster Simulations
Need controlled environment to study superlinearity
TeraSort workload sorts 1 TB of data in parallel
TeraSort has benchmarked Hadoop MapReduce performance
We used just 100 GB data input (not benchmarking anything)
Simulate in AWS cloud (more flexible and much cheaper)
Many test runs, some done in parallel

Table 1: Amazon EC2 Configurations

           Optimized  Processor  vCPU    Memory  Instance      Network
           for        Arch       number  (GiB)   Storage (GB)  Performance
  BigMem   Memory     64-bit     4       34.2    1 x 850       Moderate
  BigDisk  Compute    64-bit     8       7       4 x 420       High

Hunting the Superlinear Snark
A Sign of Superlinearity
USL contention coefficient is negative: σ = −0.0288, σ = −0.0089
The sign that superlinear scaling is really there (get it?)
Positive σ means capacity loss due to overhead
Negative σ therefore implies capacity gain or credit
But what could provide such credit?
And like a credit card, do you have to pay for it later?
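To make the role of the sign concrete, here is a small R sketch that evaluates the USL with one of the negative σ values quoted above. The κ value is hypothetical, picked only so the example has a visible payback point, and the helper name `usl` is an assumption of this sketch.

```r
# USL speedup function (same form as the formula used throughout the talk).
usl <- function(p, sigma, kappa) p / (1 + sigma * (p - 1) + kappa * p * (p - 1))

sigma <- -0.0288   # negative contention value quoted on the slide
kappa <- 0.001     # hypothetical coherency value, chosen only for illustration

p <- 1:40
efficiency <- usl(p, sigma, kappa) / p

# Superlinear wherever efficiency exceeds 1 (more than 100%).
superlinear <- p[efficiency > 1]
print(range(superlinear))
```

With a positive σ the same calculation never yields an efficiency above 1, which is why a negative fitted σ works as a detector.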

Recall from AMZ EC2 Configs

           Optimized  Processor  vCPU    Memory  Instance      Network
           for        Arch       number  (GiB)   Storage (GB)  Performance
  BigMem   Memory     64-bit     4       34.2    1 x 850       Moderate
  BigDisk  Compute    64-bit     8       7       4 x 420       High

From Table 1:
1. BigMem has 1 disk per EC2 node
2. BigDisk has 4 disks per EC2 node

Brief Explanation
1. Apparent capacity credit produced by IO bottleneck:
     Credit = Gradual reduction in IO constraint
     Relaxation of the latent IO bandwidth constraint.
     Constraint decreases with cluster size p = 1, 2, 3, ...
2. IO bottleneck induces random Reducer retries:
     Up to 10% variation in runtimes
     Stretches measured runtimes
     Distorts normalization of the speedup data
Details are discussed in the unabridged ACM Queue article
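A toy R illustration of the normalization point, under loudly stated assumptions: the runtimes are invented, the workload is taken to scale perfectly linearly, and only the single-node run is assumed to be stretched by 10%. It is not a reconstruction of the TeraSort measurements.

```r
# Hypothetical runtimes (seconds) for a perfectly linear workload.
p      <- c(1, 2, 4, 8)
T_true <- 1000 / p

# Assume the single-node run alone is stretched 10% by IO-induced retries.
T_meas    <- T_true
T_meas[1] <- 1.10 * T_true[1]

speedup <- T_meas[1] / T_meas   # S_p = T_1 / T_p
print(data.frame(p, T_meas, speedup, efficiency = speedup / p))
```

Because every speedup value is divided by the same baseline, a stretched baseline inflates the whole curve, and a linear system can look superlinear.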

Superlinear Payback Trap

Summary
USL requires σ, κ > 0 for S(p) to be a concave function
Convex efficiencies S(p)/p > 100% do (appear to) exist
Data → σ < 0 in USL model is a superlinear detector
Super efficiency is not free
Like perpetual motion, it’s an illusion
You will pay the piper eventually
Debugging latent capacity credit can be very tricky

Theorem (Gunther 2012)
USL Payback Trap: Superlinearity is always followed by severe loss of speedup in the payback region
Verified by Kris Tomasette on April 15, 2014
[Figure: speedup versus processors, showing a superlinear region followed by a payback region]
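One way to see where the payback region begins, sketched directly from the USL formula. This short derivation is added for illustration and is not the proof referred to in the theorem; it assumes σ < 0 < κ and a denominator that stays positive.

$$\frac{S_p}{p} = \frac{1}{1 + (p - 1)(\sigma + \kappa\,p)} > 1 \;\iff\; \sigma + \kappa\,p < 0 \;\iff\; p < p_\times = -\frac{\sigma}{\kappa} \qquad (p > 1)$$

Beyond the crossover $p_\times$ the efficiency falls below 100%. Setting $dS_p/dp = 0$ gives $1 - \sigma - \kappa\,p^2 = 0$, so the speedup itself peaks at

$$p^* = \sqrt{\frac{1 - \sigma}{\kappa}}$$

and declines for larger p.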

The Visual Takeaway
People think it’s this:
[Figure: speedup versus processors, showing superlinear, linear, and sublinear curves]
But it’s really this:
[Figure: speedup versus processors, showing a superlinear region followed by a payback region]
USL explains TeraSort superlinearity on Hadoop
Superlinear effects do appear in other guises (see the cited links)
More and more apps becoming massively distributed
Look for negative USL σ in your performance data

More about USL Modeling
[Book cover] Chapters 6 and 14
[Book cover] Chapters 4–6
Also check out:
  USL web page
  Guerrilla classes