Slide 1

Slide 1 text

Quantifying Scalability FTW
How to do a scalability surge in ∆t < 1 hour
Dr. Neil J. Gunther, Performance Dynamics
SURGE 2010, Sept 30 – Oct 1
© 2010 Performance Dynamics, Quantifying Scalability FTW, October 1, 2010

Slide 2

Slide 2 text

Outline
1 Scaling vs. Scalability
2 Components of Scalability
3 Problem: Bad scalability data
4 Problem: eBay 1.0 scalability
5 Problem: memcache scalability
6 Summary and Review
7 Resources and Coordinates

Slide 3

Slide 3 text

Motivation for This Talk
Practical methodology for assessing the cost-benefit of a given scalability strategy:
- quantify system scalability
- scalability is not a single number (it's a function)
- all measurements are wrong by definition
- need a framework to validate data
- measurement + model == information
Scalability: sustainable performance under increasing load (size N)

Slide 4

Slide 4 text

Jack and the Beanstalk
- Jack climbs a magic beanstalk up into the clouds (10,000 ft?)
- Guarded by a giant who is 10 times bigger than Jack
- "Fee-fie-foe-fum!" and all that

Slide 5

Slide 5 text

Where Are All the Giants?
- Can giants exist? Can a 10,000 ft beanstalk exist?
- Guinness world record: Robert P. Wadlow (USA), height 8’11” (2.72 m)
- Jack: height 1.8 m (L), weight 90 kg
- Giant (10x bigger): height 18 m (10 × L)
- Weight scales with volume, as $L^3$: $10^3 \times 90\ \text{kg} = 90{,}000\ \text{kg}$
- A bone-crushing 100 tons!
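The giant's weight follows from the square-cube law. A minimal sketch, assuming the giant is geometrically similar to Jack and made of the same material (same density), so mass grows as the cube of the linear scale factor; `scaled_mass` is a hypothetical helper name:

```python
# Square-cube law: scaling every linear dimension by k multiplies
# volume, and hence mass at constant density, by k**3.
JACK_HEIGHT_M = 1.8   # Jack's height from the slide
JACK_MASS_KG = 90.0   # Jack's weight from the slide

def scaled_mass(mass_kg: float, k: float) -> float:
    """Mass of a body geometrically similar to the original but k times larger."""
    return mass_kg * k ** 3

k = 10  # the giant is 10x Jack in every linear dimension
giant_height = JACK_HEIGHT_M * k
giant_mass = scaled_mass(JACK_MASS_KG, k)
print(f"Giant: {giant_height:.0f} m tall, {giant_mass:,.0f} kg")
```

Height grows only 10x, but mass grows 1000x, which is what makes the giant implausible.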

Slide 6

Slide 6 text

Scaling vs. Scalability
Natural scaling
- Inherent critical limits to sustainable loads
- When the load (volume) exceeds the material strength (supporting area), things tend to snap
- Load ∼ $L^3$ (volume), but strength ∼ $L^2$ (cross-section area)
Computer scalability
- No critical limit
- Point of diminishing returns
- Scalability is about sustainable size

Slide 7

Slide 7 text

Natural System Scaling
[Plot: Weight and Strength curves versus size]
Giant's legs, beanstalks, and bridges collapse where the curves cross
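Where the two curves cross can be worked out directly: if load grows as $w L^3$ and strength as $s L^2$, they are equal at $L^* = s/w$, and beyond that size the load exceeds what the structure can support. A minimal sketch; the proportionality constants `w` and `s` are hypothetical and default to 1 so the crossing sits at L = 1:

```python
def weight(L: float, w: float = 1.0) -> float:
    """Load grows with volume: proportional to L**3 (w is a hypothetical constant)."""
    return w * L ** 3

def strength(L: float, s: float = 1.0) -> float:
    """Support grows with cross-sectional area: proportional to L**2."""
    return s * L ** 2

def critical_size(w: float = 1.0, s: float = 1.0) -> float:
    """Size where w*L**3 == s*L**2, i.e. L* = s / w (for L > 0)."""
    return s / w

L_star = critical_size()
for L in (0.5, 1.0, 1.5):
    status = "holds" if weight(L) <= strength(L) else "snaps"
    print(f"L={L}: weight={weight(L):.3f}, strength={strength(L):.3f} -> {status}")
```

Because the load-to-strength ratio is $(w/s)L$, it grows linearly with size, so there is always some size at which things snap.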

Slide 8

Slide 8 text

Computer System Scaling
[Plot: throughput versus Users (0 to 1000), with a Scaling region followed by a Degradation region]
- Critical point is the maximum of the throughput curve
- Beyond the maximum: performance degradation, or retrograde scalability

Slide 9

Slide 9 text

Web 2.0 Scalability Fails
Twitter.com, Amazon EC2, Cuil.com, Apple iStore, Google Gmail, WolframAlpha

Slide 10

Slide 10 text

Scalability is Not a Number
- Google 2005 paper "Parallel Analysis with Sawzall": "If scaling were perfect, performance would be proportional to the number of machines... In our test, the effect is to contribute 0.98 machines."
- Translation: not 100% linear, but 98% of linear
[Image: excerpt from the Sawzall paper, including its Figure 2, "The overall flow of filtering, aggregating, and collating."]
- Theo Schlossnagle: "Linear scaling is simply a falsehood" (p. 71)
- Scalability is a function, not a number
- There are always limits, e.g., throughput capacity
- Want to quantify such limits

Slide 11

Slide 11 text

Section 2: Components of Scalability

Slide 12

Slide 12 text

Math Phobes Can Relax
- Proper quantification involves math
- Quantifying scalability requires some math
- But nothing as complicated as this¹:
$$\Pr\{\text{Murphy}\} = \frac{(U + C + I) \times (10 - S)}{20} \times \frac{A}{1 - \sin(F/10)}$$
- I have no idea what this equation is (ask Theo)
¹Source: Theo Schlossnagle, Scalable Internet Architectures, p. 12

Slide 13

Slide 13 text

Equal Bang for the Buck (Concurrency)

Slide 14

Slide 14 text

Cost of Sharing Resources (Contention)

Slide 15

Slide 15 text

Diminishing Returns (Saturation)

Slide 16

Slide 16 text

Negative ROI (Inconsistency Delays)

Slide 24

Slide 24 text

Universal Scalability Law (USL)
Pulling it all together: N users or processes
C is the scalability function of N:
$$C(N, \alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
Three Cs:
1 Concurrency
2 Contention (amount α)
3 Consistency, as in ACID and the CAP theorem (amount β)
Theorem (Universality): Only the α and β coefficients are needed to determine the maximum in C(N)
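The universality claim can be checked numerically. Setting $dC/dN = 0$ in the USL gives $1 - \alpha - \beta N^2 = 0$, so the peak sits at $N_{max} = \sqrt{(1 - \alpha)/\beta}$, which depends only on α and β. A minimal sketch in Python, using hypothetical coefficients α = 0.03 and β = 0.0001, that compares the closed form with a brute-force search:

```python
import math

def usl(N: float, alpha: float, beta: float) -> float:
    """Universal Scalability Law: relative capacity C(N)."""
    return N / (1 + alpha * (N - 1) + beta * N * (N - 1))

def n_max(alpha: float, beta: float) -> float:
    """Location of the peak of C(N); a finite peak exists only when beta > 0."""
    if beta <= 0:
        raise ValueError("no finite maximum unless beta > 0")
    return math.sqrt((1 - alpha) / beta)

alpha, beta = 0.03, 0.0001  # hypothetical coefficients for illustration
peak = n_max(alpha, beta)
# Brute-force check: the best integer load should lie next to the analytic peak.
best_int = max(range(1, 1001), key=lambda n: usl(n, alpha, beta))
print(f"N_max = {peak:.1f}, best integer N = {best_int}, C = {usl(best_int, alpha, beta):.2f}")
```

With β = 0 the curve never turns over; it only approaches the ceiling 1/α, which is why the function refuses to report a maximum in that case.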

Slide 25

Slide 25 text

Section 3: Problem: Bad scalability data

Slide 26

Slide 26 text

Data Are Not Divine
- Data come from the Devil
- Models come from God
- Skepticism should rule!
Theorem: Data + Models ≡ Insight
Data need to be put in prison (a model) and made to confess the truth.
Corollary: Waterboarding your data is ok

Slide 27

Slide 27 text

Scalability Measurements
- J2EE web application
- Throughput measurements using Apache JMeter

Slide 28

Slide 28 text

Bad Data
The Problem
- Monotonically increasing; looks ok visually
- But some data are > 100% efficient
- Can't haz
- Or you have some very serious explaining to do

Slide 29

Slide 29 text

Put Your Data in Prison
- Excel table of various USL quantities
- Column F is scaling efficiency: C/N
- Between N = 5 and 150 vusers, efficiencies are > 1.0
- Can't have more than 100% of anything. Need to explain?
Data + Model == Information
Merely attempting to set up the USL model in Excel or R shows that the measurement data (not the model) are wrong.
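The efficiency check in column F is easy to automate. A minimal sketch, assuming relative capacity is computed as C(N) = X(N)/X(1) from the measured throughput X and that a measurement at N = 1 exists; the throughput numbers below are hypothetical, not the J2EE data from the slide:

```python
def efficiencies(throughput: dict) -> dict:
    """Map N -> C(N)/N, where C(N) = X(N)/X(1) is throughput relative to N = 1."""
    x1 = throughput[1]
    return {n: (x / x1) / n for n, x in sorted(throughput.items())}

def superlinear_points(throughput: dict, tol: float = 1e-9) -> list:
    """Loads N whose scaling efficiency exceeds 100% (beyond a small tolerance)."""
    return [n for n, e in efficiencies(throughput).items() if e > 1.0 + tol]

# Hypothetical measurements: N (virtual users) -> X(N) (requests/s)
measured = {1: 100.0, 5: 520.0, 10: 980.0, 20: 1700.0}
suspect = superlinear_points(measured)
for n, e in efficiencies(measured).items():
    flag = "  <-- > 100% efficient: explain this!" if n in suspect else ""
    print(f"N={n:>3}  C/N={e:.2f}{flag}")
```

Any flagged point means the measurements, not the model, need explaining.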

Slide 30

Slide 30 text

Section 4: Problem: eBay 1.0 scalability

Slide 31

Slide 31 text

eBay 1.0 Capacity Upgrades
The Problem
- Want to compare capacity upgrades for the Sun E10K backend
- Running an ORA DBMS for both OLTP and DSS
- eBay 1.0 had no performance measurements of their app
- eBay Inc. was just hiring into a QA/load-test group
- No scalability measurements
- Sun PS provided me with their M-values
- M-values ⇒ α ≈ 0.005
- But that's only α = ½% contention ... WTF!?
- ORA DBMS is more typically α ≈ 3%
- But at least we have things quantified

Slide 32

Slide 32 text

eBay 1.0 Optimistic Projections
[Chart "Python: CPU Upgrade Scenarios - Optimistic": Total Utilization (%) versus Weeks since 8/5/99 for the 52way@333, 52way@400, 64way@333, and 64way@400 configurations, with reference levels for 1 E10K and 2 E10Ks]

Slide 33

Slide 33 text

How to keep the peace?
The Solution
- Apply the USL model to Sun's M-values:
$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
- ORA backend ⇒ α ≈ 0.03
- Simply re-run the USL curves with that value. Voilà!
- This creates a scalability envelope
- eBay mileage will vary within this scalability envelope
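The envelope is just the USL evaluated twice, once with each α. A minimal sketch, assuming β = 0 (the M-values supply only a contention estimate) and using the 52-way and 64-way configurations named in the projection charts; `envelope` is a hypothetical helper name:

```python
def usl(N: float, alpha: float, beta: float = 0.0) -> float:
    """Universal Scalability Law: relative capacity C(N)."""
    return N / (1 + alpha * (N - 1) + beta * N * (N - 1))

ALPHA_SUN = 0.005  # from Sun's M-values (optimistic bound)
ALPHA_ORA = 0.03   # more typical of an ORA backend (realistic bound)

def envelope(N: int, beta: float = 0.0) -> tuple:
    """(optimistic, realistic) relative capacity for an N-way server."""
    return usl(N, ALPHA_SUN, beta), usl(N, ALPHA_ORA, beta)

for n in (52, 64):
    best, worst = envelope(n)
    print(f"{n}-way: C(N) between {worst:.1f} and {best:.1f}")
```

The gap between the two bounds is the range within which actual eBay mileage would be expected to fall.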

Slide 34

Slide 34 text

eBay 1.0 Realistic Projections
[Chart "CPU Upgrade Scenarios - Realistic": Total Utilization (%) versus Weeks since 8/5/99 for the 52way@333, 52way@400, 64way@333, and 64way@400 configurations, with reference levels for 1 E10K and 2 E10Ks]

Slide 35

Slide 35 text

Section 5: Problem: memcache scalability

Slide 36

Slide 36 text

[Embedded slide from a Velocity 2010 talk, June 24: "Scalability"]

Slide 37

Slide 37 text

Scale out with memcache
- Tiers of older servers
- Servers often blades
- Mostly single processor
- Single threading ok

Slide 38

Slide 38 text

But ...
- Datacenter HW gets rolled
- Single-CPU blades will be replaced with multicores
- Multicores will be the only game in town (HW vendor decision)
The Problem
memcached is thread limited on multicores

Slide 39

Slide 39 text

The evidence
[Embedded slide from a Velocity 2010 talk, June 24: "Memcached scaling is thread limited"]

Slide 40

Slide 40 text

Performance measurement rig
- Sun Fire X4170 w/ 64 GB RAM²
- Intel Nehalem multicores
- 2 processor sockets ⇒ 2 quad-cores == 8 cores
- Intel SMT enabled ⇒ 2 threads/core
- 16 virtual CPUs seen by Solaris
- Load generators ←→ 10 Gbps link ←→ SUT
²Joint work with Sun, pre-Oracle acquisition

Slide 41

Slide 41 text

USL regression in Excel
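One way to fit the USL with spreadsheet-style tools is to transform it into a polynomial. With $x = N - 1$ and $y = N/C(N) - 1$, the USL becomes $y = \beta x^2 + (\alpha + \beta)x$, a quadratic with no constant term, so an ordinary least-squares fit of $y = a x^2 + b x$ gives $\beta = a$ and $\alpha = b - a$. A minimal Python sketch of that idea; it is not necessarily the exact procedure in the Excel screenshot, and the demo data are synthetic, generated from hypothetical coefficients:

```python
def fit_usl_quadratic(N_values, C_values):
    """Estimate (alpha, beta) by least squares on y = a*x**2 + b*x,
    where x = N - 1 and y = N/C - 1; then beta = a and alpha = b - a."""
    xs = [n - 1 for n in N_values]
    ys = [n / c - 1 for n, c in zip(N_values, C_values)]
    # Normal equations for a no-intercept quadratic:
    # [s4 s3] [a]   [t2]
    # [s3 s2] [b] = [t1]
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x ** 2 * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct values of N > 1")
    a = (t2 * s2 - s3 * t1) / det  # Cramer's rule
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # (alpha, beta)

# Synthetic demo: C(N) generated from hypothetical alpha = 0.02, beta = 0.0005
N_demo = [1, 2, 4, 8, 16, 32, 64]
C_demo = [n / (1 + 0.02 * (n - 1) + 0.0005 * n * (n - 1)) for n in N_demo]
alpha_hat, beta_hat = fit_usl_quadratic(N_demo, C_demo)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.5f}")
```

On noise-free data the fit recovers the generating coefficients; on real measurements it gives estimates comparable to, though not identical with, the nonlinear fit, because the transformation reweights the errors.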

Slide 42

Slide 42 text

USL regression in R
# Standard non-linear least squares (NLS) fit using the USL model.
# Assumes a data frame `input` with columns N (load), X_N (throughput X(N))
# and Norm (relative capacity C(N) = X(N)/X(1)), with the N = 1 row first.
usl <- nls(Norm ~ N/(1 + alpha * (N-1) + beta * N * (N-1)), input,
           start=c(alpha=0.1, beta=0.01))
# Get alpha & beta parameters for use in plot legend
x.coef <- coef(usl)
# Determine sum-of-squares for R-squared coeff from NLS fit
sse <- sum((input$Norm - predict(usl))^2)
sst <- sum((input$Norm - mean(input$Norm))^2)
r2 <- 1 - sse/sst
# Calculate Nmax and X(Nmax)
Nmax <- sqrt((1 - x.coef['alpha']) / x.coef['beta'])
Xmax <- input$X_N[1] * Nmax / (1 + x.coef['alpha'] * (Nmax-1) + x.coef['beta'] * Nmax * (Nmax-1))
# Plot all the results
plot(x<-c(0:max(input$N)), input$X_N[1] ...)
title("USL Scalability")
points(input$N, input$X_N)
legend("bottom", legend=eval(parse(text=sprintf(...))))

Slide 43

Slide 43 text

[Four plot panels of throughput Xp versus p: Raw bench data; Data smoother; USL fit; USL fit + CI bands]

Slide 44

Slide 44 text

The envelope please!
Table: Intel 2-socket duo core + SMT

Version   α        β        Nmax
1.2.8     0.0255   0.0210   7
1.4.1     0.0821   0.0207   6
1.4.5     0.0988   0.0209   6

Little's law³: N = X(R + Z) threads
- Also know R is on the order of ms ($10^{-3}$ s), so R + Z is dominated by the client-side "think time" Z
- Z = 5 s in tests
- Avg X ≈ 350 KOPS on Intel quad-core
- Therefore: $N \approx 350 \times 10^3 \times 5 = 1{,}750{,}000$ threads
- Same as users, assuming 1 user process per thread
³See e.g., Scalable Internet Architectures, p. 127
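The Little's law estimate can be reproduced in a few lines. A minimal sketch using the slide's figures (X ≈ 350 KOPS, Z = 5 s), with R = 1 ms assumed as a representative "order of ms" response time; `littles_law_population` is a hypothetical helper name:

```python
def littles_law_population(X: float, R: float, Z: float) -> float:
    """Little's law for a closed system: N = X * (R + Z) concurrent requesters."""
    return X * (R + Z)

X = 350e3  # throughput in ops/s (~350 KOPS, from the slide)
R = 1e-3   # response time in s (assumed: "on the order of ms")
Z = 5.0    # client think time in s (from the slide)

N = littles_law_population(X, R, Z)
print(f"N with R included: {N:,.0f}")
print(f"N with R ignored:  {X * Z:,.0f}")
```

Including R changes the answer by only a few hundred threads out of 1.75 million, which is why dropping it is safe here.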

Slide 45

Slide 45 text

SPARC Solaris mods
Table: SPARC T2 + Solaris

Version    α        β        Nmax
Vanilla    0.0041   0.0092   22
Modified   0.0000   0.0004   48

The Solution
- Partitioned mcd hash table
- Single hash table contention avoided by partitioning the table
- Solaris patches improve scalability to ≈ 40 threads
- Throughput X increases from 200 → 400 KOPS on SPARC CMT
- Can't assume the same 2x win on x86 arch
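Partitioning a hash table spreads accesses over several independently locked partitions, so threads only contend when they touch the same partition instead of serializing on one global lock. A minimal sketch of that general technique (often called lock striping) in Python; `PartitionedDict` is a hypothetical class, not memcached's actual implementation, and under CPython's GIL it shows the structure rather than a real speedup:

```python
import threading
from typing import Any, Hashable

class PartitionedDict:
    """A hash table split into independently locked partitions (lock striping)."""

    def __init__(self, partitions: int = 16):
        if partitions < 1:
            raise ValueError("partitions must be >= 1")
        self._maps = [dict() for _ in range(partitions)]
        self._locks = [threading.Lock() for _ in range(partitions)]

    def _index(self, key: Hashable) -> int:
        # The key's hash picks the partition, so each key always maps to one lock.
        return hash(key) % len(self._maps)

    def set(self, key: Hashable, value: Any) -> None:
        i = self._index(key)
        with self._locks[i]:  # only this partition is locked
            self._maps[i][key] = value

    def get(self, key: Hashable, default: Any = None) -> Any:
        i = self._index(key)
        with self._locks[i]:
            return self._maps[i].get(key, default)

    def __len__(self) -> int:
        total = 0
        for lock, m in zip(self._locks, self._maps):
            with lock:
                total += len(m)
        return total

def writer(table: PartitionedDict, worker_id: int, count: int) -> None:
    for i in range(count):
        table.set((worker_id, i), i)

table = PartitionedDict(partitions=8)
workers = [threading.Thread(target=writer, args=(table, w, 1000)) for w in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(len(table), table.get((3, 999)))
```

In USL terms, the goal of partitioning is to shrink the contention term α by reducing the fraction of time threads wait on a shared lock.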

Slide 46

Slide 46 text

Section 6: Summary and Review

Slide 47

Slide 47 text

Why Should You Care?
Werner Vogels, Amazon CTO: "Scalability is hard because it cannot be an after-thought. Good scalability is possible, but only if we architect and engineer our systems to take scalability into account."
- Old reason: Concurrent programming was hard on SMPs
- New reason: Multicores are SMPs on a chip (HW vendor decision)
- More threads enable higher concurrency and shorter user latencies
- But it's hard: beware the 3rd C in the USL (β coefficient)
Theo Schlossnagle, OmniTI CEO: "Simply having a solution that scales horizontally doesn't mean that you are safe."

Slide 48

Slide 48 text

Where is Your Application?
Class A: Ideal concurrency (α, β = 0)
- Shared-nothing platform
- Google text search
- LexisNexis search
- Read-only queries
Class B: Contention-only (α > 0, β = 0)
- Message-based queueing (e.g., MQSeries)
- Message Passing Interface (MPI) applications
- Transaction monitors (e.g., Tuxedo)
- Polling service (e.g., VMware)
- Peer-to-peer (e.g., Skype)
Class C: Incoherent-only (α = 0, β > 0)
- Scientific HPC computations
- Online analytic processing (OLAP)
- Data mining
- Decision support software (DSS)
Class D: Worst case (α, β > 0)
- Anything with shared writes
- Hotel reservation system
- Banking online transaction processing (OLTP)
- Java database connectivity (JDBC)
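Once α and β have been fitted, placing an application in one of the four classes is mechanical. A minimal sketch; `usl_class` and its `eps` threshold (how small a fitted coefficient must be to count as zero) are hypothetical choices, and the example coefficients are taken from the memcached tables in this deck:

```python
def usl_class(alpha: float, beta: float, eps: float = 0.0) -> str:
    """Map fitted USL coefficients to application class A, B, C, or D."""
    if alpha < 0 or beta < 0:
        raise ValueError("USL coefficients must be non-negative")
    contention = alpha > eps   # alpha effectively nonzero
    coherency = beta > eps     # beta effectively nonzero
    if not contention and not coherency:
        return "A"  # ideal concurrency
    if contention and not coherency:
        return "B"  # contention-only
    if not contention and coherency:
        return "C"  # incoherent-only
    return "D"      # worst case

print(usl_class(0.0255, 0.0210))  # memcached 1.2.8 on Intel
print(usl_class(0.0000, 0.0004))  # modified memcached on SPARC T2
```

In practice a small positive `eps` is sensible, since fitted coefficients are rarely exactly zero.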

Slide 49

Slide 49 text

USL Scalability Zones
Think scalability zones rather than scalability curves
[Plot: throughput X(N) versus N (0 to 120), showing zones A, B, and C; WebSphere measurements shown as dots]
A: Asynchronous messaging (average queue lengths)
B: Synchronous messaging (worst queue lengths)
C: Synchronous messaging + pairwise exchanges

Slide 50

Slide 50 text

Section 7: Resources and Coordinates

Slide 51

Slide 51 text

Resources and Coordinates
Castro Valley, California, 94552
Resources: Books, Training, USL tools
Coordinates:
www.perfdynamics.com
perfdynamics.blogspot.com
twitter.com/DrQz
+1-510-537-5758
Chapters 4–6
