Dr. Neil Gunther
October 01, 2010

# Quantifying Scalability FTW

Presented at SURGE 2010 Conference

## Transcript

1. Quantifying Scalability FTW
How to do a scalability surge in ∆t < 1 hour
Dr. Neil J. Gunther
Performance Dynamics
SURGE 2010
Sept 30 – Oct 1
© 2010 Performance Dynamics, Quantifying Scalability FTW, October 1, 2010

2. Scaling vs. Scalability
Outline
1 Scaling vs. Scalability
2 Components of Scalability
4 Problem: eBay 1.0 scalability
5 Problem: memcache scalability
6 Summary and Review
7 Resources and Coordinates

3. Scaling vs. Scalability
Motivation for This Talk
Practical methodology for assessing the cost-benefit of a given scalability strategy
quantify system scalability
scalability is not a single number (it’s a function)
all measurements are wrong by definition
need a framework to validate data
measurement + model == information
Scalability: sustainable performance under increasing load (size N)

4. Scaling vs. Scalability
Jack and the Beanstalk
Jack climbs a magic beanstalk up into the clouds (10,000 ft?)
Guarded by a giant who is 10 times bigger than Jack
“Fee-fie-foe-fum!” and all that

5. Scaling vs. Scalability
Where Are All the Giants?
Can giants exist?
Can a 10,000 ft beanstalk exist?
Guinness world record
Height: 8’11” (2.72 m)
Jack
Height: 1.8 m tall (L)
Weight: 90 kg
Giant (10× bigger)
Height: 18 m tall (10 × L)
Weight scales as L³: 10³ × 90 kg
Weight: 90,000 kg
A bone-crushing 90 tonnes (about 100 short tons)!

6. Scaling vs. Scalability
Scaling vs. Scalability
Natural scaling
Inherent critical limits to sustainable loads
When the load (volume) exceeds the material strength (supporting area), things tend to snap
Load ∼ L³ (volume), but strength ∼ L² (cross-section area)
Computer scalability
No critical limit
Point of diminishing returns
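The square-cube arithmetic behind the giant fits in a few lines. This is a minimal sketch, assuming weight grows exactly as L³ and load-bearing strength exactly as L² from Jack's 1.8 m, 90 kg baseline; the function name `scale_body` is hypothetical.

```python
def scale_body(base_height_m, base_mass_kg, factor):
    """Scale a body by a linear factor under the square-cube law.

    Assumes mass grows with volume (factor**3) and load-bearing
    strength grows with cross-sectional area (factor**2).
    """
    height = base_height_m * factor
    mass = base_mass_kg * factor ** 3
    strength_ratio = factor ** 2
    # Stress on the bones relative to the unscaled body: load / strength
    stress_ratio = factor ** 3 / strength_ratio
    return height, mass, stress_ratio


# Jack (1.8 m, 90 kg) scaled up 10x into the giant
height, mass, stress = scale_body(1.8, 90.0, 10)
print(f"height = {height:.0f} m, mass = {mass:,.0f} kg, stress x{stress:.0f}")
```

A tenfold increase in size puts ten times the stress on the same material, which is why natural scaling has a hard critical limit.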

7. Scaling vs. Scalability
Natural System Scaling
[Figure: Weight and Strength curves on normalized axes; panel title “Scalability”]
Giant’s legs, beanstalks, and bridges collapse where the curves cross

8. Scaling vs. Scalability
Computer System Scaling
[Figure: panels “Scaling” and “Scalability”, throughput versus Users (0 to 1000)]
The critical point is the maximum in the throughput curve

9. Scaling vs. Scalability
Web 2.0 Scalability Fails
Amazon EC2
Cuil.com
Apple iStore
WolframAlpha

10. Scaling vs. Scalability
Scalability is Not a Number
Google 2005 paper “Parallel Analysis with Sawzall”
“If scaling were perfect, performance would be proportional to the number of machines... In
our test, the effect is to contribute 0.98 machines.”
Translation: Not 100% linear but 98% of linear
[Slide image: page 3 of the Sawzall paper, including its Figure 2, “The overall flow of filtering, aggregating, and collating. Each stage typically involves less data than the previous.”]
Theo Schlossnagle: “Linear scaling is simply a falsehood” p.71
Scalability is a function
Not a number
Always limits, e.g., throughput capacity
Want to quantify such limits

11. Components of Scalability

12. Components of Scalability
Math Phobes Can Relax
Quantifying scalability requires some math
But nothing as complicated as this¹:
$$\Pr\{\text{Murphy}\} = \frac{(U + C + I) \times (10 - S)}{20} \times \frac{A}{1 - \sin(F/10)}$$
I have no idea what this equation is (ask Theo)
¹Source: Theo Schlossnagle, Scalable Internet Architectures, p. 12

13. Components of Scalability
Equal Bang for the Buck (Concurrency)

14. Components of Scalability
Cost of Sharing Resources (Contention)

15. Components of Scalability
Diminishing Returns (Saturation)

16. Components of Scalability
Negative ROI (Inconsistency Delays)

17. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

18. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

19. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
C(N, α, β) =
N
1 + α (N − 1) + β N(N − 1)
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

20. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
C(N, α, β) =
N
1 + α (N − 1) + β N(N − 1)
Three Cs:
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

21. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
C(N, α, β) =
N
1 + α (N − 1) + β N(N − 1)
Three Cs:
1
Concurrency
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

22. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
C(N, α, β) =
N
1 + α (N − 1) + β N(N − 1)
Three Cs:
1
Concurrency
2
Contention (amount α)
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

23. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
C(N, α, β) =
N
1 + α (N − 1) + β N(N − 1)
Three Cs:
1
Concurrency
2
Contention (amount α)
3
Consistency as in ACID & CAP Thm (amount β)
c 2010 Performance Dynamics Quantifying Scalability FTW October 1, 2010 17 / 45

24. Components of Scalability
Universal Scalability Law (USL)
Pulling it all together:
N users or processes
C is the scalability function of N
$$C(N, \alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
Three Cs:
1. Concurrency
2. Contention (amount α)
3. Consistency, as in ACID and the CAP theorem (amount β)
Theorem (Universality)
Only the α and β coefficients are needed to determine the maximum in C(N)
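The USL and the location of its peak can be sketched directly. Setting dC/dN = 0 gives N* = √((1 − α)/β), the same expression used for Nmax in the R fit on the memcache slides. The coefficient values below are hypothetical, chosen only to illustrate the calculation.

```python
import math


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) from the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def usl_nmax(alpha, beta):
    """Load N at which C(N) peaks: sqrt((1 - alpha) / beta).

    Only defined for 0 <= alpha < 1 and beta > 0; with beta = 0
    the curve just saturates toward 1/alpha and has no maximum.
    """
    if beta <= 0 or not 0 <= alpha < 1:
        raise ValueError("C(N) has a maximum only for 0 <= alpha < 1, beta > 0")
    return math.sqrt((1 - alpha) / beta)


alpha, beta = 0.03, 0.0005  # hypothetical coefficients, for illustration
n_star = usl_nmax(alpha, beta)
print(f"Nmax = {n_star:.1f}, C(Nmax) = {usl_capacity(n_star, alpha, beta):.2f}")
```

Only α and β enter `usl_nmax`, which is the point of the universality theorem: once the two coefficients are known, the peak follows.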

Outline
1 Scaling vs. Scalability
2 Components of Scalability
4 Problem: eBay 1.0 scalability
5 Problem: memcache scalability
6 Summary and Review
7 Resources and Coordinates

Data Are Not Divine
Data come from the Devil; models come from God
Skepticism should rule!
Theorem
Data + Models ≡ Insight
Corollary
Data need to be put in prison (a model) and made to confess the truth

Scalability Measurements
J2EE web application
Throughput measurements using Apache JMeter

The Problem
Monotonically increasing, looks OK visually
but some data are > 100% efficient
Can’t haz
Or you have some very serious explaining to do

Excel table of various USL quantities.
Column F is the scaling efficiency C/N. Between N = 5 and 150 vusers, the efficiencies exceed 1.0. You can’t have more than 100% of anything, so this needs explaining.
Data + Model == Information
Merely attempting to set up the USL model in Excel or R shows that the measurement data (not the model) are wrong.
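The efficiency check on this slide is easy to automate. A minimal sketch, assuming the measurements include a single-user baseline X(1) as the first row; the function name and the sample throughputs are hypothetical, made-up numbers rather than the J2EE measurements.

```python
def scaling_efficiency(loads, throughputs):
    """Return (N, C(N), C(N)/N) rows, normalizing to the N = 1 throughput."""
    if loads[0] != 1:
        raise ValueError("first measurement must be at N = 1")
    x1 = throughputs[0]
    rows = []
    for n, x in zip(loads, throughputs):
        c = x / x1  # relative capacity C(N) = X(N) / X(1)
        rows.append((n, c, c / n))
    return rows


# Made-up numbers for illustration only (not the J2EE measurements)
loads = [1, 5, 10, 20]
throughputs = [10.0, 55.0, 104.0, 180.0]
for n, c, eff in scaling_efficiency(loads, throughputs):
    flag = "  <-- efficiency > 100%: check the data" if eff > 1.0 else ""
    print(f"N={n:3d}  C={c:6.2f}  C/N={eff:5.2f}{flag}")
```

Any flagged row is a measurement problem to explain before fitting α and β, not a property of the system.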

30. Problem: eBay 1.0 scalability

31. Problem: eBay 1.0 scalability
The Problem
Want to compare capacity upgrades for the Sun E10K backend
Running the ORA DBMS for both OLTP and DSS
eBay 1.0 had no performance measurements of their app
eBay Inc. was just hiring into a QA/load-test group
No scalability measurements
Sun PS provided me with their M-values
M-values ⇒ α ≈ 0.005
But that’s only α = ½% contention ... WTF!?
ORA DBMS is more typically α ≈ 3%
But at least we have things quantified

32. Problem: eBay 1.0 scalability
eBay 1.0 Optimistic Projections
[Figure: “Python: CPU Upgrade Scenarios - Optimistic”, Total Utilization (%) versus Weeks since 8/5/99, with CPU upgrade scenario curves for 1 E10K and 2 E10Ks]

33. Problem: eBay 1.0 scalability
How to keep the peace?
The Solution
Apply the USL model to Sun’s M-values
$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
ORA backend ⇒ α ≈ 0.03
Simply re-run the USL curves with that value. Voilà!
Creates a scalability envelope
eBay mileage will vary within this scalability envelope
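The envelope amounts to evaluating the USL twice, once with each contention value. A sketch, assuming β = 0 because the slides quote only α; the processor counts are chosen for illustration and are not eBay's configuration.

```python
def usl_capacity(n, alpha, beta=0.0):
    """Relative capacity C(N) from the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


ALPHA_SUN = 0.005  # contention implied by Sun's M-values
ALPHA_ORA = 0.03   # more typical contention for an ORA backend

# Processor counts chosen for illustration only; beta assumed to be 0
for n in (8, 16, 32, 64):
    optimistic = usl_capacity(n, ALPHA_SUN)
    realistic = usl_capacity(n, ALPHA_ORA)
    print(f"N={n:2d}  C(N) between {realistic:5.1f} and {optimistic:5.1f}")
```

The gap between the two columns widens with N, which is why the optimistic M-values alone would overstate the value of a larger upgrade.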

34. Problem: eBay 1.0 scalability
eBay 1.0 Realistic Projections
[Figure: Total Utilization (%) versus Weeks since 8/5/99, with CPU upgrade scenario curves for 1 E10K and 2 E10Ks]

35. Problem: memcache scalability

36. Problem: memcache scalability
[Slide image: slide 2, “Scalability”, from a Velocity 2010 talk (June 24)]

37. Problem: memcache scalability
Scale out with memcache
Tiers of older servers
Mostly single processor

38. Problem: memcache scalability
But ...
Datacenter HW gets rolled
Single-CPU blades will be replaced with multicores
Multicores will be the only game in town (HW vendor decision)
The Problem
memcached is thread limited on multicores

39. Problem: memcache scalability
The evidence
[Slide image: slide 12, “Memcached”, from a Velocity 2010 talk (June 24)]

40. Problem: memcache scalability
Performance measurement rig
Sun Fire X4170 with 64 GB RAM²
Intel Nehalem multicores
2 processor sockets ⇒ 2 quad-cores = 8 cores
Intel SMT enabled ⇒ 2 threads/core
16 virtual CPUs seen by Solaris
²Joint work with Sun, pre-Oracle acquisition

41. Problem: memcache scalability
USL regression in Excel

42. Problem: memcache scalability
USL regression in R
```r
# Assumes `input` is a data frame sorted by N, with N = 1 in the first row,
# and columns N (load), X_N (measured throughput) and Norm = X_N / X_N[1].
# Standard non-linear least squares (NLS) fit using USL model
usl <- nls(Norm ~ N/(1 + alpha * (N-1) + beta * N * (N-1)),
           data = input, start = c(alpha = 0.1, beta = 0.01))
# Get alpha & beta parameters for use in plot legend
x.coef <- coef(usl)
# Determine sum-of-squares for R-squared coeff from NLS fit
sse <- sum((input$Norm - predict(usl))^2)
sst <- sum((input$Norm - mean(input$Norm))^2)
r.squared <- 1 - sse / sst
# Calculate Nmax and X(Nmax)
Nmax <- sqrt((1 - x.coef['alpha']) / x.coef['beta'])
Xmax <- input$X_N[1] * Nmax / (1 + x.coef['alpha'] * (Nmax-1) +
          x.coef['beta'] * Nmax * (Nmax-1))
# Plot all the results (remaining arguments were elided on the slide)
plot(x <- c(0:max(input$N)), input$X_N[1] ...)
title("USL Scalability")
points(input$N, input$X_N)
legend("bottom", legend=eval(parse(text=sprintf(...)
```
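For readers without R, here is a dependency-free Python sketch. Note that it is not the `nls()` fit shown on the slide: it swaps in a different technique, rewriting the USL as N/C(N) − 1 = α(N − 1) + βN(N − 1), which is linear in α and β, and solving ordinary least squares on that transformed quantity. Because residuals are weighted differently, on noisy measurements its estimates can differ from NLS. The function name and the synthetic data are hypothetical.

```python
import math


def fit_usl_linearized(loads, capacities):
    """Estimate USL alpha and beta by least squares on transformed data.

    Uses N/C(N) - 1 = alpha*(N - 1) + beta*N*(N - 1), which is linear
    in alpha and beta, and solves the 2x2 normal equations (no intercept).
    """
    suu = suv = svv = suy = svy = 0.0
    for n, c in zip(loads, capacities):
        u = n - 1
        v = n * (n - 1)
        y = n / c - 1
        suu += u * u
        suv += u * v
        svv += v * v
        suy += u * y
        svy += v * y
    det = suu * svv - suv * suv
    if det == 0:
        raise ValueError("need at least two distinct loads with N > 1")
    alpha = (suy * svv - svy * suv) / det  # Cramer's rule
    beta = (svy * suu - suy * suv) / det
    return alpha, beta


# Synthetic, noise-free data from alpha=0.05, beta=0.002 (illustration only)
true_alpha, true_beta = 0.05, 0.002
loads = [1, 2, 4, 8, 16, 32]
caps = [n / (1 + true_alpha * (n - 1) + true_beta * n * (n - 1)) for n in loads]
alpha, beta = fit_usl_linearized(loads, caps)
nmax = math.sqrt((1 - alpha) / beta)
print(f"alpha={alpha:.4f} beta={beta:.5f} Nmax={nmax:.1f}")
```

On noise-free synthetic data the fit recovers the generating coefficients exactly; with real benchmark data it is best treated as a quick cross-check or as starting values for a proper NLS fit.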

43. Problem: memcache scalability
[Figure: four panels of Xp versus p (10 to 60): Raw bench data; Data smoother; USL fit; USL fit + CI bands]

44. Problem: memcache scalability
Table: Intel 2-socket duo core + SMT

| Version | α | β | Nmax |
|---|---|---|---|
| 1.2.8 | 0.0255 | 0.0210 | 7 |
| 1.4.1 | 0.0821 | 0.0207 | 6 |
| 1.4.5 | 0.0988 | 0.0209 | 6 |

Little’s law³: N = X(R + Z) threads
We also know R is on the order of ms (10⁻³ s), so latency is dominated by the client-side “think time” Z = 5 s in the tests
Avg X ≈ 350 KOPS on Intel quad-core
Therefore: N ≈ 350 × 10³ × 5 = 1,750,000 threads
Same as users, assuming 1 user process per thread
³See, e.g., Scalable Internet Architectures, p. 127
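The Little's law estimate can be checked directly with the slide's numbers, reading 350 KOPS as 350,000 operations per second. R = 1 ms is an assumed representative value for “on the order of ms”; the function name is hypothetical.

```python
def littles_law_population(throughput_ops, response_s, think_s):
    """Little's law for a closed system: N = X * (R + Z)."""
    return throughput_ops * (response_s + think_s)


X = 350e3  # average throughput, ops/s (350 KOPS)
Z = 5.0    # client-side think time, seconds
R = 1e-3   # assumed response time, "on the order of ms"

print(f"with R:    N = {littles_law_population(X, R, Z):,.0f}")
print(f"R ignored: N = {littles_law_population(X, 0.0, Z):,.0f}")
```

Including R changes N by only a few hundred out of 1.75 million, which is why dropping R from the estimate is safe when Z dominates.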

45. Problem: memcache scalability
SPARC Solaris mods
Table: SPARC T2 + Solaris

| Version | α | β | Nmax |
|---|---|---|---|
| Vanilla | 0.0041 | 0.0092 | 22 |
| Modified | 0.0000 | 0.0004 | 48 |

The Solution
Partitioned mcd hash table
Single hash table contention avoided by partitioning the table
Solaris patches improve scalability to ≈ 40 threads
Throughput X increases from 200 → 400 KOPS on SPARC CMT
Can’t assume the same 2× win on x86 arch
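The idea behind the partitioned hash table can be sketched as lock striping: instead of one lock guarding the whole table, each partition has its own lock, so threads working on different partitions do not contend. This is a hypothetical Python illustration of that general technique, not the actual memcached or Solaris change; the class name and the partition count are assumptions.

```python
import threading


class PartitionedHashTable:
    """A hash table split into independently locked partitions.

    Illustrative only: replaces one global lock with one lock per
    partition, so contention is limited to keys in the same partition.
    """

    def __init__(self, partitions=16):
        self._locks = [threading.Lock() for _ in range(partitions)]
        self._tables = [{} for _ in range(partitions)]

    def _slot(self, key):
        # The key's hash picks the partition (and therefore the lock).
        return hash(key) % len(self._tables)

    def set(self, key, value):
        i = self._slot(key)
        with self._locks[i]:
            self._tables[i][key] = value

    def get(self, key, default=None):
        i = self._slot(key)
        with self._locks[i]:
            return self._tables[i].get(key, default)


def fill(table, prefix, count):
    """Worker: store count keys named prefix-0 .. prefix-(count-1)."""
    for j in range(count):
        table.set(f"{prefix}-{j}", j)


table = PartitionedHashTable(partitions=8)
workers = [threading.Thread(target=fill, args=(table, f"k{t}", 1000)) for t in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(table.get("k3-999"))
```

In CPython the global interpreter lock prevents real parallel speedup, so this shows the locking structure rather than the throughput gain; in the USL's terms, partitioning is aimed at shrinking the contention coefficient α.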

46. Summary and Review

47. Summary and Review
Why Should You Care?
Werner Vogels, Amazon CTO
“Scalability is hard because it cannot be an after-thought. Good scalability is
possible, but only if we architect and engineer our systems to take scalability
into account.”
Old reason: Concurrent programming was hard on SMPs
New reason: Multicores are SMPs on a chip (HW vendor decision)
More threads enable higher concurrency, shorter user latencies
But it’s hard: beware the 3rd C in the USL (β coefficient)
Theo Schlossnagle, OmniTI CEO
“Simply having a solution that scales horizontally doesn’t mean that you are safe.”

48. Summary and Review
| Class | Coefficients | Examples |
|---|---|---|
| A: Ideal concurrency | α, β = 0 | Shared-nothing platform; Google text search; LexisNexis search; read-only queries |
| B: Contention-only | α > 0, β = 0 | Message-based queueing (e.g., MQSeries); Message Passing Interface (MPI) applications; transaction monitors (e.g., Tuxedo); polling service (e.g., VMware); peer-to-peer (e.g., Skype) |
| C: Incoherent-only | α = 0, β > 0 | Scientific HPC computations; online analytic processing (OLAP); data mining; decision support software (DSS) |
| D: Worst case | α, β > 0 | Anything with shared writes; hotel reservation system; banking online transaction processing (OLTP); Java database connectivity (JDBC) |
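The four classes follow mechanically from which USL coefficients are non-zero. A minimal sketch; the function name and the coefficient pairs are hypothetical, one per class, and C(100) is printed to show how differently each class behaves at the same load.

```python
def usl_class(alpha, beta):
    """Map USL coefficients to classes A-D of the table above."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    if alpha == 0 and beta == 0:
        return "A: ideal concurrency"
    if beta == 0:
        return "B: contention-only"
    if alpha == 0:
        return "C: incoherent-only"
    return "D: worst case"


def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) from the Universal Scalability Law."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


# Hypothetical coefficient pairs, one per class
for a, b in [(0.0, 0.0), (0.05, 0.0), (0.0, 0.001), (0.05, 0.001)]:
    print(f"{usl_class(a, b):22s} C(100) = {usl_capacity(100, a, b):6.1f}")
```

Fitted coefficients are rarely exactly zero, so applying this to regression output would need a tolerance for “effectively zero”, which is a judgment call rather than part of the USL itself.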

49. Summary and Review
USL Scalability Zones
Think in terms of scalability zones rather than single scalability curves
[Figure: X(N) versus N (0 to 120), with zones A, B and C; WebSphere measurements shown as dots]
A: Asynchronous messaging (average queue lengths)
B: Synchronous messaging (worst queue lengths)
C: Synchronous messaging + pairwise exchanges

50. Resources and Coordinates

51. Resources and Coordinates
Resources and Coordinates
Castro Valley, California, 94552
Resources:
Books
Training
USL tools
Coordinates:
www.perfdynamics.com
perfdynamics.blogspot.com