Superlinear Speedup: The Perpetual Motion of Parallel Performance

First public presentation (2013) of superlinear performance analyzed using the universal scalability law (USL). Includes the Payback theorem at the end.

of “superlinear” data starting c. 2010:
- Some users complain the USL doesn’t work for superlinearity!
- But precious little correct data (e.g., none on Wikipedia)
- Likely to see more superlinearity in distributed systems
- Can’t just ignore it, or people will abandon the USL
- Superlinear speedup is described on Wikipedia (so it must be true)
- Add a 3rd parameter to the USL to fit superlinear data
- Headache the size of an elephant
- April 2012: discovered a stunningly simple result
- No modification to the USL equation (Huh?)
- Ramifications for scalability analysis are quite profound
- Like perpetual motion: if it’s too good to be true...
© 2014 Performance Dynamics. Superlinear Speedup, October 15, 2014.

at Hotsos:
- Hotsos 2007: “Guerrilla Scalability: How To Do Virtual Load Testing”
- Hotsos 2010: “How to Quantify Oracle Database Scalability: Fundamentals”
- Hotsos 2011: “Brooks, Cooks, and Response Time Scalability”

- Equal bang for the buck: linear concurrency
- Diminishing returns: contention overhead
- Negative return on investment: coherency overhead
- Calculate the scalability curve from performance measurements

6 and 14; Chapters 4–6
Also check out:
- The special USL web page
- Guerrilla performance and CaP classes

or processes provide the load; C(N) is the relative capacity as a function of N. But what function?

$$
C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}
$$

Three Cs:
1. Concurrency
2. Contention (0 < α < 1)
3. Coherency (β > 0)
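A minimal Python sketch of the capacity function above, evaluated for three hypothetical parameter choices so that each C is switched on in turn. The function name `usl` and the values in `regimes` are illustrative assumptions, not fitted to any data.

```python
def usl(n: float, alpha: float, beta: float) -> float:
    """Relative capacity C(N) = N / (1 + alpha*(N - 1) + beta*N*(N - 1))."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# Hypothetical parameter choices, one per "C" (not fitted to any data).
regimes = {
    "concurrency (alpha = beta = 0)": (0.0, 0.0),
    "+ contention (alpha > 0)": (0.05, 0.0),
    "+ coherency (alpha, beta > 0)": (0.05, 0.002),
}

for label, (a, b) in regimes.items():
    row = "  ".join(f"{usl(n, a, b):6.2f}" for n in (1, 8, 16, 32, 64))
    print(f"{label:32s} {row}")
```

With β = 0 the curve flattens toward the ceiling 1/α (here 20); adding β > 0 makes it turn over and decline at large N.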

$$
C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}
$$

- Gene Amdahl (1967): brute-force measurement for α
- Clever way: apply statistical regression
- I will use R: a FOSS package with a 40-year history (going back to S at Bell Labs)
  - Sophisticated, accurate statistical tools
  - Interpreted programming language (cf. Mathematica)
- Magic functions in R:
  - nls(): nonlinear least-squares fit (α, β in one swell foop)
  - optimize(): to estimate X(1) if it is missing
  - predict(): smooth interpolation/extrapolation from data
  - plot(): with many variants
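The slide uses R's `nls()` for a direct nonlinear fit. As a dependency-free stand-in (a swapped-in technique, not the `nls()` method), the Python sketch below uses the quadratic transform of the USL: with x = N − 1 and y = N/C(N) − 1, the equation becomes y = βx² + (α + β)x, which ordinary least squares through the origin can solve. `fit_usl` is a hypothetical helper, and the "measurements" are synthetic values generated from the model itself.

```python
def usl(n, alpha, beta):
    """USL relative capacity C(N)."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl(ns, caps):
    """Estimate (alpha, beta) by least squares on the quadratic transform
    y = beta*x**2 + (alpha + beta)*x, where x = N - 1 and y = N/C(N) - 1."""
    xs = [n - 1 for n in ns]
    ys = [n / c - 1 for n, c in zip(ns, caps)]
    # Normal equations for y ~ a*x**2 + b*x with no intercept term.
    s4 = sum(x**4 for x in xs)
    s3 = sum(x**3 for x in xs)
    s2 = sum(x**2 for x in xs)
    t2 = sum(x**2 * y for x, y in zip(xs, ys))
    t1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3  # nonzero if at least two distinct N > 1
    a = (t2 * s2 - s3 * t1) / det
    b = (s4 * t1 - s3 * t2) / det
    return b - a, a  # alpha = b - beta, beta = a


# Synthetic, noise-free "measurements" generated from the model itself.
ns = [1, 2, 4, 8, 16, 32]
caps = [usl(n, 0.05, 0.002) for n in ns]
alpha_hat, beta_hat = fit_usl(ns, caps)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.5f}")
```

On noisy measurements the transform weights errors differently from a direct nonlinear fit such as `nls()`, so the two estimates need not agree exactly.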

key-value pairs
- Pre-loaded from an RDBMS
- Tier of cheap, older CPUs (e.g., not multicore)
- Single threading OK, until the next hardware roll

proxy caching system
- Sits in front of a classic web server
- Caching handled by virtual memory
- Highly scalable (linear)

contraptions violate the law of conservation of energy. Super-efficiency is tantamount to getting more than 100% of something. You know it’s wrong, but proving it is usually the harder part.
a. Z-Torque bicycle crank
b. Negative Kelvin temperatures
c. Superluminal neutrinos
Performance super-efficiency: superlinear scalability (hardware or software) exhibits measured throughput performance that exceeds 100% of available capacity. It needs explaining (or debugging).

12, 2013) An inventor tries to raise $1000s in start-up capital by crowdfunding a super-efficient bicycle crank. [Source: Slashdot]
Bug: bad physics. Somebody doesn’t understand vector moments.

3, 2013) Ultracold potassium gas reaches T < 0 K. Impossible! Published in Nature.
[Figure: normal ground state vs. flipped ground state]
Bug: maybe not. It depends on how you define temperature. Shortly, we’ll see negative time.

2011) The OPERA experiment in Italy measured neutrinos sent from CERN traveling faster than light (v_ν > c) with 6σ confidence. Einstein wrong! Published on arXiv.org > hep-ex > arXiv:1109.4897
Bug (Dec 14, 2011): screwed by a $0.50 fiber connector not being screwed tight.

[Figure: superlinear scalability data, C(N) vs. N (0–20), with the gradient inflection and gradient maximum marked]
General form appears to be:
- Ideal linear slope: C(N)/N = 100%
- Data above linear slope: C(N)/N > 100%
- Point of inflection
- Otherwise convex upward: C(N) → ∞
- Maximum in gradient
- Degradation beyond the maximum
Is it always like this?
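One way to locate these features on a curve, sketched in Python: compute C(N)/N and a central-difference gradient on an evenly spaced grid, then take the N with the largest gradient as a rough estimate of the inflection point. The curve is generated from the USL with the α = −0.0859, β = 0.0064 values quoted for eqn. (2); it is model output, not measured data, and `gradient_profile` is a hypothetical helper.

```python
def usl(n, alpha, beta):
    """USL relative capacity C(N)."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def gradient_profile(ns, caps):
    """Central-difference estimate of dC/dN at each interior grid point."""
    return {
        ns[i]: (caps[i + 1] - caps[i - 1]) / (ns[i + 1] - ns[i - 1])
        for i in range(1, len(ns) - 1)
    }


# Model-generated curve (not measurements), using the eqn. (2) fit values.
ns = list(range(1, 21))
caps = [usl(n, -0.0859, 0.0064) for n in ns]

superlinear = [n for n, c in zip(ns, caps) if c / n > 1.0]  # C(N)/N > 100%
grads = gradient_profile(ns, caps)
n_grad_max = max(grads, key=grads.get)  # rough location of the inflection

print("C(N)/N > 1 for N in", superlinear)
print("largest gradient near N =", n_grad_max)
```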

$$
C(N; \alpha, \beta, \gamma) = \frac{N}{e^{-\gamma (N - 1)} + \alpha (N - 1) + \beta N (N - 1)} \qquad (1)
$$

[Figure: CHNL data with USL 3-parameter model fit, C(N) vs. N (0–25); Neil J. Gunther, Tue 11 Oct 2011]
Properties of eqn. (1):
- e^{−γ(N−1)} → 1 as γ → 0
- γ = 0 is the same as the USL
- NLS fit parameters: α = 0.001, β = 0.00425, γ = 0.1
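A Python sketch of the three-parameter model in eqn. (1), checking the stated property that γ = 0 recovers the USL and evaluating the model at the NLS fit values listed above. The function names `usl` and `usl3` are illustrative.

```python
import math


def usl(n, alpha, beta):
    """Two-parameter USL, C(N)."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl3(n, alpha, beta, gamma):
    """Eqn. (1): the constant 1 in the USL denominator becomes exp(-gamma*(N - 1))."""
    return n / (math.exp(-gamma * (n - 1)) + alpha * (n - 1) + beta * n * (n - 1))


# gamma = 0 should reproduce the USL at every N.
same = all(math.isclose(usl3(n, 0.001, 0.00425, 0.0), usl(n, 0.001, 0.00425))
           for n in range(1, 26))
print("gamma = 0 matches USL:", same)

# Evaluate with the NLS fit values listed for eqn. (1).
for n in (1, 5, 10, 15, 20, 25):
    print(n, round(usl3(n, 0.001, 0.00425, 0.1), 2))
```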

fit an elephant. With five I can make his trunk wiggle.” (John von Neumann)
[Figure: elephant outlines fitted with params = 1, 2, 3, 4]
See my animated blog post: A Winking Pink Elephant

$$
C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} \qquad (2)
$$

[Figure: CHNL data with USL 2-parameter model fit, C(N) vs. N (0–25); Neil J. Gunther, Thu 19 Apr 2012]
NLS fit parameters: α = −0.0859, β = 0.0064
Properties of eqn. (2):
- It’s our fave USL (Hello!)
- But α < 0 is allowed: a capacity credit
- Still have β > 0
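Two locations follow from eqn. (2) by algebra: C(N)/N = 1 when α(N − 1) + βN(N − 1) = 0, giving N = −α/β, and dC/dN = 0 when N² = (1 − α)/β. The Python sketch evaluates both for the fitted α and β above; the constant and function names are illustrative.

```python
import math

ALPHA, BETA = -0.0859, 0.0064  # NLS fit values quoted for eqn. (2)


def usl(n, alpha=ALPHA, beta=BETA):
    """USL relative capacity C(N); alpha < 0 gives the superlinear form."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# C(N)/N = 1 where (N - 1)(alpha + beta*N) = 0, i.e. N = 1 or N = -alpha/beta.
n_cross = -ALPHA / BETA

# dC/dN = 0 where N**2 = (1 - alpha)/beta.
n_peak = math.sqrt((1 - ALPHA) / BETA)

print(f"N_x = {n_cross:.2f}, peak at N = {n_peak:.2f}")
print(f"C(N_x)/N_x = {usl(n_cross) / n_cross:.6f}")
```

With these values the crossover is about 13.4 and the peak about 13.0, so the curve is already past its maximum when it drops back through the linear bound.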

Story:
- I was supposed to give a talk, but the meeting got cancelled.
- That means my talk took zero elapsed time (Δt_talk = 0).
- But that assumes I was already in the room.
- It was cancelled before I made the trip to the meeting.
- So my talk took less than zero time, i.e., negative time (Δt_talk < 0).
- Think of the non-trip time as a time credit.

Proposition (Faster than parallel): Negative α induces a negative execution time (i.e., a time credit) due to latent additional resources (e.g., more memory or cache), and that translates into performance that is faster than parallel.
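A short worked equation behind the proposition, treating the USL denominator as a normalized execution time per unit of work (an interpretive assumption made for this illustration). Dividing the USL by N gives the efficiency, and with β = 0 and α < 0 the α term shrinks the denominator below 1:

$$
\frac{C(N)}{N} = \frac{1}{1 + \alpha (N - 1) + \beta N (N - 1)}
$$

$$
\beta = 0,\ \alpha < 0:\quad 1 + \alpha (N - 1) = 1 - |\alpha| (N - 1) < 1 \;\Longrightarrow\; \frac{C(N)}{N} > 1 \quad \text{for } 1 < N < 1 + \frac{1}{|\alpha|}
$$

When α > 0 the α(N − 1) term adds time; when α < 0 it subtracts time, which is the time credit that lets C(N) exceed N.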

as system is scaled out
[Figure: Nodes 1–6 added as the system is scaled out; capacity C_p vs. p = 0–6]

as system is scaled out
[Figure: Nodes 1–6; capacity C_p vs. p, with marks at 0.5 and 1.0]

how it would appear in scalability measurements
[Figure: three panels of C_p vs. p (0–6): Linear, Sublinear, Superlinear]
Can generalize this concept to nonlinear scalability

$$
C_{sl}(N) = \frac{N}{1 - \alpha (N - 1) + \beta N (N - 1)}
$$

[Figure: Superlinear and Payback zones of C(N) vs. N (0–20)]
(a) Data in the superlinear zone, where C(N)/N > 100%, like perpetual motion
(b) Data in the payback zone (paying the piper): sudden degradation, where C(N)/N < 100%
(c) Is it always like this?
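A Python sketch that labels each N by zone using the efficiency C(N)/N. It passes α with its sign, as in eqn. (2), so the 1 − α(N − 1) form corresponds to a negative `alpha` argument here. Calling below-linear points "payback" only after a superlinear point has been seen, and "sublinear" otherwise, is a choice made for this sketch; `efficiency` and `zones` are hypothetical helpers.

```python
def efficiency(n, alpha, beta):
    """C(N)/N for the USL; alpha carries its sign (alpha < 0 is superlinear)."""
    return 1.0 / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def zones(alpha, beta, n_max):
    """Label N = 2..n_max as 'superlinear', 'payback', 'sublinear' or 'linear'."""
    labels = {}
    seen_superlinear = False
    for n in range(2, n_max + 1):
        e = efficiency(n, alpha, beta)
        if e > 1.0:
            labels[n] = "superlinear"
            seen_superlinear = True
        elif e < 1.0:
            # Below the linear bound: payback if it follows superlinear data.
            labels[n] = "payback" if seen_superlinear else "sublinear"
        else:
            labels[n] = "linear"
    return labels


z = zones(-0.0859, 0.0064, 20)
print({label: [n for n in z if z[n] == label] for label in ("superlinear", "payback")})
```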

[Figure: C(N) vs. N]
Proof.
- Linear bound: C(N)/N = 1 (dashed line)
- Super-efficient region: C_sl(N)/N > 1
- Superlinear segment curved upward by α < 0 (convex function)
- Asymptote at N = N_α (vertical line), where C_sl(N, −α) → ∞
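For the superlinear segment on its own (β = 0, with α < 0 written as −|α|), the location of the asymptote follows from setting the denominator to zero:

$$
C_{sl}(N) = \frac{N}{1 - |\alpha| (N - 1)}, \qquad 1 - |\alpha| (N_\alpha - 1) = 0 \;\Longrightarrow\; N_\alpha = 1 + \frac{1}{|\alpha|}
$$

For example, |α| = 0.0859 would put N_α at about 12.6.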

[Figure: C(N) vs. N]
Proof.
- A physical capacity bound must exist (dashed horizontal line)
- The C_sl(N) scaling curve will saturate below that bound (2nd red segment)
- That saturation segment must cross the linear bound at N_x
- Therefore, there must be an inflection point in C_sl(N) at N± < N_x
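For eqn. (2), the sign of the second derivative can be written out explicitly, which gives one way to locate the inflection point. Writing D(N) = βN² + (α − β)N + (1 − α) for the denominator (assumed positive over the range of interest), differentiating C(N) = N/D(N) twice gives:

$$
\frac{d^2 C}{dN^2} = \frac{2\left[\beta^2 N^3 - 3\beta(1 - \alpha) N - (\alpha - \beta)(1 - \alpha)\right]}{D(N)^3}
$$

With α = −0.0859 and β = 0.0064, the bracket first changes sign near N ≈ 5.06, below N_x = |α/β| ≈ 13.4, consistent with N± < N_x.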

[Figure: C(N) vs. N]
Proof.
- The inflection point N± joins the superlinear and saturation segments
- C_sl(N) crosses the linear bound at N_x = |α/β|
- Since α < 0, the crossing can only arise from the coherency term with β > 0
- Hence, superlinearity always induces coherency roll-off (payback)
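The crossing point follows from eqn. (2) by setting the efficiency to one:

$$
\frac{C(N)}{N} = 1 \iff \alpha (N - 1) + \beta N (N - 1) = 0 \iff (N - 1)(\alpha + \beta N) = 0 \iff N = 1 \ \text{or}\ N_x = -\frac{\alpha}{\beta}
$$

With α < 0 and β > 0, N_x = −α/β = |α/β| > 0; with β = 0 the only solution is N = 1, so no crossing beyond N = 1 is possible without the coherency term.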

- Requires α, β > 0 for C(N) to be a concave function
- Superlinear measurements, C(N)/N > 1, do exist
- Extra fitting parameter, C(N; α, β, γ) ⇒ JvN elephants
- Discovered the superlinear USL with α < 0
- Super-efficiencies are not free
- Like perpetual motion: no free lunch
  - you pay the piper eventually
  - debugging it is the hard part
- Thm: superlinearity is always followed by capacity degradation
- More (Oracle?) superlinear measurements would be good