Superlinear Speedup: The Perpetual Motion of Parallel Performance

First public presentation (2013) of superlinear performance analyzed using the universal scalability law (USL). Includes the Payback theorem at the end.

Superlinear Speedup: The Perpetual Motion of Parallel Performance
Dr. Neil Gunther, Performance Dynamics
Hotsos Symposium, March 5, 2013
© 2014 Performance Dynamics, Superlinear Speedup, October 15, 2014

Outline
- Quick review: 20 years of USL scalability analysis
- Appearance of "superlinear" data starting c. 2010:
  - Some users complain USL doesn't work for superlinearity!
  - But precious little correct data (e.g., none on Wikipedia)
  - Likely to see more superlinearity in distributed systems
  - Can't just ignore it or people will abandon USL
- Superlinear speedup described on Wikipedia (must be true)
- Add 3rd parameter to USL:
  - To fit superlinear data
  - Headache the size of an elephant
- April 2012: discovered stunningly simple result
  - No modification to USL equation (Huh?)
  - Ramifications for scalability analysis are quite profound
- Like perpetual motion, if it's too good to be true...

Review of USL
How to Quantify Scalability

Previous USL presentations at Hotsos:
- Hotsos 2007: "Guerrilla Scalability: How To Do Virtual Load Testing"
- Hotsos 2010: "How to Quantify Oracle Database Scalability: Fundamentals"
- Hotsos 2011: "Brooks, Cooks, and Response Time Scalability"

- Equal bang for the buck: linear concurrency
- Diminishing returns: contention overhead
- Negative return on investment: coherency overhead
- Calculate scalability curve from performance measurements

Review of USL

Also ended up in my books:
- Chapters 6 and 14
- Chapters 4–6

Also check out:
- Special USL web page
- Guerrilla perf and CaP classes

Review of USL
Universal Scalability Law (USL)

- N virtual users or processes provide load
- C(N) is the relative capacity, a function of N
- But what function?

$$C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

Three Cs:
1. Concurrency
2. Contention (0 < α < 1)
3. Coherency (0 < β < 1)

Review of USL
Concave shape of USL function

$$\frac{X_{\text{data}}(N)}{X_{\text{data}}(1)} \;\rightarrow\; C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

[Figure: plot of C(α, β) versus N]

- Handles scalability degradation (universal)
- Goal is to get rid of scalability maximum
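As a minimal sketch of the function on this slide, the Python below evaluates C(N) and locates the scalability maximum. The function names and the sample values α = 0.05, β = 0.02 are illustrative assumptions, not values from the talk. The peak location N* = sqrt((1 − α)/β) follows from setting dC/dN = 0 in the USL formula.

```python
import math


def usl(n, alpha, beta):
    """Relative capacity C(N) under the universal scalability law."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl_peak(alpha, beta):
    """Load N* where C(N) is maximal (needs beta > 0 and alpha < 1)."""
    return math.sqrt((1.0 - alpha) / beta)


# Illustrative parameter values (assumed, not taken from the slides).
ALPHA, BETA = 0.05, 0.02

if __name__ == "__main__":
    n_star = usl_peak(ALPHA, BETA)
    print(f"peak near N = {n_star:.2f}, C = {usl(n_star, ALPHA, BETA):.2f}")
    for n in (1, 2, 4, 8, 16):
        print(n, round(usl(n, ALPHA, BETA), 3))
```

Driving β toward zero pushes N* out to ever larger loads, which is one way to read the goal of getting rid of the scalability maximum.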

Review of USL
How do we determine α and β?

$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$

- Gene Amdahl (1967): brute force measurement for α
- Clever way: apply statistical regression
- I will use R:
  - FOSS package with 40 yr history (since S at Bell Labs)
  - Sophisticated/accurate statistical tools
  - Interpreted programming language (cf. Mathematica)
- Magic functions in R:
  - nls(): nonlinear LSQ fit (α, β in one swell foop)
  - optimize(): to estimate X(1) if missing
  - predict(): for smooth interpolation/extrapolation from data
  - plot(): with many variants
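The regression idea can be sketched without R. The Python below is not the nls() fit named on the slide. It swaps in a simpler linearized least-squares fit based on the rearrangement N/C − 1 = α(N − 1) + βN(N − 1), which is linear in α and β. The function names, the synthetic measurements, and the "true" values α = 0.03, β = 0.0005 are all assumptions made for illustration.

```python
def usl(n, alpha, beta):
    """Relative capacity C(N) under the universal scalability law."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl(loads, throughputs):
    """Estimate (alpha, beta) by linear least squares on
    N/C - 1 = alpha*(N - 1) + beta*N*(N - 1), with C = X(N)/X(1)."""
    x_one = throughputs[loads.index(1)]  # needs a measurement at N = 1
    s11 = s12 = s22 = s1y = s2y = 0.0
    for n, x in zip(loads, throughputs):
        c = x / x_one                      # normalized capacity
        u, v = n - 1, n * (n - 1)          # the two regressors
        y = n / c - 1.0                    # transformed response
        s11 += u * u
        s12 += u * v
        s22 += v * v
        s1y += u * y
        s2y += v * y
    det = s11 * s22 - s12 * s12            # 2x2 normal equations (Cramer)
    alpha = (s1y * s22 - s12 * s2y) / det
    beta = (s11 * s2y - s12 * s1y) / det
    return alpha, beta


# Synthetic, noise-free "measurements" (hypothetical, for illustration only).
LOADS = [1, 2, 4, 8, 16, 32]
THROUGHPUTS = [1000.0 * usl(n, 0.03, 0.0005) for n in LOADS]

if __name__ == "__main__":
    print(fit_usl(LOADS, THROUGHPUTS))
```

The transformation reweights the residuals, so on noisy data the estimates will generally differ somewhat from a nonlinear fit on the untransformed throughput.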

Application of USL: Memcache
- Joint work with S. Subramanyam (Sun, USA) and S. Parvu (Sun, FI)
- Presented at Velocity 2010 conference

Application of USL: Memcache
Memcache: Scaleout strategy
- Distributed cache of key-value pairs
- Pre-loaded from RDBMS
- Tier of cheap, older CPUs (e.g., not multicore)
- Single threading ok, until next hardware roll

Application of USL: Varnish
- Data by D. Popa (DigitAir, RO) via S. Parvu (Nokia, FI)

Application of USL: Varnish
Varnish: architecture
- HTTP accelerator
- Reverse web proxy caching system
- Sits in front of classic web server
- Caching handled by virtual memory
- Highly scalable (linear)

Superlinearity: Something for nothing
Recent examples

Perpetual motion
- Perpetual motion contraptions violate conservation of energy law.
- Super efficiency is tantamount to getting more than 100% of something.
- You know it's wrong but proving it is usually the harder part.
  a. Z-Torque bicycle crank
  b. Negative Kelvin temperatures
  c. Superluminal neutrinos

Performance super efficiency
- Superlinear scalability (hardware or software) exhibits measured throughput performance that exceeds 100% of available capacity.
- Needs explaining (or debugging).

Superlinearity: Something for nothing
a. Z-Torque bicycle crank

Conjecture (Jan 12, 2013): Inventor tries to raise $1000s in start-up capital through crowd funding a super-efficient bicycle crank. [Source: Slashdot]

Bug: Bad physics. Somebody doesn't understand vector moments.

Superlinearity: Something for nothing
b. Negative Kelvin temperatures

Conjecture (Jan 3, 2013): Ultracold potassium gas reaches T < 0 K. Impossible! Published in Nature.

[Figure panels: Normal ground state; Flipped ground state]

Bug: Maybe not. Depends how you define temperature. Shortly, we'll see negative time.

Superlinearity: Something for nothing
c. Superluminal neutrinos

Conjecture (Sept 23, 2011): Italian OPERA experiment measured CERN neutrinos with v_ν > c at 6σ confidence. Einstein wrong! Published at arXiv.org > hep-ex > arXiv:1109.4897

Bug (Dec 14, 2011): Screwed by a $0.50 fiber connector not being screwed tight.

Superlinearity: Something for nothing
Application superlinearity: this is what it looks like

[Figure: Raw data for PG92flX; x-axis: Clients (N), 0 to 80; y-axis: TPS X(N), 0 to 200000]

Superlinearity: Something for nothing
Another way to screw everything up

[Figure: Median Throughput Comparison; x-axis: Threads (ticks at 1, 4, 16, 64, 256, 1024); y-axis: Throughput, NOT/10sec (0 to 10000); series: Clustrix 3 Nodes, Clustrix 6 Nodes, Clustrix 9 Nodes, Intel SSD, HP/FusionIO]

See the problem? Don't use log-linear axes. (And certainly not base-2 logs.) Without warning the reader ... BIG TIME!

Superlinearity: Mathematica modeling
Generic form of superlinear scaling

[Figure: scaling curve on 0 to 20 axes, annotated "Gradient inflection" and "Gradient maximum"]

General form appears to be:
- Ideal linear slope: C(N)/N = 100%
- Data above linear slope: C(N)/N > 100%
- Point of inflection
- Otherwise convex upward: C(N) → ∞
- Maximum in gradient
- Degradation beyond max

Is it always like this?

Superlinearity: Mathematica modeling
Plausible 3-parameter USL model

$$C_N(\alpha, \beta, \gamma) = \frac{N}{\exp(-\gamma (N - 1)) + \alpha (N - 1) + \beta N (N - 1)} \tag{1}$$

[Figure: USL 3-Parameter Model, C(N) versus N; Neil J. Gunther, Tue 11 Oct 2011]

Properties of eqn. (1):
- $e^{-\gamma (N - 1)} \to 1$ as γ → 0
- γ = 0 is the same as USL

NLS fit parameters:
- α = 0.001
- β = 0.00425
- γ = 0.1
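The two stated properties of eqn. (1) can be checked numerically. The sketch below assumes nothing beyond the formula and the fit values quoted on the slide; the function names are illustrative.

```python
import math


def usl(n, alpha, beta):
    """Two-parameter USL."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


def usl3(n, alpha, beta, gamma):
    """Three-parameter variant, eqn. (1): the leading 1 in the USL
    denominator is replaced by exp(-gamma*(N - 1))."""
    return n / (math.exp(-gamma * (n - 1))
                + alpha * (n - 1) + beta * n * (n - 1))


# NLS fit parameters quoted on the slide.
ALPHA, BETA, GAMMA = 0.001, 0.00425, 0.1

if __name__ == "__main__":
    for n in (1, 5, 10, 15, 20, 25):
        c = usl3(n, ALPHA, BETA, GAMMA)
        print(n, round(c, 2), "efficiency", round(c / n, 3))
```

Setting gamma to 0 in usl3 reproduces usl for the same α and β, which is the second property listed above.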

Superlinearity: Mathematica modeling
Parameterized Elephant

"With four parameters I can fit an elephant. With five I can make his trunk wiggle." (John von Neumann)

[Figure panels: params = 1, params = 2, params = 3, params = 4]

See my animated blog post: A Winking Pink Elephant

Superlinearity: Mathematica modeling
Magic Moment !!!

$$C_N(\alpha, \beta) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} \tag{2}$$

[Figure: USL 2-Parameter Model, C(N) versus N; Neil J. Gunther, Thu 19 Apr 2012]

NLS fit parameters:
- α = −0.0859
- β = 0.0064
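To see what a negative α does in the unmodified USL, the sketch below plugs the fit values from this slide into eqn. (2) and lists the loads where the efficiency C(N)/N exceeds 1. Only the function and variable names are assumptions.

```python
def usl(n, alpha, beta):
    """Two-parameter USL, eqn. (2)."""
    return n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))


# NLS fit parameters quoted on the slide (note the negative alpha).
ALPHA, BETA = -0.0859, 0.0064


def efficiency(n):
    """C(N)/N: values above 1 mean superlinear scaling."""
    return usl(n, ALPHA, BETA) / n


# Integer loads in the plotted range where scaling is superlinear.
SUPERLINEAR = [n for n in range(2, 26) if efficiency(n) > 1.0]

if __name__ == "__main__":
    print("superlinear at N =", SUPERLINEAR)
    for n in (2, 7, 13, 14, 20):
        print(n, round(efficiency(n), 3))
```

With these values the efficiency stays above 1 for N = 2 through 13 and drops below 1 from N = 14 onward, so the same two-parameter equation produces both the superlinear rise and the later degradation.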

Superlinearity: Mathematica modeling
The Meaning of Negative α

A Little Story:
- I was supposed to give a talk but the meeting got cancelled
- That means my talk took zero elapsed time (Δt_talk = 0)
- But that's assuming I was already in the room
- It was cancelled before I made the trip to the meeting
- My talk took less than zero time or negative time (Δt_talk < 0)
- Think of the non-trip time as a time credit

Proposition (Faster than parallel): Negative α induces a negative execution time (i.e., a time credit) due to latent additional resources (e.g., more memory or cache) and that translates into performance that is faster than parallel.

Superlinearity: Mathematica modeling
The Meaning of Negative α in USL
Initial unit of computing capacity
[Figure: C(p) versus p]

Superlinearity: Mathematica modeling
Positive α
Some fraction of original capacity lost to overhead
[Figure: C(p) versus p, with the fraction α marked]

Superlinearity: Mathematica modeling
Negative α
Some fraction of original capacity is added (opposite sign)
[Figure: C(p) versus p, with the fraction α marked]

Superlinearity: Mathematica modeling
Positive α Capacity Scaling
Growing capacity loss as system is scaled out
[Figure: C(p) versus p for Node 1 through Node 6]

Superlinearity: Mathematica modeling
Negative α Capacity Scaling
Growing capacity increase as system is scaled out
[Figure: C(p) versus p for Node 1 through Node 6]

Superlinearity: Mathematica modeling
Negative α in the Data
This is how it would appear in scalability measurements
[Figure panels: Linear, Sublinear, Superlinear; each plots C(p) versus p on 0 to 6 axes]
Can generalize this concept to nonlinear scalability
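One way to reproduce the three shapes named in the panels is to switch off the coherency term (β = 0) and vary only the sign of α. This is a sketch under that assumption. The α values of 0, +0.1 and −0.1 are hypothetical, chosen only to make the curves visibly different over p = 1 to 6.

```python
def capacity(p, alpha):
    """USL with beta = 0: C(p) = p / (1 + alpha*(p - 1))."""
    return p / (1.0 + alpha * (p - 1))


# Hypothetical alpha values, one per panel label.
CASES = {"linear": 0.0, "sublinear": 0.1, "superlinear": -0.1}

if __name__ == "__main__":
    for label, alpha in CASES.items():
        curve = [round(capacity(p, alpha), 2) for p in range(1, 7)]
        print(f"{label:12s} {curve}")
```

A positive α removes a growing share of capacity as p increases, while a negative α adds one, which matches the loss and increase described on the preceding slides.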

Superlinearity: Postgres 9.2FL superlinearity
Superlinear scaling zones

With the negative sign of α written out explicitly:

$$C_N(\alpha, \beta) = \frac{N}{1 - |\alpha| (N - 1) + \beta N (N - 1)}$$

[Figure: C(N) versus N on 0 to 20 axes, with zones labeled "Superlinear" and "Payback"]

(a) Data in superlinear zone, where C(N)/N > 100%: like perpetual motion
(b) Data in payback zone, where C(N)/N < 100%: paying the piper, sudden degradation
(c) Is it always like this?

Superlinearity: Postgres 9.2FL superlinearity
Visual proof: Superlinear asymptote

[Figure: C(N) versus N]

Proof.
- Linear bound: C(N)/N = 1 (dashed line)
- Super efficient region: C_sl(N)/N > 1
- Superlinear segment curved upward by α < 0 (convex function)
- Asymptote at N = N_α (vertical line) where C_sl(N, −α) → ∞
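The location of the vertical asymptote can be worked out explicitly, assuming the superlinear segment is the USL with β = 0 and α < 0, which is what the single-parameter notation C_sl(N, −α) suggests. The closed form for N_α below is a derivation under that assumption, not a formula quoted from the slides.

```latex
C_{sl}(N) = \frac{N}{1 - |\alpha| (N - 1)}
\quad\Longrightarrow\quad
C_{sl}(N) \to \infty \ \text{ as } \ 1 - |\alpha| (N - 1) \to 0^{+},
\qquad
N_\alpha = 1 + \frac{1}{|\alpha|}
```

For example, |α| = 0.0859 would put the asymptote near N_α ≈ 12.6.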

Superlinearity: Postgres 9.2FL superlinearity
Visual proof: Upper bound and Saturation

[Figure: C(N) versus N]

Proof.
- A physical capacity bound must exist (dashed horizontal line)
- C_sl(N) scaling curve will saturate below that bound (2nd red segment)
- That saturation segment must cross the linear bound at N_x
- Therefore, there must be an inflection point in C_sl(N) at N± < N_x

Superlinearity: Postgres 9.2FL superlinearity
Visual proof: Inflection, Crossing and Degradation

[Figure: C(N) versus N]

Proof.
- Inflection point N± joins superlinear and saturation segments
- C_sl(N) crosses linear bound at N_x = |α/β|
- Since α < 0, crossing can only arise from coherency term with β > 0
- Hence, superlinearity always induces coherency roll off (payback)
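The crossing point follows directly from the two-parameter USL. Setting the efficiency C(N)/N equal to 1 means the overhead terms in the denominator must cancel. This is a short derivation sketch using only the USL formula.

```latex
\frac{C(N)}{N} = \frac{1}{1 + \alpha (N - 1) + \beta N (N - 1)} = 1
\;\Longleftrightarrow\;
(N - 1)(\alpha + \beta N) = 0
\;\Longleftrightarrow\;
N = 1 \ \text{ or } \ N = -\frac{\alpha}{\beta}
```

With α < 0 and β > 0 the nontrivial root is N_x = |α/β|. For example, α = −0.0859 and β = 0.0064 give N_x ≈ 13.4. With β = 0 there is no such root for N > 1, which is why the crossing can only come from the coherency term.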

Summary

- USL is a 2-parameter scalability model C(N, α, β)
- Requires α, β > 0 for C(N) to be a concave function
- Superlinear measurements C(N)/N > 1 do exist
- Extra fitting parameter C(N, α, β, γ) ⇒ JvN elephants
- Discovered superlinear USL with α < 0
- Super-efficiencies are not free
- Like perpetual motion:
  - no free lunch
  - pay the piper eventually
  - debugging it is the hard part
- Thm: Superlinearity is always followed by capacity degradation
- More (Oracle ???) superlinear measurements would be good

Thank you for attending!

Castro Valley, California
www.perfdynamics.com
perfdynamics.blogspot.com
Twitter/DrQz
Facebook
+1-510-537-5758