Slide 1

Slide 1 text

Velocity 6-24-2010
Hidden Scalability Gotchas in Memcached and Friends
Neil Gunther, Performance Dynamics
Shanti Subramanyam, Oracle Corporation
Stefan Parvu, Oracle Finland

Slide 2

Slide 2 text

Scalability

Slide 3

Slide 3 text

Memcached scale out
• Tier of older servers
• Mostly single CPU
• Single threading ok

Slide 4

Slide 4 text

Scalability Strategies
• Qualitative scalability
  – Scale up, e.g., big SMP servers
  – Scale out, e.g., many cheap servers (Unis)
• Quantitative scalability
  – What this talk is about
  – Need controlled measurements
  – Need numbers to see cost-benefit

Slide 5

Slide 5 text

Been Bad for Web 2.0 Lately

Slide 6

Slide 6 text

Capacity Planning
• You know you need it
  – The planning bit, especially
  – Data ain’t information
  – Info is hidden in the data
• Just like finance, you need a model
Metrics + Models == Information

Slide 7

Slide 7 text

Controlled Measurements

Slide 8

Slide 8 text

Why Controlled Measurements?
Trying to predict scalability by looking at time series data is like trying to predict the stock market by watching the DJX ticker.

Slide 9

Slide 9 text

Bad Throughput Measurements
Need x-axis to be load (N) defined in terms of processes or users
Need throughput measured in steady state (which this isn’t)

Slide 10

Slide 10 text

Average Throughput in Time
This is what steady state looks like as a function of time. It corresponds to ONE throughput load point (N).
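As a rough illustration of turning one steady-state run into a single load point, the sketch below averages throughput samples after discarding a ramp-up window. The function name `steady_state_throughput`, the `warmup_fraction` parameter, and the sample values are all hypothetical; the slides do not say how the steady-state window was chosen.

```python
from statistics import mean


def steady_state_throughput(samples, warmup_fraction=0.2):
    """Average throughput for one load point N, ignoring the ramp-up phase.

    samples: throughput readings taken at a fixed load N, in time order.
    warmup_fraction: assumed share of the run treated as ramp-up (hypothetical).
    """
    if not samples:
        raise ValueError("need at least one throughput sample")
    if not 0 <= warmup_fraction < 1:
        raise ValueError("warmup_fraction must be in [0, 1)")
    start = int(len(samples) * warmup_fraction)
    return mean(samples[start:])


# Synthetic readings (ops/sec) for a single fixed load N; not measured data.
samples = [120, 480, 910, 1000, 1010, 990, 1005, 995, 1000, 1000]
x_n = steady_state_throughput(samples, warmup_fraction=0.3)
print(f"steady-state throughput for this load point: {x_n}")
```

Repeating this for each load level N gives the (N, throughput) pairs that a scalability model is fitted to.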

Slide 11

Slide 11 text

Controlled MCD Tests
Load Drivers: 2 Sun Fire X4170, 2 sockets, 64 GB
SUT (Memcached): Sun Fire X4170, 2 sockets, 64 GB
10 GbE Switch

Slide 12

Slide 12 text

Memcached scaling is thread limited

Slide 13

Slide 13 text

Better on SPARC Multicore

Slide 14

Slide 14 text

Quantifying Scalability
Universal Scalability Law (USL)

Slide 15

Slide 15 text

1. Equal Bang for The Buck
[Plot: Capacity vs. Load, ideal parallelism]

Slide 16

Slide 16 text

2. Cost of Sharing Resources
[Plot: Capacity vs. Load]

Slide 17

Slide 17 text

3. Resource Limitation
[Plot: Capacity vs. Load, Amdahl’s law]

Slide 18

Slide 18 text

4. Degradation, Negative Return
[Plot: Capacity vs. Load]

Slide 19

Slide 19 text

Universal Scalability Law (USL)
$$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
Concurrency: α = 0, β = 0
Contention: α > 0, β = 0
Coherency: α > 0, β > 0
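To see how the two parameters shape the curve, the USL can be evaluated directly. This is a minimal sketch: the function name `usl_capacity` and the α, β values in the loop are illustrative choices, not values from the talk.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the USL.

    alpha: contention parameter, beta: coherency parameter.
    """
    if n < 1:
        raise ValueError("load N must be at least 1")
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


# Illustrative parameter choices for the three regimes named on the slide.
regimes = [
    ("concurrency", 0.0, 0.0),   # linear scaling
    ("contention", 0.05, 0.0),   # diminishing returns
    ("coherency", 0.05, 0.02),   # capacity peaks, then falls
]

for label, alpha, beta in regimes:
    curve = [round(usl_capacity(n, alpha, beta), 2) for n in (1, 2, 4, 8, 16)]
    print(f"{label:12s} {curve}")
```

With β > 0 the N(N − 1) term eventually dominates the denominator, which is why the third curve turns over instead of merely flattening.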

Slide 20

Slide 20 text

USL regression in Excel
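The slide shows only a spreadsheet screenshot, so the following is a sketch of one common way to do such a fit, not necessarily the steps used in the talk. It relies on the identity N/C(N) − 1 = β(N − 1)² + (α + β)(N − 1), which follows from the USL formula, and fits a quadratic through the origin by least squares. The function name `fit_usl` and the synthetic data are assumptions made for illustration.

```python
def fit_usl(loads, throughputs):
    """Estimate USL (alpha, beta) from (N, X(N)) pairs; loads[0] must be N = 1.

    Fits y = a*x^2 + b*x with zero intercept, where x = N - 1 and
    y = N / C(N) - 1, then maps beta = a and alpha = b - a.
    """
    if loads[0] != 1:
        raise ValueError("first measurement must be at N = 1 to normalize")
    x1 = throughputs[0]
    xs = [n - 1 for n in loads]
    ys = [n / (x / x1) - 1 for n, x in zip(loads, throughputs)]

    # Normal equations for a two-parameter fit with no intercept.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    sy2 = sum(x * x * y for x, y in zip(xs, ys))
    sy1 = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("need at least two distinct loads above N = 1")
    a = (sy2 * s2 - s3 * sy1) / det
    b = (s4 * sy1 - s3 * sy2) / det
    return b - a, a  # (alpha, beta)


# Synthetic throughputs generated from known parameters; not measured data.
true_alpha, true_beta, x_at_1 = 0.03, 0.002, 1000.0
loads = [1, 2, 4, 6, 8, 12, 16]
throughputs = [
    x_at_1 * n / (1 + true_alpha * (n - 1) + true_beta * n * (n - 1))
    for n in loads
]

alpha_hat, beta_hat = fit_usl(loads, throughputs)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}")
```

On noise-free synthetic input the fit recovers the generating parameters; with real measurements the estimates depend on how well each load point reflects steady state.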

Slide 21

Slide 21 text

Memcached Scalability
Quantitative USL Analysis

Slide 22

Slide 22 text

Scalability of mcd 1.2.8
Nmax = 7
α = 0.0255, β = 0.0210

Slide 23

Slide 23 text

Scalability of mcd 1.4.1
Nmax = 6
α = 0.0821, β = 0.0207

Slide 24

Slide 24 text

Scalability of mcd 1.4.5
Nmax = 6
α = 0.0988, β = 0.0209

Slide 25

Slide 25 text

Scalability of SPARC version
α = 0, β = 0.000434
α = 0.0041, β = 0.00197, Nmax = 22

Slide 26

Slide 26 text

USL projected scalability
α = 0.0041, β = 0.00197, Nmax = 22
α = 0, β = 0.000434, Nmax = 48
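The projected peaks can be checked from the fitted parameters. Setting dC/dN = 0 in the USL gives a maximum at N* = sqrt((1 − α)/β); the slides do not spell this out, so treat the sketch below as an assumption about how Nmax was obtained. The function name `usl_peak_load` is hypothetical, and rounding down to a whole thread count is a choice made here.

```python
import math


def usl_peak_load(alpha, beta):
    """Load N* at which USL capacity peaks: sqrt((1 - alpha) / beta)."""
    if beta <= 0:
        raise ValueError("no finite peak unless beta > 0")
    if not 0 <= alpha < 1:
        raise ValueError("alpha must be in [0, 1)")
    return math.sqrt((1 - alpha) / beta)


# The two SPARC parameter pairs quoted on the slide.
for alpha, beta in [(0.0041, 0.00197), (0.0, 0.000434)]:
    n_star = usl_peak_load(alpha, beta)
    print(f"alpha={alpha}, beta={beta}: N* = {n_star:.2f}, "
          f"Nmax = {math.floor(n_star)}")
```

Under these assumptions the two parameter pairs give Nmax values of 22 and 48, which agree with the figures on the slide.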

Slide 27

Slide 27 text

Parameter interpretation
• Why α ~ 0
  – Cache further partitioned
  – Single lock replaced by multiple locks
• Why β > 0?
  – Is it in mcd code?
  – Could it be in O/S, H/W, …?

Slide 28

Slide 28 text

Scaling Among Friends
Scalability as a function of virtual users (“friends”), not threads

Slide 29

Slide 29 text

JAppServer USL Analysis
N = 700 users
α = 0.00001486, β = 6.7E-9
N = 1200 users
α = 0, β = 2.4E-7

Slide 30

Slide 30 text

Scalability on Amazon EC2
Nmax = 22
α = 0.038988298, β = 0.001432176

Slide 31

Slide 31 text

Memcached Gotchas

Slide 32

Slide 32 text

Just throw more hardware at it!

Slide 33

Slide 33 text

Old scaling rules will be broken
• Current scale-out strategy relies on using older cheap hardware
• Older hardware is often single CPU
  – Single-threadedness of mcd is ok
• Newer hardware will be multicore
  – New hardware is faster with lots of cores
  – But mcd won’t be able to utilize all cores
  – Multiple mcd instances are a mgmt headache

Slide 34

Slide 34 text

Single threading can wreck you

Slide 35

Slide 35 text

Summary
• Current mcd versions are thread limited
  – OK for older uniprocessor servers
  – Not OK for deployment on new multicores
  – Reason: unused processor capacity
• Controlled measurements
  – Not time-series prod data
  – Steady state throughput
• Quantify scalability
  – Data + Models == Information
  – Goal is to reduce contention (α) and coherency (β)
  – Nmax: increased from 6 to 48 threads

Slide 36

Slide 36 text

Resources
• Neil
  – perfdynamics.blogspot.com
  – twitter.com/DrQz
  – www.perfdynamics.com/books.html
  – www.perfdynamics.com/Manifesto/USLscalability.html
• Shanti
  – perfwork.wordpress.com
  – twitter.com/shantiS
• Stefan
  – www.systemdatarecorder.org
  – twitter.com/sperformance