Hidden Scalability Gotchas in Memcached and Friends

Presentation at Velocity - Web Performance and Operations Conference, The Hyatt Regency Santa Clara, CA, June 22-24, 2010

Dr. Neil Gunther

June 24, 2010

Transcript

  1. Hidden Scalability Gotchas in Memcached and Friends

    Velocity 6-24-2010. Neil Gunther, Performance Dynamics; Shanti Subramanyam, Oracle Corporation; Stefan Parvu, Oracle Finland
  2. Scalability

  3. Memcached scale out

    • Tier of older servers
    • Mostly single CPU
    • Single threading ok
  4. Scalability Strategies

    • Qualitative scalability
      – Scale up, e.g., big SMP servers
      – Scale out, e.g., many cheap servers (uniprocessors)
    • Quantitative scalability
      – What this talk is about
      – Need controlled measurements
      – Need numbers to see cost-benefit
  5. Been Bad for Web 2.0 Lately
  6. Capacity Planning

    • You know you need it
      – The planning bit, especially
      – Data ain’t information
      – Info is hidden in the data
    • Just like finance, you need a model
    Metrics + Models == Information
  7. Controlled Measurements

  8. Why Controlled Measurements?

    Trying to predict scalability by looking at time-series data is like trying to predict the stock market by watching the DJX ticker.
  9. Bad Throughput Measurements

    • The x-axis needs to be load (N), defined in terms of processes or users
    • Throughput needs to be measured in steady state (which this isn’t)
  10. Average Throughput in Time

    This is what steady state looks like as a function of time. It corresponds to ONE throughput load point (N).
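As a rough illustration of how one steady-state run collapses to a single load point, the sketch below averages throughput samples after discarding a warm-up interval. The function name, the warm-up length and the sample readings are all hypothetical; none of them come from the talk.

```python
from statistics import mean


def steady_state_point(n_load, samples, warmup):
    """Collapse one fixed-load run into a single (N, mean throughput) point.

    samples: throughput readings taken at regular intervals at load n_load.
    warmup:  number of leading samples to discard as ramp-up (an assumption;
             pick it by inspecting the run).
    """
    steady = samples[warmup:]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return n_load, mean(steady)


# Hypothetical readings (ops/s) for one run at N = 4.
run = [120, 480, 910, 1000, 1010, 990, 1000]
point = steady_state_point(4, run, warmup=3)
print(point)
```

Repeating this for each load level N gives the (N, throughput) pairs that a scalability model needs, rather than a raw time series.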
  11. Controlled MCD Tests

    • Load drivers: 2 Sun Fire X4170, 2 sockets, 64 GB
    • SUT (Memcached): Sun Fire X4170, 2 sockets, 64 GB
    • 10 GbE switch
  12. Memcached scaling is thread limited
  13. Better on SPARC Multicore
  14. Quantifying Scalability

    Universal Scalability Law (USL)
  15. 1. Equal Bang for The Buck

    Plot of capacity vs. load: ideal parallelism
  16. 2. Cost of Sharing Resources

    Plot of capacity vs. load
  17. 3. Resource Limitation

    Plot of capacity vs. load: Amdahl’s law
  18. 4. Degradation: Negative Return

    Plot of capacity vs. load
  19. Universal Scalability Law (USL)

    $$ C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)} $$

    • Concurrency: α = 0, β = 0
    • Contention: α > 0, β = 0
    • Coherency: α > 0, β > 0
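To make the three regimes concrete, here is a minimal sketch that evaluates the USL for each case. The function name and the non-zero α and β values are illustrative assumptions, not parameters from the talk.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


loads = [1, 2, 4, 8, 16, 32]
regimes = {
    "concurrency": (0.0, 0.0),    # ideal linear scaling
    "contention": (0.05, 0.0),    # illustrative alpha
    "coherency": (0.05, 0.002),   # illustrative alpha and beta
}
for name, (alpha, beta) in regimes.items():
    curve = [round(usl_capacity(n, alpha, beta), 2) for n in loads]
    print(f"{name:12s} {curve}")
```

With β = 0 the curve flattens toward the Amdahl ceiling of 1/α; with β > 0 it reaches a maximum and then declines, which is the negative-return behavior.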
  20. USL regression in Excel
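The transcript does not preserve the spreadsheet itself. One way to do such a fit outside Excel, sketched here under stated assumptions, is to linearize the USL: with x = N − 1 and y = N/C(N) − 1, the model becomes y = βx² + (α + β)x, a quadratic through the origin that ordinary least squares can fit. The function names and the synthetic data below are hypothetical.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the USL."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl(loads, throughputs):
    """Estimate (alpha, beta) from throughput measured at each load N.

    Relative capacity is C(N) = X(N) / X(1), so the first measurement
    must be taken at N = 1.
    """
    if loads[0] != 1:
        raise ValueError("first measurement must be at N = 1")
    x1 = throughputs[0]
    xs = [n - 1 for n in loads]
    ys = [n / (x / x1) - 1 for n, x in zip(loads, throughputs)]
    # Normal equations for least squares on y = a*x^2 + b*x (zero intercept).
    s4 = sum(x**4 for x in xs)
    s3 = sum(x**3 for x in xs)
    s2 = sum(x**2 for x in xs)
    sx2y = sum(x**2 * y for x, y in zip(xs, ys))
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return b - a, a  # alpha = b - a, beta = a


# Synthetic, noise-free data generated from known parameters (not measurements).
loads = [1, 2, 4, 8, 12, 16]
throughputs = [1000 * usl_capacity(n, 0.03, 0.002) for n in loads]
alpha, beta = fit_usl(loads, throughputs)
print(alpha, beta)
```

On real measurements the fit will not be exact, and a negative α or β is a sign that the data or the N = 1 baseline deserves a second look.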
  21. Memcached Scalability

    Quantitative USL Analysis
  22. Scalability of mcd 1.2.8

    Nmax = 7, α = 0.0255, β = 0.0210
  23. Scalability of mcd 1.4.1

    Nmax = 6, α = 0.0821, β = 0.0207
  24. Scalability of mcd 1.4.5

    Nmax = 6, α = 0.0988, β = 0.0209
  25. Scalability of SPARC version

    • α = 0.0041, β = 0.00197, Nmax = 22
    • α = 0, β = 0.000434
  26. USL projected scalability

    • α = 0.0041, β = 0.00197, Nmax = 22
    • α = 0, β = 0.000434, Nmax = 48
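The peak load follows from the USL by setting dC/dN = 0, which gives Nmax = sqrt((1 − α)/β). The sketch below applies that formula to the two SPARC parameter sets quoted on this slide; the function name is an assumption, and rounding down to a whole number is a choice made here for illustration.

```python
import math


def usl_nmax(alpha, beta):
    """Load at which USL capacity peaks: sqrt((1 - alpha) / beta)."""
    if beta <= 0:
        return math.inf  # no coherency penalty, so capacity never turns down
    return math.sqrt((1 - alpha) / beta)


# (alpha, beta) pairs as quoted on the slide.
sparc_fits = [(0.0041, 0.00197), (0.0, 0.000434)]
peaks = [math.floor(usl_nmax(a, b)) for a, b in sparc_fits]
print(peaks)
```

Rounded down, the two results are 22 and 48, the same Nmax values the slide reports.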
  27. Parameter interpretation

    • Why α ~ 0?
      – Cache further partitioned
      – Single lock replaced by multiple locks
    • Why β > 0?
      – Is it in mcd code?
      – Could it be in O/S, H/W, …?
  28. Scaling Among Friends

    Scalability as a function of virtual users (“friends”), not threads
  29. JAppServer USL Analysis

    • N = 700 users: α = 0.00001486, β = 6.7E-9
    • N = 1200 users: α = 0, β = 2.4E-7
  30. Scalability on Amazon EC2

    Nmax = 22, α = 0.038988298, β = 0.001432176
  31. Memcached Gotchas

  32. Just throw more hardware at it!
  33. Old scaling rules will be broken

    • Current scale-out strategy relies on using older cheap hardware
    • Older hardware is often single CPU
      – Single-threadedness of mcd is ok
    • Newer hardware will be multicore
      – New hardware is faster with lots of cores
      – But mcd won’t be able to utilize all cores
      – Multiple mcd instances are a management headache
  34. Single threading can wreck you
  35. Summary

    • Current mcd versions are thread limited
      – OK for older uniprocessor servers
      – Not OK for deployment on new multicores
      – Reason: unused processor capacity
    • Controlled measurements
      – Not time-series production data
      – Steady-state throughput
    • Quantify scalability
      – Data + Models == Information
      – Goal is to reduce contention (α) and coherency (β)
      – Nmax: increased from 6 to 48 threads
  36. Resources

    • Neil
      – perfdynamics.blogspot.com
      – twitter.com/DrQz
      – www.perfdynamics.com/books.html
      – www.perfdynamics.com/Manifesto/USLscalability.html
    • Shanti
      – perfwork.wordpress.com
      – twitter.com/shantiS
    • Stefan
      – www.systemdatarecorder.org
      – twitter.com/sperformance