Hidden Scalability Gotchas in Memcached and Friends

Presentation at Velocity - Web Performance and Operations Conference, The Hyatt Regency Santa Clara, CA, June 22-24, 2010

Dr. Neil Gunther

June 24, 2010

Transcript

  1. Velocity 6-24-2010
    Hidden Scalability Gotchas in Memcached and Friends
    Neil Gunther, Performance Dynamics
    Shanti Subramanyam, Oracle Corporation
    Stefan Parvu, Oracle Finland

  2. Velocity 6-24-2010
    Scalability

  3. Velocity 6-24-2010
    Memcached scale out
    • Tier of older servers
    • Mostly single CPU
    • Single threading OK

  4. Velocity 6-24-2010
    Scalability Strategies
    • Qualitative scalability
    – Scale up, e.g., big SMP servers
    – Scale out, e.g., many cheap uniprocessor servers (Unis)
    • Quantitative scalability
    – What this talk is about
    – Need controlled measurements
    – Need numbers to see cost-benefit

  5. Velocity 6-24-2010
    Been Bad for Web 2.0 Lately

  6. Velocity 6-24-2010
    Capacity Planning
    • You know you need it
    – The planning bit, especially
    – Data ain’t information
    – Info is hidden in the data
    • Just like finance, you need a model
    Metrics + Models == Information

  7. Velocity 6-24-2010
    Controlled Measurements

  8. Velocity 6-24-2010
    Why Controlled Measurements?
    Trying to predict scalability by looking at time series data is
    like trying to predict the stock market by watching the DJX ticker.

  9. Velocity 6-24-2010
    Bad Throughput Measurements
    Need x-axis to be load (N) defined in terms of processes or users
    Need throughput measured in steady state (which this isn’t)

  10. Velocity 6-24-2010
    Average Throughput in Time
    This is what steady state looks like as a function of time.
    It corresponds to ONE throughput load point (N).
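
The idea that one steady-state run collapses to a single (N, throughput) point can be sketched as follows. This is only an illustration: the function name, the warm-up cutoff, the 5% variation threshold, and the sample values are all assumptions and are not taken from the talk.

```python
from statistics import mean, pstdev


def steady_state_throughput(samples, warmup=0, max_cv=0.05):
    """Reduce one fixed-load run to a single average throughput value.

    samples: throughput readings taken at regular intervals during the run.
    warmup:  number of leading samples to discard (ramp-up phase).
    max_cv:  hypothetical threshold on the coefficient of variation; a run
             that fluctuates more than this is rejected as not steady.
    """
    window = samples[warmup:]
    if len(window) < 2:
        raise ValueError("need at least two post-warm-up samples")
    avg = mean(window)
    cv = pstdev(window) / avg  # relative spread of the retained samples
    if cv > max_cv:
        raise ValueError(f"run is not in steady state (CV = {cv:.3f})")
    return avg


# Synthetic readings for one run at a fixed load N (made-up numbers).
run = [120, 480, 910, 1000, 1010, 990, 1005, 995]
print(steady_state_throughput(run, warmup=3))
```

Repeating such a run at each load level N would give the throughput-versus-load points that a scalability model needs.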

  11. Velocity 6-24-2010
    Controlled MCD Tests
    Load drivers: 2 Sun Fire X4170, 2 sockets, 64 GB
    SUT: Memcached on Sun Fire X4170, 2 sockets, 64 GB
    Network: 10 GbE switch

  12. Velocity 6-24-2010
    Memcached scaling is thread limited

  13. Velocity 6-24-2010
    Better on SPARC Multicore

  14. Velocity 6-24-2010
    Quantifying Scalability
    Universal Scalability Law (USL)

  15. Velocity 6-24-2010
    1. Equal Bang for The Buck
    Plot: capacity vs. load (ideal parallelism)

  16. Velocity 6-24-2010
    2. Cost of Sharing Resources
    Plot: capacity vs. load

  17. Velocity 6-24-2010
    3. Resource Limitation
    Plot: capacity vs. load (Amdahl’s law)

  18. Velocity 6-24-2010
    4. Degradation Negative Return
    Plot: capacity vs. load

  19. Velocity 6-24-2010
    Universal Scalability Law (USL)
    $$C(N) = \frac{N}{1 + \alpha (N - 1) + \beta N (N - 1)}$$
    Concurrency: α = 0, β = 0
    Contention: α > 0, β = 0
    Coherency: α > 0, β > 0
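
A minimal sketch of the USL formula and its three parameter regimes follows. The function name and the parameter values are illustrative assumptions, chosen only to show the shape of each regime; they are not measurements from the talk.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) = N / (1 + alpha*(N-1) + beta*N*(N-1))."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


# Illustrative (made-up) parameters, one set per regime named on the slide.
regimes = [
    ("concurrency", 0.0, 0.0),    # alpha = 0, beta = 0: linear scaling
    ("contention", 0.05, 0.0),    # alpha > 0, beta = 0: levels off below 1/alpha
    ("coherency", 0.05, 0.001),   # alpha > 0, beta > 0: peaks, then declines
]

for label, alpha, beta in regimes:
    curve = [round(usl_capacity(n, alpha, beta), 2) for n in (1, 8, 32, 128)]
    print(f"{label:12s} {curve}")
```

With β = 0 the curve approaches the ceiling 1/α as N grows, whereas any β > 0 eventually makes capacity fall as load increases.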

  20. Velocity 6-24-2010
    USL regression in Excel
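
The spreadsheet itself is not preserved in this transcript. As a rough stand-in, here is one way such a regression could be done without Excel, assuming the common linearization in which x = N − 1 and y = N/C(N) − 1, so that y = βx² + (α + β)x is a quadratic through the origin. The function names and the synthetic data are assumptions made for illustration.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the USL."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def fit_usl(loads, throughputs):
    """Estimate (alpha, beta) by least squares on the transformed data.

    Needs a measurement at N = 1 to normalize throughput into C(N),
    plus at least two other distinct load points.
    """
    x1 = throughputs[loads.index(1)]           # throughput at N = 1
    xs = [n - 1 for n in loads]                # x = N - 1
    ys = [n / (x / x1) - 1 for n, x in zip(loads, throughputs)]  # y = N/C - 1

    # Normal equations for y = a*x^2 + b*x with zero intercept.
    s4 = sum(x ** 4 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s2 = sum(x ** 2 for x in xs)
    sx2y = sum(x * x * y for x, y in zip(xs, ys))
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = s4 * s2 - s3 * s3
    a = (sx2y * s2 - s3 * sxy) / det
    b = (s4 * sxy - s3 * sx2y) / det
    return b - a, a                            # alpha = b - a, beta = a


# Synthetic, noise-free data generated from known parameters (not real results).
loads = [1, 2, 4, 8, 16, 32]
data = [1000 * usl_capacity(n, 0.03, 0.0005) for n in loads]
alpha, beta = fit_usl(loads, data)
print(f"alpha = {alpha:.4f}, beta = {beta:.6f}")
```

On real measurements the fit would not be exact, and a small negative estimate for either coefficient would usually be a sign of noisy or non-steady-state data.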

  21. Velocity 6-24-2010
    Memcached Scalability
    Quantitative USL Analysis

  22. Velocity 6-24-2010
    Scalability of mcd 1.2.8
    Nmax = 7
    α = 0.0255, β = 0.0210
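
The transcript does not say how Nmax is obtained. Assuming it denotes the load at which the USL capacity curve peaks, the location of that peak follows from setting the derivative of C(N) to zero:

```latex
C(N) = \frac{N}{D(N)}, \qquad D(N) = 1 + \alpha (N - 1) + \beta N (N - 1)

\frac{dC}{dN} = \frac{D(N) - N\,D'(N)}{D(N)^2}, \qquad D'(N) = \alpha + \beta (2N - 1)

D(N) - N\,D'(N) = 1 - \alpha - \beta N^2 = 0
\quad\Longrightarrow\quad
N^{*} = \sqrt{\frac{1 - \alpha}{\beta}}
```

Under that assumption the peak moves to higher loads as either α or β shrinks, and it disappears altogether when β = 0.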

  23. Velocity 6-24-2010
    Scalability of mcd 1.4.1
    Nmax = 6
    α = 0.0821, β = 0.0207

  24. Velocity 6-24-2010
    Scalability of mcd 1.4.5
    Nmax = 6
    α = 0.0988, β = 0.0209

  25. Velocity 6-24-2010
    Scalability of SPARC version
    α = 0, β = 0.000434
    Nmax = 22: α = 0.0041, β = 0.00197

  26. Velocity 6-24-2010
    USL projected scalability
    Nmax = 22: α = 0.0041, β = 0.00197
    Nmax = 48: α = 0, β = 0.000434
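
A projection of this kind can be sketched by evaluating the USL curve over a range of thread counts and picking the load with the highest capacity. The α and β values are the ones quoted on this slide; the function names and the scan limit of 256 are assumptions.

```python
def usl_capacity(n, alpha, beta):
    """Relative capacity C(N) under the USL."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))


def peak_load(alpha, beta, n_limit=256):
    """Integer load in 1..n_limit with the highest projected capacity."""
    return max(range(1, n_limit + 1),
               key=lambda n: usl_capacity(n, alpha, beta))


# Parameter pairs as quoted on the slide.
for alpha, beta in [(0.0041, 0.00197), (0.0, 0.000434)]:
    n_peak = peak_load(alpha, beta)
    c_peak = usl_capacity(n_peak, alpha, beta)
    print(f"alpha={alpha}, beta={beta}: peak at N={n_peak}, C={c_peak:.1f}")
```

The scan is a brute-force choice that avoids relying on a closed-form expression and makes it easy to print the whole projected curve if needed.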

  27. Velocity 6-24-2010
    Parameter interpretation
    • Why α ~ 0?
    – Cache further partitioned
    – Single lock replaced by multiple locks
    • Why β > 0?
    – Is it in mcd code?
    – Could it be in O/S, H/W, …?

  28. Velocity 6-24-2010
    Scaling Among Friends
    Scalability as a function of virtual users (“friends”), not threads

  29. Velocity 6-24-2010
    JAppServer USL Analysis
    N = 700 users
    α = 0.00001486
    β = 6.7E-9
    N = 1200 users
    α = 0
    β = 2.4E-7

  30. Velocity 6-24-2010
    Scalability on Amazon EC2
    Nmax = 22
    α = 0.038988298
    β = 0.001432176

  31. Velocity 6-24-2010
    Memcached Gotchas

  32. Velocity 6-24-2010
    Just throw more hardware at it!

  33. Velocity 6-24-2010
    Old scaling rules will be broken
    • Current scale-out strategy relies on using older cheap hardware
    • Older hardware is often single CPU
    – Single-threadedness of mcd is OK
    • Newer hardware will be multicore
    – New hardware is faster with lots of cores
    – But mcd won’t be able to utilize all cores
    – Multiple mcd instances are a management headache

  34. Velocity 6-24-2010
    Single threading can wreck you

  35. Velocity 6-24-2010
    Summary
    • Current mcd versions are thread limited
    – OK for older uniprocessor servers
    – Not OK for deployment on new multicores
    – Reason: unused processor capacity
    • Controlled measurements
    – Not time-series production data
    – Steady state throughput
    • Quantify scalability
    – Data + Models == Information
    – Goal is to reduce contention (α) and coherency (β)
    – Nmax: increased from 6 to 48 threads

  36. Velocity 6-24-2010
    Resources
    • Neil
    – perfdynamics.blogspot.com
    – twitter.com/DrQz
    – www.perfdynamics.com/books.html
    – www.perfdynamics.com/Manifesto/USLscalability.html
    • Shanti
    – perfwork.wordpress.com
    – twitter.com/shantiS
    • Stefan
    – www.systemdatarecorder.org
    – twitter.com/sperformance