Little's Law in 3D and Storage Performance

Presentation at the Northern California CMG regional meeting.

Dr. Neil Gunther

August 07, 2012

Transcript

  1. Little’s Law in 3D and Storage Performance
    NorCal CMG Meeting
    Dr. Neil Gunther
    Performance Dynamics
    August 7, 2012
    SM
    c 2012 Performance Dynamics Little’s Law in 3D and Storage Performance August 7, 2012 1 / 34


  2. Background
    Outline
    1 Background
    Review Little’s Law
    The Utilization Law
    2 Throughput-Delay Plots
    Need for Speed
    Benchmarking Paradox
    Paradox Resolved
    3 Storage Performance
    Throughput
    Latency
    Concurrency
    4 Conclusion

  9. Background
    Little’s Law
    1 What is it?
    N = XR
    An immutable law of performance [1]
    2 Why is it important?
    L = λW proven 1961
    Algebraic simplification
    Cross-checking
    J.D.C. Little’s lore (in his own words): perfdynamics.blogspot.com/2011/07/
    [1] If your data don’t fit LL, change your data!

  10. Background
    A Little Historical Perspective
    LL is not based on queueing theory
    LL relates inventory and manufacturing cycle time
    John Little (now 84) is not a computer performance analyst
    Prof. Little did not invent his own law
    LL was known to A. K. Erlang more than 100 years ago
    There are actually two versions of Little’s law
    A Paradox
    1 LL expresses the fact that R decreases with increasing X
    2 Benchmarks show R increases with increasing throughput X

  11. Background
    Purpose of This Talk
    1 Review LL (both versions)
    2 Resolve the XR paradox by introducing 3D version of LL
    3 Apply LL to understand IOPS bottleneck

  12. Background Review Little’s Law
    Little’s Law at the System Level
    In steady state, the mean rate of arrival (λ) of customers into a system is equal to the mean
    output rate or throughput (X) of customers departing the system.
    λ = X (1)
    The total number of customers, requests, processes, threads (N) in the system is given by:
    N = λR = XR (2)
    where R is the mean total time spent in the system.
    Classic Little’s law
    N is the mean number of customers/requests in residence.
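Equation (2) also supports the cross-checking use mentioned earlier. The following is a minimal sketch, not part of the talk: the function names and the sample measurements are hypothetical, and it only compares a measured N against the value X × R predicted by LL.

```python
def mean_in_system(throughput: float, residence_time: float) -> float:
    """Little's law, equation (2): N = X * R."""
    return throughput * residence_time


def relative_discrepancy(n_measured: float, throughput: float, residence_time: float) -> float:
    """Fractional gap between a measured N and the N that Little's law predicts."""
    n_predicted = mean_in_system(throughput, residence_time)
    return abs(n_measured - n_predicted) / n_predicted


# Hypothetical steady-state measurements (illustrative only, not from the talk).
x = 200.0          # completions per second
r = 0.05           # mean seconds spent in the system
n_observed = 10.4  # e.g. an independently sampled number in the system

print(f"N predicted by LL: {mean_in_system(x, r):.2f}")
print(f"relative discrepancy: {relative_discrepancy(n_observed, x, r):.1%}")
```

A large discrepancy would suggest that the measurements were not taken in steady state or that one of the three quantities was collected incorrectly.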

  13. Background Review Little’s Law
    Little’s Law at the Device Level
    If the system is like a grocery store, the device level is like a checkout lane.
    At any device (labelled k = 1, 2, ...), equation (2) yields the local number of customers/requests
    (Qk) enqueued:
    Qk = λRk (3)
    where Rk is the time in residence at the device. Rk is defined as the sum of the service time
    (Sk) at the cashier and the time (Wk) spent waiting to get serviced by the cashier:
    Rk = Wk + Sk (4)
    The total number, N, in the global system (2) is the sum of all the customers/requests enqueued
    at each device:
    N = Q1 + Q2 + · · · + Qk (5)

  14. Background The Utilization Law
    Little’s Law and Device Utilization
    The utilization of the device comes from (3) by ignoring the waiting time contribution. Logically,
    this is equivalent to letting Wk → 0:
    Qk = λRk = λ(Wk + Sk) → λSk (6)
    We changed the right side of (6), so the left side must also change. But to what? It has to be
    a number (like N), whereas Qk can grow arbitrarily large while remaining finite: Qk < ∞.
    Call the “new” number ρk (to agree with queueing literature) so that (6) becomes:
    ρk = λSk (7)
    Since the cashier cannot service more than one customer at a time:
    ρk < 1 (8)
    or ρk < 100%, on average.
    Little’s utilization law
    The utilization ρk is the mean number of customers/requests in service at device k.
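As a small illustration of equations (7) and (8), the sketch below computes ρ from an arrival rate and a service time and tests the single-server bound. The function names and the checkout-lane numbers are hypothetical and chosen only for illustration.

```python
def utilization(arrival_rate: float, service_time: float) -> float:
    """Utilization law, equation (7): rho = lambda * S."""
    return arrival_rate * service_time


def within_bound(rho: float) -> bool:
    """Bound (8): a single server can only sustain rho < 1."""
    return rho < 1.0


# Hypothetical checkout lane: 50 customers per hour, 1 minute (1/60 hour) of service each.
rho_lane = utilization(50.0, 1.0 / 60.0)
print(f"rho = {rho_lane:.3f}, within bound: {within_bound(rho_lane)}")
```

Both arguments must use the same time unit (here, hours) so that ρ comes out dimensionless.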

  17. Throughput-Delay Plots Need for Speed
    Speed, Distance and Time
    Example
    Driving on the freeway at 60 mph. At that speed, you travel a mile a minute. How far will you
    travel in 15 minutes?
    Answer
    In a quarter of an hour you will travel one quarter the distance you would have covered in an
    hour. Therefore, in 15 minutes you will travel 15 miles.
    Congratulations! You just used LL without realizing it.
    Let X be the speed, R the elapsed time and N the miles covered:
    N = X R
    $15\ \text{miles} = 60\ \text{mph} \times \dfrac{15}{60}\ \text{hours}$

  20. Throughput-Delay Plots Need for Speed
    Speed and Delay are Inversely Related
    Example
    Now, suppose it’s an emergency and you need to cover the same distance in 10 minutes. How
    fast do you need to go?
    The answer may not be so obvious, but not to worry. We can still use LL.
    Answer
    N = X R
    $15\ \text{miles} = X \times \dfrac{10}{60}\ \text{hours}$
    Solving for X:
    X = 15 × 6 = 90 mph
    Theorem (Inverse Proportion of LL)
    To reduce the delay R (elapsed time), the speed X must be increased.
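The freeway arithmetic can be reproduced by solving N = XR for X. This is a minimal sketch; the helper name `required_speed` is hypothetical, and exact fractions are used only to avoid floating-point rounding in the unit conversion.

```python
from fractions import Fraction


def required_speed(distance_miles: int, minutes: int) -> Fraction:
    """Solve N = X * R for X, with R converted from minutes to hours."""
    hours = Fraction(minutes, 60)
    return Fraction(distance_miles) / hours


print(required_speed(15, 15))  # 15 miles in 15 minutes: 60 (mph)
print(required_speed(15, 10))  # 15 miles in 10 minutes: 90 (mph)
```

With the distance N held fixed, shrinking R forces X up in inverse proportion, which is the content of the theorem.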

  21. Throughput-Delay Plots Need for Speed
    XR Plots of LL
    [Figure: two panels of LL curves for N = 1, N = 15 and N = 50. One panel plots X against R, the other plots R against X; both axes run from 0 to 15.]
    Example was for the N = 15 miles curve
    Time for N = 15 miles is reduced by going from green to red dot
    Different distance means a different curve
    Curves are symmetric about the diagonal
    Can flip X and R axes w/o changing the curves
    Independent variable goes on x-axis

  22. Throughput-Delay Plots Benchmarking Paradox
    Benchmark XR Plots
    [Figure: published benchmark plots of response time versus throughput. Panel titles: NSPLab Dec 2007; Hitachi Jan 2012; Fusion-io SQLServer 2010. One panel is a SPEC SFS97 performance plot for the SC2000 and NS6000, with NFSops/second on the x-axis and response time (ms) on the y-axis.]

  23. Throughput-Delay Plots Benchmarking Paradox
    LL is 3-Dimensional
    [Figure: 3D surface of LL with axes X (QPS), R (s) and N.]
    Three variables (like PVT in chemistry)
    3D surface
    Like a cone but not rotationally symmetric about apex
    Square edges cause hyperbolic contours
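The remark about hyperbolic contours can be made explicit. As a sketch using only the symbols already defined, slicing the LL surface at a fixed value of N gives:

```latex
\[
  N = XR
  \quad\Longrightarrow\quad
  R = \frac{N}{X} \qquad (N \text{ held constant}),
\]
```

Each such level curve is a rectangular hyperbola in the X-R plane, and exchanging X and R leaves it unchanged.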

  24. Throughput-Delay Plots Benchmarking Paradox
    Fusion-io Benchmark
    [Figure: two panels. "Actual data (with and without FIO)" is a chart reproduced from the Fusion-io benchmark report; "Extracted data" plots R against X for the points taken from it.]
    SQL Server RDBMS: Measure X in QPS and R in s at each load (N)
    Two curves: before (red) and after (blue) application of FIO device
    Manually extracted pertinent data points

  25. Throughput-Delay Plots Paradox Resolved
    Back to the Paradox
    The XR Paradox
    1 LL says R decreases with increasing X (3D contour lines)
    2 Benchmarks show R increases with increasing throughput X
    [Figure: two plots of R against X. "Extracted data" shows the benchmark points alone; "Data moves on LL contours" shows the same points with LL contours superimposed.]
    The Resolution
    Superimpose LL 3D contours onto 2D benchmark data.

  26. Throughput-Delay Plots Paradox Resolved
    2D Projection of 3D Surface
    [Figure: 2D projection of the LL surface, R (s) against X (QPS).]
    Theorem (Gunther 2012)
    All benchmark data “moves” along LL contours.
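One way to see this is to compute, for each measured (X, R) pair, the contour level N = XR on which it sits. The sketch below uses made-up measurements, NOT the benchmark's data, and the function name `contour_level` is hypothetical.

```python
# Hypothetical (X in QPS, R in seconds) measurements at increasing load.
# These values are illustrative only and are not taken from any benchmark.
measurements = [(10.0, 0.5), (20.0, 0.5), (25.0, 0.8), (25.0, 1.2)]


def contour_level(x: float, r: float) -> float:
    """Little's law contour through the point (X, R): N = X * R."""
    return x * r


levels = [contour_level(x, r) for x, r in measurements]

for (x, r), n in zip(measurements, levels):
    print(f"X = {x:5.1f} QPS, R = {r:4.2f} s -> on contour N = {n:5.1f}")
```

In this made-up series R never decreases as X grows, yet no point contradicts LL: each successive measurement simply sits on a higher-N contour, because the load N was increased between measurements.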

  28. Storage Performance Throughput
    System Level Query Rate
    Example
    Suppose processing a query requires the execution of 100 K instructions on the CPU. The CPU can
    execute 10 GIPS.
    1 IPQ: 100 K = $100 \times 10^{3}$ instructions per application query
    2 IPS: 10 GIPS = $10 \times 10^{9}$ CPU instructions per second
    The throughput (or request rate) for queries is:
    $\lambda_{QPS} = \dfrac{IPS}{IPQ} = \dfrac{10 \times 10^{9}}{100 \times 10^{3}} = \dfrac{10^{10}}{10^{5}} = 100{,}000$
    The steady state assumption (1) tells us:
    $\lambda_{QPS} = 100\ \text{KQPS} = X_{QPS}$ (9)
    A maximum of 100 KQPS can be processed
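The same calculation in code, as a minimal sketch using the slide's figures; the constant names are chosen here for illustration, and integer arithmetic keeps the result exact.

```python
IPS = 10 * 10**9    # CPU instructions per second (10 GIPS)
IPQ = 100 * 10**3   # instructions per application query (100 K)

# Maximum query throughput the CPU can sustain: QPS = IPS / IPQ.
qps = IPS // IPQ
print(qps)  # 100000
```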

  29. Storage Performance Throughput
    Storage Device IO Rate
    Example (cont’d)
    Assume further that within the query instructions a single IO is issued. The CPU thread must
    wait before the rest of the query instructions can be completed.
    This creates a nice convenience since $\lambda_{IOPS} \equiv \lambda_{QPS}$.
    $\lambda_{IOPS} = \dfrac{QPS}{IOPQ} = \dfrac{10^{5}}{1} = 100{,}000$
    $\lambda_{IOPS} = 100\ \text{KIOPS} = X_{IOPS}$ (10)
    Device IOPS
    But this is aggregate IOPS. How many IOPS can a single disk do?

  30. Storage Performance Throughput
    Device IOPS Rating
    Example (Seagate IOPS)
    A Seagate Barracuda 7200 RPM disk is capable of about 100 IOPS. This follows from the combined
    seek time and RPS time being on the order of 10 ms. Hence:
    $\text{IOPS} = \dfrac{1}{0.010} = 100$ (11)
    Simple arithmetic suggests that 1000 Seagate Barracudas would be needed to accommodate the
    100 KIOPS aggregate throughput being considered here.
    Caveat emptor
    Note that (11) is a rearrangement of the LL utilization law (7):
    $\lambda_{IOPS} = \dfrac{\rho}{S_{disk}}$ (12)
    with ρ = 1. Hence, it is the theoretical maximum possible IOPS that this disk can support. In
    practice, the sustainable IOPS rate will be considerably lower.
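To show how sensitive the disk count is to the assumed utilization, the sketch below evaluates equation (12) at ρ = 1 and at a derated value. The function names are hypothetical, and ρ = 0.5 is an arbitrary illustrative choice, not a figure from the talk.

```python
import math


def iops_per_disk(service_time_s: float, rho: float = 1.0) -> float:
    """Equation (12): lambda_IOPS = rho / S_disk."""
    return rho / service_time_s


def disks_needed(target_iops: float, per_disk_iops: float) -> int:
    """Number of disks required to carry the aggregate IO rate."""
    return math.ceil(target_iops / per_disk_iops)


S_DISK = 0.010         # seconds, from the Seagate example
TARGET_IOPS = 100_000  # aggregate rate from equation (10)

theoretical = iops_per_disk(S_DISK)          # rho = 1, the theoretical maximum
derated = iops_per_disk(S_DISK, rho=0.5)     # rho = 0.5 is a hypothetical sustainable level

print(disks_needed(TARGET_IOPS, theoretical))
print(disks_needed(TARGET_IOPS, derated))
```

Halving the sustainable utilization doubles the number of spindles, which is why the ρ = 1 rating should be read as an upper bound.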

  31. Storage Performance Latency
    Storage Latency
    Example (cont’d)
    If the storage device is capable of responding to an IO request in 1 ms (10x faster than the Seagate
    Barracuda), the processor needs to issue 100 concurrent IO requests to the storage system so
    that it can complete 100 KQPS. If the storage device were 10 times faster (e.g., SSD), then the
    processor would only need to be handling a 10th as many IO requests, or just 10 concurrent
    requests.
    $S_{disk} = 10^{-3}$ s
    $S_{ssd} = 10^{-4}$ s
    Applying the LL utilization law (7):
    $\rho_{disk} = \lambda_{IOPS}\, S_{disk} = 10^{5} \times 10^{-3} = 100$ (13)
    This suggests we need more than 100 spindles. Similarly, for faster SSD devices:
    $\rho_{ssd} = \lambda_{IOPS}\, S_{ssd} = 10^{5} \times 10^{-4} = 10$ (14)
    Latency
    Latency is an ill-defined word that means different things to different technical people. We need the
    more exacting language of queueing theory to see where different latencies arise.

  32. Storage Performance Latency
    Tandem Queue Model
    Since computer systems are not deterministic, we represent CPU and storage as a queueing
    network with two stages:
    [Diagram: tandem queue, Src → (λ) → Scpu → Sdev → (λ) → Snk]
    Queries are sourced by the application at an aggregate request rate of λ = 100 KQPS and the
    CPU issues IO requests at the rate of 100 KIOPS.
    However, from (13) we know ρdisk = 100 or 10,000% !!
    Trouble
    This violates the utilization bound ρdisk < 1 given by (8).
    We already suspected from (13) that we would need more than 100 spindles.
    But how should the disks be arranged to give the correct latencies?

  33. Storage Performance Latency
    Parallel Disk Queues
    The message from LL (8) is that we need many (q) disks operating in parallel.
    [Diagram: Source → (λ) → Scpu → q parallel disk queues, each with arrival rate λ/q → (λ) → Sink]
    Parallel disks divide the total throughput (λ) into q substreams, each load-balanced with equal
    rate λ/q. Moreover, considering (13), we can write:
    $\rho_{disk} = \dfrac{100}{q} < 1$ (15)
    LL tells us we actually need more than q = 100 disks to satisfy the utilization bound.
    Disk Arrays
    This is why typical storage subsystems are configured as arrays.
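The smallest array size that satisfies the bound can be computed directly. The following is a sketch under the same load-balancing assumption as the slide (each device receives λ/q); the function names are hypothetical, and service times are passed in microseconds so that the arithmetic stays exact.

```python
import math
from fractions import Fraction


def offered_load(arrival_rate_per_s: int, service_time_us: int) -> Fraction:
    """Total load lambda * S, as in equations (13) and (14), computed exactly."""
    return Fraction(arrival_rate_per_s * service_time_us, 1_000_000)


def min_parallel_devices(load: Fraction) -> int:
    """Smallest q with load / q strictly less than 1, as in equation (15)."""
    return math.floor(load) + 1


LAMBDA_IOPS = 100_000

disk_load = offered_load(LAMBDA_IOPS, 1000)  # S_disk = 1 ms
ssd_load = offered_load(LAMBDA_IOPS, 100)    # S_ssd = 0.1 ms

print(min_parallel_devices(disk_load))  # 101
print(min_parallel_devices(ssd_load))   # 11
```

These are lower bounds only: at q = 101 each disk would run at about 99% utilization, so queueing delay would be very large, and practical arrays are sized well above the minimum.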

  34. Storage Performance Latency
    CPU Latency
    CPU service time (i.e., execution time) for a query:
    $S_{CPU} = \dfrac{IPQ}{IPS} = 10^{-5}$ seconds (16)
    i.e., 10 µs per query. The mean CPU utilization is:
    $\rho_{CPU} = \lambda_{QPS}\, S_{CPU} = 10^{5} \times 10^{-5} = 1$
    which is right on the edge of the utilization bound.
    [Diagram: queueing model with multiple CPU servers (Scpu) between Src and Snk]
    So, we need more than one core or execution unit.
    Duo-core
    LL tells us we need a duo-core, at least, to meet the utilization bound.
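The effect of adding cores can be tabulated from equation (16). This sketch assumes the query load is spread evenly across cores, which is an assumption made here and not stated on the slide; the constant names are also illustrative.

```python
from fractions import Fraction

IPQ = 100 * 10**3    # instructions per query
IPS = 10 * 10**9     # instructions per second
LAMBDA_QPS = 10**5   # queries per second

s_cpu = Fraction(IPQ, IPS)      # equation (16): seconds of CPU per query
rho_total = LAMBDA_QPS * s_cpu  # total CPU load, lambda * S

for cores in (1, 2, 4):
    per_core = rho_total / cores  # assumes perfectly even load balancing
    status = "ok" if per_core < 1 else "violates rho < 1"
    print(f"{cores} core(s): per-core utilization = {float(per_core):.2f} ({status})")
```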

  35. Storage Performance Concurrency
    Multicore with Infinite IOPS
    Example (cont’d)
    If the storage system is capable of responding to an IO request in 1 ms
    · · · · · ·
    If the storage were 10 times faster in responding with I/O requests...
    These numbers become Sdev in the following diagram.
    [Diagram: Src → (λ) → multicore CPU (Scpu) → q parallel storage devices (Sdev), each with arrival rate λ/q → (λ) → Snk]
    We use this queueing model to examine both latency and concurrency effects.
    “Infinite IOPS” is represented by 1000 parallel storage devices.

  36. Storage Performance Concurrency
    Queueing Model Results
    Example (cont’d)
    If the storage system is capable of responding to I/O requests in 1/1,000th of a second, then
    the CPU will need to issue N = 100 concurrent requests
    · · · · · ·
    If the storage were 10 times faster then the processor would only need to be handling 1/10th as
    many concurrent requests, or just N = 10 concurrent requests.
                    Latency (s)                      Concurrency
    Device (#)      Service          Residence       Qk         N
    CPU (2)         0.00001          0.0000133333    1.33333    1.33333
    Disk (1000)     0.001000         0.001111        0.1111     111.1
    SSD (1000)      0.000100         0.0001010       0.01010    10.10
    FIOa (1000)     1.000 × 10−6     1.000 × 10−6    0.0001     0.1000
    FIOb (1)        1.000 × 10−6     1.111 × 10−6    0.1111     0.1111
    The overall time in the system, per LL in eqn. (2), is the sum of the CPU residence time (1st row,
    3rd column) and the residence time of an IO at the respective storage device.
    With 1000 disks, N = 111.1 concurrent IOs.
    With 1000 SSDs, N = 10.1 concurrent IOs.
    With 1000 FIOs, N = 0.1 concurrent IOs. But wait! It gets even better...
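The table values can be approximated with textbook queueing formulas. The sketch below is an assumption about the model, not the author's stated method: each storage device is treated as an M/M/1 queue fed at rate λ/q with R = S/(1 − ρ), and the two-core CPU as an M/M/2 queue with R = S/(1 − ρ²). All function names are hypothetical.

```python
LAMBDA = 100_000.0  # queries (and IOs) per second, from the running example


def parallel_devices(service_time: float, count: int, arrival_rate: float = LAMBDA):
    """Assumed model: `count` independent M/M/1 queues, each fed at arrival_rate / count."""
    per_device_rate = arrival_rate / count
    rho = per_device_rate * service_time      # utilization law (7), per device
    residence = service_time / (1.0 - rho)    # M/M/1 residence time
    q_k = per_device_rate * residence         # Little's law (3), per device
    return residence, q_k, count * q_k        # N is the sum over devices, as in (5)


def dual_core_cpu(service_time: float, arrival_rate: float = LAMBDA):
    """Assumed model: one M/M/2 queue, whose residence time is S / (1 - rho^2)."""
    rho = arrival_rate * service_time / 2     # per-core utilization
    residence = service_time / (1.0 - rho**2)
    n = arrival_rate * residence              # Little's law (2)
    return residence, n, n


rows = {
    "CPU (2)": dual_core_cpu(1e-5),
    "Disk (1000)": parallel_devices(1e-3, 1000),
    "SSD (1000)": parallel_devices(1e-4, 1000),
    "FIOa (1000)": parallel_devices(1e-6, 1000),
    "FIOb (1)": parallel_devices(1e-6, 1),
}

for name, (residence, q_k, n) in rows.items():
    print(f"{name:12s} R = {residence:.4g} s   Qk = {q_k:.4g}   N = {n:.4g}")
```

Under these assumptions the computed residence times and concurrencies agree with the table to the precision shown there, for example N ≈ 111.1 for 1000 disks and N ≈ 10.1 for 1000 SSDs.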

  38. Conclusion
    Latency Trumps IOPS
    CPU: The residence time RCPU is 33% bigger than the query execution time, SCPU.
        In general, this time can be reduced further with more cores.
    Disk: All 1000 disks have S = 1 ms service time.
        Residence time is about 11% longer than the service time (1.111 ms vs. 1 ms).
        Concurrent IO threads Nio = 111.
        These threads also have to be managed by the OS (not shown).
        Thread management also uses up CPU cycles (not shown).
        Response time = 0.000013 + 0.001111 s is dominated by disk latency.
    SSD: Faster “SSD” (10x) with nominal S = 0.1 ms service time.
        Residence time is now close to service time.
        Concurrency is also reduced by 10x to N = 10 threads.
        Response time = 0.000013 + 0.0001010 s is still dominated by storage latency.
    FIOa: Fusion flash service time S = 1 microsecond.
        Residence time is equal to the device service time.
        Concurrent IO threads N = 0.1 are negligible.
        Response time = 0.000013 + 0.000001 s is now CPU-bound.
    FIOb: Bigger message: Don’t need 1000 Fusion flash devices.
        Small NFIOa = 0.1 means a single FIO device has the same IO concurrency.
        A single Fusion card can replace 1000 standard devices!
    SAN in your hand
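The shift from storage-bound to CPU-bound can be quantified as the share of response time spent at the storage device. This minimal sketch uses the residence times quoted in the Queueing Model Results table; the function names are hypothetical.

```python
# Residence times in seconds, copied from the Queueing Model Results table.
R_CPU = 0.0000133333
R_STORAGE = {"Disk": 0.001111, "SSD": 0.0001010, "FIOa": 1.000e-6}


def response_time(r_storage: float, r_cpu: float = R_CPU) -> float:
    """Total time in the system: CPU residence plus storage residence."""
    return r_cpu + r_storage


def storage_share(r_storage: float, r_cpu: float = R_CPU) -> float:
    """Fraction of the response time spent at the storage device."""
    return r_storage / response_time(r_storage, r_cpu)


for device, r_stor in R_STORAGE.items():
    total = response_time(r_stor)
    print(f"{device:5s} R = {total:.6f} s, storage share = {storage_share(r_stor):.0%}")
```

Storage accounts for nearly all of the response time with disks and most of it with SSDs, but only a small fraction with the Fusion-io device, which is the sense in which that case is CPU-bound.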

  39. Conclusion
    ioDrive2 Duo 2.4TB
    From the Fusion-io web site
    Read bandwidth 3.0 GB/s
    Write bandwidth 2.5 GB/s
    Sequential read 892,000 IOPS
    Sequential write 935,000 IOPS
    Random read 285,000 IOPS
    Random write 725,000 IOPS
    Read access latency 68 µs
    Write access latency 15 µs

  40. Conclusion
    Summary
    LL is really 3D (3 variables: N, X, R).
    LL has 2 versions: N = XR (with waiting) and ρ = XS (no waiting).
    Assume no bandwidth limit and choose throughput target (here, 100 KQPS).
    With current tech, LL tells us we need parallel devices (disk array, multicore).
    Storage “latency” (service times) is orders of magnitude longer than CPU execution times.
    The number of outstanding IOs determines the total (response) time in the system to
    complete each application query: R = W + S.
    Rstor ≫ Rcpu, so storage latency dominates system response time.
    If we can make Rstor ≪ Rcpu, then outstanding IOs become negligible.
    Application query times are then determined solely by the CPU execution time.
    A CPU-bound application is always the optimal goal.
    Fusion-io also eliminates IO controller latency: all data gets closer to CPU.

  41. Conclusion
    Table: Comparative storage device attributes
    Storage         Type          Relative Latency        Relative
    Technology      Persistent    Controller    Device    Cost
    Disk            Yes           High          High      Low
    SSD             Yes           High          Low       High
    Fusion-IO       Yes           Low           Low       High
    RAM             No            Low           Low       Highest

  42. Conclusion
    Guerrilla Training
    Wanna learn about more stuff like this? Come to class

  43. Conclusion
    Thank you for your participation
    Performance Dynamics Company
    Castro Valley, California
    www.perfdynamics.com
    perfdynamics.blogspot.com
    twitter.com/DrQz
    facebook.com/Performance-Dynamics -Company
    [email protected]
    +1-510-537-5758