Neha Narula on The Scalable Commutativity Rule

Moore's law is over, or at least we can no longer make programs faster by running them on faster processors; instead we have to parallelize our code to use more cores. Reasoning about concurrent code is difficult, and it is also very hard to tell whether a design has latent scalability bottlenecks until you can actually run it on many cores. And what if the problem is in your interface, not just the implementation?

This paper presents a simple, elegant rule: whenever interface operations commute, they can be implemented in a way that scales.

The authors apply this idea to Linux and use the rule to build a new operating system, sv6. The paper also comes with software, COMMUTER, which helps developers evaluate their interfaces to find opportunities for scaling.

This is a very powerful idea, and probably has applications in other areas like distributed systems. In this talk I'll present the paper, and speculate a bit about where else this research could be useful.

Papers_We_Love

April 01, 2015

Transcript

  1. The Scalable Commutativity
    Rule
    by Austin Clements, Frans Kaashoek, Nickolai
    Zeldovich, Robert Morris, and Eddie Kohler

    Papers We Love NYC
    April 1, 2015

  2. Neha Narula
    Ph.D. candidate at MIT
    – Working on high performance concurrency
    control in databases and distributed systems
    – How do we get high performance and strong
    consistency?
    Formerly @Google

    http://nehanaru.la
    @neha

  3. A Few Caveats

  6. Talk Outline
    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation

  7. CPU Trends

  9. A Scalability Bottleneck
    one contended cache line

  10. Cost of One Contended Cache Line

  11. Current Software Development
    •  Benchmark, re-design, test
    •  Hard to know what problems might arise in
    the future
    •  The real bottlenecks might be in the
    interface design, not just the
    implementation

  12. What Scales on Today’s Multicores?
    •  Cache coherence: the MESI protocol
    •  Reads do not conflict, reads and writes or
    writes and writes do
    •  Conflict-free is a good proxy for scalability
    Two operations are scalable if they are conflict-free.
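
The conflict rule on this slide can be sketched as a small check. This is only an illustration: the `Access` record and the idea of listing the cache lines an operation touches are assumptions made for the example, not something taken from the paper or from COMMUTER.

```python
from typing import NamedTuple


class Access(NamedTuple):
    line: int    # cache line index (hypothetical abstraction)
    write: bool  # True for a write, False for a read


def conflict_free(op_a, op_b):
    """Return True if no cache line written by one operation is
    read or written by the other (reads alone never conflict)."""
    for a in op_a:
        for b in op_b:
            if a.line == b.line and (a.write or b.write):
                return False
    return True


# Two reads of the same line do not conflict; a read and a write do.
print(conflict_free([Access(0, False)], [Access(0, False)]))
print(conflict_free([Access(0, False)], [Access(0, True)]))
```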

  13. Talk Outline
    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation

  14. Interface Scalability

  16. Interface Scalability
    Change the interface?

  17. The Scalable Commutativity Rule
    Whenever interface operations commute, they
    can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd? (creat -> 3, creat -> 4)

  18. The Scalable Commutativity Rule
    Whenever interface operations commute, they
    can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd

  19. The Scalable Commutativity Rule
    Whenever interface operations commute, they
    can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd
    creat with any fd? (creat -> 13, creat -> 47)

  20. The Scalable Commutativity Rule
    Whenever interface operations commute, they
    can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists; rule
    creat with lowest fd
    creat with any fd
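
A toy model of the two creat variants may make the contrast concrete. Everything here is hypothetical (the class names, the per-thread striding scheme, the starting set of used descriptors); it is not how Linux or sv6 allocates descriptors. With "lowest fd", the value each thread receives depends on who ran first, so the two calls cannot commute. With "any fd", an implementation is free to hand out descriptors from per-thread ranges, and each thread's result no longer depends on the order.

```python
import itertools


class LowestFdTable:
    """creat must return the lowest unused descriptor."""

    def __init__(self, used=()):
        self.used = set(used)

    def creat(self, thread):
        fd = 0
        while fd in self.used:
            fd += 1
        self.used.add(fd)
        return fd


class AnyFdTable:
    """creat may return any unused descriptor; here each thread
    allocates from its own stride of the descriptor space."""

    def __init__(self, nthreads):
        self.nthreads = nthreads
        self.next_slot = [0] * nthreads  # slot t is used only by thread t

    def creat(self, thread):
        fd = thread + self.nthreads * self.next_slot[thread]
        self.next_slot[thread] += 1
        return fd


def results_by_order(make_table, threads):
    """Run one creat per thread in every order and collect the
    distinct per-thread results that can be observed."""
    outcomes = set()
    for order in itertools.permutations(threads):
        table = make_table()
        got = {t: table.creat(t) for t in order}
        outcomes.add(tuple(sorted(got.items())))
    return outcomes


# Descriptors 0-2 are assumed to be in use already.
lowest = results_by_order(lambda: LowestFdTable({0, 1, 2}), [0, 1])
anyfd = results_by_order(lambda: AnyFdTable(2), [0, 1])
print(len(lowest), len(anyfd))
```

In this sketch the lowest-fd table produces two different outcomes for two threads, while the any-fd table produces one.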

  21. Intuition Behind Rule
    When operations commute
    – The results are independent of order
    – Communication is unnecessary
    – And without communication, no conflicts

  22. Example: Reference Counter
    Timeline diagram, threads T1 to T5
    Region R1: iszero() F, iszero() F
    Region R2: dec() 2, dec() 1, dec() 0
    R1 commutes; conflict-free implementation: shared counter
    R2 does not commute because dec() returns counter value

  23. Example: Reference Counter
    Timeline diagram, threads T1 to T5
    Region R1: iszero() F, iszero() F
    Region R2’: dec() ok, dec() ok, dec() ok
    Region R3: spans R1 and R2’
    R1 commutes; conflict-free implementation: shared counter
    R2 does not commute because dec() returns counter value
    R2’ does commute; conflict-free implementation: per-core counter
    R3 depends on state: initial value > 3 versus initial value ≤ 3
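
The per-core counter mentioned for R2’ could look roughly like the sketch below. The class name and layout are assumptions for illustration, and a Python list does not model cache lines; the point is only that each dec() touches a slot owned by its own core, which is possible because dec() returns "ok" rather than the new value.

```python
class PerCoreRefCount:
    """Reference counter split into one slot per core (hypothetical layout)."""

    def __init__(self, ncores, initial):
        self.initial = initial
        self.deltas = [0] * ncores  # slot i is written only by core i

    def dec(self, core):
        # R2': report "ok" instead of the new value, so this core never
        # has to read the slots that belong to other cores.
        self.deltas[core] -= 1
        return "ok"

    def iszero(self):
        # Reads every slot, so it does conflict with concurrent dec() calls.
        return self.initial + sum(self.deltas) == 0


rc = PerCoreRefCount(ncores=5, initial=3)
for core in (2, 3, 4):
    rc.dec(core)
print(rc.iszero())
```

The trade-off is visible in iszero(): it has to read all slots, which is why a region mixing iszero() and dec() is harder to make conflict-free.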

  24. Formalizing the Rule
    •  History
    •  Specification
    •  Reordering
    •  Commutativity

  25. Histories and Specifications
    A history H is a sequence of invocations and
    responses on threads.
    A specification ζ defines an interface. ζ is the set
    of legal histories given the allowed behavior of the
    interface.

  26. Reordering
    A reordering H’ is a permutation of H that preserves the
    order of operations on each individual thread (H|t = H’|t
    for all t).
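
The definition of a reordering can be sketched by brute force. As a simplification that is not in the slides, each history entry here is one completed operation written as a `(thread, label)` pair, rather than a separate invocation and response.

```python
from itertools import permutations


def per_thread(history, t):
    """Project a history onto thread t (the H|t of the definition)."""
    return [op for op in history if op[0] == t]


def reorderings(history):
    """Yield every permutation of `history` that keeps each thread's
    own operations in their original order."""
    threads = {op[0] for op in history}
    seen = set()
    for perm in permutations(history):
        if perm in seen:
            continue
        seen.add(perm)
        if all(per_thread(perm, t) == per_thread(history, t) for t in threads):
            yield list(perm)


# Thread 1 runs a then b; thread 2 runs c. The operation c can land
# before, between, or after thread 1's operations.
H = [(1, "a"), (1, "b"), (2, "c")]
for r in reorderings(H):
    print(r)
```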

  27. Commutativity

    A region Y of a legal history XY SIM-commutes
    if every reordering Y’ of Y also yields a legal
    history XY’ and every legal extension
    Z of XY is also a legal extension of XY’.

    (And this must be true for every prefix of every
    reordering of Y.)
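
For a small state-machine specification the definition can be checked by enumeration. The sketch below does this for the reference counter example under several assumptions that are mine, not the paper's: the prefix X is folded into an initial counter value, each step is a `(thread, op, result)` triple, `dec_ok` stands for the dec() variant that returns "ok", and "same legal extensions" is approximated by "same final counter value", which is adequate here because the counter value determines all future results.

```python
from itertools import permutations


def replay(initial, history):
    """Replay steps on a counter. Return the final value, or None if
    a recorded result differs from what the model produces."""
    value = initial
    for _thread, op, result in history:
        if op == "dec":
            value -= 1
            expected = value
        elif op == "dec_ok":
            value -= 1
            expected = "ok"
        elif op == "iszero":
            expected = (value == 0)
        else:
            raise ValueError(op)
        if result != expected:
            return None
    return value


def reorderings(region):
    """All permutations that keep each thread's steps in order."""
    threads = {step[0] for step in region}

    def proj(h, t):
        return [s for s in h if s[0] == t]

    return {perm for perm in permutations(region)
            if all(proj(perm, t) == proj(region, t) for t in threads)}


def sim_commutes(initial, region):
    """Check every prefix of every reordering of `region`: all of its
    reorderings must be legal and end in the same state."""
    for y in reorderings(region):
        for n in range(len(y) + 1):
            prefix = y[:n]
            base = replay(initial, prefix)
            for p in reorderings(prefix):
                end = replay(initial, p)
                if end is None or end != base:
                    return False
    return True


R1 = [(1, "iszero", False), (2, "iszero", False)]
R2 = [(3, "dec", 2), (4, "dec", 1), (5, "dec", 0)]
R2p = [(3, "dec_ok", "ok"), (4, "dec_ok", "ok"), (5, "dec_ok", "ok")]
print(sim_commutes(3, R1), sim_commutes(3, R2), sim_commutes(3, R2p))
print(sim_commutes(3, R1 + R2p), sim_commutes(4, R1 + R2p))
```

With these inputs the check agrees with the reference counter slides: R1 and R2’ commute, R2 does not, and the combined region commutes only when the initial value is above 3.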

  28. The Formal Rule
    Let ζ be a specification with a reference
    implementation M. Consider a history XY
    where Y commutes in XY and M can generate XY.

    There exists a correct implementation of ζ whose
    execution of XY is conflict-free in the commutative
    region Y.

  29. Talk Outline
    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation

  31. Commuter
    •  Input: Symbolic Model
    •  Analyzer computes
    commutativity conditions
    •  Testgen computes test cases
    •  Mtrace detects conflict

  32. Example: rename()
    rename(a, b) and rename(c, d) commute if:
    •  Both source files exist and all names are different
    •  Neither source file exists
    •  a xor c exists, and it is not the other rename's destination
    •  One call is a self-rename of an existing file and a ≠ c
    •  a and c are hard links to the same inode, a ≠ c, and b = d
    •  Both calls are self-renames

    Important to have discriminating commutativity conditions
    •  Required across all states (∀ states), rename almost never commutes
    •  More commutative cases ⇒ more opportunities to scale
    •  Captures more operations applications usually do
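
Some of the rename conditions above can be spot-checked against a toy file system. This is a deliberately simplified model and an assumption of this sketch: a directory is a dict from name to inode number, link counts are ignored, and only a literal self-rename is treated as a no-op. It is not the symbolic model COMMUTER uses. Two calls are taken to commute in a given state when both orders produce the same return values and the same final directory.

```python
def rename(fs, src, dst):
    """Rename src to dst in the dict `fs` (name -> inode number)."""
    if src not in fs:
        return "ENOENT"
    if src == dst:
        return 0  # self-rename of an existing file: nothing changes
    fs[dst] = fs.pop(src)  # replaces dst if it already exists
    return 0


def renames_commute(fs, call1, call2):
    """Run the two calls in both orders on copies of `fs` and compare
    each call's return value and the final state."""
    s1 = dict(fs)
    r1_first = rename(s1, *call1)
    r2_second = rename(s1, *call2)

    s2 = dict(fs)
    r2_first = rename(s2, *call2)
    r1_second = rename(s2, *call1)

    return (r1_first, r2_second) == (r1_second, r2_first) and s1 == s2


# Both sources exist and all names differ: commutes.
print(renames_commute({"a": 1, "c": 2}, ("a", "b"), ("c", "d")))
# a and c are hard links to one inode, a != c, b == d: commutes.
print(renames_commute({"a": 1, "c": 1}, ("a", "b"), ("c", "b")))
# One call's destination is the other's source: does not commute.
print(renames_commute({"a": 1, "b": 2}, ("a", "b"), ("b", "c")))
```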

  34. Commuter Finds Non-scalable
    Cases in Linux
    •  Directory-wide locking
    •  File descriptor reference counts
    •  Address space-wide locking

  35. sv6: A Scalable OS
    •  POSIX-like operating system
    •  File system and virtual memory system
    follow commutativity rule
    •  Implementation using standard parallel
    programming techniques, but guided by
    Commuter

  36. (sv6)

  37. Remaining 1%: Idempotent Updates
    •  Two lseeks of same FD to the same offset
    •  Two pwrites of same data to same offset

  38. Refining POSIX with the Rule
    •  Lowest FD versus any FD
    •  stat versus xstat
    •  Unordered sockets
    •  Delayed munmap
    •  fork+exec versus posix_spawn

  39. What Can We Learn?
    •  Embrace non-determinism
    •  Decompose compound operations
    •  Permit weak ordering
    •  Release resources asynchronously

  40. Commutative Operations Matter

  42. Talk Outline
    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation

  43. Limitations of the Rule
    •  Rule says a scalable implementation exists.
    – It might not have the best raw performance
    – You might need different scalable
    implementations for different regions
    – How do I find this implementation?
    •  The non-scalable non-commutativity rule
    •  Synchronized clocks

  44. Distributed Systems and Databases
    •  Reads still don’t conflict, but no cache
    coherence for invalidations
    •  Rule should still apply to message passing
    systems
    •  Commutative concurrency control

  49. Thanks!

    The Scalable Commutativity Rule
    http://pdos.csail.mit.edu/commuter/

    @neha
