Neha Narula on The Scalable Commutativity Rule

Moore's law is over, or at least we can no longer make programs faster by waiting for faster processors; instead we have to parallelize our code to use more of them. Reasoning about concurrent code is difficult, and it is also very hard to tell whether a design has latent scalability bottlenecks until you can actually run it on many cores. And what if the problem is in your interface rather than just the implementation?

This paper presents a simple, elegant rule: whenever interface operations commute, they can be implemented in a way that scales.

The authors apply this idea to Linux and use the rule to build a new operating system, sv6. The paper also comes with a tool, COMMUTER, which helps developers evaluate their interfaces and find opportunities for scaling.

This is a very powerful idea, and it probably has applications in other areas such as distributed systems. In this talk I'll present the paper and speculate a bit about where else this research could be useful.

Papers_We_Love

April 01, 2015

Transcript

  1. The Scalable Commutativity Rule

    by Austin Clements, Frans Kaashoek, Nickolai Zeldovich, Robert Morris, and Eddie Kohler
    Papers We Love NYC, April 1, 2015
  2. Neha Narula

    Ph.D. candidate at MIT
    – Working on high performance concurrency control in databases and distributed systems
    – How do we get high performance and strong consistency?
    Formerly @Google
    http://nehanaru.la @neha
  3. A Few Caveats

  4. A Few Caveats

  5. None
  6. Talk Outline

    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation
  7. CPU Trends

  8. CPU Trends

  9. A Scalability Bottleneck: one contended cache line

  10. Cost of One Contended Cache Line

  11. Current Software Development

    •  Benchmark, re-design, test
    •  Hard to know what problems might arise in the future
    •  The real bottlenecks might be in the interface design, not just the implementation
  12. What Scales on Today’s Multicores?

    •  Cache coherence: the MESI protocol
    •  Reads do not conflict; reads and writes, or writes and writes, do
    •  Conflict-free is a good proxy for scalability
    Two operations are scalable if they are conflict-free.
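The conflict rule on this slide can be sketched as a small check. This is only an illustration, assuming each operation is summarized as a set of (cache line, mode) accesses; the function name `conflict_free` and the line labels are hypothetical and do not come from the talk or from any tool.

```python
def conflict_free(accesses_a, accesses_b):
    """Return True if two operations never conflict on a cache line.

    Each argument is an iterable of (cache_line, mode) pairs, where mode
    is "r" for a read and "w" for a write. Two accesses conflict when they
    touch the same line and at least one of them is a write.
    """
    accesses_a = list(accesses_a)
    accesses_b = list(accesses_b)
    lines_a = {line for line, _ in accesses_a}
    lines_b = {line for line, _ in accesses_b}
    writes_a = {line for line, mode in accesses_a if mode == "w"}
    writes_b = {line for line, mode in accesses_b if mode == "w"}
    # A write by one operation to a line the other touches is a conflict.
    return not (writes_a & lines_b) and not (writes_b & lines_a)


# Two readers of the same line: no conflict.
readers = conflict_free([("refcount", "r")], [("refcount", "r")])
# A reader and a writer of the same line: conflict.
mixed = conflict_free([("refcount", "r")], [("refcount", "w")])
# Two writers of different lines: no conflict.
disjoint = conflict_free([("slot0", "w")], [("slot1", "w")])
```

Under this sketch, an implementation is conflict-free for a pair of operations exactly when the check returns True for the accesses it performs.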
  13. Talk Outline

    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation
  14. Interface Scalability

  15. Interface Scalability

  16. Interface Scalability Change the interface?

  17. The Scalable Commutativity Rule

    Whenever interface operations commute, they can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd? creat -> 3, creat -> 4
  18. The Scalable Commutativity Rule

    Whenever interface operations commute, they can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd
  19. The Scalable Commutativity Rule

    Whenever interface operations commute, they can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists
    creat with lowest fd
    creat with any fd? creat -> 13, creat -> 47
  20. The Scalable Commutativity Rule

    Whenever interface operations commute, they can be implemented in a way that scales.
    Diagram labels: Commutes; Scalable implementation exists; rule
    creat with lowest fd
    creat with any fd
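The creat example can be made concrete with a toy descriptor table. This is a sketch under stated assumptions, not how Linux or sv6 allocates descriptors: the classes `LowestFdTable` and `AnyFdTable`, and the scheme of giving core i the descriptors congruent to i modulo the core count, are hypothetical.

```python
from itertools import permutations


class LowestFdTable:
    """POSIX-style rule: creat returns the lowest unused descriptor."""

    def __init__(self, used):
        self.used = set(used)  # one shared set that every core must update

    def creat(self, core):
        fd = 0
        while fd in self.used:
            fd += 1
        self.used.add(fd)
        return fd


class AnyFdTable:
    """Relaxed rule: creat may return any unused descriptor.

    Hypothetical scheme: core i only hands out descriptors equal to
    i modulo ncores, so each core touches only its own set.
    """

    def __init__(self, used, ncores):
        self.ncores = ncores
        self.used = [{fd for fd in used if fd % ncores == c}
                     for c in range(ncores)]

    def creat(self, core):
        fd = core
        while fd in self.used[core]:
            fd += self.ncores
        self.used[core].add(fd)
        return fd


def results_by_order(make_table, cores):
    """Run one creat per core in every order; collect each core's result."""
    outcomes = set()
    for order in permutations(cores):
        table = make_table()
        got = {core: table.creat(core) for core in order}
        outcomes.add(tuple(got[c] for c in cores))
    return outcomes


lowest = results_by_order(lambda: LowestFdTable({0, 1, 2}), [0, 1])
relaxed = results_by_order(lambda: AnyFdTable({0, 1, 2}, ncores=2), [0, 1])
```

With the lowest-fd rule, the value each core receives reveals which call ran first, so the two calls cannot be reordered without changing results. With the any-fd rule, every unused descriptor is a legal answer in either order, which leaves room for a partitioned implementation in which the two calls share no state.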
  21. Intuition Behind Rule

    When operations commute
    – The results are independent of order
    – Communication is unnecessary
    – And without communication, no conflicts
  22. Example: Reference Counter

    Thread:   T1        T2        T3     T4     T5
    Call:     iszero()  iszero()  dec()  dec()  dec()
    Result:   F         F         2      1      0

    R1 commutes; conflict-free implementation: shared counter
    R2 does not commute because dec() returns counter value
    Regions shown: R1, R2
  23. Example: Reference Counter

    Thread:   T1        T2        T3     T4     T5
    Call:     iszero()  iszero()  dec()  dec()  dec()
    Result:   F         F         ok     ok     ok

    R1 commutes; conflict-free implementation: shared counter
    R2 does not commute because dec() returns counter value
    R2’ does commute; conflict-free implementation: per-core counter
    R3 depends on state (initial value > 3; initial value ≤ 3)
    Regions shown: R1, R2’, R3
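The difference between a dec() that returns the counter value and a dec() that returns only "ok" can be sketched as follows. The class names and the per-core layout are assumptions made for illustration; Python only shows the logic, whereas a real implementation would place each core's slot on its own cache line.

```python
from itertools import permutations


class SharedCounter:
    """One shared value; dec() returns the new value, exposing the order."""

    def __init__(self, value):
        self.value = value

    def dec(self):
        self.value -= 1
        return self.value

    def iszero(self):
        return self.value == 0


class PerCoreCounter:
    """Hypothetical per-core counter; dec() returns only "ok"."""

    def __init__(self, value, ncores):
        self.initial = value
        self.deltas = [0] * ncores

    def dec(self, core):
        self.deltas[core] -= 1  # touches only this core's slot
        return "ok"

    def iszero(self):
        return self.initial + sum(self.deltas) == 0  # reads every slot


def dec_outcomes(make_counter, threads, call):
    """Run one dec per thread in every order; collect per-thread results."""
    outcomes = set()
    for order in permutations(threads):
        counter = make_counter()
        got = {t: call(counter, t) for t in order}
        outcomes.add(tuple(got[t] for t in threads))
    return outcomes


shared = dec_outcomes(lambda: SharedCounter(3), [0, 1, 2],
                      lambda c, t: c.dec())
per_core = dec_outcomes(lambda: PerCoreCounter(3, 3), [0, 1, 2],
                        lambda c, t: c.dec(t))
```

Each ordering of the three dec() calls on the shared counter gives the threads a different set of return values, while the per-core version returns the same responses in every order and each call writes a different slot.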
  24. Formalizing the Rule

    •  History
    •  Specification
    •  Reordering
    •  Commutativity
  25. Histories and Specifications

    A history H is a sequence of invocations and responses on threads.
    A specification ζ defines an interface. ζ is the set of legal histories given the allowed behavior of the interface.
  26. Reordering

    A reordering H’ is a permutation of H that preserves the order of operations within each individual thread (H|t = H’|t for all t).
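The definition of a reordering translates directly into a check. This is a minimal sketch, assuming a history is a list of (thread, event) pairs with hashable events; the function names `project` and `is_reordering` are hypothetical.

```python
from collections import Counter


def project(history, thread):
    """H|t: the events of one thread, in the order they appear in H."""
    return [event for (t, event) in history if t == thread]


def is_reordering(h, h_prime):
    """True if h_prime permutes h while keeping every thread's own order."""
    if Counter(h) != Counter(h_prime):
        return False  # not a permutation of the same events
    threads = {t for (t, _) in h}
    return all(project(h, t) == project(h_prime, t) for t in threads)


h = [("T1", "iszero()"), ("T1", "F"), ("T2", "dec()"), ("T2", "ok")]
# Moving T2's operation ahead of T1's keeps both per-thread orders.
swapped_threads = [("T2", "dec()"), ("T2", "ok"),
                   ("T1", "iszero()"), ("T1", "F")]
# Swapping T1's invocation and response changes T1's own order.
swapped_within = [("T1", "F"), ("T1", "iszero()"),
                  ("T2", "dec()"), ("T2", "ok")]
```

Here `is_reordering(h, swapped_threads)` holds and `is_reordering(h, swapped_within)` does not.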
  27. Commutativity

    A region Y of a legal history XY SIM-commutes if every reordering Y’ of Y also yields a legal history and every legal extension Z of XY is also a legal extension of XY’. (And this must be true for every prefix of every reordering of Y.)
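The slide's wording can be written symbolically. The following is a sketch of how I read the definition, not a quotation: it keeps the slide's ζ for the specification, writes XYZ for concatenation, and states the condition as an equivalence, whereas the slide states only one direction.

```latex
\[
\begin{aligned}
Y \text{ SI-commutes in } XY
  &\iff \forall\, Y' \in \mathrm{reorderings}(Y),\ \forall Z:\;
        \bigl( XYZ \in \zeta \iff XY'Z \in \zeta \bigr) \\
Y \text{ SIM-commutes in } XY
  &\iff \forall\, P \in \mathrm{prefixes}\bigl(\mathrm{reorderings}(Y)\bigr):\;
        P \text{ SI-commutes in } XP
\end{aligned}
\]
```

The first line says that no future Z can distinguish Y from any of its reorderings. The second line is the parenthetical on the slide: the same must hold for every prefix of every reordering, including Y itself.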
  28. The Formal Rule

    Let ζ be a specification with a reference implementation M. Consider a history XY where Y commutes in XY and M can generate XY. There exists a correct implementation of ζ whose execution of XY is conflict-free in the commutative region Y.
  29. Talk Outline

    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation
  30. None
  31. Commuter

    •  Input: Symbolic Model
    •  Analyzer computes commutativity conditions
    •  Testgen computes test cases
    •  Mtrace detects conflict
  32. Example: rename()

    rename(a, b) and rename(c, d) commute if:
    •  Both source files exist and all names are different
    •  Neither source file exists
    •  a xor c exists, and it is not the other rename's destination
    •  One call is a self-rename of an existing file and a ≠ c
    •  a and c are hard links to the same inode, a ≠ c, and b = d
    •  Both calls are self-renames
    Important to have discriminating commutativity conditions
    •  ∀states, rename almost never commutes
    •  More commutative cases ⇒ more opportunities to scale
    •  Captures more operations applications usually do
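Several of these conditions can be checked by brute force on a toy model. This is a sketch only: it assumes a file system that is nothing more than a map from names to inode numbers, ignores directories, permissions and other POSIX corner cases, and is not COMMUTER's symbolic model. The helper names `rename` and `commutes` are hypothetical.

```python
def rename(fs, src, dst):
    """Toy rename over a name -> inode map; returns (new_fs, result)."""
    if src not in fs:
        return fs, "ENOENT"
    if src == dst:
        return fs, "ok"  # self-rename of an existing file changes nothing
    new_fs = dict(fs)
    new_fs[dst] = new_fs.pop(src)  # replaces dst if it already exists
    return new_fs, "ok"


def commutes(fs, call1, call2):
    """True if both orders give the same results and the same final state."""
    s1, r1 = rename(fs, *call1)
    s12, r2 = rename(s1, *call2)
    t2, q2 = rename(fs, *call2)
    t21, q1 = rename(t2, *call1)
    return s12 == t21 and r1 == q1 and r2 == q2


# Both sources exist and all names differ.
all_different = commutes({"a": 1, "c": 2}, ("a", "b"), ("c", "d"))
# Neither source exists.
neither = commutes({}, ("a", "b"), ("c", "d"))
# a and c are hard links to the same inode, with the same destination.
hard_links = commutes({"a": 1, "c": 1}, ("a", "b"), ("c", "b"))
# One source is missing and is the other call's destination.
chained = commutes({"a": 1}, ("a", "c"), ("c", "d"))
# Two different files renamed onto the same destination.
same_dest = commutes({"a": 1, "c": 2}, ("a", "x"), ("c", "x"))
```

Comparing final states is a stronger test than the definition needs, but in a deterministic model equal states guarantee that no later operation can tell the two orders apart. The last two cases show why the conditions have to depend on both state and arguments.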
  33. None
  34. Commuter Finds Non-scalable Cases in Linux

    •  Directory-wide locking
    •  File descriptor reference counts
    •  Address space-wide locking
  35. sv6: A Scalable OS

    •  POSIX-like operating system
    •  File system and virtual memory system follow commutativity rule
    •  Implementation using standard parallel programming techniques, but guided by Commuter
  36. (sv6)

  37. Remaining 1%: Idempotent Updates

    •  Two lseeks of same FD to the same offset
    •  Two pwrites of same data to same offset
  38. Refining POSIX with the Rule

    •  Lowest FD versus any FD
    •  stat versus xstat
    •  Unordered sockets
    •  Delayed munmap
    •  fork+exec versus posix_spawn
  39. What Can We Learn?

    •  Embrace non-determinism
    •  Decompose compound operations
    •  Permit weak ordering
    •  Release resources asynchronously
  40. Commutative Operations Matter

  41. None
  42. Talk Outline

    •  Problem
    •  Scalable Commutativity Rule
    •  Applying the Rule
    •  Speculation
  43. Limitations of the Rule

    •  Rule says a scalable implementation exists.
      – It might not have the best raw performance
      – You might need different scalable implementations for different regions
      – How do I find this implementation?
    •  The non-scalable non-commutativity rule
    •  Synchronized clocks
  44. Distributed Systems and Databases

    •  Reads still don’t conflict, but no cache coherence for invalidations
    •  Rule should still apply to message passing systems
    •  Commutative concurrency control
  45. None
  46. None
  47. None
  48. None
  49. Thanks! The Scalable Commutativity Rule http://pdos.csail.mit.edu/commuter/ @neha