Silence is Golden: Coordination-Avoiding Systems Design

pbailis
August 21, 2015

MesosCon 2015 Keynote
21 August 2015
Seattle, WA

Talk video: https://www.youtube.com/watch?v=EYJnWttrC9k
More information: http://bailis.org/

Abstract:

Computer networks make it difficult to design scalable, robust distributed systems that exhibit good performance. Networks can be slow, have limited capacity, and are often unreliable. In an ideal world, we'd build systems that don't rely on the network at all. Unfortunately, as a slew of negative results like the CAP Theorem illustrate, this isn't always possible. Traditional systems abstractions like ACID transactions fundamentally require synchronous communication, or coordination, to implement. As a result, coordination-free systems designs often forgo many programmer-friendly abstractions. These systems leave the task of reasoning about correctness to the application developer or, worse, to the end user.

In this talk, I'll discuss an alternative: system designs that coordinate only when necessary to guarantee application correctness. This coordination avoidance maximizes scalability and robustness by minimizing reliance on the network. To illustrate the power of coordination-avoiding systems design, I'll present several case studies from our research spanning database isolation guarantees, indexes and constraints, and open source applications. Perhaps surprisingly, even though traditional implementations of these tasks rely on coordination, many of these tasks don't actually require coordination for correctness. The resulting systems are among the fastest prototypes ever built and operated at scale. Based on these case studies, I'll provide concrete and practical design principles for reasoning about and applying coordination avoidance in the wild.

Transcript

  1. SILENCE IS GOLDEN
    COORDINATION-AVOIDING
    SYSTEMS DESIGN
    Peter Bailis
    @pbailis
    MesosCon 2015 Keynote
    21 August, Seattle, WA

  2. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    Reasoning about Distribution is Hard

  5. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    • Should you and I be able to simultaneously reserve rooms?
    • Can you reserve a room while I log in?
    • Can you tweet while I change my username?
    Reasoning about Distribution is Hard

  6. Simple, classic strategy:
    Hide concurrency by coordinating

  7. Simple, classic strategy:
    Hide concurrency by coordinating
    Abstraction:
    Serial access to state
    Replicated State Machines
    Mechanisms:
    Consensus (Paxos, VR, Raft)
    ZooKeeper, etcd, Doozer
    ACID transactions

  8. Coordination is expensive
    Processes cannot make progress independently

  9. Coordination is expensive
    This limits:
    1.) Scalability
    2.) Throughput
    3.) Low Latency
    4.) Availability
    Processes cannot make progress independently

  20. DISTRIBUTED TRANSACTIONS (EC2)
    [Figure: Throughput (txns/s), log scale, vs. Number of Servers (Items) Accessed per Transaction (1 to 7). Servers labeled A to H. Labels: IN-MEMORY LOCKING, COORDINATED. Annotation: -398x]

  21. This limits:
    1.) Scalability
    2.) Throughput
    3.) Low Latency
    4.) Availability
    Coordination is expensive
    Processes cannot make progress independently

  25. 133.7+ ms
    RTT

  27. 133.7+ ms
    RTT
    85.1+ ms
    RTT
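
One way to read these round-trip times: if conflicting operations must each wait for a network round trip before the next one can proceed, the RTT caps how many of them can run back to back. The sketch below is a rough back-of-the-envelope calculation, assuming one round trip per coordinated operation; that assumption, and the function name, are illustrative and not taken from the talk. Real protocols may need more round trips or may batch work.

```python
def max_serialized_ops_per_second(rtt_ms: float, round_trips: int = 1) -> float:
    """Upper bound on back-to-back coordinated operations per second.

    Assumes every operation waits for `round_trips` full round trips before
    the next conflicting operation may begin (a deliberate simplification).
    """
    if rtt_ms <= 0 or round_trips <= 0:
        raise ValueError("rtt_ms and round_trips must be positive")
    return 1000.0 / (rtt_ms * round_trips)


if __name__ == "__main__":
    for rtt_ms in (133.7, 85.1):  # RTT figures from the slide
        bound = max_serialized_ops_per_second(rtt_ms)
        print(f"{rtt_ms} ms RTT -> at most {bound:.1f} serialized ops/s")
```

Under that assumption, the two RTTs on the slide cap a single contended item at roughly 7.5 and 11.8 operations per second.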

  31. Simple, classic strategy:
    Hide concurrency by coordinating
    Abstraction:
    Serial access to state
    High cost!
    Fundamental penalties to:
    Scalability
    Throughput
    Latency
    Availability

  32. Surely
    there’s a
    better way
    to build
    systems!

  34. Why do we feel it's necessary to yak in order to be comfortable?
    That's when you know you've found somebody really special: when
    you can just shut up for a minute and comfortably share silence.

  36. Scalable systems
    can just shut up
    and comfortably share silence

  37. Scalable systems
    can just shut up
    and comfortably share silence
    This talk:
    1.) Why is shutting up good for systems?
    2.) When can systems comfortably share silence?

  39. Why is shutting up good?

  40. Coordination-free systems:
    Why is shutting up good?

  44. Why is shutting up good?
    Coordination-free systems:
    1.) Enable infinite scale-out

  47. DISTRIBUTED TRANSACTIONS (EC2)
    [Figure: Throughput (txns/s) vs. Number of Servers (Items) Accessed per Transaction (1 to 7). Servers labeled A to H. Labels: IN-MEMORY LOCKING, COORDINATED, COORDINATION-FREE. Annotation: -398x]

  48. Coordination-free systems:
    1.) Enable infinite scale-out
    2.) Improve throughput
    3.) Ensure low latency
    Why is shutting up good?

  52. Why is shutting up good?
    Coordination-free systems:
    1.) Enable infinite scale-out
    2.) Improve throughput
    3.) Ensure low latency
    4.) Improve availability

  53. any replica can respond to any request
    “Always on” Availability

  58. Coordination-free systems:
    1.) Enable infinite scale-out
    2.) Improve throughput
    3.) Ensure low latency
    4.) Guarantee “always on” response
    Why is shutting up good?

  61. Coordination-free systems:
    1.) Enable infinite scale-out
    2.) Improve throughput
    3.) Ensure low latency
    4.) Guarantee “always on” response
    Why is shutting up good?
    Silence is key to scalability!

  62. Scalable systems
    can just shut up
    and comfortably share silence
    This talk:
    1.) Why is shutting up good for systems?
    2.) When can systems comfortably share silence?

  65. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    • Should you and I be able to simultaneously reserve rooms?
    • Can you reserve a room while I log in?
    • Can you tweet while I change my username?
    Reasoning about Distribution is Hard

  66. THOSE LIGHT CONES
    If operations happen concurrently…
    …ensure their side-effects can be
    COMPOSED

  67. THOSE LIGHT CONES
    If operations happen concurrently…
    …ensure their side-effects can be
    COMPOSED
    IN A WAY THAT MAKES “SENSE”

  72. COMPOSED (“merged”)
    1+1=2 {“a”}+{“b”}={“a”, “b”}
    IN A WAY THAT MAKES “SENSE”
    (invariants over state will hold)
    Counters are positive
    No two talks share a timeslot
    No NULL values
    Usernames are unique

  73. Key question: Can invariants be violated by
    merging independent operations?

  74. Key question: Can invariants be violated by
    merging independent operations?
    ICT: Invariant Confluence Test
    [VLDB 2015]

  80. Key question: Can invariants be violated by
    merging independent operations?
    ICT: Invariant Confluence Test
    [VLDB 2015]
    INVARIANT: User IDs are positive
    OPERATION: Save new user
    MERGE: Add both records to DB
    {} → add {Stu,ID=1}
    {} → add {Ann,ID=1}
    MERGE → {{Stu,ID=1}, {Ann,ID=1}}
    Invariant holds!

  83. Key question: Can invariants be violated by
    merging independent operations?
    ICT: Invariant Confluence Test
    [VLDB 2015]
    INVARIANT: User IDs are unique
    OPERATION: Save new user
    MERGE: Add both records to DB
    {} → add {Stu,ID=1}
    {} → add {Ann,ID=1}
    MERGE → {{Stu,ID=1}, {Ann,ID=1}}
    Invariant broken!
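
The two user-ID walkthroughs can be replayed in a few lines. The sketch below is only an illustration of that single scenario, not the formal test from the VLDB 2015 paper, which reasons over all reachable states. The helper names (`survives_merge`, `ids_positive`, `ids_unique`) are made up for this example.

```python
from typing import Callable, FrozenSet, Tuple

User = Tuple[str, int]      # (name, id)
State = FrozenSet[User]


def ids_positive(state: State) -> bool:
    return all(user_id > 0 for _, user_id in state)


def ids_unique(state: State) -> bool:
    ids = [user_id for _, user_id in state]
    return len(ids) == len(set(ids))


def merge(left: State, right: State) -> State:
    """MERGE: add both records to the DB (set union)."""
    return left | right


def add_user(name: str, user_id: int) -> Callable[[State], State]:
    """OPERATION: save a new user."""
    return lambda state: state | {(name, user_id)}


def survives_merge(invariant: Callable[[State], bool], start: State,
                   op_a: Callable[[State], State],
                   op_b: Callable[[State], State]) -> bool:
    """Run two operations independently from `start`, then merge the results."""
    left, right = op_a(start), op_b(start)
    # Each replica only ever sees a locally valid state.
    assert invariant(left) and invariant(right)
    return invariant(merge(left, right))


empty: State = frozenset()
add_stu, add_ann = add_user("Stu", 1), add_user("Ann", 1)

print(survives_merge(ids_positive, empty, add_stu, add_ann))  # invariant holds
print(survives_merge(ids_unique, empty, add_stu, add_ann))    # invariant broken
```

Both replicas look fine in isolation in either case; only the uniqueness invariant fails once their states are merged, which is the signal that this operation needs coordination.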

  86. Key question: Can invariants be violated by
    merging independent operations?
    ICT: Invariant Confluence Test
    [VLDB 2015]
    ICT passes? Coordination not required
    ICT fails? Coordination required

  88. THOSE LIGHT CONES
    If operations happen concurrently…
    …ensure their side-effects can be
    COMPOSED
    IN A WAY THAT MAKES “SENSE”
    formalized by ICT

  89. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    When can we comfortably share silence?

  90. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    Can we simultaneously reserve rooms?
    Can I log in while you reserve a room?
    Can I tweet while you change your username?
    When can we comfortably share silence?

  92. Attendee
    Login
    Room
    Reservations
    Social
    Media
    Monitoring Database
    Can we simultaneously reserve rooms?
    Can I log in while you reserve a room?
    Can I tweet while you change your username?
    When can we comfortably share silence?
    When operations are composable

  94. Typical database constraints and operations (SQL) [VLDB 2015]
    Constraint            Operation   Passes ICT?
    Equality, Inequality  Any         Y
    Generate unique ID    Any         Y
    Specify unique ID     Insert      N
    >                     Increment   Y
    >                     Decrement   N
    <                     Decrement   Y
    <                     Increment   N
    Foreign Key           Insert      Y
    Foreign Key           Delete      Y*
    Secondary Indexing    Any         Y
    Materialized Views    Any         Y
    AUTO_INCREMENT        Insert      N
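
The ">" rows can be made concrete with a counter. In this sketch, concurrent updates are merged by applying every delta to the shared starting value; that merge rule, the starting value of 3, and the function names are assumptions chosen for illustration.

```python
from typing import Callable, Iterable


def merge_deltas(start: int, deltas: Iterable[int]) -> int:
    """Merge concurrent counter updates by applying every delta to `start`."""
    return start + sum(deltas)


def holds_after_merge(start: int, deltas: Iterable[int],
                      invariant: Callable[[int], bool]) -> bool:
    """Check the invariant on the merged value of independently applied deltas."""
    deltas = list(deltas)
    # Each replica validates only its own local result, without coordinating.
    for delta in deltas:
        if not invariant(start + delta):
            raise ValueError(f"delta {delta} already violates the invariant locally")
    return invariant(merge_deltas(start, deltas))


def above_zero(value: int) -> bool:
    """Invariant: counter > 0."""
    return value > 0


print(holds_after_merge(3, [+2, +2], above_zero))  # increments: merged value is 7
print(holds_after_merge(3, [-2, -2], above_zero))  # decrements: merged value is -1
```

Each decrement leaves its own replica at 1, which satisfies the invariant, yet the merged value is -1. Increments can never push the counter below the bound, which matches the Y and N entries for ">" in the table.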

  95. adopt-a-hydrant
    alchemy_cms
    amahi
    bostonrb
    boxroom
    brevidy
    browsercms
    bucketwise
    calagator
    canvas-lms
    carter
    chiliproject
    citizenry
    comas
    comfortable-mexican-sofa
    communityengine
    copycopter-server
    danbooru
    diaspora
    discourse
    enki
    fat_free_crm
    fedena
    forem
    fulcrum
    gitlab-ci
    gitlabhq
    govsgo
    heaven
    inkwell
    insoshi
    jobsworth
    juvia
    kandan
    linuxfr.org
    lobsters
    lovd-by-less
    nimbleshop
    obtvse
    onebody
    opal
    opencongress
    opengovernment
    openproject
    piggybak
    publify
    radiant
    railscollab
    redmine
    refinerycms
    ror_ecommerce
    rucksack
    saasy
    salor-retail
    selfstarter
    sharetribe
    skyline
    spot-us
    spree
    sprintapp
    squaresquash
    sugar
    teambox
    tracks
    tryshoppe
    wallgig
    zena

  98. 67 projects 1.77M LoC 1957 tables
    9986 total; avg. 5.1 per table
    86.9% PASS ICT
    [SIGMOD 2015]
    Always coordinating is inefficient!

  99. Everything Happens At Once
    Legacy Implementations Overcoordinate

  115. Everything Happens At Once
    Legacy Implementations Overcoordinate
    Read Committed RDBMS
    Users never read intermediate data
    w(name=“peter”); w(name=“pbailis”); commit;
    r(name=“peter”); commit;
    Classic implementation:
    lock records during access
    name record: “peter”, then “pbailis”
    Better implementation:
    use multi-versioning, commit tag
    name record: “peter”, “pbailis” OK
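
A minimal single-process sketch of the "multi-versioning, commit tag" idea: writers append versions without blocking readers, and a reader returns the newest version tagged as committed. The class and method names are hypothetical, and the sketch ignores durability, garbage collection, and thread safety.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    value: str
    txn_id: str
    committed: bool = False   # the "commit tag"


class MultiVersionRecord:
    def __init__(self) -> None:
        self.versions: List[Version] = []

    def write(self, txn_id: str, value: str) -> None:
        # Writers never block readers: they just append a new, untagged version.
        self.versions.append(Version(value, txn_id))

    def commit(self, txn_id: str) -> None:
        # Tag only the transaction's final write; its earlier writes stay
        # invisible because they are intermediate data.
        for version in reversed(self.versions):
            if version.txn_id == txn_id:
                version.committed = True
                break

    def read(self) -> Optional[str]:
        # Readers take no locks: return the newest committed version, if any.
        for version in reversed(self.versions):
            if version.committed:
                return version.value
        return None


name = MultiVersionRecord()
name.write("t1", "peter")
print(name.read())   # nothing committed yet
name.write("t1", "pbailis")
name.commit("t1")
print(name.read())   # only the final write of t1 is visible
```

A reader that arrives between the two writes sees no value at all rather than the intermediate “peter”, and it never has to wait for the writer.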

  116. Everything Happens At Once
    Next Level Technique: RAMP Transactions

  120. Everything Happens At Once
    Next Level Technique: RAMP Transactions
    Desired property: see all updates, or see none
    w(status=“talking”); w(loc=“seattle”); commit;
    used in indexing, materialized views, foreign keys
    Classic implementation: lock records
    Result: typically implemented incorrectly at scale

  131. Everything Happens At Once
    Next Level Technique: RAMP Transactions
    Desired property: see all updates, or see none
    w(status=“talking”); w(loc=“seattle”); commit;
    RAMP: multi-versioning with intention metadata
    status record: “talking” (@t=10, also loc)
    loc record: “seattle” (@t=10, also status)
    OK
    Key:
    Prevent read stalls
    Compact metadata
    SIGMOD 2014
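
The intention metadata ("@t=10, also loc") lets a reader notice that it saw only part of a transaction and fetch the missing sibling version instead of stalling. The sketch below is a simplified, single-process illustration in the spirit of the RAMP design described in the SIGMOD 2014 paper. The `Store` class, its method names, and the two-round `ramp_read` helper are assumptions made for this example, not the paper's code.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Version:
    value: str
    ts: int                      # transaction timestamp, e.g. @t=10
    siblings: FrozenSet[str]     # other keys written by the same transaction


class Store:
    def __init__(self) -> None:
        self.versions: Dict[Tuple[str, int], Version] = {}
        self.last_commit: Dict[str, int] = {}

    def prepare(self, key: str, version: Version) -> None:
        # Phase 1: make the version fetchable by timestamp, but not yet visible.
        self.versions[(key, version.ts)] = version

    def commit(self, key: str, ts: int) -> None:
        # Phase 2: mark the version as the latest committed one for this key.
        self.last_commit[key] = max(self.last_commit.get(key, 0), ts)

    def get_latest(self, key: str) -> Optional[Version]:
        ts = self.last_commit.get(key)
        return None if ts is None else self.versions[(key, ts)]

    def get_version(self, key: str, ts: int) -> Version:
        return self.versions[(key, ts)]


def ramp_read(store: Store, keys: Iterable[str]) -> Dict[str, Optional[str]]:
    # Round 1: read the latest committed version of every key.
    first = {key: store.get_latest(key) for key in keys}
    required = {key: (v.ts if v else 0) for key, v in first.items()}
    # Use sibling metadata to find keys where a newer version must exist.
    for version in first.values():
        if version is None:
            continue
        for sibling in version.siblings:
            if sibling in required:
                required[sibling] = max(required[sibling], version.ts)
    # Round 2: fetch missing sibling versions by timestamp, without blocking.
    result: Dict[str, Optional[str]] = {}
    for key, version in first.items():
        if required[key] > (version.ts if version else 0):
            version = store.get_version(key, required[key])
        result[key] = version.value if version else None
    return result


store = Store()
store.prepare("status", Version("talking", 10, frozenset({"loc"})))
store.prepare("loc", Version("seattle", 10, frozenset({"status"})))
store.commit("status", 10)       # the commit for "loc" has not arrived yet
print(ramp_read(store, ["status", "loc"]))
```

Because a key is committed only after every sibling has been prepared, the second-round fetch always finds the version it asks for. The reader therefore sees both updates or neither.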

  134. TPC-C
    14/16 INVARIANTS PASS ICT
    6-11x faster than ACID/serializability
    scale to over 25x best listed result
    [Figure: Throughput (txns/s) vs. Number of Warehouses (8 to 64), Coordination-Avoiding vs. Serializable (2PL)]
    [Figure: Total Throughput (txn/s) and Throughput (txn/s/server) vs. Number of Servers (0 to 200)]

  138. Everything Happens At Once
    Key Design Patterns
    • Datatype libraries can automatically merge operations
    e.g., Bloom^L, CRDTs
    • Multi-versioning can prevent stalls during partial updates
    e.g., RAMP, COPS, SwiftCloud
    • When you must coordinate, distribute as little as possible
    e.g., Transaction Chopping
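
As an illustration of the first pattern, here is a small grow-only counter, one of the simplest state-based CRDTs. It is a generic textbook-style sketch rather than code from any of the libraries named on the slide, and the class name `GCounter` is an assumption.

```python
from typing import Dict, Optional


class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, replica_id: str,
                 counts: Optional[Dict[str, int]] = None) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = dict(counts or {})

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise maximum is commutative, associative, and idempotent,
        # so replicas can merge in any order, any number of times.
        merged = dict(self.counts)
        for replica, count in other.counts.items():
            merged[replica] = max(merged.get(replica, 0), count)
        return GCounter(self.replica_id, merged)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
print(a.merge(b).value(), b.merge(a).value())
```

Both merge orders give the same total, and merging the same state twice changes nothing, so the replicas never need to agree on an order of operations.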

  139. Rethink The API

  141. Rethink The API
    Read/Write Transaction
    Distributed Log
    Consensus Object
    Are too low level!

  142. The Far Side,
    Gary Larson

  144. WHAT THE APPLICATION SAYS
    “post on timeline”
    “accept friend request”
    WHAT THE SYSTEM HEARS
    [Illustration: a stream of individual read and write operations]

  145. WHAT THE APPLICATION SAYS
    “post on timeline”
    “accept friend request”
    WHAT THE SYSTEM HEARS
    [Illustration: individual read and write operations alongside “post on timeline” and “accept friend request”]

  146. The Good Stuff (Papers)
    ICT in theory and practice: SIGMOD 2015, VLDB 2015
    Coordination-avoiding analytics: CIDR 2015
    Index, graph, and view maintenance: SIGMOD 2014
    Transaction isolation: VLDB 2014
    Upgrading existing stores: SIGMOD 2013
    Quantifying visibility: VLDB 2012, VLDBJ 2014

  149. To avoid coordination,
    maximize composability of operations
    Scalable systems can comfortably share silence
    Joint work with
    Ali Ghodsi, Alan Fekete,
    Joe Hellerstein, Ion Stoica,
    and many others (see bailis.org)
    @pbailis

  150. Many illustrations by the Noun Project (CC-Attribution):
    surprised by Julian Derveaux
    world by Wayne Tyler Sall
    database by Austin Condiff
    earth by Martin Vanco
    Woman by Simon Child
    Man by Simon Child
    Doctor by Simon Child
    David-Hockney by Simon Child
    Server by Simon Child
    clock by christoph robausch
