Coordination and the Art of Scaling

pbailis
June 17, 2014

CloudantCON 2014
17 June 2014
http://www.cloudantcon.com/#schedule

For more information/details/nuance (!):
http://www.bailis.org/blog/
http://www.bailis.org/pubs.html
@pbailis

Transcript

  1. COORDINATION
    AND
    THE ART OF SCALING
    Peter Bailis • UC Berkeley • @pbailis
    CloudantCON 2014

  2. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
    (Leslie Lamport, 2013 Turing Award winner)

  3. THE NETWORK
    INCURS LATENCY

  4. THE NETWORK
    INCURS LATENCY
    THE NETWORK
    IS UNRELIABLE

  5. THE NETWORK
    INCURS LATENCY
    THE NETWORK
    IS UNRELIABLE
    SO HOW CAN WE BUILD ROBUST
    AND SCALABLE DISTRIBUTED
    SYSTEMS?

  6. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE

  7. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    (SERIALIZABILITY/LINEARIZABILITY)

  8. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    Impose a total order on events in the system

  9. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system

  10. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system
    Ask Amanda: “How’s the weather on the farm?”
    Amanda replies: “Let me check with the tractor.”
    Amanda replies: “It’s a beautiful day!”
    Tractor replies: current temperature is 75°F

  11. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    Impose a total order on events in the system
    TIME
    Illusion created by a partially ordered protocol

  12. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system
    Illusion created by a partially ordered protocol
    Remarkably powerful abstraction
    core to ACID transactions

  13. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system
    Illusion created by a partially ordered protocol
    Remarkably powerful abstraction
    core to ACID transactions
    This is the way you’d want to
    program distributed systems, but…

  14. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system
    Illusion created by a partially ordered protocol
    COST:

  15. THE SIMPLE ANSWER:
    SINGLE-SYSTEM IMAGE
    TIME
    Impose a total order on events in the system
    Illusion created by a partially ordered protocol
    COST:
    BLOCKING COMMUNICATION
    COORDINATION

  16. COORDINATION
    (BLOCKING COMMUNICATION)
    Can I make progress without waiting?

  17. COORDINATION
    (BLOCKING COMMUNICATION)
    Can I make progress without waiting?
    UNDER SINGLE SYSTEM IMAGE,
    MUST WAIT!

  18. COORDINATION REQUIRED?
    Throughput: 1/delay

  19. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources

  20. SERIALIZABLE TRANSACTIONS ON EC2
    IN-MEMORY
    LOCKING
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  21. SERIALIZABLE TRANSACTIONS ON EC2
    IN-MEMORY
    LOCKING
    [Plot: throughput (txns/s, log scale) vs. number of items per transaction (1 to 7)]
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  22. SERIALIZABLE TRANSACTIONS ON EC2
    IN-MEMORY
    LOCKING
    [Plot: throughput (txns/s) vs. number of items per transaction (1 to 7)]
    COORDINATED
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  23. SERIALIZABLE TRANSACTIONS ON EC2
    IN-MEMORY
    LOCKING
    [Plot: throughput (txns/s) vs. number of items per transaction (1 to 7)]
    COORDINATED
    COORDINATION-FREE
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  24. SERIALIZABLE TRANSACTIONS ON EC2
    IN-MEMORY
    LOCKING
    SINGLE SERVER: 10x faster (multi-core parallelism)
    MULTI-SERVER: ~1000x faster
    [Plot: throughput (txns/s) vs. number of items per transaction (1 to 7)]
    COORDINATED
    COORDINATION-FREE
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  25. do not support SSI/serializability
    HANA

  26. do not support SSI/serializability
    HANA
    Database: serializability supported?
    Actian Ingres: YES
    Aerospike: NO
    Persistit: NO
    Clustrix: NO
    Greenplum: YES
    IBM DB2: YES
    IBM Informix: YES
    MySQL: YES
    MemSQL: NO
    MS SQL Server: YES
    NuoDB: NO
    Oracle 11G: NO
    Oracle BDB: YES
    Oracle BDB JE: YES
    Postgres 9.2.2: YES
    SAP HANA: NO
    ScaleDB: NO
    VoltDB: YES
    8/18 databases surveyed did not support serializability
    “Highly Available Transactions: Virtues and Limitations” VLDB 2014

  27. do not support SSI/serializability
    HANA
    Database: serializability supported?
    Actian Ingres: YES
    Aerospike: NO
    Persistit: NO
    Clustrix: NO
    Greenplum: YES
    IBM DB2: YES
    IBM Informix: YES
    MySQL: YES
    MemSQL: NO
    MS SQL Server: YES
    NuoDB: NO
    Oracle 11G: NO
    Oracle BDB: YES
    Oracle BDB JE: YES
    Postgres 9.2.2: YES
    SAP HANA: NO
    ScaleDB: NO
    VoltDB: YES
    8/18 databases surveyed did not support serializability
    15/18 used weaker models by default
    “Highly Available Transactions: Virtues and Limitations” VLDB 2014

  28. do not support SSI/serializability
    HANA
    Database: serializability supported?
    Actian Ingres: YES
    Aerospike: NO
    Persistit: NO
    Clustrix: NO
    Greenplum: YES
    IBM DB2: YES
    IBM Informix: YES
    MySQL: YES
    MemSQL: NO
    MS SQL Server: YES
    NuoDB: NO
    Oracle 11G: NO
    Oracle BDB: YES
    Oracle BDB JE: YES
    Postgres 9.2.2: YES
    SAP HANA: NO
    ScaleDB: NO
    VoltDB: YES
    8/18 databases surveyed did not support serializability
    15/18 used weaker models by default
    “Highly Available Transactions: Virtues and Limitations” VLDB 2014

  29. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources

  30. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately

  31. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    SINGLE DC:
    0.5 ms on public cloud
    5 µs on InfiniBand

  32. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    SINGLE DC:
    0.5 ms on public cloud
    5 µs on InfiniBand
    MULTI-DC?

  33. 133.7+ ms
    RTT

  34. 133.7+ ms
    RTT

  35. 133.7+ ms
    RTT

  36. 133.7+ ms
    RTT
    85.1+ ms
    RTT
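The "Throughput: 1/delay" row combined with the round-trip times on these slides gives a quick ceiling on how fast conflicting operations on one contended item can proceed. The sketch below assumes each such operation must wait exactly one round trip before the next can start (a hypothetical `max_serial_ops_per_sec` helper); real protocols such as two-phase commit usually need more than one round trip, so actual ceilings are lower.

```python
def max_serial_ops_per_sec(rtt_seconds: float, rtts_per_op: int = 1) -> float:
    """Upper bound on throughput for operations on one contended item when each
    operation must finish `rtts_per_op` round trips before the next may start."""
    return 1.0 / (rtt_seconds * rtts_per_op)


# Round-trip times quoted on the slides, in seconds.
RTTS = {
    "single DC, public cloud (0.5 ms)": 0.5e-3,
    "single DC, InfiniBand (5 us)": 5e-6,
    "multi-DC (85.1 ms)": 85.1e-3,
    "multi-DC (133.7 ms)": 133.7e-3,
}

for name, rtt in RTTS.items():
    print(f"{name:<34} {max_serial_ops_per_sec(rtt):>12,.1f} ops/s")
```

Under this one-RTT assumption the ceiling is roughly 2,000 ops/s on the public cloud and 200,000 ops/s on InfiniBand, but only about 7 to 12 ops/s across datacenters, which is the gap the light-cone slides are pointing at.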

  37. THOSE LIGHT CONES_

  38. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    Unavailable during failures | Progress despite failures

  39. COORDINATION-FREE EXECUTION IS KEY TO INDEFINITE SCALABILITY

  40. COORDINATION IS THE BANE OF SCALABLE SYSTEMS

  41. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    Unavailable during failures | Progress despite failures
    WHEN DO WE HAVE TO COORDINATE?

  42. THAT SIMULTANEITY_

  43. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    Unavailable during failures | Progress despite failures
    WHEN DO WE HAVE TO COORDINATE?

  44. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    Unavailable during failures | Progress despite failures
    WHEN DO WE HAVE TO COORDINATE?
    CAP Theorem (for recency guarantees)
    FLP result (for consensus; e.g., Paxos)
    Davidson result (for SSI)

  45. COORDINATION REQUIRED? | COORDINATION-FREE?
    Throughput: 1/delay | Limited by physical resources
    Latency: 1+ RTT | Can return immediately
    Unavailable during failures | Progress despite failures
    WHEN DO WE HAVE TO COORDINATE?
    CAP Theorem (for recency guarantees)
    FLP result (for consensus; e.g., Paxos)
    Davidson result (for SSI)
    BUT DO APPS ALWAYS HAVE TO COORDINATE?

  46. TICKET 241
    TICKET 242
    TICKET 243
    TICKET 244

  47. TICKET 241
    TICKET 242
    TICKET 243
    TICKET 244

  48. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL

  49. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL
    TICKET 241
    TICKET 242
    TICKET 243

  50. INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL
    TICKET 241
    TICKET 241
    COORDINATION REQUIRED!
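A minimal sketch of why the sequential invariant forces coordination, assuming each replica assigns "one past the highest ID it has seen" from its own local copy; the helper name and the starting IDs are hypothetical.

```python
def next_sequential_id(seen_ids: set) -> int:
    """Assign the next ticket ID from purely local knowledge."""
    return max(seen_ids) + 1


# Both replicas start from the same replicated history.
history = {238, 239, 240}
replica_a, replica_b = set(history), set(history)

# Each replica serves a client concurrently, without asking the other.
a_id = next_sequential_id(replica_a)
b_id = next_sequential_id(replica_b)
replica_a.add(a_id)
replica_b.add(b_id)

# When the replicas later exchange state, two different tickets share one ID.
print(a_id, b_id)                     # 241 241
print(sorted(replica_a | replica_b))  # [238, 239, 240, 241]
```

Each replica's decision is valid in isolation; the violation only appears once their states are merged, which is why no local rule can keep IDs both sequential and coordination-free.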

  51. INVARIANT: TICKET IDs SHOULD BE UNIQUE
    TICKET 241
    TICKET 242
    PRE-PARTITION ID SPACE
    (1,4,…)
    (2,5,…)
    (3,6,…)
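The pre-partitioned ID space can be sketched as follows, assuming three servers numbered from 1 and using Python's `itertools.count`; the `partitioned_ids` function is hypothetical. Each server draws from its own residue class, so IDs never collide and no server has to contact another, but the IDs are not globally sequential.

```python
from itertools import count, islice


def partitioned_ids(server: int, num_servers: int):
    """Server `server` (1-based) hands out server, server + n, server + 2n, ..."""
    return count(start=server, step=num_servers)


generators = [partitioned_ids(s, 3) for s in (1, 2, 3)]
first_three = [list(islice(g, 3)) for g in generators]
print(first_three)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]

# Uniqueness holds across servers without any communication.
all_ids = [i for ids in first_three for i in ids]
assert len(all_ids) == len(set(all_ids))
```

If one server issues IDs faster than the others, gaps appear in the overall sequence, which is acceptable under a uniqueness invariant but not under a sequential one.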

  52. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE
    TICKET 241
    TICKET 242
    COORDINATION-FREE!

  53. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE
    COORDINATION-FREE!
    INVARIANT: TICKET IDs SHOULD BE UNIQUE
    PRE-PARTITION ID SPACE
    INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL
    COORDINATION REQUIRED!

  54. INVARIANT: TICKET IDs SHOULD BE NON-NEGATIVE
    COORDINATION-FREE!
    INVARIANT: TICKET IDs SHOULD BE UNIQUE
    PRE-PARTITION ID SPACE
    INVARIANT: TICKET IDs SHOULD BE SEQUENTIAL
    COORDINATION REQUIRED!
    WHEN DO WE HAVE TO COORDINATE?
    DEPENDS ON APPLICATION
    SAFE ANSWER: ALWAYS COORDINATE

  55. WHEN DO WE HAVE TO COORDINATE?
    SAFE ANSWER: ALWAYS COORDINATE

  56. WHEN DO WE HAVE TO COORDINATE?
    SAFE ANSWER: ALWAYS COORDINATE
    BETTER ANSWER:
    (YOUR TAX DOLLARS AT WORK)

  57. WHEN DO WE HAVE TO COORDINATE?
    SAFE ANSWER: ALWAYS COORDINATE
    BETTER ANSWER:
    COORDINATION
    AVOIDANCE
    COORDINATE ONLY WHEN STRICTLY NECESSARY
    MOVE COMMUNICATION TO BACKGROUND
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  58. SAFETY: correctness always guaranteed
    LIVENESS: database states agree (converge)

  59. Invariant Confluence is necessary and sufficient
    for ensuring safety, convergence, availability, and
    coordination-free execution.
    Invariant Confluence holds? A safe, coordination-free execution strategy exists.
    Invariant Confluence fails? No safe, coordination-free mechanism exists.
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  60. Invariant | Operation | Coordination-free (C.F.)?
    Equality, Inequality | Any | ???
    Generate unique ID | Any | ???
    Specify unique ID | Insert | ???
    > | Increment | ???
    > | Decrement | ???
    < | Decrement | ???
    < | Increment | ???
    Foreign Key | Insert | ???
    Foreign Key | Delete | ???
    Secondary Indexing | Any | ???
    Materialized Views | Any | ???
    AUTO_INCREMENT | Insert | ???
    Typical DB operations and invariants (SQL)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  61. Invariant | Operation | Coordination-free (C.F.)?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
    Typical DB operations and invariants (SQL)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237
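To make the ">" and "<" rows concrete, here is a small check in the spirit of invariant confluence: two replicas start from the same valid state, each applies an operation after checking the invariant against its own local state only, and the merged state is then checked. This is an illustrative sketch, not the paper's formal test: the counter-as-deltas merge, the class and function names, and the starting values are all assumptions, and a single passing run does not prove confluence (a single failing run does show that the combination needs coordination).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Replica:
    """One replica of an integer counter, tracked as a base plus local deltas."""
    base: int
    deltas: List[int] = field(default_factory=list)

    def value(self) -> int:
        return self.base + sum(self.deltas)

    def try_apply(self, delta: int, invariant: Callable[[int], bool]) -> bool:
        # Coordination-free: the invariant is checked against local state only.
        if invariant(self.value() + delta):
            self.deltas.append(delta)
            return True
        return False


def merge(a: Replica, b: Replica) -> Replica:
    # Both replicas diverged from the same base, so keep every delta from both.
    assert a.base == b.base
    return Replica(a.base, a.deltas + b.deltas)


def merged_state_valid(base: int, delta: int, invariant: Callable[[int], bool]) -> bool:
    a, b = Replica(base), Replica(base)
    a.try_apply(delta, invariant)
    b.try_apply(delta, invariant)
    return invariant(merge(a, b).value())


def greater_than_zero(v: int) -> bool:
    return v > 0


def less_than_ten(v: int) -> bool:
    return v < 10


print(merged_state_valid(1, +1, greater_than_zero))  # True:  1 + 1 + 1 = 3
print(merged_state_valid(2, -1, greater_than_zero))  # False: 2 - 1 - 1 = 0
print(merged_state_valid(9, -1, less_than_ten))      # True:  9 - 1 - 1 = 7
print(merged_state_valid(8, +1, less_than_ten))      # False: 8 + 1 + 1 = 10
```

The four runs line up with the table: ">" with Increment and "<" with Decrement survive the merge, while ">" with Decrement and "<" with Increment can break the invariant even though each replica's local check passed.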

  62. Test fails? Cannot avoid coordination
    Invariant | Operation | Coordination-free (C.F.)?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
    Typical DB operations and invariants (SQL)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  63. Test fails? Cannot avoid coordination
    Invariant | Operation | Coordination-free (C.F.)?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
    MANY TRADITIONAL DB APPS OK
    Typical DB operations and invariants (SQL)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  64. Test fails? Cannot avoid coordination
    Invariant | Operation | Coordination-free (C.F.)?
    Equality, Inequality | Any | Y
    Generate unique ID | Any | Y
    Specify unique ID | Insert | N
    > | Increment | Y
    > | Decrement | N
    < | Decrement | Y
    < | Increment | N
    Foreign Key | Insert | Y
    Foreign Key | Delete | Y*
    Secondary Indexing | Any | Y
    Materialized Views | Any | Y
    AUTO_INCREMENT | Insert | N
    MANY TRADITIONAL DB APPS OK
    Typical DB operations and invariants (SQL)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  65. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013

  66. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    FRIENDS
    FRIENDS

  67. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    FRIENDS
    FRIENDS

  68. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013

  69. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    Denormalized Friend List
    Fast reads…
    …multi-entity updates

  70. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    Denormalized Friend List
    Fast reads…
    …multi-entity updates

  71. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    Denormalized Friend List
    Fast reads…
    …multi-entity updates

  72. FOREIGN KEY DEPENDENCIES
    “TAO: Facebook’s Distributed Data Store for the Social Graph”
    USENIX ATC 2013
    Denormalized Friend List
    Fast reads…
    …multi-entity updates
    Not cleanly partitionable

  73. NEED
    ATOMIC VISIBILITY
    FOREIGN KEY DEPENDENCIES
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  74. NEED
    ATOMIC VISIBILITY
    SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM
    FOREIGN KEY DEPENDENCIES
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  75. NEED
    ATOMIC VISIBILITY
    SEE ALL OF A TXN’S UPDATES, OR NONE OF THEM
    FOREIGN KEY DEPENDENCIES
    SECONDARY INDEXING
    MATERIALIZED VIEWS
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  76. X=0 Y=0
    HOW TO ACHIEVE ATOMIC VISIBILITY
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  77. STRAWMAN: LOCKING
    X=0 Y=0
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  78. STRAWMAN: LOCKING
    X=0 Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  79. STRAWMAN: LOCKING
    X=0 Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  80. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  81. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  82. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  83. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  84. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    R(X=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  85. STRAWMAN: LOCKING
    X=1 Y=1
    W(X=1)
    W(Y=1)
    R(X=1)
    R(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  86. STRAWMAN: LOCKING
    X=1 Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  87. STRAWMAN: LOCKING
    X=1 Y=0
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  88. STRAWMAN: LOCKING
    X=1 Y=0
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  89. STRAWMAN: LOCKING
    X=1 Y=0
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    ATOMIC VISIBILITY COUPLED WITH MUTUAL EXCLUSION
    SLOW
    unavailable
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  90. RAMP TRANSACTIONS
    (READ ATOMIC MULTI-PARTITION)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  91. RAMP TRANSACTIONS
    (READ ATOMIC MULTI-PARTITION)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  92. RAMP TRANSACTIONS
    DECOUPLE
    ATOMIC VISIBILITY
    MUTUAL EXCLUSION
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  93. RAMP TRANSACTIONS
    DECOUPLE ATOMIC VISIBILITY
    from MUTUAL EXCLUSION
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  94. BASIC IDEA
    W(X=1)
    W(Y=1)
    Y=0
    R(X=?)
    R(Y=?)
    X=1
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  95. BASIC IDEA
    W(X=1)
    W(Y=1)
    Y=0
    R(X=?)
    R(Y=?)
    X=1
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  96. BASIC IDEA
    W(X=1)
    W(Y=1)
    Y=0
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  97. BASIC IDEA
    W(X=1)
    W(Y=1)
    Y=0
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    LIMITED
    MULTI-VERSIONING
    + METADATA
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  98. BASIC IDEA
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  99. BASIC IDEA
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  100. BASIC IDEA
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0
    Y=1
    Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  101. BASIC IDEA
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0
    Y=1
    Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  102. BASIC IDEA
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0
    Y=1
    Y=0
    W(X=1)
    W(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  103. BASIC IDEA
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1 [t=124, {Y}]
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 [t=0, {}]
    Y=1 [t=124, {X}]
    Y=0 [t=0, {}]
    R(X=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  104. BASIC IDEA
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1 [t=124, {Y}]
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 [t=0, {}]
    Y=1 [t=124, {X}]
    Y=0 [t=0, {}]
    R(Y=0)
    R(X=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  105. BASIC IDEA
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1 [t=124, {Y}]
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 [t=0, {}]
    Y=1 [t=124, {X}]
    Y=0 [t=0, {}]
    R(Y=0)
    R(X=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  106. BASIC IDEA
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1 [t=124, {Y}]
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 [t=0, {}]
    Y=1 [t=124, {X}]
    Y=0 [t=0, {}]
    R(Y=0)
    ITEM | HIGHEST TS
    X | 124
    Y | 124
    R(X=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014

  107. BASIC IDEA
    W(X=1)
    W(Y=1)
    R(X=?)
    R(Y=?)
    LET CLIENTS RACE, but
    HAVE READERS “CLEAN UP”
    X=1 [t=124, {Y}]
    LIMITED
    MULTI-VERSIONING
    + METADATA
    X=0 [t=0, {}]
    Y=1 [t=124, {X}]
    Y=0 [t=0, {}]
    R(Y=0)
    ITEM | HIGHEST TS
    X | 124
    Y | 124
    R(X=1)
    R(Y=1)
    “Scalable Atomic Visibility with RAMP Transactions” SIGMOD 2014
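The "readers clean up" step can be sketched in a few dozen lines. This is a simplified, single-process sketch in the style of the RAMP-Fast read as described on these slides, assuming each version carries its transaction timestamp plus the set of sibling items written by the same transaction, and that a writer installs (prepares) every version before making any of them visible (commits). The class and function names are hypothetical; there is no networking, failure handling, or garbage collection of old versions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Version:
    value: int
    ts: int                   # timestamp of the writing transaction
    siblings: FrozenSet[str]  # other items written by the same transaction


class Partition:
    """Storage for one item: every installed version, keyed by timestamp, plus
    the timestamp of the latest version that has been made visible."""

    def __init__(self, initial: Version) -> None:
        self.versions: Dict[int, Version] = {initial.ts: initial}
        self.last_commit = initial.ts

    def prepare(self, v: Version) -> None:
        self.versions[v.ts] = v  # installed, but not yet returned by get_latest

    def commit(self, ts: int) -> None:
        self.last_commit = max(self.last_commit, ts)

    def get_latest(self) -> Version:
        return self.versions[self.last_commit]

    def get_by_ts(self, ts: int) -> Version:
        return self.versions[ts]


def ramp_read(parts: Dict[str, Partition], items: List[str]) -> Dict[str, int]:
    # Round 1: read the latest visible version of every item.
    first = {i: parts[i].get_latest() for i in items}

    # From sibling metadata, compute the highest timestamp each item must reflect.
    required = {i: first[i].ts for i in items}
    for v in first.values():
        for sib in v.siblings:
            if sib in required:
                required[sib] = max(required[sib], v.ts)

    # Round 2: fetch any version the first round missed ("readers clean up").
    result = {}
    for i in items:
        v = first[i]
        if required[i] > v.ts:
            v = parts[i].get_by_ts(required[i])
        result[i] = v.value
    return result


# The slide's race: T124 writes X=1 and Y=1, but only X's commit has landed.
parts = {
    "X": Partition(Version(0, 0, frozenset())),
    "Y": Partition(Version(0, 0, frozenset())),
}
parts["X"].prepare(Version(1, 124, frozenset({"Y"})))
parts["Y"].prepare(Version(1, 124, frozenset({"X"})))
parts["X"].commit(124)

print(ramp_read(parts, ["X", "Y"]))  # {'X': 1, 'Y': 1}, not the fractured X=1, Y=0
```

Under the prepare-before-commit assumption, the second round only asks for a version whose timestamp the metadata proves was already installed, so the reader never waits on a lock held by the writer.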

  108. TPC-C
    Combine foreign keys with sequence-number insert on commit...
    500K txns/s

  109. Serializable locking: 47,852 New-Order transactions/s
    (bottlenecks on coordination over the network)
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  110. Serializable locking: 47,852 New-Order transactions/s
    (bottlenecks on coordination over the network)
    Coordination-avoiding implementation (RAMP with fast ID assignment): 632,589 New-Order transactions/s
    (bottlenecks on CPU)
    EC2 cr1.8xlarge; here, 8 servers
    “Coordination-Avoiding Database Systems” arXiv:1402.2237

  111. [Plot: total throughput (txn/s) vs. number of servers]

  112. [Plot: total throughput (txn/s) vs. number of servers]
    INDUSTRY-STANDARD
    TRANSACTIONAL WORKLOADS
    CAN SCALE JUST FINE*

  113. INDUSTRY-STANDARD
    TRANSACTIONAL WORKLOADS
    CAN SCALE JUST FINE*
    GIVEN THE RIGHT
    MANY

  114. INDUSTRY-STANDARD
    TRANSACTIONAL WORKLOADS
    CAN SCALE JUST FINE*
    GIVEN THE RIGHT
    SYSTEM DESIGN
    CONCURRENCY PRIMITIVES
    ATTENTION TO SCALE
    MANY

  115. INDUSTRY-STANDARD
    TRANSACTIONAL WORKLOADS
    CAN SCALE JUST FINE*
    GIVEN THE RIGHT
    SYSTEM DESIGN
    CONCURRENCY PRIMITIVES
    ATTENTION TO SCALE
    LEVEL OF COORDINATION
    MANY

  116. THE NETWORK
    INCURS LATENCY
    THE NETWORK
    IS UNRELIABLE
    SO HOW CAN WE BUILD ROBUST
    AND SCALABLE DISTRIBUTED
    SYSTEMS?

  117. THE NETWORK
    INCURS LATENCY
    THE NETWORK
    IS UNRELIABLE
    SO HOW CAN WE BUILD ROBUST
    AND SCALABLE DISTRIBUTED
    SYSTEMS?
    UNDERSTAND COORDINATION

  118. COORDINATION AVOIDANCE
    UNDERSTAND IF/WHEN COORDINATION IS REQUIRED

  119. COORDINATION AVOIDANCE
    UNDERSTAND IF/WHEN COORDINATION IS REQUIRED
    INVARIANT CONFLUENCE (arXiv 2014)
    necessary and sufficient condition for c-free operation
    HIGHLY AVAILABLE TRANSACTIONS (CACM, VLDB 2014)
    what database isolation levels are coordination-free?
    RAMP ATOMIC VISIBILITY (SIGMOD 2014)
    fast and intuitive multi-put, multi-get, indexing
    BLOOM and BLAZES (ICDE 2014)
    language-level automated coordination analysis
    CRDTS and BLOOM^L (SoCC 2013, USENIX ATC 2014)
    correct-by-design distributed data types
    PBS INCONSISTENCY (VLDBJ 2014)
    how stale is data if we don’t coordinate?

  120. Traditional distributed systems designs suffer from coordination bottlenecks.
    By understanding application requirements, we can avoid coordination.
    We can build systems that actually scale while providing correct behavior.
    Thanks!
    [email protected]
    @pbailis
    http://bailis.org/ http://amplab.cs.berkeley.edu/

  121. Punk designed by my name is mud from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Queen designed by Bohdan Burmich from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Guy Fawkes designed by Anisha Varghese from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Emperor designed by Simon Child from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Database designed by Shmidt Sergey from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    List designed by Nicholas Menghini from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Warehouse designed by Wilson Joseph from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    User designed by JM Waideaswaran from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Thermostat designed by Michael Senkow from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Customer Service designed by Bybzee from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Punk Rocker designed by Simon Child from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Jackhammer designed by Jamie Dickinson from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Earth designed by Martin Vanco from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Smart-Phone designed by Emily Haasch from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Cloud designed by Piotrek Chuchla from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Server designed by Jaime Carrion from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Computer designed by Matthew Hawdon from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Computer designed by james zamyslianskyj from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Computer designed by Alyssa Mahlberg from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    Lock designed by dylan voisard from the Noun Project Creative Commons – Attribution (CC BY 3.0)
    COCOGOOSE font by ZetaFonts COMMON CREATIVE NON COMMERCIAL USE
    IMAGE/FONT CREDITS