
2017 Jim Gray Award Talk: Coordination Avoidance in Distributed Databases


Transcript

  1. COORDINATION AVOIDANCE IN DISTRIBUTED DATABASES
    PETER BAILIS
    Stanford
    bailis.org
    2017 ACM SIGMOD Jim Gray Award Talk
    Chicago, IL May 2017


  2. How should we design database systems
    that enable new applications to scale?
    “post on timeline”
    “accept friend request”


  3. CLASSIC:

    ACID


  4. CLASSIC:

    ACID
    serializable transactions
    “accept friend request”
    “post on timeline”


  5. CLASSIC:

    ACID
    serializable transactions


  6. transactions cannot make progress independently
    Problem: Serializability requires Coordination
    Two-Phase Locking, Optimistic Concurrency Control,
    Multi-Version Concurrency Control, Pre-Scheduling
    Blocking, Waiting, Aborts


  7. transactions cannot make progress independently
    Problem: Serializability requires Coordination
    133.7+ ms RTT (7.5/s)
    Well-known for decades, but…
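
    The "(7.5/s)" figure follows from the round-trip time if conflicting transactions must run one at a time and each waits for a full round trip. A minimal sketch of that arithmetic, assuming exactly one round trip per transaction (the function name is hypothetical, not from the talk):

```python
def max_serial_rate(rtt_ms: float) -> float:
    """Upper bound on conflicting transactions per second when each one
    must wait a full network round trip before the next can proceed."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return 1000.0 / rtt_ms


# 133.7 ms is the round-trip time quoted on the slide.
rate = max_serial_rate(133.7)
print(f"{rate:.1f} txn/s")  # 7.5 txn/s
```

    Protocols that need more than one round trip per transaction would lower this bound further.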


  8. Mid 2000s
    Internet
    NoSQL


  9. Major focus: coordination-free execution,
    or guaranteed response from every replica
    Benefits:
    Availability
    Low latency
    Perfect horizontal scalability
    NoSQL


  10. Major focus: coordination-free execution,
    or guaranteed response from every replica
    Benefits:
    Availability
    Low latency
    Perfect horizontal scalability
    Cost: rarely guarantee application safety properties


  11. THESIS WORK:
    What is the coordination cost of a
    given safety guarantee?
    How do we achieve the minimum?
    Serializability (“ACID”): COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency (“NoSQL”): COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION


  12. Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE APP SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)


  13. Model Prediction and Training (CIDR15, LearningSys15)
    Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    Data Serving and Transactions
    Analytics


  14. Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Model Prediction and Training (CIDR15, LearningSys15)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE APP SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION FREE


  15. The Far Side,
    Gary Larson


  16. WHAT THE APPLICATION SAYS
    “post on timeline”
    “accept friend request”
    WHAT THE DATABASE HEARS
    write, read, write, read, write, write, read, write, write, write, read, write,
    read, read, read, read, read, read


  17. (Abridged) Related Work
    » Semantics-based concurrency control: esp.
    commutativity and CALM analysis, laws of order
    » Available storage systems: optimistic replication,
    causal memory, CRDTs, eventually consistent transactions
    » Distributed computing: CAP, FLP, NBAC, quorums
    » Here: focus on necessary coordination for
    common, modern data-intensive apps


  18. Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    Model Prediction and Training (CIDR15, LearningSys15)


  19. only 3/18 serializable by default
    only 10/18 provide serializability at all
    [VLDB 2014]


  20. does this weak isolation require coordination?
    many RDBMSs don’t provide serializability?!?


  21. Highly Available Transactions
    Example: Read Committed (RC)
    Informal: no dirty reads
    Transactions conceal writes until commit
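
    One way to picture "conceal writes until commit" is a transaction that buffers its writes locally and only publishes them at commit time. The following is a minimal single-process sketch under that assumption; the `Store` and `Transaction` classes are hypothetical illustrations, not code from the talk, and they ignore replication and failures:

```python
class Store:
    """Hypothetical key-value store that holds only committed data."""

    def __init__(self):
        self.committed = {}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.buffer = {}  # uncommitted writes, invisible to other transactions

    def write(self, key, value):
        self.buffer[key] = value

    def read(self, key):
        # A transaction sees its own buffered writes, otherwise committed data.
        if key in self.buffer:
            return self.buffer[key]
        return self.store.committed.get(key)

    def commit(self):
        # Publish buffered writes; only now can other transactions read them.
        self.store.committed.update(self.buffer)
        self.buffer.clear()

    def abort(self):
        # Discard buffered writes; nothing was ever exposed.
        self.buffer.clear()


store = Store()
t1, t2 = store.begin(), store.begin()
t1.write("x", 1)
print(t2.read("x"))  # None: t1 has not committed, so no dirty read
t1.commit()
print(t2.read("x"))  # 1
```

    Because no step requires contacting another replica before responding, buffering of this kind needs no coordination.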


  22. Existing
    Database Isolation
    Session Guarantees
    Distributed Registers


  23. Legend: Highly Available, Sticky Available, Unavailable
    prevents lost update†, prevents write skew‡,
    requires recency guarantees⊕
    [VLDB 2014]


  24. Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    Model Prediction and Training (CIDR15, LearningSys15)


  25. Typical database constraints and operations (SQL)
    Constraint            Operation
    Equality, Inequality  Any
    Generate unique ID    Any
    Specify unique ID     Insert
    >                     Increment
    >                     Decrement
    <                     Decrement
    <                     Increment
    Foreign Key           Insert
    Foreign Key           Delete
    Secondary Indexing    Any
    Materialized Views    Any
    AUTO_INCREMENT        Insert


  26. ICT: Invariant Confluence Test
    Key idea: Check if constraints can be violated by
    “merging” independent operations
    CONSTRAINT: User IDs are unique
    OPERATION: Add users
    MERGE: Set union
    {} → add {Stu,ID=1}
    {} → add {Ann,ID=1}
    MERGE → {{Stu,ID=1}, {Ann,ID=1}}
    Constraint violated!


  27. ICT: Invariant Confluence Test
    Key idea: Check if constraints can be violated by
    “merging” independent operations
    CONSTRAINT: User IDs are positive
    OPERATION: Add users
    MERGE: Set union
    {} → add {Stu,ID=1}
    {} → add {Ann,ID=1}
    MERGE → {{Stu,ID=1}, {Ann,ID=1}}
    Constraint holds!
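
    The two examples above can be replayed in a few lines. This is a minimal sketch, assuming rows are (name, ID) pairs and that it is enough to try pairs of single inserts from a common starting state; `passes_ict` is a hypothetical helper that only searches for a counterexample among the given inserts, not a general decision procedure:

```python
from itertools import combinations


def ids_unique(db):
    """CONSTRAINT: User IDs are unique."""
    ids = [uid for _, uid in db]
    return len(ids) == len(set(ids))


def ids_positive(db):
    """CONSTRAINT: User IDs are positive."""
    return all(uid > 0 for _, uid in db)


def merge(a, b):
    """MERGE: set union of two divergent database states."""
    return a | b


def passes_ict(invariant, start, inserts):
    """Apply each insert on its own branch from `start`, then merge every
    pair of individually valid branches and check the invariant again."""
    branches = [start | {row} for row in inserts]
    valid = [b for b in branches if invariant(b)]
    return all(invariant(merge(a, b)) for a, b in combinations(valid, 2))


start = frozenset()
inserts = [("Stu", 1), ("Ann", 1)]
print(passes_ict(ids_unique, start, inserts))    # False: merged state has two rows with ID=1
print(passes_ict(ids_positive, start, inserts))  # True: union of positive IDs stays positive
```

    A `True` result here only means no violation was found among the inserts tried.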


  28. Key idea: Check if constraints can be violated by
    “merging” independent operations
    OUR CONTRIBUTION:
    Generalizes classic partitioning-based indistinguishability arguments
    Theorem. A globally I-valid system can execute a set of
    transactions T with coordination-freedom, transactional availability,
    and convergence if and only if T is I-confluent with respect to I.
    [VLDB 2015]
    ICT ⟺ safe, coordination-free execution possible
    ICT: Invariant Confluence Test


  29. Typical database constraints and operations (SQL)
    Under set merge
    Constraint            Operation  OK?
    Equality, Inequality  Any        ???
    Generate unique ID    Any        ???
    Specify unique ID     Insert     ???
    >                     Increment  ???
    >                     Decrement  ???
    <                     Decrement  ???
    <                     Increment  ???
    Foreign Key           Insert     ???
    Foreign Key           Delete     ???
    Secondary Indexing    Any        ???
    Materialized Views    Any        ???
    AUTO_INCREMENT        Insert     ???


  30. Typical database constraints and operations (SQL)
    Under set merge
    Constraint            Operation  OK?
    Equality, Inequality  Any        Y
    Generate unique ID    Any        Y
    Specify unique ID     Insert     N
    >                     Increment  Y
    >                     Decrement  N
    <                     Decrement  Y
    <                     Increment  N
    Foreign Key           Insert     Y
    Foreign Key           Delete     Y*
    Secondary Indexing    Any        Y
    Materialized Views    Any        Y
    AUTO_INCREMENT        Insert     N
    [VLDB 2015]
    RAMP [SIGMOD 2014]
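
    The ">" rows can be illustrated with a counter kept as a set of tagged deltas, so that merging is still set union. This is a sketch under that modelling assumption (the log representation, the threshold of 0, and the helper names are all hypothetical, not taken from the talk):

```python
def value(log, initial):
    """Counter value: the initial value plus every logged delta."""
    return initial + sum(delta for _, delta in log)


def above_zero(log, initial):
    """CONSTRAINT: counter > 0."""
    return value(log, initial) > 0


def merge_is_safe(initial, delta_a, delta_b):
    """Apply one delta on each of two independent branches, merge the logs
    by set union, and report whether the constraint still holds.
    Returns None when a branch is already invalid before merging."""
    a = frozenset({("a", delta_a)})
    b = frozenset({("b", delta_b)})
    if not (above_zero(a, initial) and above_zero(b, initial)):
        return None
    return above_zero(a | b, initial)


print(merge_is_safe(10, +6, +6))  # True: increments cannot push the value below the bound
print(merge_is_safe(10, -6, -6))  # False: each branch sees 4, but the merged value is -2
```

    The "<" rows are the mirror image: decrements are safe against an upper bound, increments are not.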


  31. adopt-a-hydrant
    alchemy_cms
    amahi
    bostonrb
    boxroom
    brevidy
    browsercms
    bucketwise
    calagator
    canvas-lms
    carter
    chiliproject
    citizenry
    comas
    comfortable-mexican-sofa
    communityengine
    copycopter-server
    danbooru
    diaspora
    discourse
    enki
    fat_free_crm
    fedena
    forem
    fulcrum
    gitlab-ci
    gitlabhq
    govsgo
    heaven
    inkwell
    insoshi
    jobsworth
    juvia
    kandan
    linuxfr.org
    lobsters
    lovd-by-less
    nimbleshop
    obtvse
    onebody
    opal
    opencongress
    opengovernment
    openproject
    piggybak
    publify
    radiant
    railscollab
    redmine
    refinerycms
    ror_ecommerce
    rucksack
    saasy
    salor-retail
    selfstarter
    sharetribe
    skyline
    spot-us
    spree
    sprintapp
    squaresquash
    sugar
    teambox
    tracks
    tryshoppe
    wallgig


  32. CONSTRAINTS
    INCREDIBLY COMMON
    adopt-a-hydrant
    alchemy_cms
    amahi
    bostonrb
    boxroom
    brevidy
    browsercms
    bucketwise
    calagator
    canvas-lms
    carter
    chiliproject
    citizenry
    comas
    comfortable-mexican-sofa
    communityengine
    copycopter-server
    danbooru
    diaspora
    discourse
    enki
    fat_free_crm
    fedena
    forem
    fulcrum
    gitlab-ci
    gitlabhq
    govsgo
    heaven
    inkwell
    insoshi
    jobsworth
    juvia
    kandan
    linuxfr.org
    lobsters
    lovd-by-less
    nimbleshop
    obtvse
    onebody
    opal
    opencongress
    opengovernment
    openproject
    piggybak
    publify
    radiant
    railscollab
    redmine
    refinerycms
    ror_ecommerce
    rucksack
    saasy
    salor-retail
    selfstarter
    sharetribe
    skyline
    spot-us
    spree
    sprintapp
    squaresquash
    sugar
    teambox
    tracks
    tryshoppe
    wallgig
    zena
    67 projects, 1.77M LoC, 1957 tables
    9986 total; avg. 5.1 per table
    [SIGMOD 2015]
    86.9% PASS ICT


  33. 14/16 CONSTRAINTS PASS ICT
    TPC-C
    scale to over 25x best listed result
    6-11x faster than ACID/serializability
    [Figure: Total Throughput (txn/s) and Throughput (txn/s/server) vs. Number of Servers;
    Throughput (txns/s) vs. Number of Warehouses, Coordination-Avoiding vs. Serializable (2PL)]


  34. Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Model Prediction and Training (CIDR15, TBA)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    Serializability: COORDINATION REQUIRED, GUARANTEED SAFETY
    Eventual Consistency: COORDINATION FREE, NO SAFETY
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION FREE


  35. Unruly developers are
    fantastic inspiration
    • As applications have evolved, so have their
    database demands and desired semantics
    • Our opportunity: build systems that
    implement the semantics users want
    (not just what we want)


  36. • Mounting evidence:
    many programmers
    don’t use transactions
    correctly (or at all!)
    • Need not despair:
    opportunity for new
    theory and systems
    ACIDRain: Concurrency-Related Attacks on
    Database-Backed Web Applications
    [SIGMOD17]


  37. COADVISORS: Ali Ghodsi, Joe Hellerstein, Ion Stoica
    KEY COLLABORATORS, MENTORS: Alan Fekete, Mike Franklin


  38. Also many thanks to
    a host of phenomenal colleagues and collaborators
    Peter Alvaro, Neil Conway, Shivaram Venkataraman,
    Joey Gonzalez, Haoyuan Li, Zhao Zhang, Aaron
    Davidson, Mike Jordan
    and a cast of unruly developers and renegades:
    Michael R. Bernstein, Rick Branson, Mark Callaghan,
    Adrian Colyer, Sean Cribbs, Jonathan Ellis, Alex Feinberg,
    Andy Gross, Coda Hale, Colin Jones, Evan Jones, Kyle
    Kingsbury, Adam Marcus, Caitie McCaffrey, Christopher
    Meiklejohn, Mike Miller, Jeremiah Peschka, Mark Phillips,
    Henry Robinson, Mehul Shah, Xavier Shay, Justin Sheehy,
    Ines Sombra, Kelly Sommers, Sriram Srinivasan


  39. Eventual Consistency: COORDINATION FREE, NO SAFETY
    Atomic Visibility (SIGMOD14)
    Database Constraints (VLDB15, SIGMOD15)
    Model Prediction and Training (CIDR15, TBA)
    Weak Isolation (HotOS13, VLDB14)
    Causality (SOCC12, SIGMOD13)
    COORDINATION AVOIDANCE: GUARANTEED SAFETY WITHOUT COORDINATION
    MORE APP SEMANTICS, MORE SAFETY
    PBS (VLDB12, VLDBJ14, SIGMOD13, CACM14)
    COORDINATION FREE
    Joint work with Ali Ghodsi, Joe Hellerstein,
    Ion Stoica, Mike Franklin, Michael Jordan,
    Alan Fekete, Dan Crankshaw, Shivaram
    Venkataraman, Neil Conway, Peter Alvaro,
    Aaron Davidson, Joey Gonzalez, Kyle Kingsbury,
    Haoyuan Li, and Zhao Zhang
