Phase Reconciliation for Contended In-Memory Transactions

Neha Narula
October 29, 2014

Transactions. Multicore. Contention. PROBLEMS.

(These slides read better in the original PowerPoint; see http://nehanaru.la for a link.)

Transcript

  1. Phase Reconciliation for Contended In-Memory Transactions
     Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris
     MIT CSAIL and Harvard

  2. @neha
     MIT CSAIL grad student
     formerly Google
     graduating soon (hopefully)
     http://nehanaru.la

  3. IncrTxn(k Key) {
       INCR(k, 1)
     }

     LikePageTxn(page Key, user Key) {
       INCR(page, 1)
       liked_pages := GET(user)
       PUT(user, liked_pages + page)
     }

     FriendTxn(u1 Key, u2 Key) {
       PUT(friend:u1:u2, 1)
       PUT(friend:u2:u1, 1)
     }

  5. Problem
     Applications experience write contention on popular data

  6. Concurrency Control Enforces Serial Execution
     [Diagram: cores 0, 1, and 2 each issue INCR(x,1); over time the three transactions execute one after another]
     Transactions on the same records execute one at a time

  7. Throughput on a Contentious Transactional Workload
     [Graph: throughput (txns/sec) vs. number of cores, 0-80, for OCC]

  8. Throughput on a Contentious Transactional Workload
     [Graph: throughput (txns/sec) vs. number of cores, 0-80, for Doppel and OCC]

  9. INCR on the Same Records Can Execute in Parallel
     [Diagram: cores 0, 1, and 2 concurrently run INCR(x0,1), INCR(x1,1), INCR(x2,1) on per-core slices of record x; x is split across cores]
     •  Transactions on the same record can proceed in parallel on per-core slices and be reconciled later
     •  This is correct because INCR commutes
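
     A minimal Go sketch of the per-core-slice idea (illustrative; the names are invented, not Doppel's actual code): each worker INCRs only its own slice, and a reconcile step folds the slices into the global value. Because INCR commutes, the fold order cannot change the result.

      package main

      import (
        "fmt"
        "sync"
      )

      // slicedCounter keeps one slice of a contended record per worker,
      // so concurrent INCRs never write to the same location.
      type slicedCounter struct {
        mu     []sync.Mutex
        slices []int64
      }

      func newSlicedCounter(workers int) *slicedCounter {
        return &slicedCounter{
          mu:     make([]sync.Mutex, workers),
          slices: make([]int64, workers),
        }
      }

      // incr applies INCR(x, n) to worker w's slice only.
      func (c *slicedCounter) incr(w int, n int64) {
        c.mu[w].Lock()
        c.slices[w] += n
        c.mu[w].Unlock()
      }

      // reconcile folds every slice into the global value and clears it.
      func (c *slicedCounter) reconcile(global int64) int64 {
        for w := range c.slices {
          c.mu[w].Lock()
          global += c.slices[w]
          c.slices[w] = 0
          c.mu[w].Unlock()
        }
        return global
      }

      func main() {
        c := newSlicedCounter(4)
        var wg sync.WaitGroup
        for w := 0; w < 4; w++ {
          wg.Add(1)
          go func(w int) { // each worker INCRs its own slice in parallel
            defer wg.Done()
            for i := 0; i < 1000; i++ {
              c.incr(w, 1)
            }
          }(w)
        }
        wg.Wait()
        fmt.Println(c.reconcile(0)) // prints 4000
      }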

  10. Databases Must Support General-Purpose Transactions
      IncrTxn(k Key) {
        INCR(k, 1)
      }

      PutMaxTxn(k1 Key, k2 Key) {   // must happen atomically
        v1 := GET(k1)
        v2 := GET(k2)
        if v1 > v2 {
          PUT(k1, v2)
        } else {
          PUT(k2, v1)
        }
        return v1, v2               // returns a value
      }

      IncrPutTxn(k1 Key, k2 Key, v Value) {   // must happen atomically
        INCR(k1, 1)
        PUT(k2, v)
      }

  11. Challenge
      Fast, general-purpose serializable transaction execution with per-core slices for contended records

  12. Phase Reconciliation
      Doppel, an in-memory transactional database
      •  Database automatically detects contention to split a record among cores
      •  Database cycles through phases: split, reconciliation, and joined
      [Diagram: cycle between Split Phase and Joined Phase, with reconciliation in between]
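
     The phase cycle can be pictured as a tiny state machine (an illustrative Go sketch; the enum is assumed, not Doppel's implementation):

      package sketch

      // phase is the database-wide execution mode.
      type phase int

      const (
        splitPhase  phase = iota // contended records take per-core operations
        reconciling              // workers fold per-core slices into the store
        joinedPhase              // no split data; everything runs under OCC
      )

      // next cycles split -> reconciliation -> joined -> split -> ...
      func (p phase) next() phase { return (p + 1) % 3 }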

  13. Contributions
      Phase reconciliation
      –  Splittable operations
      –  Efficient detection and response to contention on individual records
      –  Reordering of split transactions and reads to reduce conflict
      –  Fast reconciliation of split values

  14. Outline
      1.  Phase reconciliation
      2.  Operations
      3.  Detecting contention
      4.  Implementation
      5.  Performance evaluation

  15. Split Phase
      [Diagram: cores 0-3 run INCR(x,1) alongside PUT(y,2) and PUT(z,1); in the split phase the same transactions run INCR(x0,1), INCR(x1,1), INCR(x2,1), INCR(x3,1) instead]
      •  The split phase transforms operations on contended records (x) into operations on per-core slices (x0, x1, x2, x3)

  16. [Diagram: split phase; cores 0-3 apply INCR to slices x0-x3 while PUT(y,2) and PUT(z,1) proceed]
      •  Transactions can operate on split and non-split records
      •  The rest of the records (y, z) use OCC
      •  OCC ensures serializability for the non-split parts of the transaction

  17. [Diagram: split phase; cores keep applying INCR to slices x1 and x2 while an arriving GET(x) is set aside]
      •  Split records have assigned operations for a given split phase
      •  Cannot correctly process a read of x in the current state
      •  Stash the transaction to execute after reconciliation
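
     Stashing amounts to a per-worker deferred queue (an illustrative Go sketch; the names are invented): reads of split records are parked during the split phase and replayed once the joined phase begins.

      package sketch

      // txn is a deferred transaction body.
      type txn func()

      // worker stashes transactions that read split records in a split phase.
      type worker struct {
        stashed []txn
      }

      func (w *worker) run(t txn, readsSplitRecord bool) {
        if readsSplitRecord {
          w.stashed = append(w.stashed, t) // defer past reconciliation
          return
        }
        t() // non-split operations proceed normally under OCC
      }

      // drain resumes the stashed transactions in the joined phase.
      func (w *worker) drain() {
        for _, t := range w.stashed {
          t()
        }
        w.stashed = w.stashed[:0]
      }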

  18. [Diagram: end of the split phase; per-core INCRs on x1 and x2 stop while GET(x) waits]
      •  All threads hear they should reconcile their per-core state
      •  Stop processing per-core writes

  19. [Diagram: reconciliation phase; each core i merges its slice with x = x + xi, then the joined phase begins and the stashed GET(x) runs]
      •  Reconcile state to global store
      •  Wait until all threads have finished reconciliation
      •  Resume stashed read transactions in joined phase

  21. [Diagram: joined phase; cores process GET(x) and INCR(x,1) directly]
      •  Process new transactions in joined phase using OCC
      •  No split data

  22. Batching Amortizes the Cost of Reconciliation
      [Diagram: during the split phase, stashed GET(x) transactions accumulate while cores INCR slices x0-x3 plus y and z; the joined phase then runs the batched GET(x)s together]
      •  Wait to accumulate stashed transactions, batch for joined phase
      •  Amortize the cost of reconciliation over many transactions
      •  Reads would have conflicted; now they do not

  23. Phase Reconciliation Summary
      •  Many contentious writes happen in parallel in split phases
      •  Reads and any other incompatible operations happen correctly in joined phases

  24. Outline
      1.  Phase reconciliation
      2.  Operations
      3.  Detecting contention
      4.  Implementation
      5.  Performance evaluation

  25. Operation Model
      Developers write transactions as stored procedures which are composed of operations on keys and values:

      value GET(k)               |  traditional key/value operations
      void  PUT(k,v)             |  (not splittable)

      void  INCR(k,n)            |  operations on numeric values which
      void  MAX(k,n)             |  modify the existing value
      void  MULT(k,n)            |  (splittable)

      void  OPUT(k,v,o)          |  ordered PUT and insert to an
      void  TOPK_INSERT(k,v,o)   |  ordered list (splittable)
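
     One way to express the splittability contract in Go (a sketch; the interface and names are invented, not Doppel's API): a splittable operation folds into a per-core slice and merges slices into the global value, with commutativity making the merge order irrelevant.

      package sketch

      type splittable interface {
        apply(slice, arg int64) int64    // fold one operation into a per-core slice
        merge(global, slice int64) int64 // fold a finished slice into the global value
        identity() int64                 // initial per-core slice value
      }

      // incrOp: slices accumulate partial sums; merging adds them up.
      type incrOp struct{}

      func (incrOp) apply(slice, n int64) int64      { return slice + n }
      func (incrOp) merge(global, slice int64) int64 { return global + slice }
      func (incrOp) identity() int64                 { return 0 }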

  26. MAX Can Be Efficiently Reconciled
      [Diagram: per-core slices of x start at 0; core 0 applies MAX(x0,55) then MAX(x0,2), core 1 applies MAX(x1,10) then MAX(x1,27), core 2 applies MAX(x2,21); the slices end at 55, 27, and 21, and reconciliation yields x = 55]
      •  Each core keeps one piece of state xi
      •  O(#cores) time to reconcile x
      •  Result is compatible with any order
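
     The same trace as the slide, as a runnable Go check (assuming the slide's numbers):

      package main

      import "fmt"

      func main() {
        // Split phase: each core applies MAX to its own slice, starting at 0.
        slices := []int64{0, 0, 0}
        ops := []struct {
          core int
          n    int64
        }{{0, 55}, {1, 10}, {2, 21}, {0, 2}, {1, 27}}
        for _, op := range ops {
          if op.n > slices[op.core] {
            slices[op.core] = op.n
          }
        }
        // Reconciliation: fold the slices with MAX itself, O(#cores) work.
        var x int64
        for _, s := range slices {
          if s > x {
            x = s
          }
        }
        fmt.Println(slices, x) // [55 27 21] 55, whatever the interleaving
      }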

  27. What Operations Does Doppel Split?
      Properties of operations that Doppel can split:
      –  Commutative
      –  Can be efficiently reconciled
      –  Single key
      –  Have no return value
      However:
      –  Only one operation per record per split phase

  28. Complicated Operations Can Also Commute
      RESTOCK:
      RestockTxn(x, y Key) {
        INCR(x, 1)
        if GET(x) > GET(y) {
          INCR(y, 1)
        }
      }

  29. RESTOCK Can Execute in the Split Phase
      [Diagram: cores 0-2 each run RESTOCK(ci,x,y); each core's count ci starts at 1, and core 0's reaches 2 after a second RESTOCK]
      •  Each core keeps one piece of state ci, a count of RESTOCK operations
      •  RESTOCK must be the only operation happening on x and y
      •  RESTOCK needs a different merge function

  30. RESTOCK’s Merge Function
      RESTOCK-merge(ci int, x, y Key) {
        xval := GET(x)
        yval := GET(y)
        if xval < yval {
          PUT(x, xval + ci)
          if yval < xval + ci {
            PUT(y, xval + ci)
          }
        } else {
          PUT(x, xval + ci)
          PUT(y, yval + ci)
        }
      }

      INCR-merge(xi int, k Key) {
        val := GET(k)
        PUT(k, val + xi)
      }
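
     A quick way to convince yourself the merge is correct (an illustrative Go check, not from the talk): apply ci sequential RESTOCKs and compare against the merge formula.

      package main

      import "fmt"

      // sequential applies RESTOCK ci times: x += 1; if x > y { y += 1 }.
      func sequential(x, y, ci int) (int, int) {
        for i := 0; i < ci; i++ {
          x++
          if x > y {
            y++
          }
        }
        return x, y
      }

      // merge mirrors the slide's RESTOCK-merge for a per-core count ci.
      func merge(x, y, ci int) (int, int) {
        if x < y {
          if y < x+ci {
            return x + ci, x + ci
          }
          return x + ci, y
        }
        return x + ci, y + ci
      }

      func main() {
        cases := []struct{ x, y, ci int }{{3, 10, 2}, {3, 10, 9}, {10, 3, 4}, {5, 5, 2}}
        for _, c := range cases {
          sx, sy := sequential(c.x, c.y, c.ci)
          mx, my := merge(c.x, c.y, c.ci)
          fmt.Println(sx == mx && sy == my) // true for every case
        }
      }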

  31. Outline
      1.  Phase reconciliation
      2.  Operations
      3.  Detecting contention
      4.  Implementation
      5.  Performance evaluation

  32. Which Records Does Doppel Split?
      •  Database starts out with no split data
      •  Count conflicts on records
         –  Make a key split if #conflicts > conflictThreshold
      •  Count stashes on records in the split phase
         –  Move a key back to non-split if #stashes is too high
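
     The counters this implies, sketched in Go (illustrative; the talk names conflictThreshold but not its value, and stashThreshold is assumed):

      package sketch

      const (
        conflictThreshold = 100 // conflicts before a key is split
        stashThreshold    = 100 // stashes before a split key is un-split
      )

      // tracker counts per-record conflicts and stashes to drive splitting.
      type tracker struct {
        conflicts map[string]int // write conflicts observed per key
        stashes   map[string]int // reads stashed during split phases per key
      }

      func newTracker() *tracker {
        return &tracker{conflicts: map[string]int{}, stashes: map[string]int{}}
      }

      // onConflict reports whether the key should become split.
      func (t *tracker) onConflict(key string) bool {
        t.conflicts[key]++
        return t.conflicts[key] > conflictThreshold
      }

      // onStash reports whether the key should move back to non-split.
      func (t *tracker) onStash(key string) bool {
        t.stashes[key]++
        return t.stashes[key] > stashThreshold
      }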

  33. Outline
      1.  Phase reconciliation
      2.  Operations
      3.  Detecting contention
      4.  Implementation
      5.  Performance evaluation

  34. Implementation
      •  Doppel implemented as a multithreaded Go server; one worker thread per core
      •  Transactions are procedures written in Go

  35. Interesting Roadblocks at 80 Cores
      •  Marking memory for GC was slow
         –  https://codereview.appspot.com/100230043
      •  Memory allocation
         –  Reduced allocation; turned GC way down
      •  The Go scheduler sleeping/waking goroutines
         –  Tight loop; try not to block or relinquish control
      •  Interfaces
      •  RPC serialization

  36. Outline
      1.  Phase reconciliation
      2.  Operations
      3.  Detecting contention
      4.  Implementation
      5.  Performance evaluation

  37. Experimental Setup
      •  All experiments run on an 80-core Intel server running 64-bit Linux 3.12 with 256 GB of RAM
      •  All data fits in memory; we don't measure RPC
      •  All graphs measure throughput in transactions/sec

  38. Performance Evaluation
      •  How much does Doppel improve throughput on contentious write-only workloads?
      •  What kinds of read/write workloads benefit?
      •  Does Doppel improve throughput for a realistic application: RUBiS?

  39. Doppel Executes Conflicting Workloads in Parallel
      [Bar chart: throughput (millions txns/sec, 0-35) for Doppel, OCC, and 2PL; 20 cores, 1M 16-byte keys, transaction: INCR(x,1), all on the same key]

  40. Doppel Outperforms OCC Even With Low Contention
      [Graph: throughput (txns/sec, 0M-35M) vs. % of transactions with the hot key, for Doppel and OCC; 20 cores, 1M 16-byte keys, transaction: INCR(x,1) on different keys. Callout: 5% of writes to the contended key]

  41. Comparison to Other Systems
      [Graph: throughput (txns/sec, 0M-35M) vs. % of transactions with the hot key, for Doppel, OCC, 2PL, and atomic increments; 20 cores, 1M 16-byte keys, transaction: INCR(x,1) on different keys. Callout: atomic increments still happen serially]

  42. Contentious Workloads Scale Well
      [Graph: throughput (txns/sec, 0M-100M) vs. number of cores (0-80) for Doppel and OCC; 1M 16-byte keys, transaction: INCR(x,1), all writing the same key. Callout: communication of phase changing]

  43. LIKE Benchmark
      •  Users liking pages on a social network
      •  2 tables: users, pages
      •  Two transactions:
         –  Increment a page's like count, insert the user's like of the page
         –  Read a page's like count, read the user's last like
      •  1M users, 1M pages, Zipfian distribution of page popularity
      Doppel splits the page-like counts for popular pages
      But those counts are also read more often

  44. Benefits Even When There Are Reads and Writes to the Same Popular Keys
      [Bar chart: throughput (millions txns/sec, 0-9) for Doppel and OCC; 20 cores, transactions: 50% LIKE read, 50% LIKE write]

  45. Doppel Outperforms OCC for a Wide Range of Read/Write Mixes
      [Graph: throughput (txns/sec, 0M-18M) vs. % of transactions that read, for Doppel and OCC; 20 cores, transactions: LIKE read, LIKE write. Callouts: more reads mean more stashed read transactions; at 100% reads Doppel does not split any data and performs the same as OCC]

  46. RUBiS
      •  Auction application modeled after eBay
         –  Users bid on auctions, comment, list new items, search
      •  1M users and 33K auctions
      •  7 tables, 17 transactions
      •  85% read-only transactions (RUBiS bidding mix)
      •  Two workloads:
         –  Uniform distribution of bids
         –  Skewed distribution of bids; a few auctions are very popular

  47. StoreBid Transaction
      StoreBidTxn(bidder, amount, item) {
        numBids := GET(NumBidsKey(item))
        PUT(NumBidsKey(item), numBids + 1)
        maxBid := GET(MaxBidKey(item))
        if amount > maxBid {
          PUT(MaxBidKey(item), amount)
          PUT(MaxBidderKey(item), bidder)
        }
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }

  48. StoreBid Transaction
      StoreBidTxn(bidder, amount, item) {
        INCR(NumBidsKey(item), 1)
        MAX(MaxBidKey(item), amount)
        OPUT(MaxBidderKey(item), bidder, amount)
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }
      All commutative operations on potentially conflicting auction metadata
      (OPUT orders the put by amount, so the final MaxBidder is the bidder with the highest amount, matching the original GET/PUT logic)
      Inserting new bids is not likely to conflict

  49. Doppel Improves Throughput on an Application Benchmark
      [Bar chart: throughput (millions txns/sec, 0-12) for Doppel and OCC on the uniform and skewed workloads; 80 cores, 1M users, 33K auctions, RUBiS bidding mix, 8% StoreBid transactions. Callout: 3.2x throughput improvement]

  50. Related Work
      •  Concurrency control
         –  Commutative concurrency control [Weihl ’88]
         –  Escrow transactions [O’Neil ’86]
         –  OCC [Kung ’81]
         –  Silo [Tu ’13]
      •  Commutativity in distributed systems
         –  CRDTs [Shapiro ’11]
         –  RedBlue consistency [Li ’12]
         –  Walter [Lloyd ’12]
      •  Scalable data structures in multicore OSes (counters, memory allocators)

  51. Future Work
      •  Do per-key phases perform better?
      •  How could we use phases with distributed transactions?
      •  What other types of commutative operations can we add?
         –  User-defined operations
         –  State- and argument-based commutativity
            •  e.g., INCR(k, 0) and MULT(k, 1)

  52. Conclusion
      Doppel:
      •  Achieves serializability and parallel performance when many transactions conflict, by combining per-core data, commutative operations, and concurrency control
      •  Performs comparably to OCC on uniform or read-heavy workloads while improving performance significantly on skewed workloads
      http://pdos.csail.mit.edu/doppel
      @neha