
Phase Reconciliation for Contended In-Memory Transactions

Neha
October 29, 2014


Transactions. Multicore. Contention. PROBLEMS.

(These are better in PPT; see http://nehanaru.la for the link.)


Transcript

  1. Phase Reconciliation for
    Contended In-Memory
    Transactions
    Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris
    MIT CSAIL and Harvard

  2. @neha
     MIT CSAIL grad student
     formerly Google
     graduating soon
     (hopefully)
     http://nehanaru.la

  3. IncrTxn(k Key) {
       INCR(k, 1)
     }

     LikePageTxn(page Key, user Key) {
       INCR(page, 1)
       liked_pages := GET(user)
       PUT(user, liked_pages + page)
     }

     FriendTxn(u1 Key, u2 Key) {
       PUT(friend:u1:u2, 1)
       PUT(friend:u2:u1, 1)
     }


  5. Problem
     Applications experience write contention on popular data


  8. Concurrency Control Enforces Serial Execution
     core 0: INCR(x,1)
     core 1:           INCR(x,1)
     core 2:                     INCR(x,1)
     (time →)
     Transactions on the same records execute one at a time, as the sketch below illustrates.
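
To see why this serializes, here is a toy Go sketch (illustrative only, not Doppel's code): with one shared record x, every increment acquires the same lock, so the INCRs run one at a time no matter how many cores are available.

    package main

    import "sync"

    var (
        mu sync.Mutex
        x  int64
    )

    // IncrTxn on a single shared record: every core serializes on mu.
    func IncrTxn() {
        mu.Lock()
        x += 1 // only one core executes this at a time
        mu.Unlock()
    }

    func main() {
        var wg sync.WaitGroup
        for core := 0; core < 3; core++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                IncrTxn()
            }()
        }
        wg.Wait()
    }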

  9. Throughput on a Contentious Transactional Workload
     (Figure: throughput (txns/sec) vs. number of cores, 0–80; one series: OCC.)

  10. Throughput on a Contentious Transactional Workload
      (Figure: throughput (txns/sec) vs. number of cores, 0–80; series: Doppel and OCC.)

  11. INCR on the Same Records Can Execute in Parallel
      core 0: INCR(x0, 1)
      core 1: INCR(x1, 1)
      core 2: INCR(x2, 1)
      x is split across cores into per-core slices x0, x1, x2.
      •  Transactions on the same record can proceed in parallel on per-core slices and be reconciled later (see the sketch below)
      •  This is correct because INCR commutes
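
A minimal Go sketch of the idea, under assumed names (SlicedCounter is illustrative, not Doppel's API): each core increments only its own slice, and reconciliation later folds the slices back into x.

    package main

    import "fmt"

    // SlicedCounter: record x becomes one slice per core, and
    // INCRs touch only the local slice, with no coordination.
    type SlicedCounter struct {
        slices []int64 // x0, x1, x2, ...
    }

    func (c *SlicedCounter) Incr(core int, n int64) {
        c.slices[core] += n
    }

    // Reconcile sums the slices into a single value of x. Any
    // merge order gives the same result because INCR commutes.
    func (c *SlicedCounter) Reconcile() int64 {
        var x int64
        for i := range c.slices {
            x += c.slices[i]
            c.slices[i] = 0
        }
        return x
    }

    func main() {
        c := &SlicedCounter{slices: make([]int64, 3)}
        c.Incr(0, 1)
        c.Incr(1, 1)
        c.Incr(2, 1)
        fmt.Println(c.Reconcile()) // prints 3
    }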

  12. Databases Must Support General-Purpose Transactions

      IncrTxn(k Key) {
        INCR(k, 1)
      }

      IncrPutTxn(k1 Key, k2 Key, v Value) {
        INCR(k1, 1)  // must happen
        PUT(k2, v)   //   atomically
      }

      PutMaxTxn(k1 Key, k2 Key) {  // must happen atomically; returns a value
        v1 := GET(k1)
        v2 := GET(k2)
        if v1 > v2:
          PUT(k1, v2)
        else:
          PUT(k2, v1)
        return v1, v2
      }

  13. Challenge
      Fast, general-purpose, serializable transaction execution with per-core slices for contended records

  14. Phase Reconciliation
      Doppel, an in-memory transactional database
      •  The database automatically detects contention and splits a record among cores
      •  The database cycles through phases: split, reconciliation, and joined (sketched below)
      (Diagram: cycle from the split phase through reconciliation to the joined phase and back.)
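
The cycle can be pictured as a small state machine; this Go sketch assumes nothing beyond the three phases named on the slide.

    package sketch

    type Phase int

    const (
        SplitPhase          Phase = iota // contended records live in per-core slices
        ReconciliationPhase              // slices are merged back into the global store
        JoinedPhase                      // no split data; stashed reads can run
    )

    // next advances the database through the phase cycle.
    func next(p Phase) Phase {
        switch p {
        case SplitPhase:
            return ReconciliationPhase
        case ReconciliationPhase:
            return JoinedPhase
        default:
            return SplitPhase // entered again once contention is detected
        }
    }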

  15. Contributions
      Phase reconciliation
      –  Splittable operations
      –  Efficient detection of, and response to, contention on individual records
      –  Reordering of split transactions and reads to reduce conflict
      –  Fast reconciliation of split values

  16. Outline
    1.  Phase reconciliation
    2.  Operations
    3.  Detecting contention
    4.  Implementation
    5.  Performance evaluation

  17. Split Phase
      Before splitting, operations on x serialize:
      core 0: INCR(x,1)
      core 1: INCR(x,1)  PUT(y,2)
      core 2: INCR(x,1)  PUT(z,1)
      core 3: INCR(x,1)  PUT(y,2)
      In the split phase, the same workload becomes:
      core 0: INCR(x0,1)
      core 1: INCR(x1,1)  PUT(y,2)
      core 2: INCR(x2,1)  PUT(z,1)
      core 3: INCR(x3,1)  PUT(y,2)
      •  The split phase transforms operations on contended records (x) into operations on per-core slices (x0, x1, x2, x3)

  18. split phase
      core 0: INCR(x0,1)
      core 1: INCR(x1,1)  PUT(y,2)
      core 2: INCR(x2,1)  PUT(z,1)
      core 3: INCR(x3,1)  PUT(y,2)
      •  Transactions can operate on split and non-split records
      •  The rest of the records (y, z) use OCC
      •  OCC ensures serializability for the non-split parts of the transaction

  19. split phase
      core 0: INCR(x0,1)   GET(x) stashed
      core 1: INCR(x1,1)  PUT(y,2)  INCR(x1,1)  INCR(x1,1)
      core 2: INCR(x2,1)  PUT(z,1)  INCR(x2,1)
      core 3: INCR(x3,1)  PUT(y,2)
      •  Split records have assigned operations for a given split phase
      •  A read of x cannot be processed correctly in the current state
      •  Stash the transaction to execute after reconciliation (see the sketch below)
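
A Go sketch of the stashing rule (Worker, Txn, and the field names are assumptions for illustration, not Doppel's types):

    package sketch

    type Key string

    type Txn struct {
        reads []Key // keys this transaction will GET
    }

    type Worker struct {
        split   map[Key]bool // records currently split into per-core slices
        stashed []Txn        // transactions deferred to the joined phase
    }

    // runSplit executes tx during the split phase. A read of a split
    // record cannot be answered from per-core slices, so the whole
    // transaction is stashed and retried after reconciliation.
    func (w *Worker) runSplit(tx Txn) {
        for _, k := range tx.reads {
            if w.split[k] {
                w.stashed = append(w.stashed, tx)
                return
            }
        }
        w.execute(tx) // split writes go to local slices; the rest uses OCC
    }

    func (w *Worker) execute(tx Txn) { /* elided */ }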

  20. split phase (ending)
      core 0: INCR(x0,1)   GET(x) stashed
      core 1: INCR(x1,1)  PUT(y,2)  INCR(x1,1)  INCR(x1,1)
      core 2: INCR(x2,1)  PUT(z,1)  INCR(x2,1)
      core 3: INCR(x3,1)  PUT(y,2)
      •  All threads hear that they should reconcile their per-core state
      •  They stop processing per-core writes

  21. reconciliation phase → joined phase
      core 0: x = x + x0
      core 1: x = x + x1
      core 2: x = x + x2
      core 3: x = x + x3
      then, in the joined phase: GET(x)
      •  Reconcile per-core state to the global store
      •  Wait until all threads have finished reconciliation
      •  Resume stashed read transactions in the joined phase (see the sketch below)
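
The hand-off might look like this in Go (a sketch built on sync.WaitGroup; Doppel's actual mechanism is not shown on the slide):

    package sketch

    import "sync"

    type Txn struct{}

    type Worker struct {
        stashed []Txn
    }

    func (w *Worker) reconcile()  { /* x = x + xi for each split record */ }
    func (w *Worker) run(tx Txn)  { /* joined phase: plain OCC */ }

    // endSplitPhase merges every worker's per-core state, waits for all
    // workers to finish, then resumes the stashed reads in the joined phase.
    func endSplitPhase(workers []*Worker) {
        var wg sync.WaitGroup
        for _, w := range workers {
            wg.Add(1)
            go func(w *Worker) {
                defer wg.Done()
                w.reconcile()
            }(w)
        }
        wg.Wait() // all slices are now reflected in the global store
        for _, w := range workers {
            for _, tx := range w.stashed {
                w.run(tx) // GET(x) now sees the reconciled value
            }
            w.stashed = nil
        }
    }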


  23. joined phase
      core 0: GET(x)
      core 1: INCR(x,1)
      core 2: GET(x)
      core 3: GET(x)
      •  Process new transactions in the joined phase using OCC
      •  No split data

  24. Batching Amortizes the Cost of Reconciliation
      split phase:
      core 0: INCR(x0,1)   GET(x) stashed
      core 1: INCR(x1,1)  INCR(y,2)  INCR(x1,1)
      core 2: INCR(x2,1)  INCR(z,1)  INCR(x2,1)  INCR(z,1)
      core 3: INCR(x3,1)  INCR(y,2)
      joined phase: GET(x)  GET(x)  GET(x)  GET(x)  GET(x)
      •  Wait to accumulate stashed transactions, then batch them for the joined phase (see the sketch below)
      •  Amortize the cost of reconciliation over many transactions
      •  These reads would have conflicted; now they do not
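
As a sketch, the decision to end a split phase could hang off two knobs; batchSize and maxPhaseLen are hypothetical here, since the slide does not give Doppel's policy.

    package sketch

    import "time"

    const (
        batchSize   = 100000                // hypothetical
        maxPhaseLen = 20 * time.Millisecond // hypothetical
    )

    type DB struct {
        stashedCount int       // stashed reads accumulated this phase
        phaseStart   time.Time // when the current split phase began
    }

    // shouldReconcile keeps the database in the split phase until enough
    // stashed transactions accumulate, so that one reconciliation is
    // amortized over a large batch of reads.
    func (db *DB) shouldReconcile() bool {
        return db.stashedCount >= batchSize ||
            time.Since(db.phaseStart) > maxPhaseLen
    }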

  25. Phase Reconciliation Summary
      •  Many contentious writes happen in parallel in split phases
      •  Reads and any other incompatible operations happen correctly in joined phases

  26. Outline
    1.  Phase reconciliation
    2.  Operations
    3.  Detecting contention
    4.  Implementation
    5.  Performance evaluation

  27. Operation Model
      Developers write transactions as stored procedures which are composed of operations on keys and values:

      value GET(k), void PUT(k,v)
        traditional key/value operations (not splittable)
      void INCR(k,n), void MAX(k,n), void MULT(k,n)
        operations on numeric values which modify the existing value (splittable)
      void OPUT(k,v,o), void TOPK_INSERT(k,v,o)
        ordered PUT and insert to an ordered list (splittable)

  28. MAX Can Be Efficiently Reconciled
      core 0: MAX(x0,55)  MAX(x0,2)   x0: 0 → 55
      core 1: MAX(x1,10)  MAX(x1,27)  x1: 0 → 27
      core 2: MAX(x2,21)              x2: 0 → 21
      reconciled: x = 55
      •  Each core keeps one piece of state xi
      •  O(#cores) time to reconcile x
      •  The result is compatible with any order (a MAX-merge sketch follows)
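
By analogy to the INCR-merge shown on slide 32, a MAX-merge sketch in the same pseudocode style (each slice xi starts at the identity, so an untouched slice never lowers x):

    MAX-merge(xi int, k Key) {
      val := GET(k)
      if xi > val {
        PUT(k, xi)
      }
    }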

  29. What Operations Does Doppel Split?
    Properties of operations that Doppel can split:
    – Commutative
    – Can be efficiently reconciled
    – Single key
    – Have no return value

    However:
    – Only one operation per record per split phase

  30. Complicated Operations Can Also Commute

      RESTOCK:
      RestockTxn(x, y Key) {
        INCR(x)
        if GET(x) > GET(y) {
          INCR(y)
        }
      }

  31. RESTOCK Can Execute in the Split Phase
      core 0: RESTOCK(c0,x,y)  RESTOCK(c0,x,y)   c0 = 2
      core 1: RESTOCK(c1,x,y)                    c1 = 1
      core 2: RESTOCK(c2,x,y)                    c2 = 1
      •  Each core keeps one piece of state ci, a count of RESTOCK operations
      •  RESTOCK must be the only operation happening on x and y
      •  RESTOCK uses a different merge function

  32. RESTOCK’s Merge Function

      RESTOCK-merge(ci int, x, y Key) {
        xval := GET(x)
        yval := GET(y)
        if xval < yval {
          PUT(x, xval + ci)
          if yval < xval + ci {
            PUT(y, xval + ci)
          }
        } else {
          PUT(x, xval + ci)
          PUT(y, yval + ci)
        }
      }

      INCR-merge(xi int, k Key) {
        val := GET(k)
        PUT(k, val + xi)
      }

  33. Outline
    1.  Phase reconciliation
    2.  Operations
    3.  Detecting contention
    4.  Implementation
    5.  Performance evaluation

  34. Which Records Does Doppel Split?
      •  The database starts out with no split data
      •  Count conflicts on records
         –  Make a key split if #conflicts > conflictThreshold
      •  Count stashes on records in the split phase
         –  Move a key back to non-split if #stashes is too high (see the sketch below)
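
A Go sketch of the bookkeeping (the field names and stashThreshold are assumptions; the slide names only conflictThreshold):

    package sketch

    type Key string

    const (
        conflictThreshold = 100 // placeholder value
        stashThreshold    = 100 // assumed knob, analogous to conflictThreshold
    )

    type RecordStats struct {
        conflicts int // conflicts observed on this record
        stashes   int // reads stashed while this record was split
    }

    type DB struct {
        split map[Key]bool
        stats map[Key]*RecordStats
    }

    // revisit moves a record between split and non-split treatment.
    func (db *DB) revisit(k Key) {
        s := db.stats[k]
        if s == nil {
            return
        }
        switch {
        case !db.split[k] && s.conflicts > conflictThreshold:
            db.split[k] = true // contended: give it per-core slices
        case db.split[k] && s.stashes > stashThreshold:
            db.split[k] = false // too many stalled reads: back to OCC
        }
    }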

  35. Outline
    1.  Phase reconciliation
    2.  Operations
    3.  Detecting contention
    4.  Implementation
    5.  Performance evaluation

  36. Implementation
      •  Doppel is implemented as a multithreaded Go server; one worker thread per core (a worker-loop sketch follows)
      •  Transactions are procedures written in Go
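
One plausible shape for that worker setup, as a Go sketch (the slide does not show Doppel's real structure):

    package sketch

    type Txn struct{}

    // serve starts one worker goroutine per core, each draining its own
    // queue of transaction procedures.
    func serve(ncores int) []chan Txn {
        queues := make([]chan Txn, ncores)
        for i := range queues {
            queues[i] = make(chan Txn, 1024)
            go func(in chan Txn) {
                for tx := range in {
                    execute(tx)
                }
            }(queues[i])
        }
        return queues
    }

    func execute(tx Txn) { /* run the transaction procedure */ }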

  37. Interesting Roadblocks at 80 Cores
      •  Marking memory for GC was slow
         –  https://codereview.appspot.com/100230043
      •  Memory allocation
         –  Reduced allocations and turned GC way down
      •  The Go scheduler sleeping/waking goroutines
         –  Tight loop; try not to block or relinquish control
      •  Interfaces
      •  RPC serialization

  38. Outline
    1.  Phase reconciliation
    2.  Operations
    3.  Detecting contention
    4.  Implementation
    5.  Performance evaluation

  39. Experimental Setup
      •  All experiments run on an 80-core Intel server running 64-bit Linux 3.12 with 256GB of RAM
      •  All data fits in memory; we don’t measure RPC
      •  All graphs measure throughput in transactions/sec

  40. Performance Evaluation
      •  How much does Doppel improve throughput on contentious write-only workloads?
      •  What kinds of read/write workloads benefit?
      •  Does Doppel improve throughput for a realistic application (RUBiS)?

  41. Doppel Executes Conflicting Workloads in Parallel
      (Figure: throughput in millions of txns/sec for Doppel, OCC, and 2PL.)
      20 cores, 1M 16-byte keys; transaction: INCR(x,1), all on the same key

  42. Doppel Outperforms OCC Even With Low Contention
      (Figure: throughput (txns/sec) vs. % of transactions with the hot key, 0–100, for Doppel and OCC; annotation at 5% of writes to the contended key.)
      20 cores, 1M 16-byte keys; transaction: INCR(x,1) on different keys

  43. Comparison to Other Systems
      (Figure: throughput (txns/sec) vs. % of transactions with the hot key, 0–100, for Doppel, OCC, 2PL, and atomic increments; atomic increments still happen serially.)
      20 cores, 1M 16-byte keys; transaction: INCR(x,1) on different keys

  44. Contentious Workloads Scale Well
      (Figure: throughput (txns/sec) vs. number of cores, 0–80, for Doppel and OCC; annotation: communication of phase changing.)
      1M 16-byte keys; transaction: INCR(x,1), all writing the same key

  45. LIKE Benchmark
      •  Users liking pages on a social network
      •  2 tables: users, pages
      •  Two transactions:
         –  Increment a page’s like count, insert the user’s like of the page
         –  Read a page’s like count, read the user’s last like
      •  1M users, 1M pages, Zipfian distribution of page popularity
      Doppel splits the like counts of popular pages, but those counts are also read more often.

  46. Benefits Even When There Are Reads and Writes to the Same Popular Keys
      (Figure: throughput in millions of txns/sec for Doppel and OCC.)
      20 cores; transactions: 50% LIKE read, 50% LIKE write

  47. Doppel Outperforms OCC For A Wide Range of Read/Write Mixes
      (Figure: throughput (txns/sec) vs. % of transactions that read, 0–100, for Doppel and OCC. In the middle of the range there are more stashed read transactions; at high read percentages Doppel does not split any data and performs the same as OCC.)
      20 cores; transactions: LIKE read, LIKE write

  48. RUBiS
      •  Auction application modeled after eBay
         –  Users bid on auctions, comment, list new items, search
      •  1M users and 33K auctions
      •  7 tables, 17 transactions
      •  85% read-only transactions (RUBiS bidding mix)
      •  Two workloads:
         –  Uniform distribution of bids
         –  Skewed distribution of bids; a few auctions are very popular

  49. StoreBid Transaction

      StoreBidTxn(bidder, amount, item) {
        numBids := GET(NumBidsKey(item))
        PUT(NumBidsKey(item), numBids + 1)
        maxBid := GET(MaxBidKey(item))
        if amount > maxBid {
          PUT(MaxBidKey(item), amount)
          PUT(MaxBidderKey(item), bidder)
        }
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }

  50. StoreBid Transaction

      StoreBidTxn(bidder, amount, item) {
        INCR(NumBidsKey(item), 1)
        MAX(MaxBidKey(item), amount)
        OPUT(MaxBidderKey(item), bidder, amount)
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }

      All commutative operations on potentially conflicting auction metadata (see the OPUT sketch below).
      Inserting new bids is not likely to conflict.
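
OPUT(k, v, o) makes the bidder write commute: the write with the greatest order o wins regardless of interleaving, so the bidder of the highest bid is kept. A merge sketch in the style of INCR-merge, where each slice holds its winning (vi, oi) pair and ORDER(k) is an assumed helper returning the order stored with k's current value:

    OPUT-merge(vi Value, oi Order, k Key) {
      if oi > ORDER(k) {
        PUT(k, vi) // vi becomes k's value, oi its order
      }
    }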

  51. Doppel Improves Throughput on an Application Benchmark
      (Figure: throughput in millions of txns/sec for Doppel and OCC under uniform and skewed bid distributions; on the skewed workload Doppel achieves a 3.2x throughput improvement.)
      80 cores, 1M users, 33K auctions, RUBiS bidding mix (8% StoreBid transactions)

  52. Related Work
      •  Concurrency control
         –  Commutative concurrency control [Weihl ’88]
         –  Escrow transactions [O’Neil ’86]
         –  OCC [Kung ’81]
         –  Silo [Tu ’13]
      •  Commutativity in distributed systems
         –  CRDTs [Shapiro ’11]
         –  RedBlue consistency [Li ’12]
         –  Walter [Sovran ’11]
      •  Scalable data structures in multicore OSes (counters, memory allocators)

  53. Future Work
      •  Do per-key phases perform better?
      •  How could we use phases with distributed transactions?
      •  What other types of commutative operations can we add?
         –  User-defined operations
         –  State- and argument-based commutativity, e.g., INCR(k, 0) and MULT(k, 1) commute with everything

  54. Conclusion
      Doppel:
      •  Achieves serializability and parallel performance when many transactions conflict, by combining per-core data, commutative operations, and concurrency control
      •  Performs comparably to OCC on uniform or read-heavy workloads while improving performance significantly on skewed workloads

      http://pdos.csail.mit.edu/doppel
      @neha