Phase Reconciliation for Contended In-Memory Transactions

Neha Narula
October 29, 2014

Transactions. Multicore. Contention. PROBLEMS.

(These slides are better in PPT; see http://nehanaru.la for the link.)

Transcript

  1. Phase Reconciliation for Contended In-Memory Transactions
     Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris
     MIT CSAIL and Harvard
  2. IncrTxn(k Key) {
       INCR(k, 1)
     }

     LikePageTxn(page Key, user Key) {
       INCR(page, 1)
       liked_pages := GET(user)
       PUT(user, liked_pages + page)
     }

     FriendTxn(u1 Key, u2 Key) {
       PUT(friend:u1:u2, 1)
       PUT(friend:u2:u1, 1)
     }
  3. Concurrency Control Enforces Serial Execution

     [diagram: cores 0-2 each issue INCR(x,1); over time the
     increments execute one after another]

     Transactions on the same records execute one at a time.
  4. Throughput on a Contentious Transactional Workload

     [figure: throughput (txns/sec) vs. number of cores for OCC]
  5. Throughput on a Contentious Transactional Workload

     [figure: throughput (txns/sec) vs. number of cores for Doppel
     and OCC]
  6. INCR on the Same Records Can Execute in Parallel

     [diagram: x is split across cores into per-core slices of record
     x; cores 0-2 issue INCR(x0,1), INCR(x1,1), INCR(x2,1)
     concurrently, each slice holding 1]

     • Transactions on the same record can proceed in parallel on
       per-core slices and be reconciled later
     • This is correct because INCR commutes
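     A minimal Go sketch of this idea (an illustration, not Doppel's
     actual code): each core increments only its own slice of x, and
     the slices are later summed back into the global record. Any
     ordering of the increments gives the same sum, because INCR
     commutes.

        package main

        import "fmt"

        // splitCounter holds the per-core slices of one contended
        // record x. In Doppel each slice is owned by a single worker
        // thread, so increments need no locks; this demo is sequential.
        type splitCounter struct {
            slices []int64
        }

        // incr applies INCR(x_core, n) to one core's slice only.
        func (c *splitCounter) incr(core int, n int64) {
            c.slices[core] += n
        }

        // reconcile folds the slices back into the global value of x:
        // x = x + x_i for each core i, then the slices are cleared.
        func (c *splitCounter) reconcile() int64 {
            var x int64
            for i := range c.slices {
                x += c.slices[i]
                c.slices[i] = 0
            }
            return x
        }

        func main() {
            x := splitCounter{slices: make([]int64, 3)}
            x.incr(0, 1) // core 0: INCR(x0, 1)
            x.incr(1, 1) // core 1: INCR(x1, 1)
            x.incr(2, 1) // core 2: INCR(x2, 1)
            fmt.Println(x.reconcile()) // prints 3 in any order
        }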
  7. Databases Must Support General Purpose Transactions

     IncrTxn(k Key) {
       INCR(k, 1)
     }

     PutMaxTxn(k1 Key, k2 Key) {    // returns a value;
       v1 := GET(k1)                // must happen atomically
       v2 := GET(k2)
       if v1 > v2:
         PUT(k1, v2)
       else:
         PUT(k2, v1)
       return v1, v2
     }

     IncrPutTxn(k1 Key, k2 Key, v Value) {   // must happen atomically
       INCR(k1, 1)
       PUT(k2, v)
     }
  8. Phase Reconciliation

     Doppel, an in-memory transactional database:
     • The database automatically detects contention to split a
       record among cores
     • The database cycles through phases: split, reconciliation, and
       joined

     [diagram: cycle from split phase through reconciliation to
     joined phase and back]
  9. Contributions

     Phase reconciliation:
     – Splittable operations
     – Efficient detection and response to contention on individual
       records
     – Reordering of split transactions and reads to reduce conflict
     – Fast reconciliation of split values
  10. Split Phase

      [diagram: before the split phase, cores 0-3 issue INCR(x,1)
      interleaved with PUT(y,2) and PUT(z,1); during the split phase
      the same transactions run as INCR(x0,1), INCR(x1,1), INCR(x2,1),
      INCR(x3,1) on per-core slices]

      • The split phase transforms operations on contended records (x)
        into operations on per-core slices (x0, x1, x2, x3)
  11. • Transactions can operate on split and non-split records
      • The rest of the records (y, z) use OCC
      • OCC ensures serializability for the non-split parts of the
        transaction

      [diagram: the same split-phase timeline, with PUT(y,2) and
      PUT(z,1) running under OCC alongside the per-core INCRs]
  12. • Split records have assigned operations for a given split phase
      • Cannot correctly process a read of x in the current state
      • Stash the transaction to execute after reconciliation

      [diagram: during the split phase, cores apply INCR to their
      slices of x while an arriving GET(x) is held back]
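      A sketch of the stashing step, using invented names (worker and
      txn are illustrative types, not Doppel's): a transaction that
      reads a split record cannot run during the split phase, so it is
      queued and replayed after reconciliation.

         package sketch

         // txn stands in for a transaction closure.
         type txn func()

         type worker struct {
             split   map[string]bool // records currently split
             stashed []txn           // waiting for the joined phase
         }

         // execute runs t immediately unless it reads a split record,
         // in which case the transaction is stashed.
         func (w *worker) execute(readsKey string, t txn) {
             if w.split[readsKey] {
                 w.stashed = append(w.stashed, t)
                 return
             }
             t() // non-split records run under ordinary OCC
         }

         // drainStash replays the stashed transactions in the joined
         // phase, after the per-core slices have been reconciled.
         func (w *worker) drainStash() {
             for _, t := range w.stashed {
                 t()
             }
             w.stashed = nil
         }

      The batching described a few slides later falls out of the same
      structure: the stash accumulates many reads, and one
      reconciliation is amortized over all of them when drainStash
      runs.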
  13. • All threads hear that they should reconcile their per-core
        state
      • They stop processing per-core writes

      [diagram: the stashed GET(x) is still waiting as the split phase
      ends]
  14. Reconciliation Phase

      • Reconcile state to the global store: x = x + x0, x = x + x1,
        x = x + x2, x = x + x3
      • Wait until all threads have finished reconciliation
      • Resume stashed read transactions in the joined phase

      [diagram: cores 0-3 merge their slices during the reconciliation
      phase; the stashed GET(x) runs in the joined phase]
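      A self-contained Go sketch of the barrier this slide describes
      (illustrative, not Doppel's code): no stashed read may run until
      every core has merged its slice into the global store.

         package main

         import (
             "fmt"
             "sync"
         )

         func main() {
             slices := []int64{1, 2, 1, 1} // per-core slices x0..x3
             var (
                 mu sync.Mutex
                 x  int64 // global value of x
                 wg sync.WaitGroup
             )
             for _, xi := range slices {
                 wg.Add(1)
                 go func(xi int64) {
                     defer wg.Done()
                     mu.Lock()
                     x += xi // the INCR merge: x = x + xi
                     mu.Unlock()
                 }(xi)
             }
             wg.Wait() // wait until all threads have reconciled
             fmt.Println("GET(x) in joined phase:", x)
         }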
  15. • Process new transactions in the joined phase using OCC
      • No split data

      [diagram: cores 0-3 run GET(x) and INCR(x,1) directly on the
      global record during the joined phase]
  16. Batching Amortizes the Cost of Reconciliation

      • Wait to accumulate stashed transactions; batch them for the
        joined phase
      • Amortize the cost of reconciliation over many transactions
      • The reads would have conflicted; now they do not

      [diagram: INCRs on per-core slices during the split phase, then
      a batch of stashed GET(x) transactions running together in the
      joined phase]
  17. Phase Reconciliation Summary

      • Many contentious writes happen in parallel in split phases
      • Reads and any other incompatible operations happen correctly
        in joined phases
  18. Operation Model

      Developers write transactions as stored procedures, which are
      composed of operations on keys and values:

      Not splittable (traditional key/value operations):
        value GET(k)
        void  PUT(k, v)

      Splittable (operations on numeric values which modify the
      existing value):
        void  INCR(k, n)
        void  MAX(k, n)
        void  MULT(k, n)

      Splittable (ordered PUT and insert to an ordered list):
        void  OPUT(k, v, o)
        void  TOPK_INSERT(k, v, o)
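      The slide does not spell out OPUT's semantics. One plausible
      reading, consistent with its later use in the StoreBid
      transaction (OPUT(MaxBidderKey(item), bidder, amount)), is that
      the stored value is the one carrying the largest order o, which
      commutes the same way MAX does. A hypothetical Go sketch under
      that assumption:

         package sketch

         // ordered pairs a value with the order that decides which
         // concurrent OPUT wins. (An assumption, not Doppel's
         // definition.)
         type ordered struct {
             val   string
             order int64
         }

         // oput keeps the value with the highest order seen so far,
         // so a set of OPUTs yields the same result in any order.
         func oput(store map[string]ordered, k, v string, o int64) {
             if cur, ok := store[k]; !ok || o > cur.order {
                 store[k] = ordered{val: v, order: o}
             }
         }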
  19. MAX Can Be Efficiently Reconciled

      [diagram: starting from slices of 0, core 0 applies MAX(x0,55)
      then MAX(x0,2), core 1 applies MAX(x1,10) then MAX(x1,27), and
      core 2 applies MAX(x2,21); the slices end up 55, 27, 21 and
      reconcile to x = 55]

      • Each core keeps one piece of state xi
      • O(#cores) time to reconcile x
      • The result is compatible with any order
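      A small Go sketch of the slide's example (not Doppel's code):
      reconciling MAX reads each slice once, so it takes O(#cores)
      time, and the answer does not depend on the order in which the
      MAX operations originally ran.

         package main

         import "fmt"

         // reconcileMax merges per-core MAX slices into the global
         // value of x.
         func reconcileMax(x int64, slices []int64) int64 {
             for _, xi := range slices {
                 if xi > x {
                     x = xi
                 }
             }
             return x
         }

         func main() {
             // Slice values after MAX(x0,55), MAX(x0,2), MAX(x1,10),
             // MAX(x1,27), and MAX(x2,21), as on the slide.
             fmt.Println(reconcileMax(0, []int64{55, 27, 21})) // 55
         }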
  20. What Operations Does Doppel Split?

      Properties of operations that Doppel can split:
      – Commutative
      – Can be efficiently reconciled
      – Single key
      – Have no return value

      However:
      – Only one operation per record per split phase
  21. Complicated Operations Can Also Commute

      RESTOCK:

      RestockTxn(x, y Key) {
        INCR(x)
        if GET(x) > GET(y) {
          INCR(y)
        }
      }
  22. RESTOCK Can Execute in the Split Phase

      [diagram: cores 0-2 run RESTOCK(c0,x,y), RESTOCK(c1,x,y), and
      RESTOCK(c2,x,y); core 0 runs a second RESTOCK(c0,x,y), leaving
      per-core counts of 2, 1, 1]

      • Each core keeps one piece of state ci, a count of RESTOCK
        operations
      • RESTOCK must be the only operation happening on x and y
      • RESTOCK needs a different merge function
  23. RESTOCK’s Merge Function

      RESTOCK-merge(ci int, x, y Key) {
        xval := GET(x)
        yval := GET(y)
        if xval < yval {
          PUT(x, xval + ci)
          if yval < xval + ci {
            PUT(y, xval + ci)
          }
        } else {
          PUT(x, xval + ci)
          PUT(y, yval + ci)
        }
      }

      Compare INCR’s merge function:

      INCR-merge(xi int, k Key) {
        val := GET(k)
        PUT(k, val + xi)
      }
  24. Which Records Does Doppel Split?

      • The database starts out with no split data
      • Count conflicts on records
        – Make a key split if #conflicts > conflictThreshold
      • Count stashes on records in the split phase
        – Move a key back to non-split if #stashes is too high

      A sketch of this policy follows.
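      A sketch of that policy in Go, with invented field names and
      threshold values (the slide names conflictThreshold but gives
      no numbers):

         package sketch

         // recordStats tracks why a record should move between the
         // split and non-split states.
         type recordStats struct {
             conflicts int // conflicts seen while the record was not split
             stashes   int // reads stashed while the record was split
         }

         const (
             conflictThreshold = 100 // illustrative value only
             stashThreshold    = 50  // illustrative value only
         )

         // splitNextPhase decides whether the record should be split
         // during the next split phase.
         func (s *recordStats) splitNextPhase(currentlySplit bool) bool {
             if !currentlySplit {
                 return s.conflicts > conflictThreshold
             }
             // Too many stashed reads means splitting hurts; move the
             // key back to non-split.
             return s.stashes <= stashThreshold
         }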
  25. Implementation

      • Doppel is implemented as a multithreaded Go server; one worker
        thread per core
      • Transactions are procedures written in Go
  26. Interesting Roadblocks at 80 Cores

      • Marking memory for GC was slow
        – https://codereview.appspot.com/100230043
      • Memory allocation
        – Reduced allocation and turned GC way down
      • The Go scheduler sleeping/waking goroutines
        – Tight loop; try not to block or relinquish control
      • Interfaces
      • RPC serialization
  27. Experimental Setup

      • All experiments run on an 80-core Intel server running 64-bit
        Linux 3.12 with 256 GB of RAM
      • All data fits in memory; RPC is not measured
      • All graphs measure throughput in transactions/sec
  28. Performance Evaluation

      • How much does Doppel improve throughput on contentious
        write-only workloads?
      • What kinds of read/write workloads benefit?
      • Does Doppel improve throughput for a realistic application,
        RUBiS?
  29. Doppel Executes Conflicting Workloads in Parallel

      [figure: throughput (millions txns/sec) for Doppel, OCC, and
      2PL; 20 cores, 1M 16-byte keys, transaction: INCR(x,1), all on
      the same key]
  30. Doppel Outperforms OCC Even With Low Contention

      [figure: throughput (txns/sec) vs. % of transactions with the
      hot key, for Doppel and OCC; 20 cores, 1M 16-byte keys,
      transaction: INCR(x,1) on different keys; a callout marks the
      point where 5% of writes go to the contended key]
  31. Comparison to Other Systems

      [figure: throughput (txns/sec) vs. % of transactions with the
      hot key, for Doppel, OCC, 2PL, and atomic increments; 20 cores,
      1M 16-byte keys, transaction: INCR(x,1) on different keys]

      Atomic increments still happen serially.
  32. Contentious Workloads Scale Well

      [figure: throughput (txns/sec) vs. number of cores for Doppel
      and OCC; 1M 16-byte keys, transaction: INCR(x,1), all writing
      the same key; a callout notes the communication cost of phase
      changing]
  33. LIKE Benchmark

      • Users liking pages on a social network
      • 2 tables: users, pages
      • Two transactions:
        – Increment a page's like count, insert the user's like of
          the page
        – Read a page's like count, read the user's last like
      • 1M users, 1M pages, Zipfian distribution of page popularity

      Doppel splits the page-like counts for popular pages, but those
      counts are also read more often.
  34. Benefits Even When There Are Reads and Writes to the Same
      Popular Keys

      [figure: throughput (millions txns/sec) for Doppel and OCC;
      20 cores, transactions: 50% LIKE read, 50% LIKE write]
  35. Doppel Outperforms OCC For A Wide Range of Read/Write Mixes

      [figure: throughput (txns/sec) vs. % of transactions that read,
      for Doppel and OCC; 20 cores, transactions: LIKE read and LIKE
      write. Callouts note that with more reads there are more stashed
      read transactions, and that at the read-heavy end Doppel does
      not split any data and performs the same as OCC]
  36. RUBiS

      • Auction application modeled after eBay
        – Users bid on auctions, comment, list new items, search
      • 1M users and 33K auctions
      • 7 tables, 17 transactions
      • 85% read-only transactions (RUBiS bidding mix)
      • Two workloads:
        – Uniform distribution of bids
        – Skewed distribution of bids; a few auctions are very popular
  37. StoreBid Transaction

      StoreBidTxn(bidder, amount, item) {
        numBids := GET(NumBidsKey(item))
        PUT(NumBidsKey(item), numBids + 1)
        maxBid := GET(MaxBidKey(item))
        if amount > maxBid {
          PUT(MaxBidKey(item), amount)
          PUT(MaxBidderKey(item), bidder)
        }
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }
  38. StoreBid Transaction

      StoreBidTxn(bidder, amount, item) {
        INCR(NumBidsKey(item), 1)
        MAX(MaxBidKey(item), amount)
        OPUT(MaxBidderKey(item), bidder, amount)
        PUT(NewBidKey(), Bid{bidder, amount, item})
      }

      All the operations on potentially conflicting auction metadata
      are commutative; inserting new bids is not likely to conflict.
  39. Doppel Improves Throughput on an Application Benchmark

      [figure: throughput (millions txns/sec) for Doppel and OCC on
      uniform and skewed bid distributions; 80 cores, 1M users, 33K
      auctions, RUBiS bidding mix, 8% StoreBid transactions; on the
      skewed workload Doppel gets a 3.2x throughput improvement]
  40. Related Work

      • Concurrency control
        – Commutative concurrency control [Weihl ’88]
        – Escrow transactions [O’Neil ’86]
        – OCC [Kung ’81]
        – Silo [Tu ’13]
      • Commutativity in distributed systems
        – CRDTs [Shapiro ’11]
        – RedBlue consistency [Li ’12]
        – Walter [Lloyd ’12]
      • Scalable data structures in multicore OSes (counters, memory
        allocator)
  41. Future Work

      • Would per-key phases perform better?
      • How could we use phases with distributed transactions?
      • What other types of commutative operations can we add?
        – User-defined operations
        – State- and argument-based commutativity; for example,
          INCR(k, 0) and MULT(k, 1) both leave the value unchanged,
          so for those arguments the two operations commute
  42. Conclusion

      Doppel:
      • Achieves serializability and parallel performance when many
        transactions conflict, by combining per-core data, commutative
        operations, and concurrency control
      • Performs comparably to OCC on uniform or read-heavy workloads
        while improving performance significantly on skewed workloads

      http://pdos.csail.mit.edu/doppel
      @neha