Slide 1

Phase Reconciliation for Contended In-Memory Transactions

Neha Narula, Cody Cutler, Eddie Kohler, Robert Morris
MIT CSAIL and Harvard

Slide 2

@neha
MIT CSAIL grad student, formerly Google, graduating soon (hopefully)
http://nehanaru.la

Slide 3

IncrTxn(k Key) {
  INCR(k, 1)
}

LikePageTxn(page Key, user Key) {
  INCR(page, 1)
  liked_pages := GET(user)
  PUT(user, liked_pages + page)
}

FriendTxn(u1 Key, u2 Key) {
  PUT(friend:u1:u2, 1)
  PUT(friend:u2:u1, 1)
}

Slide 4

IncrTxn(k Key) {
  INCR(k, 1)
}

LikePageTxn(page Key, user Key) {
  INCR(page, 1)
  liked_pages := GET(user)
  PUT(user, liked_pages + page)
}

FriendTxn(u1 Key, u2 Key) {
  PUT(friend:u1:u2, 1)
  PUT(friend:u2:u1, 1)
}

Slide 5

Problem

Applications experience write contention on popular data.

Slide 6


Slide 7


Slide 8

Concurrency Control Enforces Serial Execution

(Timeline diagram: cores 0, 1, and 2 each run INCR(x,1), one after another.)

Transactions on the same records execute one at a time.

Slide 9

Throughput on a Contentious Transactional Workload

(Graph: throughput in txns/sec vs. number of cores, 0 to 80, for OCC.)

Slide 10

Throughput on a Contentious Transactional Workload

(Graph: throughput in txns/sec vs. number of cores, 0 to 80, for Doppel and OCC.)

Slide 11

INCR on the Same Records Can Execute in Parallel

(Timeline diagram: cores 0, 1, and 2 run INCR(x0,1), INCR(x1,1), and INCR(x2,1) concurrently on per-core slices of record x; x is split across cores.)

•  Transactions on the same record can proceed in parallel on per-core slices and be reconciled later.
•  This is correct because INCR commutes.
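
To make the per-core-slice idea concrete, here is a minimal Go sketch, not Doppel's actual code: each core increments only its own slice of x, and reconciliation sums the slices. The merge order does not matter because INCR commutes. (The sequential driver stands in for per-core worker threads.)

package main

import "fmt"

const ncores = 3

// splitCounter holds per-core slices of one split record x.
type splitCounter struct {
    slices [ncores]int64
}

// incr applies INCR(x, n) on the given core's slice only,
// so concurrent INCRs on x never touch shared state.
func (c *splitCounter) incr(core int, n int64) {
    c.slices[core] += n
}

// reconcile merges the slices into the global value of x.
// Any merge order yields the same sum because INCR commutes.
func (c *splitCounter) reconcile(global int64) int64 {
    for i := range c.slices {
        global += c.slices[i]
        c.slices[i] = 0 // reset for the next split phase
    }
    return global
}

func main() {
    var x splitCounter
    x.incr(0, 1) // core 0: INCR(x0, 1)
    x.incr(1, 1) // core 1: INCR(x1, 1)
    x.incr(2, 1) // core 2: INCR(x2, 1)
    fmt.Println(x.reconcile(0)) // 3, same as serial execution
}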

Slide 12

Databases Must Support General-Purpose Transactions

IncrTxn(k Key) {
  INCR(k, 1)
}

IncrPutTxn(k1 Key, k2 Key, v Value) {
  INCR(k1, 1)  // these two writes must happen atomically
  PUT(k2, v)
}

PutMaxTxn(k1 Key, k2 Key) {
  v1 := GET(k1)
  v2 := GET(k2)
  if v1 > v2:     // the reads and writes must happen atomically
    PUT(k2, v1)
  else:
    PUT(k1, v2)
  return v1, v2   // returns a value
}

Slide 13

Challenge

Fast, general-purpose serializable transaction execution with per-core slices for contended records.

Slide 14

Phase Reconciliation

Doppel, an in-memory transactional database:
•  The database automatically detects contention and splits a record among cores.
•  The database cycles through phases: split, reconciliation, and joined.

(Diagram: Split Phase → reconciliation → Joined Phase, repeating.)
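
A hedged sketch of the phase cycle in Go (illustrative names, not Doppel's API): the database runs in the joined phase until contention is detected, then cycles split → reconciliation → joined.

package main

import "fmt"

type phase int

const (
    joinedPhase phase = iota
    splitPhase
    reconciliationPhase
)

func (p phase) String() string {
    return [...]string{"joined", "split", "reconciliation"}[p]
}

// next advances the cycle: joined -> split -> reconciliation -> joined.
// A real system would enter the split phase only while some record is
// contended, and stay joined otherwise.
func next(p phase) phase {
    switch p {
    case joinedPhase:
        return splitPhase
    case splitPhase:
        return reconciliationPhase
    default:
        return joinedPhase
    }
}

func main() {
    p := splitPhase
    for i := 0; i < 3; i++ {
        fmt.Println(p) // split, reconciliation, joined
        p = next(p)
    }
}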

Slide 15

Contributions

Phase reconciliation:
–  Splittable operations
–  Efficient detection and response to contention on individual records
–  Reordering of split transactions and reads to reduce conflict
–  Fast reconciliation of split values

Slide 16

Outline
1.  Phase reconciliation
2.  Operations
3.  Detecting contention
4.  Implementation
5.  Performance evaluation

Slide 17

Split Phase

•  The split phase transforms operations on contended records (x) into operations on per-core slices (x0, x1, x2, x3).

(Diagram: cores 0–3 receive INCR(x,1), PUT(y,2), and PUT(z,1) transactions; in the split phase each INCR(x,1) becomes an INCR on that core's slice x0, x1, x2, or x3, while the PUTs are unchanged.)

Slide 18

•  Transactions can operate on split and non-split records.
•  The rest of the records (y, z) use OCC.
•  OCC ensures serializability for the non-split parts of the transaction.

(Diagram: same split-phase example as the previous slide.)

Slide 19

•  Split records have assigned operations for a given split phase.
•  A read of x cannot be processed correctly in the current state.
•  The transaction is stashed to execute after reconciliation.

(Diagram: during the split phase, a GET(x) arrives and is stashed while INCRs on the per-core slices continue.)
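
A sketch of how a split-phase worker might decide between applying an operation to its per-core slice and stashing the transaction (illustrative structure; Doppel's real dispatch also runs OCC for non-split records):

package main

import "fmt"

type op struct {
    name string // "INCR", "GET", ...
    key  string
    n    int64
}

// worker models one core's split-phase state.
type worker struct {
    slices   map[string]int64  // this core's slices of split records
    splitOps map[string]string // split record -> its assigned operation
    stash    []op              // transactions deferred to the joined phase
}

// dispatch handles one operation during the split phase and
// reports whether it could run now.
func (w *worker) dispatch(o op) bool {
    assigned, isSplit := w.splitOps[o.key]
    if !isSplit {
        return true // non-split record: ordinary OCC path (not shown)
    }
    if o.name != assigned {
        // e.g. a GET of x while x is split for INCR: per-core state
        // cannot answer it, so stash until after reconciliation.
        w.stash = append(w.stash, o)
        return false
    }
    w.slices[o.key] += o.n // apply to this core's slice of the record
    return true
}

func main() {
    w := &worker{
        slices:   map[string]int64{},
        splitOps: map[string]string{"x": "INCR"},
    }
    w.dispatch(op{"INCR", "x", 1})           // runs on the per-core slice
    w.dispatch(op{"GET", "x", 0})            // stashed for the joined phase
    fmt.Println(w.slices["x"], len(w.stash)) // 1 1
}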

Slide 20

•  All threads hear that they should reconcile their per-core state.
•  They stop processing per-core writes.

(Diagram: the split phase ends with the GET(x) still stashed.)

Slide 21

•  Reconcile state to the global store.
•  Wait until all threads have finished reconciliation.
•  Resume stashed read transactions in the joined phase.

(Diagram: in the reconciliation phase each core i merges x = x + xi; the stashed GET(x) then runs in the joined phase.)

Slide 22

•  Reconcile state to the global store.
•  Wait until all threads have finished reconciliation.
•  Resume stashed read transactions in the joined phase.

(Diagram: in the reconciliation phase each core i merges x = x + xi; the stashed GET(x) then runs in the joined phase.)
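
A minimal sketch of the reconciliation barrier, assuming one goroutine per core: each core merges its slice into the global store, and the stashed GET(x) resumes only after every core has finished.

package main

import (
    "fmt"
    "sync"
)

func main() {
    slices := []int64{1, 2, 1, 3} // per-core slices x0..x3 of record x
    var (
        mu sync.Mutex
        x  int64 // global value of x
        wg sync.WaitGroup
    )
    for core := range slices {
        wg.Add(1)
        go func(core int) {
            defer wg.Done()
            mu.Lock()
            x += slices[core] // x = x + x_i
            mu.Unlock()
        }(core)
    }
    // Wait until all threads have finished reconciliation; then the
    // joined phase begins and stashed reads see the merged value.
    wg.Wait()
    fmt.Println("GET(x) =", x) // 7
}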

Slide 23

•  Process new transactions in the joined phase using OCC.
•  No split data.

(Diagram: joined phase; cores 0–3 run GET(x) and INCR(x,1) transactions directly, using OCC.)

Slide 24

Batching Amortizes the Cost of Reconciliation

•  Wait to accumulate stashed transactions, and batch them for the joined phase.
•  Amortize the cost of reconciliation over many transactions.
•  Reads would have conflicted; now they do not.

(Diagram: a split phase with INCRs on per-core slices of x and several stashed GET(x) transactions, followed by a joined phase that runs the batched GET(x)s.)

Slide 25

Phase Reconciliation Summary

•  Many contentious writes happen in parallel in split phases.
•  Reads and any other incompatible operations happen correctly in joined phases.

Slide 26

Outline
1.  Phase reconciliation
2.  Operations
3.  Detecting contention
4.  Implementation
5.  Performance evaluation

Slide 27

Operation Model

Developers write transactions as stored procedures, which are composed of operations on keys and values:

Not splittable (traditional key/value operations):
  value GET(k)
  void  PUT(k,v)

Splittable (operations on numeric values which modify the existing value):
  void  INCR(k,n)
  void  MAX(k,n)
  void  MULT(k,n)

Splittable (ordered PUT and insert to an ordered list):
  void  OPUT(k,v,o)
  void  TOPK_INSERT(k,v,o)

Slide 28

MAX Can Be Efficiently Reconciled

(Diagram: cores 0–2 apply MAX(x0,55), MAX(x0,2), MAX(x1,10), MAX(x1,27), and MAX(x2,21) to per-core slices; the slices hold 55, 27, and 21, and reconcile to x = 55.)

•  Each core keeps one piece of state xi.
•  O(#cores) time to reconcile x.
•  The result is compatible with any order of the operations.
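
A sketch of MAX reconciliation using the slide's values: each core contributes one slice, merging takes O(#cores) time, and the result is the same for any interleaving because MAX commutes.

package main

import "fmt"

func main() {
    // Per-core slices after the split phase shown above:
    // core 0 saw MAX(x0,55) and MAX(x0,2); core 1 saw MAX(x1,10)
    // and MAX(x1,27); core 2 saw MAX(x2,21).
    slices := []int64{55, 27, 21}
    var x int64 // global value of x
    for _, xi := range slices {
        if xi > x { // MAX-merge: one comparison per core
            x = xi
        }
    }
    fmt.Println(x) // 55, compatible with any serial order
}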

Slide 29

What Operations Does Doppel Split?

Properties of operations that Doppel can split:
–  Commutative
–  Can be efficiently reconciled
–  Operate on a single key
–  Have no return value

However, only one operation per record is allowed in a given split phase.

Slide 30

Complicated Operations Can Also Commute

RESTOCK:

RestockTxn(x, y Key) {
  INCR(x)
  if GET(x) > GET(y) {
    INCR(y)
  }
}

Slide 31

RESTOCK Can Execute in the Split Phase

(Diagram: cores 0–2 run RESTOCK(c0,x,y), RESTOCK(c1,x,y), and RESTOCK(c2,x,y); a second RESTOCK on core 0 raises c0 to 2 while c1 and c2 are 1.)

•  Each core keeps one piece of state ci, a count of RESTOCK operations.
•  RESTOCK must be the only operation happening on x and y.
•  RESTOCK uses a different merge function.

Slide 32

RESTOCK’s Merge Function

RESTOCK-merge(ci int, x, y Key) {
  xval := GET(x)
  yval := GET(y)
  if xval < yval {
    PUT(x, xval + ci)
    if yval < xval + ci {
      PUT(y, xval + ci)
    }
  } else {
    PUT(x, xval + ci)
    PUT(y, yval + ci)
  }
}

For comparison, INCR’s merge function:

INCR-merge(xi int, k Key) {
  val := GET(k)
  PUT(k, val + xi)
}
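
A Go transcription of RESTOCK-merge, with a quick check (a sketch using plain integers for the store) that merging per-core counts agrees with executing the same RESTOCKs one at a time:

package main

import "fmt"

// restock is the serial semantics of one RESTOCK on (x, y),
// matching RestockTxn: increment x, and bump y if x passed it.
func restock(x, y int64) (int64, int64) {
    x++
    if x > y {
        y++
    }
    return x, y
}

// restockMerge folds ci buffered RESTOCKs from one core into the
// global (x, y), transcribing RESTOCK-merge above.
func restockMerge(ci, x, y int64) (int64, int64) {
    if x < y {
        nx := x + ci
        ny := y
        if y < nx {
            ny = nx
        }
        return nx, ny
    }
    return x + ci, y + ci
}

func main() {
    x, y := int64(3), int64(7)

    // Serial: five RESTOCKs, one at a time.
    sx, sy := x, y
    for i := 0; i < 5; i++ {
        sx, sy = restock(sx, sy)
    }

    // Split: two cores counted 2 and 3 RESTOCKs, merged afterwards.
    mx, my := restockMerge(2, x, y)
    mx, my = restockMerge(3, mx, my)

    fmt.Println(sx, sy, "==", mx, my) // 8 8 == 8 8
}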

Slide 33

Outline
1.  Phase reconciliation
2.  Operations
3.  Detecting contention
4.  Implementation
5.  Performance evaluation

Slide 34

Which Records Does Doppel Split?

•  The database starts out with no split data.
•  Count conflicts on records:
   –  Make a key split if #conflicts > conflictThreshold.
•  Count stashes on records in the split phase:
   –  Move a key back to non-split if #stashes is too high.
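
A sketch of the detection policy with illustrative names and thresholds (the slide names conflictThreshold; the stash threshold and the counter bookkeeping are assumptions):

package main

import "fmt"

const (
    conflictThreshold = 100 // named on the slide
    stashThreshold    = 100 // illustrative value
)

// tracker decides, per key, whether to split or un-split a record.
type tracker struct {
    conflicts map[string]int // conflicts observed per record
    stashes   map[string]int // stashed txns per split record
    split     map[string]bool
}

// recordConflict counts a conflict on k and splits k once the
// count exceeds conflictThreshold.
func (t *tracker) recordConflict(k string) {
    t.conflicts[k]++
    if !t.split[k] && t.conflicts[k] > conflictThreshold {
        t.split[k] = true
        t.conflicts[k] = 0
    }
}

// recordStash counts a stashed transaction on split key k and moves
// k back to non-split if too many transactions are being stashed.
func (t *tracker) recordStash(k string) {
    t.stashes[k]++
    if t.split[k] && t.stashes[k] > stashThreshold {
        t.split[k] = false
        t.stashes[k] = 0
    }
}

func main() {
    t := &tracker{map[string]int{}, map[string]int{}, map[string]bool{}}
    for i := 0; i <= conflictThreshold; i++ {
        t.recordConflict("x")
    }
    fmt.Println(t.split["x"]) // true: x is split in the next split phase
}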

Slide 35

Outline
1.  Phase reconciliation
2.  Operations
3.  Detecting contention
4.  Implementation
5.  Performance evaluation

Slide 36

Implementation

•  Doppel is implemented as a multithreaded Go server, with one worker thread per core.
•  Transactions are procedures written in Go.

Slide 37

Interesting Roadblocks at 80 Cores

•  Marking memory for GC slow
   –  https://codereview.appspot.com/100230043
•  Memory allocation
   –  Reduced allocations; turned GC way down
•  The Go scheduler sleeping/waking goroutines
   –  Tight loop; try not to block or relinquish control
•  Interfaces
•  RPC serialization

Slide 38

Outline
1.  Phase reconciliation
2.  Operations
3.  Detecting contention
4.  Implementation
5.  Performance evaluation

Slide 39

Experimental Setup

•  All experiments run on an 80-core Intel server running 64-bit Linux 3.12 with 256 GB of RAM.
•  All data fits in memory; RPC is not measured.
•  All graphs measure throughput in transactions/sec.

Slide 40

Performance Evaluation

•  How much does Doppel improve throughput on contentious write-only workloads?
•  What kinds of read/write workloads benefit?
•  Does Doppel improve throughput for a realistic application, RUBiS?

Slide 41

Doppel Executes Conflicting Workloads in Parallel

(Bar chart: throughput in millions of txns/sec, 0–35, for Doppel, OCC, and 2PL. 20 cores, 1M 16-byte keys; transaction: INCR(x,1), all on the same key.)

Slide 42

Doppel Outperforms OCC Even With Low Contention

(Graph: throughput in txns/sec, 0M–35M, vs. % of transactions with the hot key, 0–100, for Doppel and OCC. 20 cores, 1M 16-byte keys; transaction: INCR(x,1) on different keys. Annotation: 5% of writes to the contended key.)

Slide 43

Comparison to Other Systems

(Graph: throughput in txns/sec, 0M–35M, vs. % of transactions with the hot key, 0–100, for Doppel, OCC, 2PL, and atomic increments. 20 cores, 1M 16-byte keys; transaction: INCR(x,1) on different keys. Annotation: atomic increments still happen serially.)

Slide 44

Contentious Workloads Scale Well

(Graph: throughput in txns/sec, 0M–100M, vs. number of cores, 0–80, for Doppel and OCC. 1M 16-byte keys; transaction: INCR(x,1), all writing the same key. Annotation: communication of phase changing.)

Slide 45

LIKE Benchmark

•  Users liking pages on a social network.
•  2 tables: users, pages.
•  Two transactions:
   –  Increment a page's like count; insert the user's like of the page.
   –  Read a page's like count; read the user's last like.
•  1M users, 1M pages, Zipfian distribution of page popularity.

Doppel splits the page like counts for popular pages, but those counts are also read more often.

Slide 46

Benefits Even When There Are Reads and Writes to the Same Popular Keys

(Bar chart: throughput in millions of txns/sec, 0–9, for Doppel and OCC. 20 cores; transactions: 50% LIKE read, 50% LIKE write.)

Slide 47

Doppel Outperforms OCC for a Wide Range of Read/Write Mixes

(Graph: throughput in txns/sec, 0M–18M, vs. % of transactions that read, 0–100, for Doppel and OCC. 20 cores; transactions: LIKE read, LIKE write. Annotations: at high read fractions Doppel does not split any data and performs the same as OCC; in between, more stashed read transactions.)

Slide 48

RUBiS

•  Auction application modeled after eBay.
   –  Users bid on auctions, comment, list new items, and search.
•  1M users and 33K auctions.
•  7 tables, 17 transactions.
•  85% read-only transactions (the RUBiS bidding mix).
•  Two workloads:
   –  Uniform distribution of bids.
   –  Skewed distribution of bids; a few auctions are very popular.

Slide 49

StoreBid Transaction

StoreBidTxn(bidder, amount, item) {
  numBids := GET(NumBidsKey(item))
  PUT(NumBidsKey(item), numBids + 1)
  maxBid := GET(MaxBidKey(item))
  if amount > maxBid {
    PUT(MaxBidKey(item), amount)
    PUT(MaxBidderKey(item), bidder)
  }
  PUT(NewBidKey(), Bid{bidder, amount, item})
}

Slide 50

StoreBid Transaction

StoreBidTxn(bidder, amount, item) {
  INCR(NumBidsKey(item), 1)
  MAX(MaxBidKey(item), amount)
  OPUT(MaxBidderKey(item), bidder, amount)
  PUT(NewBidKey(), Bid{bidder, amount, item})
}

All of the operations on potentially conflicting auction metadata are commutative. Inserting new bids is not likely to conflict.
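
OPUT(MaxBidderKey(item), bidder, amount) commutes because the write is ordered by the bid amount: the bidder carrying the highest order wins no matter when each write runs. A sketch of that highest-order-wins rule (an assumption about OPUT's semantics based on this slide, not Doppel's code):

package main

import "fmt"

// oval is a value tagged with an order, as written by OPUT(k, v, o).
type oval struct {
    v string
    o int64
}

// oput applies OPUT semantics: the write takes effect only if its
// order exceeds the current one, so concurrent OPUTs commute.
func oput(cur oval, v string, o int64) oval {
    if o > cur.o {
        return oval{v, o}
    }
    return cur
}

func main() {
    var maxBidder oval
    // Two bids applied in either order give the same max bidder.
    a := oput(oput(maxBidder, "alice", 150), "bob", 120)
    b := oput(oput(maxBidder, "bob", 120), "alice", 150)
    fmt.Println(a, b) // {alice 150} {alice 150}
}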

Slide 51

Doppel Improves Throughput on an Application Benchmark

(Bar chart: throughput in millions of txns/sec, 0–12, for Doppel and OCC under uniform and skewed bid distributions. 80 cores, 1M users, 33K auctions, RUBiS bidding mix, 8% StoreBid transactions. Annotation: 3.2x throughput improvement on the skewed workload.)

Slide 52

Related Work

•  Concurrency control
   –  Commutative concurrency control [Weihl ’88]
   –  Escrow transactions [O’Neil ’86]
   –  OCC [Kung ’81]
   –  Silo [Tu ’13]
•  Commutativity in distributed systems
   –  CRDTs [Shapiro ’11]
   –  RedBlue consistency [Li ’12]
   –  Walter [Sovran ’11]
•  Scalable data structures in multicore OSes (counters, memory allocator)

Slide 53

Future Work

•  Do per-key phases perform better?
•  How could we use phases with distributed transactions?
•  What other types of commutative operations can we add?
   –  User-defined operations
   –  State- and argument-based commutativity, e.g. INCR(k, 0) and MULT(k, 1) always commute.

Slide 54

Conclusion

Doppel:
•  Achieves serializability and parallel performance when many transactions conflict, by combining per-core data, commutative operations, and concurrency control.
•  Performs comparably to OCC on uniform or read-heavy workloads while improving performance significantly on skewed workloads.

http://pdos.csail.mit.edu/doppel
@neha