Joseph Blomstedt (@jtuple)
Basho Technologies
Bringing Consistency To Riak (Part 2)
Tuesday, October 29, 2013
CAP Theorem
Partition-tolerance, Consistency, Availability
CP: Consistency + Partition-tolerance
AP: Availability + Partition-tolerance
Three designs: C/P, Strict Quorum A/P, Sloppy Quorum A/P
[Diagram, repeated for each design: Node 1, Node 2, Node 3, Node 4, Node 5 and three clients; the final frame shows five clients]
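The difference between the strict and sloppy quorum designs can be sketched in a few lines. This is only an illustration under stated assumptions: the node names, the choice of three primaries out of five nodes, the quorum size of 2, and both function names are hypothetical and do not come from the slides or from Riak's code.

```python
def strict_quorum_ok(primaries, reachable, quorum):
    """Succeed only if at least `quorum` primary replicas are reachable."""
    up_primaries = [n for n in primaries if n in reachable]
    return len(up_primaries) >= quorum


def sloppy_quorum_ok(primaries, all_nodes, reachable, quorum):
    """Let reachable non-primary nodes (fallbacks) stand in for unreachable primaries."""
    up_primaries = [n for n in primaries if n in reachable]
    fallbacks = [n for n in all_nodes if n in reachable and n not in primaries]
    missing = len(primaries) - len(up_primaries)
    stand_ins = fallbacks[:missing]
    return len(up_primaries) + len(stand_ins) >= quorum


nodes = ["node1", "node2", "node3", "node4", "node5"]
primaries = ["node1", "node2", "node3"]   # hypothetical replicas that own the key
reachable = {"node3", "node4", "node5"}   # what one client can see during a partition

strict_result = strict_quorum_ok(primaries, reachable, quorum=2)
sloppy_result = sloppy_quorum_ok(primaries, nodes, reachable, quorum=2)
```

In this scenario the strict check fails because only one primary is reachable, while the sloppy check succeeds because two fallback nodes stand in for the missing primaries.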
Eventual Consistency
[Diagram: replicas hold A A A; a single write of B arrives; replicas converge to B B B]
[Diagram: replicas hold A A A; writes of B and C arrive concurrently; replicas converge to {B,C} {B,C} {B,C}]
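A rough sketch of how concurrent writes end up as the set {B,C} follows. It is a deliberately simplified model: the `seen` argument stands in for the causal context that a real system tracks with vector clocks, and the function name is hypothetical.

```python
def apply_write(siblings, value, seen):
    """Return a replica's new sibling set after one write.

    `seen` is the set of values the writer had read before writing.
    Values the writer saw are superseded; values it never saw are kept
    next to the new value as siblings.
    """
    survivors = {v for v in siblings if v not in seen}
    return survivors | {value}


# Sequential case: the writer of B had read A, so B replaces A.
sequential = apply_write({"A"}, "B", seen={"A"})

# Concurrent case: both writers had read A and neither saw the other's write.
replica = {"A"}
replica = apply_write(replica, "B", seen={"A"})
replica = apply_write(replica, "C", seen={"A"})

# A later writer that has read both siblings resolves them.
resolved = apply_write(replica, "D", seen={"B", "C"})
```

The concurrent case leaves the replica holding both B and C, and the order in which the two writes arrive does not change that outcome.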
Write Once, Immutable, Last Write Wins, Business Rules, CRDTs/Monotonicity
Strong Consistency
Why?
Recency, Partial Writes, Atomicity
Eventual consistency is great
But, when is eventual?
Do I have the most recent value?
CRDTs don’t help
[Diagram: three replicas each hold counter state (a,1); value = 1]
[Diagram: one replica applies +1 and holds (a,2), value = 2; another applies +3 and holds (a,1),(b,3), value = 4]
[Diagram: after merging, all replicas hold (a,2),(b,3); value = 5]
[Diagram: before the merge, a read returns 2 or 4 depending on the replica]
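The counter in the diagram behaves like a grow-only counter keyed by actor. The sketch below reproduces the same numbers. The function names are hypothetical, and the per-actor maximum used for merging is the usual grow-only counter rule, assumed here rather than taken from the slides.

```python
def increment(state, actor, amount):
    """Return a new state with `actor`'s entry raised by `amount`."""
    new_state = dict(state)
    new_state[actor] = new_state.get(actor, 0) + amount
    return new_state


def value(state):
    """The counter's value is the sum of all per-actor entries."""
    return sum(state.values())


def merge(left, right):
    """Merge two states by taking the per-actor maximum."""
    return {actor: max(left.get(actor, 0), right.get(actor, 0))
            for actor in set(left) | set(right)}


start = {"a": 1}                       # (a,1) on every replica, value 1
replica_1 = increment(start, "a", 1)   # (a,2), value 2
replica_2 = increment(start, "b", 3)   # (a,1),(b,3), value 4
merged = merge(replica_1, replica_2)   # (a,2),(b,3), value 5
```

Both replicas converge on 5 once they merge, but nothing tells a reader of `replica_1` or `replica_2` that its answer of 2 or 4 is not yet the most recent value.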
Recency, Partial Writes, Atomicity
[Diagram: replicas hold A A A; a write of B fails partway, leaving B A A]
[Diagram: against B A A, successive reads return A, A, A, and then B]
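The partial-write problem can be reproduced with a toy replica list. This is a sketch under assumptions: the `write` helper, the required acknowledgement count `w=2`, and the order in which replicas are read are all hypothetical.

```python
def write(replicas, value, reachable, w):
    """Apply `value` on the reachable replicas.

    Reports success only if at least `w` replicas applied it.
    A failed write is not rolled back.
    """
    for index in reachable:
        replicas[index] = value
    return len(reachable) >= w


replicas = ["A", "A", "A"]
succeeded = write(replicas, "B", reachable=[0], w=2)   # reaches only one replica

# Four single-replica reads, each landing on a different or repeated replica.
reads = [replicas[index] for index in (1, 2, 1, 0)]
```

The client was told the write failed, yet a later read that happens to land on the first replica still returns B.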
Recency, Partial Writes, Atomicity
What does Strong Consistency mean for Riak 2.0?
Conditional single-key atomic operations
No siblings
get sees most recent put
get/modify/put fails if object changed (e.g. concurrent put)
puts w/o vclock fail if object exists
Partial writes resolved on read
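These rules can be modelled with a small in-memory store. The sketch is illustrative only: `ConditionalStore` is a hypothetical class, and a plain integer version stands in for the vclock that a real client would pass back.

```python
class ConditionalStore:
    """Toy single-key conditional store; an integer version stands in for a vclock."""

    def __init__(self):
        self._data = {}   # key -> (version, value)

    def get(self, key):
        """Return (value, version), or None if the key does not exist."""
        if key not in self._data:
            return None
        version, value = self._data[key]
        return value, version

    def put(self, key, value, version=None):
        """Write `value`; return False if the condition does not hold."""
        current = self._data.get(key)
        if version is None:
            # A put without a version only succeeds if the key does not exist.
            if current is not None:
                return False
            self._data[key] = (1, value)
            return True
        # A put with a version fails if the object changed since it was read.
        if current is None or current[0] != version:
            return False
        self._data[key] = (version + 1, value)
        return True


store = ConditionalStore()
created = store.put("key", "1")              # no version, key absent
blind = store.put("key", "2")                # no version, key exists
_, seen_version = store.get("key")
first = store.put("key", "2", seen_version)  # object unchanged since the read
stale = store.put("key", "22", seen_version) # object changed by the previous put
```

Only one of the two writers holding the same version can win, which is why no siblings are ever created.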
Consensus
Paxos
[Diagram: three nodes; messages in order: prepare(N); promise(N, V_B); promise(N, V_C); V_N = f(V_A, V_B, V_C); commit(N, V_N); accept(N)]
Rinse/repeat for each request
2 round trips/request
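The two round trips can be sketched as a single round of classic single-decree Paxos, using the message names from the diagram (prepare and promise, then commit and accept). The slides do not say how f chooses V_N, so the sketch assumes the standard rule: keep the value of the highest-numbered proposal already accepted, otherwise use the proposer's own value. The class and function names are hypothetical, and failures and networking are left out.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0       # highest proposal number promised so far
        self.accepted_n = 0     # proposal number of the accepted value (0 = none)
        self.accepted_v = None

    def prepare(self, n):
        """Round trip 1: promise to ignore proposals numbered below n."""
        if n <= self.promised:
            return None
        self.promised = n
        return (self.accepted_n, self.accepted_v)

    def commit(self, n, value):
        """Round trip 2: accept the value unless a higher number was promised."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted_n = n
        self.accepted_v = value
        return True


def run_round(acceptors, n, proposed):
    """Run one full round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < majority:
        return None
    prev_n, prev_v = max(promises, key=lambda promise: promise[0])
    value = prev_v if prev_n > 0 else proposed   # assumed rule for f
    acks = sum(1 for a in acceptors if a.commit(n, value))
    return value if acks >= majority else None


acceptors = [Acceptor(), Acceptor(), Acceptor()]
first_choice = run_round(acceptors, 1, "x")
second_choice = run_round(acceptors, 2, "y")   # must keep the chosen value
stale_round = run_round(acceptors, 1, "z")     # old proposal number is refused
```

Every request pays for both phases here, which is the cost that Multi-Paxos avoids in the common case.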
Multi-Paxos
First Request
[Diagram: three nodes; messages in order: prepare(N, I); promise(N, I, V_B); promise(N, I, V_C); V_N = f(V_A, V_B, V_C); commit(N, I, V_N); accept(N, I)]
Each Additional Request
[Diagram: three nodes; messages in order: commit(N, I, V); accept(N, I)]
1 round trip/request (common case)
Problem: Shipping entire state each request is expensive
Solution: Paxos + Replicated Log
Problem: Now I have N problems
Log recovery, Log trimming, Rollup, Snapshots, Fault Recovery
Choose your own adventure...
Better Solution: Build log replication into protocol
Better Solution: ZK Atomic Broadcast, Raft
Zab
Raft
raftconsensus.github.io
Back to Riak
Key/Value, Keys are independent, Active Anti-Entropy, Tunable backends
Each key is independent state
Simple multi-paxos per key
1B keys = 1B consensus groups?
No
Consensus group per preflist (replica set)
Emulate paxos per key
[Diagram: Node 0, Node 1, Node 2]
[Diagram: ring partitions 1 2 3 4 5 6 7; ensembles are consecutive triples of partitions: 123, 234, 345, 456, 567, ...]
64 partition ring = 64 ensembles
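The mapping from ring partitions to ensembles can be sketched as a sliding window over the ring. The function name, the replica count of 3, and the wrap-around at the end of the ring are assumptions, since the diagram trails off after 567.

```python
def ensembles(ring_size, n_val=3):
    """Return one ensemble (tuple of partition numbers) per starting partition.

    Partitions are numbered from 1 as in the diagram, and the window is
    assumed to wrap around the end of the ring.
    """
    return [
        tuple((start + offset) % ring_size + 1 for offset in range(n_val))
        for start in range(ring_size)
    ]


small_ring = ensembles(7)
full_ring = ensembles(64)
```

Because each partition starts exactly one window, the number of ensembles equals the number of partitions regardless of how many keys are stored.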
Each Ensemble: elects leader, establishes epoch, supports get/put ops
Establish a new epoch
[Diagram: three nodes; messages in order: prepare(N, I); promise(N, I, V_B); promise(N, I, V_C); V_N = f(V_A, V_B, V_C); commit(N, I, V_N); accept(N, I)]
Consensus state: epoch, sequence, membership, leader
K/V objects: epoch, sequence, key, value
GET
  leader reads local object
  if obj.epoch old: refresh
  reply w/ val
[Diagram, obj.epoch older than the current epoch: get(Key); reply(EpochB, SeqB, ValB); reply(EpochC, SeqC, ValC); Val = latest(ValA, ValB, ValC); Val.epoch = epoch; write(Epoch, Seq, Val); ack(Epoch, Seq)]
[Diagram, obj.epoch equal to the current epoch: Reply = local_get(Key)]
2 roundtrips/get (worst)
0 roundtrips/get (best)
PUT
  leader reads local object
  if obj.epoch old: refresh
  if modify(obj) false: fail
  commit modified obj
  reply ok
[Diagram, obj.epoch older than the current epoch: get(Key); reply(EpochB, SeqB, ValB); reply(EpochC, SeqC, ValC); Latest = latest(ValA, ValB, ValC); Val = modify(Latest); write(Epoch, Seq, Val); ack(Epoch, Seq)]
[Diagram, obj.epoch equal to the current epoch: Latest = local_get(Key); Val = modify(Latest); write(Epoch, Seq, Val); ack(Epoch, Seq)]
2 roundtrips/put (worst)
1 roundtrip/put (best)
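The GET and PUT paths can be sketched together to show where the round-trip counts come from. This is a simplified model under assumptions: `Leader` is a hypothetical class, replicas are plain dictionaries that are always reachable, "latest" is taken to mean the highest (epoch, sequence) pair, and a `modify` function signals failure by returning None.

```python
class Leader:
    """Leader of one ensemble; replicas[0] is the leader's own local store."""

    def __init__(self, replicas, epoch):
        self.replicas = replicas    # list of dicts: key -> (epoch, seq, value)
        self.epoch = epoch
        self.seq = 0
        self.round_trips = 0

    def _is_current(self, obj):
        return obj is not None and obj[0] == self.epoch

    def _read_latest(self, key):
        """One round trip: ask every replica, keep the newest (epoch, seq)."""
        self.round_trips += 1
        found = [replica[key] for replica in self.replicas if key in replica]
        return max(found, key=lambda obj: (obj[0], obj[1])) if found else None

    def _write(self, key, value):
        """One round trip: store the value under the current epoch everywhere."""
        self.round_trips += 1
        self.seq += 1
        for replica in self.replicas:
            replica[key] = (self.epoch, self.seq, value)

    def get(self, key):
        obj = self.replicas[0].get(key)
        if self._is_current(obj):
            return obj[2]                  # best case: no round trips
        latest = self._read_latest(key)    # worst case: read, then rewrite
        if latest is None:
            return None
        self._write(key, latest[2])
        return latest[2]

    def put(self, key, modify):
        obj = self.replicas[0].get(key)
        if not self._is_current(obj):
            obj = self._read_latest(key)   # worst case adds a read round trip
        new_value = modify(obj[2] if obj else None)
        if new_value is None:
            return False                   # modify rejected the update
        self._write(key, new_value)
        return True


replicas = [{"k": (1, 1, "X")} for _ in range(3)]
leader = Leader(replicas, epoch=2)
stale_get = leader.get("k")
trips_after_stale_get = leader.round_trips    # read + rewrite
fresh_get = leader.get("k")
trips_after_fresh_get = leader.round_trips    # unchanged: served locally
put_ok = leader.put("k", lambda value: value + "!")
trips_after_put = leader.round_trips          # one more for the write
```

The sketch leaves out quorum failures entirely, so it does not show the leader giving up leadership when an operation fails.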
Leader abandons leadership if any quorum operation ever fails
Which forces new epoch to be established
Partial Writes
[Diagram: failed partial write; replicas X(2) X(2) X(2) in epoch 2 become X(2) X(2) Y(2) in epoch 3]
[Diagram: read / rewrite / reply X; within epoch 3, X(2) X(2) Y(2) becomes X(3) X(3) Y(2)]
[Diagram: read / repair / reply X; within epoch 3, X(3) X(3) Y(2) becomes X(3) X(3) X(3)]
Usage
AP or CP per bucket type
consistent = true
    $ riak-admin bucket-type create strong \
        '{"props": {"consistent": true}}'
    strong created

    $ riak-admin bucket-type activate strong
    strong has been activated

    %% The key does not exist yet.
    > riakc_pb_socket:get(Socket,
                          {<<"strong">>, <<"bucket">>},
                          <<"key">>).
    {error,notfound}

    %% A put without a vclock succeeds because the object does not exist.
    > Obj = riakc_obj:new({<<"strong">>, <<"bucket">>},
                          <<"key">>,
                          <<"1">>).
    > riakc_pb_socket:put(Socket, Obj).
    ok

    %% A second put without a vclock fails because the object now exists.
    > Obj2 = riakc_obj:new({<<"strong">>, <<"bucket">>},
                           <<"key">>,
                           <<"2">>).
    > riakc_pb_socket:put(Socket, Obj2).
    {error, failed}

    %% Two updates derived from the same read: the first wins, the second fails.
    > {ok, Obj3} = riakc_pb_socket:get(Socket,
                                       {<<"strong">>, <<"bucket">>},
                                       <<"key">>).
    > Obj4 = riakc_obj:update_value(Obj3, <<"2">>).
    > Obj5 = riakc_obj:update_value(Obj3, <<"22">>).
    > riakc_pb_socket:put(Socket, Obj4).
    ok
    > riakc_pb_socket:put(Socket, Obj5).
    {error,<<"failed">>}
Your client may vary
We’re working on it
Tech Preview
No AAE syncing, No 2i, No stats
Will be in 2.0 final
Coming Soon
Datatypes, Multi-DC, Lightweight Tx?, Perf benchmarks
Questions?