Consistency and Riak
Christopher Meiklejohn
February 25, 2015
Riak Meetup, Paris, February 2015
Transcript
Consistency and Riak Christopher Meiklejohn Riak Meetup, Paris, February 2015
@cmeik
History
Amazon Dynamo: published at SOSP 2007; key-value storage system
Amazon Dynamo: focused on high availability and low latency
Amazon Dynamo: collection of distributed systems techniques
Amazon Dynamo: LinkedIn Voldemort, Facebook Cassandra
Basho Riak: released 2009; Apache 2 licensed Dynamo clone
Riak Architecture
Consistent Hashing: hash(bucket/key)
hash ring
tokenize it
node 0, node 1, node 2: hash(key)
node 0, node 1, node 2: hash(companies/cisco); replicas are stored to the N - 1 contiguous partitions
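The placement rule on these slides can be sketched in a few lines of Python. This is an illustration only, not Riak's implementation: the ring size, the SHA-1 hash, the round-robin `owner` assignment, and all function names are assumptions made for the example.

```python
import hashlib

RING_SIZE = 8                          # partitions on the ring (illustrative value)
N_VAL = 3                              # replicas per key, the "N" on the slides
NODES = ["node 0", "node 1", "node 2"]


def partition_for(bucket: str, key: str, ring_size: int = RING_SIZE) -> int:
    """Hash bucket/key into a 160-bit space and return the partition it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return (position * ring_size) >> 160


def preference_list(bucket: str, key: str, n_val: int = N_VAL,
                    ring_size: int = RING_SIZE) -> list:
    """First partition plus the N - 1 contiguous partitions that follow it."""
    first = partition_for(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n_val)]


def owner(partition: int, nodes: list = NODES) -> str:
    """Hypothetical round-robin assignment of partitions to nodes."""
    return nodes[partition % len(nodes)]


if __name__ == "__main__":
    for p in preference_list("companies", "cisco"):
        print(p, owner(p))
```

With round-robin ownership and a ring size that is not a multiple of the node count, a preference list that wraps around the ring can land on the same node twice; the sketch does not try to avoid that.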
Scaling out: node 0, node 1, node 2 + node 3
Quorum requests: N, R, W, PR/PW, DW
Vector Clocks establish temporality
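As a rough illustration of how vector clocks order events, the sketch below represents a clock as a dictionary from actor name to counter. The representation and the function names are assumptions for the example, not Riak's internal format.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: equal, after, before, or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"


if __name__ == "__main__":
    base = increment({}, "a")
    left = increment(base, "a")     # actor a writes again
    right = increment(base, "b")    # actor b writes from the same ancestor
    print(compare(left, base))      # left descends from base
    print(compare(left, right))     # neither descends from the other
```

Two writes whose clocks are concurrent are the case that the later slides on Last Writer Wins and Allow Mult have to resolve.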
Anatomy of a Request: get(users/clay-davis)
client -> Riak: get(users/clay-davis)
Get Handler (FSM)
hash(users/clay-davis) == 10, 11, 12
Coordinating node; Cluster; The Ring: 6 7 8 9 10 11 12 13 14 15 16
Coordinating node -> The Ring: get(users/clay-davis)
R=2
R=2: obj
R=2: obj, obj
Riak -> client: obj
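The request flow above can be approximated with a small coordinator function. This is a sketch under stated assumptions: the `Vnode` and `Obj` classes, the integer `version` standing in for a vector clock, and the sequential loop are all hypothetical simplifications. The slides name the coordinator a Get Handler FSM; the sketch does not model the FSM.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    value: str
    version: int          # hypothetical stand-in for a vector clock


class Vnode:
    """Toy replica for one partition."""

    def __init__(self, partition: int, up: bool = True):
        self.partition = partition
        self.up = up
        self.store = {}

    def get(self, key: str):
        if not self.up:
            raise ConnectionError(f"partition {self.partition} unreachable")
        return self.store.get(key)   # None models "not found"


def coordinate_get(vnodes: list, key: str, r: int = 2):
    """Ask the replicas in the preference list and answer once R have replied."""
    replies = []
    for vnode in vnodes:
        try:
            replies.append(vnode.get(key))
        except ConnectionError:
            continue
        if len(replies) >= r:
            break
    if len(replies) < r:
        raise RuntimeError("read quorum not met")
    found = [obj for obj in replies if obj is not None]
    return max(found, key=lambda obj: obj.version) if found else None
```

Here a "not found" reply counts toward R, which is an assumption of the sketch.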
Read Repair (Anti-Entropy)
[Diagram: three replicas; one replica marked X; two rows of three replicas]
Active Anti-Entropy (self healing clusters)
real-time updates; persistent; non-blocking; disk-based
Merkle tree to track changes; coordinated at the vnode level
runs as a background process; exchange with neighbor vnodes for inconsistencies; resolution semantics: trigger read-repair
[Diagram legend: hashes marked dirty; keys to read-repair]
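The idea of comparing hashes to find keys that need read-repair can be sketched with a two-level hash tree. This is a simplified illustration, not Riak's on-disk structure: the segment count, SHA-256, the in-memory dictionaries, and the function names are all assumptions.

```python
import hashlib

SEGMENTS = 4   # number of leaf segments (illustrative value)


def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def segment_of(key: str) -> int:
    """Assign a key to a leaf segment by hashing it."""
    return int(_h(key.encode("utf-8")), 16) % SEGMENTS


def build_tree(store: dict) -> dict:
    """Hash every value, then every segment, then the root over all segments."""
    leaves = {}
    for key, value in store.items():
        leaves.setdefault(segment_of(key), {})[key] = _h(value.encode("utf-8"))
    segments = [
        _h(repr(sorted(leaves.get(i, {}).items())).encode("utf-8"))
        for i in range(SEGMENTS)
    ]
    return {"root": _h("".join(segments).encode("utf-8")),
            "segments": segments, "leaves": leaves}


def keys_to_repair(store_a: dict, store_b: dict) -> set:
    """Descend only into segments whose hashes differ and collect the keys."""
    tree_a, tree_b = build_tree(store_a), build_tree(store_b)
    if tree_a["root"] == tree_b["root"]:
        return set()
    differing = set()
    for i in range(SEGMENTS):
        if tree_a["segments"][i] == tree_b["segments"][i]:
            continue
        leaf_a = tree_a["leaves"].get(i, {})
        leaf_b = tree_b["leaves"].get(i, {})
        for key in leaf_a.keys() | leaf_b.keys():
            if leaf_a.get(key) != leaf_b.get(key):
                differing.add(key)
    return differing
```

Each key returned would then be read through the normal get path so that read-repair brings the replicas back in line.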
Riak and Consistency
Riak Object
BKey Value
Data Placement: consistent hashing; dynamic membership
Data Placement: replication per value across the ring
[Diagram: Replica, Replica, Replica]
Concurrent writes: take the form {Writer, Value, Time}
Concurrent writes: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}]
Last Writer Wins: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] becomes [{b, v1, t2}] [{b, v1, t2}] [{b, v1, t2}]
Allow Mult: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] becomes [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}]
User-specified merge
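The two outcomes on these slides can be reproduced with a short sketch. It treats every distinct write as concurrent and uses integer timestamps; the `Write` type and function names are assumptions for the example.

```python
from typing import NamedTuple


class Write(NamedTuple):
    writer: str
    value: str
    time: int


def last_writer_wins(replica_states: list) -> list:
    """Keep only the write with the highest timestamp; the others are discarded."""
    all_writes = [w for state in replica_states for w in state]
    return [max(all_writes, key=lambda w: w.time)]


def allow_mult(replica_states: list) -> list:
    """Keep every distinct write as a sibling for the application to merge."""
    siblings = []
    for state in replica_states:
        for write in state:
            if write not in siblings:
                siblings.append(write)
    return siblings


if __name__ == "__main__":
    replicas = [[Write("a", "v1", 1)], [Write("b", "v1", 2)], [Write("a", "v1", 1)]]
    print(last_writer_wins(replicas))
    print(allow_mult(replicas))
```

Last Writer Wins silently drops the write from `a`, while Allow Mult keeps both and leaves the merge to the user.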
Two Approaches
Strong Eventual Consistency
Conflict-free Replicated Data Types: designed for convergence; allows divergence
Conflict-free Replicated Data Types: solves the Dynamo concurrency anomaly
The Theory
Conflict-free Replicated Data Types: two flavors, state-based and operation-based
Conflict-free Replicated Data Types: Counters, Flags, Registers, Sets, Maps, Graphs
Operation-Based CRDTs: broadcast update operation
Operation-Based CRDTs: commutative; relies on unique delivery
State-Based CRDTs: apply change locally; propagate entire state
State-Based CRDTs: state is merged between replicas
State-Based CRDTs: set of all states forms a bounded join-semilattice
Bounded Join-Semilattice: partially ordered set; join operation
Bounded Join-Semilattice: associativity: (X · Y) · Z = X · (Y · Z)
Bounded Join-Semilattice: commutativity: X · Y = Y · X
Bounded Join-Semilattice: idempotence: X · X = X
Bounded Join-Semilattice: examples
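Set; merge function: union. [Lattice diagram: a, b, c; then (a, b), (a, c), (b, c); then (a, b, c)]
Increasing natural; merge function: max. [Lattice diagram: 3, 5, 7; then 5, 7; then 7]
Booleans; merge function: or. [Lattice diagram: F, F, T; then F, T; then T]
Monotone: x <= y implies f(x) <= f(y)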
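The three laws and the three example merge functions can be checked by brute force over small samples. The helper names below are made up for this sketch; it tests the properties on the given samples only and proves nothing in general.

```python
from itertools import product


def is_semilattice_join(join, samples) -> bool:
    """Check associativity, commutativity, and idempotence on the samples."""
    for x, y, z in product(samples, repeat=3):
        if join(join(x, y), z) != join(x, join(y, z)):
            return False
        if join(x, y) != join(y, x):
            return False
        if join(x, x) != x:
            return False
    return True


def is_monotone(f, leq_in, leq_out, samples) -> bool:
    """Check that x <= y implies f(x) <= f(y) on the samples."""
    return all(leq_out(f(x), f(y))
               for x, y in product(samples, repeat=2) if leq_in(x, y))


def set_union(x: frozenset, y: frozenset) -> frozenset:
    return x | y


def boolean_or(x: bool, y: bool) -> bool:
    return x or y


if __name__ == "__main__":
    sets = [frozenset(), frozenset("a"), frozenset("b"), frozenset("ab")]
    print(is_semilattice_join(set_union, sets))           # union of sets
    print(is_semilattice_join(max, [3, 5, 7]))            # increasing naturals
    print(is_semilattice_join(boolean_or, [False, True])) # booleans
    print(is_semilattice_join(lambda x, y: x + y, [1, 2, 3]))  # addition is not idempotent
```

Addition fails the idempotence check, which is one way to see why a counter cannot be merged by simply summing replica states.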
State-Based Observed-Remove Set: examples
Example 1 (two replicas per row):
[ [{1, a}], [] ]    [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ]    [ [{1, a}], [{1, a}] ]
[ [{1, a}, {2, a}], [{1, a}] ]    [ [{1, a}, {2, a}], [{1, a}] ]
Example 2 (two replicas per row):
[ [{1, a}], [] ]    [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ]    [ [{1, a}], [{1, a}] ]
[ [{1, a}, {2, b}], [{1, a}] ]
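The states on these slides can be replayed with a small sketch. It assumes each state is a pair of an add set and a remove set whose entries are {Tag, Element} pairs, and that tags are unique per add; the class and method names are made up for the example.

```python
class ORSet:
    """State-based observed-remove set: (adds, removes) of (tag, element) pairs."""

    def __init__(self, adds=frozenset(), removes=frozenset()):
        self.adds = frozenset(adds)
        self.removes = frozenset(removes)

    def add(self, tag, element) -> "ORSet":
        """Record a new tagged add; the caller must supply a unique tag."""
        return ORSet(self.adds | {(tag, element)}, self.removes)

    def remove(self, element) -> "ORSet":
        """Move every observed tag for the element into the remove set."""
        observed = {(t, e) for (t, e) in self.adds if e == element}
        return ORSet(self.adds, self.removes | observed)

    def merge(self, other: "ORSet") -> "ORSet":
        """Join of two states: union of adds and union of removes."""
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self) -> set:
        """Elements with at least one add that has not been removed."""
        return {e for (t, e) in self.adds - self.removes}


if __name__ == "__main__":
    base = ORSet().add(1, "a")
    # Example 1: remove a, then add a again under a fresh tag.
    print(base.remove("a").add(2, "a").value())
    # Example 2: one replica adds b while the other removes a, then they merge.
    print(base.add(2, "b").merge(base.remove("a")).value())
```

Because a remove only covers the tags it has observed, a later or concurrent add under a new tag survives the merge.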
Strong Consistency
Strong Consistency: provides atomicity and recency
Strong Consistency: prohibits partial writes
A A A
A A A; Val = B
B A A
B A A; Get Operation with Read Repair
B A A; Get Operation with Read Repair; B B
Strong Consistency: single-key atomic operations
Strong Consistency: requires read/modify/write cycle (CAS)
Consensus
Distributed Consensus: "The problem of reaching agreement among remote processes is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications." Fischer, Lynch, Paterson
The Consensus Problem: termination, agreement, validity
Termination: all processes eventually decide on a value
Agreement: all processes decide on the same value
Validity: value decided on had to have been proposed
Consensus Algorithms
Consensus Algorithms: Paxos, ZAB, Raft, etc.
The Paxos Algorithm: coordinated requests with a chosen leader
Node 1, Node 2, Node 3: N++; prepare(N); promise(N, Vb); promise(N, Vc); Vn = f(Va, Vb, Vc); commit(N, Vn); accept(N)
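The message sequence on the slide can be walked through with an in-memory sketch of a single round. It borrows the slide's message names (prepare, promise, commit, accept). The slide leaves f unspecified; the sketch assumes the usual Paxos choice of the value accepted under the highest ballot, falling back to the leader's proposal. The `Node` class and `run_round` function are hypothetical, and there is no networking, failure handling, or persistence.

```python
class Node:
    def __init__(self, name: str):
        self.name = name
        self.promised = 0       # highest ballot number promised so far
        self.accepted = None    # (ballot, value) last accepted, if any

    def prepare(self, n: int):
        """Promise to ignore ballots below n and report any accepted value."""
        if n > self.promised:
            self.promised = n
            return ("promise", n, self.accepted)
        return ("nack", self.promised, None)

    def commit(self, n: int, value):
        """Accept the value unless a higher ballot has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return ("accept", n)
        return ("nack", self.promised)


def run_round(leader: Node, followers: list, n: int, proposed):
    """One round: prepare, collect promises, pick Vn, commit, count accepts."""
    nodes = [leader] + followers
    majority = len(nodes) // 2 + 1
    promises = [node.prepare(n) for node in nodes]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) < majority:
        return None
    previously_accepted = [p[2] for p in granted if p[2] is not None]
    if previously_accepted:
        vn = max(previously_accepted, key=lambda bv: bv[0])[1]
    else:
        vn = proposed
    acks = [node.commit(n, vn) for node in nodes]
    if sum(1 for ack in acks if ack[0] == "accept") < majority:
        return None
    return vn
```

A second round with a higher ballot returns the value that was already accepted rather than its own proposal, and a round with a stale ballot gets no promises and returns `None`.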
Multi-Paxos: first request
Node 1, Node 2, Node 3: N++; I = 0; prepare(N, I); promise(N, I, Vb); promise(N, I, Vc); Vn = f(Va, Vb, Vc); commit(N, I, Vn); accept(N, I)
Multi-Paxos: each additional request
Node 1, Node 2, Node 3: I++; commit(N, I, V); accept(N, I)
Multi-Paxos: ship entire state!
Riak
Riak: key-value store; keys are independent state
Riak: Multi-Paxos per key; CAS on isolated state
Consensus Groups
Consensus Groups: participants in decisioning; ensembles
Consensus Groups: use the preference list!
preflist
Consensus Groups: one ensemble per preference list; ring size
Ensembles
Riak Ensembles: election of leader; get/put operations
Get Operations: read local; refresh, if old
Node 1, Node 2, Node 3; obj.epoch < epoch: get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Val = latest(Vala, Valb, Valc); Val.epoch = epoch; write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Node 1, Node 2, Node 3; obj.epoch == epoch: Reply = local_get(Key)
Get Operations: worst case 2 roundtrips / write; best case 0 roundtrips / write
Put Operations: read local; refresh if old; modify and commit
Node 1, Node 2, Node 3; obj.epoch < epoch: get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Latest = latest(Vala, Valb, Valc); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Node 1, Node 2, Node 3; obj.epoch == epoch: Latest = local_get(Key); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Put Operations: worst case 2 roundtrips / write; best case 1 roundtrip / write
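The get and put paths above, and their roundtrip counts, can be traced with a toy leader. This sketch is built from the slide's message names only: the `Peer`, `Leader`, and `Obj` classes are hypothetical, every follower is assumed reachable, and quorum counting and leader election are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    epoch: int
    seq: int
    value: object


class Peer:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def write(self, key, obj: Obj):
        self.store[key] = obj
        return (obj.epoch, obj.seq)            # ack(Epoch, Seq)


class Leader(Peer):
    def __init__(self, followers: list, epoch: int):
        super().__init__()
        self.followers = followers
        self.epoch = epoch
        self.seq = 0
        self.roundtrips = 0                    # counted for illustration

    def _latest(self, key):
        """One roundtrip: collect replies and keep the newest (epoch, seq)."""
        replies = [self.get(key)] + [f.get(key) for f in self.followers]
        self.roundtrips += 1
        found = [obj for obj in replies if obj is not None]
        return max(found, key=lambda o: (o.epoch, o.seq)) if found else None

    def _commit(self, key, value) -> Obj:
        """One roundtrip: write(Epoch, ++Seq, Val) to every peer."""
        self.seq += 1
        obj = Obj(self.epoch, self.seq, value)
        self.write(key, obj)
        for follower in self.followers:
            follower.write(key, obj)
        self.roundtrips += 1
        return obj

    def get_op(self, key):
        local = self.get(key)
        if local is not None and local.epoch == self.epoch:
            return local.value                 # best case: local read
        latest = self._latest(key)
        if latest is None:
            return None
        return self._commit(key, latest.value).value   # rewrite in this epoch

    def put_op(self, key, modify):
        local = self.get(key)
        if local is not None and local.epoch == self.epoch:
            latest = local
        else:
            latest = self._latest(key)
        current = latest.value if latest is not None else None
        return self._commit(key, modify(current)).value
```

A leader whose local copy is from an older epoch pays for a refresh and a rewrite; once the copy carries the current epoch, gets are served locally and puts need only the write.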
Failed Quorums: elect a new leader; start a new epoch
Cluster Membership
Dynamic Membership: use joint consensus from Multi-Paxos
Existing Ensemble: riak_01, riak_02, riak_03: [{riak_01}, {riak_02}, {riak_03}]
Joining Ensemble: riak_07, riak_08, riak_09: [{riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble: [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
New Ensemble: riak_07, riak_08, riak_09: [{riak_07}, {riak_08}, {riak_09}]
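One common reading of joint consensus is that, during the transition, a decision needs a majority in both the old and the new member sets. The sketch below encodes that reading; whether Riak applies exactly this rule is an assumption here, and the function names are made up for the example.

```python
def has_majority(acks: set, members: list) -> bool:
    """True if more than half of the members have acknowledged."""
    return len(acks & set(members)) > len(members) // 2


def joint_quorum(acks: set, views: list) -> bool:
    """During a membership change, require a majority in every active view."""
    return all(has_majority(acks, view) for view in views)


if __name__ == "__main__":
    existing = ["riak_01", "riak_02", "riak_03"]
    joining = ["riak_07", "riak_08", "riak_09"]
    # Joint phase: both views are active.
    print(joint_quorum({"riak_01", "riak_02", "riak_07"}, [existing, joining]))
    print(joint_quorum({"riak_01", "riak_02", "riak_07", "riak_08"}, [existing, joining]))
    # After the transition only the new ensemble counts.
    print(joint_quorum({"riak_07", "riak_08"}, [joining]))
```

Requiring both majorities means neither the old nor the new ensemble can decide alone while the change is in flight.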
Strong Consistency: single-key linearizability; reduced availability
Enable strong consistency; http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
$ riak-admin bucket-type create strongly_consistent \
    '{"props":{"consistent":true}}'
$ riak-admin bucket-type status strongly_consistent
$ riak-admin bucket-type activate strongly_consistent
Strong Eventual Consistency: Conflict-Free Replicated Data Types
Create bucket type for data types; http://docs.basho.com/riak/latest/dev/using/data-types/
$ riak-admin bucket-type create maps \
    '{"props":{"datatype":"map"}}'
$ riak-admin bucket-type create sets \
    '{"props":{"datatype":"set"}}'
$ riak-admin bucket-type create counters \
    '{"props":{"datatype":"counter"}}'
$ riak-admin bucket-type status maps
$ riak-admin bucket-type activate maps
Operate on counters; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets \
    -H "Content-Type: application/json" \
    -d '{"increment": 1}'
$ curl http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets
Operate on sets; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"add_all":["Toronto", "Montreal"]}'
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"remove": "Montreal"}'
$ curl http://localhost:10018/types/sets/buckets/travel/datatypes/cities
Operate on maps; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "first_name_register": "Ahmed",
        "phone_number_register": "5551234567"
      }
    }'
$ curl -XPOST http://localhost:8098/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "annika_info_map": {
          "update": {
            "interests_set": {
              "add": "tango dancing"
            }
          }
        }
      }
    }
    '
Questions?