$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Consistency and Riak
Search
Christopher Meiklejohn
February 25, 2015
Programming
0
140
Consistency and Riak
Christopher Meiklejohn
Riak Meetup, Paris, February 2015
Christopher Meiklejohn
February 25, 2015
Tweet
Share
More Decks by Christopher Meiklejohn
See All by Christopher Meiklejohn
Towards a Solution to the Red Wedding Problem
cmeiklejohn
0
410
Language Support for Cloud-Scale Distributed Systems
cmeiklejohn
0
550
Towards a Systems Approach to Distributed Programming
cmeiklejohn
3
370
Scaling a Startup with a 21st Century Programming Language
cmeiklejohn
0
420
Practical Evaluation of the Lasp Programming Model at Scale
cmeiklejohn
4
2.8k
Just-Right Consistency - Closing the CAP Gap
cmeiklejohn
3
280
Declarative, Convergent, Edge Computation
cmeiklejohn
1
210
Just-Right Consistency - Closing the CAP Gap
cmeiklejohn
3
1.3k
A Certain Tendency of the Database Community
cmeiklejohn
0
470
Other Decks in Programming
See All in Programming
S3 VectorsとStrands Agentsを利用したAgentic RAGシステムの構築
tosuri13
6
310
CSC509 Lecture 14
javiergs
PRO
0
220
【Streamlit x Snowflake】データ基盤からアプリ開発・AI活用まで、すべてをSnowflake内で実現
ayumu_yamaguchi
1
120
TestingOsaka6_Ozono
o3
0
130
ViewファーストなRailsアプリ開発のたのしさ
sugiwe
0
450
JETLS.jl ─ A New Language Server for Julia
abap34
1
370
ゲームの物理 剛体編
fadis
0
330
tparseでgo testの出力を見やすくする
utgwkk
1
210
堅牢なフロントエンドテスト基盤を構築するために行った取り組み
shogo4131
8
2.3k
社内オペレーション改善のためのTypeScript / TSKaigi Hokuriku 2025
dachi023
2
590
「コードは上から下へ読むのが一番」と思った時に、思い出してほしい話
panda728
PRO
38
25k
モデル駆動設計をやってみようワークショップ開催報告(Modeling Forum2025) / model driven design workshop report
haru860
0
260
Featured
See All Featured
The Language of Interfaces
destraynor
162
25k
The Hidden Cost of Media on the Web [PixelPalooza 2025]
tammyeverts
1
97
Build your cross-platform service in a week with App Engine
jlugia
234
18k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.8k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
54k
Code Review Best Practice
trishagee
74
19k
Speed Design
sergeychernyshev
33
1.4k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
BBQ
matthewcrist
89
9.9k
Automating Front-end Workflow
addyosmani
1371
200k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
35
2.3k
Stop Working from a Prison Cell
hatefulcrawdad
273
21k
Transcript
Consistency and Riak Christopher Meiklejohn Riak Meetup, Paris, February 2015
@cmeik
History
Published SOSP 2007; key-value storage system Amazon Dynamo
Focused on high-availability and low-latency Amazon Dynamo
Collection of distributed systems techniques Amazon Dynamo
LinkedIn Voldemort, Facebook Cassandra Amazon Dynamo
Released 2009; Apache2 licensed Dynamo clone Basho Riak
Riak Architecture
Consistent Hashing hash(bucket/key)
hash ring
tokenize it
node 0 node 1 node 2 hash(key)
node 0 node 1 node 2 Replicas are stored to
the N - 1 contiguous partitions
node 0 node 1 node 2 hash(companies/cisco) Replicas are stored
to the N - 1 contiguous partitions
node 0 node 1 node 2 hash(companies/cisco) Replicas are stored
to the N - 1 contiguous partitions
node 0 node 1 node 2
Scaling out node 0 node 1 node 2 node 3
+
Quorum requests N R W PR/PW DW
Vector Clocks establish temporality
None
None
Anatomy of a Request get(users/clay-davis)
Anatomy of a Request get(users/clay-davis) client Riak
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
hash(users/clay-davis) == 10, 11, 12
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
hash(users/clay-davis) == 10, 11, 12 Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
get(users/clay-davis) Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring R=2
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring R=2 obj
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
R=2 obj obj
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
R=2 obj obj
Anatomy of a Request get(users/clay-davis) obj
Read Repair (Anti-Entropy)
replica replica replica
replica replica replica X
replica replica replica replica replica replica
Active Anti-Entropy (self healing clusters)
real-time updates persistent non-blocking disk-based
merkle tree to track changes coordinated at the vnode level
runs as a background process exchange with neighbor vnodes for inconsistencies resolution semantics: trigger read-repair
= hashes marked dirty
None
None
None
None
= keys to read-repair
Riak and Consistency
Riak Object
BKey Value
Consistent hashing; dynamic membership Data Placement
None
None
None
Replication per-value across ring Data Placement
Replica Replica Replica
Take the form: {Writer, Value, Time} Concurrent writes
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] Concurrent
writes
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] [{b,
v1, t2}] [{b, v1, t2}] [{b, v1, t2}] Last Writer Wins
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] [[{a,
v1, t1}, {b, v1, t2}] [[{a, v1, t1}, {b, v1, t2}] [[{a, v1, t1}, {b, v1, t2}] Allow Mult
User specificed Merge
Two Approaches
Strong Eventual Consistency
Designed for convergence; allows divergence Conflict-free Replicated Data Types
Solves the Dynamo concurrency anomaly Conflict-free Replicated Data Types
The Theory
None
None
None
None
Two flavors: state-based and operation-based Conflict-free Replicated Data Types
Counters, Flags, Registers, Sets, Maps, Graphs Conflict-free Replicated Data Types
Broadcast update operation Operation-Based CRDTs
Commutative; relies on unique delivery Operation-Based CRDTs
Apply change locally; propagate entire state State-Based CRDTs
State is merged between replicas State-Based CRDTs
Set of all states form a bounded join-semilattice State-Based CRDTs
Partially ordered set; join operation Bounded Join-Semilattice
Associativity: (X · Y) · Z = X · (Y
· Z) Bounded Join-Semilattice
Commutativity: X · Y = Y · X Bounded Join-Semilattice
Idempotence: X · X = X Bounded Join-Semilattice
Examples Bounded Join-Semilattice
b a c a, b a, c a, b, c
Set; merge function: union. b, c
3 5 7 5 7 7 Increasing natural; merge function:
max.
F F T F T T Booleans; merge function: or.
x <= y montone f(x) <= f(y)
Examples State-Based Observed-Remove Set
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, b}], [{1, a}] ]
Strong Consistency
Provides atomicity and recency Strong Consistency
Prohibits partial writes Strong Consistency
A A A
A A A Val = B
A A A Val = B
B A A
B A A Get Operation with Read Repair
B A A Get Operation with Read Repair
B A A Get Operation with Read Repair B B
Single key atomic operations Strong Consistency
Requires read/modify/write cycle (CAS) Strong Consistency
Consensus
Distributed Consensus The problem of reaching agreement among remote processes
is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications. Fischer, Lynch, Paterson
Termination, agreement, validity The Consensus Problem
All processes eventually decide on a value Termination
All processes decide on the same value Agreement
Value decided on had to have been proposed Validity
Consensus Algorithms
Paxos, ZAB, Raft, etc. Consensus Algorithms
Coordinated requests with a chosen leader The Paxos Algorithm
Node 1 Node 2 Node 3 N++ prepare(N) promise(N, Vb)
promise(N, Vc) Vn = f(Va, Vb, Vc) commit(N, Vn) accept(N)
First request Multi-Paxos
Node 1 Node 2 Node 3 N++; I = 0
prepare(N, I) promise(N, I, Vb) promise(N, I, Vc) Vn = f(Va, Vb, Vc) commit(N, I, Vn) accept(N, I)
Each additional request Multi-Paxos
Node 1 Node 2 Node 3 I++ commit(N, I, V)
accept(N, I)
Ship entire state! Multi-Paxos
Riak
Key-value store; keys are independent state Riak
Multi-Paxos per key; CAS on isolated state Riak
Consensus Groups
Participants in decisioning; ensembles Consensus Groups
Use the preference list! Consensus Groups
preflist
None
None
None
None
One ensemble per preference list; ring size Consensus Groups
Ensembles
election of leader; get/put operations Riak Ensembles
read local; refresh, if old Get Operations
Node 1 Node 2 Node 3 obj.epoch < epoch get(key)
reply(Epochb, Seqb, Valb) Val = latest(Vala, Valb, Valc) Val.epoch = epoch write(Epoch, ++Seq, Val) ack(Epoch, Seq) reply(Epochc, Seqc, Valc)
Node 1 Node 2 Node 3 obj.epoch == epoch Reply
= local_get(Key)
Worst Case: 2 roundtrips / write Get Operations Best Case:
0 roundtrips / write
read local; refresh, modify and commit if old Put Operations
Node 1 Node 2 Node 3 obj.epoch < epoch get(key)
reply(Epochb, Seqb, Valb) Latest = latest(Vala, Valb, Valc) Val = modify(Latest) write(Epoch, ++Seq, Val) ack(Epoch, Seq) reply(Epochc, Seqc, Valc)
Node 1 Node 2 Node 3 obj.epoch == epoch Latest
= local_get(Key) Val = modify(Latest) write(Epoch, ++Seq, Val) ack(Epoch, Seq)
Worst Case: 2 roundtrips / write Put Operations Best Case:
1 roundtrips / write
Elect a new leader; start a new epoch Failed Quorums
Cluster Membership
Use joint consensus from multi paxos Dynamic Membership
Existing Ensemble Joining Ensemble riak_01 riak_02 riak_03 riak_07 riak_08 riak_09
[{riak_01}, {riak_02}, {riak_03}] [{riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
New Ensemble riak_07 riak_08 riak_09 [{riak_07}, {riak_08}, {riak_09}]
Single-key linearizability; reduced availability Strong Consistency
$ riak-admin bucket-type create strongly_consistent \ ‘{"props":{"consistent":true}}' $ riak-admin bucket-type
status strongly_consistent $ riak-admin bucket-type activate strongly_consistent Enable strong consistency; http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
Conflict-Free Replicated Data Types Strong Eventual Consistency
$ riak-admin bucket-type create maps \ '{"props":{"datatype":"map"}}' $ riak-admin bucket-type
create sets \ '{"props":{"datatype":"set"}}' $ riak-admin bucket-type create counters \ ‘{“props":{"datatype":"counter"}}' $ riak-admin bucket-type status maps $ riak-admin bucket-type activate maps Create bucket type for data types; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/counters/buckets/counters/ datatypes/traffic_tickets \ -H "Content-Type: application/json" \
-d '{"increment": 1}’ $ curl http://localhost:10018/types/counters/buckets/counters/ datatypes/traffic_tickets Operate on counters; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/ datatypes/cities \ -H "Content-Type: application/json" \
-d '{"add_all":["Toronto", “Montreal"]}' $ curl -XPOST http://localhost:10018/types/sets/buckets/travel/ datatypes/cities \ -H "Content-Type: application/json" \ -d '{"remove": “Montreal"}' $ curl http://localhost:10018/types/sets/buckets/travel/datatypes/ cities Operate on sets; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/maps/buckets/customers/ datatypes/ahmed_info \ -H "Content-Type: application/json" \
-d ' { "update": { "first_name_register": "Ahmed", "phone_number_register": "5551234567" } }' $ curl -XPOST http://localhost:8098/types/maps/buckets/customers/ datatypes/ahmed_info \ -H "Content-Type: application/json" \ -d ' { "update": { "annika_info_map": { "update": { "interests_set": { "add": "tango dancing" } } } } } ' Operate on maps; http://docs.basho.com/riak/latest/dev/using/data-types/
Questions?