Consistency and Riak
Christopher Meiklejohn
February 25, 2015
Riak Meetup, Paris, February 2015
Transcript
Consistency and Riak Christopher Meiklejohn Riak Meetup, Paris, February 2015
@cmeik
History
Amazon Dynamo: published at SOSP 2007; key-value storage system
Amazon Dynamo: focused on high availability and low latency
Amazon Dynamo: collection of distributed systems techniques
Amazon Dynamo: LinkedIn Voldemort, Facebook Cassandra
Basho Riak: released 2009; Apache 2 licensed Dynamo clone
Riak Architecture
Consistent Hashing hash(bucket/key)
hash ring
tokenize it
node 0, node 1, node 2: hash(key)
node 0, node 1, node 2: hash(companies/cisco). Replicas are stored on the next N - 1 contiguous partitions.
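The placement rule on the slides above can be sketched roughly as follows. This is a minimal illustration, not Riak's implementation: the SHA-1 ring, the 64-partition ring size, the round-robin ownership rule, and the function names are all assumptions made for the example.

```python
import hashlib

RING_SIZE = 2 ** 160  # SHA-1 output space, assumed here for illustration


def partition_for(bucket: str, key: str, num_partitions: int) -> int:
    """Map bucket/key onto one of num_partitions equal slices of the ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // num_partitions)


def preference_list(bucket: str, key: str, num_partitions: int, n_val: int) -> list:
    """The partition the key hashes to, plus the next N - 1 partitions."""
    first = partition_for(bucket, key, num_partitions)
    return [(first + i) % num_partitions for i in range(n_val)]


def owner(partition: int, nodes: list) -> str:
    """Round-robin partition ownership (a simplifying assumption)."""
    return nodes[partition % len(nodes)]


nodes = ["node 0", "node 1", "node 2"]
preflist = preference_list("companies", "cisco", num_partitions=64, n_val=3)
replica_nodes = [owner(p, nodes) for p in preflist]
```

Because partitions are contiguous and ownership is interleaved, consecutive partitions usually land on different nodes, which is what makes the N replicas useful for availability.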
Scaling out: node 0, node 1, node 2 + node 3
Quorum requests: N, R, W, PR/PW, DW
Vector Clocks establish temporality
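As a rough sketch of how vector clocks establish ordering, the following compares two clocks and reports whether one descends from the other or whether they are concurrent. Clocks are modelled as plain dicts from actor name to counter; the function names are hypothetical and not Riak's API.

```python
def increment(clock: dict, actor: str) -> dict:
    """Return a copy of the clock with the actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a: dict, b: dict) -> bool:
    """True when clock a has seen every event recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: dict, b: dict) -> str:
    """Classify the relationship between two clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


base = increment({}, "a")        # {"a": 1}
left = increment(base, "a")      # {"a": 2}
right = increment(base, "b")     # {"a": 1, "b": 1}
```

Here `left` and `right` both descend from `base` but not from each other, so they are concurrent and neither write can safely be discarded.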
Anatomy of a Request: get(users/clay-davis)
client, Riak, Get Handler (FSM)
hash(users/clay-davis) == 10, 11, 12
Coordinating node, Cluster, The Ring: 6 7 8 9 10 11 12 13 14 15 16
get(users/clay-davis)
R=2: obj, obj
get(users/clay-davis): obj
Read Repair (Anti-Entropy)
replica replica replica
replica replica replica X
replica replica replica replica replica replica
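A minimal sketch of read repair, under simplifying assumptions: each replica is a dict mapping a key to a (version, value) pair, and a plain integer version stands in for the vector clock a real system would compare. The function name is hypothetical.

```python
def read_with_repair(replicas: list, key: str):
    """Read a key from every replica, return the newest value, and
    overwrite any replica that is missing the object or holds an older one."""
    replies = [(replica, replica.get(key)) for replica in replicas]
    found = [obj for _, obj in replies if obj is not None]
    if not found:
        return None
    newest = max(found, key=lambda obj: obj[0])  # highest version wins
    for replica, obj in replies:
        if obj is None or obj[0] < newest[0]:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


replicas = [{"k": (2, "new")}, {"k": (1, "old")}, {}]
result = read_with_repair(replicas, "k")
```

After the call, all three replicas hold the same object, which mirrors the last slide in the sequence above.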
Active Anti-Entropy (self healing clusters)
real-time updates; persistent; non-blocking; disk-based
Merkle tree to track changes; coordinated at the vnode level
runs as a background process; exchange with neighbor vnodes for inconsistencies; resolution semantics: trigger read-repair
= hashes marked dirty
= keys to read-repair
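The exchange can be illustrated with a deliberately shallow hash tree: keys are grouped into segments, each segment is hashed, and only segments whose hashes differ are compared key by key. This is a sketch under assumptions (two levels, SHA-256, in-memory dicts, hypothetical function names), not the on-disk structure Riak uses.

```python
import hashlib


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_tree(store: dict, segments: int = 8) -> dict:
    """Hash every value, group keys into segments, then hash each segment."""
    leaves = [dict() for _ in range(segments)]
    for key, value in store.items():
        index = int(h(key.encode()), 16) % segments
        leaves[index][key] = h(value.encode())
    segment_hashes = [h(repr(sorted(leaf.items())).encode()) for leaf in leaves]
    root = h("".join(segment_hashes).encode())
    return {"root": root, "segments": segment_hashes, "leaves": leaves}


def keys_to_repair(local: dict, remote: dict) -> set:
    """Walk only the segments whose hashes differ and collect differing keys."""
    if local["root"] == remote["root"]:
        return set()
    differing = set()
    for i, (a, b) in enumerate(zip(local["segments"], remote["segments"])):
        if a != b:
            mine, theirs = local["leaves"][i], remote["leaves"][i]
            for key in mine.keys() | theirs.keys():
                if mine.get(key) != theirs.get(key):
                    differing.add(key)
    return differing


vnode_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
vnode_b = {"k1": "v1", "k2": "stale"}
to_repair = keys_to_repair(build_tree(vnode_a), build_tree(vnode_b))
```

The resulting key set is what would be handed to read repair, matching the "keys to read-repair" slide.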
Riak and Consistency
Riak Object
BKey Value
Data Placement: consistent hashing; dynamic membership
Data Placement: replication per-value across ring
Replica Replica Replica
Concurrent writes take the form: {Writer, Value, Time}
Concurrent writes: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}]
Last Writer Wins: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] becomes [{b, v1, t2}] [{b, v1, t2}] [{b, v1, t2}]
Allow Mult: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] becomes [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}]
User-specified merge
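The three resolution strategies above can be sketched on (writer, value, time) tuples. The function names are hypothetical, and the wall-clock style integer timestamp is a simplification for the example.

```python
from typing import Callable, List, Tuple

Entry = Tuple[str, str, int]  # (writer, value, time)


def last_writer_wins(siblings: List[Entry]) -> List[Entry]:
    """Keep only the entry with the latest timestamp; others are dropped."""
    return [max(siblings, key=lambda entry: entry[2])]


def allow_mult(left: List[Entry], right: List[Entry]) -> List[Entry]:
    """Keep every distinct concurrent entry as a sibling."""
    merged = list(left)
    for entry in right:
        if entry not in merged:
            merged.append(entry)
    return merged


def user_merge(siblings: List[Entry], resolve: Callable[[List[str]], str]) -> str:
    """Hand all sibling values to an application-supplied merge function."""
    return resolve([value for _, value, _ in siblings])


write_a = ("a", "v1", 1)
write_b = ("b", "v2", 2)
siblings = allow_mult([write_a], [write_b])
```

Last writer wins silently discards `write_a`, while `allow_mult` preserves both entries and leaves the decision to a merge such as `user_merge(siblings, lambda values: "+".join(sorted(values)))`.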
Two Approaches
Strong Eventual Consistency
Conflict-free Replicated Data Types: designed for convergence; allows divergence
Conflict-free Replicated Data Types: solves the Dynamo concurrency anomaly
The Theory
Conflict-free Replicated Data Types: two flavors, state-based and operation-based
Conflict-free Replicated Data Types: Counters, Flags, Registers, Sets, Maps, Graphs
Operation-Based CRDTs: broadcast update operation
Operation-Based CRDTs: commutative; relies on unique delivery
State-Based CRDTs: apply change locally; propagate entire state
State-Based CRDTs: state is merged between replicas
State-Based CRDTs: set of all states forms a bounded join-semilattice
Bounded Join-Semilattice: partially ordered set; join operation
Bounded Join-Semilattice, associativity: (X · Y) · Z = X · (Y · Z)
Bounded Join-Semilattice, commutativity: X · Y = Y · X
Bounded Join-Semilattice, idempotence: X · X = X
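Bounded Join-Semilattice: examples
Set; merge function: union. Diagram: a, b, c at the bottom; {a, b}, {a, c}, {b, c} above; {a, b, c} at the top.
Increasing natural; merge function: max. Diagram: 3, 5, 7 merge to 5, 7, then to 7.
Booleans; merge function: or. Diagram: F, F, T merge to F, T, then to T.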
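The three laws can be checked mechanically for the example merge functions above. The helper below is a hypothetical name and only tests the laws over a finite sample, so it is evidence rather than proof.

```python
from itertools import product


def satisfies_join_laws(merge, samples) -> bool:
    """Check associativity, commutativity and idempotence over the samples."""
    associative = all(
        merge(merge(x, y), z) == merge(x, merge(y, z))
        for x, y, z in product(samples, repeat=3)
    )
    commutative = all(
        merge(x, y) == merge(y, x) for x, y in product(samples, repeat=2)
    )
    idempotent = all(merge(x, x) == x for x in samples)
    return associative and commutative and idempotent


sets = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("ac")]
naturals = [3, 5, 7]
booleans = [False, True]

union_ok = satisfies_join_laws(lambda x, y: x | y, sets)
max_ok = satisfies_join_laws(max, naturals)
or_ok = satisfies_join_laws(lambda x, y: x or y, booleans)
sum_ok = satisfies_join_laws(lambda x, y: x + y, naturals)
```

Addition is included as a counterexample: it is associative and commutative but not idempotent, so re-delivering the same state would change the result.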
Monotone: x <= y implies f(x) <= f(y)
Examples State-Based Observed-Remove Set
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, b}], [{1, a}] ]
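The two walkthroughs above can be reproduced with a small state-based observed-remove set. This is a sketch that follows the [adds, removes] layout on the slides; the class name is hypothetical, and unique tags are supplied by the caller, which is an assumption made to keep the example deterministic.

```python
class ORSet:
    """State is a pair of sets of (tag, element): adds and removes."""

    def __init__(self, adds=(), removes=()):
        self.adds = set(adds)
        self.removes = set(removes)

    def add(self, tag, element):
        self.adds.add((tag, element))

    def remove(self, element):
        # Only the tags this replica has observed are removed.
        self.removes |= {(t, e) for (t, e) in self.adds if e == element}

    def merge(self, other):
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self):
        return {e for (_, e) in self.adds - self.removes}


# First walkthrough: remove a, then add a again under a fresh tag.
r1, r2 = ORSet([(1, "a")]), ORSet([(1, "a")])
r1.remove("a")
r1.add(2, "a")
first = r1.merge(r2)

# Second walkthrough: one replica adds b while the other removes a.
r3, r4 = ORSet([(1, "a")]), ORSet([(1, "a")])
r3.add(2, "b")
r4.remove("a")
second = r3.merge(r4)
```

In the first case the re-added `a` survives because its tag was never observed by a remove; in the second the concurrent add of `b` and remove of `a` both take effect.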
Strong Consistency
Strong Consistency: provides atomicity and recency
Strong Consistency: prohibits partial writes
A A A
A A A; Val = B
B A A
B A A; Get Operation with Read Repair
B A A; Get Operation with Read Repair; B B
Strong Consistency: single-key atomic operations
Strong Consistency: requires read/modify/write cycle (CAS)
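A read/modify/write cycle with compare-and-swap might look like the sketch below. The in-memory `Store`, its integer versions, and the retry count are assumptions for illustration and do not reflect a Riak client API.

```python
class ConflictError(Exception):
    """Raised when the stored version no longer matches the one that was read."""


class Store:
    def __init__(self):
        self._data = {}

    def get(self, key):
        """Return (version, value); a missing key has version 0."""
        return self._data.get(key, (0, None))

    def put_if_version(self, key, expected_version, value):
        """Write only if nobody else has written since expected_version."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise ConflictError(key)
        self._data[key] = (current_version + 1, value)


def update(store, key, modify, retries=5):
    """Read, modify, and conditionally write; retry on conflict."""
    for _ in range(retries):
        version, value = store.get(key)
        try:
            store.put_if_version(key, version, modify(value))
            return
        except ConflictError:
            continue
    raise ConflictError(key)


store = Store()
update(store, "counter", lambda v: (v or 0) + 1)
update(store, "counter", lambda v: (v or 0) + 1)
```

A writer holding a stale version is rejected instead of overwriting newer data, which is the property that rules out the partial-write scenario on the earlier slides.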
Consensus
Distributed Consensus: "The problem of reaching agreement among remote processes is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications." (Fischer, Lynch, Paterson)
The Consensus Problem: termination, agreement, validity
Termination: all processes eventually decide on a value
Agreement: all processes decide on the same value
Validity: value decided on had to have been proposed
Consensus Algorithms
Consensus Algorithms: Paxos, ZAB, Raft, etc.
The Paxos Algorithm: coordinated requests with a chosen leader
Node 1, Node 2, Node 3: N++; prepare(N); promise(N, Vb); promise(N, Vc); Vn = f(Va, Vb, Vc); commit(N, Vn); accept(N)
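The message flow above can be sketched as a single-decree round using the slide's message names. Two things here are assumptions rather than anything the slides state: f is taken to be the standard Paxos rule (adopt the value of the highest-numbered accepted proposal, otherwise use the leader's own value), and all three nodes act as acceptors with synchronous, loss-free delivery.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0      # highest proposal number promised
        self.accepted_n = 0    # proposal number of the accepted value
        self.value = None

    def prepare(self, n):
        """Reply promise(N, V) if n is newer than anything promised so far."""
        if n > self.promised:
            self.promised = n
            return (n, self.accepted_n, self.value)
        return None

    def commit(self, n, value):
        """Reply accept(N) unless a newer proposal has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.value = value
            return n
        return None


def propose(acceptors, n, value):
    """Run one prepare/promise/commit/accept round; return the chosen value."""
    quorum = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p is not None]
    if len(promises) < quorum:
        return None
    accepted_n, accepted_value = max(
        ((an, v) for _, an, v in promises), key=lambda pair: pair[0]
    )
    chosen = accepted_value if accepted_n > 0 else value
    accepts = [a.commit(n, chosen) for a in acceptors]
    if sum(reply is not None for reply in accepts) >= quorum:
        return chosen
    return None


nodes = [Acceptor(), Acceptor(), Acceptor()]
first = propose(nodes, 1, "x")
second = propose(nodes, 2, "y")
stale = propose(nodes, 1, "z")
```

The second round returns the already chosen value rather than its own proposal, and the round with a stale number gathers no promises at all.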
Multi-Paxos, first request:
Node 1, Node 2, Node 3: N++; I = 0; prepare(N, I); promise(N, I, Vb); promise(N, I, Vc); Vn = f(Va, Vb, Vc); commit(N, I, Vn); accept(N, I)
Multi-Paxos, each additional request:
Node 1, Node 2, Node 3: I++; commit(N, I, V); accept(N, I)
Multi-Paxos: ship entire state!
Riak
Riak: key-value store; keys are independent state
Riak: Multi-Paxos per key; CAS on isolated state
Consensus Groups
Consensus Groups: participants in decisioning; ensembles
Consensus Groups: use the preference list!
preflist
Consensus Groups: one ensemble per preference list; ring size
Ensembles
Riak Ensembles: election of leader; get/put operations
Get Operations: read local; refresh, if old
Node 1, Node 2, Node 3 (obj.epoch < epoch): get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Val = latest(Vala, Valb, Valc); Val.epoch = epoch; write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Node 1, Node 2, Node 3 (obj.epoch == epoch): Reply = local_get(Key)
Get Operations: worst case, 2 roundtrips / write; best case, 0 roundtrips / write
Put Operations: read local; refresh if old; modify and commit
Node 1, Node 2, Node 3 (obj.epoch < epoch): get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Latest = latest(Vala, Valb, Valc); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Node 1, Node 2, Node 3 (obj.epoch == epoch): Latest = local_get(Key); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Put Operations: worst case, 2 roundtrips / write; best case, 1 roundtrip / write
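The put path above can be sketched from the leader's point of view, counting roundtrips to show where the worst and best cases come from. All class and method names are hypothetical, delivery is synchronous and loss-free, and treating a missing local object as "old" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    epoch: int
    seq: int
    value: object


class Peer:
    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def write(self, key, obj) -> bool:
        self.store[key] = obj
        return True  # ack(Epoch, Seq)


class Leader(Peer):
    def __init__(self, followers, epoch):
        super().__init__()
        self.followers = followers
        self.epoch = epoch
        self.seq = 0

    def put(self, key, modify) -> int:
        """Apply modify to the latest value; return the roundtrips used."""
        roundtrips = 0
        local = self.get(key)
        if local is None or local.epoch < self.epoch:
            # Local copy is from an older epoch: refresh from the followers.
            replies = [local] + [f.get(key) for f in self.followers]
            roundtrips += 1
            known = [o for o in replies if o is not None]
            latest = max(known, key=lambda o: (o.epoch, o.seq), default=None)
        else:
            latest = local  # local copy is current, no refresh needed
        self.seq += 1
        new = Obj(self.epoch, self.seq, modify(latest.value if latest else None))
        self.write(key, new)
        acks = 1 + sum(f.write(key, new) for f in self.followers)
        roundtrips += 1
        if acks < (len(self.followers) + 1) // 2 + 1:
            raise RuntimeError("quorum failed")
        return roundtrips


followers = [Peer(), Peer()]
leader = Leader(followers, epoch=1)
first_put = leader.put("k", lambda v: 1)
second_put = leader.put("k", lambda v: v + 1)
```

The first put pays for a refresh plus a write, and the second only for the write, because the leader's local object already carries the current epoch.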
Failed Quorums: elect a new leader; start a new epoch
Cluster Membership
Dynamic Membership: use joint consensus from Multi-Paxos
Existing Ensemble: riak_01, riak_02, riak_03. Joining Ensemble: riak_07, riak_08, riak_09
[{riak_01}, {riak_02}, {riak_03}] [{riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble: [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
New Ensemble: riak_07, riak_08, riak_09: [{riak_07}, {riak_08}, {riak_09}]
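One common way to define a quorum during a joint-consensus transition is to require a majority in every active view, old and new alike. The slides do not spell out the rule used here, so the sketch below is an assumption about the general technique; the function names are hypothetical.

```python
def has_majority(acks: set, members: list) -> bool:
    """True when more than half of the members have acknowledged."""
    return len(acks & set(members)) > len(members) // 2


def joint_quorum(acks: set, views: list) -> bool:
    """During a transition, every active view must reach its own majority."""
    return all(has_majority(acks, view) for view in views)


existing = ["riak_01", "riak_02", "riak_03"]
joining = ["riak_07", "riak_08", "riak_09"]

# Transition phase: both views are active.
both_sides = joint_quorum({"riak_01", "riak_02", "riak_07", "riak_08"}, [existing, joining])
old_side_only = joint_quorum({"riak_01", "riak_02", "riak_03", "riak_07"}, [existing, joining])

# After the transition only the new view remains.
after_transition = joint_quorum({"riak_07", "riak_09"}, [joining])
```

Requiring both majorities means neither the old nor the new ensemble can make a decision on its own while the membership change is in flight.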
Strong Consistency: single-key linearizability; reduced availability
Enable strong consistency; http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
$ riak-admin bucket-type create strongly_consistent \
    '{"props":{"consistent":true}}'
$ riak-admin bucket-type status strongly_consistent
$ riak-admin bucket-type activate strongly_consistent
Strong Eventual Consistency: Conflict-Free Replicated Data Types
Create bucket type for data types; http://docs.basho.com/riak/latest/dev/using/data-types/
$ riak-admin bucket-type create maps \
    '{"props":{"datatype":"map"}}'
$ riak-admin bucket-type create sets \
    '{"props":{"datatype":"set"}}'
$ riak-admin bucket-type create counters \
    '{"props":{"datatype":"counter"}}'
$ riak-admin bucket-type status maps
$ riak-admin bucket-type activate maps
Operate on counters; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets \
    -H "Content-Type: application/json" \
    -d '{"increment": 1}'
$ curl http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets
Operate on sets; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"add_all":["Toronto", "Montreal"]}'
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"remove": "Montreal"}'
$ curl http://localhost:10018/types/sets/buckets/travel/datatypes/cities
Operate on maps; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "first_name_register": "Ahmed",
        "phone_number_register": "5551234567"
      }
    }'
$ curl -XPOST http://localhost:8098/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "annika_info_map": {
          "update": {
            "interests_set": {
              "add": "tango dancing"
            }
          }
        }
      }
    }'
Questions?