Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Consistency and Riak
Search
Christopher Meiklejohn
February 25, 2015
Programming
0
130
Consistency and Riak
Christopher Meiklejohn
Riak Meetup, Paris, February 2015
Christopher Meiklejohn
February 25, 2015
Tweet
Share
More Decks by Christopher Meiklejohn
See All by Christopher Meiklejohn
Towards a Solution to the Red Wedding Problem
cmeiklejohn
0
340
Language Support for Cloud-Scale Distributed Systems
cmeiklejohn
0
440
Towards a Systems Approach to Distributed Programming
cmeiklejohn
3
270
Scaling a Startup with a 21st Century Programming Language
cmeiklejohn
0
380
Practical Evaluation of the Lasp Programming Model at Scale
cmeiklejohn
4
2.6k
Just-Right Consistency - Closing the CAP Gap
cmeiklejohn
3
240
Declarative, Convergent, Edge Computation
cmeiklejohn
1
180
Just-Right Consistency - Closing the CAP Gap
cmeiklejohn
3
1.2k
A Certain Tendency of the Database Community
cmeiklejohn
0
420
Other Decks in Programming
See All in Programming
チームでモデリングを育てるうえで 考えたこと・気づいたこと / Cultivating Modeling in Teams: Thoughts and Insights
mackey0225
7
4.1k
ログラスを支える設計標準について / loglass-design-standards
urmot
10
2.1k
Zero Waste, Radical Magic, and Italian Graft – Quarkus Efficiency Secrets
hollycummins
0
210
Rubyでたのしむクリエイティブコーディング/Enjoy Creative coding with Ruby
chobishiba
1
160
ゆるい個人開発のススメ
kuroppe1819
10
930
コーンフレークから始める モデリング会話入門
ogurotakayuki
0
280
Micro Frontends for Java Microservices - Devnexus 2024
mraible
PRO
0
420
Rails と人魚の話/rails-and-mermaid
sanfrecce_osaka
0
100
雑に思考を整理する技術と効能
konifar
55
24k
try! Swift Tokyo 2024 参加報告 / try! Swift Tokyo 2024 Report
hironytic
0
170
Elm 0.19.0 Changes
bkuhlmann
0
480
Ruby GitHub Packages
bkuhlmann
0
620
Featured
See All Featured
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
118
38k
Why Our Code Smells
bkeepers
PRO
331
56k
Optimising Largest Contentful Paint
csswizardry
7
2.3k
KATA
mclloyd
14
12k
Building Better People: How to give real-time feedback that sticks.
wjessup
353
18k
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
13
1.5k
Thoughts on Productivity
jonyablonski
57
3.8k
Statistics for Hackers
jakevdp
789
220k
Why You Should Never Use an ORM
jnunemaker
PRO
50
8.6k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
115
18k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
1
1.3k
Building a Scalable Design System with Sketch
lauravandoore
455
32k
Transcript
Consistency and Riak Christopher Meiklejohn Riak Meetup, Paris, February 2015
@cmeik
History
Published SOSP 2007; key-value storage system Amazon Dynamo
Focused on high-availability and low-latency Amazon Dynamo
Collection of distributed systems techniques Amazon Dynamo
LinkedIn Voldemort, Facebook Cassandra Amazon Dynamo
Released 2009; Apache2 licensed Dynamo clone Basho Riak
Riak Architecture
Consistent Hashing hash(bucket/key)
hash ring
tokenize it
node 0 node 1 node 2 hash(key)
node 0 node 1 node 2 Replicas are stored to
the N - 1 contiguous partitions
node 0 node 1 node 2 hash(companies/cisco) Replicas are stored
to the N - 1 contiguous partitions
node 0 node 1 node 2 hash(companies/cisco) Replicas are stored
to the N - 1 contiguous partitions
node 0 node 1 node 2
Scaling out node 0 node 1 node 2 node 3
+
Quorum requests N R W PR/PW DW
Vector Clocks establish temporality
None
None
Anatomy of a Request get(users/clay-davis)
Anatomy of a Request get(users/clay-davis) client Riak
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
hash(users/clay-davis) == 10, 11, 12
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
hash(users/clay-davis) == 10, 11, 12 Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
get(users/clay-davis) Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring R=2
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
Coordinating node Cluster 6 7 8 9 10 11 12 13 14 15 16 The Ring R=2 obj
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
R=2 obj obj
Anatomy of a Request get(users/clay-davis) Get Handler (FSM) client Riak
R=2 obj obj
Anatomy of a Request get(users/clay-davis) obj
Read Repair (Anti-Entropy)
replica replica replica
replica replica replica X
replica replica replica replica replica replica
Active Anti-Entropy (self healing clusters)
real-time updates persistent non-blocking disk-based
merkle tree to track changes coordinated at the vnode level
runs as a background process exchange with neighbor vnodes for inconsistencies resolution semantics: trigger read-repair
= hashes marked dirty
None
None
None
None
= keys to read-repair
Riak and Consistency
Riak Object
BKey Value
Consistent hashing; dynamic membership Data Placement
None
None
None
Replication per-value across ring Data Placement
Replica Replica Replica
Take the form: {Writer, Value, Time} Concurrent writes
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] Concurrent
writes
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] [{b,
v1, t2}] [{b, v1, t2}] [{b, v1, t2}] Last Writer Wins
[{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] [[{a,
v1, t1}, {b, v1, t2}] [[{a, v1, t1}, {b, v1, t2}] [[{a, v1, t1}, {b, v1, t2}] Allow Mult
User specificed Merge
Two Approaches
Strong Eventual Consistency
Designed for convergence; allows divergence Conflict-free Replicated Data Types
Solves the Dynamo concurrency anomaly Conflict-free Replicated Data Types
The Theory
None
None
None
None
Two flavors: state-based and operation-based Conflict-free Replicated Data Types
Counters, Flags, Registers, Sets, Maps, Graphs Conflict-free Replicated Data Types
Broadcast update operation Operation-Based CRDTs
Commutative; relies on unique delivery Operation-Based CRDTs
Apply change locally; propagate entire state State-Based CRDTs
State is merged between replicas State-Based CRDTs
Set of all states form a bounded join-semilattice State-Based CRDTs
Partially ordered set; join operation Bounded Join-Semilattice
Associativity: (X · Y) · Z = X · (Y
· Z) Bounded Join-Semilattice
Commutativity: X · Y = Y · X Bounded Join-Semilattice
Idempotence: X · X = X Bounded Join-Semilattice
Examples Bounded Join-Semilattice
b a c a, b a, c a, b, c
Set; merge function: union. b, c
3 5 7 5 7 7 Increasing natural; merge function:
max.
F F T F T T Booleans; merge function: or.
x <= y montone f(x) <= f(y)
Examples State-Based Observed-Remove Set
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ] [ [{1, a}, {2, b}], [{1, a}] ]
Strong Consistency
Provides atomicity and recency Strong Consistency
Prohibits partial writes Strong Consistency
A A A
A A A Val = B
A A A Val = B
B A A
B A A Get Operation with Read Repair
B A A Get Operation with Read Repair
B A A Get Operation with Read Repair B B
Single key atomic operations Strong Consistency
Requires read/modify/write cycle (CAS) Strong Consistency
Consensus
Distributed Consensus The problem of reaching agreement among remote processes
is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications. Fischer, Lynch, Paterson
Termination, agreement, validity The Consensus Problem
All processes eventually decide on a value Termination
All processes decide on the same value Agreement
Value decided on had to have been proposed Validity
Consensus Algorithms
Paxos, ZAB, Raft, etc. Consensus Algorithms
Coordinated requests with a chosen leader The Paxos Algorithm
Node 1 Node 2 Node 3 N++ prepare(N) promise(N, Vb)
promise(N, Vc) Vn = f(Va, Vb, Vc) commit(N, Vn) accept(N)
First request Multi-Paxos
Node 1 Node 2 Node 3 N++; I = 0
prepare(N, I) promise(N, I, Vb) promise(N, I, Vc) Vn = f(Va, Vb, Vc) commit(N, I, Vn) accept(N, I)
Each additional request Multi-Paxos
Node 1 Node 2 Node 3 I++ commit(N, I, V)
accept(N, I)
Ship entire state! Multi-Paxos
Riak
Key-value store; keys are independent state Riak
Multi-Paxos per key; CAS on isolated state Riak
Consensus Groups
Participants in decisioning; ensembles Consensus Groups
Use the preference list! Consensus Groups
preflist
None
None
None
None
One ensemble per preference list; ring size Consensus Groups
Ensembles
election of leader; get/put operations Riak Ensembles
read local; refresh, if old Get Operations
Node 1 Node 2 Node 3 obj.epoch < epoch get(key)
reply(Epochb, Seqb, Valb) Val = latest(Vala, Valb, Valc) Val.epoch = epoch write(Epoch, ++Seq, Val) ack(Epoch, Seq) reply(Epochc, Seqc, Valc)
Node 1 Node 2 Node 3 obj.epoch == epoch Reply
= local_get(Key)
Worst Case: 2 roundtrips / write Get Operations Best Case:
0 roundtrips / write
read local; refresh, modify and commit if old Put Operations
Node 1 Node 2 Node 3 obj.epoch < epoch get(key)
reply(Epochb, Seqb, Valb) Latest = latest(Vala, Valb, Valc) Val = modify(Latest) write(Epoch, ++Seq, Val) ack(Epoch, Seq) reply(Epochc, Seqc, Valc)
Node 1 Node 2 Node 3 obj.epoch == epoch Latest
= local_get(Key) Val = modify(Latest) write(Epoch, ++Seq, Val) ack(Epoch, Seq)
Worst Case: 2 roundtrips / write Put Operations Best Case:
1 roundtrips / write
Elect a new leader; start a new epoch Failed Quorums
Cluster Membership
Use joint consensus from multi paxos Dynamic Membership
Existing Ensemble Joining Ensemble riak_01 riak_02 riak_03 riak_07 riak_08 riak_09
[{riak_01}, {riak_02}, {riak_03}] [{riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
New Ensemble riak_07 riak_08 riak_09 [{riak_07}, {riak_08}, {riak_09}]
Single-key linearizability; reduced availability Strong Consistency
$ riak-admin bucket-type create strongly_consistent \ ‘{"props":{"consistent":true}}' $ riak-admin bucket-type
status strongly_consistent $ riak-admin bucket-type activate strongly_consistent Enable strong consistency; http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
Conflict-Free Replicated Data Types Strong Eventual Consistency
$ riak-admin bucket-type create maps \ '{"props":{"datatype":"map"}}' $ riak-admin bucket-type
create sets \ '{"props":{"datatype":"set"}}' $ riak-admin bucket-type create counters \ ‘{“props":{"datatype":"counter"}}' $ riak-admin bucket-type status maps $ riak-admin bucket-type activate maps Create bucket type for data types; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/counters/buckets/counters/ datatypes/traffic_tickets \ -H "Content-Type: application/json" \
-d '{"increment": 1}’ $ curl http://localhost:10018/types/counters/buckets/counters/ datatypes/traffic_tickets Operate on counters; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/ datatypes/cities \ -H "Content-Type: application/json" \
-d '{"add_all":["Toronto", “Montreal"]}' $ curl -XPOST http://localhost:10018/types/sets/buckets/travel/ datatypes/cities \ -H "Content-Type: application/json" \ -d '{"remove": “Montreal"}' $ curl http://localhost:10018/types/sets/buckets/travel/datatypes/ cities Operate on sets; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/maps/buckets/customers/ datatypes/ahmed_info \ -H "Content-Type: application/json" \
-d ' { "update": { "first_name_register": "Ahmed", "phone_number_register": "5551234567" } }' $ curl -XPOST http://localhost:8098/types/maps/buckets/customers/ datatypes/ahmed_info \ -H "Content-Type: application/json" \ -d ' { "update": { "annika_info_map": { "update": { "interests_set": { "add": "tango dancing" } } } } } ' Operate on maps; http://docs.basho.com/riak/latest/dev/using/data-types/
Questions?