Consistency and Riak
Christopher Meiklejohn
February 25, 2015
Riak Meetup, Paris, February 2015
Transcript
Consistency and Riak Christopher Meiklejohn Riak Meetup, Paris, February 2015
@cmeik
History
Amazon Dynamo: published at SOSP 2007; key-value storage system
Amazon Dynamo: focused on high availability and low latency
Amazon Dynamo: collection of distributed systems techniques
Amazon Dynamo: LinkedIn Voldemort, Facebook Cassandra
Basho Riak: released 2009; Apache 2 licensed Dynamo clone
Riak Architecture
Consistent Hashing: hash(bucket/key)
hash ring
tokenize it
[Diagram: node 0, node 1, node 2; hash(key)]
[Diagram: node 0, node 1, node 2] Replicas are stored to the N - 1 contiguous partitions
[Diagram: node 0, node 1, node 2; hash(companies/cisco)] Replicas are stored to the N - 1 contiguous partitions
Scaling out: node 0, node 1, node 2 + node 3
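The placement rule on these slides (hash the bucket/key onto a ring, then store replicas on the following contiguous partitions) can be sketched roughly as below. This is a minimal illustration, not Riak's implementation: the ring size of 16, the round-robin assignment of partitions to nodes, and the function names are all assumptions made for the example.

```python
import hashlib

RING_SIZE = 16                      # assumed number of partitions on the ring
NODES = ["node 0", "node 1", "node 2"]
KEYSPACE = 2 ** 160                 # size of the SHA-1 output space


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Hash bucket/key onto the ring and return the partition it falls in."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    position = int.from_bytes(digest, "big")
    return position // (KEYSPACE // ring_size)


def preference_list(bucket, key, n=3, ring_size=RING_SIZE):
    """The first partition plus the next n - 1 contiguous ones, wrapping around."""
    first = partition_for(bucket, key, ring_size)
    return [(first + i) % ring_size for i in range(n)]


def partition_owner(partition, nodes=NODES):
    """Assumed round-robin ownership of partitions (hypothetical claim rule)."""
    return nodes[partition % len(nodes)]


preflist = preference_list("companies", "cisco")
owners = [partition_owner(p) for p in preflist]
```

Because placement depends only on the hash and the ring, any node can compute the same preference list for `companies/cisco` without asking a coordinator.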
Quorum requests: N, R, W, PR/PW, DW
Vector clocks: establish temporality
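As a rough illustration of how vector clocks establish an ordering between versions, the sketch below represents a clock as a dictionary from actor to counter. The function names (`increment`, `descends`, `compare`, `merge`) are chosen for this example and are not taken from Riak's code.

```python
def increment(clock, actor):
    """Return a copy of the clock with the actor's counter advanced by one."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify the relationship between two clocks."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "concurrent"


def merge(a, b):
    """Pointwise maximum: the smallest clock that descends both inputs."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


base = increment({}, "a")
x = increment(base, "a")      # a writes again
y = increment(base, "b")      # b writes without seeing x
relation = compare(x, y)
```

Two clocks where neither descends the other identify concurrent writes, which is the case the later slides on Last Writer Wins and Allow Mult deal with.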
Anatomy of a Request: get(users/clay-davis)
client -> Riak
client -> Get Handler (FSM)
hash(users/clay-davis) == 10, 11, 12
Coordinating node; Cluster; The Ring: 6 7 8 9 10 11 12 13 14 15 16
get(users/clay-davis) sent from the coordinating node to the ring
R=2
R=2: obj
R=2: obj obj
get(users/clay-davis): obj returned to the client
Read Repair (Anti-Entropy)
[Diagram: three replicas; one replica marked X; the replica set shown again after the repair]
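A quorum read followed by read repair might look roughly like the sketch below. It is a simplification under stated assumptions: a plain integer `version` stands in for a vector clock, the coordinator waits for every replica instead of returning after R replies, and the `Replica` and `Versioned` classes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    value: str
    version: int          # stand-in for a vector clock; higher means newer


class Replica:
    """Hypothetical in-memory replica holding a single object."""

    def __init__(self, obj=None):
        self.obj = obj

    def get(self):
        return self.obj

    def put(self, obj):
        self.obj = obj


def get_with_read_repair(replicas, r):
    """Read from the replicas, require r answers, repair stale or missing copies."""
    replies = [(replica, replica.get()) for replica in replicas]
    found = [obj for _, obj in replies if obj is not None]
    if len(found) < r:
        raise RuntimeError("read quorum not met")
    newest = max(found, key=lambda obj: obj.version)
    for replica, obj in replies:
        if obj is None or obj.version < newest.version:
            replica.put(newest)   # read repair: push the newest version back
    return newest


replicas = [Replica(Versioned("B", 2)), Replica(Versioned("A", 1)), Replica(None)]
result = get_with_read_repair(replicas, r=2)
```

After the call, all three replicas hold the newest version, so the divergence is healed as a side effect of the read.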
Active Anti-Entropy (self-healing clusters)
real-time updates; persistent; non-blocking; disk-based
Merkle tree to track changes; coordinated at the vnode level
runs as a background process; exchanges with neighbor vnodes to find inconsistencies; resolution semantics: trigger read-repair
[Diagram legend: hashes marked dirty]
[Diagram legend: keys to read-repair]
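The hash exchange can be illustrated with a deliberately small two-level hash tree: keys are grouped into segments, each segment gets a hash, and a root hash covers the segments. Only segments whose hashes differ are compared key by key. The segment count, the use of SHA-256, and the function names are assumptions for this sketch; Riak's on-disk trees are more elaborate.

```python
import hashlib

SEGMENTS = 4   # assumed number of segments (tree leaves)


def h(data):
    return hashlib.sha256(data.encode()).hexdigest()


def segment_of(key):
    return int(h(key), 16) % SEGMENTS


def build_tree(store):
    """Two-level hash tree: per-segment hashes plus a root over them."""
    segments = {i: {} for i in range(SEGMENTS)}
    for key, value in store.items():
        segments[segment_of(key)][key] = h(value)
    segment_hashes = {i: h(repr(sorted(kv.items()))) for i, kv in segments.items()}
    root = h(repr(sorted(segment_hashes.items())))
    return {"root": root, "segment_hashes": segment_hashes, "segments": segments}


def keys_to_repair(local, remote):
    """Walk down only where hashes differ; return the keys that need read repair."""
    if local["root"] == remote["root"]:
        return set()
    differing = set()
    for i in range(SEGMENTS):
        if local["segment_hashes"][i] != remote["segment_hashes"][i]:
            a, b = local["segments"][i], remote["segments"][i]
            differing |= {k for k in a.keys() | b.keys() if a.get(k) != b.get(k)}
    return differing


vnode_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
vnode_b = {"k1": "v1", "k2": "stale"}
repair = keys_to_repair(build_tree(vnode_a), build_tree(vnode_b))
```

When the roots match, the exchange ends after comparing a single hash, which is what keeps the background process cheap for replicas that already agree.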
Riak and Consistency
Riak Object
BKey, Value
Data Placement: consistent hashing; dynamic membership
Data Placement: replication per-value across ring
Replica Replica Replica
Concurrent writes: take the form {Writer, Value, Time}
Concurrent writes: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}]
Last Writer Wins: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] -> [{b, v1, t2}] [{b, v1, t2}] [{b, v1, t2}]
Allow Mult: [{a, v1, t1}] [{b, v1, t2}] [{a, v1, t1}] -> [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}] [{a, v1, t1}, {b, v1, t2}]
User-specified merge
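The two outcomes for the replica states on these slides can be reproduced with a small sketch. Writes are `(writer, value, time)` tuples with integer times standing in for t1 and t2; the function names are illustrative, not Riak API names.

```python
def last_writer_wins(replica_states):
    """Keep only the write with the highest timestamp; the others are discarded."""
    writes = [write for state in replica_states for write in state]
    return [max(writes, key=lambda write: write[2])]


def allow_mult(replica_states):
    """Keep every distinct concurrent write as a sibling, ordered by time."""
    siblings = []
    for state in replica_states:
        for write in state:
            if write not in siblings:
                siblings.append(write)
    return sorted(siblings, key=lambda write: write[2])


replicas = [[("a", "v1", 1)], [("b", "v1", 2)], [("a", "v1", 1)]]
lww = last_writer_wins(replicas)
siblings = allow_mult(replicas)
```

With Last Writer Wins the write from `a` is silently lost; with Allow Mult both writes survive and the application has to merge the siblings itself.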
Two Approaches
Strong Eventual Consistency
Conflict-free Replicated Data Types: designed for convergence; allows divergence
Conflict-free Replicated Data Types: solves the Dynamo concurrency anomaly
The Theory
Conflict-free Replicated Data Types: two flavors, state-based and operation-based
Conflict-free Replicated Data Types: Counters, Flags, Registers, Sets, Maps, Graphs
Operation-Based CRDTs: broadcast update operation
Operation-Based CRDTs: commutative; relies on unique delivery
State-Based CRDTs: apply change locally; propagate entire state
State-Based CRDTs: state is merged between replicas
State-Based CRDTs: set of all states forms a bounded join-semilattice
Bounded Join-Semilattice: partially ordered set; join operation
Bounded Join-Semilattice: associativity: (X · Y) · Z = X · (Y · Z)
Bounded Join-Semilattice: commutativity: X · Y = Y · X
Bounded Join-Semilattice: idempotence: X · X = X
Bounded Join-Semilattice: examples
Set; merge function: union. Lattice: {a}, {b}, {c} -> {a, b}, {a, c}, {b, c} -> {a, b, c}
Increasing natural; merge function: max. Lattice: 3, 5, 7 -> 5, 7 -> 7
Booleans; merge function: or. Lattice: F, F, T -> F, T -> T
Monotone: x <= y implies f(x) <= f(y)
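The three example lattices and the three join laws can be checked mechanically. The sketch below is only a brute-force check over a handful of sample values, not a proof; `check_join` and `is_monotone` are helper names invented for the example.

```python
from itertools import product


def set_join(x, y):
    return x | y          # union


def max_join(x, y):
    return max(x, y)      # increasing naturals


def or_join(x, y):
    return x or y         # booleans


def check_join(join, samples):
    """Check associativity, commutativity and idempotence on the samples."""
    for x, y, z in product(samples, repeat=3):
        assert join(join(x, y), z) == join(x, join(y, z))
        assert join(x, y) == join(y, x)
        assert join(x, x) == x
    return True


def is_monotone(f, leq, samples):
    """x <= y implies f(x) <= f(y), checked on every pair of samples."""
    return all(f(x) <= f(y) for x, y in product(samples, repeat=2) if leq(x, y))


sets = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("ab")]
laws_hold = (
    check_join(set_join, sets)
    and check_join(max_join, [3, 5, 7])
    and check_join(or_join, [False, True])
)
# Set size is monotone with respect to the subset order.
size_is_monotone = is_monotone(len, lambda x, y: x <= y, sets)
```

These laws are what let replicas merge state in any order, any number of times, and still converge.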
State-Based Observed-Remove Set: examples
Each state is [ Adds, Removes ]; two states on one row are two replicas; each row is the next step.
Example 1:
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}], [{1, a}] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}, {2, a}], [{1, a}] ]
[ [{1, a}, {2, a}], [{1, a}] ] [ [{1, a}, {2, a}], [{1, a}] ]
Example 2:
[ [{1, a}], [] ] [ [{1, a}], [] ]
[ [{1, a}, {2, b}], [] ] [ [{1, a}], [{1, a}] ]
[ [{1, a}, {2, b}], [{1, a}] ]
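The states on these slides can be reproduced with a small state-based observed-remove set. The sketch assumes the state is a pair of sets of `(tag, element)` pairs, that tags are supplied by the caller, and that merge is a pairwise union; the function names are chosen for the example.

```python
def add(state, tag, element):
    """Add an element under a fresh, unique tag."""
    adds, removes = state
    return (adds | {(tag, element)}, removes)


def remove(state, element):
    """Remove only the tagged copies of the element observed at this replica."""
    adds, removes = state
    observed = {pair for pair in adds if pair[1] == element}
    return (adds, removes | observed)


def merge(a, b):
    """Join of two states: union of adds and union of removes."""
    return (a[0] | b[0], a[1] | b[1])


def value(state):
    """Elements with at least one add that has not been removed."""
    adds, removes = state
    return {element for _, element in adds - removes}


start = ({(1, "a")}, set())

# Example 1: a is removed, then added again under a new tag.
removed = remove(start, "a")
readded = add(removed, 2, "a")
example_1 = merge(readded, removed)

# Example 2: one replica adds b while the other concurrently removes a.
left = add(start, 2, "b")
right = remove(start, "a")
example_2 = merge(left, right)
```

Because a remove only covers the tags it has observed, the re-added `a` in the first example survives the merge.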
Strong Consistency
Strong Consistency: provides atomicity and recency
Strong Consistency: prohibits partial writes
[Diagram: replicas A A A]
[Diagram: replicas A A A; write Val = B]
[Diagram: replicas B A A]
[Diagram: replicas B A A; get operation with read repair]
[Diagram: replicas B A A; get operation with read repair; the remaining replicas become B B]
Strong Consistency: single-key atomic operations
Strong Consistency: requires read/modify/write cycle (CAS)
Consensus
Distributed Consensus: "The problem of reaching agreement among remote processes is one of the most fundamental problems in distributed computing and is at the core of many algorithms for distributed data processing, distributed file management, and fault-tolerant distributed applications." (Fischer, Lynch, Paterson)
The Consensus Problem: termination, agreement, validity
Termination: all processes eventually decide on a value
Agreement: all processes decide on the same value
Validity: value decided on had to have been proposed
Consensus Algorithms
Consensus Algorithms: Paxos, ZAB, Raft, etc.
The Paxos Algorithm: coordinated requests with a chosen leader
[Diagram: Node 1, Node 2, Node 3] N++; prepare(N); promise(N, Vb); promise(N, Vc); Vn = f(Va, Vb, Vc); commit(N, Vn); accept(N)
Multi-Paxos: first request
[Diagram: Node 1, Node 2, Node 3] N++; I = 0; prepare(N, I); promise(N, I, Vb); promise(N, I, Vc); Vn = f(Va, Vb, Vc); commit(N, I, Vn); accept(N, I)
Multi-Paxos: each additional request
[Diagram: Node 1, Node 2, Node 3] I++; commit(N, I, V); accept(N, I)
Multi-Paxos: ship entire state!
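The message flow on the Multi-Paxos slides can be walked through with an in-memory toy. It follows the message names from the slides (prepare, promise, commit, accept) and is not a complete or safe Paxos implementation: there is no networking, no failure handling, and the choice function `f` is supplied by the caller. The `Acceptor` and `Leader` classes are hypothetical.

```python
class Acceptor:
    def __init__(self, value=None):
        self.promised = 0     # highest ballot N promised so far
        self.value = value    # current (entire) state held by this node
        self.log = {}         # instance I -> committed value

    def prepare(self, n):
        """Answer prepare(N) with (granted, current value)."""
        if n <= self.promised:
            return (False, None)
        self.promised = n
        return (True, self.value)

    def commit(self, n, i, value):
        """Answer commit(N, I, V); accept unless a higher ballot was promised."""
        if n < self.promised:
            return False
        self.log[i] = value
        self.value = value
        return True


class Leader:
    def __init__(self, acceptors, f):
        self.acceptors = acceptors
        self.f = f                       # Vn = f(Va, Vb, Vc)
        self.n = 0
        self.i = 0
        self.quorum = len(acceptors) // 2 + 1

    def first_request(self):
        """N++; I = 0; prepare/promise round, then commit the chosen value."""
        self.n += 1
        self.i = 0
        replies = [a.prepare(self.n) for a in self.acceptors]
        values = [v for granted, v in replies if granted]
        if len(values) < self.quorum:
            raise RuntimeError("no quorum of promises")
        return self._commit(self.f(values))

    def next_request(self, value):
        """I++; skip the prepare round and commit directly under the same N."""
        self.i += 1
        return self._commit(value)

    def _commit(self, value):
        acks = sum(a.commit(self.n, self.i, value) for a in self.acceptors)
        if acks < self.quorum:
            raise RuntimeError("no quorum of accepts")
        return value


nodes = [Acceptor(1), Acceptor(3), Acceptor(2)]
leader = Leader(nodes, f=max)
first = leader.first_request()
second = leader.next_request(10)
```

The point of the second method is the saving the slides describe: once a leader holds ballot N, each additional request costs one commit/accept round instead of two rounds.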
Riak
Riak: key-value store; keys are independent state
Riak: Multi-Paxos per key; CAS on isolated state
Consensus Groups
Consensus Groups: participants in decision making; ensembles
Consensus Groups: use the preference list!
preflist
Consensus Groups: one ensemble per preference list; ring size
Ensembles
Riak Ensembles: election of leader; get/put operations
Get Operations: read local; refresh, if old
[Diagram: Node 1, Node 2, Node 3; obj.epoch < epoch] get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Val = latest(Vala, Valb, Valc); Val.epoch = epoch; write(Epoch, ++Seq, Val); ack(Epoch, Seq)
[Diagram: Node 1, Node 2, Node 3; obj.epoch == epoch] Reply = local_get(Key)
Get Operations: worst case: 2 roundtrips / read; best case: 0 roundtrips / read
Put Operations: read local; refresh, modify and commit if old
[Diagram: Node 1, Node 2, Node 3; obj.epoch < epoch] get(key); reply(Epochb, Seqb, Valb); reply(Epochc, Seqc, Valc); Latest = latest(Vala, Valb, Valc); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
[Diagram: Node 1, Node 2, Node 3; obj.epoch == epoch] Latest = local_get(Key); Val = modify(Latest); write(Epoch, ++Seq, Val); ack(Epoch, Seq)
Put Operations: worst case: 2 roundtrips / write; best case: 1 roundtrip / write
Failed Quorums: elect a new leader; start a new epoch
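The roundtrip counts for get and put can be traced with a single-key toy model of an ensemble leader. The classes below are hypothetical and leave out quorums, failures, and leader election; "latest" is assumed to mean the highest `(epoch, seq)` pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    epoch: int
    seq: int
    val: str


class Peer:
    """Hypothetical ensemble member storing one object."""

    def __init__(self, obj):
        self.obj = obj


class EnsembleLeader:
    def __init__(self, local, followers, epoch):
        self.local = local
        self.followers = followers
        self.epoch = epoch
        self.seq = 0
        self.roundtrips = 0

    def _is_old(self):
        return self.local.obj.epoch < self.epoch

    def _latest(self):
        if not self._is_old():
            return self.local.obj                  # local_get(Key): no roundtrip
        self.roundtrips += 1                       # get(key) / reply(Epoch, Seq, Val)
        candidates = [self.local.obj] + [f.obj for f in self.followers]
        return max(candidates, key=lambda o: (o.epoch, o.seq))

    def _write(self, val):
        self.roundtrips += 1                       # write(Epoch, ++Seq, Val) / ack(Epoch, Seq)
        self.seq += 1
        obj = Obj(self.epoch, self.seq, val)
        for peer in [self.local, *self.followers]:
            peer.obj = obj
        return obj

    def get(self):
        old = self._is_old()
        latest = self._latest()
        if old:
            latest = self._write(latest.val)       # refresh: Val.epoch = epoch
        return latest.val

    def put(self, modify):
        return self._write(modify(self._latest().val)).val


local = Peer(Obj(1, 5, "old"))
followers = [Peer(Obj(1, 6, "newer")), Peer(Obj(1, 5, "old"))]
leader = EnsembleLeader(local, followers, epoch=2)

first_get = leader.get()          # object is from an old epoch: refresh
after_first_get = leader.roundtrips
second_get = leader.get()         # object is current: answered locally
after_second_get = leader.roundtrips
put_result = leader.put(lambda v: v + "!")
after_put = leader.roundtrips
```

The counters line up with the slides: 2 roundtrips for the refreshing get, 0 for the local get, and 1 for a put on an object that is already in the current epoch.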
Cluster Membership
Dynamic Membership: use joint consensus from Multi-Paxos
Existing Ensemble: riak_01, riak_02, riak_03; Joining Ensemble: riak_07, riak_08, riak_09
[{riak_01}, {riak_02}, {riak_03}] [{riak_07}, {riak_08}, {riak_09}]
Joint-Consensus Ensemble: [{riak_01}, {riak_02}, {riak_03}, {riak_07}, {riak_08}, {riak_09}]
New Ensemble: riak_07, riak_08, riak_09; [{riak_07}, {riak_08}, {riak_09}]
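One common reading of joint consensus is that, during the transition, a decision needs a majority of the existing members and a majority of the joining members at the same time. The sketch below assumes that rule; the slides only show the combined member list, so the per-view majority check and the function names are assumptions.

```python
def has_majority(acks, members):
    """True if more than half of the members acknowledged."""
    return len(acks & members) > len(members) // 2


def joint_quorum(acks, views):
    """A decision needs a majority in every view that is currently active."""
    return all(has_majority(acks, view) for view in views)


existing = {"riak_01", "riak_02", "riak_03"}
joining = {"riak_07", "riak_08", "riak_09"}

# During the transition both views are active.
both_sides = joint_quorum({"riak_01", "riak_02", "riak_07", "riak_08"}, [existing, joining])
old_side_only = joint_quorum({"riak_01", "riak_02", "riak_03", "riak_07"}, [existing, joining])

# After the transition only the new ensemble remains.
new_only = joint_quorum({"riak_07", "riak_09"}, [joining])
```

Requiring both majorities means the old and new ensembles can never make conflicting decisions on their own while the membership change is in flight.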
Strong Consistency: single-key linearizability; reduced availability
Enable strong consistency; http://docs.basho.com/riak/latest/dev/advanced/strong-consistency/
$ riak-admin bucket-type create strongly_consistent \
    '{"props":{"consistent":true}}'
$ riak-admin bucket-type status strongly_consistent
$ riak-admin bucket-type activate strongly_consistent
Strong Eventual Consistency: Conflict-Free Replicated Data Types
Create bucket type for data types; http://docs.basho.com/riak/latest/dev/using/data-types/
$ riak-admin bucket-type create maps \
    '{"props":{"datatype":"map"}}'
$ riak-admin bucket-type create sets \
    '{"props":{"datatype":"set"}}'
$ riak-admin bucket-type create counters \
    '{"props":{"datatype":"counter"}}'
$ riak-admin bucket-type status maps
$ riak-admin bucket-type activate maps
Operate on counters; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets \
    -H "Content-Type: application/json" \
    -d '{"increment": 1}'
$ curl http://localhost:10018/types/counters/buckets/counters/datatypes/traffic_tickets
Operate on sets; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"add_all":["Toronto", "Montreal"]}'
$ curl -XPOST http://localhost:10018/types/sets/buckets/travel/datatypes/cities \
    -H "Content-Type: application/json" \
    -d '{"remove": "Montreal"}'
$ curl http://localhost:10018/types/sets/buckets/travel/datatypes/cities
Operate on maps; http://docs.basho.com/riak/latest/dev/using/data-types/
$ curl -XPOST http://localhost:10018/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "first_name_register": "Ahmed",
        "phone_number_register": "5551234567"
      }
    }'
$ curl -XPOST http://localhost:8098/types/maps/buckets/customers/datatypes/ahmed_info \
    -H "Content-Type: application/json" \
    -d '
    {
      "update": {
        "annika_info_map": {
          "update": {
            "interests_set": {
              "add": "tango dancing"
            }
          }
        }
      }
    }'
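The same data type requests can be prepared from Python with the standard library. The sketch only builds the requests and does not send them, so it runs without a cluster; the base URL is the port used in the curl examples, and `datatype_url` and `update_request` are helper names invented here.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:10018"   # assumption: same port as the curl examples


def datatype_url(bucket_type, bucket, key, base_url=BASE_URL):
    """URL layout used by the curl examples for data type operations."""
    return f"{base_url}/types/{bucket_type}/buckets/{bucket}/datatypes/{key}"


def update_request(bucket_type, bucket, key, operation):
    """Build (but do not send) a POST carrying a JSON data type operation."""
    return Request(
        datatype_url(bucket_type, bucket, key),
        data=json.dumps(operation).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


counter = update_request("counters", "counters", "traffic_tickets", {"increment": 1})
cities = update_request("sets", "travel", "cities", {"add_all": ["Toronto", "Montreal"]})

# To send a request against a running node, pass it to urllib.request.urlopen.
```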
Questions?