designing for concurrency with riak
Mathias Meyer
May 29, 2012
http://riakhandbook.com
Transcript
designing for concurrency with riak
nosql matters
mathias meyer, @roidrage
http://riakhandbook.com
design for concurrency?
design data for concurrency
data starts out simple
ID  Username  Email
1   roidrage  [email protected]
2   thomas    [email protected]
3   karen     [email protected]
single source of truth
always consistent
mostly consistent
monotonic
increase number of sources
replication
ID  Username  Email
1   roidrage  [email protected]
2   thomas    [email protected]
3   karen     [email protected]
eventual consistency*
* if no new updates are made to the object, eventually all accesses will return the last updated value. werner vogels, 2008, http://queue.acm.org/detail.cfm?id=1466448
multiple clients
ID  Username  Email
1   roidrage  [email protected]
2   thomas    [email protected]
3   karen     [email protected]
ID  Username  Email
1   roidrage  [email protected]
2   thomas    [email protected]
3   karen     [email protected]
Client 1: PUT    Client 2: PUT
conflicting writes
siblings
data diverges
the challenge
determine the winner
determine order
designing data for concurrency
designing data for non-monotonic writes
no atomicity in riak
no coordination
all state is in the data
(eventual) consistency and logical monotonicity*
* hellerstein: the declarative imperative: experiences and conjectures in distributed logic (2010)
designing data with conflicts in mind
write now, converge later
rethink the data structures
ID  Username  Email
1   roidrage  [email protected]
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]"
}
track updates
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]",
  "changes": [
    {
      "client": "client-1",
      "timestamp": 1337001337,
      "updates": {
        "firstname": "Mathias",
        "lastname": "Meyer"
      }
    }
  ]
}
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]",
  "changes": [
    {
      "client": "client-2",
      "timestamp": 1337001337,
      "updates": {
        "email": "[email protected]"
      }
    }
  ]
}
apply all updates ordered by time
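A minimal sketch of what "apply all updates ordered by time" could look like on the client, assuming each change stores its updates as a JSON object of attribute/value pairs. The function name `converge`, the use of the client name as a tie-breaker for equal timestamps, and the `example.com` addresses in the usage note are illustrative assumptions, not part of the talk.

```python
def converge(siblings):
    """Merge sibling versions of one object by replaying every recorded change."""
    # Base attributes come from the first sibling; the changelog is rebuilt below.
    merged = {k: v for k, v in siblings[0].items() if k != "changes"}

    # Collect the changes of all siblings. Assumption: (timestamp, client)
    # identifies a change, so the same change seen in two siblings is kept once.
    seen = {}
    for sibling in siblings:
        for change in sibling.get("changes", []):
            seen[(change["timestamp"], change["client"])] = change

    # Replay in timestamp order; the client name breaks ties deterministically.
    ordered = [seen[key] for key in sorted(seen)]
    for change in ordered:
        merged.update(change["updates"])

    merged["changes"] = ordered
    return merged
```

Because the changes are sorted before they are applied, the result does not depend on the order in which the siblings are passed in: `converge([a, b])` and `converge([b, a])` give the same object when both siblings share the same base attributes.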
what about removing data?
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]",
  "changes": [
    {
      "client": "client-1",
      "timestamp": 1337001337,
      "updates": [
        { "_op": "delete", "attribute": "email" }
      ]
    }
  ]
}
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]",
  "changes": [
    {
      "client": "client-2",
      "timestamp": 1337001337,
      "updates": [
        { "_op": "add", "attribute": "email", "value": "[email protected]" }
      ]
    }
  ]
}
keep a changelog
client converges data
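The changelog with explicit `_op` entries can be replayed on read. The sketch below is one possible way to do that, assuming only the two operations shown on the slides (`add` and `delete`); the function name `apply_changelog` is hypothetical.

```python
def apply_changelog(doc):
    """Return the attributes of doc after replaying its changelog in time order."""
    state = {k: v for k, v in doc.items() if k != "changes"}

    # sorted() is stable, so changes with equal timestamps keep their stored order.
    for change in sorted(doc.get("changes", []), key=lambda c: c["timestamp"]):
        for update in change["updates"]:
            if update["_op"] == "delete":
                # Deleting an attribute that is already gone is a no-op.
                state.pop(update["attribute"], None)
            elif update["_op"] == "add":
                state[update["attribute"]] = update["value"]
    return state
```

Keeping deletes as entries in the log, rather than dropping the attribute from the stored object, is what lets a later merge tell "never set" apart from "removed".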
time as a means of ordering*
* leslie lamport: time, clocks, and the ordering of events in a distributed system (1978)
time is not a guarantee for uniqueness
vector clocks?
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]",
  "changes": [
    {
      "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
      "timestamp": 1337001337,
      "updates": [
        { "_op": "delete", "attribute": "email" }
      ]
    }
  ]
}
timelines*
* riak at yammer: http://basho.com/blog/technical/2011/03/28/Riak-and-Scala-at-Yammer/
time-ordered series of events
kept per user
{
  "events": [
    {
      "id": "ca0cb932-a74e-11e1-9ce4-1093e90b5d80",
      "timestamp": 1337001337,
      "event": {
        "type": "push",
        "repository": "rails/rails",
        "sha1": "0ea43bf"
      }
    },
    {
      "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
      "timestamp": 1337001337,
      "event": {
        "type": "pull_request",
        "repository": "rails/rails",
        "sha1": "84efda0"
      }
    }
  ]
}
clients dedup, sort and truncate
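The three client-side steps can be sketched as follows. This assumes each event carries a unique `id` as in the timeline document, that the newest events should come first, and a hypothetical `limit` parameter for the truncation; none of these choices is prescribed by the talk.

```python
def merge_timelines(siblings, limit=100):
    """Merge sibling timelines: dedup by event id, sort by time, truncate."""
    # Dedup: an event present in several siblings is kept once, keyed by its id.
    events = {}
    for sibling in siblings:
        for event in sibling["events"]:
            events[event["id"]] = event

    # Sort: newest first; the id breaks ties between equal timestamps.
    ordered = sorted(events.values(),
                     key=lambda e: (e["timestamp"], e["id"]),
                     reverse=True)

    # Truncate: keep only the most recent `limit` events.
    return {"events": ordered[:limit]}
```

Sorting on `(timestamp, id)` rather than the timestamp alone keeps the result deterministic when two events share a timestamp, which the example timeline shows can happen.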
observation: clients manage the data
sets, counters, graphs
monotonic data structures
sets
an unordered bag of unique items
simplest thing that could possibly work...in riak
secondary indexes
X-Riak-Index-tags_bin: nosql, cloud, infrastructure
{
  "id": 1,
  "username": "roidrage",
  "email": "[email protected]"
}
always unique
useful for simple things
useful for object associations
add-only
set: time-ordered list of operations
{
  "set": [
    {
      "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
      "timestamp": 1337001337,
      "op": "add",
      "value": "roidrage"
    }
  ]
}
{
  "set": [
    {
      "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
      "timestamp": 1337001337,
      "op": "add",
      "value": "roidrage"
    },
    {
      "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
      "timestamp": 1337001339,
      "op": "add",
      "value": "josh"
    }
  ]
}
{
  "set": [
    {
      "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
      "timestamp": 1337001337,
      "op": "add",
      "value": "roidrage"
    },
    {
      "id": "56707cee-a757-11e1-8e1b-1093e90b5d80",
      "timestamp": 1337001339,
      "op": "add",
      "value": "josh"
    },
    {
      "id": "a525f16c-a968-11e1-8b07-1093e90b5d80",
      "timestamp": 1337001343,
      "op": "remove",
      "value": "josh"
    }
  ]
}
slightly inefficient
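Reading the current members out of such an operation list means replaying it every time, which is where the inefficiency comes from. A minimal sketch, assuming only `add` and `remove` operations and a hypothetical function name `members`:

```python
def members(doc):
    """Replay a time-ordered list of add/remove operations into a set."""
    present = set()
    # Order by timestamp; the operation id breaks ties deterministically.
    for op in sorted(doc["set"], key=lambda o: (o["timestamp"], o["id"])):
        if op["op"] == "add":
            present.add(op["value"])
        elif op["op"] == "remove":
            # discard() ignores removals of values that are not present.
            present.discard(op["value"])
    return present
```

The cost of a read grows with the number of operations ever recorded, not with the number of members.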
2-phase set*
* https://github.com/aphyr/meangirls
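A two-phase set keeps additions and removals in two separate collections, so merging siblings is just two unions and a removed value can never come back. The sketch below assumes a document with `adds` and `removes` lists under a `set` key; the function names are hypothetical and this is not the meangirls API.

```python
def merge_2p(a, b):
    """Merge two sibling 2P-sets by taking the union of adds and of removes."""
    adds = set(a["set"]["adds"]) | set(b["set"]["adds"])
    removes = set(a["set"]["removes"]) | set(b["set"]["removes"])
    # Sorted lists keep the stored JSON stable across merges.
    return {"set": {"adds": sorted(adds), "removes": sorted(removes)}}


def members_2p(doc):
    """Current members: everything added that has not been removed."""
    return set(doc["set"]["adds"]) - set(doc["set"]["removes"])
```

Union is commutative, associative and idempotent, so siblings can be merged in any order, any number of times, and still converge on the same value.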
{
  "set": {
    "adds": ["roidrage", "josh"],
    "removes": ["josh"]
  }
}
counters
increment, decrement
{
  "counter": [
    {
      "id": "e018f024-a74e-11e1-9feb-1093e90b5d80",
      "timestamp": 1337001337,
      "op": "incr",
      "value": 4
    }
  ]
}
g-counters*
* a comprehensive study of convergent and commutative replicated data types http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf
{
  "elements": {
    "client-1": 1,
    "client-2": 3,
    "client-3": 5
  }
}
value = 1 + 3 + 5 = 9
counters are easy when you increment only
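A grow-only counter in the shape shown above can be sketched in a few lines. Each client only ever raises its own entry, merging takes the per-client maximum, and the value is the sum. The function names are illustrative assumptions.

```python
def increment(counter, client, amount=1):
    """Return a new counter with the client's own entry raised by amount."""
    if amount < 0:
        raise ValueError("a g-counter can only grow")
    elements = dict(counter["elements"])
    elements[client] = elements.get(client, 0) + amount
    return {"elements": elements}


def merge_gcounter(a, b):
    """Merge siblings by keeping the highest count seen for each client."""
    clients = set(a["elements"]) | set(b["elements"])
    return {"elements": {c: max(a["elements"].get(c, 0),
                                b["elements"].get(c, 0))
                         for c in clients}}


def gcounter_value(counter):
    """The counter's value is the sum of all per-client counts."""
    return sum(counter["elements"].values())
```

Taking the maximum is safe only because entries never decrease: a lower number for a client is always an older view of the same entry, never a competing update.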
convergent replicated data types*
* shapiro et al.: a comprehensive study of convergent and commutative replicated data types http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf
statebox for erlang*
* https://github.com/mochi/statebox
knockbox for clojure*
* https://github.com/reiddraper/knockbox
data represents state
state-based means growth
data increases with lots of updates
dealing with growth
truncate
roll up, discard
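One way to picture a roll-up for an operation-based counter: operations older than a cutoff are folded into a stored base `value` and then dropped from the list. This is a sketch under assumptions: the `cutoff` parameter, the function names, and the `decr`/`dec` operation names are hypothetical, and both `incr` and `inc` are accepted as increments.

```python
# Sign of each operation; "decr" and "dec" are assumed names for decrements.
SIGNS = {"incr": 1, "inc": 1, "decr": -1, "dec": -1}


def roll_up(doc, cutoff):
    """Fold operations older than cutoff into the base value and discard them."""
    value = doc.get("value", 0)
    kept = []
    for op in doc["counter"]:
        if op["timestamp"] < cutoff:
            value += SIGNS[op["op"]] * op["value"]
        else:
            kept.append(op)
    return {"counter": kept, "value": value}


def total(doc):
    """Current count: rolled-up base value plus all remaining operations."""
    return doc.get("value", 0) + sum(SIGNS[op["op"]] * op["value"]
                                     for op in doc["counter"])
```

A roll-up like this is only safe once no sibling can still reintroduce a discarded operation; otherwise a later merge would count it twice.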
{
  "counter": [
    {
      "id": "458f5936-a752-11e1-a876-1093e90b5d80",
      "timestamp": 1337001347,
      "op": "inc",
      "value": 1
    }
  ],
  "value": 2
}
garbage collection
not easy with riak
not easy with stateful data
garbage collection requires coordination
network partitions cause stale data
the solution?
trade off data size vs. consistency
commutative replicated data types*
* shapiro et al.: a comprehensive study of convergent and commutative replicated data types http://hal.inria.fr/docs/00/55/55/88/PDF/techreport.pdf
operations instead of state
not yet possible with riak
eventual consistency is hard
thanks