Practical Demystification of CRDTs
Nami Nasserazad (@namiazad, nami.me)
Dmitry Ivanov (@idajantis)
Didier Liauw
Curry On, Rome, 2016
Slide 2
Disclaimer
We are NOT:
• Distributed systems experts.
• Hardcore academia guys.
Just curious engineers hacking on real-world problems.
Slide 3
Who We Are
"Fool" stack developers hacking on:
• Backend services
• Mobile || SDKs
• Infrastructure && AWS && DevOps
Slide 4
NavCloud
Slide 5
Server Development Stack
Slide 6
Client Libraries
Slide 7
NavCloud Nature
• Unstable connections
• Limited bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guaranteed order of updates
• No data loss
• Data convergence to expected value
Slide 8
How to Deal with this Nature?
Slide 9
Bad programmers worry about the
code. Good programmers worry
about data structures
— Linus Torvalds
Slide 10
CRDT
Slide 11
CRDT
DT: Data Type
CRDT is a data type with its own algebra
Slide 12
CRDT
R: Replicated
CRDTs are a family of data structures designed to be distributed
Slide 13
CRDT
C: Conflict Free
Conflicts are resolved automatically
Slide 14
How?
Slide 15
Merge
Slide 16
What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x
Slide 17
How Does it Help?
In Distributed Systems:
• Order is not guaranteed:
  • No problem: Merge is commutative and associative
• Events can be delivered more than once:
  • No problem: Merge is idempotent
Slide 18
What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge
Slide 19
Examples
Slide 20
G-Counter
Slide 21
G-Counter
Merge: Max of corresponding elements: A:6 B:3 C:9
TotalValue: Sum of all elements: 6 + 3 + 9 = 18
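A minimal Java sketch of the G-Counter described above, not the authors' implementation. The names GCounter, increment, merge and value are assumptions made for illustration; only the merge rule (max of corresponding elements) and the total (sum of all elements) come from the slide.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical G-Counter: one grow-only slot per node.
final class GCounter {
    private final Map<String, Long> slots = new HashMap<>();

    // Each node only ever increases its own slot.
    void increment(String nodeId, long delta) {
        if (delta < 0) {
            throw new IllegalArgumentException("a G-Counter can only grow");
        }
        slots.merge(nodeId, delta, Long::sum);
    }

    // Merge: max of corresponding elements.
    GCounter merge(GCounter other) {
        GCounter result = new GCounter();
        result.slots.putAll(this.slots);
        other.slots.forEach((node, count) -> result.slots.merge(node, count, Long::max));
        return result;
    }

    // Total value: sum of all elements.
    long value() {
        long sum = 0;
        for (long count : slots.values()) {
            sum += count;
        }
        return sum;
    }
}
```

As an illustration with made-up replica states: merging a replica holding A:6 B:2 C:4 with one holding A:5 B:3 C:9 gives A:6 B:3 C:9 in either order, and the total value is 6 + 3 + 9 = 18.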
Slide 22
Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x
Slide 23
G-Set
Slide 24
G-Set
Merge: Union of sets: { x, y, z, a, b, c }
Total Value: The same as the merge result
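The G-Set can be sketched the same way. This is an illustrative Java outline under assumed names (GSet, add, merge, value); the only behaviour taken from the slide is that merge is a set union and the value is the merged set itself.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical G-Set: elements can be added but never removed.
final class GSet<E> {
    private final Set<E> elements = new HashSet<>();

    void add(E element) {
        elements.add(element);
    }

    // Merge: union of both replicas.
    GSet<E> merge(GSet<E> other) {
        GSet<E> result = new GSet<>();
        result.elements.addAll(this.elements);
        result.elements.addAll(other.elements);
        return result;
    }

    // Total value: the set itself, exposed read-only.
    Set<E> value() {
        return Collections.unmodifiableSet(elements);
    }
}
```

Merging a replica holding { x, y, z } with one holding { a, b, c } yields { x, y, z, a, b, c } regardless of merge order, and merging the result with itself changes nothing.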
Slide 25
Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x
Slide 26
CRDT in NavCloud
Slide 27
Favorite Locations Synchronisation
Slide 28
Naive Approach?
Slide 29
Last Write Wins
Slide 30
Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks
Slide 31
Stale update may win!
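To make the failure mode concrete, here is a small Java sketch of a last-write-wins value. The class LwwRegister and the timestamps in the note below are hypothetical; the sketch only illustrates how a skewed clock lets an older edit beat a newer one.

```java
// Hypothetical last-write-wins register: the larger timestamp wins.
final class LwwRegister {
    final String value;
    final long timestamp; // as stamped by the writer's own clock

    LwwRegister(String value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }

    LwwRegister merge(LwwRegister other) {
        if (other.timestamp != this.timestamp) {
            return other.timestamp > this.timestamp ? other : this;
        }
        // Equal timestamps: break the tie on the value so merge stays commutative.
        return other.value.compareTo(this.value) > 0 ? other : this;
    }
}
```

Suppose device A's clock runs 60 seconds fast. A writes "Home" at real time 1000 and stamps it 1060; device B writes "Work" at real time 1030 and stamps it 1030. Merging the two keeps "Home", although "Work" is the more recent edit.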
Slide 32
So What?
Slide 33
CRDT
Slide 34
NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guaranteed order of updates ✔
• No data loss ✔
• Data convergence to expected value ✔
Slide 35
Same Data Model Everywhere
• Server
• Clients
• Data store
Slide 36
Merging Conflicts in Riak
Slide 37
Implementing a CRDT Set
What do we want?
• Support for addition and removal operations.
• Optimized for element mutations.
• Footprint as compact as possible.
Slide 38
2-Phase-Set
Supports additions and removals.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
Slide 42
2-Phase-Set
Doesn't work for us:
• Removed element can't be added again
• Immutable elements: no updates possible
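A short Java sketch of a 2-Phase-Set shows where the first limitation comes from. The names (TwoPhaseSet, add, remove, contains, merge) are assumptions for illustration; the structure, two grow-only sets with the second one holding tombstones, follows the description above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical 2-Phase-Set built from two grow-only sets.
final class TwoPhaseSet<E> {
    private final Set<E> added = new HashSet<>();
    private final Set<E> removed = new HashSet<>(); // tombstones

    void add(E element) {
        added.add(element);
    }

    // Only an element that has been added can be removed.
    void remove(E element) {
        if (added.contains(element)) {
            removed.add(element);
        }
    }

    // Present only if added and never tombstoned.
    boolean contains(E element) {
        return added.contains(element) && !removed.contains(element);
    }

    // Merge: union of the added sets and union of the tombstone sets.
    TwoPhaseSet<E> merge(TwoPhaseSet<E> other) {
        TwoPhaseSet<E> result = new TwoPhaseSet<>();
        result.added.addAll(this.added);
        result.added.addAll(other.added);
        result.removed.addAll(this.removed);
        result.removed.addAll(other.removed);
        return result;
    }
}
```

Because tombstones never disappear, calling add("home"), then remove("home"), then add("home") again leaves contains("home") false: the tombstone always wins.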
Slide 43
LWW-Element-Set
Supports additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Each element has a timestamp
• Supports re-adding a removed element using a higher timestamp
Slide 48
LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.
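The LWW-Element-Set can be sketched in Java as two maps from element to the latest timestamp seen. All names here are assumptions, and resolving an exact timestamp tie in favour of removal is an arbitrary choice made for this sketch, not something the slides specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical LWW-Element-Set: latest add and remove timestamp per element.
final class LwwElementSet<E> {
    private final Map<E, Long> adds = new HashMap<>();
    private final Map<E, Long> removes = new HashMap<>(); // tombstones

    void add(E element, long timestamp) {
        adds.merge(element, timestamp, Long::max);
    }

    void remove(E element, long timestamp) {
        removes.merge(element, timestamp, Long::max);
    }

    // Present if the latest add is newer than the latest remove.
    boolean contains(E element) {
        Long addedAt = adds.get(element);
        if (addedAt == null) {
            return false;
        }
        Long removedAt = removes.get(element);
        return removedAt == null || addedAt > removedAt;
    }

    // Merge: keep the highest timestamp per element in both maps.
    LwwElementSet<E> merge(LwwElementSet<E> other) {
        LwwElementSet<E> result = new LwwElementSet<>();
        result.adds.putAll(this.adds);
        other.adds.forEach((e, ts) -> result.adds.merge(e, ts, Long::max));
        result.removes.putAll(this.removes);
        other.removes.forEach((e, ts) -> result.removes.merge(e, ts, Long::max));
        return result;
    }
}
```

Unlike the 2-Phase-Set, an element removed at timestamp 2 becomes visible again after an add at timestamp 3. The element itself still cannot be edited in place, because it serves as its own key.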
Slide 49
OR-Set
OR - Observed / Removed
Supports additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Unique tag is associated with each element
• Supports re-adding removed elements
Slide 53
OR-Set
Lookup
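An element E exists iff the AddSet holds a tag for E that is not in the RemoveSet.
def lookup(): Set[E] =  // E: the element type (assumed type parameter of the enclosing set)
  addSet
    .filter { addElem =>
      // Keep an add entry unless a tombstone carries the same value and tag.
      !removeSet.exists { remElem =>
        addElem.value == remElem.value &&
          addElem.tag == remElem.tag
      }
    }
    .map(_.value)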
Slide 55
OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.
Slide 56
OUR-Set
Our take on Observed-Updated-Removed Set
• Each element has a unique identifier
• An element can be changed as long as its identifier remains the same
• Each element has a timestamp
• The timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)
Slide 57
OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• Set can only contain one element with a particular id
Slide 63
CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if favorite has been removed)
• Favorite data: ( Name, Location, ... )
Slide 64
Convergence in case of equal timestamps
Compare function checks all the fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• .. rest attributes ..
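The OUR-Set merge can be sketched around such a compare function. This Java outline is not the code from the authors' repositories: FavoriteState is reduced to a single name attribute, the class and method names are assumptions, and the delete bias (a removed element beats a live one at equal timestamps) is one of the two options the slide mentions, picked here for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified element: id, timestamp, removed flag, one data field.
final class FavoriteState {
    final String id;
    final long timestamp;
    final boolean removed;
    final String name;

    FavoriteState(String id, long timestamp, boolean removed, String name) {
        this.id = id;
        this.timestamp = timestamp;
        this.removed = removed;
        this.name = name;
    }

    // Priority: timestamp, then removed flag (delete bias), then remaining data.
    static int compare(FavoriteState a, FavoriteState b) {
        int byTime = Long.compare(a.timestamp, b.timestamp);
        if (byTime != 0) return byTime;
        int byFlag = Boolean.compare(a.removed, b.removed); // true ranks above false
        if (byFlag != 0) return byFlag;
        return a.name.compareTo(b.name);
    }
}

// Hypothetical OUR-Set: at most one element per id, the higher-ranked one wins.
final class OurSet {
    private final Map<String, FavoriteState> byId = new HashMap<>();

    void put(FavoriteState state) {
        byId.merge(state.id, state, OurSet::winner);
    }

    OurSet merge(OurSet other) {
        OurSet result = new OurSet();
        this.byId.values().forEach(result::put);
        other.byId.values().forEach(result::put);
        return result;
    }

    FavoriteState get(String id) {
        return byId.get(id);
    }

    private static FavoriteState winner(FavoriteState a, FavoriteState b) {
        return FavoriteState.compare(a, b) >= 0 ? a : b;
    }
}
```

Because the comparison is a total order over all fields, picking the greater element is commutative, associative and idempotent, so two replicas holding the same id with equal timestamps still converge to the same winner.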
Slide 65
Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function
Slide 66
Using CRDT everywhere
Client <-> Server <-> Database
def update(fromClient: OURSet[FavoriteState]): OURSet[FavoriteState] = {
  val fromDatabase = database.fetch(...)
  val newSet = fromDatabase.merge(fromClient)  // same merge as on the clients
  database.push(newSet)
  newSet  // reply with the merged state
}
Slide 67
Considerations & Limitations
Slide 68
"What about garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted Favorite in the Set == Tombstone.
• A potentially unbounded growth.
Slide 69
Prune deleted elements
But when?
Requirement:
All nodes holding a CRDT Set replica should have seen a deleted
element before it can be pruned.
Otherwise deleted elements can be resurrected.
Slide 70
Time-To-Live for tombstones
Prune tombstones once their TTL is exceeded.
if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
    crdtSet.remove(tombstone)
}
Requirement: all nodes holding a CRDT set should apply the same
TTL rule independently.
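The TTL rule could look roughly like this in Java. The Entry type and the prune signature are assumptions; passing "now" in explicitly is a choice made for this sketch so the rule is easy to test, not something the slides prescribe.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical TTL-based pruning of tombstones.
final class TombstonePruning {
    static final class Entry {
        final String id;
        final boolean removed;
        final Instant timestamp;

        Entry(String id, boolean removed, Instant timestamp) {
            this.id = id;
            this.removed = removed;
            this.timestamp = timestamp;
        }
    }

    // Keeps live elements and tombstones that are not older than the TTL.
    static List<Entry> prune(List<Entry> entries, Instant now, Duration timeToLive) {
        List<Entry> kept = new ArrayList<>();
        for (Entry entry : entries) {
            Duration age = Duration.between(entry.timestamp, now);
            boolean expired = entry.removed && age.compareTo(timeToLive) > 0;
            if (!expired) {
                kept.add(entry);
            }
        }
        return kept;
    }
}
```

With a 30-day TTL, a tombstone stamped 31 days ago is dropped, while a tombstone from yesterday and any live element, however old, are kept.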
Slide 71
Send and reply with a Diff
Client modifies and sends only updated elements (Diff).
Before: the server responded with the full merged result.
Slide 72
Send and reply with a Diff
We introduced a 'Scoped Diff':
Server responds only with the elements which have won against
those sent by the client.
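One way to picture the scoped diff is the Java sketch below. It is an illustration under assumptions: the Element type, the mergeAndReply name and the map-based storage are hypothetical, and elements are compared by timestamp only, with ties going to the client's copy, which is a simplification of the full compare function.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server-side handling of a client diff.
final class ScopedDiff {
    static final class Element {
        final String id;
        final long timestamp;
        final String data;

        Element(String id, long timestamp, String data) {
            this.id = id;
            this.timestamp = timestamp;
            this.data = data;
        }
    }

    // Merges the client's diff into `stored` and returns only the stored
    // elements that won against the ones the client sent.
    static Map<String, Element> mergeAndReply(Map<String, Element> stored,
                                              Map<String, Element> clientDiff) {
        Map<String, Element> reply = new HashMap<>();
        for (Element fromClient : clientDiff.values()) {
            Element onServer = stored.get(fromClient.id);
            if (onServer != null && onServer.timestamp > fromClient.timestamp) {
                reply.put(onServer.id, onServer);      // server copy wins: send it back
            } else {
                stored.put(fromClient.id, fromClient); // client copy wins: store it
            }
        }
        return reply;
    }
}
```

Elements the client did not mention are never included in the reply, which keeps the response proportional to the size of the diff rather than to the size of the whole set.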
Slide 73
Trouble With Time
Slide 74
There is no such thing as reliable time*.
Slide 75
Tracking time is actually tracking causality.
— Jonas Bonér, "Life Beyond the Illusion of Present"
Slide 76
Causality & Ordering of events.
Slide 77
Time can be just good enough.
Slide 78
Ordering updates within a single node
Timestamp field as a logical clock.
Absolute value is not important,
but it should always grow monotonically.
Slide 79
Ordering updates within a single node
"+1 Strategy":
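Long resolveNewTimestamp(ElementState state) {
    // Never move backwards: take the later of the clock reading
    // (presumably what retrieveTimestamp() returns) and lastModified + 1.
    return Math.max(retrieveTimestamp(), state.lastModified() + 1);
}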
Slide 80
Ordering updates from different nodes
If a GPS clock is available -> use it (mainly the Navigation Devices case).
Prefer the server time to a client's local time.
Slide 81
Edge case
Multiple Clients modify the same element
(concurrently || without a reliable clock).
Slide 82
One "merge" to rule them all
Slide 83
Clients & Server MUST have the same 'merge' behaviour.
==
Given the same input, their 'merge' functions emit the same results.
Slide 84
Divergence may lead to endless synchronisation loops!
Slide 85
What have we learned?
• Academia is not as scary as it sometimes seems to pragmatic devs.
• Check for the existing && simple(r) solutions.
• Understand their limitations.
• Analyze and monitor the real usage.
• Never settle: constantly search how to tune & improve.
Slide 86
"Show me the code"
github.com/ajantis/scala-crdt
github.com/ajantis/java-crdt
Slide 87
Want to know more? (Part 1)
• CRDTs for fun and eventual profit, Noel Welsh, 2013.
• Readings in conflict-free replicated data types, Christopher Meiklejohn, 2015.
• A comprehensive study of Convergent and Commutative Replicated Data Types, Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.
Slide 88
Want to know more? (Part 2)
• Lasp: A language for distributed, coordination-free programming, Meiklejohn & Van Roy, 2015.
• Swarm.js+React — real-time, offline-ready Holy Grail web apps, Victor Grishchenko.