Practical Demystification of CRDTs
Nami Nasserazad (@namiazad, nami.me)
Dmitry Ivanov (@idajantis)
Didier Liauw
17-18 February 2016, Kraków
Slide 2
Disclaimer
We are NOT:
• Distributed systems experts.
• Hardcore academia guys.
Just curious engineers hacking on real
world problems.
Slide 3
Who are we?
"Fool" stack developers hacking on:
• Backend services
• Mobile || SDKs
• Infrastructure && AWS && DevOps
Slide 4
NavCloud
Slide 5
Server Development Stack
Slide 6
Client Libraries
Slide 7
NavCloud Nature
• Unstable connections
• Limited bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guarantee on update order
• No data loss
• Data convergence to expected value
Slide 8
How to Deal with this Nature?
Slide 9
Bad programmers worry about the
code. Good programmers worry
about data structures
— Linus Torvalds
Slide 10
CRDT
Slide 11
CRDT
DT: Data Type
A CRDT is a data type with its own algebra
Slide 12
CRDT
R: Replicated
CRDTs are a family of data structures designed to be distributed
Slide 13
CRDT
C: Conflict Free
Conflicts are resolved automatically
Slide 14
How?
Slide 15
Merge
Slide 16
What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x
Slide 17
How Does it Help?
In Distributed Systems:
• Order is not guaranteed:
  • No problem: merge is commutative and associative
• Events can be delivered more than once:
  • No problem: merge is idempotent
Slide 18
What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge
Slide 19
Examples
Slide 20
G-Counter
Slide 21
G-Counter
Merge: Max of corresponding elements: A:6 B:3 C:9
TotalValue: Sum of all elements: 6 + 3 + 9 = 18
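The merge and total-value rules above can be sketched in a few lines of Scala. This is a minimal illustration, not the NavCloud implementation: the class name, the replica ids and the two pre-merge states in the example are assumptions chosen so that the merge produces the A:6 B:3 C:9 result from the slide.

```scala
// Sketch of a state-based G-Counter; all names are illustrative.
final case class GCounter(counts: Map[String, Long] = Map.empty) {

  // A replica only ever increments its own slot.
  def increment(replicaId: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    copy(counts = counts.updated(replicaId, counts.getOrElse(replicaId, 0L) + by))
  }

  // Merge: max of corresponding elements, over the union of replica ids.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { id =>
      id -> math.max(counts.getOrElse(id, 0L), other.counts.getOrElse(id, 0L))
    }.toMap)

  // Total value: sum of all elements.
  def value: Long = counts.values.sum
}

object GCounterExample {
  // Hypothetical replica states (the original figure is not available).
  val x = GCounter(Map("A" -> 6L, "B" -> 2L, "C" -> 9L))
  val y = GCounter(Map("A" -> 4L, "B" -> 3L, "C" -> 7L))
  val merged = x.merge(y) // counts: A -> 6, B -> 3, C -> 9
  val total = merged.value // 18
}
```

Because merge is an element-wise max, merging in either order, or merging the same state twice, yields the same counter.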
Slide 22
Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x
Slide 23
G-Set
Slide 24
G-Set
Merge: Union of sets: { x, y, z, a, b, c }
Total Value: The same as the merge result
Slide 25
Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x
Slide 26
CRDT in NavCloud
Slide 27
Favourite Locations Synchronisation
Slide 28
Naive Approach?
Slide 29
Last Write Wins
Slide 30
Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks
Slide 31
Stale update may win!
Slide 32
So What?
Slide 33
CRDT
Slide 34
NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guarantee on update order ✔
• No data loss ✔
• Data convergence to expected value ✔
Slide 35
Same Data Model Everywhere
• Server
• Clients
• Data store
Slide 36
CRDT Set Implementations
Let's do our homework :)
Slide 37
2-Phase-Set
Stores additions and removals.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
Slide 41
2-Phase-Set
Doesn't work for us:
• Removed element can't be added again
• Immutable elements: no updates possible
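Both limitations follow directly from the structure, as a small Scala sketch shows. It assumes plain immutable `Set`s stand in for the two G-Sets; the class and method names are illustrative, not taken from the talk's code.

```scala
// Sketch of a 2-Phase-Set built from two grow-only sets.
final case class TwoPhaseSet[A](
    added: Set[A] = Set.empty[A],
    removed: Set[A] = Set.empty[A]) {

  def add(a: A): TwoPhaseSet[A] = copy(added = added + a)

  // Removal only records a tombstone; nothing ever leaves `added`.
  def remove(a: A): TwoPhaseSet[A] =
    if (added.contains(a)) copy(removed = removed + a) else this

  // An element is visible if it was added and never removed.
  def lookup: Set[A] = added -- removed

  // Merge is the union of both underlying G-Sets.
  def merge(other: TwoPhaseSet[A]): TwoPhaseSet[A] =
    TwoPhaseSet(added ++ other.added, removed ++ other.removed)
}

object TwoPhaseSetExample {
  val s = TwoPhaseSet[String]().add("home").remove("home")
  // Adding "home" again changes nothing visible: the tombstone still hides it.
  val readded = s.add("home").lookup // Set()
}
```

Since the tombstone set can only grow, a removed element stays hidden forever, which is why re-adding is impossible in this design.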
Slide 42
LWW-Element-Set
Stores additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Each element has a timestamp
• Supports re-adding a removed element using a higher timestamp
Slide 47
LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.
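A rough Scala sketch of the idea follows. As a simplification it keeps only the highest timestamp seen per element instead of a full G-Set of (element, timestamp) pairs, and it breaks add/remove ties in favour of the remove; both choices, and all names, are assumptions made for the example.

```scala
// Sketch of an LWW-Element-Set keyed by element, with Long timestamps.
final case class LwwElementSet[A](
    adds: Map[A, Long] = Map.empty[A, Long],
    removes: Map[A, Long] = Map.empty[A, Long]) {

  def add(a: A, ts: Long): LwwElementSet[A] = copy(adds = bump(adds, a, ts))

  def remove(a: A, ts: Long): LwwElementSet[A] = copy(removes = bump(removes, a, ts))

  // Visible if the latest add is strictly newer than the latest remove.
  def lookup: Set[A] =
    adds.keySet.filter(a => removes.get(a).forall(_ < adds(a)))

  // Merge keeps the highest timestamp per element on both sides.
  def merge(other: LwwElementSet[A]): LwwElementSet[A] =
    LwwElementSet(mergeMax(adds, other.adds), mergeMax(removes, other.removes))

  private def bump(m: Map[A, Long], a: A, ts: Long): Map[A, Long] =
    m.updated(a, math.max(ts, m.getOrElse(a, Long.MinValue)))

  private def mergeMax(x: Map[A, Long], y: Map[A, Long]): Map[A, Long] =
    y.foldLeft(x) { case (acc, (a, ts)) => bump(acc, a, ts) }
}

object LwwElementSetExample {
  val removed = LwwElementSet[String]().add("home", 1L).remove("home", 2L)
  val readded = removed.add("home", 3L).lookup // Set("home")
}
```

Re-adding works because a later add timestamp outranks the tombstone, but the element itself is still just a value: there is no way to express "the same favorite, with a new name".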
Slide 48
OR-Set
OR: Observed-Removed
Stores additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• A unique tag is associated with each insertion or deletion
• Supports re-adding removed elements
Slide 52
OR-Set
Lookup
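An element E exists iff the AddSet holds a tag for it that is not in the RemoveSet.
// E is assumed here to be the set's element type parameter
def lookup(): Set[E] =
  addSet
    .filter { addElem =>
      // keep an add entry unless a remove entry carries the same value and tag
      !removeSet.exists { remElem =>
        addElem.value == remElem.value && addElem.tag == remElem.tag
      }
    }
    .map(_.value)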
Slide 54
OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.
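For completeness, here is how add, remove and merge could sit around that lookup. This is a sketch in the style of the textbook OR-Set, where a remove tombstones the add-tags it has observed; the `Tagged` wrapper and the use of `java.util.UUID` as the tag type are assumptions, not details from the talk.

```scala
import java.util.UUID

// Illustrative entry type: a value plus the unique tag of one insertion.
final case class Tagged[A](value: A, tag: UUID)

final case class ORSet[A](
    addSet: Set[Tagged[A]] = Set.empty[Tagged[A]],
    removeSet: Set[Tagged[A]] = Set.empty[Tagged[A]]) {

  // Every insertion gets a fresh unique tag.
  def add(a: A, tag: UUID = UUID.randomUUID()): ORSet[A] =
    copy(addSet = addSet + Tagged(a, tag))

  // A removal tombstones only the tags this replica has observed for the value.
  def remove(a: A): ORSet[A] =
    copy(removeSet = removeSet ++ addSet.filter(_.value == a))

  // Same rule as the lookup above: some add tag is not tombstoned.
  def lookup: Set[A] = (addSet -- removeSet).map(_.value)

  def merge(other: ORSet[A]): ORSet[A] =
    ORSet(addSet ++ other.addSet, removeSet ++ other.removeSet)
}

object ORSetExample {
  val base = ORSet[String]().add("home")
  val removedOnA = base.remove("home") // replica A removes what it has seen
  val readdedOnB = base.add("home")    // replica B adds concurrently, new tag
  val merged = removedOnA.merge(readdedOnB).lookup // Set("home")
}
```

The concurrent add survives the merge because its tag was never observed by the remove. The element is still an opaque value, though, so an in-place update remains impossible.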
Slide 55
OUR-Set
Our take on Observed-Updated-Removed Set
• Each element has a unique identifier
• Element can be changed if identifier remains the same
• Each element has a timestamp
• Timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)
Slide 56
OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• Set can only contain one element with a particular id
Slide 62
CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if favorite has been removed)
• Favorite data: ( Name, Location, ... )
Slide 63
Convergence in case of equal timestamps
The compare function checks all fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• ... remaining attributes ...
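Put together, the OUR-Set rules and the compare function might look like the sketch below. It is deliberately non-generic and makes several assumptions: `FavoriteState` carries only a `name` as its data, ties on timestamp use a delete bias, and the `put`, `merge` and `lookup` names are invented for the example.

```scala
// Hypothetical element shape based on the fields listed above.
final case class FavoriteState(id: String, timestamp: Long, removed: Boolean, name: String)

object FavoriteState {
  // Priority order: timestamp, then removed flag (false < true, so a delete
  // wins a tie), then the remaining data so that the order is total.
  implicit val ordering: Ordering[FavoriteState] =
    Ordering.by((f: FavoriteState) => (f.timestamp, f.removed, f.name))
}

final case class OURSet(elements: Map[String, FavoriteState] = Map.empty) {

  // Add, update and remove are one operation: offer a state for an id,
  // and keep whichever state compares highest.
  def put(f: FavoriteState): OURSet = {
    val winner = elements.get(f.id).fold(f)(cur => FavoriteState.ordering.max(cur, f))
    copy(elements = elements.updated(f.id, winner))
  }

  // Merge is a per-id "max", hence commutative, associative and idempotent.
  def merge(other: OURSet): OURSet =
    other.elements.values.foldLeft(this)((acc, f) => acc.put(f))

  // Tombstones stay in `elements` but are hidden from readers.
  def lookup: Set[FavoriteState] = elements.values.filterNot(_.removed).toSet
}
```

Comparing every field matters: if two different states could compare as equal, two replicas might each keep their own copy and never converge.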
Slide 64
Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function
Slide 65
Using CRDT everywhere
Client <-> Server <-> Database
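def update(fromClient: OURSet[FavoriteState]): OURSet[FavoriteState] = {
  val fromDatabase = database.fetch(...)       // current server-side state
  val newSet = fromDatabase.merge(fromClient)  // same merge as on the clients
  database.push(newSet)                        // persist the merged state
  newSet                                       // merged set goes back to the client
}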
Slide 66
Considerations & Limitations
Slide 67
"What About Garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted Favorite in the Set == Tombstone.
• Growth is potentially unbounded.
Slide 68
Case
MyDrive beta-test user with ~3000 deleted
favorites and 5 non-deleted ones.
=> 1 MB Favorites.json
Slide 69
Prune deleted favorites
But when?
Requirement: all nodes holding a Favorites set should have seen a
deleted element before it can be pruned.
Otherwise deleted elements can be resurrected.
Slide 70
Solution #1: Client-awareness & LastSyncTime
Capture the time of the last sync between each client and the service.
// prune only once every known client has synced after the deletion
if (clients.forall(_.lastSyncTimestamp > tombstone.timestamp)) {
  crdtSet.remove(tombstone)
}
Slide 71
Solution #2: Time-To-Live for tombstones
Prune tombstones once the TTL is exceeded.
if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
  crdtSet.drop(tombstone)
}
Requirement: all nodes holding a CRDT set should apply the same
TTL rule independently.
Slide 72
Solution #3: Send only diff upon any update.
Client has a set of [ A, B, C ]; Server has a set of [ A, B'', C ].
Client modifies and sends only updated favorites: [ A', B' ]
Before: Server responds with a full merged set [ A', B'', C ].
Slide 73
Solution #3: Send only diff upon any update.
We introduced a scoped diff:
The server responds with the diff set [ B'' ], because the client's B' update
lost to B'' on the server.
A' is skipped because it won on the server.
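One way to read the scoped diff is: merge the client's updates, then answer only with those touched elements where the server's version differs from what the client sent. The sketch below follows that reading. It is an interpretation, not the NavCloud code; `Fav`, `ScopedDiff.update` and the tie-break on name are all assumptions.

```scala
// Minimal element: id, timestamp and some data.
final case class Fav(id: String, timestamp: Long, name: String)

object ScopedDiff {
  // Per-id merge rule: highest timestamp wins, name breaks ties.
  private def winner(a: Fav, b: Fav): Fav =
    if (Ordering[(Long, String)].gteq((a.timestamp, a.name), (b.timestamp, b.name))) a else b

  // Returns (new server state, diff to send back to the client).
  def update(server: Map[String, Fav], fromClient: Seq[Fav]): (Map[String, Fav], Seq[Fav]) = {
    val merged = fromClient.foldLeft(server) { (state, f) =>
      state.updated(f.id, state.get(f.id).fold(f)(cur => winner(cur, f)))
    }
    // Scope: only ids the client sent, and only where its update lost.
    val diff = fromClient.collect { case f if merged(f.id) != f => merged(f.id) }
    (merged, diff)
  }
}

object ScopedDiffExample {
  // The scenario from the slide, with made-up timestamps.
  val server = Map(
    "A" -> Fav("A", 1L, "A"),
    "B" -> Fav("B", 3L, "B''"),
    "C" -> Fav("C", 1L, "C"))
  val (merged, diff) =
    ScopedDiff.update(server, Seq(Fav("A", 2L, "A'"), Fav("B", 2L, "B'")))
  // diff contains only B''; merged holds A', B'' and C
}
```

The client then merges the diff locally with the same merge function, so both sides end up holding A', B'' and C without the full set ever crossing the wire.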
Slide 74
Trouble With Time
Slide 75
There is no such thing as reliable time.
Slide 76
Tracking time is actually tracking causality.
— Jonas Bonér, "Life Beyond the Illusion of Present"
Causality & Ordering events.
Slide 77
Time can be just good enough.
Slide 78
Ordering updates within a single node
Timestamp field as a logical clock.
Actual value is not important, but it should always grow
monotonically.
Slide 79
Ordering updates within a single node
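"+1 Strategy":
// Never go backwards: take the later of the retrieved timestamp and the
// element's previous timestamp plus one.
Long resolveNewTimestamp(ElementState state) {
    return Math.max(retrieveTimestamp(), state.lastModified() + 1);
}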
Slide 80
Ordering updates from different nodes
If a GPS clock is available -> use it (mainly the PND case).
Prefer the server time to a client's local time.
Slide 81
Edge case
Multiple clients modify the same element
(concurrently || without a reliable clock).
Slide 82
One "merge" to rule them all
Slide 83
Clients & Server MUST have the same 'merge' behaviour.
==
Given the same input, their 'merge' functions emit the same results.
Slide 84
Divergence may lead to endless synchronisation loops!
Slide 85
"So what?" :)
• Academia is not as scary as it might seem to pragmatic devs.
• Look for the best && simplest solutions.
• Understand their limitations.
• Analyze and monitor real usage.
• Never settle: constantly search how to tune & improve.
Slide 86
"Show me the code"
Scala samples
https://github.com/ajantis/scala-crdt
Java samples
https://github.com/ajantis/java-crdt
Slide 87
Homework for the curious minds (Part 1)
• CRDTs for fun and eventual profit, Noel Welsh, 2013.
• Readings in conflict-free replicated data types, Christopher Meiklejohn, 2015.
• A comprehensive study of Convergent and Commutative Replicated Data Types, Marc Shapiro, Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011.
Slide 88
Homework for the curious minds (Part 2)
• Lasp: A language for distributed, coordination-free programming, Meiklejohn & Van Roy, 2015.
• Swarm.js+React: real-time, offline-ready Holy Grail web apps, Victor Grishchenko, 2014.