Practical Data Synchronization
&
CRDTs
Dmitry Ivanov @idajantis
2016
NavCloud
Who We Are
"Fool" stack developers hacking on:
• Backend services
• Client libraries
• Infrastructure && DevOps
Backend stack
Client Libraries
NavCloud Nature
• Unstable connections
• Limited data plans & bandwidth
• Seamless edit/view in offline mode
• Concurrent changes with potential conflicts
• No guarantee of update order
• No data loss
• Data convergence to expected value
How to Deal with this Nature?
Bad programmers worry about the
code. Good programmers worry
about data structures
— Linus Torvalds
CRDT
CRDT
DT: Data Type
CRDT is a data type with its own algebra
CRDT
R: Replicated
CRDTs are a family of data structures
designed to be distributed
CRDT
C: Conflict Free
Conflicts are resolved automatically
How?
Merge
What is Merge?
• A binary operation on two CRDTs
• Commutative: x • y = y • x
• Associative: ( x • y ) • z = x • ( y • z )
• Idempotent: x • x = x
How Does it Help?
In Distributed Systems:
• Order is not guaranteed:
  • No problem: merge is commutative and associative
• Events can be delivered more than once:
  • No problem: merge is idempotent
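A minimal Java sketch of this point, using max on plain numbers as a stand-in for a CRDT merge. The class and method names are hypothetical. The same updates, delivered in a different order and with duplicates, fold to the same state.

```java
import java.util.List;

// Hypothetical illustration: max is used as a stand-in for a CRDT merge.
class MergeOrderSketch {
    // Folds a sequence of received states into the local state using merge = max.
    static long mergeAll(long initial, List<Long> received) {
        long state = initial;
        for (long update : received) {
            state = Math.max(state, update); // commutative, associative, idempotent
        }
        return state;
    }

    public static void main(String[] args) {
        List<Long> inOrder = List.of(1L, 4L, 7L);
        List<Long> reorderedWithDuplicates = List.of(7L, 1L, 7L, 4L, 1L);
        System.out.println(mergeAll(0L, inOrder));                 // 7
        System.out.println(mergeAll(0L, reorderedWithDuplicates)); // 7
    }
}
```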
What Does it Bring in Practice?
• Local updates
• Local merge of received data
• All local merges converge
Examples
G-Counter
G-Counter
Merge: Max of corresponding elements: A:6 B:3 C:9
Total Value: Sum of all elements: 6 + 3 + 9 = 18
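A G-Counter can be sketched in Java as a map with one slot per replica. The two input states below are hypothetical, chosen so that their merge gives the A:6 B:3 C:9 result shown on this slide. The class and method names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a state-based G-Counter: one non-decreasing slot per replica.
class GCounterSketch {
    // A replica only ever increments its own slot.
    static Map<String, Long> increment(Map<String, Long> state, String replicaId) {
        Map<String, Long> next = new HashMap<>(state);
        next.merge(replicaId, 1L, Long::sum);
        return next;
    }

    // Merge: max of corresponding slots.
    static Map<String, Long> merge(Map<String, Long> left, Map<String, Long> right) {
        Map<String, Long> merged = new HashMap<>(left);
        right.forEach((replica, count) -> merged.merge(replica, count, Long::max));
        return merged;
    }

    // Total value: sum of all slots.
    static long value(Map<String, Long> state) {
        return state.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        // Hypothetical replica states (not taken from the slides).
        Map<String, Long> first = Map.of("A", 6L, "B", 2L, "C", 9L);
        Map<String, Long> second = Map.of("A", 5L, "B", 3L, "C", 9L);
        Map<String, Long> merged = merge(first, second);
        System.out.println(value(merged)); // 18
    }
}
```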
Max Function
• A binary operation on two CRDTs
• Commutative: x max y = y max x
• Associative: ( x max y ) max z = x max ( y max z )
• Idempotent: x max x = x
G-Set
Union Function
• A binary operation on two CRDTs
• Commutative: x ∪ y = y ∪ x
• Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
• Idempotent: x ∪ x = x
G-Set
Merge: Union of sets: { x, y, z, a, b, c }
Total Value: The same as the merge result
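A minimal Java sketch of a G-Set, where merge is plain set union. The input sets are hypothetical, chosen so that their union matches the merge result on this slide. The class name is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a state-based G-Set (grow-only set): elements are added, never removed.
class GSetSketch {
    // Adding returns a new state that also contains the element.
    static <E> Set<E> add(Set<E> state, E element) {
        Set<E> next = new HashSet<>(state);
        next.add(element);
        return next;
    }

    // Merge: union of both states.
    static <E> Set<E> merge(Set<E> left, Set<E> right) {
        Set<E> merged = new HashSet<>(left);
        merged.addAll(right);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical replica states (not taken from the slides).
        Set<String> first = Set.of("x", "y", "z");
        Set<String> second = Set.of("a", "b", "c", "x");
        Set<String> merged = merge(first, second);
        System.out.println(merged.equals(Set.of("x", "y", "z", "a", "b", "c"))); // true
        System.out.println(merge(merged, merged).equals(merged));                // true
    }
}
```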
CRDT in NavCloud
Favorite Locations
Synchronization
Naive Approach?
Last Write Wins
Problems
• Unstable connections
  • Actual update time < Sent time
• Network latency
  • Sent time < Received time
• Unreliable clocks
Stale update may win!
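One way a stale update can win under Last Write Wins, sketched in Java. The class, the record, and the clock values are hypothetical: one device has a clock that runs fast, so its older edit carries the higher timestamp.

```java
// Sketch of a Last-Write-Wins register; names and clock values are hypothetical.
class LwwRegisterSketch {
    record Stamped(String value, long timestampMillis) {}

    // Keeps whichever write carries the higher timestamp.
    static Stamped merge(Stamped left, Stamped right) {
        return right.timestampMillis() > left.timestampMillis() ? right : left;
    }

    public static void main(String[] args) {
        // Device A edits first, but its clock runs five minutes fast.
        Stamped fromA = new Stamped("Office (older edit)", 1_000_000L + 300_000L);
        // Device B edits one minute later with an accurate clock.
        Stamped fromB = new Stamped("Office (newer edit)", 1_060_000L);
        System.out.println(merge(fromA, fromB).value()); // Office (older edit)
    }
}
```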
So What?
CRDT
NavCloud Nature vs CRDT
• Unstable connections ✔
• Limited data plans & bandwidth ✔
• Seamless edit/view in offline mode ✔
• Concurrent changes with potential conflicts ✔
• No guarantee of update order ✔
• No data loss ✔
• Data convergence to expected value ✔
Same Data Model Everywhere
• Server
• Clients
• Data store
Merging Conflicts in Riak
Data consistency is determined
by 'the weakest link' in your pipeline
Implementing a CRDT Set
What do we want?
• Support for addition and removal operations.
• Optimized for element mutations.
• Footprint as compact as possible.
2-Phase-Set
Supports additions and removals.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
2-Phase-Set
Doesn't work for us:
• Removed element can't be added again
• Immutable elements: no updates possible
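A minimal Java sketch of a 2-Phase-Set, showing why a removed element cannot come back: the tombstone set only grows, so a later add is masked forever. The class name is hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a 2-Phase-Set: two grow-only sets, one for adds and one for tombstones.
class TwoPhaseSetSketch<E> {
    private final Set<E> added = new HashSet<>();
    private final Set<E> removed = new HashSet<>(); // tombstones, never shrink

    void add(E element) {
        added.add(element);
    }

    void remove(E element) {
        if (added.contains(element)) {
            removed.add(element);
        }
    }

    // An element is present if it was added and never removed.
    boolean contains(E element) {
        return added.contains(element) && !removed.contains(element);
    }

    // Merge: union of both underlying G-Sets.
    void merge(TwoPhaseSetSketch<E> other) {
        added.addAll(other.added);
        removed.addAll(other.removed);
    }

    public static void main(String[] args) {
        TwoPhaseSetSketch<String> favorites = new TwoPhaseSetSketch<>();
        favorites.add("home");
        favorites.remove("home");
        favorites.add("home"); // has no effect: the tombstone still wins
        System.out.println(favorites.contains("home")); // false
    }
}
```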
LWW-Element-Set
Supports additions and removals, with timestamps.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Each element has a timestamp
• Supports re-adding a removed element using a higher timestamp
LWW-Element-Set
Doesn't work for us:
• Immutable elements: no updates possible.
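A minimal Java sketch of an LWW-Element-Set, assuming timestamps are plain long values and that ties go to the removal (the bias is a choice made for this sketch). The class name is hypothetical. An element can be re-added with a higher timestamp, but its value still cannot change in place.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of an LWW-Element-Set: latest add and remove timestamps are kept per element.
class LwwElementSetSketch<E> {
    private final Map<E, Long> addedAt = new HashMap<>();
    private final Map<E, Long> removedAt = new HashMap<>(); // tombstones

    void add(E element, long timestamp) {
        addedAt.merge(element, timestamp, Long::max);
    }

    void remove(E element, long timestamp) {
        removedAt.merge(element, timestamp, Long::max);
    }

    // Present if the latest add is strictly newer than the latest remove (remove bias on ties).
    boolean contains(E element) {
        Long added = addedAt.get(element);
        Long removed = removedAt.get(element);
        if (added == null) {
            return false;
        }
        return removed == null || added > removed;
    }

    // Merge: per-element max of timestamps in both maps.
    void merge(LwwElementSetSketch<E> other) {
        other.addedAt.forEach((element, ts) -> addedAt.merge(element, ts, Long::max));
        other.removedAt.forEach((element, ts) -> removedAt.merge(element, ts, Long::max));
    }
}
```

Usage: `add("home", 1)`, `remove("home", 2)`, then `add("home", 3)` leaves "home" in the set, which a 2-Phase-Set cannot do.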
OR-Set
OR - Observed / Removed
Supports additions and removals, with tags.
• G-Set for added elements
• G-Set for removed elements aka Tombstones
• Unique tag is associated with each element
• Supports re-adding removed elements
OR-Set
Lookup
An element E exists iff the add set holds a tag for it that is not in the remove set.
def lookup(): Set[E] =
  addSet
    .filter { addElem =>
      // keep the entry unless the same (value, tag) pair has been tombstoned
      !removeSet.exists { remElem =>
        addElem.value == remElem.value && remElem.tag == addElem.tag
      }
    }
    .map(_.value)
OR-Set
Doesn't work for us:
• Immutable elements: no updates possible.
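A minimal Java sketch of the OR-Set idea, assuming tags are random UUID strings. The class and record names are hypothetical, and this is not the API of the authors' library. A remove only tombstones the tags it has observed, so a concurrent re-add with a fresh tag survives the merge.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Sketch of an OR-Set: every add gets a unique tag; remove tombstones observed tags only.
class OrSetSketch<E> {
    record Tagged<T>(T value, String tag) {}

    private final Set<Tagged<E>> addSet = new HashSet<>();
    private final Set<Tagged<E>> removeSet = new HashSet<>();

    void add(E value) {
        addSet.add(new Tagged<>(value, UUID.randomUUID().toString()));
    }

    // Tombstones every (value, tag) pair this replica has seen for the value.
    void remove(E value) {
        for (Tagged<E> entry : addSet) {
            if (entry.value().equals(value)) {
                removeSet.add(entry);
            }
        }
    }

    // Present iff some tag for the value is in the add set but not in the remove set.
    boolean contains(E value) {
        return addSet.stream()
                .anyMatch(entry -> entry.value().equals(value) && !removeSet.contains(entry));
    }

    // Merge: union of both underlying G-Sets.
    void merge(OrSetSketch<E> other) {
        addSet.addAll(other.addSet);
        removeSet.addAll(other.removeSet);
    }
}
```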
OUR-Set
Our take on Observed-Updated-Removed Set
• Each element has a unique identifier
• Element can be changed if identifier remains the same
• Each element has a timestamp
• Timestamp is updated on each element mutation
Identity (immutable unique id) vs Value (mutable)
OUR-Set
Contains a single underlying set of elements with metadata:
• Each element has a unique id field (e.g. a UUID)
• Each element has a "removed" boolean flag
• Each element has a timestamp
• Set can only contain one element with a particular id
CRDT Model: Favorites
FavoriteState element:
• ID (to uniquely identify a favorite)
• Timestamp (to indicate the last change time)
• Removed flag (to indicate if favorite has been removed)
• Favorite data: ( Name, Location, ... )
Convergence in case of equal timestamps
Compare function checks all the fields in order of priority:
• Timestamp
• Removed flag (Add or Delete bias)
• .. rest of the attributes ..
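The priority order above can be sketched in Java as a chained comparator. The `FavoriteState` record here is a simplified, hypothetical stand-in with a single `name` attribute, and the sketch picks a delete bias on equal timestamps. The merge keeps, per id, whichever element compares higher, so both sides reach the same result whatever the argument order.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Sketch of an OUR-Set style merge; field names and the delete bias are assumptions.
class OurSetSketch {
    record FavoriteState(String id, long timestamp, boolean removed, String name) {}

    // Priority: timestamp, then removed flag (true wins = delete bias), then remaining attributes.
    static final Comparator<FavoriteState> PRIORITY =
            Comparator.comparingLong(FavoriteState::timestamp)
                    .thenComparing(FavoriteState::removed)
                    .thenComparing(FavoriteState::name);

    // Merge: for every id keep the element that ranks highest under PRIORITY.
    static Map<String, FavoriteState> merge(Map<String, FavoriteState> left,
                                            Map<String, FavoriteState> right) {
        Map<String, FavoriteState> merged = new HashMap<>(left);
        right.forEach((id, candidate) ->
                merged.merge(id, candidate, (a, b) -> PRIORITY.compare(a, b) >= 0 ? a : b));
        return merged;
    }

    public static void main(String[] args) {
        FavoriteState kept = new FavoriteState("id-1", 10L, false, "Home");
        FavoriteState deleted = new FavoriteState("id-1", 10L, true, "Home");
        Map<String, FavoriteState> merged = merge(Map.of("id-1", kept), Map.of("id-1", deleted));
        System.out.println(merged.get("id-1").removed()); // true
    }
}
```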
Using CRDT everywhere
• Use the same algorithm everywhere
As simple as calling the merge function
Using CRDT everywhere
Client <-> Server <-> Database
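def update(fromClient: OURSet[E]): OURSet[E] = {
  // load the stored replica, merge the client's state into it, persist the result
  val fromDatabase = database.fetch(...)
  val newSet = fromDatabase.merge(fromClient)
  database.store(..., newSet)
  // the merged set is also what goes back to the client
  newSet
}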
Considerations & Limitations
"What about garbage?"
• CRDTs tend to grow because of tombstones.
• Deleted Element in the Set == Tombstone.
• Potentially unbounded growth.
Prune deleted elements
But when?
Requirement:
All nodes holding a CRDT Set replica should have seen a deleted
element before it can be pruned.
Otherwise deleted elements can be resurrected.
Time-To-Live for tombstones
Prune tombstones once the TTL is exceeded.
// Pseudocode: assumes the timestamp difference and TimeToLive use the same unit.
if ((DateTime.now() - tombstone.timestamp) > TimeToLive) {
  crdtSet.remove(tombstone)
}
Requirement: all nodes holding a CRDT set should apply the same
TTL rule independently.
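A hedged Java sketch of the TTL rule, using `java.time`. The `Element` record and the 30-day TTL in the usage note are hypothetical. Passing `now` in as a parameter keeps the rule deterministic, so every replica can apply it the same way.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch of TTL-based tombstone pruning; the element shape is an assumption.
class TombstonePruningSketch {
    record Element(String id, Instant timestamp, boolean removed) {}

    // Drops removed elements whose age exceeds the TTL; live elements are always kept.
    static List<Element> prune(List<Element> elements, Instant now, Duration timeToLive) {
        List<Element> kept = new ArrayList<>();
        for (Element element : elements) {
            boolean expiredTombstone = element.removed()
                    && Duration.between(element.timestamp(), now).compareTo(timeToLive) > 0;
            if (!expiredTombstone) {
                kept.add(element);
            }
        }
        return kept;
    }
}
```

Usage: `prune(elements, Instant.now(), Duration.ofDays(30))` would drop tombstones older than 30 days. A replica that stays offline longer than the TTL can still resurrect a pruned element.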
Prune deleted elements
Problem
Synchronization between all replicas is needed for correctness.
Distributed
transactions
- Academia, help!
Optimized OR-Set
Introduces replica awareness
Optimized OR-Set
Additional metadata is added to every transferred state.
{ (replica_id -> seq_nr) }
where:
- replica_id: a unique & stable replica identifier.
- seq_nr: a local counter that grows monotonically (after each op).
Optimized OR-Set
Each local state maintains a map:
{ replica_A: 1, replica_B: 1, replica_C: 3 }
If a received state has a seq_nr lower than the corresponding local
value -> ignore.
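The "ignore if lower" rule can be sketched in Java as a small map of the highest sequence number seen per replica. The class and method names are hypothetical. The sketch covers only the staleness check, not the full Optimized OR-Set.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of replica awareness: remembers the highest seq_nr seen per replica_id.
class ReplicaAwarenessSketch {
    private final Map<String, Long> seen = new HashMap<>();

    // Returns true if the received state should be merged, false if it is stale.
    boolean accept(String replicaId, long seqNr) {
        long lastSeen = seen.getOrDefault(replicaId, 0L);
        if (seqNr < lastSeen) {
            return false; // lower than the local value: ignore
        }
        seen.put(replicaId, seqNr);
        return true;
    }

    public static void main(String[] args) {
        ReplicaAwarenessSketch local = new ReplicaAwarenessSketch();
        System.out.println(local.accept("replica_C", 3L)); // true
        System.out.println(local.accept("replica_C", 2L)); // false
        System.out.println(local.accept("replica_C", 4L)); // true
    }
}
```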
Optimized OR-Set
No Tombstones, yay! ☺
(Slightly) more complicated API: stable replica_id needed. ☹
Update & Reply with a Diff
Client modifies and sends only updated elements (Diff).
Before: Server responds with a full merge result.
Update & Reply with a Diff
We introduced a 'Scoped Diff':
Server responds only with the elements which have won against
those sent by the client.
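A rough Java sketch of how a scoped diff might be computed, assuming the winner is decided by timestamp alone and ties keep the client's copy. A real compare function would check further fields. All names are hypothetical and this is not the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "update & reply with a scoped diff"; winner rule simplified to timestamps.
class ScopedDiffSketch {
    record Element(String id, long timestamp, String data) {}

    // Merges the client's diff into serverState (mutated in place) and returns
    // only the server elements that won against the ones the client sent.
    static Map<String, Element> applyAndReply(Map<String, Element> serverState,
                                              Map<String, Element> clientDiff) {
        Map<String, Element> reply = new HashMap<>();
        clientDiff.forEach((id, fromClient) -> {
            Element onServer = serverState.get(id);
            if (onServer != null && onServer.timestamp() > fromClient.timestamp()) {
                reply.put(id, onServer);         // server's copy won: send it back
            } else {
                serverState.put(id, fromClient); // client's copy won: store it, nothing to return
            }
        });
        return reply;
    }
}
```

Elements the client did not send never appear in the reply, which is what keeps the response small.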
Server -> Client Diff
- Academia, help?..
δ-CRDT
Builds on replica awareness
Introduces a Causal Context:
map of (replica_id -> seq_nr).
Introduces a Dot Store: CRDT state (No tombstones).
δ-CRDT
A formalized way to compute minimal δ-CRDT instances
against a target replica.
δ-CRDT
Adrian Colyer (The Morning Paper) wrote a great paper review:
blog.acolyer.org/2016/04/25/delta-state-replicated-data-types
Trouble With Time
There is no such thing as reliable time*.
Tracking time is actually
tracking causality.
— Jonas Bonér, "Life Beyond the Illusion of Present"
Causality & Ordering of events.
Time can be just good enough.
Ordering updates within a single node
Timestamp field as a logical clock.
Absolute value is not important,
but it should always grow monotonically.
Ordering updates within a single node
"+1 Strategy" (aka ensure monotonicity):
Long resolveNewTimestamp(ElementState state) {
    // Use the current clock reading, but never less than the previous timestamp + 1.
    return Math.max(retrieveTimestamp(), state.lastModified() + 1);
}
Ordering updates from different nodes
If a GPS clock is available -> use it (mainly the Navigation Devices case).
Prefer the server time to a client's local time.
Edge case
Multiple clients modify the same element
(concurrently || without a reliable clock).
One "merge" to rule them all
Clients & Server MUST have the same 'merge'
behaviour.
==
Given the same input, their 'merge' functions
emit the same results.
Divergence may lead to endless synchronization loops!
Lazy (data) loading
New OURSet Element
• Metadata: UUID, timestamp, "removed" flag, + tag / hash
• (Optional) Data:
Flexible synchronization strategy
Eager || Lazy Fetch
What have we learned?
• Academia is not as scary as it sometimes seems to pragmatic devs.
• We need better and simpler abstractions to develop
Offline-friendly apps.
• CRDTs give a great value, but there are some caveats.
• Things like Lasp (lasp-lang.org) also could be the answer (?).
Show me the code
github.com/ajantis/{scala | java}-crdt