PRACTICAL DEMYSTIFICATION OF CRDTS
AMSTERDAM.SCALA MEETUP, 27TH AUGUST 2015
Nami Nasserazad, Didier Liauw, Dmitry Ivanov
@nami4552
@idajantis
Slide 2
Slide 2 text
DISCLAIMER & WARNING
We are neither distributed systems experts,
nor hardcore academia guys.
There is no Scala-only specific stuff in the talk.
Slide 3
Slide 3 text
WHO ARE WE?
Full stack developers*
Server
Mobile / SDKs for
different platforms
Infrastructure / AWS
* - sorry for a buzz-word :)
Slide 4
Slide 4 text
WHAT IS NAVCLOUD?
A cloud-based storage service that lets users
seamlessly synchronize trip information
(destinations, favorite locations, community
points of interest, routes etc.) between devices
and the MyDrive website.
NavCloud aims to be scalable and reactive
while ensuring privacy and security.
Slide 5
Slide 5 text
DEVELOPMENT STACK
Scala
Akka
Spray
RabbitMQ
Riak
AWS
Slide 6
Slide 6 text
SDKS
Java stateless SDK
Encryption/Decryption
Android and iOS stateful SDK
Seamlessly working in offline mode
Re-establishing push notification channel if
connection drops
Refreshing session upon token expiration
Resumable and bandwidth optimized
download/upload for large contents
Slide 7
Slide 7 text
CHARACTERISTICS
Devices are not always available
Edit/View should work in offline mode: No
Strong Consistency
Data should converge to a correct
eventual state
Order is not guaranteed
Bandwidth is limited: Only changes should be
transmitted
Slide 8
Slide 8 text
WHAT IS CRDT?
DT
Data Type
“ Bad programmers worry about the
code.
Good programmers worry about
data structures and their
relationships ”
Slide 9
Slide 9 text
WHAT IS CRDT?
R
Replicated
CRDTs are a family of data structures
designed to be distributed
Slide 10
Slide 10 text
WHAT IS CRDT?
C
Conflict Free
Resolving conflicts is done automatically
Slide 11
Slide 11 text
WHAT DOES IT BRING IN PRACTICE?
local updates without needing remote
synchronization
local merge upon receiving data from other
nodes
guaranteeing that all local merges converge
Slide 12
Slide 12 text
HOW?
MERGE
Slide 13
Slide 13 text
WHAT IS MERGE?
Binary operation on two CRDTs
Commutative: x • y = y • x
Associative: ( x • y ) • z = x • ( y • z )
Idempotent: x • x = x
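The three laws can be written down as small checks. This is a minimal sketch, assuming a hypothetical `Mergeable` type class and `MergeLaws` helper (neither name comes from the talk); integer max stands in as a sample merge.

```scala
// Hypothetical type class: anything that has a merge operation.
trait Mergeable[A] {
  def merge(x: A, y: A): A
}

object MergeLaws {
  // Integer max is a simple merge that satisfies all three laws.
  val maxMerge: Mergeable[Int] = new Mergeable[Int] {
    def merge(x: Int, y: Int): Int = math.max(x, y)
  }

  // x • y = y • x
  def commutative[A](m: Mergeable[A])(x: A, y: A): Boolean =
    m.merge(x, y) == m.merge(y, x)

  // (x • y) • z = x • (y • z)
  def associative[A](m: Mergeable[A])(x: A, y: A, z: A): Boolean =
    m.merge(m.merge(x, y), z) == m.merge(x, m.merge(y, z))

  // x • x = x
  def idempotent[A](m: Mergeable[A])(x: A): Boolean =
    m.merge(x, x) == x
}
```

Checks like these only sample the laws at the given values; a property-based testing library would be the usual way to exercise them over many inputs.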
Slide 14
Slide 14 text
HOW DOES IT HELP?
In Distributed Systems:
Order is not guaranteed: No problem:
Merge is Commutative and Associative
Events can be delivered more than one
time: No problem: Merge is Idempotent
Slide 15
Slide 15 text
EXAMPLE
G-COUNTER
Slide 16
Slide 16 text
G-COUNTER
Each node has a counter
Each node should only increase its own
counter
G-Counter Data Type: An array of counters
where each element belongs to a node
Slide 17
Slide 17 text
G-COUNTER
Machine A: A:6 B:0 C:0
Machine B: A:0 B:3 C:0
Machine C: A:0 B:0 C:9
Merge: Max on corresponding elements: A:6 B:3
C:9
Total value: Sum of all elements: 6 + 3 + 9 = 18
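The example above can be sketched in Scala as follows. This is an illustration under assumptions, not the NavCloud code: the `GCounter` class, the use of a `Map` keyed by node name instead of an array, and `Long` counts are all choices made for this sketch.

```scala
// State-based G-Counter: one non-negative count per node id.
final case class GCounter(counts: Map[String, Long] = Map.empty) {

  // A node may only increase its own entry.
  def increment(node: String, by: Long = 1L): GCounter = {
    require(by >= 0, "a G-Counter can only grow")
    GCounter(counts.updated(node, counts.getOrElse(node, 0L) + by))
  }

  // Merge: max on corresponding elements, over the union of node ids.
  def merge(other: GCounter): GCounter =
    GCounter((counts.keySet ++ other.counts.keySet).map { node =>
      node -> math.max(counts.getOrElse(node, 0L), other.counts.getOrElse(node, 0L))
    }.toMap)

  // Total value: sum of all per-node counts.
  def value: Long = counts.values.sum
}

object GCounterExample {
  val a = GCounter().increment("A", 6)
  val b = GCounter().increment("B", 3)
  val c = GCounter().increment("C", 9)
  val merged = a.merge(b).merge(c) // A:6 B:3 C:9, value 18
}
```

Because the merge is a per-element max, merging the same state twice or in a different order gives the same result.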
Slide 18
Slide 18 text
MAX FUNCTION
Binary operation on two CRDTs
Commutative: x max y = y max x
Associative: ( x max y ) max z = x max ( y max z )
Idempotent: x max x = x
Slide 19
Slide 19 text
EXAMPLE
G-SET
Slide 20
Slide 20 text
G-SET
Each node has a set
Each node should only add elements to its own set
G-Set Data Type: An array of sets where each
set belongs to a node
Slide 21
Slide 21 text
G-SET
Machine A: A:{x, y} B:{} C:{}
Machine B: A:{} B:{z} C:{}
Machine C: A:{} B:{} C:{a, b, c}
Merge: Union on corresponding sets: A:{x, y} B:{z}
C:{a, b, c}
Total value: Union of all sets: {x, y, z, a, b, c}
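A minimal Scala sketch of this example, following the per-node layout used on the slide. The `GSet` class and its method names are assumptions for illustration; the more common formulation of a G-Set is a single set merged by union, which gives the same total value.

```scala
// G-Set as described above: one grow-only set per node id.
final case class GSet[A](perNode: Map[String, Set[A]] = Map.empty[String, Set[A]]) {

  // A node only adds elements to its own set; nothing is ever removed.
  def add(node: String, element: A): GSet[A] =
    GSet(perNode.updated(node, perNode.getOrElse(node, Set.empty[A]) + element))

  // Merge: union on corresponding sets.
  def merge(other: GSet[A]): GSet[A] =
    GSet((perNode.keySet ++ other.perNode.keySet).map { node =>
      node -> (perNode.getOrElse(node, Set.empty[A]) ++ other.perNode.getOrElse(node, Set.empty[A]))
    }.toMap)

  // Total value: union of all per-node sets.
  def value: Set[A] = perNode.values.flatten.toSet
}

object GSetExample {
  val a = GSet[String]().add("A", "x").add("A", "y")
  val b = GSet[String]().add("B", "z")
  val c = GSet[String]().add("C", "a").add("C", "b").add("C", "c")
  val merged = a.merge(b).merge(c)
}
```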
Slide 22
Slide 22 text
UNION FUNCTION
Binary operation on two CRDTs
Commutative: x ∪ y = y ∪ x
Associative: ( x ∪ y ) ∪ z = x ∪ ( y ∪ z )
Idempotent: x ∪ x = x
Slide 23
Slide 23 text
HOW DID CRDT HELP IN NAVCLOUD?
Slide 24
Slide 24 text
SYNCHRONIZING FAVORITES
SET OF FAVORITES
Name
Latitude/Longitude
...
Slide 25
Slide 25 text
SYNCHRONIZING FAVORITES
USE CASES
Slide 26
Slide 26 text
SYNCHRONIZING FAVORITES
USE CASES
Users can add, delete or modify
Replicas are spread over multiple devices:
Client devices might not be connected (yet)
Modifications have to be done without
synchronization with remote replicas
Slide 27
Slide 27 text
SYNCHRONIZING FAVORITES
NAIVE APPROACH AND PROBLEMS
Whenever a client connects to the server,
its local state is sent to the server
Synchronization is done on the server by
using a Last Write Wins strategy
This can result in inconsistencies, due to:
Time when updates are sent to the server
differs from the update time
Network latency
Unreliable clocks on the server
Slide 28
Slide 28 text
SYNCHRONIZING FAVORITES
CRDTS MATCH VERY WELL WITH OUR SITUATION: CLIENTS
Synchronization can be done locally:
Changes are made instantly
No connection is needed for proper
synchronization
Synchronization is decentralized
Order is not important
Latency, failed requests etc. have no effect
Implementation becomes easier, because it
is contained within a single component
Slide 29
Slide 29 text
SYNCHRONIZING FAVORITES
CRDTS MATCH VERY WELL WITH OUR SITUATION: SERVER
We use CRDTs everywhere, from the clients
to even the individual database nodes
We use Riak, which supports CRDTs by
storing siblings and letting users resolve
them themselves
This means that our database is fully
partition tolerant
Slide 30
Slide 30 text
CRDT
WHICH ONE: 2P-SET
2 Sets:
Add Set
Remove Set: Also known as tombstone set
Merge: Take the union of the add-sets and
remove-sets
Lookup: Contains an element if it is in the
add-set and not in the remove-set
Doesn't work for us:
1. Once removed you cannot add again
2. Mutating values (updates) is not possible
Slide 31
Slide 31 text
CRDT
WHICH ONE: 2P-SET
             A                  B
Add-Set      {"cat", "dog"}     {"cat", "ape"}
Remove-Set   {"cat"}            {}
Merge        Add-Set: {"cat", "dog", "ape"}
             Remove-Set: {"cat"}
Lookup       {"dog", "ape"}
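The table above can be reproduced with a short Scala sketch. The `TwoPSet` class and its method names are assumptions made for illustration.

```scala
// 2P-Set: an add-set plus a remove-set (tombstones).
final case class TwoPSet[A](added: Set[A] = Set.empty[A], removed: Set[A] = Set.empty[A]) {

  def add(e: A): TwoPSet[A] = copy(added = added + e)

  // Removal records a tombstone; tombstones are never deleted.
  def remove(e: A): TwoPSet[A] = copy(removed = removed + e)

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(other: TwoPSet[A]): TwoPSet[A] =
    TwoPSet(added ++ other.added, removed ++ other.removed)

  // Lookup: in the add-set and not in the remove-set.
  def lookup: Set[A] = added -- removed
}

object TwoPSetExample {
  val a = TwoPSet(Set("cat", "dog"), Set("cat"))
  val b = TwoPSet(Set("cat", "ape"), Set.empty[String])
  val merged = a.merge(b)
  // Re-adding "cat" changes nothing visible: the tombstone still wins.
  val readded = merged.add("cat")
}
```

The `readded` value shows the first limitation listed above: once an element is in the remove-set, adding it again has no effect on lookup.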
Slide 32
Slide 32 text
CRDT
WHICH ONE: LWW-ELEMENT-SET
Attaches a timestamp to each element
You can add again by adding the element with
a higher timestamp than the one in the
remove-set
Merge: Take the union of the add-sets and
remove-sets
Lookup: Contains the element if it is in add-
set and not in remove-set with a higher
timestamp
Still doesn't work for us, because mutating is not possible
Slide 33
Slide 33 text
CRDT
WHICH ONE: LWW-ELEMENT-SET
             A                          B
Add-Set      {(1,"cat"), (1,"dog")}     {(5,"cat"), (1,"ape")}
Remove-Set   {(3,"cat")}                {(1,"cat")}
Merge        Add-Set: {(1,"cat"), (5,"cat"), (1,"dog"), (1,"ape")}
             Remove-Set: {(1,"cat"), (3,"cat")}
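A minimal Scala sketch of the LWW-Element-Set for this example. The `LwwSet` class, the `Long` timestamps and the method names are assumptions for illustration. One further assumption: when the newest add and the newest remove carry the same timestamp, the add wins, since lookup only hides an element when a remove has a strictly higher timestamp.

```scala
// LWW-Element-Set: add-set and remove-set of (timestamp, element) pairs.
final case class LwwSet[A](
    added: Set[(Long, A)] = Set.empty[(Long, A)],
    removed: Set[(Long, A)] = Set.empty[(Long, A)]) {

  def add(ts: Long, e: A): LwwSet[A] = copy(added = added + ((ts, e)))
  def remove(ts: Long, e: A): LwwSet[A] = copy(removed = removed + ((ts, e)))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(other: LwwSet[A]): LwwSet[A] =
    LwwSet(added ++ other.added, removed ++ other.removed)

  // Highest timestamp recorded for an element, if any.
  private def latest(entries: Set[(Long, A)], e: A): Option[Long] =
    entries.filter(_._2 == e).map(_._1).reduceOption(_ max _)

  // Present if added and no remove carries a higher timestamp.
  def contains(e: A): Boolean =
    latest(added, e).exists(addTs => latest(removed, e).forall(_ <= addTs))

  def lookup: Set[A] = added.map(_._2).filter(contains)
}

object LwwSetExample {
  val a = LwwSet(Set((1L, "cat"), (1L, "dog")), Set((3L, "cat")))
  val b = LwwSet(Set((5L, "cat"), (1L, "ape")), Set((1L, "cat")))
  val merged = a.merge(b)
}
```

Under these assumptions, looking up the merged set gives "cat", "dog" and "ape": the add of "cat" at timestamp 5 is newer than its latest removal at timestamp 3.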
Slide 34
Slide 34 text
CRDT
WHICH ONE: OR-SET
You can add again
Store elements with a unique identifier
Deleting an element adds every (element,id) pair for
that element in the add-set to the remove-set
Merge: Take the union of the add-sets and
remove-sets
Lookup: Contains the element if there is an
element in the add-set with an identifier that is
not in the remove-set
Doesn't work for us, because it doesn't support
updates
Slide 35
Slide 35 text
CRDT
WHICH ONE: OR-SET
             A                              B
Add-Set      {(#a,"cat"), (#b,"dog")}       {(#c,"cat"), (#d,"ape")}
Remove-Set   {(#a,"cat")}                   {(#a,"cat")}
Merge        Add-Set: {(#a,"cat"), (#c,"cat"), (#b,"dog"), (#d,"ape")}
             Remove-Set: {(#a,"cat")}
Lookup       {"cat", "dog", "ape"}
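The OR-Set example can be sketched in Scala as below. The `ORSet` class is an assumption for illustration, and the unique identifiers are passed in by the caller as plain strings; a real implementation would generate them.

```scala
// OR-Set: add-set and remove-set of (id, element) pairs.
final case class ORSet[A](
    added: Set[(String, A)] = Set.empty[(String, A)],
    removed: Set[(String, A)] = Set.empty[(String, A)]) {

  // Every add carries its own unique id, so an element can be added again.
  def add(id: String, e: A): ORSet[A] = copy(added = added + ((id, e)))

  // Removing tombstones every (id, element) pair seen locally so far.
  def remove(e: A): ORSet[A] = copy(removed = removed ++ added.filter(_._2 == e))

  // Merge: union of the add-sets and union of the remove-sets.
  def merge(other: ORSet[A]): ORSet[A] =
    ORSet(added ++ other.added, removed ++ other.removed)

  // Lookup: elements with at least one id that has not been removed.
  def lookup: Set[A] = (added -- removed).map(_._2)
}

object ORSetExample {
  val a = ORSet[String]().add("#a", "cat").add("#b", "dog").remove("cat")
  val b = ORSet(Set(("#c", "cat"), ("#d", "ape")), Set(("#a", "cat")))
  val merged = a.merge(b)
}
```

"cat" survives the merge because the pair added under #c was never observed by the removal that tombstoned #a.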
Slide 36
Slide 36 text
CRDT
OUR-SET
Combination of all the sets
Store elements with a unique identifier
This identifier is actually used to identify an
element
Element can be changed if identifier
remains the same
Updates are possible!
Store elements with a timestamp
Is updated on any change
Slide 37
Slide 37 text
CRDT
OUR-SET
Single Set
No separate add-set and remove-set
Replaced by a removed flag
Set can only contain one element with a
particular id
Element with highest timestamp wins
Merge: Take union of the two sets and for every
element with the same identifier take only the
highest one
Lookup: Contains the element if there is an
element with the same id and the removed flag is not set
Slide 38
Slide 38 text
CRDT
OUR-SET
             A                            B
Set          {(#a,1,"cat",removed),       {(#a,5,"tiger"),
              (#b,2,"dog",removed)}        (#c,1,"ape"),
                                           (#b,1,"dog")}
Merge        Set: {(#a,5,"tiger"), (#b,2,"dog",removed), (#c,1,"ape")}
Lookup       {"tiger", "ape"}
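A minimal Scala sketch of the merge and lookup rules for this set. `Entry`, `OurSet` and their methods are hypothetical names, not the NavCloud classes. The tie-break on equal timestamps (a removal wins) is an assumption of this sketch; entries that tie on both timestamp and removed flag are assumed to be identical, which the compare function over all fields described under IMPLEMENTATION would settle properly.

```scala
// One entry per id; the removed flag replaces a separate remove-set.
final case class Entry[A](id: String, timestamp: Long, value: A, removed: Boolean = false)

final case class OurSet[A](entries: Map[String, Entry[A]] = Map.empty[String, Entry[A]]) {

  // Adding, updating and removing are all "merge in a newer entry".
  def put(e: Entry[A]): OurSet[A] = merge(OurSet(Map(e.id -> e)))

  // Highest timestamp wins; on a tie a removal wins (assumption).
  private def pick(x: Entry[A], y: Entry[A]): Entry[A] =
    if (x.timestamp != y.timestamp) {
      if (x.timestamp > y.timestamp) x else y
    } else if (x.removed != y.removed) {
      if (x.removed) x else y
    } else x

  // Merge: union of both sets, keeping one entry per id.
  def merge(other: OurSet[A]): OurSet[A] =
    OurSet(other.entries.foldLeft(entries) { case (acc, (id, e)) =>
      acc.updated(id, acc.get(id).fold(e)(pick(_, e)))
    })

  // Lookup: values of the entries whose removed flag is not set.
  def lookup: Set[A] = entries.values.filterNot(_.removed).map(_.value).toSet
}

object OurSetExample {
  val a = OurSet[String]()
    .put(Entry("#a", 1L, "cat", removed = true))
    .put(Entry("#b", 2L, "dog", removed = true))
  val b = OurSet[String]()
    .put(Entry("#a", 5L, "tiger"))
    .put(Entry("#c", 1L, "ape"))
    .put(Entry("#b", 1L, "dog"))
  val merged = a.merge(b)
}
```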
Slide 39
Slide 39 text
IMPLEMENTATION
CRDT MODEL: FAVORITE
ID
to uniquely identify a favorite
Timestamp
to indicate when the last change was made
Removed Flag
to indicate that the favorite has been
removed
Name
Latitude/Longitude
...
Slide 40
Slide 40 text
IMPLEMENTATION
METHODS
Add a compare function, which compares all
the fields in order of priority:
Timestamp
Removed flag
...
Add an equals and hash function
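One way to sketch such a compare function in Scala is shown below. The field names, the `Double` coordinates and the `FavoriteOrdering` object are assumptions for illustration; the talk only lists the fields and the first two priorities. A case class supplies the equals and hash functions automatically.

```scala
// Illustrative model; equals and hashCode come from the case class.
final case class Favorite(
    id: String,
    timestamp: Long,
    removed: Boolean,
    name: String,
    latitude: Double,
    longitude: Double)

object FavoriteOrdering {
  // Compare fields in order of priority: timestamp, removed flag,
  // then the remaining fields as deterministic tie-breakers.
  val byPriority: Ordering[Favorite] = new Ordering[Favorite] {
    def compare(x: Favorite, y: Favorite): Int =
      Seq(
        java.lang.Long.compare(x.timestamp, y.timestamp),
        java.lang.Boolean.compare(x.removed, y.removed),
        x.name.compareTo(y.name),
        java.lang.Double.compare(x.latitude, y.latitude),
        java.lang.Double.compare(x.longitude, y.longitude)
      ).find(_ != 0).getOrElse(0)
  }

  // Of two versions of the same favorite, the greater one wins.
  def winner(x: Favorite, y: Favorite): Favorite = {
    require(x.id == y.id, "only versions of the same favorite can be compared")
    byPriority.max(x, y)
  }
}
```

Comparing every field means two different versions never compare as equal, so every replica picks the same winner regardless of the order in which it sees them.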
Slide 41
Slide 41 text
IMPLEMENTATION
USING THE CRDT
Use the same algorithm everywhere
As simple as calling the merge function
Slide 42
Slide 42 text
IMPLEMENTATION
USING THE CRDT
// Synchronize doesn't return anything because the client is already synced.
// This is purely for the server and database.
def synchronize(fromClient: CRDTSet, database: CRDTcomponent): Unit = {
  val changedSet = fromClient
  val currentSet = database.crdtset
  val newSet = currentSet.merge(changedSet)
  database.push(newSet) // This is fire and forget
}
Slide 43
Slide 43 text
CONSIDERATIONS & LIMITATIONS
Slide 44
Slide 44 text
"WHAT ABOUT GARBAGE?"
CRDTs tend to grow because of tombstones
Deleted Favorite in the Set == Tombstone
A potentially unbounded growth.
Case: MyDrive user with ~3000 deleted
favorites and 5 non-deleted ones. -> 1 MB
Favorites.json
Slide 45
Slide 45 text
"WHAT ABOUT GARBAGE?"
Solution #1: Prune deleted favorites
But when?
Requirement: all nodes holding a Favorites set should
have seen a deleted element before it can be pruned.
Otherwise deleted elements can be resurrected.
Slide 46
Slide 46 text
"WHAT ABOUT GARBAGE?"
Client-awareness: capturing a timestamp of the last sync
between a client and the service.
if (clients.forall(_.lastSyncTimestamp > deletedFavorite.lastUpdatedTimestamp)) {
  favorites.drop(deletedFavorite)
}
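A self-contained sketch of this pruning rule, under stated assumptions: `Client`, `StoredFavorite` and `Pruning` are hypothetical names, and favorites are held in a plain `Set` rather than the real Favorites CRDT.

```scala
// Hypothetical minimal models for this sketch.
final case class Client(id: String, lastSyncTimestamp: Long)
final case class StoredFavorite(id: String, lastUpdatedTimestamp: Long, removed: Boolean)

object Pruning {
  // A tombstone may be dropped only when every known client has synced
  // after the deletion, so that no replica can resurrect the element.
  // Note: with no known clients, forall is true and all tombstones go.
  def prunable(f: StoredFavorite, clients: Seq[Client]): Boolean =
    f.removed && clients.forall(_.lastSyncTimestamp > f.lastUpdatedTimestamp)

  def prune(favorites: Set[StoredFavorite], clients: Seq[Client]): Set[StoredFavorite] =
    favorites.filterNot(prunable(_, clients))
}

object PruningExample {
  val clients = Seq(Client("phone", 10L), Client("pnd", 4L))
  val favorites = Set(
    StoredFavorite("#a", 3L, removed = true),  // seen by both clients
    StoredFavorite("#b", 5L, removed = true),  // "pnd" has not seen it yet
    StoredFavorite("#c", 1L, removed = false)) // live favorites are kept
  val pruned = Pruning.prune(favorites, clients)
}
```

Here only #a is dropped; #b has to wait until the slowest client has synced past timestamp 5.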
Slide 47
Slide 47 text
"WHAT ABOUT GARBAGE?"
Solution #2: Sending only a diff upon any update.
Client has a set of [A', B', C']; Server has a set of [A'', B''',
C'].
Client modifies and sends [B'']
Before: responding with a full merged set [A'', B''', C'].
We introduced a scoped diff:
Now: responding with a diff set [B'''], because the B'' update
from the client lost to B''' on the server.
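One possible reading of the scoped diff, sketched in Scala: the response only covers the ids the client sent, and only includes an element when the merged version differs from what the client sent. `Versioned`, `ScopedDiff` and the numeric timestamps are assumptions for illustration, not the NavCloud protocol.

```scala
// Hypothetical element: the primes on the slide become timestamps here.
final case class Versioned(id: String, timestamp: Long, value: String)

object ScopedDiff {
  // Server-side merge: per id, keep the version with the higher timestamp.
  def merge(server: Map[String, Versioned], update: Seq[Versioned]): Map[String, Versioned] =
    update.foldLeft(server) { (acc, u) =>
      acc.get(u.id) match {
        case Some(current) if current.timestamp >= u.timestamp => acc
        case _ => acc.updated(u.id, u)
      }
    }

  // Returns the new server state and the scoped diff for the client:
  // only ids the client touched, and only where the client's version lost.
  def respond(
      server: Map[String, Versioned],
      update: Seq[Versioned]): (Map[String, Versioned], Seq[Versioned]) = {
    val merged = merge(server, update)
    val diff = update.flatMap(u => merged.get(u.id).filter(_ != u))
    (merged, diff)
  }
}

object ScopedDiffExample {
  val server = Map(
    "A" -> Versioned("A", 2L, "A''"),
    "B" -> Versioned("B", 3L, "B'''"),
    "C" -> Versioned("C", 1L, "C'"))
  // The client sends B'' and loses to B''' on the server.
  val lost = ScopedDiff.respond(server, Seq(Versioned("B", 2L, "B''")))
  // A newer client update wins, so there is nothing to send back.
  val won = ScopedDiff.respond(server, Seq(Versioned("B", 4L, "B''''")))
}
```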
Slide 48
Slide 48 text
TROUBLE WITH TIME
There is no such thing as reliable time.
(c) Jonas Bonér, "Life Beyond the Illusion of Present"
Important: causality and event ordering.
“ Tracking time is actually tracking
causality. ”
Slide 49
Slide 49 text
TROUBLE WITH TIME
A time that is just good enough.
Ordering updates between different nodes:
If GPS clock is available -> use it (PND case).
Prefer the server time to a client local time.
WARN: conflicts may happen if two or more devices are
modifying the same Favorite element concurrently.
Slide 50
Slide 50 text
TROUBLE WITH TIME
Ordering updates within a node boundary:
Timestamp field as a logical clock.
Timestamp should always grow monotonically.
"+1 Strategy"
def getFavoriteTimestamp(favorite: Favorite): Long = {
  Math.max(client.retrieveServerTimestamp(), favorite.lastModified + 1)
}
Slide 51
Slide 51 text
ONE 'MERGE' TO RULE THEM ALL
Client and server should behave the same way
when merging Favorites CRDT states.
==
When given the same input,
their merge functions should emit the same
results.
WARN: divergence can lead to endless
synchronisation loops!
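One way to check that two merge implementations agree is to run both against a shared set of test vectors. The sketch below assumes hypothetical names (`MergeCase`, `MergeCompat`) and uses a per-node max merge over a `Map` as the implementation under test; it is an illustration, not the kit used for NavCloud.

```scala
// A shared test vector: two inputs and the state every implementation
// must produce, whichever side is passed first.
final case class MergeCase[S](name: String, left: S, right: S, expected: S)

object MergeCompat {
  // Names of the cases an implementation fails, in input order.
  def failures[S](merge: (S, S) => S, cases: Seq[MergeCase[S]]): Seq[String] =
    cases.filter { c =>
      merge(c.left, c.right) != c.expected || merge(c.right, c.left) != c.expected
    }.map(_.name)
}

object MergeCompatExample {
  type Counts = Map[String, Long]

  val cases: Seq[MergeCase[Counts]] = Seq(
    MergeCase("disjoint", Map("A" -> 6L), Map("B" -> 3L), Map("A" -> 6L, "B" -> 3L)),
    MergeCase("overlap", Map("A" -> 1L), Map("A" -> 4L), Map("A" -> 4L)))

  // A correct implementation: per-node max.
  val maxMerge: (Counts, Counts) => Counts = (x, y) =>
    (x.keySet ++ y.keySet).map(k => k -> math.max(x.getOrElse(k, 0L), y.getOrElse(k, 0L))).toMap

  // A deliberately broken one that ignores the other side.
  val leftWins: (Counts, Counts) => Counts = (x, _) => x
}
```

Running every client and server implementation through the same vectors surfaces a divergent merge before it can cause a synchronisation loop in production.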
Slide 52
Slide 52 text
ONE 'MERGE' TO RULE THEM ALL
Sharing common CRDT-related code (Classes,
merge/diff/equals/compare logic) FTW.
Case #1: Scala.JS client with Web Server in Scala.
"Towards Browser and Server Utopia with Scala.JS", Richard Dallaway
Case #2: Using a TCK library to verify client
compatibility.
Slide 53
Slide 53 text
RIAK & CRDTS
Data Agnostic vs Data Awareness
Counters
Sets
Maps
Flags *
Registers *
* - Embedded within a Map only
Slide 54
Slide 54 text
RIAK & CRDTS
Pros
Simplicity. No 'Read -> Merge -> Write' code
is needed on the server.
Composability: Most of the data can be
modelled by combining supported primitive
types.
Proven and tested*.
* - Basho is serious about testing their stuff:
"Distributed data structures with Coq", Christopher Meiklejohn.
Slide 55
Slide 55 text
RIAK & CRDTS
Cons
No fine-grained merge: lack of merge
strategy control on the server.
Client complexity: clients have to carry a
Data Type context (à la 'causal context' with
Vector Clocks).
Riak 2.0+ only. *
* - For those who are still on Riak 1.4. ^_^
Slide 56
Slide 56 text
CONCLUSIONS
Academia sometimes is not as scary as it seems to
pragmatic devs.
Look for the best & simplest solutions.
Understand your solution limitations.
Analyse and monitor real usage.
Always look for ways to tune & improve your solutions.
Slide 57
Slide 57 text
USEFUL REFERENCES
"CRDTs for fun and eventual profit", Noel Welsh, 2013.
"Readings in conflict-free replicated data types",
Christopher Meiklejohn
"A comprehensive study of Convergent and
Commutative Replicated Data Types", Marc Shapiro,
Nuno Preguiça, Carlos Baquero, Marek Zawirski, 2011
"Lasp: A language for distributed, coordination-free
programming", Meiklejohn & Van Roy, 2015
Slide 58
Slide 58 text
Image credit: Dex Media
Slide 59
Slide 59 text
WE ARE HIRING!
Interested in hacking on this stuff?
http://www.tomtom.jobs/