Data Replication
● Want to store 2 copies of data for
redundancy and performance reasons.
● Which nodes do we use?
● Random replication: pick 2 nodes at random,
either truly at random or deterministically (pseudorandomly).
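"Deterministically random" placement can be sketched by seeding a PRNG from a hash of the chunk's key, so any client recomputes the same 2 nodes without a lookup table. This is a minimal illustration under that assumption; `place_chunk` and the chunk id format are hypothetical, not any particular system's API.

```python
import hashlib
import random

NODES = ["A", "B", "C", "D", "E", "F"]

def place_chunk(chunk_id, nodes, r=2):
    """Choose r distinct replica nodes for a chunk (hypothetical helper).

    The PRNG is seeded from a SHA-256 hash of the chunk id, so placements
    look random across chunks but are reproducible for any single chunk.
    """
    digest = hashlib.sha256(chunk_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(nodes, r))

print(place_chunk("chunk-42", NODES))  # same pair on every run and every client
```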
Random Assignment
{A B} {A C} {A D} {A E} {A F} {B C} …
● N = 6, R = 2
● 6 choose 2 = 15 combinations.
● Failure of any 2 nodes results in data loss.
● 1/15 data on each set.
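Why random assignment makes every 2-node failure fatal: with enough chunks, random placement eventually uses all 6 choose 2 pairs as copysets. A small simulation, assuming 10,000 chunks and a fixed seed (both arbitrary choices for illustration):

```python
import itertools
import random

NODES = ["A", "B", "C", "D", "E", "F"]
R = 2

# Every possible replica pair: 6 choose 2.
all_pairs = set(itertools.combinations(NODES, R))
print(len(all_pairs))  # 15

# Place many chunks with random replication; each chunk lands on one pair.
rng = random.Random(0)
used_pairs = {tuple(sorted(rng.sample(NODES, R))) for _ in range(10_000)}

# With this many chunks every pair holds some data, so losing ANY two
# nodes loses some chunk, and each pair holds roughly 1/15 of the data.
print(used_pairs == all_pairs)
```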
Data Loss
Facebook (Riak) Replication
{A B}
{B C}
{C D}
{D E}
{E F}
{F A}
● ⅙ data on each set.
● Two random nodes fail: p(loss) = 6/15 = 40%
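The 40% can be checked by brute force: enumerate every way R = 2 of the 6 nodes can fail and count the failures that take out a whole copyset. A sketch assuming exactly R simultaneous, uniformly random failures; `p_loss` is a hypothetical helper name.

```python
import itertools
from fractions import Fraction

NODES = ["A", "B", "C", "D", "E", "F"]
R = 2

def p_loss(copysets, nodes, r):
    """Probability that r simultaneous random node failures wipe out a copyset."""
    sets = {frozenset(c) for c in copysets}
    failures = list(itertools.combinations(nodes, r))
    fatal = sum(1 for f in failures if frozenset(f) in sets)
    return Fraction(fatal, len(failures))

ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]
print(p_loss(ring, NODES, R))  # 2/5, i.e. 6/15 = 40%
```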
Simple Assignment
{A B}
{C D}
{E F}
● p(loss) = 20%
● ⅓ data on each set
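All three p(loss) figures follow one formula. Assuming exactly R random nodes fail at once and the layout has K distinct copysets, a failure is fatal only when the failed nodes are exactly one of those copysets:

```latex
p(\text{loss}) = \frac{K}{\binom{N}{R}}
\qquad
\text{Simple: } \frac{3}{\binom{6}{2}} = \frac{3}{15} = 20\%,
\quad
\text{Riak ring: } \frac{6}{15} = 40\%,
\quad
\text{Random: } \frac{15}{15} = 100\%
```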
Terminology
● N = number of nodes
● R = replication factor (# of copies of data)
● S = scatter width
What is scatter width?
S = 4
{A B C}
{B C D}
{C D E}
{D E F}
{E F A}
{F A B}
● N = 6, R = 3
● A restores from B, C, E, F
● Each set = ⅙ ≈ 17% of data
● p(loss) = 6 / (6 choose 3) = 6/20 = 30%
S = 2
{A B C}
{D E F}
● N = 6, R = 3
● A restores from B, C
● Each set = 50% of data
● p(loss) = 2 / (6 choose 3) = 2/20 = 10%
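Scatter width, as the two layouts use it, is the number of distinct other nodes a node shares a copyset with (the nodes it can restore from). A minimal sketch; `scatter_widths` is a hypothetical helper, and each copyset is written as a string of single-letter node names.

```python
from collections import defaultdict

def scatter_widths(copysets):
    """Map each node to the number of distinct peers it shares a copyset with."""
    peers = defaultdict(set)
    for cs in copysets:          # each cs is an iterable of node names
        for node in cs:
            peers[node].update(n for n in cs if n != node)
    return {node: len(p) for node, p in peers.items()}

s4 = ["ABC", "BCD", "CDE", "DEF", "EFA", "FAB"]
s2 = ["ABC", "DEF"]
print(scatter_widths(s4)["A"])  # 4: A restores from B, C, E, F
print(scatter_widths(s2)["A"])  # 2: A restores from B, C
```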
Random Assignment
{A B} {A C} {A D} {A E} {A F} {B C} …
● p(loss) = 100%
● 1/15 data on each set
● Scatter width = 5
● E(loss) = 100% * 1 / 15 = 6.7%
Facebook (Riak) Replication
{A B}
{B C}
{C D}
{D E}
{E F}
{F A}
● p(loss) = 40%
● ⅙ data on each set
● Scatter width = 2
● E(loss) = 40% * ⅙ = 6.7%
Simple Assignment
{A B}
{C D}
{E F}
● p(loss) = 20%
● ⅓ data on each set
● Scatter width = 1
● E(loss) = 20% * ⅓ = 6.7%
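Why all three layouts land on 6.7%: assuming exactly R random nodes fail and the data is spread evenly over K distinct copysets (so each holds 1/K of it), K cancels out of the expected loss:

```latex
E(\text{loss}) = p(\text{loss}) \cdot \frac{1}{K}
= \frac{K}{\binom{N}{R}} \cdot \frac{1}{K}
= \frac{1}{\binom{N}{R}}
= \frac{1}{\binom{6}{2}} = \frac{1}{15} \approx 6.7\%
```

The layout only changes how that expectation is distributed: rare, large losses (few copysets) versus frequent, small ones (many copysets).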
The Importance of S
● Affects p(loss).
● Affects speed of restoring single node.
● Low S = low p(loss), high damage, slow
restore
● High S = high p(loss), low damage, fast
restore
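The restore-speed side of the tradeoff can be sketched with a deliberately simple model (an assumption, not from the slides): a failed node's data is split evenly over its S peers, and all peers stream their share in parallel at the same rate, so restore time shrinks as 1/S. The 1 TB and 100 MB/s figures are hypothetical.

```python
def restore_seconds(node_data_gb, scatter_width, peer_mb_per_s):
    """Rough time to re-replicate a failed node's data (hypothetical model).

    Assumes the data is split evenly across scatter_width peers and every
    peer copies its share in parallel at peer_mb_per_s.
    """
    share_mb = node_data_gb * 1000 / scatter_width
    return share_mb / peer_mb_per_s

# Hypothetical numbers: 1 TB per node, 100 MB/s per peer.
for s in (1, 2, 5):
    print(f"S={s}: {restore_seconds(1000, s, 100) / 3600:.2f} h")
```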
The Fixed Cost of Failure
● Admitting failure on Twitter has high fixed
cost. Failing for 50% of customers not much
worse than 5%.
● Going to tape has high fixed cost. Restoring
1 TB not much worse than restoring 1 GB.
End Copysets.
Goto Tiered Replication.
Tiered Replication
To construct a copyset:
1. Order nodes from smallest to largest
scatter width.
2. Pick first R nodes.
Repeat until all nodes have SW >= S.
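The greedy loop above can be sketched in Python. One assumption beyond the slide: if the first R nodes would repeat an existing copyset (which happens once scatter widths tie), the sketch falls back to the next R-node combination in sorted order, since a repeated copyset adds no scatter width and could loop forever. `tiered_replication` is a hypothetical name, not the paper's code.

```python
from itertools import combinations

def tiered_replication(nodes, r, s):
    """Greedy copyset construction in the spirit of Tiered Replication (sketch)."""
    if s > len(nodes) - 1:
        raise ValueError("scatter width cannot exceed N - 1")
    peers = {n: set() for n in nodes}
    copysets = []
    while any(len(peers[n]) < s for n in nodes):
        # 1. Order nodes from smallest to largest scatter width (ties by name).
        order = sorted(nodes, key=lambda n: (len(peers[n]), n))
        # 2. Pick the first r nodes; skip candidates that repeat a copyset.
        cs = next((c for c in combinations(order, r)
                   if set(c) not in copysets), None)
        if cs is None:
            raise ValueError("no new copyset available")
        copysets.append(set(cs))
        for n in cs:
            peers[n].update(m for m in cs if m != n)
    return copysets

for cs in tiered_replication(list("ABCDEF"), r=2, s=2):
    print(sorted(cs))
```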
TR With Constraints
To construct a copyset:
1. Order nodes from smallest to largest
scatter width.
2. Pick first R nodes satisfying
constraints.
Repeat until all nodes have SW >= S.
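Constraints slot into the same loop as a predicate on the candidate copyset. A sketch with a hypothetical rack layout (two nodes per rack) and a rack-awareness predicate; `tr_with_constraints`, `allowed`, and `RACK` are illustrative names. Taking the first valid combination in sorted order acts as "first R nodes satisfying constraints" with a little backtracking.

```python
from itertools import combinations

def tr_with_constraints(nodes, r, s, allowed):
    """Greedy copyset construction with a caller-supplied constraint (sketch)."""
    peers = {n: set() for n in nodes}
    copysets = []
    while any(len(peers[n]) < s for n in nodes):
        # 1. Order nodes from smallest to largest scatter width (ties by name).
        order = sorted(nodes, key=lambda n: (len(peers[n]), n))
        # 2. Take the first r nodes, in that order, that satisfy the constraints.
        cs = next((c for c in combinations(order, r)
                   if set(c) not in copysets and allowed(c)), None)
        if cs is None:
            raise ValueError("constraints make scatter width s unreachable")
        copysets.append(set(cs))
        for n in cs:
            peers[n].update(m for m in cs if m != n)
    return copysets

# Hypothetical rack layout: two nodes per rack.
RACK = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3}

def rack_aware(candidate):
    """Accept a copyset only if every replica sits in a different rack."""
    return len({RACK[n] for n in candidate}) == len(candidate)

for cs in tr_with_constraints(list("ABCDEF"), r=2, s=2, allowed=rack_aware):
    print(sorted(cs))
```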
Possible Constraints
● Rack awareness.
● Resource differences in nodes.
● Tiered storage. What is that?
MTTF
Apache Kafka
● High throughput message broker.
● Topics broken into K partitions.
● Each partition handled by
primary/secondaries.
● Classic master/slave replication.
● Consumers subscribe to subset of partitions.
● Trepl (https://pypi.python.org/pypi/trepl)
Chartbeat Pings
● Browser sends beacon to our servers.
● 275,000 / sec into “pings” topic.
● “pings” topic broken into 144 partitions.
● 6 brokers.
● R = 2 (reduced from 3 to cut cost)
● AZ-aware assignment (replicas in different availability zones)
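One way copysets could map onto Kafka: assign each partition's replica list from a fixed set of AZ-spanning broker pairs and emit the JSON shape that `kafka-reassign-partitions.sh --reassignment-json-file` accepts. This is a sketch, not Trepl's API or Chartbeat's actual layout; the broker ids, AZ names, and copysets are hypothetical. Because Kafka treats the first replica as the preferred leader, the sketch flips the order on alternate rounds to balance leadership.

```python
import json

# Hypothetical mapping of 6 broker ids to availability zones.
BROKER_AZ = {1: "us-east-1a", 2: "us-east-1b", 3: "us-east-1a",
             4: "us-east-1b", 5: "us-east-1a", 6: "us-east-1b"}

# Simple (S = 1) copysets with R = 2; each pair spans two AZs.
COPYSETS = [[1, 2], [3, 4], [5, 6]]

def reassignment_plan(topic, num_partitions, copysets):
    """Round-robin partitions over copysets, alternating the preferred leader."""
    k = len(copysets)
    partitions = []
    for p in range(num_partitions):
        cs = copysets[p % k]
        replicas = list(cs) if (p // k) % 2 == 0 else cs[::-1]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}

plan = reassignment_plan("pings", 144, COPYSETS)
print(json.dumps(plan["partitions"][0]))
# {"topic": "pings", "partition": 0, "replicas": [1, 2]}
```

With 144 partitions and 3 copysets, each broker carries 48 partitions and is preferred leader for 24 of them.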
Notes
● Load balancing.
● The copyset problem is NP-hard in general.
● Combinatorial design literature.
● Tradeoffs. Embrace or reduce catastrophe?
Copysets (USENIX ATC '13): https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon
Tiered Replication (USENIX ATC '15): https://www.usenix.org/system/files/conference/atc15/atc15-paper-cidon.pdf