Slide 1

Slide 1 text

Copysets + Tiered Replication - Cidon et al. Wes Chow / CTO, Chartbeat / @weschow

Slide 2

Slide 2 text

Data Replication ● Want to store 2 copies of data for redundancy and performance reasons. ● Which nodes do we use? ● Random replication: pick 2 nodes either truly at random or pseudo-randomly (deterministic, but random-looking).

Slide 3

Slide 3 text

Random Assignment {A B} {A C} {A D} {A E} {A F} {B C} … ● N = 6, R = 2 ● 6 choose 2 = 15 combinations. ● Failure of any 2 nodes results in data loss. ● 1/15 data on each set.

Slide 4

Slide 4 text

Data Loss

Slide 5

Slide 5 text

Facebook (Riak) Replication {A B} {B C} {C D} {D E} {E F} {F A} ● ⅙ data on each set. ● Random two nodes fail, p(loss) = 6/15 = 40%
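The 6/15 figure can be checked by brute force. A minimal sketch, assuming every combination of failed nodes is equally likely and that data is lost when all nodes of some copyset fail together; the function name `p_loss` is illustrative, not from the talk.

```python
from fractions import Fraction
from itertools import combinations

def p_loss(nodes, copysets, failures):
    """Fraction of `failures`-node failure combinations that contain a whole copyset."""
    sets = [frozenset(c) for c in copysets]
    combos = list(combinations(nodes, failures))
    fatal = sum(1 for failed in combos if any(c <= set(failed) for c in sets))
    return Fraction(fatal, len(combos))

nodes = "ABCDEF"
ring = ["AB", "BC", "CD", "DE", "EF", "FA"]  # the copysets on this slide
print(p_loss(nodes, ring, failures=2))       # 2/5, i.e. 40%
```

The same function applies to any other list of copysets over the same nodes.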

Slide 6

Slide 6 text

Simple Assignment {A B} {C D} {E F} ● p(loss) = 20% ● ⅓ data on each set

Slide 7

Slide 7 text

Terminology ● N = number of nodes ● R = replication factor (# of copies of data) ● S = scatter width What is scatter width?

Slide 8

Slide 8 text

S = 4 {A B C} {B C D} {C D E} {D E F} {E F A} {F A B} ● A restores from B, C, E, F ● Each set = 17% of data ● P(F) = 6 / (6 choose 3) = 6/20 = 30%
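Scatter width can be computed directly from the copysets. A minimal sketch, reading this slide's copysets as {A B C} {B C D} {C D E} {D E F} {E F A} {F A B} and taking a node's scatter width to be the number of distinct other nodes it shares a copyset with; `scatter_widths` is an illustrative name.

```python
def scatter_widths(copysets):
    """Map each node to the number of distinct nodes it shares a copyset with."""
    partners = {}
    for cs in copysets:
        for node in cs:
            partners.setdefault(node, set()).update(n for n in cs if n != node)
    return {node: len(p) for node, p in partners.items()}

copysets = ["ABC", "BCD", "CDE", "DEF", "EFA", "FAB"]
widths = scatter_widths(copysets)
print(widths["A"])  # 4: A shares copysets with B, C, E, F
```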

Slide 9

Slide 9 text

S = 2 {A B C} {D E F} ● A restores from B, C ● Each set = 50% of data ● P(F) = 2 / (6 choose 3) = 2/20 = 10%

Slide 10

Slide 10 text

Random Assignment {A B} {A C} {A D} {A E} {A F} {B C} … ● p(loss) = 100% ● 1/15 data on each set ● Scatter width = 5 ● E(loss) = 100% * 1 / 15 = 6.7%

Slide 11

Slide 11 text

Facebook (Riak) Replication {A B} {B C} {C D} {D E} {E F} {F A} ● p(loss) = 40% ● ⅙ data on each set ● Scatter width = 2 ● E(loss) = 40% * ⅙ = 6.7%

Slide 12

Slide 12 text

Simple Assignment {A B} {C D} {E F} ● p(loss) = 20% ● ⅓ data on each set ● Scatter width = 1 ● E(loss) = 20% * ⅓ = 6.7%
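All three schemes give the same expected loss, and the arithmetic shows why. A minimal sketch, assuming N = 6, R = 2, data spread evenly over the copysets, and exactly 2 nodes failing (so at most one copyset is lost); the scheme labels are shorthand for the slides above.

```python
from fractions import Fraction
from math import comb

N, R = 6, 2
# Number of copysets in each scheme from the preceding slides.
schemes = {"random": 15, "ring": 6, "simple": 3}

expected = {}
for name, k in schemes.items():
    p = Fraction(k, comb(N, R))    # chance the 2 failed nodes form a copyset
    share = Fraction(1, k)         # fraction of data held by one copyset
    expected[name] = p * share     # k cancels, leaving 1 / comb(N, R)
    print(f"{name}: p(loss)={float(p):.0%}, E(loss)={float(expected[name]):.1%}")
```

The number of copysets cancels out, so the choice of scheme only trades frequency of loss against size of loss.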

Slide 13

Slide 13 text

The Importance of S ● Affects p(loss). ● Affects speed of restoring single node. ● Low S = low p(loss), high damage, slow restore ● High S = high p(loss), low damage, fast restore

Slide 14

Slide 14 text

The Fixed Cost of Failure ● Admitting failure on Twitter has high fixed cost. Failing for 50% of customers not much worse than 5%. ● Going to tape has high fixed cost. Restoring 1 TB not much worse than restoring 1 GB.

Slide 15

Slide 15 text

End Copysets. Goto Tiered Replication.

Slide 16

Slide 16 text

Tiered Replication To construct a copyset: 1. Order nodes from smallest to largest scatter width. 2. Pick first R nodes. Repeat until all nodes have SW >= S.
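The two steps can be sketched as a greedy loop. This is a simplified illustration of the slide, not the paper's exact algorithm: it assumes R >= 2, and it skips a node that already shares a copyset with one already picked, an added rule so that each round raises someone's scatter width. The name `tiered_replication` is illustrative.

```python
def tiered_replication(nodes, R, S):
    """Greedy sketch: add copysets until every node has scatter width >= S."""
    if R < 2:
        raise ValueError("R must be at least 2")
    partners = {n: set() for n in nodes}  # nodes each node shares a copyset with
    copysets = []
    while min(len(p) for p in partners.values()) < S:
        # Step 1: order nodes from smallest to largest scatter width.
        order = sorted(nodes, key=lambda n: len(partners[n]))
        # Step 2: pick the first R nodes, skipping nodes already paired with
        # a picked node (assumption added here to guarantee progress).
        picked = []
        for n in order:
            if all(n not in partners[m] for m in picked):
                picked.append(n)
            if len(picked) == R:
                break
        if len(picked) < R:
            raise RuntimeError("no copyset found; S may be too large for N and R")
        for n in picked:
            partners[n].update(m for m in picked if m != n)
        copysets.append(tuple(picked))
    return copysets

print(tiered_replication(list("ABCDEF"), R=2, S=2))
```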

Slide 17

Slide 17 text

TR With Constraints To construct a copyset: 1. Order nodes from smallest to largest scatter width. 2. Pick first R nodes satisfying constraints. Repeat until all nodes have SW >= S.
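A constraint slots into the pick step as a predicate. A minimal sketch using rack awareness as the example: the rack layout `RACK`, the predicate `one_per_rack`, and the function name are all hypothetical, and the rule that picked nodes must not already share a copyset is an added assumption to keep the loop making progress.

```python
RACK = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3, "F": 3}  # hypothetical layout

def one_per_rack(picked, candidate):
    """Constraint: no two nodes of a copyset may sit in the same rack."""
    return all(RACK[candidate] != RACK[m] for m in picked)

def constrained_copysets(nodes, R, S, allowed):
    """Greedy sketch: like the unconstrained loop, but candidates must pass `allowed`."""
    partners = {n: set() for n in nodes}
    copysets = []
    while min(len(p) for p in partners.values()) < S:
        # Order nodes from smallest to largest scatter width.
        order = sorted(nodes, key=lambda n: len(partners[n]))
        picked = []
        for n in order:
            fresh = all(n not in partners[m] for m in picked)
            if fresh and allowed(picked, n):
                picked.append(n)
            if len(picked) == R:
                break
        if len(picked) < R:
            raise RuntimeError("constraints cannot be met for this S and R")
        for n in picked:
            partners[n].update(m for m in picked if m != n)
        copysets.append(tuple(picked))
    return copysets

print(constrained_copysets(list("ABCDEF"), R=2, S=2, allowed=one_per_rack))
```

Other constraints, such as node resource classes or storage tiers, would be expressed as a different `allowed` predicate.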

Slide 18

Slide 18 text

Possible Constraints ● Rack awareness. ● Resource differences in nodes. ● Tiered storage. What is that?

Slide 19

Slide 19 text

MTTF

Slide 20

Slide 20 text

Apache Kafka ● High throughput message broker. ● Topics broken into K partitions. ● Each partition handled by primary/secondaries. ● Classic master/slave replication. ● Consumers subscribe to subset of partitions. ● Trepl (https://pypi.python.org/pypi/trepl)

Slide 21

Slide 21 text

Chartbeat Pings ● Browser sends beacon to our servers. ● 275,000 / sec into “pings” topic. ● “pings” topic broken into 144 partitions. ● 6 brokers. ● R = 2 (cost reduction from 3) ● AZ-aware assignment

Slide 22

Slide 22 text

Notes ● Load balancing. ● Copysets is NP-Hard in general. ● Combinatorial design literature. ● Tradeoffs. Embrace or reduce catastrophe?
Copysets: https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon
TR: https://www.usenix.org/system/files/conference/atc15/atc15-paper-cidon.pdf