PWL Mini w/ Wes Chow on Tiered Replication: A Cost-effective Alternative to Full Cluster Geo-replication

Tiered Replication, by Cidon et al., revisits the data placement problem first introduced in the Copysets paper, which won the Best Student Paper award at USENIX ATC 2013. Copysets introduced a randomized algorithm for the NP-hard problem of placing data in a distributed file system so that it satisfies redundancy and load-balancing constraints. Tiered Replication proposes a greedy algorithm for the same problem and adds the ability to bake in real-world constraints such as rack awareness. Wes will summarize the problem Copysets addressed, show Tiered Replication's solution, and examine a real-world deployment of the algorithm at Chartbeat.

Papers_We_Love

July 27, 2016

Transcript

  1. Data Replication
     • Want to store 2 copies of data for redundancy and performance reasons.
     • Which nodes do we use?
     • Random replication: pick 2 nodes either completely at random or pseudo-randomly but deterministically.
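A minimal Python sketch of the two flavors of random replication. The node names and the `pick_random` / `pick_deterministic` helpers are hypothetical; seeding Python's `random.Random` with the chunk ID is just one assumed way to make the choice repeatable (hash-based schemes such as consistent hashing are another).

```python
import random

NODES = ["A", "B", "C", "D", "E", "F"]
R = 2  # replication factor: copies of each chunk


def pick_random(nodes, r=R):
    """Fully random: a fresh choice on every call, so placements must be recorded."""
    return tuple(sorted(random.sample(nodes, r)))


def pick_deterministic(chunk_id, nodes=NODES, r=R):
    """Deterministically random (hypothetical helper): seeding with the chunk ID
    makes choices look random across chunks but repeatable for a given chunk."""
    rng = random.Random(chunk_id)
    return tuple(sorted(rng.sample(nodes, r)))


if __name__ == "__main__":
    print(pick_random(NODES))
    print(pick_deterministic("chunk-42"))
    # The same chunk ID always maps to the same pair of nodes.
    assert pick_deterministic("chunk-42") == pick_deterministic("chunk-42")
```

The deterministic variant lets any client recompute where a chunk lives instead of looking it up, but it still spreads chunks over every possible pair, which is what the next slides examine.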
  2. Random Assignment {A B} {A C} {A D} {A E} {A F} {B C} …
     • N = 6, R = 2
     • 6 choose 2 = 15 combinations.
     • Failure of any 2 nodes results in data loss.
     • 1/15 of the data on each set.
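A small simulation, assuming each chunk is placed uniformly at random on R = 2 of the 6 nodes (the helper name, seed, and chunk count are illustrative). With enough chunks every one of the 15 possible pairs ends up holding some data, which is why the failure of any 2 nodes loses data.

```python
import itertools
import math
import random

NODES = ["A", "B", "C", "D", "E", "F"]
R = 2


def random_copysets(num_chunks, nodes=NODES, r=R, seed=0):
    """Place each chunk on r randomly chosen nodes and return the set of
    distinct node groups (copysets) that ended up holding data."""
    rng = random.Random(seed)
    return {frozenset(rng.sample(nodes, r)) for _ in range(num_chunks)}


if __name__ == "__main__":
    total = math.comb(len(NODES), R)  # 6 choose 2 = 15
    used = random_copysets(10_000)
    print(f"{len(used)} of {total} possible copysets hold data")
    # If every pair holds some chunk, every 2-node failure destroys some chunk.
    every_pair = {frozenset(p) for p in itertools.combinations(NODES, R)}
    print("every 2-node failure loses data:", used == every_pair)
```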
  3. Facebook (Riak) Replication {A B} {B C} {C D} {D E} {E F} {F A}
     • ⅙ of the data on each set.
     • If two random nodes fail, p(loss) = 6/15 = 40%.
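A sketch of the ring layout on this slide, assuming exactly 2 nodes fail and every pair of nodes is equally likely to be the one that fails. Data is lost only when the failed pair is one of the 6 ring copysets. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

NODES = ["A", "B", "C", "D", "E", "F"]


def ring_copysets(nodes, r=2):
    """Each node plus its next r - 1 neighbors on a ring forms a copyset."""
    n = len(nodes)
    return {frozenset(nodes[(i + j) % n] for j in range(r)) for i in range(n)}


def p_loss(copysets, nodes, r=2):
    """Probability that r simultaneously failed nodes (chosen uniformly at
    random) are exactly one of the copysets, i.e. some data loses every copy."""
    failures = [frozenset(f) for f in combinations(nodes, r)]
    hits = sum(1 for f in failures if f in copysets)
    return Fraction(hits, len(failures))


if __name__ == "__main__":
    ring = ring_copysets(NODES)
    print(sorted("".join(sorted(c)) for c in ring))
    print(p_loss(ring, NODES))  # 6/15 reduces to 2/5 (40%)
```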
  4. Simple Assignment {A B} {C D} {E F}
     • p(loss) = 3/15 = 20%
     • ⅓ of the data on each set.
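Simple assignment generalizes to any cluster where R divides N: there are N/R disjoint copysets out of C(N, R) equally likely R-node failures. A minimal sketch, assuming exactly R nodes fail at once; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def simple_copysets(nodes, r):
    """Split the nodes into disjoint groups of r (assumes r divides len(nodes))."""
    if len(nodes) % r:
        raise ValueError("simple assignment needs N to be a multiple of R")
    return [tuple(nodes[i:i + r]) for i in range(0, len(nodes), r)]


def simple_p_loss(n, r):
    """N/R disjoint copysets out of C(N, R) equally likely R-node failures."""
    return Fraction(n // r, comb(n, r))


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D", "E", "F"]
    groups = simple_copysets(nodes, 2)
    print(groups)                    # three disjoint pairs
    print(simple_p_loss(6, 2))       # 3/15 reduces to 1/5 (20%)
    print(Fraction(1, len(groups)))  # share of the data on each set: 1/3
```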
  5. Terminology
     • N = number of nodes
     • R = replication factor (# of copies of data)
     • S = scatter width
     What is scatter width?
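In the Copysets paper, a node's scatter width is the number of other nodes that hold copies of its data, i.e. the nodes it can restore from. A minimal sketch that computes it from a list of copysets; `scatter_widths` is a hypothetical helper, and the layouts below are the ones from slides 2 to 4 (only A's pairs are listed for random assignment, which is enough to get A's width).

```python
from collections import defaultdict


def scatter_widths(copysets):
    """Map each node to the number of distinct other nodes it shares a copyset with."""
    peers = defaultdict(set)
    for cs in copysets:
        for node in cs:
            peers[node].update(m for m in cs if m != node)
    return {node: len(p) for node, p in peers.items()}


if __name__ == "__main__":
    random_layout = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "F")]
    ring_layout = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]
    simple_layout = [("A", "B"), ("C", "D"), ("E", "F")]
    print(scatter_widths(random_layout)["A"])  # 5
    print(scatter_widths(ring_layout)["A"])    # 2
    print(scatter_widths(simple_layout)["A"])  # 1
```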
  6. S = 4 (N = 6, R = 3)
     {A B C} {B C D} {C D E} {D E F} {E F A} {F A B}
     • A restores from B, C, E, F.
     • Each set = 1/6 ≈ 17% of the data.
     • P_F (3 random nodes fail) = 6 / (6 choose 3) = 6/20 = 30%
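A sketch that rebuilds the S = 4 layout with R = 3 and checks both numbers on the slide, assuming exactly 3 nodes fail at once and every 3-node combination is equally likely. `restore_sources` and `p_failure` are hypothetical helper names.

```python
from fractions import Fraction
from itertools import combinations

NODES = ["A", "B", "C", "D", "E", "F"]
R = 3

# Slide 6 layout: each node with its next two neighbors on a ring.
COPYSETS = [frozenset(NODES[(i + j) % len(NODES)] for j in range(R)) for i in range(len(NODES))]


def restore_sources(node, copysets):
    """Nodes that hold a copy of some of `node`'s data."""
    return sorted({m for cs in copysets if node in cs for m in cs} - {node})


def p_failure(copysets, nodes=NODES, r=R):
    """P_F: chance that r random simultaneous failures cover an entire copyset."""
    targets = set(copysets)
    failures = [frozenset(f) for f in combinations(nodes, r)]
    return Fraction(sum(f in targets for f in failures), len(failures))


if __name__ == "__main__":
    print(restore_sources("A", COPYSETS))  # ['B', 'C', 'E', 'F'], so S = 4
    print(p_failure(COPYSETS))             # 6/20 reduces to 3/10 (30%)
```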
  7. S = 2 (N = 6, R = 3)
     {A B C} {D E F}
     • A restores from B, C.
     • Each set = 50% of the data.
     • P_F (3 random nodes fail) = 2 / (6 choose 3) = 2/20 = 10%
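Slides 6 and 7 rest on one counting argument, assuming exactly R nodes fail at once and every R-node combination is equally likely: data is lost only if the failed nodes form one of the |C| copysets.

```latex
\begin{aligned}
P_F &= \frac{|C|}{\binom{N}{R}} \\
S = 4:\quad P_F &= \frac{6}{\binom{6}{3}} = \frac{6}{20} = 30\% \\
S = 2:\quad P_F &= \frac{2}{\binom{6}{3}} = \frac{2}{20} = 10\%
\end{aligned}
```

Fewer copysets (lower S) means a lower P_F, but each copyset then holds a larger share of the data.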
  8. Random Assignment {A B} {A C} {A D} {A E} {A F} {B C} …
     • p(loss) = 100%
     • 1/15 of the data on each set
     • Scatter width = 5
     • E(loss) = 100% × 1/15 = 6.7%
  9. Facebook (Riak) Replication {A B} {B C} {C D} {D E} {E F} {F A}
     • p(loss) = 40%
     • ⅙ of the data on each set
     • Scatter width = 2
     • E(loss) = 40% × ⅙ = 6.7%
  10. Simple Assignment {A B} {C D} {E F}
     • p(loss) = 20%
     • ⅓ of the data on each set
     • Scatter width = 1
     • E(loss) = 20% × ⅓ = 6.7%
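Why all three layouts land on the same 6.7%: assuming the data is spread evenly over |C| copysets and exactly R random nodes fail, the number of copysets cancels out.

```latex
\begin{aligned}
E(\text{loss}) &= p(\text{loss}) \times \frac{1}{|C|}
  = \frac{|C|}{\binom{N}{R}} \times \frac{1}{|C|}
  = \frac{1}{\binom{N}{R}} \\
N = 6,\ R = 2:\quad E(\text{loss}) &= \frac{1}{\binom{6}{2}} = \frac{1}{15} \approx 6.7\%
\end{aligned}
```

The layout changes how often data is lost and how much is lost each time, not the expected amount lost.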
  11. The Importance of S
     • Affects p(loss).
     • Affects the speed of restoring a single node.
     • Low S = low p(loss), high damage (more data lost per loss event), slow restore.
     • High S = high p(loss), low damage (less data lost per loss event), fast restore.
  12. The Fixed Cost of Failure
     • Admitting failure on Twitter has a high fixed cost. Failing for 50% of customers is not much worse than failing for 5%.
     • Going to tape has a high fixed cost. Restoring 1 TB is not much worse than restoring 1 GB.
  13. Tiered Replication
     To construct a copyset:
       1. Order nodes from smallest to largest scatter width.
       2. Pick the first R nodes.
     Repeat until all nodes have SW >= S.
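A minimal Python sketch of the greedy loop on this slide, under stated assumptions: scatter width counts distinct peers; ties in the sort keep the original node order; and, to guarantee progress, a node is skipped if it already shares a copyset with a node picked for the current copyset. Without that skip, re-sorting can select the same group again and loop forever. The skip rule and the `tiered_replication` name are this sketch's assumptions, not necessarily the paper's exact formulation.

```python
from collections import defaultdict


def tiered_replication(nodes, r, s):
    """Greedily build copysets until every node has scatter width >= s."""
    peers = defaultdict(set)  # node -> distinct nodes it shares a copyset with
    copysets = []
    while any(len(peers[n]) < s for n in nodes):
        # 1. Order nodes from smallest to largest scatter width (stable sort,
        #    so ties keep their original order).
        ordered = sorted(nodes, key=lambda n: len(peers[n]))
        # 2. Pick the first r nodes, skipping any node already paired with a
        #    member chosen so far (assumption: every copyset adds new peers).
        chosen = []
        for n in ordered:
            if all(n not in peers[m] for m in chosen):
                chosen.append(n)
                if len(chosen) == r:
                    break
        if len(chosen) < r:
            raise RuntimeError("no valid copyset left; lower s or add nodes")
        for n in chosen:
            peers[n].update(m for m in chosen if m != n)
        copysets.append(tuple(chosen))
    return copysets


if __name__ == "__main__":
    print(tiered_replication(["A", "B", "C", "D", "E", "F"], r=2, s=2))
```

Because the lowest-scatter-width node is always picked first, the nodes that most need peers get them first, and the loop stops as soon as every node reaches S, using far fewer pairs than random assignment's 15.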
  14. TR With Constraints
     To construct a copyset:
       1. Order nodes from smallest to largest scatter width.
       2. Pick the first R nodes satisfying constraints.
     Repeat until all nodes have SW >= S.
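The constrained variant only changes the pick step: a node is accepted only if a caller-supplied check approves it. This sketch assumes a hypothetical `ZONE` map and a "no two copies in the same availability zone" rule as the constraint (rack awareness would look the same with racks in place of zones); the `constraint(candidate, chosen)` signature and the skip-existing-peers progress rule are this sketch's own choices.

```python
from collections import defaultdict

# Hypothetical placement map: which availability zone (or rack) each node lives in.
ZONE = {"A": "az1", "B": "az1", "C": "az2", "D": "az2", "E": "az3", "F": "az3"}


def different_zone(candidate, chosen):
    """Example constraint: no two copies in a copyset share a zone."""
    return all(ZONE[candidate] != ZONE[m] for m in chosen)


def tiered_replication(nodes, r, s, constraint):
    """Greedy TR loop; a node is only picked if `constraint` accepts it."""
    peers = defaultdict(set)
    copysets = []
    while any(len(peers[n]) < s for n in nodes):
        ordered = sorted(nodes, key=lambda n: len(peers[n]))
        chosen = []
        for n in ordered:
            # Skip nodes that would repeat an existing pairing (this sketch's
            # progress rule) or that violate the caller's constraint.
            if all(n not in peers[m] for m in chosen) and constraint(n, chosen):
                chosen.append(n)
                if len(chosen) == r:
                    break
        if len(chosen) < r:
            raise RuntimeError("constraints leave no valid copyset")
        for n in chosen:
            peers[n].update(m for m in chosen if m != n)
        copysets.append(tuple(chosen))
    return copysets


if __name__ == "__main__":
    for cs in tiered_replication(list(ZONE), r=2, s=2, constraint=different_zone):
        print(cs, [ZONE[n] for n in cs])
```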
  15. Apache Kafka
     • High-throughput message broker.
     • Topics broken into K partitions.
     • Each partition handled by a primary and secondaries (Kafka calls them the leader and followers).
     • Classic master/slave replication.
     • Consumers subscribe to a subset of partitions.
     • Trepl (https://pypi.python.org/pypi/trepl)
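One way to apply copysets to Kafka is to map each partition onto a copyset of broker IDs. The helper below is hypothetical and is not Trepl's API, which the talk does not show. It emits the JSON shape accepted by Kafka's `kafka-reassign-partitions` tool, where the first broker in `replicas` is the preferred leader, and rotates the replica order so leadership is shared within each copyset. The copysets in the example are illustrative.

```python
import json


def assign_partitions(topic, num_partitions, copysets):
    """Spread partitions round-robin over copysets of broker IDs, rotating the
    replica order so the first (preferred-leader) slot is shared."""
    partitions = []
    for p in range(num_partitions):
        cs = copysets[p % len(copysets)]
        rotation = (p // len(copysets)) % len(cs)
        replicas = list(cs[rotation:]) + list(cs[:rotation])
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Illustrative copysets over broker IDs 1-6 (e.g. the output of a TR run).
    copysets = [(1, 3), (2, 4), (5, 1), (6, 2), (3, 5), (4, 6)]
    plan = assign_partitions("pings", 12, copysets)
    print(json.dumps(plan, indent=2))
```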
  16. Chartbeat Pings
     • Browser sends a beacon to our servers.
     • 275,000 / sec into the "pings" topic.
     • "pings" topic broken into 144 partitions.
     • 6 brokers.
     • R = 2 (reduced from 3 to cut cost)
     • AZ-aware (availability-zone-aware) assignment
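A back-of-the-envelope sketch of what moving from R = 3 to R = 2 means for these numbers, assuming partitions and traffic are spread evenly across brokers and partitions and ignoring message size and consumer reads. `replication_load` is a hypothetical helper.

```python
PINGS_PER_SEC = 275_000  # from the slide
PARTITIONS = 144
BROKERS = 6


def replication_load(r, rate=PINGS_PER_SEC, partitions=PARTITIONS, brokers=BROKERS):
    """Partition replicas hosted per broker and message copies written per second."""
    return partitions * r // brokers, rate * r


if __name__ == "__main__":
    print(f"{PINGS_PER_SEC / PARTITIONS:,.0f} pings/sec per partition")
    for r in (3, 2):
        replicas, copies = replication_load(r)
        print(f"R={r}: {replicas} partition replicas per broker, {copies:,} copies written/sec")
```

Under these assumptions, dropping to R = 2 cuts stored copies and replication write traffic by a third, which is the cost reduction the slide refers to.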
  17. Notes
     • Load balancing.
     • Copysets is NP-hard in general.
     • Combinatorial design literature.
     • Tradeoffs: embrace or reduce catastrophe?
     Copysets: https://www.usenix.org/conference/atc13/technical-sessions/presentation/cidon
     TR: https://www.usenix.org/system/files/conference/atc15/atc15-paper-cidon.pdf