Controlled Epidemics: Riak's New Gossip Protocol and Metadata Store

Jordan West
October 30, 2013

Users are familiar with how Riak stores their data, but what about its own internal information like node capabilities and ownership details? That data is stored in one of Riak's internal distributed data stores, of course!

This talk will cover one of these stores: Cluster Metadata. Cluster Metadata is an internal, fully-replicated, eventually consistent DHT that will be included in the next release of Riak. The talk will cover, briefly, how this store is used, why it was built, and what improvements it will bring. The primary focus will be covering its design and implementation through some formal, and not so formal, models of its replication and anti-entropy protocols.
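To make "fully-replicated, eventually consistent" concrete, here is a minimal sketch of a store in which every replica serves reads and writes locally and reconciles incoming updates with a logical (vector) clock. The names `Replica`, `descends`, `put`, `get`, and `merge`, and the choice to keep concurrent values as siblings, are assumptions made for illustration; they are not taken from Riak's Cluster Metadata code.

```python
from dataclasses import dataclass, field


def descends(a, b):
    """True if vector clock `a` has seen every event recorded in `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


@dataclass
class Replica:
    name: str
    # key -> (vector clock, set of values); hypothetical layout
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        """Local write: bump this replica's counter, no coordination."""
        clock, _ = self.data.get(key, ({}, set()))
        new_clock = dict(clock)
        new_clock[self.name] = new_clock.get(self.name, 0) + 1
        self.data[key] = (new_clock, {value})
        return new_clock, {value}

    def get(self, key):
        """Local read: return whatever this replica currently holds."""
        return self.data.get(key, ({}, set()))[1]

    def merge(self, key, clock, values):
        """Apply an update from another replica; return True if it was news."""
        local_clock, local_values = self.data.get(key, ({}, set()))
        if descends(local_clock, clock):
            return False  # already seen, nothing to do
        if descends(clock, local_clock):
            self.data[key] = (dict(clock), set(values))  # strictly newer
        else:
            # Concurrent writes: keep both values and join the clocks.
            nodes = local_clock.keys() | clock.keys()
            joined = {n: max(local_clock.get(n, 0), clock.get(n, 0)) for n in nodes}
            self.data[key] = (joined, local_values | set(values))
        return True
```

The boolean returned by `merge` is the hook a broadcast layer would use: an update that is not news does not need to be forwarded again.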


Transcript

  1. Overview Local Reads and Writes (w=1,r=1) In-Memory and On-Disk Logical

    Clocks Local Detection of Changed Keys Cluster Metadata
  2. Plumtree by Example 1 2 3 4 5 {2,3}, {1,4},

    {1,5}, {2,5}, {1,4}, eager
  3. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {1,4}, {}, eager lazy
  4. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {} {1,4}, {}, {} {1,5}, {}, {} {2,5}, {}, {} {1,4}, {}, {} eager lazy msgs
  5. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {} {1,4}, {}, {} {1,5}, {}, {} {2,5}, {}, {} {1,4}, {}, {} eager lazy msgs
  6. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {} {1,4}, {}, {} {1,5}, {}, {} {2,5}, {}, {} {1,4}, {}, {} bcast {a,0,payload} eager lazy msgs
  7. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {} {1,5}, {}, {} {2,5}, {}, {} {1,4}, {}, {} {a} bcast {a,0,payload} eager lazy msgs
  8. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {} {1,5}, {}, {} {2,5}, {}, {} {1,4}, {}, {} {a} bcast {a,0,payload} eager lazy msgs
  9. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {} {1,5}, {}, {2,5}, {}, {} {1,4}, {}, {} {a} {a} bcast {a,0,payload} eager lazy msgs
  10. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {} {1,5}, {}, {2,5}, {}, {} {1,4}, {}, {} {a} {a} bcast {a,0,payload} bcast {a,1,payload} eager lazy msgs
  11. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {} {1,4}, {}, {} {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} eager lazy msgs
  12. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {} {1,4}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} eager lazy msgs
  13. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {} {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {1,3,4}, eager lazy msgs
  14. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {} {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} bcast {a,2,payload} {1,3,4}, eager lazy msgs
  15. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} {1,3,4}, bcast {a,2,payload} eager lazy msgs
  16. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} {1,3,4}, bcast {a,2,payload} eager lazy msgs
  17. Plumtree by Example 1 2 3 4 5 {2,3}, {},

    {1,4}, {}, {1,5}, {}, {2,5}, {}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} {1,3,4}, bcast {a,2,payload} eager lazy msgs
  18. Plumtree by Example 1 2 3 4 5 {2,3}, {1,4},

    {}, {1,5}, {}, {2,5}, {}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} {1,3,4}, bcast {a,2,payload} {5}, eager lazy msgs
  19. Plumtree by Example 1 2 3 4 5 {2,3}, {1,4},

    {}, {1,5}, {}, {2,5}, {}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} {1,3,4}, bcast {a,2,payload} {5}, prune eager lazy msgs
  20. Plumtree by Example 1 2 3 4 5 {2,3}, {1,4},

    {}, {1,5}, {}, {2,5}, {}, {a} {a} {a} bcast {a,0,payload} bcast {a,1,payload} bcast {a,1,payload} {a} {a} bcast {a,2,payload} bcast {a,2,payload} {5}, prune {3,4}, {1}, eager lazy msgs
  21. Plumtree by Example 1 2 3 4 5 {2,3}, {5},

    {} {1,4}, {}, {} {1,5}, {}, {} {2}, {5}, {} {3}, {1,4}, {} eager lazy msgs
  22. Plumtree by Example 1 2 4 5 {2,3}, {5}, {}

    {1,4}, {}, {} {1,5}, {}, {} {2}, {5}, {} {3}, {1,4}, {} 3 eager lazy msgs
  23. Plumtree by Example 1 2 4 5 {2,3}, {5}, {1,4},

    {}, {1,5}, {}, {} {2}, {5}, {} {3}, {1,4}, {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} eager lazy msgs
  24. Plumtree by Example 1 2 4 5 {2,3}, {5}, {1,4},

    {}, {1,5}, {}, {} {2}, {5}, {} {3}, {1,4}, {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} eager lazy msgs
  25. Plumtree by Example 1 2 4 5 {2,3}, {5}, {1,4},

    {}, {1,5}, {}, {} {2}, {5}, {} {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} {1,3}, {4}, eager lazy msgs
  26. Plumtree by Example 1 2 4 5 {2,3}, {5}, {1,4},

    {}, {1,5}, {}, {} {2}, {5}, {} {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} graft {a,0} {1,3}, {4}, eager lazy msgs
  27. Plumtree by Example 1 2 4 5 {1,4}, {}, {1,5},

    {}, {} {2}, {5}, {} {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} graft {a,0} {1,3}, {4}, {2,3,5}, {}, eager lazy msgs
  28. Plumtree by Example 1 2 4 5 {1,4}, {}, {1,5},

    {}, {} {2}, {5}, {} {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} graft {a,0} {1,3}, {4}, bcast {a,0,payload} {2,3,5}, {}, eager lazy msgs
  29. Plumtree by Example 1 2 4 5 {1,4}, {}, {1,5},

    {}, {} {2}, {5}, {} {} 3 bcast {a,0,payload} bcast {a,1,payload} {a} {a} ihave {a,0} graft {a,0} {1,3}, {4}, ihave {a,1} bcast {a,0,payload} {2,3,5}, {}, eager lazy msgs
  30. Plumtree in Riak Peer Service Requirements Strong Connectivity 10k+ Nodes

    Partially Connected Weaker Connectivity 5 - 100 Nodes Fully Connected
  31. Plumtree in Riak Peer Service Requirements Strong Connectivity 10k+ Nodes

    Partially Connected Reactive Membership Weaker Connectivity 5 - 100 Nodes Fully Connected Operator-Controlled
  32. Plumtree in Riak “[Plumtree] is able to support large number

    of faults while maintaining a high reliability.” Plumtree Focus
  33. Plumtree in Riak “[Plumtree] is able to support large number

    of faults while maintaining a high reliability.” Plumtree Focus
  34. Plumtree in Riak 1 2 4 5 3 bcast {a,0,payload}

    bcast {a,1,payload} ihave {a,0} ihave {a,1}
  35. RMR - 5 Nodes - Stable - 20 Rounds (Results)

    [Chart: RMR (axis 0 to 8) by Round (1 to 20); series: Broadcast, Gossip]
  36. Controlled Epidemics LDH - 5 Nodes - Stable - 20 Rounds (Results)

    [Chart: LDH (axis 0 to 9) by Round (1 to 20); series: Broadcast, Gossip]
  37. Controlled Epidemics RMR - 10, 20, 40 Nodes - Stable - 20 Rounds (Results)

    [Chart: RMR (axis 0 to 1) by Round (1 to 20); series: 10 Nodes, 20 Nodes, 40 Nodes]
  38. Controlled Epidemics LDH - 10, 20, 40 Nodes - Stable - 20 Rounds (Results)

    [Chart: LDH (axis 0 to 6) by Round (1 to 20); series: 10 Nodes, 20 Nodes, 40 Nodes]
  39. Controlled Epidemics Avg. LDH - 10, 20, 40 Nodes - Stable (Results)

    [Chart: LDH (axis 0 to 5) for 10 Nodes, 20 Nodes, 40 Nodes]
  40. Controlled Epidemics % Failures Before Exchange Needed (Results)

    [Chart: Avg. % Failures (axis 0 to 90) for 5 Nodes, 10 Nodes, 20 Nodes, 40 Nodes]
  41. Controlled Epidemics % Failures Before Exchange Needed (Results)

    [Chart: axis 0 to 90 for 10 Nodes, 20 Nodes, 40 Nodes; series: Fanout = 4, Fanout = 2]
  42. Controlled Epidemics % Failures Before Exchange Needed (Variance) (Results)

    [Chart: axis 0 to 7 for 10 Nodes, 20 Nodes, 40 Nodes; series: Fanout = 4, Fanout = 2]
  43. In Riak Future All Bucket Properties Node Capabilities Migrate Other

    Existing Features More New Features Cluster Metadata