
A New Concept of Consistency in Distributed Database and Implementation in Riak


Web+DB Forum technical report by Basho


UENISHI Kota

November 28, 2013

Transcript

  1. A New Consistency Model in Distributed Databases and Its Implementation in Riak • 2013/11/28 WebDB Forum • Basho, UENISHI Kota
  2. An Old Yet New Consistency Model in Distributed Databases and Its Implementation in Riak • 2013/11/28 WebDB Forum • Basho, UENISHI Kota
  3. Basho and Riak • Distributed databases? • Do you know Riak? • Do you know Basho?

  4. The CAP theorem and the ideal DB • Whatever failure occurs (partition tolerance) • the data is always consistent (consistency) • and the system never stops (availability) • No system can satisfy all three at the same time

  5. • A database whose defining feature is availability • Easy to operate, and large data fits too • Stability and predictability • "Never lose data"

  6. Where Riak is running • Rovio (Angry Birds) • Yahoo! JAPAN cloud storage • NHS (UK National Health Service) • Bump (=> Google) • Banking, games, retail, sensors, etc…
  7. How Riak Works

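  8. Consistent Hashing • 160-bit key space • The space is divided into equal partitions • Each partition is managed individually by a node • Replicas are copied to N partitions • Diagram labels: four nodes on the ring; hash("meetups/spamham"); N=3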
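The ring on slide 8 can be sketched in a few lines. This is a minimal illustration, not Riak's actual implementation: the partition count (64), the node names, the round-robin ownership rule, and hashing the plain string "bucket/key" with SHA-1 are all assumptions made for the example.

```python
import hashlib

RING_SIZE = 2 ** 160        # SHA-1 yields 160-bit values, matching the slide's key space
NUM_PARTITIONS = 64         # assumed partition count, chosen for illustration
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
N_VAL = 3                   # N=3 replicas, as on the slide


def partition_for(bucket: str, key: str) -> int:
    """Map a bucket/key pair to one of the equal-sized partitions."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def preference_list(bucket: str, key: str, n: int = N_VAL) -> list[int]:
    """Return the n consecutive partitions that hold the replicas."""
    first = partition_for(bucket, key)
    return [(first + i) % NUM_PARTITIONS for i in range(n)]


def owner(partition: int) -> str:
    """Assumed ownership rule: partitions are dealt to nodes round-robin."""
    return NODES[partition % len(NODES)]


replicas = preference_list("meetups", "spamham")
print([(p, owner(p)) for p in replicas])
```

Because consecutive partitions belong to different nodes under this ownership rule, the three replicas of a key land on three distinct machines.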
  9. Consistency is hard • The only choices are to stop updates (lowering availability) or to let updates overwrite each other (losing data) • Diagram labels: Server1, Server2, Server3; PUT V=42; PUT V=0; V=?
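  10. Instead of consistency • For the time being, allow multiple versions to coexist • Decide at read time which version is correct, or whether to merge them • Diagram labels: Server1, Server2, Server3; PUT V=42; PUT V=0; V=0 or 42; V=0; V=0 or 42; V=42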
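  11. Achieving AP • Accept writes for the time being even while a network partition is in progress • Diagram labels: Server1, Server2, Server3, Server4; PUT V=42; PUT V=0; "write back once recovered"; "keep both"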
  12. Shopping cart example • Taking the union is enough • Diagram labels: Server1, Server2, Server3; PUT cart=[a,b,d]; PUT cart=[a,b,c]; union([a,b,c], [a,b,d]) => [a,b,c,d]; [a,b,c]; [a,b,c] or [a,b,d]; [a,b,d]
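The read-time merge from the shopping cart slide can be sketched as follows. The function name `resolve_cart` and the idea of receiving the coexisting versions as a list of lists are assumptions for illustration, not a Riak client API.

```python
def resolve_cart(siblings: list[list[str]]) -> list[str]:
    """Merge every coexisting version of a cart by taking their union."""
    merged: set[str] = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)


# The two concurrent writes from the slide.
siblings = [["a", "b", "c"], ["a", "b", "d"]]
print(resolve_cart(siblings))  # ['a', 'b', 'c', 'd']
```

A plain union only ever grows the cart, so it cannot express the removal of an item by itself.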
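  13. Drawbacks of allowing multiple versions • Programming is hard (transactions are wonderful) • The real world is not just shopping carts and counters • A data structure that supports safe merge and update has to be designed every time • Over time, similar libraries end up being written all over the place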
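  14. Why is it hard? • Because one write can be reordered with another, the data cannot be kept consistent even for writes alone, let alone made serializable • Diagram labels: Server1, Server2, Server3; w1, w2; w1, w2; w2 (w1 lost)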
  15. Logical Monotonicity • Allow only commutative operations on the data! • Data = update(w2, update(w1, Data0)) = update(w1, update(w2, Data0)) • Data = merge(update(w2, Data0), Data)
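The commutativity condition on slide 15 can be checked mechanically. The sketch below compares a plain overwrite with a set insertion. The helper names (`apply_all`, `order_independent`) and the two sample update functions are assumptions for illustration.

```python
from itertools import permutations


def apply_all(update, initial, writes):
    """Apply the writes to the initial state in the given order."""
    state = initial
    for w in writes:
        state = update(w, state)
    return state


def overwrite(w, data):
    """Plain register write: the latest write replaces the value."""
    return w


def add_to_set(w, data):
    """Set insertion: the result does not depend on arrival order."""
    return data | {w}


def order_independent(update, initial, writes) -> bool:
    """True if every ordering of the writes yields the same final state."""
    results = [apply_all(update, initial, order) for order in permutations(writes)]
    return all(r == results[0] for r in results)


print(order_independent(overwrite, 0, [42, 0]))                   # False
print(order_independent(add_to_set, frozenset(), ["w1", "w2"]))   # True
```

The overwrite fails the check because the surviving value depends on which PUT arrives last, which is the lost-write situation from slide 14.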
  16. The answer: CRDT • "Replicable, commutative data types" • Conflict-Free Replicated Data Types • Commutative Replicated Data Types • … • (Going to be included in Riak 2.0) • Note: the authors of CRDT do not use the term "Logical Monotonicity"
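  17. CRDT in Riak 2.0 • Give the V of the KVS a "type", and let the type determine the update and merge logic • At read time, the merge runs automatically on the server side • The application only has to specify the type, and no longer needs to handle multiple versions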

  18. CRDT example •PN-Counter •Set •OR-sets •LWW-register •Graph…

  19. PN-Counter • Demo

  20. PN-Counter • merge • {a: {1,-1}, b: {1,0}, c: {2,0}} • {a: {0,0}, b: {2,0}, c: {0,-2}} • => {a: {1,-1}, b: {2,0}, c: {2,-2}} => 2 • update • When a accepts {increment, 3} • {a: {4,-1}, b: {1,0}, c: {2,0}}
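The merge and update on slide 20 can be reproduced with a small sketch. It follows the slide's notation, where each actor holds an (increments, decrements) pair and decrements are stored as non-positive numbers. The function names and the dict-of-tuples representation are assumptions for illustration, not Riak's internal format.

```python
Counter = dict[str, tuple[int, int]]   # actor -> (increments, decrements <= 0)


def merge(x: Counter, y: Counter) -> Counter:
    """Per actor, keep the largest increment total and the largest decrement magnitude."""
    merged = {}
    for actor in x.keys() | y.keys():
        x_inc, x_dec = x.get(actor, (0, 0))
        y_inc, y_dec = y.get(actor, (0, 0))
        merged[actor] = (max(x_inc, y_inc), min(x_dec, y_dec))
    return merged


def value(counter: Counter) -> int:
    """The counter value is the sum of all increments and (negative) decrements."""
    return sum(inc + dec for inc, dec in counter.values())


def increment(counter: Counter, actor: str, amount: int) -> Counter:
    """Record an update on the pair owned by the actor that accepted it."""
    inc, dec = counter.get(actor, (0, 0))
    updated = dict(counter)
    updated[actor] = (inc + amount, dec) if amount >= 0 else (inc, dec + amount)
    return updated


left = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
right = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}

merged = merge(left, right)
print(merged == {"a": (1, -1), "b": (2, 0), "c": (2, -2)})  # True
print(value(merged))                                        # 2
print(increment(left, "a", 3)["a"])                         # (4, -1)
```

Each actor only ever changes its own pair, and each component only moves away from zero, so merging in any order, or merging the same state twice, gives the same result.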
  21. OR-Sets • merge • {a:{“foo”:true}, b:{“bar”:false}} • + {a:{“foo”:true}, b:{“foo”:false, “bar”:false}} • => {a:{“foo”:true}, b:{“foo”:false, “bar”:true}} • => [“bar”] • update • add: {a:{}} => +”foo” => {a:{“foo”:false}} • remove: {a: {“foo”:false}} => {a: {“foo”:true}}
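For comparison, here is a minimal sketch of an observed-remove set in the tag-based form commonly given in the CRDT literature. It is not the per-actor boolean map shown on the slide and not Riak's implementation. The class name, the tag scheme (replica id plus a shared counter), and the method names are assumptions for illustration.

```python
import itertools


class ORSet:
    """Observed-remove set: a remove only cancels the adds it has already seen."""

    _tags = itertools.count()   # shared counter, used only to make tags unique

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.adds: set[tuple[str, tuple[str, int]]] = set()
        self.removes: set[tuple[str, tuple[str, int]]] = set()

    def add(self, element: str) -> None:
        """Record the element under a fresh, globally unique tag."""
        tag = (self.replica_id, next(ORSet._tags))
        self.adds.add((element, tag))

    def remove(self, element: str) -> None:
        """Tombstone every tag for the element that this replica has observed."""
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other: "ORSet") -> None:
        """Both components only grow, so merging is a pair of unions."""
        self.adds |= other.adds
        self.removes |= other.removes

    def value(self) -> list[str]:
        """An element is present if at least one of its tags is not tombstoned."""
        return sorted({e for (e, _tag) in self.adds - self.removes})


a, b = ORSet("a"), ORSet("b")
a.add("foo")
b.merge(a)          # b has now observed a's "foo"
a.remove("foo")     # a removes the "foo" it knows about
b.add("foo")        # concurrently, b adds "foo" again under a new tag
b.add("bar")
a.merge(b)
b.merge(a)
print(a.value())    # ['bar', 'foo']
```

The concurrent add survives because the remove on replica a never observed the tag that replica b created.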
  22. OR-Sets • Demo

  23. Use cases • Counting clicks (G-counter) • riak-server/types/counters/buckets/likes/datatypes/basho.com -d 1 • Shopping cart (OR-sets) • Number of logged-in users (PN-counter) • Combinations of these (map & LWW-register, boolean) • { name: "basho.com", likes: 20000, users: 3000, links: [ "basho.co.jp", "basho.co.uk" ], cool: true }
  24. What it cannot do • A PN-counter constrained to "zero or greater" • Issuing unique IDs • Other data structures and operations that require CAS
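The first limitation can be made concrete with a short sketch: two replicas each check "value > 0" locally before decrementing, yet the merged counter ends up below zero. The representation follows the PN-counter slide (decrements stored as non-positive numbers). The function names and the stock scenario are assumptions for illustration.

```python
Counter = dict[str, tuple[int, int]]   # actor -> (increments, decrements <= 0)


def merge(x: Counter, y: Counter) -> Counter:
    """Per actor, keep the largest increment total and the largest decrement magnitude."""
    return {
        actor: (
            max(x.get(actor, (0, 0))[0], y.get(actor, (0, 0))[0]),
            min(x.get(actor, (0, 0))[1], y.get(actor, (0, 0))[1]),
        )
        for actor in x.keys() | y.keys()
    }


def value(counter: Counter) -> int:
    return sum(inc + dec for inc, dec in counter.values())


def guarded_decrement(counter: Counter, actor: str) -> Counter:
    """Decrement only if the locally visible value is still positive."""
    if value(counter) <= 0:
        raise ValueError("counter would drop below zero")
    inc, dec = counter.get(actor, (0, 0))
    return {**counter, actor: (inc, dec - 1)}


stock = {"a": (1, 0)}                         # one item left, seen by both replicas
replica_a = guarded_decrement(stock, "a")     # passes the local check
replica_b = guarded_decrement(stock, "b")     # also passes, concurrently

print(value(replica_a), value(replica_b))     # 0 0
print(value(merge(replica_a, replica_b)))     # -1
```

Enforcing the bound would require the replicas to coordinate before accepting the decrement, which is the kind of compare-and-swap operation the slide lists as out of scope.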

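  25. Summary • Riak is a distributed database built for availability • It ensures availability by allowing multiple versions to be held at the same time • The difficulty of application development is the open issue • By introducing CRDTs as types, we built a mechanism that is simple and does not lose data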

  26. Questions? • Please look forward to Riak 2.0 • Web: http://basho.co.jp • Twitter: @BashoJapan • Me: kota@basho.com • ML: riak-users-jp@lists.basho.com
  27. Useful links http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf http://arxiv.org/pdf/1210.3368.pdf https://gist.github.com/russelldb/f92f44bdfb619e089a4d http://gsd.di.uminho.pt/members/cbm/ps/scadt3.pdf http://arxiv.org/abs/1011.5808