
A New Concept of Consistency in Distributed Database and Implementation in Riak

UENISHI Kota
November 28, 2013

Technical report for the WebDB Forum, by Basho


Transcript

  1. A New Consistency Model

    in Distributed Databases and

    Its Implementation in Riak
    2013 / 11 / 28 WebDB Forum

    Basho, UENISHI Kota

  2. An Old Yet New Consistency Model

    in Distributed Databases and

    Its Implementation in Riak
    2013 / 11 / 28 WebDB Forum

    Basho, UENISHI Kota

  3. Basho and Riak
    • Distributed databases?
    • Do you know Riak?
    • Do you know Basho?

  4. The CAP Theorem and the Ideal DB
    • Whatever failure occurs (partition tolerance),
    • the data is always consistent (consistency),
    • and the system never stops (availability).
    No system can satisfy all three of these at the same time.

  5. • A database whose defining feature is availability
    • Easy to operate, and able to hold large data sets
    • Stability and predictability
    • "Never, ever lose data"

  6. Where Riak Is Running
    • Rovio (Angry Birds)
    • Yahoo! JAPAN cloud storage
    • NHS (UK National Health Service)
    • Bump (=> Google)
    • Banking, games, retail, sensors, etc.

  7. How Riak Works


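  8. Consistent Hashing
    • 160-bit key space
    • The space is divided into equal partitions
    • Each partition is managed individually by a node
    • Replicas are copied to N partitions
    [Figure: a ring of partitions owned by four nodes; hash("meetups/spamham") maps to a point on the ring, N=3]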
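
The bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition count (64), the node names, and the round-robin ownership are hypothetical, and the key is hashed as a plain string, which is not necessarily how Riak encodes bucket and key before hashing.

```python
import hashlib

RING_SIZE = 2 ** 160          # SHA-1 yields a 160-bit key space
NUM_PARTITIONS = 64           # assumed partition count for this sketch
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names
N_VAL = 3                     # replicas per key, as on the slide

PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS


def key_position(bucket_key):
    """Map a key to an integer position on the 160-bit ring."""
    digest = hashlib.sha1(bucket_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_owner(partition):
    """Assign partitions to nodes round-robin (an assumption of this sketch)."""
    return NODES[partition % len(NODES)]


def preference_list(bucket_key):
    """Return N consecutive partitions, with their owners, that hold the key."""
    first = key_position(bucket_key) // PARTITION_WIDTH
    partitions = [(first + i) % NUM_PARTITIONS for i in range(N_VAL)]
    return [(p, partition_owner(p)) for p in partitions]


for partition, node in preference_list("meetups/spamham"):
    print(partition, node)
```

Because consecutive partitions belong to different nodes in this layout, the three replicas end up on three distinct nodes.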

  9. Consistency Is Hard
    • The only choices are to stop updates (lowering availability) or to allow updates to be overwritten (losing data)
    Server2
    Server1 Server3
    PUT V=42
    PUT V=0
    V=?

  10. Instead of Consistency
    • For the time being, allow multiple versions to coexist
    • Decide at read time which version is correct, or whether to merge them
    Server2
    Server1 Server3
    PUT V=42
    PUT V=0
    V=0 or 42
    V=0 V=0 or 42 V=42

  11. Achieving AP
    • Allow writes for the time being, even while a network partition is in progress
    Server2
    Server1 Server3
    PUT V=42
    PUT V=0
    Server4
    Write back after recovery
    Keep both

  12. Shopping Cart Example
    • Just take the union
    Server2
    Server1 Server3
    PUT cart=[a,b,d]
    PUT cart=[a,b,c]
    union([a,b,c], [a,b,d]) => [a,b,c,d]
    [a,b,c] [a,b,c] or [a,b,d] [a,b,d]
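
A minimal sketch of the read-time resolution on this slide, assuming the client receives the concurrent versions (siblings) as plain lists. The function name `resolve_cart` is hypothetical and not part of any Riak client.

```python
from functools import reduce


def resolve_cart(siblings):
    """Merge concurrent cart versions by set union; sorted for stable display."""
    return sorted(reduce(set.union, (set(s) for s in siblings), set()))


siblings = [["a", "b", "c"], ["a", "b", "d"]]
print(resolve_cart(siblings))  # ['a', 'b', 'c', 'd']
```

A plain union can only grow the cart: an item removed in one version reappears if any other version still contains it.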

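  13. Drawbacks of Allowing Multiple Versions
    • Programming is hard (transactions are wonderful)
    • The real world is more than shopping carts and counters
    • A data structure with safe merge and update has to be designed every time
    • As people use it, similar libraries spring up all over the place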

  14. Why Is It Hard?
    • Writes can be reordered relative to each other, so far from being serializable, even the writes cannot be brought to a consistent state
    Server2
    Server1 Server3
    w1
    w2
    w1
    w2
    w2
    (w1 lost)

  15. Logical Monotonicity
    • Allow only commutative operations on the data!
    Data = update(w2, update(w1, Data0))
         = update(w1, update(w2, Data0))
    Data = merge(update(w2, Data0), Data)
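
The three equations can be checked on a small example. The sketch below assumes a grow-only set as the data type, which is a choice made here for illustration; `overwrite`, `add`, and `merge` are hypothetical helper names.

```python
def overwrite(value, state):
    """Non-commutative update: the last write replaces the state."""
    return value


def add(value, state):
    """Commutative update: add an element to a grow-only set."""
    return state | {value}


def merge(left, right):
    """Merge two replicas of a grow-only set."""
    return left | right


data0 = frozenset()
w1, w2 = "x", "y"

# Overwrites depend on arrival order, so replicas can diverge.
assert overwrite(w2, overwrite(w1, data0)) != overwrite(w1, overwrite(w2, data0))

# Commutative updates reach the same state in either order.
data = add(w2, add(w1, data0))
assert data == add(w1, add(w2, data0))

# Merging in a state the replica has already absorbed changes nothing.
assert data == merge(add(w2, data0), data)
```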

  16. The Answer: CRDT
    • "Replicable commutative data types"
    • Conflict-Free Replicated Data Types
    • Commutative Replicated Data Types
    • …
    • (Going to be included in Riak 2.0)
    Note: the authors of CRDTs do not use the term "logical monotonicity"

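  17. CRDT in Riak 2.0
    • Give the V of the KVS a "type", and let the type determine the update and merge logic
    • Merge is executed automatically on the server side at read time
    • The application only has to specify the type; handling multiple versions is no longer needed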

  18. CRDT example
    •PN-Counter
    •Set
    •OR-sets
    •LWW-register
    •Graph…


  19. PN-Counter
    • Demo

  20. PN-Counter
    • merge
    • {a: {1,-1}, b: {1,0}, c: {2,0}}
    • {a: {0,0}, b: {2, 0}, c: {0, -2}}
    • => {a: {1,-1}, b:{2,0}, c:{2,-2}} => 2
    • update
    • When a accepts {increment, 3}
    • {a: {4,-1}, b: {1,0}, c: {2,0}}
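
The merge and update on this slide can be reproduced with a short sketch. It follows the slide's notation, where each actor holds a pair of an increment total and a negative decrement total. The function names are hypothetical and this is not Riak's implementation.

```python
def merge(left, right):
    """Per actor, keep the larger increment total and the larger decrement total."""
    merged = {}
    for actor in left.keys() | right.keys():
        l_inc, l_dec = left.get(actor, (0, 0))
        r_inc, r_dec = right.get(actor, (0, 0))
        # Decrements are stored as negative numbers, so "larger" means min().
        merged[actor] = (max(l_inc, r_inc), min(l_dec, r_dec))
    return merged


def value(counter):
    """The counter value is the sum of all increments and (negative) decrements."""
    return sum(inc + dec for inc, dec in counter.values())


def update(counter, actor, amount):
    """Apply an increment (amount >= 0) or decrement (amount < 0) at one actor."""
    inc, dec = counter.get(actor, (0, 0))
    entry = (inc + amount, dec) if amount >= 0 else (inc, dec + amount)
    return {**counter, actor: entry}


replica1 = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
replica2 = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}

merged = merge(replica1, replica2)
assert merged == {"a": (1, -1), "b": (2, 0), "c": (2, -2)}
assert value(merged) == 2

# Actor a accepts {increment, 3}.
assert update(replica1, "a", 3) == {"a": (4, -1), "b": (1, 0), "c": (2, 0)}
```

Each actor only ever changes its own entry, which is why taking the per-actor extreme is safe when replicas are merged in any order.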

  21. OR-Sets
    • merge
    • {a:{“foo”:true}, b:{“bar”:false}}
    • + {a:{“foo”:true}, b:{“foo”:false, “bar”:false}}
    • => {a:{“foo”:true}, b:{“foo”:false, “bar”:true}}
    • => [“bar”]
    • update
    • add: {a:{}} => +”foo” => {a:{“foo”:false}}
    • remove: {a: {“foo”:false}} => {a: {“foo”:true}}

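
For comparison, here is a sketch of an observed-remove set in its common textbook form, where every add carries a unique tag and a remove tombstones only the tags it has observed. This is an assumption-laden illustration: it does not use the per-actor boolean notation of the slide, and the class `ORSet` is hypothetical rather than Riak's implementation.

```python
import uuid


class ORSet:
    def __init__(self):
        self.adds = set()      # (element, tag) pairs
        self.removes = set()   # tombstoned (element, tag) pairs

    def add(self, element):
        # Every add gets a fresh tag so it can be told apart from earlier adds.
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Tombstone only the tags this replica has observed.
        self.removes |= {(e, t) for (e, t) in self.adds if e == element}

    def merge(self, other):
        merged = ORSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return sorted({e for (e, _) in self.adds - self.removes})


a, b = ORSet(), ORSet()
a.add("foo")
b.add("bar")
a.remove("foo")

assert a.merge(b).value() == ["bar"]
assert b.merge(a).value() == ["bar"]
```

Since merge is a union of both tag sets, it gives the same result in either direction, and a remove never cancels an add it has not seen.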

  22. OR-Sets
    • Demo

  23. Use Cases
    • Counting clicks (G-counter)
    • riak-server/types/counters/buckets/likes/datatypes/basho.com -d 1
    • Shopping cart (OR-sets)
    • Number of logged-in users (PN-counter)
    • Combinations of these (map & LWW-register, boolean)
    • { name : “basho.com”, likes: 20000, users: 3000,
    links: [ “basho.co.jp”, “basho.co.uk” ], cool: true }

  24. What It Cannot Do
    • A PN-counter constrained to "0 or greater"
    • Issuing unique IDs
    • Other data structures and operations that require CAS
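
The first bullet can be illustrated with a small sketch. It assumes a PN-counter stored per actor as a pair (increments, decrements), both non-negative, and a hypothetical `guarded_decrement` that checks only the locally visible value. Two replicas each pass the check, yet the merged value drops below zero.

```python
def value(counter):
    """Counter value: total increments minus total decrements."""
    return sum(inc - dec for inc, dec in counter.values())


def merge(left, right):
    """Per actor, keep the larger increment and the larger decrement total."""
    zero = (0, 0)
    return {
        actor: (
            max(left.get(actor, zero)[0], right.get(actor, zero)[0]),
            max(left.get(actor, zero)[1], right.get(actor, zero)[1]),
        )
        for actor in left.keys() | right.keys()
    }


def guarded_decrement(counter, actor):
    """Decrement only if the locally visible value stays at 0 or above."""
    if value(counter) < 1:
        raise ValueError("would go below zero")
    inc, dec = counter.get(actor, (0, 0))
    return {**counter, actor: (inc, dec + 1)}


shared = {"a": (1, 0)}                       # value 1 on both replicas
replica_a = guarded_decrement(shared, "a")   # sees 1, allowed
replica_b = guarded_decrement(shared, "b")   # also sees 1, allowed

merged = merge(replica_a, replica_b)
assert value(replica_a) == 0 and value(replica_b) == 0
assert value(merged) == -1
```

Enforcing the bound would require the replicas to coordinate before accepting a decrement, which is the kind of compare-and-swap behaviour the last bullet refers to.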

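  25. Summary
    • Riak is a distributed database built for availability
    • It ensures availability by allowing multiple versions to be held at the same time
    • The difficulty of application development is the challenge
    • By introducing the CRDT types, we built a mechanism that is simple and does not lose data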

  26. Questions?
    • Look forward to Riak 2.0
    • Web: http://basho.co.jp
    • Twitter: @BashoJapan
    • Me: [email protected]
    • ML: [email protected]

  27. Useful links
    http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
    http://arxiv.org/pdf/1210.3368.pdf
    https://gist.github.com/russelldb/f92f44bdfb619e089a4d
    http://gsd.di.uminho.pt/members/cbm/ps/scadt3.pdf
    http://arxiv.org/abs/1011.5808
