Notes on Eventual Consistency

A Review of Eventual Consistency and Related Topics

UENISHI Kota

December 11, 2014

Transcript

  1. A Review of Eventual Consistency and Related Topics
    2014/12/11 Big Data Infrastructure Study Meeting
    @ NTT Musashino R&D Center
    Basho Japan, Uenishi

  2. About Me
    • @kuenishi
    • GitHub, Twitter, etc.
    • 6 years of experience with distributed systems
    • Here on behalf of Basho Japan
    • Riak CS development
    • Other Japan-related work
    • msgpack-erlang maintainer

  3. Basho and Riak
    •Distributed databases?
    •Do you know Riak?
    •Do you know Basho?

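  4. Synopsis
    •Eventual consistency is not a degraded version of strong consistency;
    it is a different definition, made to solve a different kind of problem
    •That different problem is availability
    •Because the semantics differ, the way applications are designed
    changes somewhat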

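  5. Is Eventual Consistency Already Outdated?
    •Isn't that technology from 2006?
    •Does it even have anything to do with big data?
    •DynamoDB offers strong consistency
    •Networks don't really get cut, do they?
    •Does anyone actually use it?
    •Applications are hard to write…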

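  6. The Big Data Infrastructure Research Group and
    Eventual Consistency
    •The larger a system that handles big data becomes,
    the more parts there are that can break
    •Money is at stake, so the required availability
    is high
    •Operations are easy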

  7. [Image-only slide]

  8. Practical Examples of Eventual Consistency (Familiar Ones)
    •Database backups
    •File backups with rsync
    •Google Wave
    [Figure labels: aaaa, bbbb, y, x]

  9. CAP Theorem
    • Consistent: a sequence of operations (w1, w3, w4, …) on
    multiple atomic objects is identical everywhere
    (linearizable)
    • Available: executing operations w1, w2, … on an atomic object
    always yields a response
    • Partition Tolerant: a correct (atomic) response is obtained
    even if messages that were sent are lost
    [Figure: nodes split into groups G1 and G2, with a write and a read]
    Gilbert and Lynch, Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

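  10. Consistency as Defined by the CAP Theorem
    •Consistency as defined by the CAP theorem ≒ linearizability
    •It guarantees that every (update) operation is replayed in the same order
    on every replica
    •It is slightly different from the "C" in ACID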

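  11. Why Strong Consistency Is Hard to Achieve
    • Even with majority voting or atomic broadcast, there is a
    performance penalty
    • The difficulty of asynchrony + partial failure
    • Liveness monitoring is hard => downtime
    • Operability
    • Failover and failback: things keep flapping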

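  12. Consistency Is Hard
    •The only choices are to stop updates (lowering availability) or to allow
    updates to be overwritten (losing data)
    [Figure: Server1, Server2, Server3; PUT V=42 and PUT V=0 arrive concurrently; V=?]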

  13. Atomic Broadcasting
    is Difficult
    • Replication can reorder operations
    • The same thing as out-of-order execution in a CPU
    [Figure: writes w1 and w2 are sent to Actor 0, Actor 1, and Actor 2 and arrive in different orders]
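
To make the reordering problem concrete, here is a minimal Python sketch. The operations `w1` and `w2` and the initial value are hypothetical and not taken from the talk; the point is only that two replicas applying the same non-commutative writes in different orders end up in different states.

```python
# Two hypothetical, non-commutative update operations.
def w1(value):
    return value + 10   # add 10

def w2(value):
    return value * 2    # double

def apply_all(initial, ops):
    """Apply operations in the order in which they were delivered."""
    state = initial
    for op in ops:
        state = op(state)
    return state

# Actor 0 receives w1 then w2; Actor 2 receives w2 then w1.
actor0 = apply_all(1, [w1, w2])   # (1 + 10) * 2
actor2 = apply_all(1, [w2, w1])   # (1 * 2) + 10
diverged = actor0 != actor2
```

Atomic broadcast avoids this by forcing a single delivery order on every replica, which is where the coordination cost comes from.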

  14. Consensus Based
    Replication
    • Elect a replication leader by majority vote
    • or take a majority vote for every replicated write
    [Figure: writes w1 and w2 replicated to Actor 0, Actor 1, and Actor 2]

  15. Eventual Consistency
    •Eventual Consistency
    •Whatever path is taken, the replicas
    eventually converge to the same state
    •Read Repair
    •AAE
    •CRDT
    •(Vector Clocks)
    [Figure labels: v0, v1]

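  16. Siblings
    •For the time being, allow multiple versions to coexist
    •Decide at read time which version is correct, or whether to merge them
    [Figure: Server1, Server2, Server3; PUT V=42; PUT V=0; a read returns V=0 or 42; replica states: V=0, V=0 or 42, V=42]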
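
The sibling idea can be sketched with vector clocks, which an earlier slide lists as a supporting technique. This is a simplified illustration, not Riak's implementation: the helper names `descends` and `put`, the clock layout (actor name to counter), and the assignment of writes to servers are all assumptions made for the example.

```python
def descends(a, b):
    """True if vector clock `a` has seen everything that `b` has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def put(siblings, clock, value):
    """Store a write, dropping only the versions it supersedes.

    `siblings` is the list of (clock, value) pairs held for one key.
    """
    if any(descends(old_clock, clock) for old_clock, _ in siblings):
        return siblings                      # stale or duplicate write
    kept = [(c, v) for c, v in siblings if not descends(clock, c)]
    return kept + [(clock, value)]

# Two clients write concurrently through different servers.
store = []
store = put(store, {"server1": 1}, 0)    # PUT V=0
store = put(store, {"server3": 1}, 42)   # PUT V=42
sibling_values = sorted(v for _, v in store)

# A later write that has seen both siblings resolves the conflict.
store = put(store, {"server1": 2, "server3": 1}, 42)
resolved_values = [v for _, v in store]
```

Neither concurrent write descends from the other, so both are kept until a reader (or a later writer that has seen both) decides the outcome.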

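  17. Achieving AP
    •Allow writes for the time being, even while a network partition is in progress
    [Figure: Server1, Server2, Server3, Server4; PUT V=42; PUT V=0]
    Write the data back after recovery
    Keep both versions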

  18. Shopping Cart Example
    •Just take the union
    [Figure: Server1, Server2, Server3; PUT cart=[a,b,d]; PUT cart=[a,b,c]; replica states: [a,b,c], [a,b,c] or [a,b,d], [a,b,d]]
    union([a,b,c], [a,b,d]) => [a,b,c,d]
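
The union-based merge above can be written in a few lines of Python. The function name `merge_carts` is made up for this sketch; the inputs are the two cart versions from the slide.

```python
def merge_carts(siblings):
    """Merge conflicting cart versions by taking their set union."""
    merged = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)

cart = merge_carts([["a", "b", "c"], ["a", "b", "d"]])
```

One caveat of this design: a plain union cannot express deletions, because an item removed in one version reappears after the merge. That limitation is one motivation for set CRDTs such as OR-Sets.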

  19. Read Repair
    [Figure: a client calls get(“conferences/thoughtworks”) on Riak; the Get Handler (FSM) on the coordinating node reads from the cluster with R=2, receives v1 and v2, returns v2, and the replica still holding v1 is updated to v2]
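
A rough sketch of read repair follows, under several simplifying assumptions: versions are plain integers rather than vector clocks, replicas are in-memory dicts, and the repair happens synchronously on every replica. The function name `get_with_read_repair` is hypothetical and this is not Riak's Get FSM.

```python
def get_with_read_repair(replicas, key, r=2):
    """Read `key`, return the newest value, and repair stale replicas.

    `replicas` is a list of dicts mapping key -> (version, value).
    At least `r` replicas must hold the key for the read to succeed.
    """
    replies = [rep[key] for rep in replicas if key in rep]
    if len(replies) < r:
        raise RuntimeError("fewer than R replicas answered")
    newest = max(replies, key=lambda obj: obj[0])
    for rep in replicas:
        if rep.get(key) != newest:
            rep[key] = newest        # read repair: overwrite the stale copy
    return newest[1]

key = "conferences/thoughtworks"
replicas = [{key: (1, "v1")}, {key: (2, "v2")}, {key: (1, "v1")}]
value = get_with_read_repair(replicas, key, r=2)
```

After the call, every replica holds v2, so later reads no longer see the stale version.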

  20. Active Anti Entropy
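    • A background process that prevents data from degrading
    in AP-oriented databases
    • Uses a Merkle tree to compute a "checksum"
    for each partition
    • When a difference is found, read repair is run
    on that part
    [Figure: one hash per vnode: hash(vnode=0, pid=0), hash(vnode=1, pid=0), hash(vnode=2, pid=0)]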
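
The comparison step can be illustrated with a two-level hash tree, which is a deliberately small stand-in for a real Merkle tree. The names `build_tree` and `differing_keys`, the bucket count, and the use of SHA-256 are assumptions for this sketch; Riak's actual AAE implementation differs.

```python
import hashlib

def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_tree(data, buckets=4):
    """Return (root_hash, leaf_hashes) for a dict of key -> value.

    Keys are assigned to leaves by hashing; each leaf hash covers the
    key/value pairs in that leaf, and the root hash covers all leaves.
    """
    leaves = [[] for _ in range(buckets)]
    for key in sorted(data):
        index = int(digest(key), 16) % buckets
        leaves[index].append(f"{key}={data[key]}")
    leaf_hashes = [digest("\n".join(items)) for items in leaves]
    return digest("".join(leaf_hashes)), leaf_hashes

def differing_keys(a, b, buckets=4):
    """Keys whose values differ between replicas `a` and `b`."""
    root_a, leaves_a = build_tree(a, buckets)
    root_b, leaves_b = build_tree(b, buckets)
    if root_a == root_b:
        return []                      # identical: nothing more to compare
    dirty = {i for i in range(buckets) if leaves_a[i] != leaves_b[i]}
    keys = set(a) | set(b)
    return sorted(k for k in keys
                  if int(digest(k), 16) % buckets in dirty
                  and a.get(k) != b.get(k))

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "stale", "k3": "v3"}
to_repair = differing_keys(replica_a, replica_b)
```

When the root hashes match, the replicas are known to agree without comparing any keys; only the buckets whose leaf hashes differ need a key-by-key comparison.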

  21. CRDT
    • A data type whose result does not change even if operations are reordered
    • update(w1, update(w2, Data0)) = update(w2,
    update(w1, Data0)) = Data
    [Figure: w1 and w2 reach Actor 0, Actor 1, and Actor 2 in different orders]
    w1(w2(Data0)) => Data
    w1(w2(Data0)) => Data
    w2(w1(Data0)) => Data
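
As a small check of the commutativity property, the sketch below uses a grow-only set as the data type (chosen for this example, not taken from the slide) and replays three hypothetical writes in every possible order.

```python
from itertools import permutations

def update(op, data):
    """Apply an 'add element' operation to a grow-only set."""
    return data | {op}

def replay(ops, data0=frozenset()):
    """Apply all operations in the given order, starting from data0."""
    state = data0
    for op in ops:
        state = update(op, state)
    return state

# Every delivery order of w1, w2, w3 produces the same final state.
results = {replay(order) for order in permutations(["w1", "w2", "w3"])}
```

Because set insertion commutes, `results` contains exactly one distinct state, which is the convergence guarantee the slide describes.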

  22. CRDT: PN-Counter
    • merge
    • {a: {1,-1}, b: {1,0}, c: {2,0}}
    • {a: {0,0}, b: {2, 0}, c: {0, -2}}
    • => {a: {1,-1}, b:{2,0}, c:{2,-2}} => 2
    • update
    • When a accepts {increment, 3}
    • {a: {4,-1}, b: {1,0}, c: {2,0}}
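
The merge and update rules above can be reproduced with a short Python sketch. It follows the slide's layout of one (increments, decrements) pair per actor with decrements stored as non-positive numbers; the function names `merge`, `value`, and `update` are chosen for this example.

```python
def merge(x, y):
    """Merge two PN-Counter states (actor -> (increments, decrements)).

    Decrements are stored as non-positive numbers, so the merge takes
    the max of the increments and the min of the decrements.
    """
    merged = {}
    for actor in x.keys() | y.keys():
        xp, xn = x.get(actor, (0, 0))
        yp, yn = y.get(actor, (0, 0))
        merged[actor] = (max(xp, yp), min(xn, yn))
    return merged

def value(state):
    """Current counter value: sum of all increments and decrements."""
    return sum(p + n for p, n in state.values())

def update(state, actor, amount):
    """Apply an increment (amount >= 0) or decrement (amount < 0) at `actor`."""
    p, n = state.get(actor, (0, 0))
    new = dict(state)
    new[actor] = (p + amount, n) if amount >= 0 else (p, n + amount)
    return new

left = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
right = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = merge(left, right)
total = value(merged)
after_increment = update(left, "a", 3)
```

Each actor only ever changes its own pair, and both components move in one direction, which is why the element-wise max/min merge is safe to apply in any order.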

  23. CRDT: OR-Sets
    • merge
    • {a:{“foo”:true}, b:{“bar”:false}}
    • + {a:{“foo”:true}, b:{“foo”:false, “bar”:false}}
    • => {a:{“foo”:true}, b:{“foo”:false, “bar”:true}}
    • => [“bar”]
    • update
    • add: {a:{}} => +”foo” => {a:{“foo”:false}}
    • remove: {a: {“foo”:false}} => {a: {“foo”:true}}
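
For comparison, here is a sketch of an OR-Set in the tag-based formulation commonly used in the CRDT literature. It is not the flag-per-actor layout shown above, and its conflict rule (an add that is concurrent with a remove survives) may differ from what the slide intends. The class name `ORSet` and its methods are hypothetical.

```python
import uuid

class ORSet:
    def __init__(self):
        self.adds = set()      # (element, unique tag) pairs ever added
        self.removes = set()   # (element, tag) pairs observed and removed

    def add(self, element):
        self.adds.add((element, uuid.uuid4().hex))

    def remove(self, element):
        # Remove only the tags this replica has observed so far.
        self.removes |= {(e, t) for e, t in self.adds if e == element}

    def merge(self, other):
        merged = ORSet()
        merged.adds = self.adds | other.adds
        merged.removes = self.removes | other.removes
        return merged

    def value(self):
        return sorted({e for e, t in self.adds if (e, t) not in self.removes})

a, b = ORSet(), ORSet()
a.add("foo")
a.remove("foo")                       # a removes the "foo" it has seen
b.add("bar")
elements = a.merge(b).value()

b.add("foo")                          # concurrent re-add with a fresh tag
after_readd = a.merge(b).value()
```

Because a remove only covers tags that were observed, the later add of "foo" on `b` is not cancelled by the earlier remove on `a`.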

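  24. Operations Are Easy
    • Nodes and networks can be brought up and down casually
    • There is no need to care which node is the master that maintains consistency
    • The operations needed to maintain strong consistency do not turn into
    downtime
    • Consistency checks, recovery, backups
    • Operations during a failure are fairly simple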

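  25. Challenges When Adopting
    Eventual Consistency
    •The semantics differ from those of ordinary
    programming
    •Applications are required to work under different
    assumptions than before
    •CRDTs solve part of this, but…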

  26. Use Cases

  27. League of Legends
    •For MMORPG chat, availability and response
    time are everything
    •10 ms is the difference between life and death
    (C) Riot Games

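  28. •A "highly available" cloud storage service
    that runs on Riak
    •Object metadata is an eventually
    consistent data structure
    •Q: What if 5 GB of data is uploaded
    concurrently?
    •Q: And what if those uploads go to
    different continents?
    [Figure label: /foo.bar]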

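  29. Replication Between Data Centers
    •Networks between data centers are hard to
    keep running correctly at all times, once
    connectivity and bandwidth are taken into account
    •Given the constraints of the CAP theorem,
    synchronous replication is difficult
    •To maintain availability, adopt an eventually
    consistent data model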

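  30. Summary
    •Eventual consistency is not a degraded version of strong consistency;
    it is a different definition, made to solve a different kind of problem
    •That different problem is availability
    •Because the semantics differ, the way applications are designed
    changes somewhat
    •Introduced several techniques for maintaining eventual consistency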

  31. We are hiring.
    •Anyone interested in real-world
    distributed systems problems!
    •@BashoJapan
    [email protected]

  32. Questions?
