A Review of Eventual Consistency and Related Topics
2014/12/11, Big Data Infrastructure Study Session @ NTT Musashino R&D Center. Uenishi, Basho Japan
Self-introduction
• @kuenishi
• GitHub, Twitter, etc.
• 6 years working on distributed systems
• From Basho Japan
• Riak CS development
• Other Japan-related matters
• msgpack-erlang maintainer
Basho and Riak
• Distributed databases?
• Do you know Riak?
• Do you know Basho?
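Synopsis
• Eventual consistency is not a degraded version of strong consistency; it is a different definition meant to solve a different kind of problem
• That different problem is availability
• Because the semantics differ, the way applications are designed changes a little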
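Is eventual consistency already outdated?
• "That is technology from 2006, right?"
• "It has nothing to do with big data, does it?"
• "DynamoDB offers strong consistency"
• "Networks don't get cut, do they?"
• "Does anyone actually use that?"
• "Applications are hard to write on it…"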
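The Big Data Infrastructure Research Group and eventual consistency
• The larger the system that handles big data, the more components there are that can break
• Money is at stake, so the required availability is high
• Operations are easier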
Practical examples of eventual consistency (close to home)
• Database backups
• File backups with rsync
• Google Wave
CAP Theorem
• Consistent: consecutive operations (w1, w3, w4, ….) on the atomic objects are all identical (linearizable)
• Available: executing operations w1, w2, … on an atomic object yields a response
• Partition Tolerant: a correct (atomic) response is obtained even if sent messages are lost
[Figure: nodes G1 and G2, with a write and a read]
Gilbert and Lynch, Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
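Consistency as defined by the CAP theorem
• Consistency as defined by the CAP theorem ≒ linearizability
• It guarantees that all (update) operations are replayed in order on every replica
• Slightly different from consistency in ACID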
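Why strong consistency is hard to achieve
• Even with majority voting or atomic broadcast, there is a performance penalty
• The difficulty of asynchrony + partial failure
• Liveness monitoring is hard => downtime
• Operability
• Failover, failback, and the system flaps back and forth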
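Consistency is hard
• The only choices are to stop updates (lowering availability) or to let updates overwrite each other (losing data)
[Figure: Server1, Server2, and Server3 receive concurrent PUT V=42 and PUT V=0; the resulting value is V=?]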
Atomic Broadcasting is Difficult
• Replication reorders operations
• The same as out-of-order execution on a CPU
[Figure: writes w1 and w2 reach Actor 0, Actor 1, and Actor 2 in different orders]
Consensus Based Replication
• Elect the replication leader by majority vote
• Or take a majority vote on every replication
[Figure: writes w1 and w2 delivered to Actor 0, Actor 1, and Actor 2]
Eventual Consistency
• Whatever path is taken, the replicas eventually converge to the same state
• Read Repair
• AAE
• CRDT
• (Vector Clocks)
[Figure: versions v0 and v1]
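Siblings
• For the time being, allow multiple versions to coexist
• Decide at read time which version is correct, or whether to merge them
[Figure: Server1, Server2, and Server3 receive concurrent PUT V=42 and PUT V=0; replica states are V=0, V=0 or 42, and V=42, and a read yields V=0 or 42]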
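The sibling behaviour on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Riak's implementation: the `descends` and `put` helpers and the dict-based vector clocks are hypothetical names invented for the example.

```python
def descends(a, b):
    """True if vector clock `a` has seen everything recorded in `b`."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def put(siblings, clock, value):
    """Store a write, dropping only the versions it causally supersedes."""
    if any(descends(c, clock) for c, _ in siblings):
        return siblings  # an existing version already covers this write
    kept = [(c, v) for c, v in siblings if not descends(clock, c)]
    return kept + [(clock, value)]

# Two clients write concurrently through different servers:
# neither clock descends from the other, so both versions are kept.
siblings = []
siblings = put(siblings, {"server1": 1}, 0)
siblings = put(siblings, {"server3": 1}, 42)
values = sorted(v for _, v in siblings)

# A later write that has seen both versions replaces them.
resolved = put(siblings, {"server1": 1, "server2": 1, "server3": 1}, 42)
```

Here `values` holds both 0 and 42 until a reader resolves them, which is the "decide at read time" step.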
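Achieving AP
• Even while a network partition is in progress, accept writes for the time being
[Figure: Server1, Server2, Server3, and Server4 with PUT V=42 and PUT V=0 during a partition; annotations: "write back after recovery" and "keep both values"]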
Shopping cart example
• Just take the union
[Figure: Server1, Server2, and Server3 receive PUT cart=[a,b,d] and PUT cart=[a,b,c]; replica states are [a,b,c], [a,b,c] or [a,b,d], and [a,b,d]]
union([a,b,c], [a,b,d]) => [a,b,c,d]
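The union-based merge for the cart can be written in a few lines. A minimal sketch, assuming sibling carts arrive as plain lists of item names; `merge_carts` is a hypothetical helper, not part of any Riak client API.

```python
def merge_carts(siblings):
    """Resolve sibling carts by taking the union of their items."""
    return sorted(set().union(*siblings))

merged = merge_carts([["a", "b", "c"], ["a", "b", "d"]])
print(merged)  # ['a', 'b', 'c', 'd']
```

A plain union cannot express removals: an item deleted on one side reappears after the merge, which is the gap that set CRDTs such as OR-Sets aim to close.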
Read Repair
[Figure: a client calls get("conferences/thoughtworks"); the Get Handler (FSM) on the Riak coordinating node reads from the cluster with R=2; replicas hold v1 or v2, and v2 is returned]
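The read-repair flow in the figure can be approximated as follows. This is a simplified sketch, not Riak's Get FSM: replicas are modelled as in-memory dicts, versions are plain integers instead of vector clocks, and `read_with_repair` is a hypothetical name.

```python
def read_with_repair(replicas, key, r=2):
    """Return the newest value for `key` and overwrite stale replicas with it.

    Each replica is a dict mapping key -> (version, value).
    """
    responses = [rep[key] for rep in replicas if key in rep]
    if len(responses) < r:
        raise RuntimeError("fewer than R replicas answered")
    newest = max(responses, key=lambda vv: vv[0])
    for rep in replicas:
        # A missing key counts as version 0, so it is repaired as well.
        if rep.get(key, (0, None))[0] < newest[0]:
            rep[key] = newest
    return newest[1]

key = "conferences/thoughtworks"
replicas = [{key: (1, "v1")}, {key: (2, "v2")}, {key: (1, "v1")}]
result = read_with_repair(replicas, key, r=2)
```

After the call every replica holds version 2, so the repair happens as a side effect of an ordinary read.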
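Active Anti Entropy
• A background process that prevents data degradation in AP-oriented databases
• Uses a Merkle tree to compute a "checksum" for each partition
• When a difference is found, that part is read-repaired
[Figure: hash(vnode=0,pid=0), hash(vnode=1,pid=0), hash(vnode=2,pid=0)]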
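The checksum comparison can be sketched with a two-level hash tree. This is an illustration under assumptions, not Riak's AAE code: the bucket count, the SHA-256 choice, and the names `build_tree` and `keys_to_repair` are all made up for the example.

```python
import hashlib

def _h(data):
    return hashlib.sha256(data.encode()).hexdigest()

def _bucket(key, buckets):
    return int(_h(key), 16) % buckets

def build_tree(store, buckets=4):
    """Two-level hash tree: one leaf hash per key bucket, plus a root over the leaves."""
    groups = [[] for _ in range(buckets)]
    for key in sorted(store):
        groups[_bucket(key, buckets)].append(f"{key}={store[key]}")
    leaves = [_h("|".join(group)) for group in groups]
    return _h("".join(leaves)), leaves

def keys_to_repair(store_a, store_b, buckets=4):
    """Return differing keys, inspecting only buckets whose leaf hashes differ."""
    root_a, leaves_a = build_tree(store_a, buckets)
    root_b, leaves_b = build_tree(store_b, buckets)
    if root_a == root_b:
        return []  # replicas agree, nothing to exchange
    dirty = {i for i in range(buckets) if leaves_a[i] != leaves_b[i]}
    return sorted(
        key for key in set(store_a) | set(store_b)
        if _bucket(key, buckets) in dirty and store_a.get(key) != store_b.get(key)
    )

replica_a = {"k1": "v2", "k2": "x", "k3": "y"}
replica_b = {"k1": "v1", "k2": "x", "k3": "y"}
stale = keys_to_repair(replica_a, replica_b)
```

Only the bucket containing `k1` is examined key by key; matching buckets are skipped on the strength of their hashes alone.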
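CRDT
• A type whose result does not change even when the order of operations is swapped
• update(w1, update(w2, Data0)) = update(w2, update(w1, Data0)) = Data
[Figure: Actor 0, Actor 1, and Actor 2 apply w1 and w2 in different orders: w1(w2(Data0)) => Data, w1(w2(Data0)) => Data, w2(w1(Data0)) => Data]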
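The commutativity property can be checked directly on a small type. The grow-only set below is an example chosen for this sketch, not something the slide specifies.

```python
from itertools import permutations

def update(write, data):
    """Apply one write to a grow-only set; set union ignores delivery order."""
    return data | {write}

data0 = frozenset()
results = set()
for order in permutations(["w1", "w2"]):
    state = data0
    for w in order:
        state = update(w, state)
    results.add(state)
```

Both delivery orders end in the same state, so `results` contains a single element.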
CRDT: PN-Counter
• merge
• {a: {1,-1}, b: {1,0}, c: {2,0}}
• {a: {0,0}, b: {2, 0}, c: {0, -2}}
• => {a: {1,-1}, b:{2,0}, c:{2,-2}} => 2
• update
• When a accepts {increment, 3}
• {a: {4,-1}, b: {1,0}, c: {2,0}}
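The merge and update on this slide can be reproduced with a short sketch. It assumes each actor's entry is an (increments, decrements) pair with decrements stored as a negative number, matching the slide's notation; the function names are hypothetical.

```python
def merge(x, y):
    """Per actor, keep the largest increment total and the most negative decrement total."""
    zero = (0, 0)
    return {
        actor: (max(x.get(actor, zero)[0], y.get(actor, zero)[0]),
                min(x.get(actor, zero)[1], y.get(actor, zero)[1]))
        for actor in set(x) | set(y)
    }

def value(counter):
    """Counter value: all increments plus all (negative) decrements."""
    return sum(p + n for p, n in counter.values())

def increment(counter, actor, amount):
    """Return a new counter where `actor` has added `amount` to its increments."""
    p, n = counter.get(actor, (0, 0))
    return {**counter, actor: (p + amount, n)}

left = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
right = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = merge(left, right)
total = value(merged)
bumped = increment(left, "a", 3)
```

Because each actor only ever grows its own pair, taking the per-actor extreme makes the merge commutative, associative, and idempotent.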
CRDT: OR-Sets
• merge
• {a:{"foo":true}, b:{"bar":false}}
• + {a:{"foo":true}, b:{"foo":false, "bar":false}}
• => {a:{"foo":true}, b:{"foo":false, "bar":true}}
• => ["bar"]
• update
• add: {a:{}} => +"foo" => {a:{"foo":false}}
• remove: {a: {"foo":false}} => {a: {"foo":true}}
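Operations are easier
• Nodes and networks can be brought up and down casually
• No need to care about which node is the master that maintains consistency
• The operations a strongly consistent system needs do not turn into downtime
• Consistency checks, recovery, backups
• Operations during failures are quite simple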
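Challenges when adopting eventual consistency
• The semantics differ from ordinary programming
• Applications are asked to work under assumptions different from before
• CRDTs solve part of this, but…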
Application examples
League of Legends
• For MMORPG chat, availability and response time are everything
• 10 ms can be the difference between life and death
[Image (C) Riot Games]
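• A "highly available" cloud storage system that runs on Riak
• Object metadata is an eventually consistent data structure
• What if 5 GB of data is uploaded concurrently?
• And what if that upload comes from another continent?
[Figure: object /foo.bar]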
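Replication between data centers
• Inter-DC networks are hard to keep operating correctly at all times, including connectivity and bandwidth
• Given what the CAP theorem requires, synchronous replication is difficult
• To preserve availability, adopt a data model that converges eventually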
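Summary
• Eventual consistency is not a degraded version of strong consistency; it is a different definition meant to solve a different kind of problem
• That different problem is availability
• Because the semantics differ, the way applications are designed changes a little
• Introduced several techniques for maintaining eventual consistency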
We are hiring.
• Anyone interested in real-world distributed systems problems!
• @BashoJapan
• [email protected]
Questions?