WebDB Forum technical report by Basho
A Consistency Model for Distributed Databases That Is Both Old and New, and Its Implementation in Riak
2013/11/28, WebDB Forum
Basho, Kota Uenishi
Basho and Riak
• Do you know distributed databases?
• Do you know Riak?
• Do you know Basho?
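The CAP theorem and the ideal DB
• Tolerates any kind of failure (partition tolerance)
• Data is always consistent (consistency)
• The system never stops (availability)
No system can satisfy all three of these at the same time.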
• A database whose distinguishing feature is availability
• Easy to operate, and holds even large data sets
• Stability and predictability
• "Never, ever lose data"
Where Riak is running
• Rovio (Angry Birds)
• Yahoo! JAPAN cloud storage
• NHS (the UK National Health Service)
• Bump (=> Google)
• Banks, games, retail, sensors, etc.
How Riak Works
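Consistent Hashing
• 160-bit key space
• The key space is divided into partitions
• Each partition is managed by an individual node
• Replicas are copied to N partitions
[Figure: ring of partitions owned by four nodes; hash("meetups/spamham") with N=3]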
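The ring on this slide can be sketched roughly as follows. This is only an illustration of the idea, not Riak's actual code: the partition count, the node names, the round-robin ownership, and hashing the plain string "bucket/key" with SHA-1 are all assumptions made for the sketch.

```python
import hashlib

RING_SIZE = 2 ** 160          # SHA-1 yields a 160-bit key space
NUM_PARTITIONS = 64           # assumed partition count for this sketch
NODES = ["node0", "node1", "node2", "node3"]  # hypothetical node names


def partition_of(bucket: str, key: str) -> int:
    """Map bucket/key onto the index of the partition that owns its hash."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    position = int.from_bytes(digest, "big")
    return position // (RING_SIZE // NUM_PARTITIONS)


def owner_of(partition: int) -> str:
    """Assign partitions to nodes round-robin around the ring (an assumption)."""
    return NODES[partition % len(NODES)]


def preference_list(bucket: str, key: str, n: int = 3) -> list[tuple[int, str]]:
    """Return the n consecutive partitions (and their nodes) that hold replicas."""
    first = partition_of(bucket, key)
    partitions = [(first + i) % NUM_PARTITIONS for i in range(n)]
    return [(p, owner_of(p)) for p in partitions]


# The key from the slide, replicated to N=3 partitions.
replicas = preference_list("meetups", "spamham", n=3)
```

Because the partitions are handed out round-robin, three consecutive partitions land on three different nodes, so losing one node still leaves two replicas reachable.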
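Consistency is hard
• The only choices are to stop updates (lowering availability) or to let updates overwrite each other (losing data)
[Figure: Server1, Server2, Server3; concurrent PUT V=42 and PUT V=0; V=?]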
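Instead of consistency
• For the time being, allow multiple versions to coexist
• Decide at read time which version is correct, or whether to merge them
[Figure: Server1, Server2, Server3; PUT V=42 and PUT V=0; a read sees V=0 or 42; the replicas hold V=0, V=0 or 42, and V=42]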
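Achieving AP
• Accept writes for the time being, even while a network partition is in progress
[Figure: Server1, Server2, Server3 and Server4; PUT V=42 and PUT V=0; both versions are kept and written back after recovery]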
Shopping cart example
• Just take the union
[Figure: Server1, Server2, Server3; PUT cart=[a,b,d] and PUT cart=[a,b,c]; the replicas hold [a,b,c], [a,b,c] or [a,b,d], and [a,b,d]; union([a,b,c], [a,b,d]) => [a,b,c,d]]
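A minimal sketch of the read-time merge for the cart, assuming the client receives the conflicting versions as a plain list of lists. The function name `resolve_cart` is hypothetical and not part of any Riak client.

```python
def resolve_cart(siblings: list[list[str]]) -> list[str]:
    """Merge concurrent cart versions by taking the union of their items."""
    merged: set[str] = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)


# The two concurrent writes from the slide.
siblings = [["a", "b", "c"], ["a", "b", "d"]]
cart = resolve_cart(siblings)  # ['a', 'b', 'c', 'd']
```

A plain union can only grow the cart: an item removed on one replica comes back if another version still contains it, which is why removal needs a richer data structure.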
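The problem with allowing multiple versions
• Programming becomes hard (transactions are wonderful)
• The real world is more than shopping carts and counters
• Every time, you have to design a data structure that can be merged and updated safely
• Along the way, similar libraries spring up all over the place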
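Why is it hard?
• Writes can be reordered relative to each other, so far from being serializable, the writes cannot even be brought into a consistent state
[Figure: Server1, Server2, Server3; one replica applies w1 then w2, another applies w1 then w2, and a third applies only w2 (w1 lost)]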
Logical Monotonicity
• Allow only commutative operations on the data!
Data = update(w2, update(w1, Data0)) = update(w1, update(w2, Data0))
Data = merge(update(w2, Data0), Data)
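The two equations on this slide can be checked on a small example. The sketch below assumes a grow-only set as the data type, with `update_add` as the commutative update and set union as `merge`; `update_overwrite` is included only as a contrasting, non-commutative update. All names are illustrative.

```python
def update_add(element, data: frozenset) -> frozenset:
    """Commutative update: adding to a set gives the same result in any order."""
    return data | {element}


def update_overwrite(value, data):
    """Non-commutative update: the last write replaces whatever was there."""
    return value


def merge(left: frozenset, right: frozenset) -> frozenset:
    """Merging two replicas of a grow-only set is set union."""
    return left | right


data0 = frozenset()
w1, w2 = "x", "y"

# Data = update(w2, update(w1, Data0)) = update(w1, update(w2, Data0))
order_a = update_add(w2, update_add(w1, data0))
order_b = update_add(w1, update_add(w2, data0))
commutes = order_a == order_b

# Data = merge(update(w2, Data0), Data): merging in a stale replica changes nothing.
absorbed = merge(update_add(w2, data0), order_a) == order_a

# Plain overwrites depend on arrival order, so replicas can diverge.
overwrite_a = update_overwrite(42, update_overwrite(0, None))
overwrite_b = update_overwrite(0, update_overwrite(42, None))
overwrite_commutes = overwrite_a == overwrite_b
```

With set addition both orders converge, while the overwrite variant ends at 42 on one replica and 0 on the other.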
The answer: CRDT
• "Replicable, commutative data types"
• Conflict-Free Replicated Data Types
• Commutative Replicated Data Types
• …
• (Going to be included in Riak 2.0)
The authors of CRDTs do not use the term "Logical Monotonicity".
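CRDT in Riak 2.0
• The V of the KVS is given a "type", and the type determines the update and merge logic
• At read time, the merge runs automatically on the server side
• The application only has to specify the type and no longer needs to handle multiple versions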
CRDT example
• PN-Counter
• Set
• OR-sets
• LWW-register
• Graph…
PN-Counter
• Demo
PN-Counter
• merge
  • {a: {1,-1}, b: {1,0}, c: {2,0}}
  • {a: {0,0}, b: {2,0}, c: {0,-2}}
  • => {a: {1,-1}, b: {2,0}, c: {2,-2}} => 2
• update
  • When a receives {increment, 3}:
  • {a: {4,-1}, b: {1,0}, c: {2,0}}
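The merge and update rules above can be written out as a short sketch. It follows the slide's notation, where each actor holds a pair of (increments, decrements) and decrements are stored as non-positive numbers. The function names are illustrative and this is not Riak's implementation.

```python
PNCounter = dict[str, tuple[int, int]]  # actor -> (increments, decrements <= 0)


def merge(left: PNCounter, right: PNCounter) -> PNCounter:
    """Per actor, keep the largest increment total and the most negative decrement total."""
    merged: PNCounter = {}
    for actor in left.keys() | right.keys():
        lp, ln = left.get(actor, (0, 0))
        rp, rn = right.get(actor, (0, 0))
        merged[actor] = (max(lp, rp), min(ln, rn))
    return merged


def update(counter: PNCounter, actor: str, op: str, amount: int) -> PNCounter:
    """Apply an increment or decrement to the slot owned by the given actor."""
    p, n = counter.get(actor, (0, 0))
    if op == "increment":
        p += amount
    elif op == "decrement":
        n -= amount
    else:
        raise ValueError(f"unknown operation: {op}")
    return {**counter, actor: (p, n)}


def value(counter: PNCounter) -> int:
    """The counter value is the sum of all increments and all (negative) decrements."""
    return sum(p + n for p, n in counter.values())


# The two replicas from the slide.
left = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
right = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = merge(left, right)              # {a: (1,-1), b: (2,0), c: (2,-2)}
total = value(merged)                    # 2
bumped = update(left, "a", "increment", 3)  # a becomes (4,-1)
```

Each actor only ever changes its own slot and both components move in one direction, so taking the per-slot extreme is safe regardless of merge order.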
OR-Sets
• merge
  • {a:{"foo":true}, b:{"bar":false}}
  • + {a:{"foo":true}, b:{"foo":false, "bar":false}}
  • => {a:{"foo":true}, b:{"foo":false, "bar":true}}
  • => ["bar"]
• update
  • add: {a:{}} => +"foo" => {a:{"foo":false}}
  • remove: {a:{"foo":false}} => {a:{"foo":true}}
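For comparison, here is a sketch of the textbook observed-remove set. It is an assumption-laden illustration: it tags every add with a unique identifier instead of using the per-actor flags shown on the slide, and it is not Riak's implementation. The class and method names are hypothetical.

```python
import uuid


class ORSet:
    """State-based observed-remove set: a remove only cancels the adds it has seen."""

    def __init__(self, adds=frozenset(), removes=frozenset()):
        self.adds = frozenset(adds)        # (element, unique tag) pairs
        self.removes = frozenset(removes)  # tombstoned (element, tag) pairs

    def add(self, element):
        """Record the element under a fresh tag so this add is distinguishable."""
        tag = uuid.uuid4().hex
        return ORSet(self.adds | {(element, tag)}, self.removes)

    def remove(self, element):
        """Tombstone every tag of the element that this replica has observed."""
        observed = {pair for pair in self.adds if pair[0] == element}
        return ORSet(self.adds, self.removes | observed)

    def merge(self, other):
        """Union the add tags and the tombstones of both replicas."""
        return ORSet(self.adds | other.adds, self.removes | other.removes)

    def value(self):
        """An element is present if at least one of its tags is not tombstoned."""
        return {element for element, _tag in self.adds - self.removes}


base = ORSet().add("foo")
replica_a = base.remove("foo")             # removes the "foo" it has seen
replica_b = base.add("foo").add("bar")     # concurrently re-adds "foo", adds "bar"
replica_c = base.add("bar")                # concurrently adds only "bar"

add_wins = replica_a.merge(replica_b).value()
removed = replica_a.merge(replica_c).value()
```

In the first merge "foo" survives because the concurrent re-add carries a tag the remove never observed; in the second, the only "foo" tag was tombstoned, so only "bar" remains.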
OR-Sets
• Demo
Use cases
• Counting clicks (G-counter)
  • riak-server/types/counters/buckets/likes/datatypes/basho.com -d 1
• Shopping carts (OR-sets)
• Number of logged-in users (PN-counter)
• Combinations of these (map & LWW-register, boolean)
  • { name: "basho.com", likes: 20000, users: 3000, links: [ "basho.co.jp", "basho.co.uk" ], cool: true }
What it cannot do
• A PN-counter that must stay "0 or greater"
• Issuing unique IDs
• Other data structures and operations that require CAS
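The first limitation can be illustrated with a small sketch. It assumes the same (increments, decrements <= 0) layout as the PN-Counter slide and a hypothetical `try_decrement` helper that checks the bound against the local replica only.

```python
def merge(left, right):
    """Per actor, keep the largest increment total and the most negative decrement total."""
    merged = {}
    for actor in left.keys() | right.keys():
        lp, ln = left.get(actor, (0, 0))
        rp, rn = right.get(actor, (0, 0))
        merged[actor] = (max(lp, rp), min(ln, rn))
    return merged


def value(counter):
    """Sum of all increments and all (negative) decrements."""
    return sum(p + n for p, n in counter.values())


def try_decrement(counter, actor, amount):
    """Decrement only if the value seen by this replica stays at 0 or above."""
    if value(counter) - amount < 0:
        raise ValueError("decrement would make the counter negative")
    p, n = counter.get(actor, (0, 0))
    return {**counter, actor: (p, n - amount)}


shared = {"a": (1, 0)}                       # both replicas start at 1

replica_a = try_decrement(shared, "a", 1)    # locally fine: 1 -> 0
replica_b = try_decrement(shared, "b", 1)    # locally fine as well: 1 -> 0

merged_value = value(merge(replica_a, replica_b))
```

Each replica respected the bound on its own, yet the merged counter ends up at -1. Enforcing the bound would require the replicas to coordinate before decrementing, which is what CAS-style operations provide.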
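Summary
• Riak is a highly available distributed database
• It secures availability by allowing multiple versions to be held at the same time
• The difficulty of application development is the challenge
• By introducing CRDTs as types, we built a mechanism that is simple and does not lose data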
Questions?
• Please look forward to Riak 2.0
• Web: http://basho.co.jp
• Twitter: @BashoJapan
• Me: [email protected]
• ML: [email protected]
Useful links
• http://hal.upmc.fr/docs/00/55/55/88/PDF/techreport.pdf
• http://arxiv.org/pdf/1210.3368.pdf
• https://gist.github.com/russelldb/f92f44bdfb619e089a4d
• http://gsd.di.uminho.pt/members/cbm/ps/scadt3.pdf
• http://arxiv.org/abs/1011.5808