Basho Japan's presentation slides from WebDB Forum 2014.
Full-Text Search and Consistency in Riak 2.0
WebDB Forum 2014
Basho Japan, Uenishi
2014/11/19
About me
• 6 years working on distributed systems
• I come from Basho Japan
• Riak CS development
• Everything else Japan-related
Basho and Riak
• Distributed databases?
• Who knows Riak?
• Who knows Basho?
We are hiring
• Anyone interested in real-world distributed-systems problems!
• @BashoJapan
• [email protected]
• AP-oriented database (eventual consistency)
• Stability, predictability
• "Never lose data, ever"
• Bundles Solr as its new search engine
Where Riak is running
• Rovio (Angry Birds)
• Yahoo! JAPAN's cloud storage
• NHS (the UK's National Health Service)
• League of Legends (MOBA)
• Banks, games, retail, sensors, etc.
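The CAP theorem and the ideal DB
• Keeps working through any failure (partition tolerance)
• Data is always consistent (consistency)
• The system never stops (availability)
No system can guarantee all three at once: during a network partition it must give up either consistency or availability.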
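CAP Theorem
• C: the sequences of consecutive operations (w1, w3, w4, …) observed on multiple atomic objects are all identical (linearizable)
• A: operations w1, w2, … can always be executed on the atomic object
• P: when the network partitions, messages between the atomic objects are not delivered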
Is the network reliable?
How Riak Works (mostly about search)
Running Riak with Search
$ which java
$ sudo apt-get install riak
$ echo "search = on" | sudo tee -a /etc/riak/riak.conf
$ riak start
$ riak-admin bucket-type create tt
$ riak-admin bucket-type activate tt
$ curl -XPUT http://localhost:8098/search/index/t
$ curl -XPUT -H 'content-type: application/json' \
    http://localhost:8098/types/tt/props \
    -d '{"props":{"search_index":"t"}}'
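Once the index `t` is attached to bucket type `tt`, any object written under that type is indexed and can be queried through Solr. The sketch below only builds the HTTP URLs for that round trip (plus a small PUT helper), so it can be checked without a running node. The bucket `meetups`, key `spamham`, and field `name_s` are hypothetical; the `_s` suffix assumes the default schema's dynamic string fields.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

RIAK = "http://localhost:8098"  # same HTTP listener as the commands above


def kv_url(bucket_type: str, bucket: str, key: str) -> str:
    """Object URL inside a bucket type (Riak 2.0 HTTP API)."""
    return f"{RIAK}/types/{quote(bucket_type)}/buckets/{quote(bucket)}/keys/{quote(key)}"


def search_url(index: str, query: str) -> str:
    """Solr query against a Riak Search index; wt=json requests a JSON response."""
    return f"{RIAK}/search/query/{quote(index)}?" + urlencode({"wt": "json", "q": query})


def put_json(url: str, doc: dict) -> int:
    """PUT a JSON document; Riak Search extracts its fields for indexing."""
    req = Request(url, data=json.dumps(doc).encode(), method="PUT",
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return resp.status
```

For example, `put_json(kv_url("tt", "meetups", "spamham"), {"name_s": "spamham"})` followed by a GET on `search_url("t", "name_s:spamham")` should return the document once Solr has committed it.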
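Consistent Hashing
• 160-bit key space
• The space is divided into partitions
• Each node manages its own partitions
• Replicas are copied to N partitions
[Figure: a ring of partitions owned by nodes; hash("meetups/spamham") lands on the ring and is replicated with N=3]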
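A minimal sketch of the ring described above, assuming a fixed ring of 64 partitions, 4 nodes that own partitions round-robin, and SHA-1 over the string `"bucket/key"` (Riak itself hashes an encoded `{Bucket, Key}` term, and its claim algorithm is more elaborate). The preference list here is simply the N partitions starting at the one the hash falls into.

```python
import hashlib

KEYSPACE = 2 ** 160   # SHA-1 digests are 160 bits, matching the key space on the slide
RING_SIZE = 64        # assumption: number of partitions
NUM_NODES = 4         # assumption: physical nodes in the cluster


def partition_of(bucket: str, key: str) -> int:
    """Map bucket/key onto one of RING_SIZE equal slices of the 160-bit space."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode()).digest()
    point = int.from_bytes(digest, "big")
    return point // (KEYSPACE // RING_SIZE)


def owner(partition: int) -> str:
    """Simplified claim: partitions are dealt out to nodes round-robin."""
    return f"node{partition % NUM_NODES}"


def preference_list(bucket: str, key: str, n: int = 3) -> list:
    """The n partitions (walking clockwise) that hold replicas of this key."""
    first = partition_of(bucket, key)
    return [(first + i) % RING_SIZE for i in range(n)]


if __name__ == "__main__":
    for p in preference_list("meetups", "spamham"):
        print(p, owner(p))
```

Because consecutive partitions belong to different nodes in this layout, the three replicas land on three distinct nodes.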
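Search in databases
• Built into the DB
  • e.g. groonga
  • "LIKE"
  • Pros: easy to keep consistent
  • Cons: tightly coupled
• External integration
  • Solr
  • Elasticsearch
  • Pros: can be loosely coupled
  • Cons: hard to keep consistent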
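Consistency between the DB and the index
• Safely updating the DB's data and its index at the same time requires a transaction
• In a tightly coupled system, locks and similar mechanisms make consistency easy to guarantee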
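To make the tightly coupled case concrete: when the data and the index live in one process, a single lock is enough to keep them consistent. This is a toy in-memory store (whitespace tokenization, no persistence), not any particular database's design.

```python
import threading
from collections import defaultdict


class CoupledStore:
    """Documents and an inverted index in one process, updated under one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}                  # key -> text
        self._index = defaultdict(set)   # term -> set of keys

    def put(self, key, text):
        # Both structures change inside the same critical section,
        # so a reader never sees new data with a stale index.
        with self._lock:
            old = self._docs.get(key)
            if old is not None:
                for term in old.split():
                    self._index[term].discard(key)
            self._docs[key] = text
            for term in text.split():
                self._index[term].add(key)

    def search(self, term):
        with self._lock:
            return sorted(self._index.get(term, ()))
```

The price is exactly the coupling the slide mentions: the index cannot move to another process or machine without giving up this lock.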
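Benefits of loose coupling
• One search engine can serve many different DBs
• Operational know-how, tooling, and development can be concentrated in one place
• Version-compatibility constraints are relaxed
• If one side fails, the other keeps working
• Easier to extend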
We want the best of both worlds
Yokozuna
as Riak Search 2.0
Yokozuna = Riak (data persistence, cluster management, simple operations) + Solr (query, indexing)
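Riak Search 2.0
• Solr is bundled with Riak
• Updating a Riak KV object automatically indexes the data
• Solr plugins, schemas, and queries can be used as-is
• Distribution and adding/removing nodes are handled by Riak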
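Loosely coupled? Tightly coupled?
• Shipped in the same package
• Loosely coupled as processes
• Eventual consistency is guaranteed between the index and the data
• Distributing the index is also eventually consistent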
Loose coupling with Solr
• The Riak process forks the Solr process
• Riak handles all management and monitoring
• Separate processes with separate memory spaces
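Riak's own process management is written in Erlang; the Python sketch below only mirrors the idea on the slide: the parent launches the search engine as a separate OS process (with its own memory space) and restarts it when it exits. The restart limit and polling interval are illustrative assumptions, and in practice the command would start Solr's JVM.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, poll_interval=0.05):
    """Run cmd as a child process and restart it each time it exits.

    Returns the number of restarts once the limit is reached.
    """
    restarts = 0
    proc = subprocess.Popen(cmd)
    while True:
        if proc.poll() is not None:        # child exited (crashed or finished)
            if restarts >= max_restarts:
                return restarts
            restarts += 1
            proc = subprocess.Popen(cmd)   # fresh process, fresh memory space
        time.sleep(poll_interval)


if __name__ == "__main__":
    # A child that exits immediately stands in for a crashing search engine.
    print(supervise([sys.executable, "-c", "pass"], max_restarts=2))
```

Keeping the engine out of the database's address space is what lets a Solr crash leave the KV side running.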
Indexing flow
• Indexing is tied to the Riak update (document-based indexing)
• If indexing fails, the PUT fails
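A sketch of document-based indexing as the slide describes it: the index update is part of the PUT, and an indexing error is reported to the client as a failed PUT. Whether Riak rolls back the KV write is not stated here; this sketch deliberately does not, which matches failure case #2 later on (the value is stored but not indexed). `IndexingError`, `PutFailed`, and `index_fn` are hypothetical names.

```python
class IndexingError(Exception):
    """Raised by the (hypothetical) indexer when a document cannot be indexed."""


class PutFailed(Exception):
    """What the client sees when the write could not be indexed."""


def put(kv: dict, index_fn, key: str, value: str) -> None:
    """Store the value, then index it as part of the same request."""
    kv[key] = value
    try:
        index_fn(key, value)
    except IndexingError as exc:
        # No rollback (assumption): the stored value stays, and the KV/index
        # mismatch is left for background anti-entropy to repair.
        raise PutFailed(f"indexing {key!r} failed") from exc
```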
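Processing search queries
• Queries are planned the same way as Riak's 2i and MapReduce (coverage queries)
• With N=3, only every third vnode is accessed, so results are collected without duplicates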
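Why every third vnode is enough: with N=3, each key lives on three consecutive partitions, so picking every third partition hits exactly one replica of every key. The check below uses a ring of 12 partitions, an assumption chosen because 12 divides evenly by 3; Riak's ring sizes are powers of two, so real coverage plans additionally filter results by partition to avoid duplicates.

```python
RING_SIZE = 12  # assumption: divisible by N so every N-th vnode splits evenly
N = 3


def preflist(primary: int) -> list:
    """Partitions holding the replicas of a key whose first replica is `primary`."""
    return [(primary + i) % RING_SIZE for i in range(N)]


def coverage_vnodes() -> list:
    """Every N-th vnode: the set a coverage query would ask."""
    return list(range(0, RING_SIZE, N))


def hits(primary: int) -> int:
    """How many replicas of such a key the coverage set touches."""
    cover = set(coverage_vnodes())
    return sum(1 for p in preflist(primary) if p in cover)
```

Every key is seen exactly once, so the union of the per-vnode results needs no de-duplication.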
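Consistency and failure cases
Keeping an update log and recovering from it to maintain consistency cannot guarantee availability and partition tolerance. Riak maintains data consistency by other means.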
Common failure #1
• A node failure means three replicas cannot be created
• Even after the node recovers, only two replicas remain
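Common failure #1
• After recovery, Read Repair fixes it
• On a GET, if a replica is missing or damaged, Riak re-PUTs it on its own
[Figure: replicas v2, v2, notfound; after Read Repair, all three hold v2]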
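A minimal read-repair sketch for the v2 / v2 / notfound case. It assumes integer version numbers so that "newest" is well defined; Riak actually compares vector clocks and may keep concurrent values as siblings, which this sketch does not model.

```python
def get_with_read_repair(replicas: list, key: str):
    """replicas: one dict per vnode, mapping key -> (version, value)."""
    answers = [r.get(key) for r in replicas]       # None plays the role of notfound
    found = [a for a in answers if a is not None]
    if not found:
        return None
    newest = max(found, key=lambda a: a[0])        # assumption: integer versions
    for replica, answer in zip(replicas, answers):
        if answer != newest:
            replica[key] = newest                  # re-PUT the missing or stale copy
    return newest[1]
```

Note that the repair only happens as a side effect of a read, which is why a key nobody reads stays under-replicated.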
If Read Repair is never triggered, does the data stay broken? YES
If no client ever issues a GET, does the data stay broken? NO
AAE (Active Anti-Entropy)
[Photo: https://www.flickr.com/photos/51pct/7507525118/]
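Active Anti Entropy
• A background process that keeps data in an AP-oriented DB from degrading
• Uses Merkle trees to compute a per-partition "checksum"
• When it finds a difference, it runs Read Repair on that part
[Figure: hash(vnode=0,pid=0), hash(vnode=1,pid=0), hash(vnode=2,pid=0): the trees for partition 0 on three vnodes are compared]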
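A simplified Merkle-tree comparison: keys are hashed into a fixed number of segments, each leaf hashes its segment's contents, and two replicas are compared from the root down, descending only into subtrees whose hashes differ. The segment count (8), SHA-256, and the final per-key comparison are illustrative assumptions; Riak's AAE trees are persistent and far larger.

```python
import hashlib

SEGMENTS = 8  # assumption: leaf count, must be a power of two here


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def segment_of(key: str) -> int:
    return _h(key.encode())[0] % SEGMENTS


def build_tree(store: dict) -> list:
    """Tree levels, leaves first; each leaf hashes the sorted key=value pairs of its segment."""
    buckets = [[] for _ in range(SEGMENTS)]
    for k, v in store.items():
        buckets[segment_of(k)].append(f"{k}={v}")
    level = [_h("\n".join(sorted(b)).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def diff_keys(a: dict, b: dict) -> set:
    ta, tb = build_tree(a), build_tree(b)
    suspects = [0]  # start at the root
    for depth in range(len(ta) - 1, 0, -1):
        nxt = []
        for i in suspects:
            if ta[depth][i] != tb[depth][i]:   # only descend where hashes differ
                nxt += [2 * i, 2 * i + 1]
        suspects = nxt
    bad_leaves = {i for i in suspects if ta[0][i] != tb[0][i]}
    return {k for k in a.keys() | b.keys()
            if segment_of(k) in bad_leaves and a.get(k) != b.get(k)}
```

Identical replicas are confirmed by comparing a single root hash; only the differing segments ever need key-by-key work.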
Common failure #2
• Solr went down before indexing
• The key/value was saved but never indexed
If Read Repair is triggered, is the index repaired? NO
YZ Active Anti Entropy
• Builds Merkle trees over the search index
• Compares them per partition with the key-value trees and fixes the index wherever they differ
[Figure: hash(vnode=0,pid=0, kv) compared with hash(vnode=0,pid=0, yz)]
Common failure #2
• A Search-specific AAE is running
• It finds and repairs inconsistencies between the index and the data
Common failure #3
• Updates happened on both sides while the network was partitioned (split brain)
• Thanks to Hinted Handoff, both writes succeed
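A sketch of hinted handoff under a sloppy quorum: when a primary replica is unreachable, a fallback node accepts the write and keeps a hint naming the intended owner, so the write still succeeds; once the owner is back, the hint is handed off. `Node`, `write`, and `handoff` are hypothetical, and handoff here simply overwrites the owner's copy rather than detecting conflicts.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> value this node owns
        self.hints = []   # (intended_owner, key, value) held for a down node


def write(primaries, fallbacks, key, value):
    """Write to every primary; park writes for down primaries on live fallbacks."""
    spare = (n for n in fallbacks if n.up)
    acks = 0
    for p in primaries:
        if p.up:
            p.data[key] = value
        else:
            f = next(spare)               # assumption: enough live fallbacks exist
            f.hints.append((p, key, value))
        acks += 1                         # a hinted write still counts as a success
    return acks


def handoff(fallback):
    """Push hinted writes to owners that are back up; keep the rest."""
    keep = []
    for owner, key, value in fallback.hints:
        if owner.up:
            owner.data[key] = value
        else:
            keep.append((owner, key, value))
    fallback.hints = keep
```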
Common failure #3
• On recovery, the values are returned via Handoff and both values are kept
• Read Repair then happens via a GET or AAE
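Keeping "both values" relies on causality tracking: if neither version's clock descends from the other, the writes were concurrent and both survive as siblings. Below is a plain vector-clock version of that rule; Riak's real implementation (vector clocks, or dotted version vectors in 2.0) handles more cases, such as pruning.

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b per node)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge(versions: list) -> list:
    """versions: (clock, value) pairs gathered from replicas.

    Drops every version that another one strictly descends from; what remains
    are concurrent siblings (duplicates from several replicas collapsed).
    """
    survivors = []
    for i, (ci, vi) in enumerate(versions):
        dominated = any(
            descends(cj, ci) and not descends(ci, cj)
            for j, (cj, _) in enumerate(versions) if j != i
        )
        if not dominated and (ci, vi) not in survivors:
            survivors.append((ci, vi))
    return survivors
```

A later write whose clock covers both siblings (for example after a client resolves them) dominates them, and the siblings disappear on the next merge.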
Failures #1, #2, #3
• None of them needs emergency handling
• Consistency checks and repairs run automatically
• Almost no routine operational work
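Summary
• Databases and full-text search
• Choosing between consistency and loose coupling
• The new Riak Search 2.0
• Search that distributes horizontally and scales out
• Choosing eventual consistency made loose coupling and distribution possible (and AP, too!)
• A design built on eventual consistency keeps operations simple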
Questions?
FAQ
• Online schema changes?
• Dynamic schemas?
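outline
• We want full-text search over data stored in a database → lots of indexes get built, indexes on FKeys…, and only that
• Kinds of full-text search engines
  • External (hard to keep consistent but flexible; many mature software packages and plugins)
  • Embedded (easy to keep consistent but hard to maintain)
  • Riak sits in between: bundled, so it is easy, but underneath it is Solr, and consistency is maintained automatically
• How it is bundled, consistency, how it is distributed
• Distributing only the search index is hard
• Consistency between DB and index, consistency between replicas; no Jepsen discussion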