Slide 1

Slide 1 text

A Review of Eventual Consistency and Related Topics
2014/12/11, Big Data Infrastructure Study Group @ NTT Musashino R&D Center
Basho Japan, Uenishi

Slide 2

Slide 2 text

About me
• @kuenishi
• GitHub, Twitter, etc.
• 6 years of experience with distributed systems
• Came here from Basho Japan
• Riak CS development
• Other Japan-related matters
• msgpack-erlang maintainer

Slide 3

Slide 3 text

Basho and Riak
• Distributed databases?
• Do you know Riak?
• Do you know Basho?

Slide 4

Slide 4 text

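Synopsis
• Eventual consistency is not a degraded version of strong consistency; it is a different definition meant to solve a different kind of problem
• That different problem is availability
• Because the semantics differ, the way applications are designed changes a little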

Slide 5

Slide 5 text

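Is eventual consistency already outdated?
• "That's 2006 technology, isn't it?"
• "Does it have anything to do with big data?"
• "DynamoDB provides strong consistency"
• "Networks don't get cut, do they?"
• "Does anyone actually use it?"
• "It makes applications hard to build…"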

Slide 6

Slide 6 text

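The Big Data Infrastructure Research Group and eventual consistency
• The larger a system handling big data becomes, the more components it has that can break
• Money is at stake, so the required availability is high
• Operations are easy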

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Everyday examples of eventual consistency in practice
• Database backups
• File backups with rsync
• Google Wave
(Diagram labels: aaaa, bbbb, y, x.)

Slide 9

Slide 9 text

CAP Theorem
• Consistent: a sequence of operations (w1, w3, w4, ….) on multiple atomic objects is the same everywhere (linearizable)
• Available: executing operations w1, w2, … on an atomic object yields a response
• Partition Tolerant: a correct (atomic) response is obtained even when sent messages are lost
(Diagram: nodes G1 and G2, with a write and a read.)
Gilbert and Lynch, Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services

Slide 10

Slide 10 text

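Consistency as defined by the CAP theorem
• Consistency as defined by the CAP theorem ≈ linearizability
• It guarantees that, on every replica, all (update) operations are replayed in the same order
• It is slightly different from the consistency in ACID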

Slide 11

Slide 11 text

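Why is strong consistency hard to achieve?
• Even with majority voting or atomic broadcast, there is a performance penalty
• The difficulty of asynchrony + partial failure
• Liveness monitoring is hard => downtime
• Operability
• Failover and failback, with things flapping back and forth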

Slide 12

Slide 12 text

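Consistency is hard
• The only choices are to stop updates (lowering availability) or to allow updates to be overwritten (losing data)
(Diagram: Server1, Server2, Server3; concurrent PUT V=42 and PUT V=0; resulting value V=?)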

Slide 13

Slide 13 text

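Atomic Broadcasting is Difficult
• Replication can reorder operations
• This is the same as out-of-order execution on a CPU
(Diagram: writes w1 and w2 are sent to Actor 0, Actor 1, and Actor 2 and arrive in different orders.)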
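
The reordering problem on this slide can be illustrated with a minimal sketch. It assumes that each write is a plain overwrite and that each actor applies writes in arrival order; the values 42 and 0 are borrowed from the PUT example on the earlier slide, and `apply_writes` is a hypothetical helper, not part of any real system.

```python
def apply_writes(initial, writes):
    """Apply a sequence of (name, value) overwrites in arrival order."""
    state = initial
    for _name, value in writes:
        state = value  # the last write to arrive wins on this replica
    return state

w1 = ("w1", 42)
w2 = ("w2", 0)

# Every actor receives the same two writes, but Actor 2 sees them reordered.
actor0 = apply_writes(None, [w1, w2])
actor1 = apply_writes(None, [w1, w2])
actor2 = apply_writes(None, [w2, w1])

# True when the replicas no longer agree on the value.
diverged = len({actor0, actor1, actor2}) > 1
```

Because overwrites do not commute, the replicas end up with different values unless delivery order is enforced.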

Slide 14

Slide 14 text

Consensus Based Replication
• Elect a replication leader by majority vote
• or take a majority vote for each replication
(Diagram: writes w1 and w2 are sent to Actor 0, Actor 1, and Actor 2.)
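
The majority rule itself is simple to state in code. The sketch below shows only the vote-counting step, under the assumption that every vote is delivered reliably; it is not a consensus protocol such as Paxos or Raft, and `has_majority` and `elect_leader` are hypothetical names.

```python
def has_majority(acks, cluster_size):
    """True when strictly more than half of the cluster agreed."""
    return acks > cluster_size // 2

def elect_leader(votes, cluster_size):
    """Return the candidate holding a strict majority of votes, or None."""
    tally = {}
    for candidate in votes.values():
        tally[candidate] = tally.get(candidate, 0) + 1
    for candidate, count in tally.items():
        if has_majority(count, cluster_size):
            return candidate
    return None  # no majority: the cluster cannot make progress

# Three actors; two of them vote for actor1.
leader = elect_leader(
    {"actor0": "actor1", "actor1": "actor1", "actor2": "actor0"}, 3
)
```

When a partition leaves no side with a majority, `elect_leader` returns `None`, which is the availability cost the earlier slides describe.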

Slide 15

Slide 15 text

Eventual consistency
• Eventual Consistency
• Whatever path the updates take, the replicas eventually converge to the same state
• Read Repair
• AAE
• CRDT
• (Vector Clocks)
(Diagram labels: v0, v1.)

Slide 16

Slide 16 text

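Siblings
• For the time being, allow multiple versions to coexist
• Decide at read time which version is correct, or whether to merge them
(Diagram: Server1, Server2, Server3; concurrent PUT V=42 and PUT V=0; the replicas hold V=0 or 42, V=0, V=0 or 42, and V=42.)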
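
One way to picture siblings is a store that keeps every version whose vector clock is concurrent with the others. The following is a simplified sketch under that assumption, not Riak's actual implementation; `descends`, `put`, and the writer names `x` and `y` are hypothetical.

```python
def descends(a, b):
    """True if vector clock a has seen everything recorded in clock b."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())

def put(siblings, value, clock):
    """Store (value, clock); concurrent versions are kept side by side."""
    # A write that an existing version already descends from is stale.
    if any(descends(existing, clock) for _value, existing in siblings):
        return siblings
    # Drop the versions that the new write supersedes.
    kept = [(v, c) for v, c in siblings if not descends(clock, c)]
    return kept + [(value, clock)]

# Two writers update the same key without seeing each other's write.
state = put([], 0, {"x": 1})
state = put(state, 42, {"y": 1})
sibling_values = sorted(v for v, _clock in state)

# A later write that has seen both versions resolves the conflict.
resolved = put(state, 42, {"x": 1, "y": 2})
```

Here `sibling_values` holds both 0 and 42 until a reader picks or merges a value and writes it back with a clock that descends from both.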

Slide 17

Slide 17 text

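Achieving AP
• Accept writes for the time being, even while a network partition is in progress
(Diagram: Server1, Server2, Server3, and Server4; concurrent PUT V=42 and PUT V=0. Annotations: "write back after recovery" and "keep both".)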

Slide 18

Slide 18 text

Shopping cart example
• Taking the union is enough
(Diagram: Server1, Server2, Server3; concurrent PUT cart=[a,b,d] and PUT cart=[a,b,c]; the replicas hold [a,b,c], [a,b,c] or [a,b,d], and [a,b,d]; union([a,b,c], [a,b,d]) => [a,b,c,d].)
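
The union merge from this slide can be sketched in a few lines. `merge_carts` is a hypothetical helper, and the cart contents are the ones shown in the diagram.

```python
def merge_carts(siblings):
    """Merge sibling carts by taking the union of their items."""
    merged = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)

cart = merge_carts([["a", "b", "c"], ["a", "b", "d"]])
```

A plain union never forgets an item, so an item removed on one replica can reappear after a merge; handling removals needs a richer data type.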

Slide 19

Slide 19 text

Read Repair
(Diagram: a client sends get(“conferences/thoughtworks”) to the Get Handler (FSM) on the coordinating node of a Riak cluster with R=2; the replicas hold a mix of v1 and v2, and v2 is returned.)
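
Read repair can be sketched as follows. This is a simplified illustration, not Riak's get FSM: it assumes each replica is a dictionary mapping a key to a `(version, value)` pair, that a larger integer version is newer (Riak itself uses vector clocks), and that all replicas respond. `get_with_read_repair` is a hypothetical name.

```python
def get_with_read_repair(replicas, key, r=2):
    """Return the newest value and overwrite stale or missing copies."""
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [entry for _replica, entry in responses if entry is not None]
    if len(found) < r:
        raise RuntimeError("fewer than R replicas returned a value")
    newest = max(found, key=lambda entry: entry[0])
    for replica, entry in responses:
        if entry != newest:
            replica[key] = newest  # read repair
    return newest[1]

key = "conferences/thoughtworks"
replicas = [{key: (1, "v1")}, {key: (2, "v2")}, {key: (2, "v2")}]
value = get_with_read_repair(replicas, key, r=2)
```

After the call, the replica that held v1 has been brought up to v2 as a side effect of the read.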

Slide 20

Slide 20 text

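Active Anti Entropy
• Background processing that prevents data from degrading in an AP-oriented database
• Uses a Merkle tree to compute a "checksum" for each partition
• When a difference is found, that part is read-repaired
(Diagram: hash(vnode=0, pid=0), hash(vnode=1, pid=0), hash(vnode=2, pid=0).)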

Slide 21

Slide 21 text

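CRDT
• A data type whose result does not change even when the order of updates changes
• update(w1, update(w2, Data0)) = update(w2, update(w1, Data0)) = Data
(Diagram: writes w1 and w2 are sent to Actor 0, Actor 1, and Actor 2; the actors compute w1(w2(Data0)) => Data, w1(w2(Data0)) => Data, and w2(w1(Data0)) => Data.)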
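
The equation on this slide can be checked with the simplest commutative update. The sketch assumes that a write is an integer increment and that `Data0` is 0; both are illustrative choices, not taken from the slides.

```python
def update(write, data):
    """Apply an increment; integer addition commutes."""
    return data + write

w1, w2, data0 = 3, 5, 0

left = update(w1, update(w2, data0))   # w2 applied first
right = update(w2, update(w1, data0))  # w1 applied first
```

Both orders give the same state, which is the property that lets replicas converge without agreeing on a delivery order.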

Slide 22

Slide 22 text

CRDT: PN-Counter
• merge
  • {a: {1,-1}, b: {1,0}, c: {2,0}}
  • {a: {0,0}, b: {2, 0}, c: {0, -2}}
  • => {a: {1,-1}, b:{2,0}, c:{2,-2}} => 2
• update
  • When a accepts {increment, 3}:
  • {a: {4,-1}, b: {1,0}, c: {2,0}}
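
The merge and update rules on this slide can be written out as a minimal sketch. It assumes the slide's representation, where each actor holds a pair of a non-negative increment total and a non-positive decrement total; the function names `merge`, `value`, and `increment` are hypothetical.

```python
def merge(x, y):
    """Per actor, keep the largest increment and the most negative decrement."""
    merged = {}
    for actor in x.keys() | y.keys():
        px, nx = x.get(actor, (0, 0))
        py, ny = y.get(actor, (0, 0))
        merged[actor] = (max(px, py), min(nx, ny))
    return merged

def value(counter):
    """The counter value is the sum of all increments and decrements."""
    return sum(p + n for p, n in counter.values())

def increment(counter, actor, amount):
    """Return a copy in which `actor` has accepted an increment."""
    p, n = counter.get(actor, (0, 0))
    updated = dict(counter)
    updated[actor] = (p + amount, n)
    return updated

x = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
y = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = merge(x, y)
total = value(merged)
after_increment = increment(x, "a", 3)
```

Because `merge` takes a per-actor maximum (and minimum), merging in any order, or merging the same state twice, gives the same result.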

Slide 23

Slide 23 text

CRDT: OR-Sets
• merge
  • {a:{“foo”:true}, b:{“bar”:false}}
  • + {a:{“foo”:true}, b:{“foo”:false, “bar”:false}}
  • => {a:{“foo”:true}, b:{“foo”:false, “bar”:true}}
  • => [“bar”]
• update
  • add: {a:{}} => +“foo” => {a:{“foo”:false}}
  • remove: {a: {“foo”:false}} => {a: {“foo”:true}}

Slide 24

Slide 24 text

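Operations are easy
• Nodes and networks can be brought up and down casually
• There is no need to care about which node is the master that maintains consistency
• The operations that would be needed to maintain strong consistency do not turn into downtime
  • Consistency checks, recovery, backups
• Operations at failure time are quite simple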

Slide 25

Slide 25 text

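Challenges when adopting eventual consistency
• The semantics differ from those of ordinary programming
• Applications are required to work under assumptions different from before
• CRDTs solve part of this, but…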

Slide 26

Slide 26 text

Application examples

Slide 27

Slide 27 text

League of Legends
• For MMORPG chat, availability and response time are everything
• 10 ms can decide between life and death
(C) Riot Games

Slide 28

Slide 28 text

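• "Highly available" cloud storage that runs on Riak
• Object metadata is an eventually consistent data structure
• Q: What if 5 GB of data is uploaded concurrently?
• Q: And what if that upload goes to a different continent?
(Diagram label: /foo.bar)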

Slide 29

Slide 29 text

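Replication between data centers
• A network between data centers is hard to keep operating correctly at all times, including its connectivity and bandwidth
• Given what the CAP theorem demands, synchronous replication is difficult
• To preserve availability, adopt a data model that is eventually consistent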

Slide 30

Slide 30 text

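Summary
• Eventual consistency is not a degraded version of strong consistency; it is a different definition meant to solve a different kind of problem
• That different problem is availability
• Because the semantics differ, the way applications are designed changes a little
• Introduced several techniques for maintaining eventual consistency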

Slide 31

Slide 31 text

We are hiring.
• Anyone interested in real-world distributed systems problems!
• @BashoJapan
• [email protected]

Slide 32

Slide 32 text

Questions?