UENISHI Kota
April 21, 2015

Trends in Database Architecture and How to Choose Among Them

Slides from a presentation at QConTokyo ( http://www.qcontokyo.com/KotaUENISHI_2015.html )

Transcript

2. Self-introduction
• @kuenishi
• Github, Twitter, etc
• 7 years of experience with distributed systems
• I come from Basho Japan
• Development of Riak CS
• I will be giving a somewhat spiritual talk

7. Durability
"The ACID property which guarantees that transactions that have committed will survive permanently."
http://en.wikipedia.org/wiki/Durability_(database_systems)

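18. Replication is hard
• CAP theorem
• Keep sending a sequence of changes to objects that are held by multiple nodes and start out identical
• When messages fail to arrive, the nodes cannot all hold the same sequence of changes
• The root of the difficulty is that the unit of failure has been split
• Guaranteeing that separate things are identical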

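20. Solution 0: Master-Slave
• Guarantee that there is exactly one sequence of update commands
• Slaves (replicas) only receive the decided sequence of update commands and apply it locally
• Nothing can be done while the master is absent
[Diagram: two nodes, each holding c, with the operation sequence w1: x=a, w2: x=b, r1: read x, w3: x=c]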
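A minimal sketch of the Master-Slave idea on slide 20, assuming a single in-process master and a slave that pulls the log. The class and method names (`Master`, `Slave`, `write`, `pull`) are hypothetical and not taken from any product named in the talk; failure handling and master switchover are deliberately left out.

```python
class Master:
    """Sole writer: every update gets one position in a single log."""

    def __init__(self):
        self.log = []    # the one authoritative sequence of updates
        self.state = {}

    def write(self, key, value):
        self.log.append((key, value))
        self.state[key] = value
        return len(self.log) - 1  # log index of this update


class Slave:
    """Replica: replays the master's log in order and decides nothing."""

    def __init__(self):
        self.applied = 0  # number of log entries already applied
        self.state = {}

    def pull(self, master):
        for key, value in master.log[self.applied:]:
            self.state[key] = value
        self.applied = len(master.log)


master, slave = Master(), Slave()
for v in ("a", "b", "c"):      # w1: x=a, w2: x=b, w3: x=c
    master.write("x", v)
slave.pull(master)
```

After `pull`, the slave holds the same state as the master because it replayed the same sequence. Without a reachable master there is nowhere to send a write, which is the limitation the slide points out.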
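21. Solution 1: Consensus
• The goal is for distinct entities to hold the same state
• Replication is the so-called distributed agreement problem
• A communication scheme that solves it is called a consensus protocol
[Diagram: three nodes, each holding c, with the operation sequence w1: x=a, w2: x=b, r1: read x, w3: x=c]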

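24. The 2000s: the Web era (1/2)
• Replication at the application and middleware layer (TCP/IP) becomes common
• Synchronous replication at the network level; operation continues even if one side fails
• Several systems appear that can scale out reads
• The mainstream design streams diffs from a Master to Slaves (Replicas)
• MySQL binlog, GFS (BigTable), HDFS (HBase)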
25. Master-Slave is hard
• Time lag when switching masters
• Tolerance to split brain
[Diagram: two nodes holding c and b; operations w1: x=a, r1: read x, w3: x=c, w2: x=b]
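26. The 2000s: the Web era (2/2)
• Web systems grow more complex and much larger
• Consensus-based replication becomes practical
• Things keep working somehow even when the network partitions
• Dynamo, Chubby, ZooKeeper, SQL Server (2008?~)
• Paxos (1989), Quorum (1979), etc.
[Diagram label: 2/3 Ack]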
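27. Quorum: consensus is hard
• A naive protocol design that permits overwrites easily destroys data
• Not practical in the real world, where any node can fail, go silent, or come back at any time
[Diagram: three nodes holding a?, b?, c?; operations w1: x=a, w2: x=b, r1: read x, w3: x=c]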
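A rough sketch of quorum reads and writes, assuming N = 3 replicas with W = 2 and R = 2 so that R + W > N. The function names and the caller-supplied version numbers are assumptions made for illustration; this is the naive overwrite-style design the slide warns about, not a protocol from any specific system.

```python
N, W, R = 3, 2, 2  # assumed replica count and quorum sizes (R + W > N)

replicas = [dict() for _ in range(N)]  # each maps key -> (version, value)


def quorum_write(key, version, value, reachable):
    """Store on the reachable replicas; succeed only with at least W acks."""
    if len(reachable) < W:
        return False
    for i in reachable:
        current_version, _ = replicas[i].get(key, (0, None))
        if version > current_version:  # newer version overwrites older
            replicas[i][key] = (version, value)
    return True


def quorum_read(key, reachable):
    """Ask the reachable replicas; return the highest-versioned value."""
    if len(reachable) < R:
        raise RuntimeError("read quorum not reached")
    answers = [replicas[i].get(key, (0, None)) for i in reachable]
    return max(answers, key=lambda answer: answer[0])[1]


ok1 = quorum_write("x", 1, "a", reachable=[0, 1])  # w1: x=a
ok2 = quorum_write("x", 2, "b", reachable=[1, 2])  # w2: x=b
latest = quorum_read("x", reachable=[0, 2])        # r1: read x
rejected = quorum_write("x", 3, "c", reachable=[0])  # too few acks
```

Because any read quorum overlaps any write quorum, the read sees version 2. The sketch says nothing about who assigns versions: two writers that pick the same number overwrite each other silently, which is one way such a design destroys data.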
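28. Paxos: consensus is hard
• A two-phase agreement protocol
• The Proposer (the one who proposes a value) is decided by majority vote
• The Proposed Value (the value that was proposed) is decided by majority vote
• Proposals are given sequence numbers to track which is newer
• Even under the constraint that any node can fail, go silent, or come back at any time, agreement is reached given unlimited time and as long as a majority has not failed
• Hard to implement, but doable with enough effort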
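The two phases and the sequence numbers on slide 28 can be sketched as single-decree Paxos. This is a simplified in-process illustration under stated assumptions: messages are never lost, there is no retry loop, and the names `Acceptor` and `propose` are hypothetical.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest proposal number promised so far
        self.accepted = None  # (number, value) last accepted, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    granted = [accepted for ok, accepted in replies if ok]
    if len(granted) < majority:
        return None
    # If some acceptor already accepted a value, adopt the newest one
    # instead of our own; this is what keeps a chosen value stable.
    earlier = [accepted for accepted in granted if accepted is not None]
    if earlier:
        value = max(earlier, key=lambda accepted: accepted[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "a")   # chooses "a"
second = propose(acceptors, 2, "b")  # must adopt the already chosen "a"
stale = propose(acceptors, 1, "c")   # number too old, no promises
```

The second proposer asks for "b" but ends up confirming "a", and the stale proposal is refused outright. A real implementation also has to handle lost messages, restarts with durable acceptor state, and competing proposers, which is where most of the difficulty lies.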
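29. Classification of consensus-based replication
• CP type
  • Guarantees that replicas are identical
  • Uses algorithms such as Paxos and Raft
  • During a network partition, only nodes on the majority side of the network are usable
• AP type
  • Allows replicas to not match exactly
  • Guarantees causal consistency with Vector Clocks or CRDTs (or uses plain timestamps)
  • Every node remains usable even during a network partition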
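For the AP type, a vector clock lets a replica tell whether one version causally precedes another or whether the two were written concurrently. The sketch below assumes clocks are plain dicts from node name to counter; the function names are hypothetical.

```python
def happened_before(a, b):
    """True if vector clock a causally precedes vector clock b."""
    nodes = set(a) | set(b)
    no_entry_ahead = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_entry_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_entry_ahead and some_entry_behind


def compare(a, b):
    """Classify the causal relation between two vector clocks."""
    if happened_before(a, b):
        return "before"
    if happened_before(b, a):
        return "after"
    nodes = set(a) | set(b)
    if all(a.get(n, 0) == b.get(n, 0) for n in nodes):
        return "equal"
    return "concurrent"  # neither saw the other: a real conflict


ordered = compare({"a": 1}, {"a": 2, "b": 1})
conflict = compare({"a": 2, "b": 0}, {"a": 1, "b": 1})
```

An "ordered" pair can be resolved by keeping the later version. A "concurrent" pair has to be kept as siblings or merged, which is where CRDTs come in; a plain timestamp would instead pick a winner and drop the other write.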
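30. Classification of databases by replication
• Master-Slave type
  • Simple to implement, fast
• Hybrid of consensus and Master-Slave
  • Uses a consensus protocol for master election
  • Replication itself is Master-Slave
• Consensus type
  • Uses a consensus protocol for replication as well
  • No downtime caused by master failure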
31. Classification of databases by replication
• Master-Slave type
  • MySQL, PostgreSQL
• Hybrid of consensus and Master-Slave
  • MongoDB, HBase, Redis
• Consensus type
  • Riak, Cassandra (both have AP and CP modes)
  • CouchBase (CP type)
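32. The 2010s: the cloud era
• The category known as NewSQL appears
• FoundationDB, NuoDB
• Cases where an existing NoSQL system implements SQL (or something SQL-like)
• Some NewSQL systems satisfy ACID (?)
• Replication across multiple data centers becomes mandatory
• Network partitions and latency become more important issues
• MPP becomes practical for OLAP workloads (distributed query processing)
• BigQuery, Impala, PrestoDB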

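35. ACID in a distributed DB
• There is really only one realistic design
• Master election by consensus plus M/S replication, or CP-type replication
• A mechanism that guarantees timestamp synchronization
• Optimistic concurrency control
• MegaStore (2011), Spanner (2012)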
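Of the ingredients listed on slide 35, optimistic concurrency control is the easiest to show in a few lines. The sketch below is a single-node illustration with a hypothetical `Store` class: a transaction remembers the version it read and commits only if nobody else committed in between. It is not how MegaStore or Spanner implement it.

```python
class Store:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); version 0 means the key is unwritten."""
        return self.data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        """Validate, then write: abort if the key changed since our read."""
        current_version, _ = self.read(key)
        if current_version != read_version:
            return False  # conflict detected at commit time
        self.data[key] = (current_version + 1, new_value)
        return True


store = Store()
version_seen, _ = store.read("x")          # both transactions read version 0
first_commit = store.commit("x", version_seen, "a")   # wins
second_commit = store.commit("x", version_seen, "b")  # aborts, must retry
```

No locks are held while the transactions run; the cost is that the loser has to re-read and retry. In a distributed setting the version check itself has to run against replicated state, which is why the slide pairs it with CP-type replication and synchronized timestamps.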

CRDT, boom
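38. CRDT
• A data type and replication technique that makes optimistic replication simple
• Conflict-Free Replicated Data Types
• A combination of data type / data structure and operators such that w1(w2(x)) == w2(w1(x)) holds
• Updates and reads remain possible during a network partition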
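The commutativity condition can be checked directly. In this small sketch, adding elements to a set commutes, while overwriting a register does not; the helper `commutes` and the sample operations are made up for illustration.

```python
def commutes(w1, w2, x):
    """Check the condition w1(w2(x)) == w2(w1(x)) for one starting state."""
    return w1(w2(x)) == w2(w1(x))


# Two writes expressed as set insertions: order does not matter.
def add_a(s):
    return s | {"a"}


def add_b(s):
    return s | {"b"}


# Two writes expressed as plain overwrites: the last one wins.
def set_a(_):
    return "a"


def set_b(_):
    return "b"


set_result = commutes(add_a, add_b, frozenset())
register_result = commutes(set_a, set_b, None)
```

`set_result` is true and `register_result` is false, which is why replicas that apply the same set insertions in different orders converge, while replicas that apply overwrites in different orders do not.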
39. CRDT example: G-Counter
• merge
  • Data held by a: {a: 1, b: 1, c: 2}
  • Data held by b: {a: 0, b: 2, c: 0}
  • x => {a: 1, b: 2, c: 2} => 5
• update
  • When a receives {increment, 3}, its data becomes {a: 4, b: 1, c: 2}
• A conditional operation such as C < x can be handled
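The G-Counter numbers on slide 39 can be reproduced with a short sketch, assuming the counter is a dict from node name to that node's count. The function names are hypothetical.

```python
def g_merge(x, y):
    """Merge two replicas by taking the per-node maximum."""
    return {n: max(x.get(n, 0), y.get(n, 0)) for n in set(x) | set(y)}


def g_value(counter):
    """The counter's value is the sum over all nodes."""
    return sum(counter.values())


def g_increment(counter, node, amount):
    """A node may only increase its own entry (grow-only)."""
    if amount < 0:
        raise ValueError("G-Counter cannot be decremented")
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


held_by_a = {"a": 1, "b": 1, "c": 2}
held_by_b = {"a": 0, "b": 2, "c": 0}

merged = g_merge(held_by_a, held_by_b)      # {a: 1, b: 2, c: 2}
total = g_value(merged)                     # 5
after_update = g_increment(held_by_a, "a", 3)  # {a: 4, b: 1, c: 2}
```

Since every entry only grows, the value never decreases. Once a replica observes C < x, that stays true after any later merge, which is why the slide says such a condition can be handled.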
40. CRDT example: PN-Counter
• merge
  • {a: {1,-1}, b: {1,0}, c: {2,0}}
  • {a: {0,0}, b: {2,0}, c: {0,-2}}
  • => {a: {1,-1}, b: {2,0}, c: {2,-2}} => 2
• update
  • When a accepts {increment, 3}
  • {a: {4,-1}, b: {1,0}, c: {2,0}}
• A conditional operation such as C < x cannot be handled
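A sketch of the PN-Counter on slide 40, following the slide's representation: each node keeps a pair (increments, decrements) with the decrements stored as a non-positive number. The function names are hypothetical.

```python
def pn_merge(x, y):
    """Per node: max of the increments, min of the (negative) decrements."""
    merged = {}
    for node in set(x) | set(y):
        x_inc, x_dec = x.get(node, (0, 0))
        y_inc, y_dec = y.get(node, (0, 0))
        merged[node] = (max(x_inc, y_inc), min(x_dec, y_dec))
    return merged


def pn_value(counter):
    """Sum of all increments plus all (negative) decrements."""
    return sum(inc + dec for inc, dec in counter.values())


def pn_update(counter, node, amount):
    """Positive amounts grow the first slot, negative ones the second."""
    inc, dec = counter.get(node, (0, 0))
    updated = dict(counter)
    updated[node] = (inc + amount, dec) if amount >= 0 else (inc, dec + amount)
    return updated


first = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
second = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}

pn_merged = pn_merge(first, second)  # {a: (1,-1), b: (2,0), c: (2,-2)}
pn_total = pn_value(pn_merged)       # 2
pn_after = pn_update(first, "a", 3)  # {a: (4,-1), b: (1,0), c: (2,0)}
```

Each slot is still monotonic, so merges converge, but the value itself can go down. A replica that sees C < x now may see it become false after the next merge, so the condition cannot be relied on.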
41. CRDT example: OR-Sets
• merge
  • a: {"foo": false, "bar": true, "baz": true}
  • + b: {"bar": true, "baz": false}
  • => {"foo": false, "bar": true, "baz": true}
  • => ["foo"]
• update
  • add: a: {} => +"foo" => a: {"foo": false}
  • remove: a: {"foo": false} => a: {"foo": true}
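The set example on slide 41 can be sketched as a dict from element to a "removed" flag, where merge takes the union of keys and the logical OR of the flags. This follows the slide's simplified representation and the names are hypothetical. A full OR-Set tags every add with a unique identifier so that an element can be added again after removal, which this flag-based sketch does not support.

```python
def set_merge(x, y):
    """Union of elements; an element is removed if either side removed it."""
    return {e: x.get(e, False) or y.get(e, False) for e in set(x) | set(y)}


def set_elements(s):
    """Visible members are the elements whose removed flag is still False."""
    return sorted(e for e, removed in s.items() if not removed)


def set_add(s, element):
    updated = dict(s)
    updated.setdefault(element, False)  # keep an existing removed flag
    return updated


def set_remove(s, element):
    updated = dict(s)
    if element in updated:
        updated[element] = True  # flag as removed instead of deleting
    return updated


replica_a = {"foo": False, "bar": True, "baz": True}
replica_b = {"bar": True, "baz": False}

set_merged = set_merge(replica_a, replica_b)
visible = set_elements(set_merged)           # ["foo"]
added = set_add({}, "foo")                   # {"foo": False}
removed = set_remove(added, "foo")           # {"foo": True}
```

The flag only ever goes from false to true, so the merge is commutative and idempotent, and replicas converge regardless of the order in which they exchange state.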
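42. CRDT
• Updates and reads remain possible during a network partition
• Data for which writes can be processed "concurrently"
• There are certain constraints on how the value is computed
• Efficient CRDT implementations are still at the research stage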
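43. Predictions: the late 2010s
• There is still room for ingenuity in implementation, and distributed DBs that aim to satisfy ACID will keep appearing
• Concurrency control designed with distribution in mind
• CP-type replication and transactions that span data centers
• Wider spread of operational know-how
• Adoption of NoSQL databases will continue for a while (some will be weeded out)
• Hybrid databases that combine strong consistency with optimistic replication will appear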

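46. Summary
• Since around 2000, distributed systems technology has gradually become an essential building block alongside the two major technical elements of databases
• A brief overview of the database replication techniques that appeared up to 2015
• In the latter half of 2015, distributed databases will appear that let users choose between CP and AP replication through the same interface and that satisfy ACID
• A rough fantasy^H^H^H^H^H^H^Hprediction for 2020
※Disclaimer: The contents of this material are Uenishi's personal predictions and do not guarantee any particular future.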
47. References
• Seth Gilbert and Nancy Lynch. 2002. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.
• James C. Corbett et al. 2012. Spanner: Google's Globally Distributed Database.
• Yasushi Saito and Marc Shapiro. 2005. Optimistic Replication.
• Peter Bailis and Kyle Kingsbury. 2014. The Network is Reliable.
• Peter Bailis et al. 2014. Coordination Avoidance in Database Systems.
• Mihai Letia et al. 2010. Consistency without Concurrency Control in Large, Dynamic Systems.