Trends in Database Architecture and How to Choose Among Them

Slides for the QConTokyo talk ( http://www.qcontokyo.com/KotaUENISHI_2015.html )

UENISHI Kota

April 21, 2015

Transcript

  1. Trends in Database Architecture and How to Choose Among Them. 2015/4/21, QConTokyo. UENISHI Kota, Basho Japan

  2. About me
    • @kuenishi
    • Github, Twitter, etc
    • 7 years in distributed systems
    • I come from over at Basho Japan
    • Developer of Riak CS
    • Today's talk will be a spiritual one
  3. Sponsored by

  4. ACID

  5. Atomicity Consistency Isolation Durability

  6. Atomicity Consistency Isolation Durability

  7. Durability
    “The ACID property which guarantees that transactions that have committed will survive permanently.”
    http://en.wikipedia.org/wiki/Durability_(database_systems)
  8. Permanently?

  9. Persistence is (much) harder (than you thought)

  10. https://www.flickr.com/photos/bathyporeia/9086009348/

  11. All conditioned things are impermanent (諸行無常). सब्बे संखारा अनिच्चा. Mahaparinirvana Sutra (大般涅槃経), verse on impermanence

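  12. "The law of arising and ceasing is said to be suffering, but it is not suffering because things arise and cease. Suffering arises because we regard as permanent what is in fact subject to arising and ceasing." Wikipedia, "諸行無常"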

  13. Storage media break

  14. Storage media break, so just replicate

  15. Ideal https://www.flickr.com/photos/rebeccalongworth/3445143739/

  16. Reality https://www.flickr.com/photos/leeziet/3021612079/ https://www.flickr.com/photos/pranavbhasin/6109327813/

  17. CAP theorem

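  18. Replication is hard
    • CAP theorem
      • Several nodes hold an object that starts out identical on each, and we keep sending it a sequence of changes
      • When messages fail to arrive, not every node can hold the same sequence of changes
    • The root of the difficulty is that we split the system into separate failure units
      • We now have to guarantee that distinct things are identical
  19. Legend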

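  20. Solution 0: Master-Slave
    • Guarantees that there is exactly one sequence of update commands
    • Slaves (replicas) only receive the decided sequence of update commands and apply it locally
    • Nothing can be done while the master is absent
    [Diagram: two nodes, each labeled c, each with the sequence w1: x=a, w2: x=b, r1: read x, w3: x=c]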
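
The Master-Slave idea on the slide above can be sketched as follows. This is a minimal in-memory illustration, not any particular database's implementation; the class names `Master` and `Replica` and the full-log shipping are assumptions made for the example.

```python
class Replica:
    """Applies an already-ordered command log; it never decides the order."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, log):
        # Apply only the entries this replica has not seen yet.
        for key, value in log[self.applied:]:
            self.state[key] = value
        self.applied = len(log)


class Master:
    """The single place where the one and only order of updates is decided."""

    def __init__(self, replicas):
        self.log = []
        self.state = {}
        self.replicas = replicas

    def write(self, key, value):
        self.log.append((key, value))   # the order is fixed here
        self.state[key] = value
        for replica in self.replicas:   # replicas just follow the log
            replica.apply(self.log)


replica = Replica()
master = Master([replica])
for value in ["a", "b", "c"]:           # w1: x=a, w2: x=b, w3: x=c
    master.write("x", value)
```

Because every write goes through `Master.write`, both copies end with `x = "c"`. The same property is the weakness: without the master there is nobody to order writes.
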
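  21. Solution 1: Consensus
    • The goal is for distinct entities to hold the same state
    • Replication is the so-called distributed agreement problem
    • A communication scheme that solves it is called a consensus protocol
    [Diagram: three nodes, each labeled c; operations w1: x=a, w2: x=b, r1: read x, w3: x=c]
  22. Fools learn from experience; the wise learn from history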

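  23. 1990s: the RDBMS adoption period
    • Replication happens at the kernel level or below, without the OS being aware of it
    • Synchronous at the disk level; if one side fails, the whole system fails
    • Hard disks and network bandwidth are scarce
    • Everything is managed and operated centrally with RAID and SAN
    • The foundational techniques of transaction and query processing are established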

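  24. 2000s: the Web era (1/2)
    • Replication at the application and middleware layer (over TCP/IP) becomes common
    • Synchronous at the network level; the system keeps running even if one side fails
    • Some systems that can scale out reads appear
    • Shipping diffs from the master to slaves (replicas) is the mainstream approach
      • MySQL binlog, GFS (BigTable), HDFS (HBase)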
  25. Master-Slave is hard
    • Time lag while switching masters
    • Tolerance to split brain
    [Diagram: two nodes labeled c and b; operations w1: x=a, r1: read x, w3: x=c, w2: x=b]
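  26. 2000s: the Web era (2/2)
    • Web systems grow more complex and much larger
    • Consensus-based replication becomes practical
      • Things keep working even when a network partition occurs
      • Dynamo, Chubby, ZooKeeper, SQL Server (2008?~)
      • Paxos (1989), Quorum (1979), etc.
    [Diagram label: 2/3 Ack]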
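  27. Quorum: consensus is hard
    • A naive protocol design that allows overwrites corrupts data easily
    • Not practical in the real world, where any node can fail, go silent, or come back at any time
    [Diagram: three nodes labeled a?, c?, b?; operations w1: x=a, w2: x=b, r1: read x, w3: x=c]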
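
One way to read the slide above: if every replica blindly overwrites `x` with whatever arrives last, the same three writes delivered in different orders leave three different values. The sketch below assumes three replicas and a 2-of-3 read quorum; the delivery orders are made up for illustration.

```python
from collections import Counter


def deliver(writes_in_arrival_order):
    """Naive replica: overwrite x with each write as it arrives."""
    x = None
    for value in writes_in_arrival_order:
        x = value
    return x


# The same writes (w1: x=a, w2: x=b, w3: x=c) reach each replica
# in a different order because of delays and retries.
replicas = [
    deliver(["b", "c", "a"]),  # ends with a
    deliver(["a", "c", "b"]),  # ends with b
    deliver(["a", "b", "c"]),  # ends with c
]

# A reader asking for agreement from 2 of 3 replicas finds none.
votes = Counter(replicas)
has_majority = max(votes.values()) >= 2
```

Here `replicas` ends up as `["a", "b", "c"]` and `has_majority` is `False`: counting acknowledgements alone does not decide which write wins.
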
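  28. Paxos: consensus is hard
    • A two-phase agreement protocol
      • The proposer (whoever proposes a value) is decided by majority vote
      • The proposed value is decided by majority vote
    • Proposals carry sequence numbers so that newer ones can be told from older ones
    • Even under the constraint that any node can fail, go silent, or come back at any time, agreement is reached given unbounded time and a majority that has not failed
    • Hard to implement, but doable with effort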
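
The two phases on the slide above can be sketched as single-decree Paxos. This is a simplified, synchronous, in-memory illustration with no failures or message loss; `Acceptor` and `propose` are names chosen for the example, and real implementations add persistence, retries, and leader election.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1      # highest proposal number promised so far
        self.accepted_n = -1    # number of the accepted proposal, if any
        self.accepted_v = None  # value of the accepted proposal, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_v
        return False, None, None

    def accept(self, n, v):
        """Phase 2: accept unless a higher number has been promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n, self.accepted_v = n, v
            return True
        return False


def propose(acceptors, n, value):
    """Return the chosen value, or None if no majority was reached."""
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(n) for a in acceptors) if p[0]]
    if len(promises) < majority:
        return None
    # If some acceptor already accepted a value, that value must be proposed.
    prev_n, prev_v = max(((an, av) for _, an, av in promises),
                         key=lambda pair: pair[0])
    if prev_n >= 0:
        value = prev_v
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, n=1, value="a")   # chooses "a"
second = propose(acceptors, n=2, value="b")  # must adopt the chosen "a"
stale = propose(acceptors, n=1, value="c")   # old number is refused
```

The sequence numbers do the work the slide describes: a later proposer learns the already-chosen value in phase 1, and a proposer with an old number cannot gather a majority.
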
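  29. Classifying consensus-based replication
    • CP type
      • Guarantees that replicas are identical
      • Uses algorithms such as Paxos and Raft
      • During a network partition, only nodes on the majority side of the network are usable
    • AP type
      • Tolerates replicas that do not match exactly
      • Guarantees causal consistency with Vector Clocks or CRDTs (or just uses plain timestamps)
      • Every node stays usable even during a network partition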
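
For the AP type, the causality check that a vector clock provides can be sketched like this. The dictionary representation (node name to counter) and the function name `compare` are assumptions for illustration, not the API of any specific database.

```python
def compare(vc_a, vc_b):
    """Classify two vector clocks as 'equal', 'before', 'after', or 'concurrent'."""
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # a happened before b, so b may replace a
    if b_le_a:
        return "after"       # b happened before a
    return "concurrent"      # neither saw the other: a conflict to resolve


ordered = compare({"a": 1}, {"a": 2})
conflict = compare({"a": 2, "b": 1}, {"a": 1, "b": 2})
```

A plain timestamp would pick a winner in the `conflict` case silently; the vector clock instead reports that the two updates were concurrent.
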
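  30. Classifying databases by replication
    • Master-Slave type
      • Simple implementation, fast
    • Hybrid of consensus and Master-Slave
      • Uses a consensus protocol for master election
      • Replication itself is Master-Slave
    • Consensus type
      • Uses a consensus protocol for replication as well
      • No downtime caused by master failure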
  31. Classifying databases by replication
    • Master-Slave type
      • MySQL, PostgreSQL
    • Hybrid of consensus and Master-Slave
      • MongoDB, HBase, Redis
    • Consensus type
      • Riak, Cassandra (both have AP and CP modes)
      • CouchBase (CP type)
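  32. 2010s: the cloud era
    • The category known as NewSQL appears
      • FoundationDB, NuoDB
      • Existing NoSQL systems sometimes implement SQL (or something SQL-like)
      • Some NewSQL systems satisfy (?) ACID
    • Replication across multiple data centers becomes essential
      • Network partitions and latency become more important problems
    • MPP becomes practical for OLAP workloads (distributed query processing)
      • BigQuery, Impala, PrestoDB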
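  33. Review: the core technologies of a database
    • Query processing optimization
      • Parse the SQL, then build and execute an optimal query plan from statistics
      • Data placement and index strategies that support this
    • Transaction processing optimization
      • Update data as fast as possible while eliminating anomalies and guaranteeing consistency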

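  34. Core database technologies in the cloud era
    • Query processing optimization in a distributed environment
      • Parallel processing with MPP, speculative execution on failure
      • Localized data placement with nested columnar formats
    • Transaction processing optimization in a distributed environment
      • Eliminate anomalies even though the data is distributed? Guarantee consistency?
      • Consistency is a problem not only between nodes but also between data centers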

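  35. ACID in a distributed DB
    • There is only one realistic design
      • Master election by consensus plus M/S replication, or CP-type replication
      • A mechanism that guarantees timestamp synchronization
      • Optimistic concurrency control
    • MegaStore (2011), Spanner (2012)
  36. ACID in a distributed DB
    • These become trade-offs as they are
      • Availability during a network partition
      • A timestamp management node? → SPOF
      • Apply TSO as-is to an OLTP workload and you get a storm of aborts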

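  37. Scaling writes
    • Paxos and similar protocols require a fixed set of consensus members
    • Optimistic replication (2005)
      • Mechanisms that do not provide strong consistency but guarantee another kind of consistency under specific conditions (eventual consistency, etc.)
      • DNS, etc.
    • Data management mechanisms that stay consistent even when the order of operators is swapped
      • Vector Clocks, CRDT, boom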
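  38. CRDT
    • A data type and replication technique that makes optimistic replication easier
    • Conflict-Free Replicated Data Types
    • A combination of a data type or data structure and operators that satisfies w1(w2(x)) == w2(w1(x))
    • Updates and reads remain possible during a network partition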
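
The commutativity condition on the slide above can be checked directly. In this small sketch, `increment` and `assign` are hypothetical operator constructors: increments commute, plain assignment does not.

```python
def increment(n):
    """Operator that adds n; commutes with other increments."""
    return lambda x: x + n


def assign(v):
    """Operator that overwrites; the last one applied wins."""
    return lambda x: v


w1, w2 = increment(1), increment(2)
increments_commute = w1(w2(0)) == w2(w1(0))

a1, a2 = assign("a"), assign("b")
assignments_commute = a1(a2(None)) == a2(a1(None))
```

`increments_commute` is `True` while `assignments_commute` is `False`, which is why a CRDT restricts both the data structure and the operators allowed on it.
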
  39. CRDT example: G-Counter
    • merge
      • Data held by a: {a: 1, b: 1, c: 2}
      • Data held by b: {a: 0, b: 2, c: 0}
      • x => {a: 1, b: 2, c: 2} => 5
    • update
      • When a receives {increment, 3}: {a: 4, b: 1, c: 2}
    • Can handle the conditional operation C < x
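
The G-Counter numbers on the slide above can be reproduced with a short sketch. The function names `merge`, `value`, and `increment` are chosen for illustration; the merge rule is the per-node maximum, which is the standard G-Counter definition.

```python
def merge(x, y):
    """Take the per-node maximum; order and repetition of merges do not matter."""
    return {n: max(x.get(n, 0), y.get(n, 0)) for n in set(x) | set(y)}


def value(counter):
    """The counter's value is the sum over all nodes."""
    return sum(counter.values())


def increment(counter, node, amount):
    """A node may only grow its own slot."""
    if amount < 0:
        raise ValueError("a G-Counter can only grow")
    updated = dict(counter)
    updated[node] = updated.get(node, 0) + amount
    return updated


held_by_a = {"a": 1, "b": 1, "c": 2}
held_by_b = {"a": 0, "b": 2, "c": 0}
merged = merge(held_by_a, held_by_b)       # {a: 1, b: 2, c: 2}
after_update = increment(held_by_a, "a", 3)
```

Since the value never decreases, a condition of the form `C < x` stays true once it becomes true on any replica, which is why the slide notes that it can be handled.
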
  40. CRDT example: PN-Counter
    • merge
      • {a: {1,-1}, b: {1,0}, c: {2,0}}
      • {a: {0,0}, b: {2,0}, c: {0,-2}}
      • => {a: {1,-1}, b: {2,0}, c: {2,-2}} => 2
    • update
      • When a accepts {increment, 3}
      • {a: {4,-1}, b: {1,0}, c: {2,0}}
    • Cannot handle the conditional operation c < x
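
A sketch of the PN-Counter in the slide's own representation, where each node keeps a pair (increments, decrements) and decrements are stored as negative numbers. The function names and the tuple encoding are assumptions for the example.

```python
def merge(x, y):
    """Per node: keep the largest increment total and the most negative decrement total."""
    merged = {}
    for node in set(x) | set(y):
        xp, xn = x.get(node, (0, 0))
        yp, yn = y.get(node, (0, 0))
        merged[node] = (max(xp, yp), min(xn, yn))
    return merged


def value(counter):
    """Sum of all increments plus all (negative) decrements."""
    return sum(p + n for p, n in counter.values())


def update(counter, node, amount):
    """Positive amounts go to the increment slot, negative ones to the decrement slot."""
    p, n = counter.get(node, (0, 0))
    updated = dict(counter)
    updated[node] = (p + amount, n) if amount >= 0 else (p, n + amount)
    return updated


first = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
second = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = merge(first, second)
after_update = update(first, "a", 3)
```

Because the value can now go down as well as up, a replica that observes `c < x` cannot be sure another replica has not already decremented below `c`, matching the slide's last bullet.
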
  41. CRDT example: OR-Sets
    • merge
      • a: {“foo”: false, “bar”: true, “baz”: true}
      • + b: {“bar”: true, “baz”: false}
      • => {“foo”: false, “bar”: true, “baz”: true}
      • => [“foo”]
    • update
      • add: a: {} => +“foo” => a: {“foo”: false}
      • remove: a: {“foo”: false} => a: {“foo”: true}
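
The set example on the slide above stores one removed-flag per element. The sketch below follows that simplified representation; the function names are assumptions. Note that a full OR-Set tags every add with a unique identifier so that an element can be added again after removal, which this flag-per-element form (closer to a two-phase set) does not support.

```python
def merge(x, y):
    """An element counts as removed if either replica has seen its removal."""
    return {e: x.get(e, False) or y.get(e, False) for e in set(x) | set(y)}


def members(s):
    """Visible members are the elements whose removed-flag is False."""
    return sorted(e for e, removed in s.items() if not removed)


def add(s, element):
    updated = dict(s)
    updated.setdefault(element, False)  # a removed element stays removed
    return updated


def remove(s, element):
    updated = dict(s)
    if element in updated:
        updated[element] = True
    return updated


held_by_a = {"foo": False, "bar": True, "baz": True}
held_by_b = {"bar": True, "baz": False}
merged = merge(held_by_a, held_by_b)
```

`members(merged)` yields `["foo"]`, as on the slide: `baz` was removed on one replica, so the removal wins after the merge.
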
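  42. CRDT
    • Updates and reads remain possible during a network partition
    • Data for which writes can be processed "concurrently"
    • There are certain constraints on how the value is computed
    • Efficient CRDT implementations are still at the research stage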
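  43. Prediction: late 2010s
    • There is still room for ingenuity in implementation, and distributed DBs that try to satisfy ACID will keep appearing
      • Concurrency control that takes distribution into account
      • CP-type replication and transactions that span data centers
      • Operational know-how spreads
    • Adoption of NoSQL databases will continue for a while (some will be weeded out)
    • Hybrid databases that combine strong consistency with optimistic replication will appear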
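  44. Hybrid databases
    • OLTP databases will start to adopt optimistic replication for the availability and performance of updates
    • Not all business processing requires strong consistency
    • Applications can expect better performance by choosing between strong consistency and optimistic replication
    • SQL and JDBC will probably remain as the interfaces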

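  45. Prediction: 2020s
    • I don't know
    • Stable implementations of hybrid databases appear and spread
    • SSDs and NVM spread, and I/O is no longer the bottleneck
      • Databases may stop being shared-nothing, scale-out designs
      • Memory bandwidth or CPU becomes the bottleneck
    • If new hardware appears, it is again unclear what will happen
    • Technology X from the 2000s makes a comeback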

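  46. Summary
    • Since around 2000, distributed-systems techniques have gradually become an essential building block alongside the two major technical elements of databases
    • A brief explanation of the replication techniques of databases that appeared up to 2015
    • In the latter half of 2015, distributed databases will appear that let you choose between CP and AP replication through the same interface and that satisfy ACID
    • A rough fantasy^H^H^H^H^H^H^Hprediction for 2020
    ※Disclaimer: The content of this material is UENISHI's personal prediction and does not guarantee any particular future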
  47. References
    • Seth Gilbert and Nancy Lynch. 2002. Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services.
    • James C. Corbett et al. 2012. Spanner: Google’s Globally Distributed Database.
    • Yasushi Saito and Marc Shapiro. 2005. Optimistic Replication.
    • Peter Bailis and Kyle Kingsbury. 2014. The Network is Reliable.
    • Peter Bailis et al. 2014. Coordination Avoidance in Database Systems.
    • Mihai Letia et al. 2010. Consistency without Concurrency Control in Large, Dynamic Systems.