Trends in Database Architecture and How to Choose Among Them (データベース アーキテクチャーの動向と使い分け)

Slides presented at QConTokyo ( http://www.qcontokyo.com/KotaUENISHI_2015.html )

UENISHI Kota

April 21, 2015

Transcript

  1. Trends in Database Architecture
    and How to Choose Among Them
    2015/4/21 QConTokyo
    Basho Japan, UENISHI Kota

  2. About me
    • @kuenishi
    • Github, Twitter, etc
    • 7 years in distributed systems
    • I come from Basho Japan
    • Developer of Riak CS
    • I am going to give a spiritual talk

  3. Sponsored by

  4. ACID

  5. Atomicity
    Consistency
    Isolation
    Durability

  6. Atomicity
    Consistency
    Isolation
    Durability

  7. Durability
    “The ACID property which guarantees
    that transactions that have committed
    will survive permanently.”
    http://en.wikipedia.org/wiki/Durability_(database_systems)

  8. Permanently?

  9. Persistence is
    (more than I expected)
    hard work

  10. https://www.flickr.com/photos/bathyporeia/9086009348/

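  11. All things are impermanent (諸行無常)
    सब्बे संखारा अनिच्चा (sabbe saṅkhārā aniccā)
    Mahāparinirvāṇa Sūtra, verse on impermanence

  12. The law of arising and ceasing is said to be suffering,
    but it is not suffering because things arise and cease.
    Suffering arises because we regard what arises and
    ceases as something permanent.
    Wikipedia, "諸行無常"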

  13. Storage media break

  14. Storage media break.
    If so,
    just replicate.

  15. The ideal
    https://www.flickr.com/photos/rebeccalongworth/3445143739/

  16. The reality
    https://www.flickr.com/photos/leeziet/3021612079/
    https://www.flickr.com/photos/pranavbhasin/6109327813/

  17. The CAP theorem

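  18. Replication is hard
    • The CAP theorem
      • Multiple nodes hold objects that start out identical, and a
        sequence of changes keeps being sent to them
      • When messages fail to arrive, not all nodes can hold the same
        sequence of changes
    • The root of the difficulty is that the units of failure were separated
      • We must guarantee that separate things are identical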

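  19. Legend

  20. Solution 0: Master-Slave
    • Guarantees that there is exactly one sequence of update commands
    • Slaves (replicas) only receive the decided sequence of update
      commands and apply it locally
    • Nothing can be done while the master is absent
    [Diagram: two nodes each apply w1: x=a, w2: x=b, r1: read x, w3: x=c; both are labeled c]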
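
The Master-Slave idea on slide 20 can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how MySQL or any other product implements it; the class names `Master` and `Replica` and the synchronous `catch_up` call are assumptions made for the example.

```python
class Replica:
    """A replica that applies the master's decided command sequence in order."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # number of log entries applied so far

    def apply(self, index, key, value):
        # Apply only the next expected entry; reject duplicates or gaps.
        if index != self.applied:
            return False
        self.state[key] = value
        self.applied += 1
        return True


class Master:
    """The single place where the order of updates is decided (hypothetical)."""

    def __init__(self, replicas):
        self.log = []
        self.state = {}
        self.replicas = replicas

    def write(self, key, value):
        index = len(self.log)
        self.log.append((key, value))
        self.state[key] = value
        for replica in self.replicas:
            self.catch_up(replica)
        return index

    def catch_up(self, replica):
        # Ship every entry the replica has not applied yet, in log order.
        while replica.applied < len(self.log):
            key, value = self.log[replica.applied]
            replica.apply(replica.applied, key, value)


slaves = [Replica(), Replica()]
master = Master(slaves)
master.write("x", "a")  # w1
master.write("x", "b")  # w2
master.write("x", "c")  # w3
```

Because only the master appends to the log, every replica sees the same order. The sketch also makes the weakness visible: with no `Master` object, nothing can be written.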

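  21. Solution 1: Consensus
    • The goal is for distinct entities to hold the same state
    • Replication is the so-called distributed agreement problem
    • A communication scheme that solves it is called a consensus protocol
    [Diagram: three nodes, each labeled c; operations w1: x=a, w2: x=b, r1: read x, w3: x=c]

  22. Fools learn from experience,
    the wise learn from history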

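  23. The 1990s: RDBMS adoption
    • Replication happens at the kernel level or below, invisible to the OS
    • Synchronous at the disk level; if one side fails, the whole system fails
    • Hard disks and network bandwidth are scarce
    • Centralized management and operation with RAID and SAN
    • Foundations of transaction and query processing are established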

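  24. The 2000s: the Web era (1/2)
    • Replication at the application and middleware layer (TCP/IP)
      becomes common
    • Synchronous at the network level; operation continues even if one side fails
    • Several systems that can scale out reads appear
    • Shipping deltas from the Master to Slaves (Replicas) is the mainstream
    • MySQL binlog, GFS (BigTable), HDFS (HBase)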

  25. Master-Slave is hard
    • Time lag when switching masters
    • Tolerance to split brain
    [Diagram: two nodes labeled c and b; operations w1: x=a, r1: read x, w3: x=c, w2: x=b]

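  26. The 2000s: the Web era (2/2)
    • Web systems grow more complex and much larger
    • Consensus-based replication becomes practical
    • Things keep working even when the network partitions
    • Dynamo, Chubby, ZooKeeper, SQL Server (2008?~)
    • Paxos (1989), Quorum (1979), etc.
    [Diagram label: 2/3 Ack]

  27. Quorum: consensus is hard
    • A naive protocol design that allows overwrites easily
      corrupts data
    • Not practical in the real world, where anyone can fail, go silent,
      or come back at any time
    [Diagram: three nodes labeled a?, b?, c?; operations w1: x=a, w2: x=b, r1: read x, w3: x=c]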
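
To make the "2/3 Ack" idea concrete, here is a minimal quorum sketch in Python. It assumes three replicas with write quorum W = 2 and read quorum R = 2, and it assumes each write already carries a version number. How versions are assigned safely is exactly the hard part the slide warns about, so the hand-picked versions below are an illustration only.

```python
from itertools import combinations

N, W, R = 3, 2, 2  # replicas, write quorum, read quorum (assumed values)


def quorums_overlap(n, w, r):
    """Check that every write quorum shares a node with every read quorum."""
    nodes = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(nodes, w)
               for rs in combinations(nodes, r))


def write(replicas, targets, version, value):
    """Store (version, value) on the reachable replicas; True when W acks arrive."""
    acks = 0
    for i in targets:
        if version > replicas[i][0]:
            replicas[i] = (version, value)
        acks += 1
    return acks >= W


def read(replicas, targets):
    """Read from at least R replicas and return the value with the highest version."""
    assert len(targets) >= R
    return max((replicas[i] for i in targets), key=lambda vv: vv[0])[1]


replicas = [(0, None)] * N
write(replicas, [0, 1], 1, "a")  # node 2 is unreachable
write(replicas, [1, 2], 2, "b")  # node 0 is unreachable
```

With R + W > N, any read quorum contains at least one node that saw the latest acknowledged write, so `read(replicas, [0, 1])` and `read(replicas, [0, 2])` both return `"b"` even though node 0 is stale.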

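  28. Paxos: consensus is hard
    • A two-phase agreement protocol
      • The Proposer (whoever proposes a value) is decided by majority vote
      • The Proposed Value is decided by majority vote
    • Proposals carry sequence numbers so that old and new can be told apart
    • Even under the constraint that anyone can fail, go silent, or come back
      at any time, agreement is reached given unlimited time and as long as
      a majority has not failed
    • Hard to implement, but doable with enough effort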
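
The two phases on slide 28 can be sketched as a single-decree Paxos round in Python. This is a simplified illustration under strong assumptions: messages are synchronous function calls, nothing is persisted, and failures are not modeled. The names `Acceptor` and `propose` are chosen for this example.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) last accepted, or None

    def prepare(self, n):
        # Phase 1: promise to ignore proposals numbered below n.
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        # Phase 2: accept unless a higher-numbered promise was made.
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run one round; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1
    replies = [a.prepare(n) for a in acceptors]
    promises = [accepted for ok, accepted in replies if ok]
    if len(promises) < majority:
        return None
    # If any acceptor already accepted a value, adopt the newest one.
    prior = [p for p in promises if p is not None]
    if prior:
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "a")   # chooses "a"
second = propose(acceptors, 2, "b")  # must re-propose the chosen "a"
stale = propose(acceptors, 1, "c")   # old sequence number is rejected
```

The second proposer wanted `"b"` but ends up confirming `"a"`: once a value is chosen, later rounds can only choose it again. That is the role of the sequence numbers mentioned on the slide.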

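  29. Classifying consensus-based
    replication
    • CP type
      • Guarantees that replicas are identical
      • Uses algorithms such as Paxos and Raft
      • During a network partition, only nodes on the majority side
        remain usable
    • AP type
      • Tolerates replicas that are not perfectly identical
      • Guarantees causal consistency with Vector Clocks or CRDTs (or
        uses plain timestamps)
      • All nodes remain usable even during a network partition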
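
As a small illustration of how an AP system can track causality, here is a vector clock sketch in Python. Clocks are plain dictionaries from node name to counter; the function names are chosen for this example and do not correspond to any particular database's API.

```python
def increment(clock, node):
    """Return a copy of clock with the node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Element-wise maximum: the clock after seeing both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def descends(a, b):
    """True when a has seen everything b has seen."""
    return all(a.get(n, 0) >= count for n, count in b.items())


def concurrent(a, b):
    """Neither clock descends from the other, so the writes conflict."""
    return not descends(a, b) and not descends(b, a)


v0 = {}
va = increment(v0, "a")             # write accepted by node a
vb = increment(v0, "b")             # concurrent write accepted by node b
vc = increment(merge(va, vb), "a")  # a later write that has seen both
```

Here `va` and `vb` are concurrent, so the system has to keep both versions or resolve the conflict, while `vc` descends from both and can safely replace them.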

  30. Classifying databases
    by replication
    • Master-Slave type
      • Simple to implement, fast
    • Hybrid of consensus and Master-Slave
      • Uses a consensus protocol for master election
      • Replication itself is Master-Slave
    • Consensus type
      • Uses a consensus protocol for replication as well
      • No downtime caused by master failure

  31. Classifying databases
    by replication
    • Master-Slave type
      • MySQL, PostgreSQL
    • Hybrid of consensus and Master-Slave
      • MongoDB, HBase, Redis
    • Consensus type
      • Riak, Cassandra (both have AP and CP modes)
      • CouchBase (CP type)

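  32. The 2010s: the cloud era
    • A category called NewSQL appears
      • FoundationDB, NuoDB
      • Cases where existing NoSQL systems implement SQL (or something SQL-like)
      • Some NewSQL systems satisfy (?) ACID
    • Replication across multiple data centers becomes mandatory
      • Network partitions and latency become more important problems
    • MPP becomes practical for OLAP workloads (distributed query processing)
      • BigQuery, Impala, PrestoDB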

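  33. Review: core database technologies
    • Optimizing query processing
      • Parse the SQL, then build and execute the best query plan
        based on statistics
      • Data placement and indexing strategies that support this
    • Optimizing transaction processing
      • Update data as fast as possible while eliminating anomalies
        and guaranteeing consistency

  34. Core database technologies
    in the cloud era
    • Optimizing query processing in a distributed environment
      • Parallel processing with MPP; speculative execution on failure
      • Localized data placement with nested columnar formats
    • Optimizing transaction processing in a distributed environment
      • Eliminate anomalies even though the data is distributed? Guarantee consistency?
      • Consistency across data centers, not only across nodes, is also a challenge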

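  35. ACID in a distributed DB
    • There is essentially only one practical design
      • Master election by consensus plus M/S replication,
        or CP-type replication
      • A mechanism that guarantees timestamp synchronization
      • Optimistic concurrency control
    • MegaStore (2011), Spanner (2012)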
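
Optimistic concurrency control, named on slide 35, can be illustrated with a single-node Python sketch. It assumes per-key version numbers and validation at commit time; the `Store` class is hypothetical and ignores distribution, timestamps, and durability entirely.

```python
class Store:
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def commit(self, read_set, write_set):
        """Validate, then apply.

        read_set maps key -> version seen; write_set maps key -> new value.
        """
        for key, seen in read_set.items():
            if self.read(key)[0] != seen:
                return False  # someone committed in between: abort
        for key, value in write_set.items():
            version = self.read(key)[0]
            self.data[key] = (version + 1, value)
        return True


store = Store()
store.commit({}, {"x": 10})

# Two transactions read x at the same version, then both try to update it.
v1, x1 = store.read("x")
v2, x2 = store.read("x")
first = store.commit({"x": v1}, {"x": x1 + 1})
second = store.commit({"x": v2}, {"x": x2 + 5})
```

The first commit succeeds and the second aborts because the version it read is stale. Under heavy contention on the same keys, aborts like this become frequent.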

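  36. ACID in a distributed DB
    • The trade-offs apply as they are
      • Availability during a network partition
      • A timestamp management node? → SPOF
      • Applying TSO as-is to OLTP workloads causes
        a storm of aborts

  37. Scaling writes
    • Paxos and similar protocols require a fixed set of consensus members
    • Optimistic replication (2005)
      • Mechanisms that do not satisfy strong consistency but guarantee another
        kind of consistency under particular conditions (eventual consistency, etc.)
      • DNS, etc.
    • Data management schemes that stay consistent even when operations are reordered
      • Vector Clocks, CRDT, boom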

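  38. CRDT
    • One of the data types and replication techniques that make
      optimistic replication easier
    • Conflict-Free Replicated Data Types
    • A combination of data type, data structure, and operators
      that satisfies w1(w2(x)) == w2(w1(x))
    • Updates and reads remain possible during a network partition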

  39. CRDT example: G-Counter
    • merge
      • Data held by a: {a: 1, b: 1, c: 2}
      • Data held by b: {a: 0, b: 2, c: 0}
      • x => {a: 1, b: 2, c: 2} => 5
    • update
      • When a receives {increment, 3}, it becomes {a: 4, b: 1, c: 2}
    • A condition of the form C < x can be evaluated
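
The G-Counter example on slide 39 can be reproduced with a short Python sketch. State is a dictionary from node name to that node's increment total, and merge takes the per-node maximum. The function names are chosen for this example.

```python
def g_merge(a, b):
    """Merge two G-Counter states by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def g_increment(state, node, amount=1):
    """A node raises only its own entry, and only by a non-negative amount."""
    if amount < 0:
        raise ValueError("a G-Counter cannot decrease")
    new = dict(state)
    new[node] = new.get(node, 0) + amount
    return new


def g_value(state):
    """The counter value is the sum of all per-node totals."""
    return sum(state.values())


on_a = {"a": 1, "b": 1, "c": 2}
on_b = {"a": 0, "b": 2, "c": 0}
merged = g_merge(on_a, on_b)           # {a: 1, b: 2, c: 2}
bumped = g_increment(on_a, "a", 3)     # {a: 4, b: 1, c: 2}
```

Because `max` is commutative, associative, and idempotent, replicas can merge in any order and any number of times and still converge, which is the w1(w2(x)) == w2(w1(x)) property from slide 38.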

  40. CRDT example: PN-Counter
    • merge
      • {a: {1,-1}, b: {1,0}, c: {2,0}}
      • {a: {0,0}, b: {2,0}, c: {0,-2}}
      • => {a: {1,-1}, b: {2,0}, c: {2,-2}} => 2
    • update
      • When a accepts {increment, 3}:
      • {a: {4,-1}, b: {1,0}, c: {2,0}}
    • A condition of the form c < x cannot be evaluated
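
The PN-Counter example on slide 40 works the same way with a pair per node. The sketch below follows the slide's notation, where the second element is the decrement total written as a negative number, so merge takes the maximum of the first element and the minimum of the second. Function names are chosen for this example.

```python
def pn_merge(x, y):
    """Per node: max of increment totals, min of (negative) decrement totals."""
    merged = {}
    for node in x.keys() | y.keys():
        px, nx = x.get(node, (0, 0))
        py, ny = y.get(node, (0, 0))
        merged[node] = (max(px, py), min(nx, ny))
    return merged


def pn_update(state, node, amount):
    """Positive amounts grow the first element, negative ones the second."""
    new = dict(state)
    p, n = new.get(node, (0, 0))
    new[node] = (p + amount, n) if amount >= 0 else (p, n + amount)
    return new


def pn_value(state):
    """The counter value is the sum of all increments and decrements."""
    return sum(p + n for p, n in state.values())


s1 = {"a": (1, -1), "b": (1, 0), "c": (2, 0)}
s2 = {"a": (0, 0), "b": (2, 0), "c": (0, -2)}
merged = pn_merge(s1, s2)
bumped = pn_update(s1, "a", 3)
```

Merging gives `{a: (1, -1), b: (2, 0), c: (2, -2)}` with value 2, matching the slide. Since the value can go down as well as up, a threshold check that was true on one replica may no longer be true after a merge.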

  41. CRDT example: OR-Sets
    • merge
      • a: {“foo”:false, “bar”:true, “baz”:true}
      • + b: {“bar”:true, “baz”:false}
      • => {“foo”:false, “bar”:true, “baz”:true}
      • => [“foo”]
    • update
      • add: a: {} => +“foo” => a: {“foo”:false}
      • remove: a: {“foo”:false} => a: {“foo”:true}
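
The set example on slide 41 can be sketched in Python by treating the boolean as a "removed" flag (a tombstone). Note that this follows the slide's simplified notation, which behaves like a two-phase set: a removed element cannot be added back. A full observed-remove set attaches unique tags to each add, which this sketch does not attempt.

```python
def set_merge(x, y):
    """Union of elements; an element is removed if either replica removed it."""
    return {e: x.get(e, False) or y.get(e, False) for e in x.keys() | y.keys()}


def set_add(state, element):
    """Add an element as live; an already removed element stays removed."""
    new = dict(state)
    new.setdefault(element, False)
    return new


def set_remove(state, element):
    """Mark a known element as removed by setting its tombstone flag."""
    new = dict(state)
    if element in new:
        new[element] = True
    return new


def members(state):
    """The visible set: every element whose tombstone flag is not set."""
    return sorted(e for e, removed in state.items() if not removed)


a = {"foo": False, "bar": True, "baz": True}
b = {"bar": True, "baz": False}
merged = set_merge(a, b)
```

Merging ORs the tombstones, so `"baz"` stays removed even though replica b still considers it live, and only `"foo"` remains visible.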

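  42. CRDT
    • Updates and reads remain possible during a network partition
    • Data for which writes can be processed "concurrently"
    • There are certain constraints on how the value can be computed
    • Efficient CRDT implementations are still at the research stage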

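  43. Prediction: the late 2010s
    • There is still room for ingenuity in implementation, and distributed DBs
      that try to satisfy ACID will keep appearing
      • Concurrency control designed with distribution in mind
      • CP-type replication and transactions that span data centers
      • Operational know-how becomes widespread
    • Adoption of NoSQL databases will continue for a while (and some will be
      weeded out)
    • Hybrid databases combining strong consistency and optimistic replication
      will appear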

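  44. Hybrid databases
    • Databases for OLTP will start to adopt optimistic replication
      for the availability and performance of updates
    • Not all business processing necessarily requires strong
      consistency
    • Choosing between strong consistency and optimistic replication on the
      application side can be expected to deliver performance
    • SQL and JDBC will probably remain as the interface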

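  45. Prediction: the 2020s
    • I don't know
    • Stable implementations of hybrid databases appear and spread
    • SSDs and NVM become widespread, and I/O stops being the bottleneck
      • Databases may no longer be Shared Nothing, scale-out designs
      • Memory bandwidth or the CPU becomes the bottleneck
    • If new hardware appears, it is again unclear what will happen
    • Technology X from the 2000s makes a comeback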

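  46. Summary
    • Since around 2000, distributed systems techniques have gradually become
      an essential core technology alongside the two major technical elements
      of databases
    • Briefly explained the database replication techniques that appeared
      up to 2015
    • In the latter half of 2015, distributed databases will appear that satisfy
      ACID and let you choose between CP and AP replication through the same
      interface
    • A rough delusion^H^H^H^H^H^H^H^Hprediction for 2020
    ※Disclaimer: The content of this material is UENISHI's personal prediction and does not guarantee any particular future.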

  47. References
    • Seth Gilbert and Nancy Lynch. 2002. Brewer's conjecture and
      the feasibility of consistent, available, partition-tolerant web
      services.
    • James C. Corbett et al. 2012. Spanner: Google’s Globally
      Distributed Database.
    • Yasushi Saito and Marc Shapiro. 2005. Optimistic Replication.
    • Peter Bailis and Kyle Kingsbury. 2014. The Network is
      Reliable.
    • Peter Bailis et al. 2014. Coordination Avoidance in Database
      Systems.
    • Mihai Letia et al. 2010. Consistency without Concurrency
      Control in Large, Dynamic Systems.