Talk on Database (ja)

Talk on Database (ja)

筑波大学の2013年度の情報システム特別講義Dのスライド

E1923013dacab39eb231a2fffbf7b33c?s=128

UENISHI Kota

January 31, 2014
Tweet

Transcript

  1. 2.

    ͲͪΒ༷Ͱ͔͢ʁ •  @kuenishi •  NTTݚڀॴ ˠ Basho •  Erlang/OTPΛॻ͍ͯ࢓ࣄΛ͍ͯ͠·͢ • 

    ෼ࢄσʔλϕʔεɾ෼ࢄγεςϜͷ঎༻։ൃྺ 6೥ •  ϛυϧ΢ΣΞతͳ΋ͷ͕ൺֱతಘҙͰ͢
  2. 8.

    ྺ্࢙ͷσʔλϕʔε •  IDS (GE, 1963) •  ֊૚ܕDB •  IMS (IBM,

    1966 -), •  ωοτϫʔΫܕDB •  CODASYL (1969, COBOL), •  ࠓ΋ಈ͍͍ͯΔ੡඼΋͋Δ 8
  3. 9.

    1977೥ System R (IBM) •  ؔ܎୅਺ͷཧ࿦ʹج͍ͮͨσʔλૢ࡞ݴޠ SQLΛ࣮૷ͨ͠ੈքॳͷσʔλϕʔε •  Ұ؏ͨ͠σʔλʹର͢Δෳ਺ͷૢ࡞Λશͯ ੒ޭɾશࣦͯഊʹ·ͱΊɺෳ਺ͷߋ৽ཁٻ

    Λฒߦ੍ޚ͠ɺσʔλΛӬଓԽ͢Δτϥϯ βΫγϣϯॲཧΛ࣮૷ͨ͠ੈքॳͷσʔλ ϕʔε •  ۀ຿ॲཧͷϞσϧʹඇৗʹΑ͘Ϛονͨ͠ ͨΊɺരൃతʹීٴɹˠISOͰඪ४Խ •  1983೥ Oracle v3 •  1987೥ Sybase SQL Server (ݱMicrosoft SQL Server) 9
  4. 14.

    Web Scale΁ͷΞϓϩʔν •  Ոఉ༻ίϯϐϡʔλͱಉ͡HW •  LAMP + Memcached(2003) •  Redis

    (2009) •  Google: GFS(2003), BigTable(2006) •  Hadoop (2006), HBase (2006) •  Amazon: Dynamo(2006) •  Cassandra (2008), Riak (2009) 14
  5. 18.

    18

  6. 21.

    21 Locality vs Load Balancing - Use Cases • ઌಡΈΛޮ͔ͤͯόϧΫͰγʔέϯγϟϧΞ

    Ϋηε • - HBase, BigTable, etc • ϋογϡͰ෼ࢄͤͯ͞ෛՙ෼ࢄͯ͠ϥϯμϜ ΞΫηε • - Riak, Cassie, etc
  7. 24.

    ੔߹ੑʹ͸2ͭͷจ຺͕͋Δ •  ڞ௨͢Δͷ͸ʮ୭͕ݟͯ΋ಉ͡Α͏ʹݟ͑Δ͜ͱʯ •  ෳ਺छྨͷσʔλؒͷInvariant͕कΒΕ͍ͯΔ͜ͱ •  “ACID” తͳAnomaly͕ൃੜ͠ͳ͍͜ͱ •  Phantom

    Read, Write Skew, etc etc… •  ෳ਺ͷσʔλͷίϐʔ͕ಉ͡Ͱ͋Δ͜ͱ •  ෳ਺ͷίϐʔΛ҆શʹߋ৽Ͱ͖Δ͜ͱ •  ผʑͷਓͰ΋ಉ͡σʔλΛݟΕ͍ͯΔ͜ͱ
  8. 31.

    ނোతࠔΔ͜ͱͷ෼ྨ Crash Failure ɹ (fail-stop, fail-safe) ނোͨ͠Βࢭ·Δɺࢭ·ͬͨ͜ͱ͕͢ ͙ʹ෼͔Δ Crash Failure

    (fail-silent) ໧ͬͯࢮ͵ Crash Recovery ނোͨ͠ϑϦͯ͠஌ΒΜ΀Γͯ͠ؼͬͯ ͘Δ Omission Failure (receive / send) ϝοηʔδܽམ Timing Failure ·ͱ΋ͳ଎౓Ͱಈ͔ͳ͘ͳΔ Response Failure Ϩεϙϯε͕͓͔͍͠ʢσʔλ͕յΕͯ ͍Δetcʣ Arbitrary Failure Ϗβϯνϯނোɺѱҙͷ͋ΔϠπ͕͍Δ 31
  9. 34.

    34 What is PAXOS? – Example   Two phase election

    – phase 1  Larger n is prior 21:32 8ß: ¼ ÛËÙêîĝ 21:36 O2Ě 21:35 [_ÕðJK 21:36 O2ÙÐÄorz 21:38 O2ÙÐÞorz n=3 n=1 n=2 8
  10. 35.

    35 What is PAXOS? – Example   Two phase election

    – phase 2  confirmation 21:42 Ãğ 21:41 O2ÙÀÀ ëÞĝĚ 21:43 a~Å… 21:44 Ãğ 21:43 GJ proposer 21:43 ÀêO2ÚyÁËÚ Ù|N`çÜßÙ n=4 9
  11. 37.
  12. 43.

    ͱΓ͋͑ͣॻ͍͓͍ͯͯ ॻ͖ࠐΈ͕িಥͨ͠Βʁ •  ྆ํ͓͍࣋ͬͯͯɺিಥͨ͠σʔ λͷʮҼՌؔ܎ʯΛ໌Β͔ʹ͢Δ •  “Vector Clock” •  {a:

    1, b:2, c:1} ͱ {a: 2, b:2, c:1} •  {a: 1, b:2, c:1} ͱ {a: 1, c:1, d:1} •  িಥͨ͠΋ͷ͸ΞϓϦͰղܾ 43 w1 w2 r
  13. 44.

    44 CRDT • CRDT (Conflict-Free Replicated Data Types • “AP”

    Λαϙʔτ • Counter, Register, Sets, Maps • →ผεϥΠυʁ
  14. 45.

    45 CRDTͰͷRead • ॱ൪͕ೖΕସΘͬͯ΋݁Ռ͕มΘΒͳ͍ܕ • update(w1, update(w2, Data0) = update(w2,

    update(w1, Data0) = Data w1 w1 w1 w2 w2 w2 Actor 0 Actor 1 Actor 2 w1(w2(Data0)) => Data w1(w2(Data0)) => Data w2(w1(Data0)) => Data
  15. 46.

    ACID vs BASE ACID BASE ੔߹͍ͯ͠ͳͯ͘΋ ৗʹσʔλʹΞΫη εͰ͖Δ Basically Avaiable

    Atomicity ෳ਺ͷૢ࡞ͷ੒ޭɾ ࣦഊΛ·ͱΊΔ Consistency ৑௕Խ͞Εͨσʔλ ΍ෳ਺ͷϦϨʔγϣ ϯ͕ৗʹ੔߹͍ͯ͠ Δ ࠷ऴతʹ੔߹ͨ͠ঢ় ଶʹͳΔ͜ͱ͕อূ ͞Ε͍ͯΕ͹Α͍ Eventually Consistent Isolation ฒߦ؅ཧͯ͠ߋ৽్ தͷঢ়ଶΛݟͤͳ͍ Durability σʔλΛӬଓԽͯ͠ ࣦΘͳ͍ ϨϓϦΧ͸ܾఆ࿦త Ͱ͸ͳ͘ɺ֬཰తͰ ͋ͬͨΓɺάϩʔό ϧʹҰ؏͍ͯ͠ͳ͘ ͯ΋Α͍ Soft-state 46
  16. 48.

    ΫΠζɿͲ͜·Ͱ͕DB? Ͳ͔͜Β͕NoSQL? ACIDͳτϥϯβΫγϣϯ τϥϯβΫγϣϯͳ͠ SQLΠϯλʔϑ Σʔε Oracle MySQL PostgreSQL SQL

    Server, DB2 Cassandra (CQL) Hive, Presto Impala ಠࣗAPI InnoDB BerkeleyDB FoundationDB Riak MongoDB Redis 48
  17. 50.

    CAPఆཧ͔ΒΈͨ੡඼ͷ෼ྨ •  CAॏࢹ •  RDBMS, MongoDB, HBase, etc… •  ੔߹ੑҡ࣋ͷͨΊͷ෼ࢄ߹ҙʢਖ਼͘͠ϑΣΠϧΦʔ

    όʔ͢ΔͨΊʣͷ࢓૊Έ͕ඞཁ •  APॏࢹ •  Cassandra, Riak, CouchDB •  ෼ࢄ߹ҙͷ࢓૊Έ͕ෆཁʹ࣮૷΋ӡ༻΋؆୯ 50
  18. 52.

    ϦϨʔγϣφϧϞσϧΛఘΊΔ 52 •  ςʔϒϧߏ଄ => KVS (Mapߏ଄) •  Pkey͸ΧϥϜͷͻͱ͔ͭΒબͿ΋ͷ=> ඞਢͷ΋ͷ

    •  JOINΛͤ͞ͳ͍ •  ͜ΕʹΑΓεέʔϧΞ΢τܕͷ෼ࢄ͕Մೳʹ •  εΩʔϚΛࣄલఆٛ͠ͳ͍
  19. 53.

    εΩʔϚ: σʔλϕʔεͷσʔλߏ଄ΛܾΊΔ΋ͷ •  εΩʔϚΛࣄલʹఆٛ: ϦϨʔγϣφϧϞσϧʹԊͬͨ࢖͍ํ •  Pros: σʔλߏ଄ʹڧྗͳ੍໿Λ՝ͨ͢Ί࠷దԽ͠΍͍͢ˍΞϓϦ Λ։ൃ͠΍͍͢ • 

    Cons: ΞϓϦͷมߋίετ͕ߴ͍ˍ࠷ॳʹద੾ʹઃܭ͠ͳ͍ͱ͔ͳ Γมߋʹऑ͍ •  εΩʔϚΛࣄલʹܾΊΔඞཁ͕ͳ͍৔߹ •  Pro: σʔλߏ଄Λޙ͔ΒͰ΋͔ͳΓॊೈʹมߋͰ͖Δ •  Con: ςετ͕ͳ͍ͱόά͕ग़΍͍͢ɺσʔλߏ଄͕ෳࡶͩͱ࠷ద Խ͠ʹ͍͘ 53
  20. 54.

    ৽͍͠σʔλϞσϧ ΧϥϜࢦ޲ •  ΧϥϜΛ͍͘ΒͰ΋૿΍ ͢͜ͱ͕Ͱ͖Δ •  Query LanguageΛ࡞Δ •  ʮΧϥϜϑΝϛϦʔʯ

    •  HBase, Cassandra υΩϡϝϯτࢦ޲ •  ݸʑͷϨίʔυ͕ࣗ༝ͳ ܗࣜΛ࣋ͭ •  MapReduceͰΫΤϦ •  XML, JSON, etc … •  CouchDB, MongoDB 54
  21. 58.

    58

  22. 59.

    ͍··Ͱઆ໌͖ͯͨ͠ͷͰ؆୯ ໰୊ Riak͸Ͳ͏͍ͯ͠Δ͔ εέʔϦϯά ϋογϡతϧʔςΟϯάͰGossipͳਫฏ෼ࢄ ੔߹ੑͷ؅ཧ Vector Clocks, CRDTͳͲRead࣌ʹղܾ 2.0Ͱ͸Paxos΋ೖΔ

    Մ༻ੑͷอূ Hinted HandoffΛ࢖ͬͯৗʹॻ͖ࠐΈ͕Ͱ͖ΔΑ͏ ʹ͍ͯ͠Δ σʔλϞσϧ Blob + Document Based ΠϯσοΫεΛ͸ΕΔ MapReduce͕Ͱ͖Δ ॲཧܥ Erlang VM 59
  23. 60.

    ࣮૷໘ͰͷRiakͷಛ௃ •  ʮ໷தʹΞϥʔτͰى͜͞Εͳ͍ʯ͜ͱΛ໨ࢦͨ͠ •  Erlang VMΛ࢖༻ •  GCʹΑΔఀࢭ͕࣌ؒͳ͍ •  ແఀࢭͰͷύονద༻ɺղੳɺૢ࡞͕Մೳ

    •  ίϚϯυମܥɾϑΝΠϧߏ଄ɾσʔλߏ଄ΛͳΔ΂͘γϯϓϧʹઃ ܭ͠ӡ༻ෛ୲Λ௿ݮ •  ҆ఆੑΛॏࢹͨ͠ઃܭ •  ίϯύΫγϣϯɺ༧ଌՄೳੑ 60
  24. 62.

    ࣮૷໘΋ߟྀͨ͠ ੡඼બ୒ͷϙΠϯτ •  JOINͳͲσʔλͷਖ਼نԽ͕ඞཁͳͱ͖: RDBMS •  ΦϯϝϞϦͰ࠷৽ͷ࣌ܥྻσʔλΛݟ͍ͨͱ͖: MongoDB •  ΧϥϜΛ͍͘ΒͰ΋૿΍͍ͨ͠ͱ͖

    •  SQL෩ͷΫΤϦΛ͍ͨ͠ͱ͖ or Մ༻ੑ: Cassandra •  େྔσʔλʹޮ཰తʹόονॲཧΛ͍ͨ͠ͱ͖: Hbase •  σʔλΛͳ͘͞ͳ͍&μ΢ϯλΠϜΛͳ͍ͨ͘͠ͱ͖: Riak 62