Slide 1

Slide 1 text

Small-Start Data Utilization with GCP (#bq_sushi ver.)
bq_sushi #4, 2016-09-06
Takashi Nishibayashi

Slide 2

Slide 2 text

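Takashi Nishibayashi (@hagino3000)
Software Engineer, Zucks AdNetwork, Zucks Inc., Data analysis team
Currently works on optimizing ad delivery efficiency: developing the automatic bid-price adjustment logic and the ad-selection logic of the delivery servers.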

Slide 3

Slide 3 text

What is this?
A condensed version of the talk given in the case-study session at GCP NEXT TOKYO on the same day.

Slide 4

Slide 4 text

The evolution of data utilization at Zucks AdNetwork

Slide 5

Slide 5 text

Ideal and reality at the start of the project

Slide 6

Slide 6 text

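Goal (tentative)
On the ad delivery servers, run conversion prediction and click-through-rate prediction with machine-learning models for every impression, raising delivery efficiency.

Reality
A large volume of log files sits in AWS S3 in a variety of formats.
Master data is stored in MySQL.
Elasticsearch holds only the most recent two weeks of data.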

Slide 7

Slide 7 text


Slide 8

Slide 8 text

You cannot get there in one leap.

Slide 9

Slide 9 text

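Phase 1: First, make the data usable by data scientists
✓ Machine learning is popular in the online advertising industry, but we wanted to verify whether it is feasible with our own service's data.
✓ People need easy access to the data for experiments and hypothesis testing.
✓ It is enough if a limited number of people can run queries and aggregations.
✓ Response times of a few hundred milliseconds are not required.
✓ It is important that managing the data store takes little effort.
✓ Data volume is about 600 GByte/day and looks set to keep growing.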

Slide 10

Slide 10 text

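Phase 1: First, make the data usable by data scientists
◇ Loaded the ad delivery logs into BigQuery.
◇ Synced the MySQL master data to BigQuery as well.
◇ Used through the Web UI, Pandas, and BigQuery Python.
◇ Subsampled in BigQuery and trained models on a local machine.
◇ Retired AWS EMR.
◇ Retired Elasticsearch.
◇ Jumped on the Cloud Datalab beta and crashed and burned (January 2016).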
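The subsampling step can be pictured roughly as follows. This is a minimal sketch, not the query used in the talk: it assumes BigQuery standard SQL, and the table name `my_project.ads.impressions`, the column `impression_id`, and the helper `build_subsample_query` are all hypothetical. It only builds the query string; actually running it requires a BigQuery client.

```python
def build_subsample_query(table: str, key_column: str, percent: int) -> str:
    """Build a standard SQL query that keeps roughly `percent`% of the rows.

    Hashing the key makes the sample deterministic: the same rows are
    selected on every run, unlike a RAND()-based filter.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE MOD(ABS(FARM_FINGERPRINT(CAST({key_column} AS STRING))), 100) "
        f"< {percent}"
    )


# Hypothetical table and column names, for illustration only.
query = build_subsample_query("my_project.ads.impressions", "impression_id", 1)
print(query)
```

The resulting string could then be handed to whichever client is in use (for example `pandas_gbq.read_gbq(query, project_id=...)`) so that the reduced data set fits on a local machine.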

Slide 11

Slide 11 text

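Phase 2: Make the data usable from batch jobs
✓ We want to run continuously repeated experiments and prediction batches from cron.
✓ We want batch jobs on the delivery-system side to use it too, not only analysis tasks.
✓ We want to keep track of usage per feature (query cost and so on).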

Slide 12

Slide 12 text

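Phase 2: Make the data usable from batch jobs
◇ Configured Cloud Logging to export the BigQuery audit logs into BigQuery.
◇ Issued a service account per feature to keep track of usage.
◇ Send a notification when cost spikes.
◇ The automatic bid-price adjustment batch and the fraudulent-click detection batch are in production.
◇ Rule-based and anomaly-detection-based classification tasks can be written in SQL.
◇ Experiment results are saved to Cloud Storage / BigQuery.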
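The slide does not say how a cost spike is detected. One simple possibility, shown here purely as an illustration, is to compare today's cost with the mean and standard deviation of recent days. The function name `is_cost_spike`, the threshold of 3 standard deviations, and the sample figures are assumptions.

```python
from statistics import mean, stdev


def is_cost_spike(history, today, threshold=3.0):
    """Return True if `today` exceeds mean + threshold * stdev of `history`.

    `history` is a list of past daily costs (any unit, as long as `today`
    uses the same one). With fewer than two data points the spread cannot
    be estimated, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    return today > mean(history) + threshold * stdev(history)


# Made-up daily query costs for one service account.
past_days = [10, 12, 11, 9, 10]
print(is_cost_spike(past_days, 30))  # far above the usual range
print(is_cost_spike(past_days, 12))  # within the usual range
```

A notification hook (mail, chat, and so on) would be called wherever this returns True; that part is left out because the talk does not describe it.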

Slide 13

Slide 13 text


Slide 14

Slide 14 text


Slide 15

Slide 15 text

Uses of the audit log
◇ Query cost per feature
◇ Query cost per day
◇ Finding out who created test tables
◇ Finding tables that are no longer used
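As a rough illustration of the first use, the sketch below sums billed bytes per service account and converts the total to a cost. The flattened field names `principal_email` and `total_billed_bytes` are hypothetical stand-ins for the nested audit-log fields, and the price of 5 USD per TiB is an assumption that should be checked against current pricing.

```python
from collections import defaultdict

USD_PER_TIB = 5.0        # assumed on-demand price, not taken from the talk
BYTES_PER_TIB = 2 ** 40


def cost_by_account(records):
    """Sum billed bytes per account and convert the totals to USD."""
    billed = defaultdict(int)
    for record in records:
        billed[record["principal_email"]] += record["total_billed_bytes"]
    return {
        account: total / BYTES_PER_TIB * USD_PER_TIB
        for account, total in billed.items()
    }


# Made-up audit-log rows; one service account per feature.
rows = [
    {"principal_email": "bid-adjust@example.iam", "total_billed_bytes": 2 ** 40},
    {"principal_email": "bid-adjust@example.iam", "total_billed_bytes": 2 ** 40},
    {"principal_email": "fraud-check@example.iam", "total_billed_bytes": 2 ** 40},
]
print(cost_by_account(rows))
```

Because each feature runs under its own service account, grouping by account is what yields the per-feature cost; grouping by date instead gives the per-day view.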

Slide 16

Slide 16 text

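Phase 3: Make the data usable by members in every role
✓ Engineers do not want to be stuck with routine investigation tasks.
✓ We want to grow the number of users without costs exploding.
✓ Things should get better as more people are able to write SQL.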

Slide 17

Slide 17 text

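Phase 3: Make the data usable by members in every role
◇ Made the data queryable from re:dash.
◇ Engineers write template queries based on requests.
◇ Also used for prototyping report screens.
◇ A per-query cost limit (a re:dash feature) prevents expensive queries from running.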
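The idea behind a per-query limit can be sketched as below. This is not re:dash's implementation: it assumes the number of bytes a query would scan is already known (for example from a dry run), and the names `QueryTooExpensive` and `enforce_scan_limit`, as well as the choice of 1 MB = 1,000,000 bytes, are hypothetical.

```python
class QueryTooExpensive(Exception):
    """Raised when a query would scan more data than the configured limit."""


def enforce_scan_limit(estimated_bytes: int, limit_mb: int) -> int:
    """Return `estimated_bytes` if within the limit, otherwise raise."""
    limit_bytes = limit_mb * 1_000_000  # assumption: decimal megabytes
    if estimated_bytes > limit_bytes:
        raise QueryTooExpensive(
            f"query would scan {estimated_bytes} bytes, limit is {limit_bytes}"
        )
    return estimated_bytes


# A 5 MB scan passes a 10 MB limit.
print(enforce_scan_limit(5_000_000, limit_mb=10))
```

Checking before execution is what keeps the cost bounded: the expensive query is rejected instead of being billed and then reported.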

Slide 18

Slide 18 text

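The required level of data quality changes too
✓ As use cases increase, data quality becomes an issue.
✓ "I want to run my job right after the 23:00-hour logs have finished loading. Can I?"
◇ A retry mechanism is mandatory for streaming inserts, batch inserts, and queries alike.
◇ About once a month, BigQuery has a bad day.
◇ Run a batch that checks for missed loads and duplicate loads.
◇ Provide a way to check the data load status from outside.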
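A retry mechanism of the kind called for here might look like the following. It is a generic sketch using exponential backoff with jitter, not the code used at Zucks; the helper name `retry` and its parameters are assumptions, and real code should catch only errors known to be transient rather than every `Exception`.

```python
import random
import time


def retry(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `func` until it succeeds or `max_attempts` is reached.

    The wait doubles after each failure (base_delay, 2x, 4x, ...) and a
    random jitter of up to the same amount is added so that many clients
    do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:  # sketch only: narrow this to transient errors
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Any insert or query call can be wrapped as `retry(lambda: run_query(sql))`, where `run_query` stands for whatever client call is in use. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.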

Slide 19

Slide 19 text

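Side benefits
•  Engineers can investigate the delivery logs at any time.
•  Decisions can be based on data too large for MySQL to handle.
•  A variety of batch jobs can use the data.
•  Reports can be created freely just by writing SQL.
•  Every member of the project can access the data.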

Slide 20

Slide 20 text

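Miscellaneous
•  BigQuery is not suited to processing that looks up data online on every request.
•  It is better to make the data retrievable by key and use Bigtable.
•  Some teams put a cache layer in front of BigQuery.
•  Cloud Dataproc or Cloud Dataflow……
•  Spotify reportedly found Spark too complex to use and uses Dataflow from Scala.
•  https://github.com/spotify/scio
•  Cloud Datalab has apparently been renewed, so hopes are high.
•  A cloud version of Jupyter Notebook.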
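The cache-layer idea mentioned above can be illustrated with a small in-process cache whose entries expire after a fixed time. This is only a sketch of the general technique: the class name `TTLCache` and the loader function are hypothetical, and a production setup would more likely use a shared store than a Python dictionary.

```python
import time


class TTLCache:
    """Cache results of a slow lookup for `ttl_seconds` per key."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # slow function, e.g. one that runs a query
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]        # still fresh: skip the slow lookup
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


# Stand-in for an expensive query keyed by advertiser id.
def slow_lookup(advertiser_id):
    return {"advertiser_id": advertiser_id, "clicks": 123}


cache = TTLCache(slow_lookup, ttl_seconds=300)
print(cache.get("adv-1"))  # calls slow_lookup
print(cache.get("adv-1"))  # served from the cache
```

Online requests then hit the cache in the common case, and the slow backend is queried at most once per key per expiry interval.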

Slide 21

Slide 21 text

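Summary
•  Aiming straight at the hard problem takes a long time to pay off, so we are advancing data utilization while laying the groundwork step by step.
•  Rule-based and anomaly-detection-based processing that can be written in SQL delivers results sooner than machine learning.
•  Integration with Cloud Storage, Cloud Logging, and Cloud Dataproc has been strengthened, which has widened BigQuery's use cases.
•  As long as you do not need response times of a few hundred milliseconds, high query concurrency, or high stability, BigQuery can be used at a reasonable cost.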

Slide 22

Slide 22 text

Appendix
Notes on queries for computing statistics in BigQuery
http://qiita.com/hagino3000/items/e9ed62638ebe54391188

Slide 23

Slide 23 text

Thank You