Small-Start Data Utilization with GCP (GCPではじめるスモールスタートなデータ活用)

2016-09-06
Slides presented at bq_sushi #4.

Takashi Nishibayashi

September 06, 2016

Transcript

  1. 1
    Small-Start Data Utilization with GCP
    #bq_sushi ver.
    bq_sushi #4
    2016-09-06
    Takashi Nishibayashi

  2. 2
    Takashi Nishibayashi

    Software Engineer
    Zucks AdNetwork, Zucks Inc.
    Data analysis team

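    Currently working on optimizing ad delivery efficiency: developing the
    automatic bid-price adjustment logic and the ad selection logic for the
    delivery servers.
    @hagino3000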

  3. 3
    What is this?
    A condensed version of the talk I gave the same day in the case-study
    session at GCP NEXT TOKYO.

  4. 4
    The evolution of data utilization at Zucks AdNetwork

  5. 5
    Ideal versus reality at the start of the project

  6. 6
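    The goal (tentative)
    On the ad delivery servers, predict conversions and click-through rate
    with a machine learning model on every impression to raise delivery
    efficiency.
    The reality
    Large volumes of log files sit in AWS S3 in a variety of formats.
    Master data is stored in MySQL.
    Elasticsearch holds only the most recent two weeks.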

  7. 7

  8. 8
    We cannot get there in one step

  9. 9
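    Phase 1: First, make the data usable by data scientists
    ✓  Machine learning is popular in the online advertising industry, but we
       wanted to verify that it is feasible with our own service's data
    ✓  People need easy access to the data for experiments and hypothesis testing
    ✓  It is enough if a limited set of people can run queries and aggregations
    ✓  Response times in the hundreds of milliseconds are not required
    ✓  The data store must take little effort to manage
    ✓  Data volume is about 600 GByte/day and likely to keep growing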

  10. 10
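    Phase 1: First, make the data usable by data scientists
    •  Loaded the ad delivery logs into BigQuery
    •  Synced the MySQL master data to BigQuery as well
    •  Used through the web UI, pandas, and BigQuery Python
    •  Subsampled in BigQuery and trained models on a local machine
    •  Retired AWS EMR
    •  Retired Elasticsearch
    •  Jumped on the Cloud Datalab beta and got burned (January 2016)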
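
The subsample-then-train-locally step can be sketched as follows. This is a minimal sketch, not the speaker's actual query: the table name `ads.delivery_log`, the key column `request_id`, and the 1% rate are hypothetical, and it assumes BigQuery standard SQL, where `FARM_FINGERPRINT` hashes a string to an INT64. Hashing a key rather than calling `RAND()` keeps the sample stable across runs.

```python
def build_subsample_query(table: str, key_column: str, percent: int) -> str:
    """Return a BigQuery standard SQL query that keeps roughly `percent`% of rows.

    Rows are bucketed 0..99 by hashing `key_column`, so the same rows are
    selected on every run (unlike RAND()).
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in 1..100")
    return (
        f"SELECT * FROM `{table}` "
        f"WHERE MOD(ABS(FARM_FINGERPRINT(CAST({key_column} AS STRING))), 100) < {percent}"
    )


# Hypothetical table and column names, for illustration only.
sql = build_subsample_query("ads.delivery_log", "request_id", 1)
print(sql)

# With credentials configured, the result could be pulled into pandas for
# local training, e.g. with the google-cloud-bigquery client:
#   from google.cloud import bigquery
#   df = bigquery.Client().query(sql).to_dataframe()
```

Only the query builder runs here; the commented lines show one way the sampled rows might reach a local DataFrame.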

  11. 11
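    Phase 2: Make the data usable from batch jobs
    ✓  We want to run recurring experiments and prediction batches from cron
    ✓  Batch jobs on the delivery-system side want to use the data too, not
       only analysis tasks
    ✓  We want to track usage (query cost and so on) per feature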

  12. 12
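    Phase 2: Make the data usable from batch jobs
    •  Configured Cloud Logging to export the BigQuery audit logs to BigQuery
    •  Issued one service account per feature to track usage
    •  Send a notification when cost spikes
    •  The automatic bid-price adjustment batch and the fraudulent-click
       detection batch are in production
    •  Rule-based and anomaly-detection-based classification tasks can be
       written in SQL
    •  Experiment results are saved to Cloud Storage/BigQuery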
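
To illustrate why rule-based classification fits SQL well, here is a minimal sketch of a fraudulent-click rule. It is not the production logic from the talk: the `clicks` table, its columns, and the threshold of 3 clicks per IP address, ad, and hour are hypothetical, and Python's built-in `sqlite3` stands in for BigQuery so the example runs locally.

```python
import sqlite3

# Rule: an IP address with more than `max_clicks` clicks on the same ad
# within one hour bucket is flagged as suspicious.
RULE_SQL = """
SELECT ip, ad_id, hour, COUNT(*) AS clicks
FROM clicks
GROUP BY ip, ad_id, hour
HAVING COUNT(*) > :max_clicks
ORDER BY ip, ad_id, hour
"""


def find_suspicious_clicks(rows, max_clicks=3):
    """Load (ip, ad_id, hour) click rows into SQLite and apply the rule."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE clicks (ip TEXT, ad_id INTEGER, hour TEXT)")
        conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
        return conn.execute(RULE_SQL, {"max_clicks": max_clicks}).fetchall()
    finally:
        conn.close()


# Five clicks from one address and one click from another, same ad and hour.
sample = [("10.0.0.1", 7, "2016-09-06T23")] * 5 + [("10.0.0.2", 7, "2016-09-06T23")]
print(find_suspicious_clicks(sample))
```

The whole rule lives in one `GROUP BY ... HAVING` statement, so changing the threshold or the grouping keys is an edit to the query rather than to application code.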

  13. 13


  14. 14


  15. 15
    What the audit log is used for
    •  Query cost per feature
    •  Query cost per day
    •  Finding out who created test tables
    •  Finding tables that are no longer used
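
Per-feature query cost amounts to grouping billed bytes by the service account that ran each query. A minimal sketch, assuming the exported audit-log rows have already been reduced to (service account, billed bytes) pairs; the account names, the 5 USD per TiB price, and the budget threshold are illustrative assumptions, not figures from the talk.

```python
from collections import defaultdict

TIB = 2 ** 40


def cost_by_account(records, usd_per_tib=5.0):
    """Sum billed bytes per service account and convert to an estimated cost."""
    billed = defaultdict(int)
    for account, billed_bytes in records:
        billed[account] += billed_bytes
    return {account: total / TIB * usd_per_tib for account, total in billed.items()}


def accounts_over_budget(costs, budget_usd):
    """Return the accounts whose estimated cost exceeds the budget, sorted by name."""
    return sorted(account for account, usd in costs.items() if usd > budget_usd)


# Hypothetical one-day extract: one service account per feature.
records = [
    ("bid-adjuster@example.iam.gserviceaccount.com", 2 * TIB),
    ("fraud-detector@example.iam.gserviceaccount.com", TIB // 2),
    ("bid-adjuster@example.iam.gserviceaccount.com", 1 * TIB),
]
costs = cost_by_account(records)
print(accounts_over_budget(costs, budget_usd=10.0))
```

The list returned by `accounts_over_budget` is the kind of signal a cost-spike notification could be driven from.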

  16. 16
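    Phase 3: Make the data usable by members in every role
    ✓  Engineers do not want to own routine investigation tasks
    ✓  We want to add users without costs exploding
    ✓  Things should improve as more people are able to write SQL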

  17. 17
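    Phase 3: Make the data usable by members in every role
    •  Made the data queryable from re:dash
    •  Engineers write template queries based on requests
    •  Also used to prototype report screens
    •  A per-query cost limit (a re:dash feature) prevents expensive queries
       from running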
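
The idea behind a per-query cost limit can be sketched as a check on the estimated number of bytes a query would scan before it is allowed to run. This illustrates the concept only and is not re:dash's implementation; the function name, exception name, and limit value are hypothetical. The estimate itself can come from a BigQuery dry run, which reports the bytes a query would process without executing it.

```python
class QueryTooExpensive(Exception):
    """Raised when a query's estimated scan size exceeds the configured limit."""


def check_scan_limit(estimated_bytes: int, limit_mb: int) -> None:
    """Refuse to proceed when the dry-run estimate is above `limit_mb` megabytes."""
    limit_bytes = limit_mb * 1024 * 1024
    if estimated_bytes > limit_bytes:
        raise QueryTooExpensive(
            f"query would scan {estimated_bytes} bytes, limit is {limit_bytes}"
        )


# With the google-cloud-bigquery client, an estimate could be obtained by a
# dry run (requires credentials, so it is shown only as a comment):
#   from google.cloud import bigquery
#   config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
#   estimated = bigquery.Client().query(sql, job_config=config).total_bytes_processed

check_scan_limit(estimated_bytes=50 * 1024 * 1024, limit_mb=100)  # under the limit
```

Rejecting the query before execution is what keeps the limit free of charge: the expensive scan never starts.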

  18. 18
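    The required level of data quality changes too
    ✓  As use cases multiply, data quality becomes an issue
    ✓  "I want to start my job right after the 23:00 hour's logs finish
       loading. Can I?"
    •  Retry mechanisms are mandatory for Stream Insert, Batch Insert, and
       queries alike
    •  About once a month, BigQuery has a bad day
    •  A batch job checks for missed and duplicated loads
    •  A mechanism lets people outside the team check the data load status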
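
A retry mechanism for inserts and queries is often built as exponential backoff around the failing call. A minimal sketch under stated assumptions: `with_retries`, the attempt count, and the delays are hypothetical, and real code would catch only the transient error types raised by the BigQuery client rather than every `Exception`.

```python
import time


def with_retries(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure.

    The last failure is re-raised once `max_attempts` calls have been made.
    `sleep` is injectable so the backoff can be exercised without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Demo: an operation that fails twice before succeeding.
calls = {"n": 0}


def flaky_insert():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary backend error")
    return "inserted"


waits = []  # records the requested delays instead of actually sleeping
result = with_retries(flaky_insert, sleep=waits.append)
print(result, waits)
```

In the demo the delays are recorded rather than slept, which shows the doubling schedule without slowing the example down.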

  19. 19
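    By-products
    •  Engineers can investigate the delivery logs at any time
    •  Decisions can be based on data of a size MySQL could not handle
    •  A variety of batch jobs can use the data
    •  Reports can be created freely just by writing SQL
    •  Every member of the project can access the data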

  20. 20
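    Other notes
    •  BigQuery is not suited to online processing that looks up data on every request
    •  Better to make the data addressable by key-value and use Bigtable
    •  Some deployments put a cache layer in front of BigQuery
    •  Cloud Dataproc or Cloud Dataflow……
    •  Spotify reportedly found Spark too complex to use and uses Dataflow from Scala
    •  https://github.com/spotify/scio
    •  Cloud Datalab has reportedly been renewed, so there is hope for it
    •  The cloud version of Jupyter Notebook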

  21. 21
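    Summary
    •  Aiming straight at the hard problem takes a long time to pay off, so we
       are advancing data utilization while laying the groundwork
    •  Rule-based and anomaly-detection-based processing that can be written
       in SQL delivers results sooner than machine learning
    •  Integration with Cloud Storage, Cloud Logging, and Cloud Dataproc has
       been strengthened, which expanded BigQuery's use cases
    •  If you do not need response times in the hundreds of msec, many
       concurrent queries, or high stability, BigQuery is reasonably priced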

  22. 22
    Appendix
    Notes on queries for computing statistics in BigQuery
    http://qiita.com/hagino3000/items/e9ed62638ebe54391188

  23. 23
    Thank You