Building a Data Analysis Platform with Ruby

Altech
July 14, 2018

Rails Developers Meetup 2018 Day 3 Extreme https://techplay.jp/event/679666


Transcript

  1. ©2018 Wantedly, Inc. Building a Data Analysis Platform with Ruby
    How Data Analysis Evolved in a Rails Application
    Rails Developer Meetup 2018 Day 3 Extreme - Sohei Takeno
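  2. Self-introduction
    Sohei Takeno (@Altech_)
    • At Wantedly since …
    • Worked on Wantedly Visit development for … years
      - Growth, search and recommendation, data platform, development platform, etc.
    • Currently developing the backend of Wantedly People
    • Favorite programming language: Ruby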
  3. Agenda
    • History of our data analysis
    • The data analysis platform

  4. Agenda
    • History of our data analysis
      - Why we want to analyze data
      - How we did it when the service launched
      - Problems that arose as we grew
    • The data analysis platform
  5. Case study: Wantedly Visit
    • In service since …
    • A monolithic Rails application
      - Our data analysis changed along with each phase of the service

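  6. Why we want to analyze data
    • To change how the application behaves based on user behavior
      - e.g. whether a user has passed an important conversion step
      - e.g. follow-up at each fine-grained step on the way to conversion
      - "If the user has viewed XX N times"
      - Optimization with machine learning
    • To offer analytics features to users
      - e.g. "Yesterday's PV and total PV to date"
      - e.g. "A PV ranking for comparing against others"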
  7. Why we want to analyze data
    • To make decisions based on data
      - e.g. calculating KPIs
      - e.g. calculating key metrics related to KPIs
      - e.g. A/B tests
  8. How we did it when the service launched (from …)

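  9. Changing application behavior based on user behavior
    • Whether a user has passed an important conversion step
      → Naturally, this is already stored in the relational database
    • "If user Y has viewed an item N times"
      → Keep a count every time the event happens, or compute it from logs
    • "If user Y has viewed item X"
      → Store logs in the relational database and query them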
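
Slide 9 names two ways to evaluate a rule such as "user Y has viewed an item N times": keep a count that is updated whenever the event happens, or compute it from logs. A minimal in-memory sketch of both follows, assuming a hypothetical `ViewEvent` record and helper names of our own; in the talk this data lived in the relational database, not in Ruby objects.

```ruby
# Hypothetical record for one "user viewed item" event.
ViewEvent = Struct.new(:user_id, :item_id)

# Option 1: keep a running count, incremented every time the event happens.
class ViewCounter
  def initialize
    @counts = Hash.new(0)
  end

  def record(user_id, item_id)
    @counts[[user_id, item_id]] += 1
  end

  def count(user_id, item_id)
    @counts[[user_id, item_id]]
  end
end

# Option 2: keep the raw view log and compute the count when it is needed.
def count_from_log(log, user_id, item_id)
  log.count { |e| e.user_id == user_id && e.item_id == item_id }
end

# The application-side rule: "if user Y has viewed the item N times".
def viewed_at_least?(count, n)
  count >= n
end

log = [
  ViewEvent.new(1, :a), ViewEvent.new(1, :a),
  ViewEvent.new(2, :a), ViewEvent.new(1, :b)
]
counter = ViewCounter.new
log.each { |e| counter.record(e.user_id, e.item_id) }

p counter.count(1, :a)                      # => 2
p count_from_log(log, 1, :a)                # => 2
p viewed_at_least?(counter.count(1, :a), 3) # => false
```

The counter makes reads cheap but must be updated on every event; the log keeps full detail but keeps growing, which is the storage pressure described on slide 12.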
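  10. Offering analytics features to users
    • "Yesterday's PV and total PV to date"
      → Build daily PV from the access logs as intermediate data
    • "Show a PV ranking so users can compare"
      → Write a query that aggregates the entire intermediate data set, and lighten the load with the Rails cache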
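
The two features on slide 10 can be sketched in plain Ruby, assuming a hypothetical `Access` log record. In the talk the intermediate data lived in the database and the ranking result was kept in the Rails cache; neither is modeled here. The sketch mainly shows that the ranking has to touch every row of the intermediate data.

```ruby
require "date"

# Hypothetical access-log record: which page was viewed, and on which day.
Access = Struct.new(:page_id, :accessed_on)

# Intermediate data: PV per [page_id, day], built from the raw access log.
def daily_page_views(accesses)
  accesses.group_by { |a| [a.page_id, a.accessed_on] }
          .transform_values(&:size)
end

# "Yesterday's PV and total PV to date" for one page.
def yesterday_and_total(daily, page_id, today)
  yesterday = daily.fetch([page_id, today - 1], 0)
  total = daily.sum { |(pid, _day), pv| pid == page_id ? pv : 0 }
  [yesterday, total]
end

# "PV ranking": aggregates over the whole intermediate data set, so its cost
# grows with the number of days and pages stored.
def pv_ranking(daily)
  totals = Hash.new(0)
  daily.each { |(pid, _day), pv| totals[pid] += pv }
  totals.sort_by { |pid, pv| [-pv, pid.to_s] }
end

today = Date.new(2018, 7, 14)
log = [
  Access.new(:a, today - 1), Access.new(:a, today - 1), Access.new(:a, today - 2),
  Access.new(:b, today - 1), Access.new(:b, today - 2), Access.new(:b, today - 2),
  Access.new(:b, today - 3)
]
daily = daily_page_views(log)
p yesterday_and_total(daily, :a, today) # => [2, 3]
p pv_ranking(daily)                     # => [[:b, 4], [:a, 3]]
```

Because `pv_ranking` scans every `[page, day]` entry, its cost rises with the service's history, which is the slowdown described on slide 12.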
  11. Problems that arose as we grew (from …)

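  12. Problems that arose as we grew
    • "Store logs in the relational database"
      - Squeezed DB capacity and made aggregate computation awkward, so it ended up as a half-measure
    • "Write a query that aggregates the entire intermediate data set, and use the Rails cache"
      - Over time, what we had thought of as intermediate data itself became heavy
      - As users grew, so did concurrent access, and whenever the Rails cache entry expired, recomputations ran concurrently
      - The combination of the two even led to an incident where the DB ran out of memory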
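
Slide 12 describes recomputations running concurrently when the Rails cache entry expired. The sketch below reproduces that mechanism with hypothetical in-process caches (not the Rails cache): `NaiveCache` lets every concurrent caller that sees a miss recompute, while `LockingCache` shows one common mitigation, serializing recomputation so that only the first caller computes. The talk does not say this mitigation was used; its remedy was moving analysis out of the Rails app.

```ruby
# Check-then-compute with no coordination: concurrent callers can all miss.
class NaiveCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

# One lock around check-and-compute: the first caller computes, the rest
# wait and then read the stored value. (A single global lock also serializes
# unrelated keys; fine for a sketch, not for production.)
class LockingCache
  def initialize
    @store = {}
    @mutex = Mutex.new
  end

  def fetch(key)
    @mutex.synchronize do
      return @store[key] if @store.key?(key)

      @store[key] = yield
    end
  end
end

# Fires `clients` concurrent requests at an empty cache and counts how many
# times the expensive computation actually ran.
def count_recomputations(cache, clients: 10)
  computations = 0
  counter_lock = Mutex.new
  threads = Array.new(clients) do
    Thread.new do
      cache.fetch(:pv_ranking) do
        counter_lock.synchronize { computations += 1 }
        sleep 0.1 # stands in for a heavy aggregate query
        :ranking
      end
    end
  end
  threads.each(&:join)
  computations
end

puts count_recomputations(NaiveCache.new)   # usually prints 10
puts count_recomputations(LockingCache.new) # prints 1
```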
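  13. Problems that arose as we grew
    • Data warehouse adoption spread
      - TreasureData (from …), BigQuery (from …)
    • As the organization and the service grew, so did data analysis for decision-making
      - Many analysis queries were managed in a web UI
        → Nobody could tell who wrote a query, with what intent, or how it had been changed
        → Queries turned out to be wrong, leading to wrong decisions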
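  14. Problems that arose as we grew
    • Every change to analysis logic required deploying the entire, now-large Rails application
      - Productivity could not improve
    • Job quality was inconsistent
      - Some jobs were idempotent, others were not
      - Some jobs could be re-run, others could not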
  15. Agenda
    • History of our data analysis
    • The data analysis platform

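  18. Usage
    • Write the results of a BigQuery query out to a table, by date

```ruby
# Where to write: table daily_page_views, update mode keyed on :day
export do
  table :daily_page_views
  columns [:day, :pv]
  mode :update, [:day]
end

# When to run: once a day
schedule do
  frequency :daily
end

# What to run: a BigQuery query
run :bq, <<SQL
SELECT
  DATE(_WT_SCHEDULED_TIME, '+09:00') AS day,
  COUNT(*) AS pv
FROM `log.accesses*`
WHERE _TABLE_SUFFIX = FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY))
SQL
```

    • _WT_SCHEDULED_TIME: a pseudo-variable representing the job's scheduled execution time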
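
Slide 18 only says that `_WT_SCHEDULED_TIME` is a pseudo-variable for the job's scheduled execution time; how it gets expanded is not shown. A minimal sketch, assuming plain text substitution with a BigQuery `TIMESTAMP` literal (the helper name and the substitution strategy are hypothetical):

```ruby
# Pseudo-variable name taken from the slide; everything else is an assumption.
PSEUDO_VARIABLE = "_WT_SCHEDULED_TIME"

# Replaces each whole-word occurrence of the pseudo-variable with a BigQuery
# TIMESTAMP literal for the given scheduled time, normalized to UTC.
def expand_scheduled_time(sql, scheduled_time)
  # getutc returns a new Time; Time#utc would convert the caller's object in place.
  stamp = scheduled_time.getutc.strftime("%Y-%m-%d %H:%M:%S%:z")
  sql.gsub(/\b#{PSEUDO_VARIABLE}\b/, "TIMESTAMP '#{stamp}'")
end

sql = "SELECT DATE(_WT_SCHEDULED_TIME, '+09:00') AS day"
scheduled = Time.new(2018, 7, 14, 0, 0, 0, "+09:00")
puts expand_scheduled_time(sql, scheduled)
# prints: SELECT DATE(TIMESTAMP '2018-07-13 15:00:00+00:00', '+09:00') AS day
```

Passing a past time instead of the current one is what makes a job re-runnable for an arbitrary date, presumably what the CLI's `-s` option on slide 19 feeds in.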
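  19. Usage
    • Add a job file and open a pull request
    • Once merged, it is registered as a job and runs on schedule
    • A job can be run for any given date from the CLI
    • A runner that executes arbitrary Ruby code

```shell
$ ./bin/job run daily_page_view -s '2018-07-14'
```

```ruby
# The proc receives the scheduled time as its argument
run :proc, ->(scheduled_time) {
  # …
  return [[:foo, 1], [:bar, 2]]
}
```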
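
Slides 19 and 22 show job files written in a small DSL and loaded as Ruby objects. The talk does not show how `Analytics::Backend::JobLoader` evaluates them; one common way to build such a DSL is `instance_eval`, sketched here with hypothetical class names (`JobDefinition`, `ExportBuilder`, `ScheduleBuilder`) and only the `:proc` runner implemented.

```ruby
# Hypothetical, simplified model of evaluating a job file into a Ruby object.
class JobDefinition
  ExportSpec = Struct.new(:table, :columns, :mode, :keys)
  ScheduleSpec = Struct.new(:frequency)

  # Receives `table`, `columns` and `mode` inside `export do ... end`.
  class ExportBuilder
    def initialize(spec)
      @spec = spec
    end
    def table(name)
      @spec.table = name
    end
    def columns(names)
      @spec.columns = names
    end
    def mode(mode, keys = [])
      @spec.mode = mode
      @spec.keys = keys
    end
  end

  # Receives `frequency` inside `schedule do ... end`.
  class ScheduleBuilder
    def initialize(spec)
      @spec = spec
    end
    def frequency(value)
      @spec.frequency = value
    end
  end

  attr_reader :name, :export_spec, :runner

  # Evaluates the job file's source with the new object as `self`.
  def self.load(name, source)
    job = new(name)
    job.instance_eval(source, "#{name}.rb")
    job
  end

  def initialize(name)
    @name = name
    @schedule = nil
  end

  def export(&block)
    @export_spec = ExportSpec.new
    ExportBuilder.new(@export_spec).instance_eval(&block)
  end

  # With a block, defines the schedule; without one, returns it
  # (nil for unscheduled jobs, hence `job.schedule&.frequency` on slide 22).
  def schedule(&block)
    return @schedule unless block

    @schedule = ScheduleSpec.new
    ScheduleBuilder.new(@schedule).instance_eval(&block)
  end

  def run(runner, body)
    @runner = runner
    @body = body
  end

  # Only the :proc runner is modeled; :bq would need a BigQuery client.
  def execute(scheduled_time:)
    raise NotImplementedError, "runner #{@runner} is not modeled" unless @runner == :proc

    @body.call(scheduled_time)
  end
end

source = <<~'RUBY'
  export do
    table :foo_counts
    columns [:key, :count]
    mode :update, [:key]
  end
  schedule do
    frequency :daily
  end
  run :proc, ->(scheduled_time) { [[:foo, 1], [:bar, 2]] }
RUBY

job = JobDefinition.load("foo_counts", source)
p job.schedule.frequency                             # => :daily
p job.export_spec.keys                               # => [:key]
p job.execute(scheduled_time: Time.new(2018, 7, 14)) # => [[:foo, 1], [:bar, 2]]
```

Evaluating the file into an object, rather than running it immediately, is what lets code like the loader snippet on slide 22 filter jobs by `schedule&.frequency` and then call `execute` on each.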
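  20. Riding on the Git/GitHub workflow
    • Open an issue, send a pull request, and review it
    • Following blame reveals the background and intent
      - No more hunting around for a query's owner
    • Because there is a workflow, complex things can be built up on top of it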
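  21. Fixing the architecture
    • cf. the incident where heavy online aggregation took down the DB
    [Diagram: "Rails App" with "write" and "query"; "Rails App (write only)" and "analytics (read only)" linked by "logging" and "query"]
    • A one-directional architecture that runs in a cycle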
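  22. Job quality
    • Can we "mechanically re-run the XX-related jobs because there was an infrastructure-level failure yesterday"?
    • Encourage re-runnability at the interface level
      - export uses update mode by default
      - run is given the execution time as an argument from outside
      - scheduling defaults to coarse settings such as daily
    • Job files are loaded as Ruby objects, so they can also be processed programmatically

```ruby
# Load every job file as a Ruby object
jobs = Analytics::Backend::JobLoader.load_all
# Keep the daily jobs (schedule may be nil, hence &.)
daily_jobs = jobs.select { |job| job.schedule&.frequency == :daily }
# Re-run each of them for 2018-01-10 and export the results
daily_jobs.each do |job|
  job.execute(scheduled_time: Time.new(2018, 1, 10), export: true)
end
```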
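
Slide 22 makes `update` the default export mode so that re-running a job is safe. Assuming `mode :update, [:day]` means "replace rows that have the same `day`" (the slides do not spell out the exact semantics), the difference from appending can be sketched with a hypothetical in-memory table and made-up sample values:

```ruby
require "date"

# Hypothetical in-memory table contrasting append with update-by-key.
class InMemoryTable
  attr_reader :rows

  def initialize(columns)
    @columns = columns
    @rows = []
  end

  # Append: every run adds rows, so re-running a job duplicates data.
  def append(values_list)
    values_list.each { |values| @rows << @columns.zip(values).to_h }
  end

  # Update: a row whose key columns match an existing row replaces it, so
  # running the same job twice for the same scheduled time is idempotent.
  def update(values_list, keys:)
    values_list.each do |values|
      row = @columns.zip(values).to_h
      @rows.reject! { |existing| existing.values_at(*keys) == row.values_at(*keys) }
      @rows << row
    end
  end
end

# The same job output exported twice, e.g. a re-run after an infrastructure failure.
output = [[Date.new(2018, 7, 13), 1200]]

appended = InMemoryTable.new([:day, :pv])
2.times { appended.append(output) }

updated = InMemoryTable.new([:day, :pv])
2.times { updated.update(output, keys: [:day]) }

p appended.rows.size # => 2
p updated.rows.size  # => 1
```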
  23. Using Kubernetes as the job execution platform
    • Jobs run on multiple clustered machines (efficiency and redundancy)
    • Kubernetes commands give a reasonable view of the registered jobs and their execution status
    [Diagram: Ruby file (the job file from slide 18) → generate → YAML file → apply (run after merged) → Kubernetes]

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: visit--user-impressed-companies
  labels:
    namespace: visit
    basename: user_impressed_companies
    role: job
  namespace: analytics
spec:
  schedule: "20 6,21 * * *"
  concurrencyPolicy: "Replace"
  suspend: false
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 3
  jobTemplate:
    metadata:
      name: visit--user-impressed-companies
      labels:
        namespace: visit
        basename: user_impressed_companies
        role: job
    spec:
      backoffLimit: 5
      template:
        metadata:
          name: visit--user-impressed-companies
          labels:
            # … (cut off on the slide)
```
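
Slide 23 turns each Ruby job file into a Kubernetes CronJob manifest (generate) and applies it once the pull request is merged. The generator itself is not shown; below is a minimal sketch using Ruby's standard `yaml` library, where the `frequency`-to-cron mapping, the container image and the command are all hypothetical.

```ruby
require "yaml"

# Hypothetical mapping from the DSL's coarse `frequency` to a cron expression.
# The slide's manifest uses "20 6,21 * * *", so the real mapping is richer.
CRON_BY_FREQUENCY = {
  daily: "0 0 * * *",
  hourly: "0 * * * *"
}.freeze

# Builds a CronJob manifest shaped like the one on the slide. The container
# section (image, command) is an assumption; the slide cuts off before it.
def cronjob_manifest(namespace:, basename:, frequency:, image:)
  # Kubernetes object names must be DNS-compatible, so underscores become hyphens.
  name = "#{namespace}--#{basename.tr('_', '-')}"
  # A fresh Hash per use, so YAML.dump emits plain mappings instead of aliases.
  labels = -> { { "namespace" => namespace, "basename" => basename, "role" => "job" } }

  {
    "apiVersion" => "batch/v1beta1",
    "kind" => "CronJob",
    "metadata" => { "name" => name, "labels" => labels.call, "namespace" => "analytics" },
    "spec" => {
      "schedule" => CRON_BY_FREQUENCY.fetch(frequency),
      "concurrencyPolicy" => "Replace",
      "jobTemplate" => {
        "metadata" => { "name" => name, "labels" => labels.call },
        "spec" => {
          "backoffLimit" => 5,
          "template" => {
            "metadata" => { "name" => name, "labels" => labels.call },
            "spec" => {
              # Pods created by a Job must use Never or OnFailure.
              "restartPolicy" => "Never",
              "containers" => [
                { "name" => "job", "image" => image, "command" => ["./bin/job", "run", basename] }
              ]
            }
          }
        }
      }
    }
  }
end

manifest = cronjob_manifest(
  namespace: "visit", basename: "user_impressed_companies",
  frequency: :daily, image: "example/analytics-jobs:latest"
)
puts YAML.dump(manifest)
```

The sketch keeps `batch/v1beta1` from the slide; on Kubernetes 1.21 and later CronJob is available as `batch/v1`, and `batch/v1beta1` CronJob is no longer served from 1.25 onward.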
  24. Agenda
    • History of our data analysis
    • The data analysis platform

  25. After adoption
    • It has stayed in continuous use
      - As of …: … jobs
    • Regular feature additions
      - Writing from BigQuery to BigQuery
      - importjob (ETL from an RDB to BigQuery)
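  26. Summary
    • We separated the codebase for data analysis and built a system around it
      - An easy-to-use interface
      - Shorter deploy times
      - A unified workflow
      - A move to a safer architecture
      - Better job quality
      - An extensible system
    • As a result, we succeeded in splitting analytics out of the Rails app
    • Some features we may want in the future, such as dependency management, are still missing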