
Building a Data Analytics Platform with Ruby

Altech
July 14, 2018

Rails Developers Meetup 2018 Day 3 Extreme https://techplay.jp/event/679666


Transcript

 1. ©2018 Wantedly, Inc.
    Building a Data Analytics Platform with Ruby
    The evolution of data analytics in a Rails application
    Rails Developer Meetup 2018 Day 3 Extreme - Sohei Takeno
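 2. Self-introduction
    Sohei Takeno (@A…)
    - At Wantedly since [year illegible]
    - Worked on Wantedly Visit development for [number illegible] years: growth, search and recommendation, data platform, developer platform, and more
    - Currently developing the Wantedly People backend
    - Favorite programming language: Ruby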
 3. Agenda
    - History of data analytics
    - Data analytics platform

 4. Agenda
    - History of data analytics
      - Why we want data analytics
      - How it was done when the service started
      - Problems that came with growth
    - Data analytics platform
 5. Case study: Wantedly Visit
    - In service since [year illegible]
    - A monolithic Rails application
      - Data analytics changed along with each phase of the service

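 6. Why we want data analytics
    - We want to change application behavior based on user behavior
      - e.g. whether the user has passed an important conversion step
      - e.g. care at each fine-grained step leading up to conversion
        - "If the user has viewed XX N times"
      - Optimization through machine learning
    - We want to offer analytics features to users
      - e.g. "Yesterday's PV and the total PV to date"
      - e.g. "A PV ranking for comparison with others"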
 7. Why we want data analytics
    - We want to make decisions based on data
      - e.g. computing KPIs
      - e.g. computing important metrics related to the KPIs
      - e.g. A/B tests
 8. How it was done when the service started ([year illegible] onward)

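 9. Changing application behavior based on user behavior
    - Whether the user has passed an important conversion step
      → Naturally, this is already stored in the relational database
    - "If user Y has viewed an item N times"
      → Increment a counter each time the event occurs, or compute it from logs
    - "If user Y has viewed item X"
      → Store logs in the relational database and look them up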
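 10. Offering analytics features to users
    - "Yesterday's PV and the total PV to date"
      → Build daily PV from the access logs as intermediate data
    - "Show a PV ranking so that users can compare"
      → Write a query that aggregates the whole intermediate data set, and lighten the load with the Rails cache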
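
The two steps on this slide can be sketched in plain Ruby. This is only an illustration under stated assumptions: `AccessLog`, `pv_summary`, and `pv_ranking` are hypothetical names, and an in-memory Hash stands in for the intermediate table that the talk keeps in a database.

```ruby
require "date"

# Hypothetical access log record (not from the talk).
AccessLog = Struct.new(:item_id, :viewed_at)

logs = [
  AccessLog.new(:item_a, Time.new(2018, 7, 12, 10, 0, 0)),
  AccessLog.new(:item_a, Time.new(2018, 7, 13, 9, 0, 0)),
  AccessLog.new(:item_a, Time.new(2018, 7, 13, 18, 30, 0)),
  AccessLog.new(:item_b, Time.new(2018, 7, 13, 12, 0, 0)),
]

# Step 1: intermediate data, i.e. page views per item per day.
daily_pv = Hash.new(0)
logs.each { |log| daily_pv[[log.item_id, log.viewed_at.to_date]] += 1 }

# Step 2a: "yesterday's PV and the total PV to date" for one item.
def pv_summary(daily_pv, item_id, today)
  rows = daily_pv.select { |(id, _day), _pv| id == item_id }
  { yesterday: daily_pv.fetch([item_id, today - 1], 0), total: rows.values.sum }
end

# Step 2b: ranking computed over the whole intermediate data set.
def pv_ranking(daily_pv)
  totals = Hash.new(0)
  daily_pv.each { |(id, _day), pv| totals[id] += pv }
  totals.sort_by { |_id, pv| -pv }.map(&:first)
end

summary = pv_summary(daily_pv, :item_a, Date.new(2018, 7, 14))
ranking = pv_ranking(daily_pv)
```

Note that the ranking scans every row of the intermediate data, which is the part that later becomes expensive as the data grows.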
 11. Problems that came with growth ([year illegible] onward)

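 12. Problems that came with growth
    - "Store logs in the relational database"
      - The logs squeezed DB capacity and were awkward to aggregate, leaving them in a half-baked state
    - "Write a query that aggregates the whole intermediate data set, then use the Rails cache"
      - Passage of time: the data we thought of as intermediate became heavy in its own right
      - Growth in users: concurrent access increased, and recomputation ran in parallel whenever the Rails cache expired
      - The combination of the two even led to an incident where the DB ran out of memory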
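
The parallel recomputation described on this slide can be reproduced in miniature. The sketch below is an assumption-laden illustration, not code from the talk: a plain Hash stands in for the Rails cache, `heavy_query` is a hypothetical stand-in for the aggregation query, and `sleep` simulates its cost.

```ruby
# Hypothetical in-memory stand-in for the Rails cache, empty as if it just expired.
cache = {}
recomputations = 0
counter_lock = Mutex.new

# Hypothetical stand-in for the heavy aggregation query.
heavy_query = lambda do
  counter_lock.synchronize { recomputations += 1 }
  sleep 0.1 # simulated query time; other requests arrive meanwhile
  42
end

# Five requests arrive at about the same time. Each one sees the cache miss
# before any of the others has finished recomputing the value.
threads = 5.times.map do
  Thread.new { cache[:ranking] ||= heavy_query.call }
end
threads.each(&:join)

puts "recomputations: #{recomputations}"
```

Every request that misses the cache issues its own copy of the heavy query, so the database load scales with the number of concurrent users at exactly the moment the cache is empty.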
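 13. Problems that came with growth
    - Wider use of data warehouses
      - TreasureData ([year illegible] onward), BigQuery ([year illegible] onward)
    - As the organization and the service grew, data analysis for decision making increased
      - Many analytics queries were managed in a web UI
        - Nobody could tell who wrote a query, with what intent, or how it had been changed
        - Incorrect queries led to wrong decisions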
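 14. Problems that came with growth
    - Every change to analytics logic required deploying the whole Rails application, which had grown large
      - Productivity did not improve
    - Job quality was inconsistent
      - Some jobs were idempotent, others were not
      - Some jobs could be re-run, others could not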
 15. Agenda
    - History of data analytics
    - Data analytics platform


 18. Usage
    - Write the result of a BigQuery query to a table on a daily basis

    export do
      table :daily_page_views
      columns [:day, :pv]
      mode :update, [:day]
    end

    schedule do
      frequency :daily
    end

    run :bq, <<SQL
    SELECT
      DATE(_WT_SCHEDULED_TIME, '+09:00') day,
      COUNT(*) AS pv
    FROM `log.accesses*`
    WHERE _TABLE_SUFFIX = FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_SUB(_WT_SCHEDULED_TIME, INTERVAL 1 DAY))
    SQL

    Note: _WT_SCHEDULED_TIME is a pseudo-variable that represents the job's execution time.
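
One way a job file like this could be turned into a Ruby object is to evaluate it with `instance_eval`. The sketch below is a guess at the general technique, not the implementation from the talk: `JobDefinition`, `ExportConfig`, `ScheduleConfig`, and their attribute names are all hypothetical, and the SQL is shortened to a placeholder.

```ruby
# Hypothetical classes; the talk does not show how job files are loaded.
class ExportConfig
  attr_reader :table_name, :column_names, :write_mode, :key_columns

  def table(name)
    @table_name = name
  end

  def columns(names)
    @column_names = names
  end

  def mode(write_mode, key_columns = [])
    @write_mode = write_mode
    @key_columns = key_columns
  end
end

class ScheduleConfig
  attr_reader :frequency_value

  def frequency(value)
    @frequency_value = value
  end
end

class JobDefinition
  attr_reader :export_config, :schedule_config, :runner, :payload

  # Evaluate the job file source with the new job as `self`,
  # so that `export`, `schedule`, and `run` resolve to the methods below.
  def self.load(source)
    new.tap { |job| job.instance_eval(source) }
  end

  def export(&block)
    @export_config = ExportConfig.new.tap { |config| config.instance_eval(&block) }
  end

  def schedule(&block)
    @schedule_config = ScheduleConfig.new.tap { |config| config.instance_eval(&block) }
  end

  def run(runner, payload)
    @runner = runner
    @payload = payload
  end
end

job_source = <<~'RUBY'
  export do
    table :daily_page_views
    columns [:day, :pv]
    mode :update, [:day]
  end

  schedule do
    frequency :daily
  end

  run :bq, "SELECT 1"
RUBY

job = JobDefinition.load(job_source)
puts job.export_config.table_name
```

Because the job file is ordinary Ruby, the loader needs no parser of its own; the DSL keywords are just method calls on the object being configured.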
 19. Usage
    - Add a job file and open a pull request
    - Once merged, it is registered as a job and runs on a schedule
    - A job can be run for any given day from the CLI
    - There is also a runner that executes arbitrary Ruby code

    $ ./bin/job run daily_page_view -s '2018-07-14'

    run :proc, ->(scheduled_time) {
      # …
      return [[:foo, 1], [:bar, 2]]
    }
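
Passing the scheduled time in from outside is what lets the same job run for any day. A minimal sketch of that idea follows, under assumptions: `expand_scheduled_time` and `run_job` are hypothetical helpers, and the talk does not say how `_WT_SCHEDULED_TIME` is actually substituted. Here it is replaced textually with a BigQuery `TIMESTAMP` literal.

```ruby
# Hypothetical helper: replace the pseudo-variable with a TIMESTAMP literal (UTC).
def expand_scheduled_time(sql, scheduled_time)
  literal = "TIMESTAMP '#{scheduled_time.getutc.strftime('%Y-%m-%d %H:%M:%S')}+00:00'"
  sql.gsub("_WT_SCHEDULED_TIME", literal)
end

# Hypothetical dispatcher over the two runner types shown on the slides.
def run_job(runner, payload, scheduled_time)
  case runner
  when :bq
    # Returns the SQL that would be sent to BigQuery; no query is executed here.
    expand_scheduled_time(payload, scheduled_time)
  when :proc
    # Arbitrary Ruby code receives the scheduled time as its argument.
    payload.call(scheduled_time)
  else
    raise ArgumentError, "unknown runner: #{runner.inspect}"
  end
end

scheduled_time = Time.utc(2018, 7, 14)

sql = run_job(:bq, "SELECT DATE(_WT_SCHEDULED_TIME, '+09:00') AS day", scheduled_time)
rows = run_job(:proc, ->(time) { [[:day, time.strftime("%Y-%m-%d")]] }, scheduled_time)

puts sql
```

Since neither runner reads the wall clock, running the job today for `2018-07-14` produces the same query and the same rows as running it on that day.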
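 20. Riding on the Git/GitHub workflow
    - File an issue, open a Pull Request, and review it
    - Tracing back through blame reveals the background and intent
      - No more hunting around for the owner of a query
    - Because there is a workflow, complex things can be built up step by step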
 21. Fixing the architecture
    - cf. the incident where a heavy online aggregation took down the DB
    - A one-way architecture that runs as a cycle
    [Diagram labels: Rails App, write, query; Rails App, write only, analytics, read only, logging, query]
 22. Job quality
    - Is it possible to say "there was an infrastructure-level failure yesterday, so mechanically re-run all jobs related to XX"?
    - Re-runnability is encouraged at the interface level
      - export defaults to update
      - run is given the execution time as an argument from outside
      - scheduling defaults to coarse specifications such as daily
    - Job files are loaded as Ruby objects, so they can also be processed programmatically

    jobs = Analytics::Backend::JobLoader.load_all
    daily_jobs = jobs.select { |job| job.schedule&.frequency == :daily }
    daily_jobs.each do |job|
      job.execute(scheduled_time: Time.new(2018, 1, 10), export: true)
    end
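
Why an update-by-key export makes re-runs safe can be shown with a small sketch. It rests on assumptions: `export_rows` is a hypothetical function, the `:append` mode is introduced only for contrast, and the exact semantics of `mode :update, [:day]` in the real tool are not given in the talk. Here `:update` is taken to mean "replace existing rows that share the key columns".

```ruby
# Hypothetical export: `table` and `rows` are arrays of hashes.
def export_rows(table, rows, mode:, keys: [])
  case mode
  when :append
    # Blindly adds rows; a re-run duplicates them.
    table + rows
  when :update
    # Drops existing rows whose key columns match an incoming row, then adds the new rows.
    incoming_keys = rows.map { |row| row.values_at(*keys) }
    kept = table.reject { |row| incoming_keys.include?(row.values_at(*keys)) }
    kept + rows
  else
    raise ArgumentError, "unknown mode: #{mode.inspect}"
  end
end

result = [{ day: "2018-07-13", pv: 120 }]

# Run the same job twice in each mode.
appended = export_rows(export_rows([], result, mode: :append), result, mode: :append)
updated = export_rows(export_rows([], result, mode: :update, keys: [:day]),
                      result, mode: :update, keys: [:day])

puts "append: #{appended.size} rows, update: #{updated.size} rows"
```

With the update mode as the default, a mechanical re-run after an infrastructure failure leaves the table in the same state as a single successful run.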
 23. Using Kubernetes as the job execution platform
    - Jobs run on multiple clustered machines (efficiency and redundancy)
    - Kubernetes commands give a reasonable view of the registered jobs and their execution status

    [Diagram: Ruby file → generate → YAML file → apply → Kubernetes; run after merged]

    Ruby file (cut off on the slide):

    export do
      table :daily_page_views
      columns [:day, :pv]
      mode :update, [:day]
    end

    schedule do
      frequency :daily
    end

    run :bq, <<SQL
    SELECT
      DATE(_WT_SCHEDULED_TIME, '+09:00') day,
      COUNT(*) AS pv
    FROM `log.accesses*`
    WHERE _TABLE_SUFFIX =

    YAML file (cut off on the slide):

    apiVersion: batch/v1beta1
    kind: CronJob
    metadata:
      name: visit--user-impressed-companies
      labels:
        namespace: visit
        basename: user_impressed_companies
        role: job
      namespace: analytics
    spec:
      schedule: "20 6,21 * * *"
      concurrencyPolicy: "Replace"
      suspend: false
      successfulJobsHistoryLimit: 10
      failedJobsHistoryLimit: 3
      jobTemplate:
        metadata:
          name: visit--user-impressed-companies
          labels:
            namespace: visit
            basename: user_impressed_companies
            role: job
        spec:
          backoffLimit: 5
          template:
            metadata:
              name: visit--user-impressed-companies
              labels:
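
The "generate" step in the diagram could look roughly like the following. This is a sketch under assumptions: `cron_job_manifest` is a hypothetical function, the naming rule (`namespace--basename` with underscores turned into hyphens) is inferred from the YAML on the slide, and the pod template is left out because the slide cuts off before it.

```ruby
require "yaml"

# Hypothetical generator for the CronJob manifest shown on the slide.
def cron_job_manifest(namespace:, basename:, cron:)
  name = "#{namespace}--#{basename.tr('_', '-')}"
  labels = { "namespace" => namespace, "basename" => basename, "role" => "job" }

  {
    "apiVersion" => "batch/v1beta1",
    "kind" => "CronJob",
    "metadata" => { "name" => name, "labels" => labels, "namespace" => "analytics" },
    "spec" => {
      "schedule" => cron,
      "concurrencyPolicy" => "Replace",
      "suspend" => false,
      "successfulJobsHistoryLimit" => 10,
      "failedJobsHistoryLimit" => 3,
      "jobTemplate" => {
        # dup so the same Hash object is not emitted as a YAML alias
        "metadata" => { "name" => name, "labels" => labels.dup },
        "spec" => { "backoffLimit" => 5 }, # pod template omitted (not visible on the slide)
      },
    },
  }
end

manifest = cron_job_manifest(
  namespace: "visit",
  basename: "user_impressed_companies",
  cron: "20 6,21 * * *"
)

puts manifest.to_yaml
```

The printed YAML could then be handed to `kubectl apply -f` after a merge, which matches the generate and apply steps named in the diagram.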
 24. Agenda
    - History of data analytics
    - Data analytics platform

 25. After adoption
    - It has been in continuous use
      - As of [date illegible]: [count illegible] jobs
    - Regular feature additions
      - Writing from BigQuery to BigQuery
      - import job: ETL from RDB to BigQuery
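 26. Summary
    - We separated the codebase for data analytics and built a dedicated mechanism around it
      - An easy-to-use interface
      - Shorter deploy times
      - A unified workflow
      - A move to a safer architecture
      - Better job quality
      - An extensible mechanism
    - As a result, we succeeded in decoupling the analytics side from the Rails application
    - There are still features we may want in the future, such as dependency management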