Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
iQONを支えるデータ分析基盤/iqon-bigquery
Search
Masayuki Imamura
April 23, 2015
Programming
3
10k
iQONを支えるデータ分析基盤/iqon-bigquery
2015.04.23 えびスタ vol2での発表内容です @kyuns
Masayuki Imamura
April 23, 2015
Tweet
Share
More Decks by Masayuki Imamura
See All by Masayuki Imamura
経営視点から捉えた開発生産性 / Development productivity from a management perspective
kyuns
11
7.8k
Qiita:Teamをハックして成果をあげるための情報共有方法/Qiita:Team
kyuns
6
3.3k
3年連続ベストアプリ受賞のプロダクトを支える裏側/The way to Achieve The Best App 3 years in a row
kyuns
1
1.6k
機械学習とデータ分析を支えるマルチクラウドなアーキテクチャの紹介/Multi Cloud Architecture Supporting Machine Learning and Data Analysis
kyuns
4
9.6k
日本最大級のファッションDBを支える裏側/how to manage the complex web service
kyuns
4
820
iQONを支えるクローラー/iQON Crawler
kyuns
12
4.1k
iQON Tools
kyuns
1
3.8k
プッシュ通知大戦争/effective push notification by iQON
kyuns
28
8.2k
VASILY流エンジニアドリブン / vasily engineer driven way
kyuns
2
2.3k
Other Decks in Programming
See All in Programming
VS Code extension: ドラッグ&ドロップでファイルを並び替える
ttrace
0
170
Pythonによるイベントソーシングへの挑戦と現状に対する考察 / Challenging Event Sourcing with Python and Reflections on the Current State
nrslib
3
1.3k
2024-10-02 dev2next - Application Observability like you've never heard before
jonatan_ivanov
0
180
実践Dash - 手を抜きながら本気で作るデータApplicationの基本と応用 / Dash for Python and Baseball
shinyorke
2
560
UnJSで簡単に始めるCLIツール開発 / cli-tool-development-with-unjs
aoseyuu
2
310
Jakarta EE as Seen Trough the Lens of the ASF
ivargrimstad
0
220
Cohesion in Modeling and Design
mploed
3
210
sqlcを利用してsqlに型付けを
kamiyam
0
240
クラウドサービスの 利用コストを削減する技術 - 円安の真南風を感じて -
pyama86
3
400
Повторное использование кода в ML: почему ML-пайплайны могут помочь?
lamodatech
0
190
Introduce dRuby
ledsun
0
120
Progressive Web Apps for Rails developers
siaw23
2
550
Featured
See All Featured
Optimizing for Happiness
mojombo
375
69k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.6k
Reflections from 52 weeks, 52 projects
jeffersonlam
346
20k
How to Think Like a Performance Engineer
csswizardry
16
1k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
225
22k
Easily Structure & Communicate Ideas using Wireframe
afnizarnur
191
16k
What the flash - Photography Introduction
edds
67
11k
The Brand Is Dead. Long Live the Brand.
mthomps
53
38k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
26
1.4k
Clear Off the Table
cherdarchuk
91
320k
Visualization
eitanlees
143
15k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
27
1.9k
Transcript
J20/Λࢧ͑Δ σʔλੳج൫ͱ#JH2VFSZ ͑ͼελWPM 7"4*-:!LZVOT
ࠓଜխ !LZVOT 7"4*-: *OD औక$50$P'PVOEFS ʹ:BIPP+"1"/ʹ৽ଔೖࣾ :BIPP'"4)*0/ɺ9#3"/%ͳͲͷϝσΟΞͷ্ཱͪ͛ ʹಠཱɺ7"4*-:Λۀɺऔక$50ʹब ͖ͳ༏ਫथಸʑ
ঁੑͷϑΝογϣϯϥΠϑΛΦγϟϨʹً͔ͤΔͨΊͷɺ ண͍ͨίʔσ͕ݟ͔ͭΔɺങ͑ΔɺஷΊΒΕΔϑΝογϣϯΞϓϦɻ ݱࡏສͷձһొ ٕज़ސ·ͭͱΏ͖ͻΖ
J20/ͷσʔλͱʁ
w ΞϓϦΠϕϯτϩά w ϓογϡ௨ϩά w Ϛελʔσʔλ w ηʔϧεσʔλ શͯ#JH2VFSZʹ͍Εͨ
ͳͥσʔλΛूΊͨͷ͔ w ࣾͷ৭ʑͳσʔλ͕༷ʑͳετϨʔδʹࢄ ͍ͯ͠Δʢ.Z42- .POHP%# ֤छ4BB4ʜ w σʔλΛՕॴʹूΊͯޮతʹੳ͍ͨ͠ w
#*πʔϧͰΰϦΰϦੳ͍ͨ͠ w μογϡϘʔυ࡞Γ͍ͨ
ͳͥ#JH2VFSZͳͷ͔ʁ w JNQPSUͷܗ͕ࣜॊೈͰ͋Δ w #*πʔϧରԠ UBCMFBV w ԯϨίʔυͰߴʹಈ͘ w
qVFOUEͷQMVHJO͕ἧͬͯΔ w TUSFBNJOHJOTFSUʹରԠ͍ͯ͠Δ
w ΞϓϦΠϕϯτϩά w ϓογϡ௨ϩά w Ϛελʔσʔλ w ηʔϧεσʔλ
ΞϓϦߦಈϩά ΞϓϦղੳπʔϧʮ-PDBMZUJDTʯͷੜσʔλ ϢʔβʔͷશΠϕϯτΛهͨ͠+40/ ԯߦ݄
ऩूΞʔΩςΫνϟ Amazon S3 "844 &$ (PPHMF$MPVE4UPSBHF #JH2VFSZ ΞϓϦ JNQPSUCBUDI -PDBMZUJDTͷTCVDLFU͔Β($4ܦ༝Ͱ#JH2VFSZʹDPQZ
ͷͰҰ(PPHMF$MPVE4UPSBHFʹDPQZͨ͠΄͏͕͍ ఆظతʹ#BUDIͰJNQPSU
w ΞϓϦΠϕϯτϩά w ϓογϡ௨ϩά w Ϛελʔσʔλ w ηʔϧεσʔλ
ϓογϡ௨ϩά w ΞϓϦͷϓογϡ௨ͷϩά w શͯͷϓογϡͷૹ৴ɺ։෧ϩά ສRQT w ϓογϡͷ$53ޮՌݕূλʔήςΟϯάΞϧ ΰϦζϜͳͲͷ࠷దԽʹར༻
w qVFOUE͕େ׆༂
ϩάϑΥʔϚοτ ͍ͭͲͷVTFSʹରͯ͠ͲΜͳϓογϡΛૹ͔ͬͨɺ։͍͔ͨ
ϓογϡϩάΞʔΩςΫνϟ #JH2VFSZ "1*4FSWFST qVFOUE "1/T ($. -PH4FSWFS qVFOUE qVFOUEܦ༝Ͱ#JH2VFSZʹ 4USFBNJOH*OTFSU
qVFOUQMVHJOCJHRVFSZΛར༻ ΞϓϦ Amazon S3 "844 ૹ৴ϩά ΫϦοΫϩά ૹ৴ˍΫϦοΫϩά
<case event.notification.click> <store> type bigquery method insert auth_method private_key email
[email protected]
private_key_path /home/vasily/fluentd/XXXXXX.p12 buffer_type file flush_interval 0 try_flush_interval 0.05 queued_chunk_flush_interval 0.01 buffer_chunk_records_limit 250 buffer_chunk_limit 512k bugger_queue_limit 1024 retry_limit 5 retry_wait 0.5 num_threads 32 dataset app project iqon-data-mining auto_create_table true table event_notification_click_%Y%m schema_path /home/vasily/fluentd/schema_event_notification_click.json buffer_path /var/log/td-agent/buffer/bigquery/event.notification.click </store> </case> ສRQTҎ্ʹ͑ΔͨΊʹ CV⒎FS@UZQFΛpMFʹ͠ͳ͍ͱ͍͚ͳ͔ͬͨ
w ΞϓϦΠϕϯτϩά w ϓογϡ௨ϩά w Ϛελʔσʔλ w ηʔϧεσʔλ
Ϛελʔσʔλ w .Z42- 3%4 ʹ͋ΔJ20/ͷதͷϚελʔσʔλ 6TFST *UFNT 4FUT #SBOET 4IPQʜFUD
w ϢʔβʔͷΞΫςΟϏςΟσʔλ ΞΠςϜ-*,& ίʔσ-*,& ϑΥϩʔʜFUD ԯϨίʔυͱ͔͋Δςʔϒϧୡ w "84্ͷ3%4ʹ֨ೲ͞Ε͍ͯΔͷΛ#JH2VFSZʹ 4ZOD
TZODํ๏ w ಠࣗεΫϦϓτͰҰׅJNQPSU w .Z42-ͷςʔϒϧఆ͔ٛΒࣗಈతʹ#JH2VFSZ ͷ4DIFNBΛੜ w ·Δ͝ͱςʔϒϧΛEVNQͯ͠·Δ͝ͱDPQZ w ઍສϨίʔυͰͦΜͳʹ͔͔࣌ؒΒͳ͍
ఔ
3%4UP#JH2VFSZ "843%4 &$ (PPHMF$MPVE4UPSBHF #JH2VFSZ JNQPSUCBUDI EFTDUBCMF@OBNFͨ͠༰͔Β#JH2VFSZͷTDIFNBKTPOΛੜ TFMFDU GSPNUBCMF@OBNFUBCMFUTWͨ͠༰Λ($4ʹVQMPBEͨ͠ޙɺ #JH2VFSZʹλϒ۠ΓUTWͱͯ͠DPQZΧϯϚ۠ΓͷσʔλʹରԠ͢ΔͨΊ
bq load --max_bad_record=1000 --project_id=iqon-data-mining —source_format=CSV --field_delimiter='\t' --skip_leading_rows=1 iqon-data- mining:app.#{@table_name} gs://iqon-rds/#{@table_name}.tsv schema_#{@table_name}.json
w ΞϓϦΠϕϯτϩά w ϓογϡ௨ϩά w Ϛελʔσʔλ w ηʔϧεσʔλ
ηʔϧεσʔλ w ΞΠςϜͷߪೖΫϦοΫɺߪೖྃσʔλ w ֨ೲઌ.POHP%# w ͱͯେ͖͍ίϨΫγϣϯʢे(
w NPOHPCRΛ༻͍ͯ#JH2VFSZʹҰׅJNQPSU !IBLPCFSB IUUQTHJUIVCDPNIBLPCFSBNPOHPCR
.POHP%#UP#JH2VFSZ &$ (PPHMF$MPVE4UPSBHF #JH2VFSZ JNQPSUCBUDI NPOHPCRίϚϯυΛ༻͍ͯTUSFBNͰॲཧͯ͘͠ΕΔ TDIFNBࣗಈఆػೳ͋Δ͕ɺಠࣗͰTDIFNBΛࢦఆͰ͖Δ ࢦఆ͢ΔLFZpMFQͰͳ͘KTPOܗࣜͷͷͳͷͰҙ mongobq --host
mongo02c --port 27017 --database iqon_conversion --collection click_log -q '{"date": "#{date}"}' --project iqon-data-mining --dataset app --keyfile /home/vasily/XXXX.json -B iqon-mongo -T click_log_test --schema ./ schema_mongo_iqon_conversion_click_log.json --autoclean
ΞΫηεσʔλ ൪֎ฤ w (PPHMF"OBMZUJDTͷσʔλ (PPHMF"OBMZUJDTϓϨϛΞϜͳΒ#JH2VFSZʹ ੜσʔλΛࣗಈతʹJNQPSUͯ͘͠ΕΔ w ສԁ݄͙Β͍ͱ͍͏ᷚ
·ͱΊ w શͯͷσʔλϩάΛ#JH2VFSZՕॴʹूΊͨ ͓͔͛Ͱੳ͕֨ஈʹޮతʹͳͬͨ ϚελʔσʔλͱϩάσʔλΛ+0*/ͯ͠ੳ w #JH2VFSZ҆ͯ͘͏·͍ 5#ετϨʔδ݄ͬͯ ԁఔ εΩϟϯྉ͕5#͋ͨΓ݄ԁఔ
ݱࡏ
8F`SF)JSJOH J20/ຊͰҰ൪ϑΝογϣϯʹ ؔ͢Δσʔλ͕ू·ΔॴͰ͢ɻ σʔλΛͬͯνϟϨϯδͯ͠Έ͍ͨਓΛ ͓͓ͪͯ͠Γ·͢ɻ JOGP!WBTJMZKQ