ETL Processing with Digdag (DigdagでETL処理をする)
tosametal
July 19, 2019
Technology
Presented at データとML周辺エンジニアリングを考える会 #2 (a meetup on engineering around data and ML)
https://data-engineering.connpass.com/event/136756/
#data_ml_engineering
Transcript
ETL Processing with Digdag. データとML周辺エンジニアリングを考える会 #2, 2019.07.19. @tosametal, Application Engineer, MicroAd, Inc.
Machine learning at MicroAd: CTR prediction, CVR prediction, fraudulent-click detection, and similar tasks in the ad delivery system.
Log platform architecture. Components on the slide: Imp Server, Click Server, RTB Server; Kafka; Hadoop (data warehouse); Digdag; Hadoop (analytics platform).
Log platform architecture (same diagram, annotated): delivery is at least once; duplicates are removed by a unique ID; runs are managed by session; processing is idempotent; kafka is specified as the secondary for Kafka; records are structured data in JSON format.
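The combination of at-least-once delivery and deduplication by unique ID can be illustrated with a minimal Python sketch. It is not taken from the slides: the field name "id", the sample records, and the function name deduplicate are assumptions made for illustration only.

```python
import json


def deduplicate(lines):
    """Parse JSON lines and keep only the first record seen for each ID.

    The field name "id" is an assumption made for this sketch.
    """
    seen = set()
    unique = []
    for line in lines:
        record = json.loads(line)
        if record["id"] in seen:
            continue  # redelivered under at-least-once delivery; drop it
        seen.add(record["id"])
        unique.append(record)
    return unique


# Hypothetical log lines; "a1" was delivered twice.
raw = [
    '{"id": "a1", "type": "imp"}',
    '{"id": "b2", "type": "click"}',
    '{"id": "a1", "type": "imp"}',
]
records = deduplicate(raw)
print([r["id"] for r in records])
```

Because the result depends only on the set of IDs, replaying the same input (or the input plus further duplicates) yields the same output, which is what makes a rerun of the step safe.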
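What is Digdag: workflows are written declaratively in dig files (workflow as code); it supports scheduled execution and recovery; progress can be checked and tasks rerun from the UI; custom operators can be written.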
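Batch execution platform architecture: PostgreSQL stores execution history and similar state; for each task, a container that acts as a Hadoop client is launched; the platform can scale out.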
Keeping workflows readable while controlling complex dependencies
Split projects by function. What is a project? "In Digdag, workflows are packaged together with other files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project." (quoted from the official documentation, http://docs.digdag.io/) At MicroAd, 60 projects are currently running.
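Dependencies within a project

    schedule:
      daily>: 12:00:00

    +task1:
      _parallel: true

      +subtask1:
        call>: subtask1.dig

      +subtask2:
        call>: subtask2.dig

    +task2:
      echo>: task finished successfully

• The call operator makes it possible to split a workflow across dig files.
• With require, somewhat more complex DAGs can also be expressed.
Diagram: subtask1 and subtask2, followed by task2.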
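The execution order implied by this workflow (subtask1 and subtask2 in parallel under task1, then task2) can be modeled with a short Python sketch. This is only an illustration of the ordering, not how Digdag itself is implemented; the function bodies and return strings are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def subtask1():
    return "subtask1 done"  # placeholder for the work in subtask1.dig


def subtask2():
    return "subtask2 done"  # placeholder for the work in subtask2.dig


def run_workflow():
    # +task1 with _parallel: true: both subtasks may run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(subtask1), pool.submit(subtask2)]
        results = [f.result() for f in futures]  # block until both finish
    # +task2 starts only after everything under +task1 has completed.
    results.append("task finished successfully")
    return results


order = run_workflow()
print(order)
```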
Dependencies between projects: Project A and Project B. A project cannot see the results of another project.
Dependencies between projects: Project A and Project B coordinate through a flag file, fileX.

    +touch_task:
      s3_touch>: bucket/flag/fileX

    +wait_task:
      s3_wait>: bucket/flag/fileX
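The flag-file handshake can be sketched in Python. To keep the example self-contained, a temporary local directory stands in for the S3 bucket; the function names touch_flag and wait_for_flag, the timeout, and the polling interval are assumptions for illustration and are not the operators' actual implementation.

```python
import tempfile
import time
from pathlib import Path


def touch_flag(flag: Path) -> None:
    """Upstream project: signal completion by creating an empty flag file."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.touch()


def wait_for_flag(flag: Path, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Downstream project: poll until the flag exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if flag.exists():
            return True
        time.sleep(interval)
    return flag.exists()


with tempfile.TemporaryDirectory() as root:
    # A local path standing in for bucket/flag/fileX.
    flag = Path(root) / "bucket" / "flag" / "fileX"
    before = wait_for_flag(flag, timeout=0.2)  # upstream has not finished yet
    touch_flag(flag)
    after = wait_for_flag(flag, timeout=0.2)   # flag is now visible

print(before, after)
```

The downstream side never inspects the upstream project's state directly; it only observes the shared file, which is why this works even though projects cannot see each other's results.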
Custom operators. Reference: https://github.com/tosametal/digdag-plugins
Other points
Make the whole workflow idempotent:
• Hive queries use insert overwrite
• distcp is run with the overwrite and delete options
Configure retries:
• exponential interval
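A retry with exponentially growing intervals can be sketched as follows. This is a generic Python illustration, not Digdag's retry implementation; the names retry_intervals and run_with_retry, the base interval, and the failing task are hypothetical.

```python
import time


def retry_intervals(limit, base, max_interval=None):
    """Waits before each retry: base, 2*base, 4*base, ... (optionally capped)."""
    delays = []
    for attempt in range(limit):
        delay = base * 2 ** attempt
        if max_interval is not None:
            delay = min(delay, max_interval)
        delays.append(delay)
    return delays


def run_with_retry(task, limit=3, base=1.0, sleep=time.sleep):
    """Run task; on failure wait and retry, up to `limit` retries."""
    for delay in retry_intervals(limit, base):
        try:
            return task()
        except Exception:
            sleep(delay)
    return task()  # final attempt; any error propagates to the caller


# Hypothetical task that fails twice, then succeeds.
attempts = {"count": 0}
waits = []


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


# Record the waits instead of sleeping so the example runs instantly.
result = run_with_retry(flaky, limit=3, base=1.0, sleep=waits.append)
print(result, waits)
```

Retries like this are only safe when the retried step is idempotent, which is why the slide pairs the two points: an overwrite-style write can be repeated without duplicating data.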
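Summary
• Split projects by function so that they do not become bloated
• Resolve dependencies between projects with s3_wait
• Build plugins for frequently used functionality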