Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
DigdagでETL処理をする
Search
tosametal
July 19, 2019
Technology
0
4.2k
DigdagでETL処理をする
データとML周辺エンジニアリングを考える会 #2
https://data-engineering.connpass.com/event/136756/
#data_ml_engineering
tosametal
July 19, 2019
Tweet
Share
More Decks by tosametal
See All by tosametal
マイクロアドのアドテクを支える技術
tosametal
1
200
Qiita Career Meetup for Server Side Engineers
tosametal
4
4.3k
Other Decks in Technology
See All in Technology
1,000 にも届く AWS Organizations 組織のポリシー運用をちゃんとしたい、という話
kazzpapa3
0
150
ファインディの横断SREがTakumi byGMOと取り組む、セキュリティと開発スピードの両立
rvirus0817
1
1.6k
Codex 5.3 と Opus 4.6 にコーポレートサイトを作らせてみた / Codex 5.3 vs Opus 4.6
ama_ch
0
200
SREチームをどう作り、どう育てるか ― Findy横断SREのマネジメント
rvirus0817
0
350
AWS Network Firewall Proxyを触ってみた
nagisa53
1
240
Context Engineeringの取り組み
nutslove
0
380
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
10Xにおける品質保証活動の全体像と改善 #no_more_wait_for_test
nihonbuson
PRO
2
330
ランサムウェア対策としてのpnpm導入のススメ
ishikawa_satoru
0
220
今こそ学びたいKubernetesネットワーク ~CNIが繋ぐNWとプラットフォームの「フラッと」な対話
logica0419
5
390
Frontier Agents (Kiro autonomous agent / AWS Security Agent / AWS DevOps Agent) の紹介
msysh
3
180
広告の効果検証を題材にした因果推論の精度検証について
zozotech
PRO
0
210
Featured
See All Featured
Bioeconomy Workshop: Dr. Julius Ecuru, Opportunities for a Bioeconomy in West Africa
akademiya2063
PRO
1
55
HU Berlin: Industrial-Strength Natural Language Processing with spaCy and Prodigy
inesmontani
PRO
0
230
Exploring anti-patterns in Rails
aemeredith
2
250
My Coaching Mixtape
mlcsv
0
49
The Curious Case for Waylosing
cassininazir
0
240
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
67
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
47
7.9k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
1
57
The Organizational Zoo: Understanding Human Behavior Agility Through Metaphoric Constructive Conversations (based on the works of Arthur Shelley, Ph.D)
kimpetersen
PRO
0
240
Testing 201, or: Great Expectations
jmmastey
46
8.1k
From π to Pie charts
rasagy
0
130
Building Flexible Design Systems
yeseniaperezcruz
330
40k
Transcript
DigdagͰETLॲཧΛ͢Δ σʔλͱMLपลΤϯδχΞϦϯάΛߟ͑Δձ #2 2019.07.19 தᠳଠ(@tosametal) גࣜձࣾϚΠΫϩΞυ ΞϓϦέʔγϣϯΤϯδχΞ
ϚΠΫϩΞυʹ͓͚Δػցֶश ࠂ৴γεςϜʹ͓͚ΔCTR༧ଌɺCVR༧ଌɺෆਖ਼ΫϦοΫͷݕग़ͳͲ
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫)
ϩάج൫ͷߏ Imp Server Click Server RTB Server Kafka Hadoop (σʔλΣΞϋε)
Digdag Hadoop (ੳج൫) at least once ϢχʔΫͳIDʹΑΔॏෳഉআ sessionͰཧ ႈͳॲཧ Kafka secondaryͰ kafkaΛࢦఆ jsonܗࣜͷ ߏԽσʔλ
Digdagͱ digϑΝΠϧʹએݴతʹϫʔΫϑϩʔΛهड़ Workflow as code εέδϡʔϧ࣮ߦɺϦΧόϦ UI͔Βਐḿͷ֬ೝ࠶࣮ߦ͕Մೳ ΦϖϨʔλΛࣗ࡞Մೳ
PostgreSQL ࣮ߦཤྺͳͲΛอଘ Task͝ͱʹhadoopΫϥΠΞϯτ ͱͳΔίϯςφΛ্ཱͪ͛Δ εέʔϧΞτՄೳ όον࣮ߦج൫ߏ
ෳࡶͳґଘؔΛ੍ޚͭͭ͠ ϫʔΫϑϩʔͷՄಡੑΛอͭ
ϓϩδΣΫτΛػೳ୯ҐͰׂ ϓϩδΣΫτͱ In Digdag, workflows are packaged together with other
files used in the workflows. The files can be anything such as SQL scripts, Python/Ruby/Shell scripts, configuration files, etc. This set of the workflow definitions is called project. ެࣜυΩϡϝϯτ(http://docs.digdag.io/)ΑΓҾ༻ ϚΠΫϩΞυͰݱࡏ60ݸͷϓϩδΣΫτ͕ಈ͍͍ͯΔ
ϓϩδΣΫτͷґଘؔ schedule: daily>: 12:00:00 +task1: _parallel: true +subtask1: call>: subtask1.dig
+subtask2: call>: subtask2.dig +task2: echo>: task finished successfully •callΦϖϨʔλΛ͏͜ͱͰdigϑΝΠϧ ͷׂΛߦ͏͜ͱ͕Մೳ •requireΛ͏ͱ͏গ͠ෳࡶͳDAGͷ දݱՄೳ subtask1 subtask2 task2
ϓϩδΣΫτؒͷґଘؔ ϓϩδΣΫτA ϓϩδΣΫτB ଞͷϓϩδΣ Ϋτͷ݁ՌΛݟΔ ͜ͱग़དྷͳ͍
ϓϩδΣΫτؒͷґଘؔ +touch_task: s3_touch>: bucket/flag/fileX +wait_task: s3_wait>: bucket/flag/fileX ϓϩδΣΫτB ϓϩδΣΫτA fileX
ࣗ࡞ΦϖϨʔλ ࢀߟ:https://github.com/ tosametal/digdag-plugins
ͦͷଞ ϫʔΫϑϩʔશମΛႈʹ͢Δ • hiveΫΤϦinsert overwrite • distcpoverwrite deleteΦϓγϣϯΛࢦఆ ϦτϥΠΛઃఆ͢Δ •
exponential interval
·ͱΊ • ϓϩδΣΫτංେԽ͠ͳ͍Α͏ʹػೳͰׂ • ϓϩδΣΫτؒͷґଘs3_waitͰղܾ • Α͘͏ػೳϓϥάΠϯΛ࡞Ζ͏
None