Tech x Marketing #4: Even with Airflow, I want to split development by sub-workflow!
Naoki Matsuda
August 27, 2020
Transcript
Even with Airflow, I want to split development by sub-workflow!
Tech x Marketing meetup #4, Dentsu Digital Inc., Naoki Matsuda, 2020/08/27
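Self-introduction
- Name: Naoki Matsuda
- Affiliation: Dentsu Digital Inc. (joined in 2018)
- Work: API implementation for backend services, ETL construction, etc. (languages used: Go / Python)
- Student days: research on energy consumption/production in stimulated muscle
Airflow talks and blog posts:
- An introduction to Airflow and use-case showcase from a novice Pythonista (PyConJP2019)
- A post on Airflow's task execution environment (DentsuDigital Tech Blog)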
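Contents
What I want to do
- Develop an Airflow DAG (workflow) with multiple people!
Methods considered
- X SubDagOperator (the number of parallel executions cannot be controlled; its use is not recommended on managed services)
- X TriggerDagRunOperator / ExternalTaskSensor (dependencies become hard to follow)
- ○ Write functions that create tasks and split along them (proposed method)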
What is Apache Airflow?
- A tool for running and monitoring workflows (DAGs) written in Python
- One of the Apache top-level projects
- Contributors: 1000+, Stars: 17000+
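What is Apache Airflow?
- DAG: a graph that determines the execution order of Tasks
- Operator: a templated unit of execution, e.g. Python, HTTP, MySQL, KubernetesPod…
- Task: an Operator that has been given parameters
- Task dependencies are defined with >>. Example: A >> [B, C] >> D
(Diagram: a DAG of Tasks A, B, C, D. Source: https://www.slideshare.net/potix2_jp/airflow-224004058)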
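
To make the `A >> [B, C] >> D` notation concrete, here is a minimal sketch that imitates it with a stand-in `Task` class. The class, its `downstream` attribute, and the `_add_downstream` helper are assumptions made for illustration; they are not Airflow's `BaseOperator`, although the way the operators chain is meant to mirror it.

```python
class Task:
    """Stand-in for an Airflow task (hypothetical; not the real BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that run after this one

    def _add_downstream(self, other):
        targets = other if isinstance(other, list) else [other]
        for t in targets:
            if t not in self.downstream:
                self.downstream.append(t)

    def __rshift__(self, other):
        # self >> other, where other is a Task or a list of Tasks
        self._add_downstream(other)
        return other

    def __rrshift__(self, other):
        # [t1, t2] >> self: a plain list has no __rshift__, so Python falls back here
        for t in other:
            t._add_downstream(self)
        return self


a, b, c, d = (Task(name) for name in "ABCD")
a >> [b, c] >> d  # B and C run after A; D runs after both B and C

edges = sorted((t.task_id, n.task_id) for t in (a, b, c, d) for n in t.downstream)
print(edges)
```

The expression is evaluated left to right: `a >> [b, c]` returns the list, and the list is then placed upstream of `d`.
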
Background of what I want to do
I want to develop a DAG with multiple people!
Example: a DAG that updates a data mart, made up of three flows
- A flow that processes data in the DWH
- A flow that fetches data from a spreadsheet and processes it
- A flow that reconciles the processed data and updates the data mart
Observations
- A DAG can be composed of flows with different roles (functions).
- Flows with different roles each integrate with a different external system.
- Building all of it alone is hard!!
- I would like to hand each flow (sub-workflow) to someone who knows the connected system and its specifications: a DWH expert, a spreadsheet expert.
- However, a DAG is basically a single file.
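Requirements for splitting workflow development
- The sub-workflows that make up the workflow (DAG) are split into separate files
- Dependencies can be defined across files
- Dependencies defined across files are visualized in the UI
- The number of parallel executions can be controlled
(Diagram: sub-workflow 1, sub-workflow 2, and sub-workflow 3 connected by dependencies)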
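Proposed method: write functions that create tasks and split along them
Concept
- Each sub-workflow's definition file returns the first and last tasks of that sub-workflow
- Once the first and last tasks of each sub-workflow are known, the dependencies between sub-workflows can be defined
- In that dependency definition, the inside of each sub-workflow can be treated as a black box
(Diagram: sub-workflow 1, sub-workflow 2, sub-workflow 3)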
Write functions that create tasks and split along them
- Create a file to define each sub-workflow: sw1.py (sub-workflow 1), sw2.py (sub-workflow 2), sw3.py (sub-workflow 3)
- Define the dependencies between sub-workflows: main.py
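Write functions that create tasks and split along them: defining a sub-workflow (sw1.py)
- Create a function that receives the DAG object and returns the first and last tasks of the sub-workflow
- Inside this function, define the sub-workflow's tasks and their dependencies
Annotations on the sw1.py code
- Definition of the first and last tasks, which exist to be returned by the function
- Definition of the tasks for the functional part of the sub-workflow
- Definition of the dependencies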
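
The slide's sw1.py code is not part of this transcript, so the following is only a sketch of the shape it describes: a `build_tasks` function that takes the DAG and returns the sub-workflow's first and last tasks. The `Task` class is a stand-in, the DAG is a plain list, and the task names are hypothetical. In real Airflow the tasks would be operators created with `dag=dag`, and the boundary tasks could be no-op operators, which is an assumption rather than something the slides confirm.

```python
class Task:
    """Stand-in for an Airflow task (hypothetical; not the real BaseOperator)."""

    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.downstream = []
        dag.append(self)  # register on the stand-in DAG (a plain list here)

    def __rshift__(self, other):
        # self >> other: other runs after self
        self.downstream.append(other)
        return other


def build_tasks(dag):
    """Define sub-workflow 1 on `dag` and return its (first, last) tasks."""
    # Boundary tasks, defined so the function has something to return
    first = Task("sw1_start", dag)
    last = Task("sw1_end", dag)

    # Tasks for the functional part of the sub-workflow (hypothetical names)
    extract = Task("sw1_extract", dag)
    transform = Task("sw1_transform", dag)

    # Dependencies inside the sub-workflow
    first >> extract >> transform >> last
    return first, last


dag = []
first, last = build_tasks(dag)

# Walk the chain from first to last to show the internal order
order, node = [first.task_id], first
while node.downstream:
    node = node.downstream[0]
    order.append(node.task_id)
print(order)
```

Callers only ever see `first` and `last`, so the person who owns this file can change the interior tasks without touching anyone else's code.
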
Write functions that create tasks and split along them: defining dependencies between sub-workflows (main.py)
- Each sub-workflow's build_tasks (in sw1.py, sw2.py, sw3.py) returns its first and last tasks
- The returned tasks are used to define the dependencies
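
The main.py code is likewise missing from the transcript. As a rough sketch of the wiring it describes, the example below fakes the three modules with a `make_build_tasks` factory and connects them using only the returned boundary tasks. `Task`, `make_build_tasks`, the list used as a DAG, and the chosen dependency (sub-workflows 1 and 2 feeding sub-workflow 3) are all assumptions for illustration.

```python
class Task:
    """Stand-in for an Airflow task (hypothetical; not the real BaseOperator)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # task >> task
        self.downstream.append(other)
        return other

    def __rrshift__(self, upstream_tasks):
        # [task, task] >> self
        for t in upstream_tasks:
            t.downstream.append(self)
        return self


def make_build_tasks(name):
    """Stand-in for the build_tasks function each swN.py module would expose."""
    def build_tasks(dag):
        first, work, last = Task(f"{name}_start"), Task(f"{name}_work"), Task(f"{name}_end")
        dag.extend([first, work, last])  # register tasks on the stand-in DAG
        first >> work >> last
        return first, last
    return build_tasks


# main.py: in a real project these would be imported from sw1.py, sw2.py, sw3.py
dag = []  # stand-in for the DAG object: just a list of tasks
sw1_first, sw1_last = make_build_tasks("sw1")(dag)
sw2_first, sw2_last = make_build_tasks("sw2")(dag)
sw3_first, sw3_last = make_build_tasks("sw3")(dag)

# Only the boundary tasks are used; each sub-workflow's interior stays hidden.
[sw1_last, sw2_last] >> sw3_first

print([t.task_id for t in sw1_last.downstream])
print([t.task_id for t in sw2_last.downstream])
```

Because every task is registered on the same DAG, the dependencies between sub-workflows stay inside one graph, which is what lets the UI show them.
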
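Summary
- A DAG made up of flows with different functions connects to varied systems with varied specifications, so building it alone was sometimes hard
- Using functions that create tasks, we could split the files by sub-workflow, and development became easier
- Dependencies are defined using the first and last tasks of each sub-workflow
- When defining dependencies between sub-workflows, the inside of each sub-workflow can be treated as a black box
- When viewing the whole DAG in the UI's Graph view, it may be a little hard to get an overview
- A SubDag would display the tasks grouped together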
Appendix
SubDagOperator
- Mainly used for repeating patterns
- Parallel tasks can be grouped together by using a SubDag
- Managed services shown on the slide: Cloud Composer, Astronomer
Pros
- Files can be split
- Sub-workflows can be displayed as a group
- Dependencies are visualized
Cons
- At present there are many caveats to using SubDag, and its use is not recommended
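TriggerDagRunOperator / ExternalTaskSensor
Pros
- Dependencies can be defined with a trigger or a sensor
- Files can be split (as separate DAGs)
Cons
- Dependencies between DAGs are not visualized in the UI
- A DAG has to be created for each sub-workflow
(Diagram: DAG 1 with task 1-1 and DAG 2 with task 2-1 and task 2-2, linked once by a sensor and once by a trigger)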
Comparison of the methods considered