Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Tech x Marketing #4 Airflowでもサブワークフロー単位で分割開発したい!
Search
Naoki Matsuda
August 27, 2020
Programming
0
200
Tech x Marketing #4 Airflowでもサブワークフロー単位で分割開発したい!
Naoki Matsuda
August 27, 2020
Tweet
Share
More Decks by Naoki Matsuda
See All by Naoki Matsuda
[PyCon JP 2019] 新米Pythonistaが贈るAirflow入門&活用事例紹介
matsudan
2
6.8k
Other Decks in Programming
See All in Programming
Package Management Learnings from Homebrew
mikemcquaid
0
230
AIで開発はどれくらい加速したのか?AIエージェントによるコード生成を、現場の評価と研究開発の評価の両面からdeep diveしてみる
daisuketakeda
1
2.5k
SourceGeneratorのススメ
htkym
0
200
React Native × React Router v7 API通信の共通化で考えるべきこと
suguruooki
0
100
IFSによる形状設計/デモシーンの魅力 @ 慶應大学SFC
gam0022
1
310
副作用をどこに置くか問題:オブジェクト指向で整理する設計判断ツリー
koxya
1
620
Claude Codeと2つの巻き戻し戦略 / Two Rewind Strategies with Claude Code
fruitriin
0
150
ぼくの開発環境2026
yuzneri
0
250
Best-Practices-for-Cortex-Analyst-and-AI-Agent
ryotaroikeda
1
110
CSC307 Lecture 03
javiergs
PRO
1
490
Oxlint JS plugins
kazupon
1
1k
Raku Raku Notion 20260128
hareyakayuruyaka
0
370
Featured
See All Featured
Leadership Guide Workshop - DevTernity 2021
reverentgeek
1
200
How to Talk to Developers About Accessibility
jct
2
140
4 Signs Your Business is Dying
shpigford
187
22k
Lessons Learnt from Crawling 1000+ Websites
charlesmeaden
PRO
1
1.1k
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
68
AI Search: Implications for SEO and How to Move Forward - #ShenzhenSEOConference
aleyda
1
1.1k
Heart Work Chapter 1 - Part 1
lfama
PRO
5
35k
Designing for humans not robots
tammielis
254
26k
We Analyzed 250 Million AI Search Results: Here's What I Found
joshbly
1
750
Between Models and Reality
mayunak
1
200
Building Applications with DynamoDB
mza
96
6.9k
ラッコキーワード サービス紹介資料
rakko
1
2.3M
Transcript
AirflowͰαϒϫʔΫϑϩʔ ୯ҐͰׂ։ൃ͍ͨ͠ʂ Tech x Marketing meetup #4 גࣜձࣾ ి௨σδλϧ দా
थ 2020/08/27
ࣗݾհ - ໊લɿদా थ (·ͭͩ ͳ͓͖) - ॴଐɿגࣜձࣾ ి௨σδλϧ (2018ೖࣾ)
- ۀɿόοΫΤϯυαʔϏεAPI࣮ ETLߏஙͳͲ (༻ݴޠɿGo / Python) - ֶੜ࣌ɿےܹ࣌ͷΤωϧΪʔফඅ/࢈ੜͷݚڀ AirflowͷTalk & Blogࣥචɿ - ৽ถPythonista͕ଃΔAirflowೖ&׆༻ࣄྫհ(PyConJP2019) - AirflowͷλεΫ࣮ߦڥΛ͢Δ(DentsuDigital Tech Blog)
༰ Γ͍ͨ͜ͱ AirflowͷDAG (ϫʔΫϑϩʔ)ΛෳਓͰ։ൃ͍ͨ͠ʂ ݕ౼ͨ͠ํ๏ X SubDagOperatorʢฒྻ࣮ߦͷ੍ޚෆՄɺϚωʔδυαʔϏεͰ༻͕ਪ͞Εͯͳ͍ʣ X TriggerDagRunOperator /
ExternalTaskSensorʢґଘ͕ؔΘ͔ΓͮΒ͘ͳΔʣ ̋ λεΫΛ࡞ΔؔΛ࡞ׂͬͯʢఏҊख๏ʣ
Apache Airflowͱ - PythonͰهड़͞ΕͨϫʔΫϑϩʔ(DAG)ͷ࣮ߦɾࢹπʔϧ - ApacheτοϓϨϕϧϓϩδΣΫτͷͻͱͭ ίϯτϦϏϡʔλɿ1000 + ελʔɿ17000 +
Apache Airflowͱ - DAGɿTaskͷ࣮ߦॱংΛܾఆ͢Δάϥϑ - OperatorɿςϯϓϨʔτԽ͞Ε࣮ͨߦ୯Ґ - Taskɿύϥϝʔλ͕༩͑ΒΕͨOperator Operators (Python,
HTTP, MySQL, KubernetesPod…) A B C D https://www.slideshare.net/potix2_jp/airflow-224004058 DAG - Taskґଘؔɿ >> Ͱఆٛ ྫ) A >> [B, C] >> D Task
Γ͍ͨ͜ͱ AirflowͷDAG (ϫʔΫϑϩʔ)ΛෳਓͰ։ൃ͍ͨ͠ʂ ݕ౼ͨ͠ํ๏ X SubDagOperatorʢฒྻ࣮ߦͷ੍ޚෆՄɺϚωʔδυαʔϏεͰ༻͕ਪ͞Εͯͳ͍ʣ X TriggerDagRunOperator / ExternalTaskSensorʢґଘ͕ؔΘ͔ΓͮΒ͘ͳΔʣ
̋ λεΫΛ࡞ΔؔΛ࡞ׂͬͯʢఏҊख๏ʣ ͷഎܠ ༰
DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ σʔλϚʔτΛߋ৽͢ΔDAG DAGΛෳਓͰ։ൃ͍ͨ͠ʂ
DAGׂ(ػೳ)͕ҟͳΔϑϩʔʹΑΓߏ͞Ε͏Δ
DAGΛෳਓͰ։ൃ͍ͨ͠ʂ ɾɾɾ DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ
σʔλϚʔτΛߋ৽͢ΔDAG ׂ(ػೳ)͕ҟͳΔϑϩʔ֤ʑҟͳΔγεςϜͱ࿈ܞ͢Δ ࿈ܞઌγεςϜ
DAGΛෳਓͰ։ൃ͍ͨ͠ʂ DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ σʔλϚʔτΛߋ৽͢ΔDAG
1ਓͰΔͱେมʂʂ ɾɾɾ ׂ(ػೳ)͕ҟͳΔϑϩʔ֤ʑҟͳΔγεςϜͱ࿈ܞ͢Δ ࿈ܞઌγεςϜ
DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ σʔλϚʔτΛߋ৽͢ΔDAG DAGΛෳਓͰ։ൃ͍ͨ͠ʂ
࿈ܞઌγεςϜ༷ʹৄ͍͠ਓʹ͓ئ͍͍ͨ͠ DWHৄ͍͠ਓ εϓϨουγʔ τपΓৄ͍͠ਓ
DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ σʔλϚʔτΛߋ৽͢ΔDAG DAGΛෳਓͰ։ൃ͍ͨ͠ʂ
DWHৄ͍͠ਓ εϓϨουγʔ τपΓৄ͍͠ਓ αϒϫʔΫϑϩʔ ࿈ܞઌγεςϜ༷ʹৄ͍͠ਓʹ͓ئ͍͍ͨ͠
DWHͷσʔλΛՃ͢ Δϑϩʔ εϓϨουγʔτ͔Β σʔλΛऔಘͯ͠Ճ͢ Δϑϩʔ ՃσʔλΛಥ߹ͯ͠ σʔλϚʔτΛߋ৽͢Δ ϑϩʔ σʔλϚʔτΛߋ৽͢ΔDAG DAGΛෳਓͰ։ൃ͍ͨ͠ʂ
DAGجຊతʹ1ͭͷϑΝΠϧ DWHৄ͍͠ਓ εϓϨουγʔ τपΓৄ͍͠ਓ
ϫʔΫϑϩʔΛׂ։ൃ͢ΔͨΊͷ݅
ϫʔΫϑϩʔΛׂ։ൃ͢ΔͨΊͷ݅ • ϫʔΫϑϩʔ(DAG)Λߏ͢ΔαϒϫʔΫϑϩʔ͕ϑΝΠϧͰ͞ΕΔ • ϑΝΠϧؒͰґଘ͕ؔఆٛͰ͖Δ • ϑΝΠϧؒͰఆٛ͞Εͨґଘ͕ؔUI্ͰՄࢹԽ͞ΕΔ • ฒྻ࣮ߦ੍͕ޚͰ͖Δ αϒϫʔΫϑϩʔ1
αϒϫʔΫϑϩʔ2 αϒϫʔΫϑϩʔ3 ґଘؔ
ఏҊํ๏ λεΫΛ࡞ΔؔΛ࡞ׂͬͯ
λεΫΛ࡞ΔؔΛ࡞ׂͬͯ • ֤αϒϫʔΫϑϩʔͷఆٛϑΝΠϧͰɺͦͷαϒϫʔΫϑϩʔͷ࠷ॳͱ࠷ޙͷλεΫΛ ฦ͢ → αϒϫʔΫϑϩʔͷ࠷ॳͱ࠷ޙͷλεΫ͕͔ΕɺαϒϫʔΫϑϩʔؒͷґଘؔఆٛ Ͱ͖Δ ˍ ͦͷґଘؔఆٛʹ͓͍ͯαϒϫʔΫϑϩʔͷதϒϥοΫϘοΫεʹͰ͖Δ ίϯηϓτ
αϒϫʔΫϑϩʔ1 αϒϫʔΫϑϩʔ2 αϒϫʔΫϑϩʔ3
λεΫΛ࡞ΔؔΛ࡞ׂͬͯ - αϒϫʔΫϑϩʔΛఆٛ͢ΔͨΊͷϑΝΠϧΛ࡞ - αϒϫʔΫϑϩʔؒͷґଘؔΛఆٛ αϒϫʔΫϑϩʔ1ɿsw1.py αϒϫʔΫϑϩʔ2ɿsw2.py αϒϫʔΫϑϩʔ3ɿsw3.py main.py
λεΫΛ࡞ΔؔΛ࡞ׂͬͯ αϒϫʔΫϑϩʔ1ɿsw1.py αϒϫʔΫϑϩʔ2ɿsw2.py αϒϫʔΫϑϩʔ3ɿsw3.py main.py - αϒϫʔΫϑϩʔΛఆٛ͢ΔͨΊͷϑΝΠϧΛ࡞ - αϒϫʔΫϑϩʔؒͷґଘؔΛఆٛ
λεΫΛ࡞ΔؔΛ࡞ׂͬͯɿαϒϫʔΫϑϩʔͷఆٛ - DAGΦϒδΣΫτΛड͚औΓαϒϫʔΫϑ ϩʔͷ࠷ॳͱ࠷ޙͷλεΫΛฦؔ͢Λ࡞ - ͜ͷؔͰαϒϫʔΫϑϩʔͷλεΫͱͦ ͷґଘؔΛఆٛ sw1.py αϒϫʔΫϑϩʔ1ɿsw1.py
λεΫΛ࡞ΔؔΛ࡞ׂͬͯɿαϒϫʔΫϑϩʔͷఆٛ sw1.py ؔͰฦ͢༻ʹ࠷ॳͱ࠷ޙͷλεΫఆٛ αϒϫʔΫϑϩʔͷػೳ෦ͷλεΫఆٛ ґଘؔఆٛ αϒϫʔΫϑϩʔ1ɿsw1.py
λεΫΛ࡞ΔؔΛ࡞ׂͬͯɿαϒϫʔΫϑϩʔؒͷґଘؔఆٛ αϒϫʔΫϑϩʔ1ɿsw1.py αϒϫʔΫϑϩʔ2ɿsw2.py αϒϫʔΫϑϩʔ3ɿsw3.py main.py - αϒϫʔΫϑϩʔΛఆٛ͢ΔͨΊͷϑΝΠϧΛ࡞ - αϒϫʔΫϑϩʔؒͷґଘؔΛఆٛ
λεΫΛ࡞ΔؔΛ࡞ׂͬͯɿαϒϫʔΫϑϩʔؒͷґଘؔఆٛ ֤αϒϫʔΫϑϩʔͷbuild_tasks͔Β࠷ॳͱ࠷ޙͷλ εΫ͕ฦΔ main.py αϒϫʔΫϑϩʔ1ɿsw1.py αϒϫʔΫϑϩʔ2ɿsw2.py αϒϫʔΫϑϩʔ3ɿsw3.py
λεΫΛ࡞ΔؔΛ࡞ׂͬͯɿαϒϫʔΫϑϩʔؒͷґଘؔఆٛ ֤αϒϫʔΫϑϩʔͷbuild_tasks͔Β࠷ॳͱ࠷ޙͷλ εΫ͕ฦΔ ্هͰฦͬͨλεΫΛͬͯґଘؔΛఆٛ main.py αϒϫʔΫϑϩʔ1ɿsw1.py αϒϫʔΫϑϩʔ2ɿsw2.py αϒϫʔΫϑϩʔ3ɿsw3.py
·ͱΊ - ػೳͷҟͳΔϑϩʔͰߏ͞ΕΔDAGଓઌ༷ʑͳͷͰ1ਓͰେมͳ ߹͕͋ͬͨ - λεΫΛ࡞ΔؔΛ༻͍ͯαϒϫʔΫϑϩʔ୯ҐͰϑΝΠϧׂͰ͖։ൃ͕ ָʹͳΓ·ͨ͠ - αϒϫʔΫϑϩʔͷ࠷ॳͱ࠷ޙͷλεΫΛ͔ͭͬͯґଘؔఆٛ -
αϒϫʔΫϑϩʔؒґଘؔఆٛͰαϒϫʔΫϑϩʔͷதϒϥοΫϘοΫεʹͰ͖Δ - UIͷGraph viewͰDAGશମΛݟΔͱ͖ʹগ͠ݟ௨͕͠ѱ͍͔ʁ - SubDagͰ͋Ε·ͱΊͯදࣔͯ͘͠ΕΔ
Appendix
SubDagOperator - ओʹ܁Γฦ͠ύλʔϯͰར༻͞ΕΔɻ - ฒྻλεΫΛSubDagΛ͑·ͱΊΒΕΔ
SubDagOperator - ओʹ܁Γฦ͠ύλʔϯͰར༻͞ΕΔɻ - ฒྻλεΫΛSubDagΛ͑·ͱΊΒΕΔ Cloud Composer Astronomer
SubDagOperator - ओʹ܁Γฦ͠ύλʔϯͰར༻͞ΕΔɻ - ฒྻλεΫΛSubDagΛ͑·ͱΊΒΕΔ Α͍ - ϑΝΠϧΛՄೳ - αϒϫʔΫϑϩʔΛ·ͱΊͯදࣔͰ͖Δ
- ґଘ͕ؔՄࢹԽ͞ΕΔ Α͘ͳ͍ - ݱঢ়SubDagΛ͏ʹҙ͕ଟ͘ɺ ༻͕ਪ͞Εͯͳ͍
TriggerDagRunOperator / ExternalTaskSensor Α͍ - TriggerSensorͰґଘؔΛఆٛՄೳ - (ผͷDAGͱͯ͠)ϑΝΠϧΛՄೳ Α͘ͳ͍ -
DAGؒͷґଘ͕ؔUI্ͰՄࢹԽ͞Εͳ͍ - αϒϫʔΫϑϩʔ͝ͱʹDAG࡞ʹͳΔ task 1-1 sensor task 2-1 task 2-2 DAG 1 DAG 2 DAG 1 DAG 2 trigger
ݕ౼ͨ͠ํ๏ͷൺֱ