Tech x Marketing #4: I want to develop Airflow workflows split into sub-workflows, too!
Naoki Matsuda
August 27, 2020
Transcript
I want to develop Airflow workflows split into sub-workflows!
Tech x Marketing meetup #4
Dentsu Digital Inc., Naoki Matsuda
2020/08/27
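About me
- Name: Naoki Matsuda
- Affiliation: Dentsu Digital Inc. (joined 2018)
- Work: implementing backend service APIs, building ETL pipelines, etc. (languages: Go / Python)
- As a student: research on energy consumption and production in muscle
Airflow talks and blog posts:
- An introduction to Airflow and use cases, from a new Pythonista (PyCon JP 2019)
- A post on Airflow task execution environments (Dentsu Digital Tech Blog)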
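Contents
What I want to do: develop an Airflow DAG (workflow) with multiple people!
Approaches considered:
X SubDagOperator (parallel execution cannot be controlled; its use is not recommended on managed services)
X TriggerDagRunOperator / ExternalTaskSensor (dependencies become hard to follow)
○ Split the DAG by writing functions that create tasks (proposed method)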
What is Apache Airflow?
- A tool for running and monitoring workflows (DAGs) written in Python
- One of the Apache top-level projects
Contributors: 1000+ / Stars: 17000+
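What is Apache Airflow?
- DAG: a graph that determines the execution order of Tasks
- Operator: a templated unit of execution
- Task: an Operator with its parameters supplied
Operators: Python, HTTP, MySQL, KubernetesPod…
- Dependencies between Tasks are defined with >>, e.g. A >> [B, C] >> D (a DAG where A runs first, then B and C, then D)
(figure source: https://www.slideshare.net/potix2_jp/airflow-224004058)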
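Airflow overloads `>>` on operators to declare dependencies. The toy `Task` class below is a hypothetical stand-in, not Airflow's `BaseOperator`, written only so the snippet runs without Airflow installed. It sketches how `A >> [B, C] >> D` can turn into edges: Python evaluates the chain left to right, and because lists do not define `>>`, the `[B, C] >> D` step falls back to `D.__rrshift__`.

```python
# Toy stand-in for an Airflow operator (NOT airflow's BaseOperator),
# just enough to show how `>>` becomes dependency edges.
class Task:
    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = set()  # task_ids that must run after this task

    def _add_downstream(self, other):
        # Accept either a single task or a list of tasks.
        for t in other if isinstance(other, list) else [other]:
            self.downstream.add(t.task_id)

    def __rshift__(self, other):
        # `self >> other`: self runs first; return `other` so the chain continues.
        self._add_downstream(other)
        return other

    def __rrshift__(self, other):
        # `[B, C] >> self`: lists have no `>>`, so Python calls this method instead.
        for t in other if isinstance(other, list) else [other]:
            t._add_downstream(self)
        return self


A, B, C, D = Task("A"), Task("B"), Task("C"), Task("D")
A >> [B, C] >> D  # A first, then B and C (may run in parallel), then D

for t in (A, B, C, D):
    print(t.task_id, "->", sorted(t.downstream))
```

Returning the right-hand operand from `>>` is what lets a single expression describe a whole fan-out/fan-in.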
Contents: background of what I want to do
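DAG that updates a data mart:
- Flow that processes DWH data
- Flow that fetches data from spreadsheets and processes it
- Flow that merges the processed data and updates the data mart
We want to develop this DAG with multiple people!
A DAG can be made up of flows with different roles (functions).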
Flows with different roles each integrate with a different external system.
Building all of it alone is a lot of work!!
We'd like to hand each flow to someone who knows the specifications of the system it integrates with: a DWH expert, a spreadsheet expert.
Each of these flows becomes a sub-workflow owned by the person who knows its system.
But a DAG is, as a rule, defined in a single file.
Requirements for developing a workflow in parts
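• The sub-workflows that make up the workflow (DAG) are split into separate files
• Dependencies can be defined across files
• Dependencies defined across files are visualized in the UI
• Parallel execution can be controlled
(diagram: sub-workflows 1, 2, and 3 connected by dependencies)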
Proposed method: split the DAG by writing functions that create tasks
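Splitting with functions that create tasks: concept
• Each sub-workflow's definition file returns the first and last tasks of that sub-workflow
→ Knowing each sub-workflow's first and last tasks is enough to define the dependencies between sub-workflows, and in those dependency definitions the inside of each sub-workflow can be treated as a black box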
(diagram: sub-workflow 1, sub-workflow 2, sub-workflow 3)
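Why are the first and last tasks enough? The sketch below models a DAG as a plain adjacency dict (no Airflow involved) and uses a hypothetical `build_subworkflow` helper that fans out from a start node to its inner tasks and back in to an end node, exposing only `(first, last)`. A single edge from sub-workflow 1's last task to sub-workflow 2's first task then places every inner task of sub-workflow 1 upstream of every inner task of sub-workflow 2. This assumes each sub-workflow's inner tasks all lie between its own first and last task.

```python
from collections import defaultdict


def build_subworkflow(graph, name, inner_tasks):
    """Hypothetical builder: start -> each inner task -> end.

    Only (first, last) is returned; the inner tasks stay hidden from the caller.
    """
    first, last = f"{name}.start", f"{name}.end"
    for t in inner_tasks:
        graph[first].add(f"{name}.{t}")
        graph[f"{name}.{t}"].add(last)
    return first, last


def descendants(graph, node):
    """Every node reachable from `node`, via an iterative depth-first search."""
    seen, stack = set(), [node]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def nodes_with_prefix(graph, prefix):
    """All nodes (sources and targets) whose name starts with `prefix`."""
    all_nodes = set(graph) | {m for targets in graph.values() for m in targets}
    return {n for n in all_nodes if n.startswith(prefix)}


graph = defaultdict(set)  # node -> set of downstream nodes
sw1_first, sw1_last = build_subworkflow(graph, "sw1", ["extract", "transform"])
sw2_first, sw2_last = build_subworkflow(graph, "sw2", ["load"])

# The only edge between the two sub-workflows uses just the returned endpoints.
graph[sw1_last].add(sw2_first)

sw1_nodes = nodes_with_prefix(graph, "sw1.")
sw2_nodes = nodes_with_prefix(graph, "sw2.")
# Every task inside sw1 ends up upstream of every task inside sw2.
print(all(sw2_nodes <= descendants(graph, n) for n in sw1_nodes))  # True
```

The caller never looks at `extract`, `transform`, or `load`; that is the black-box property the concept slide relies on.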
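Splitting with functions that create tasks
- Create a file that defines each sub-workflow
- Define the dependencies between sub-workflows
Sub-workflow 1: sw1.py / Sub-workflow 2: sw2.py / Sub-workflow 3: sw3.py / main.py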
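Splitting with functions that create tasks: defining a sub-workflow
- Create a function that takes a DAG object and returns the first and last tasks of the sub-workflow
- Inside this function, define the sub-workflow's tasks and their dependencies
(Sub-workflow 1: sw1.py)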
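sw1.py (code annotated in three parts):
- Definitions of the first and last tasks, to be returned by the function
- Definitions of the tasks that implement the sub-workflow's functionality
- Definitions of the dependencies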
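The sw1.py code itself is not preserved in this transcript. Below is a sketch of the structure its annotations describe: a `build_tasks(dag)` function that defines the first and last tasks, the working tasks, and their dependencies, then returns `(first, last)`. `DAG` and `Task` here are hypothetical stand-ins so the sketch runs without Airflow; in Airflow 1.10 the first and last marker tasks would typically be something like `DummyOperator`. The task names are made up for illustration.

```python
# Stand-ins so the sketch runs without Airflow (NOT Airflow's DAG/BaseOperator).
class DAG:
    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}  # task_id -> Task


class Task:
    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.downstream = set()
        dag.tasks[task_id] = self  # register with the DAG it belongs to

    def __rshift__(self, other):
        for t in other if isinstance(other, list) else [other]:
            self.downstream.add(t.task_id)
        return other

    def __rrshift__(self, other):
        for t in other if isinstance(other, list) else [other]:
            t.downstream.add(self.task_id)
        return self


# --- sw1.py (hypothetical contents) ---
def build_tasks(dag):
    """Define sub-workflow 1 inside `dag` and return its (first, last) tasks."""
    # First and last tasks, defined so the function can return them.
    start = Task("sw1_start", dag)
    end = Task("sw1_end", dag)

    # Tasks that do the sub-workflow's actual work (hypothetical names).
    query_dwh = Task("sw1_query_dwh", dag)
    transform = Task("sw1_transform", dag)

    # Dependencies inside the sub-workflow.
    start >> query_dwh >> transform >> end
    return start, end


dag = DAG("update_datamart")
first, last = build_tasks(dag)
print(first.task_id, last.task_id)  # sw1_start sw1_end
```

Prefixing each task_id with the sub-workflow name is one way to avoid collisions when several files add tasks to the same DAG, since task_ids must be unique within a DAG.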
Splitting with functions that create tasks: defining the dependencies between sub-workflows
main.py:
- build_tasks in each sub-workflow file returns that sub-workflow's first and last tasks
- The returned tasks are used to define the dependencies between sub-workflows
(files: sw1.py, sw2.py, sw3.py)
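The main.py code is also not in the transcript. The sketch below assumes that sub-workflows 1 and 2 both feed sub-workflow 3, mirroring the background slides where the DWH flow and the spreadsheet flow feed the data-mart update. That topology is an assumption. In the real layout main.py would import each file (for example `import sw1` and then `sw1.build_tasks(dag)`). Here a single hypothetical `build_tasks(prefix, dag)` stands in for all three files, and `Task` is again a stand-in rather than an Airflow operator.

```python
# Stand-in for an Airflow operator (NOT Airflow's BaseOperator); the "DAG" is a plain list.
class Task:
    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.downstream = set()
        dag.append(self)

    def __rshift__(self, other):
        for t in other if isinstance(other, list) else [other]:
            self.downstream.add(t.task_id)
        return other

    def __rrshift__(self, other):
        # Handles `[a, b] >> self`, since lists do not define `>>`.
        for t in other if isinstance(other, list) else [other]:
            t.downstream.add(self.task_id)
        return self


def build_tasks(prefix, dag):
    """Stand-in for swN.build_tasks(dag): start >> work >> end, returning (first, last).

    The extra `prefix` argument is hypothetical; it lets one function play all three files.
    """
    start, work, end = (Task(f"{prefix}_{step}", dag) for step in ("start", "work", "end"))
    start >> work >> end
    return start, end


# --- main.py (sketch) ---
dag = []
sw1_first, sw1_last = build_tasks("sw1", dag)  # e.g. the DWH flow
sw2_first, sw2_last = build_tasks("sw2", dag)  # e.g. the spreadsheet flow
sw3_first, sw3_last = build_tasks("sw3", dag)  # e.g. merge and update the data mart

# Cross-sub-workflow dependencies use only the returned endpoints:
# sw1 and sw2 may run in parallel, and sw3 starts after both finish.
[sw1_last, sw2_last] >> sw3_first

for task in dag:
    print(task.task_id, "->", sorted(task.downstream))
```

main.py never touches the `*_work` tasks, so each owner can change the inside of their file without editing main.py.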
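Summary
- A DAG made up of flows with different functions connects to many different systems, which could make it a lot of work for one person
- Using functions that create tasks, we could split files by sub-workflow, which made development easier
  - Dependencies are defined using each sub-workflow's first and last tasks
  - When defining dependencies between sub-workflows, the inside of each sub-workflow can be treated as a black box
- The whole DAG may be a little harder to read in the UI's Graph view?
  - With a SubDag, a sub-workflow would be displayed as a single grouped node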
Appendix
SubDagOperator
- Mainly used for repeating patterns
- Parallel tasks can be grouped together with a SubDag
(managed Airflow services shown: Cloud Composer, Astronomer)
Good:
- Files can be split
- A sub-workflow can be displayed as one group
- Dependencies are visualized
Not good:
- Using SubDags currently comes with many caveats, and their use is not recommended
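TriggerDagRunOperator / ExternalTaskSensor
Good:
- Dependencies can be defined with a trigger or a sensor
- Files can be split (as separate DAGs)
Not good:
- Dependencies between DAGs are not visualized in the UI
- You end up creating one DAG per sub-workflow
(diagrams: DAG 1 with task 1-1 and DAG 2 with tasks 2-1 and 2-2, linked either by a sensor in DAG 2 or by a trigger from DAG 1)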
Comparison of the approaches considered