Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
[PyCon JP 2019] 新米Pythonistaが贈るAirflow入門&活用事例紹介
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Naoki Matsuda
September 17, 2019
Technology
2
6.8k
[PyCon JP 2019] 新米Pythonistaが贈るAirflow入門&活用事例紹介
PyCon JP 2019の発表資料です。
Naoki Matsuda
September 17, 2019
Tweet
Share
More Decks by Naoki Matsuda
See All by Naoki Matsuda
Tech x Marketing #4 Airflowでもサブワークフロー単位で分割開発したい!
matsudan
0
200
Other Decks in Technology
See All in Technology
The Engineer with a Three-Year Cycle
e99h2121
0
170
2026年はチャンキングを極める!
shibuiwilliam
2
360
3分でわかる!新機能 AWS Transform custom
sato4mi
1
210
いよいよ仕事を奪われそうな波が来たぜ
kazzpapa3
3
250
AWS Amplify Conference 2026 - 仕様からリリースまで一気通貫生成 AI 時代のフルスタック開発
inariku
3
390
EventBridge API Destination × AgentCore Runtimeで実現するLambdaレスなイベント駆動エージェント
har1101
7
260
AI開発の落とし穴 〜馬には乗ってみよAIには添うてみよ〜
sansantech
PRO
9
3.9k
Lambda Durable FunctionsでStep Functionsの代わりはできるのかを試してみた
smt7174
2
140
SOC2は、取った瞬間よりその後が面白い
3flower
1
200
Kaggleコンペティション「MABe Challenge - Social Action Recognition in Mice」振り返り
yu4u
1
760
Regional_NAT_Gatewayについて_basicとの違い_試した内容スケールアウト_インについて_IPv6_dual_networkでの使い分けなど.pdf
cloudevcode
1
140
Azure SQL Databaseでベクター検索を活用しよう
nakasho
0
100
Featured
See All Featured
Hiding What from Whom? A Critical Review of the History of Programming languages for Music
tomoyanonymous
2
380
Leading Effective Engineering Teams in the AI Era
addyosmani
9
1.5k
Tips & Tricks on How to Get Your First Job In Tech
honzajavorek
0
420
4 Signs Your Business is Dying
shpigford
187
22k
Into the Great Unknown - MozCon
thekraken
40
2.2k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
16k
Beyond borders and beyond the search box: How to win the global "messy middle" with AI-driven SEO
davidcarrasco
1
43
Typedesign – Prime Four
hannesfritz
42
2.9k
How to optimise 3,500 product descriptions for ecommerce in one day using ChatGPT
katarinadahlin
PRO
0
3.4k
So, you think you're a good person
axbom
PRO
2
1.9k
The innovator’s Mindset - Leading Through an Era of Exponential Change - McGill University 2025
jdejongh
PRO
1
84
KATA
mclloyd
PRO
34
15k
Transcript
৽ถPythonista͕ଃΔAirflowೖ & ׆༻ࣄྫհ PyCon JP 2019 2019.9.17 Naoki Matsuda
Agenda 0. ࣗݾհ 1. Airflowͷ֓ཁ 2. Airflowͷࣾࣄྫհ - ։ൃϓϩμΫτ֓ཁͱ՝, Airflowͷڥ
3. AirflowͰͭ·͍ͮͨ - λεΫؒͷσʔλͷΓͱΓ - DAGಈ࡞֬ೝ ~ dockerͰϩʔΧϧ։ൃڥߏங
ࣗݾհ দా थ (·ͭͩ ͳ͓͖) - ॴଐɿגࣜձࣾ ి௨σδλϧ - ۀɿόοΫΤϯυαʔϏεɺETLपΓͷ։ൃ
- 2018ೖࣾ
1. Airflowͷ֓ཁ
Apache Airflow֓ཁ όονॲཧ͔ΒͳΔϫʔΫϑϩʔͷεέδϡʔϦϯάˍϞχλ Ϧϯά͕ՄೳͳϓϥοτϑΥʔϜ - Airbnbࣾ - Φʔϓϯιʔε (Apache software
foundationͷincubation project)1,2 - PythonͰ࣮͞Ε͍ͯΔ3 - ։ൃίϛϡχςΟ͕׆ൃ3 IUUQTJODVCBUPSBQBDIFPSHQSPKFDUTBJSGMPXIUNM IUUQTBJSGMPXBQBDIFPSHMJDFOTFIUNM IUUQTHJUIVCDPNBQBDIFBJSGMPX
։ൃίϛϡχςΟͷ׆ൃ͞(2019.9.9࣌)
Apache AirflowͰͰ͖Δ͜ͱ - PythonίʔυͰϫʔΫϑϩʔ(DAG)Λఆٛ - ґଘؔʹج͍ͮͨλεΫͷ࣮ߦ - ϫʔΫϑϩʔͷεέδϡʔϦϯά - ϦονͳWeb
UI - DAG࣮ߦεςʔλεͷϞχλϦϯά - λεΫͷϩά֬ೝ - ґଘؔͷՄࢹԽ ͳͲ
PythonίʔυͰϫʔΫϑϩʔఆٛ(DAGͷ࡞) λεΫؒґଘؔͷఆٛ λεΫ1 λεΫ2 DAGͷڞ௨ઃఆ ࣮ߦස, ࣮ߦظؒ, λΠϜΞτ࣌ؒͳͲ
ϫʔΫϑϩʔΛߏ͢ΔλεΫͷ࡞ ≈ - ϫʔΫϑϩʔOperatorͱݺΕΔλεΫʹΑΓߏ͞ΕΔ1 - 1ͭͷOperatorͰ1ͭͷλεΫΛهड़ OperatorͷҾɻ ֤Operator͕ԿͷҾΛ ͱΔ͔υΩϡϝϯτࢀর2 IUUQTBJSGMPXBQBDIFPSHDPODFQUTIUNMPQFSBUPST
IUUQTBJSGMPXBQBDIFPSH@BQJBJSGMPXPQFSBUPSTJOEFYIUNM
ϫʔΫϑϩʔΛߏ͢ΔλεΫͷ࡞ - BashίϚϯυ࣮ߦ: BashOperator - Python࣮ؔߦ: PythonOperator - SQL࣮ߦ: MySqlOperator,
PostgresOperator, … - HTTPϦΫΤετૹ৴: SimpleHttpOperator - ͦͷଞΫϥυܥͳͲ: BigQueryOperator, AWSAthenaOperator, … - ಛఆ݅Ληϯγϯά: Sensor IUUQTBJSGMPXBQBDIFPSHDPODFQUTIUNMPQFSBUPST IUUQTBJSGMPXBQBDIFPSH@BQJBJSGMPXPQFSBUPSTJOEFYIUNM
2. ࣾࣄྫհ - ։ൃϓϩμΫτ֓ཁͱ՝, Airflowͷߏ
։ൃ৫ͱϓϩμΫτʹ͍ͭͯ ɾɾɾ ࠂ৴ σʔλ ϓϥϯφʔ Ӧۀ - ڈ7~9݄, - GoogleDispla
y - ҿྉۀք imp 100000 clicks 5000 cv 700 cost ɾɾɾ ϝσΟΞ ։ൃ৫ νʔϜنɿσʔλΤϯδχΞ ໊ σʔλαΠΤϯςΟετͳͲ໊ d ϓϩμΫτ ͚ࣾσδλϧࠂϓϥϯχϯάπʔϧ ࣾͰѻ͏ϝσΟΞɾΫϥΠΞϯτͷࠂ৴࣮σʔλΛՄࢹԽˍ༧ଌ ϓϩμΫτ (PPHMF :BIPP 5XJUUFS 'BDFCPPL -*/&
։ൃϓϩμΫτʹ͓͚Δ՝ - σʔλ͕RDBʹೖͬͯͳ͍ ৴Ϩϙʔτσʔλ͕ੳ༻ͷྻࢤσʔλϕʔεʹ ͋ͬͨΓɺϚελσʔλ͕εϓϨουγʔτʹ͋ͬͨΓ… - ඞཁͳใΛՃ͢ΔͨΊʹଟ͘ͷϦϨʔγϣϯΛͨͲΔ
։ൃϓϩμΫτʹ͓͚Δ՝ - σʔλ͕RDBʹೖͬͯͳ͍ ৴Ϩϙʔτσʔλ͕ੳ༻ͷྻࢤσʔλϕʔεʹ ͋ͬͨΓɺϚελσʔλ͕εϓϨουγʔτʹ͋ͬͨΓ… - ඞཁͳใΛՃ͢ΔͨΊʹଟ͘ͷϦϨʔγϣϯΛͨͲΔ → ։ൃϓϩμΫτ༻ʹσʔλϚʔτ࡞ RDBʹϑΝΫτ,
σΟϝϯγϣϯςʔϒϧΛETLͰ࡞
Airflowߏ apache-airflow 1.10.2 web worker scheduler Amazon S3 Amazon RDS
Airflow AWS Fargate Amazon ElastiCache Redis Elastic Load Balancing flower DAGs - AWS FargateʹAirflowΛσϓϩΠ ≈ ≈
docker-airflow https://github.com/puckel/docker-airflow
ߏஙͨ͠σʔλϑϩʔ ֤ϝσΟΞࠂ৴σʔλ ϦϨʔγϣϯςʔϒϧܥ JOIN ΫϥΠΞϯτใܥ ʜ ΧϥϜ໊ دͤͳͲ ≈ Amazon
Athena Backend service INSERT "JSGMPX͕࣮ߦ͢ΔλεΫ INSERT
3. AirflowͰͭ·͍ͮͨ - λεΫؒͷσʔλͷΓͱΓ - DAGಈ࡞֬ೝ ~dockerͰϩʔΧϧ։ൃڥߏங
λεΫؒͷσʔλͷΓͱΓ λεΫؒͷσʔλΓͱΓXComΛ͏ - XComͷ͍ํ - XComσʔλΛpush - ؔͰreturn - ؔͰkwargs['task_instance’].
xcom_push(value=hoge, key=‘huga’) - Λฦ͢Operator ྫ: BigqueryGetDataOperator - XCom͔ΒσʔλΛpull - kwargs['task_instance’].xcom_pull() metadata database
λεΫؒͷσʔλͷΓͱΓ BigQuery͔ΒςʔϒϧσʔλΛऔಘͯͦ͠ͷσʔλΛՃ͢Δྫ
λεΫؒͷσʔλͷΓͱΓ BQςʔϒϧ͔Βσʔλऔಘ # XComʹpush͞ΕΔ BigQuery͔ΒςʔϒϧσʔλΛऔಘͯͦ͠ͷσʔλΛՃ͢Δྫ
λεΫؒͷσʔλͷΓͱΓ BQςʔϒϧ͔Βσʔλऔಘ # XComʹpush͞ΕΔ transpose_dataؔΛ࣮ߦ BigQuery͔ΒςʔϒϧσʔλΛऔಘͯͦ͠ͷσʔλΛՃ͢Δྫ
λεΫؒͷσʔλͷΓͱΓ 1. task1Ͱpush͞ΕͨXcomͷ σʔλΛpullͯ͠ 2. ςʔϒϧͷσʔλΛసஔ BQςʔϒϧ͔Βσʔλऔಘ # XComʹpush͞ΕΔ transpose_dataؔΛ࣮ߦ
BigQuery͔ΒςʔϒϧσʔλΛऔಘͯͦ͠ͷσʔλΛՃ͢Δྫ
λεΫؒͷσʔλͷΓͱΓ provide_contextΛTrueʹ͠ͳ͍ͱkwargs[‘task_intance’]ͰKeyError - provide_context=False (default) kwargs : {} - provide_context=True
kwargs: { 'dag': <DAG: sample>, 'ds': '2019-09-10’, 'next_ds': '2019-09-10’, … 'task_instance’: <TaskInstance: sample.task1_2 …> … }
λεΫؒͷσʔλͷΓͱΓ - PythonOperatorͷҾͰTrue OR - default_argsͰઃఆ
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங - ࡞ͨ͠ϫʔΫϑϩʔ(DAG)ͷςετͲ͏Δʁ എܠɿ - Ϋϥυ্devڥͰͷDAGಈ࡞֬ೝͰS3upload͢Δखؒ - ଞͷਓ͕ಉ͡λΠϛϯάͰ։ൃ͍ͯ͠ΔͱΓͮΒ͍… →
ϩʔΧϧͰDAGͷಈ࡞֬ೝ͍ͨ͠ʂ
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங → dockerͰAirflowΛϩʔΧϧʹ্ཱͪ͛Δ - ࡞ͨ͠ϫʔΫϑϩʔ(DAG)ͷςετͲ͏Δʁ എܠɿ - Ϋϥυ্devڥͰͷDAGಈ࡞֬ೝͰS3upload͢Δखؒ -
ଞͷਓ͕ಉ͡λΠϛϯάͰ։ൃ͍ͯ͠ΔͱΓͮΒ͍… → ϩʔΧϧͰDAGͷಈ࡞֬ೝ͍ͨ͠ʂ
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங LocalExecutorΛ༻
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங LocalExecutorΛ༻ dagsσΟϨΫτϦΛvolumeͱ͠ ͯϚϯτ
DAGಈ࡞֬ೝ ~ϩʔΧϧ։ൃڥߏங - dockerͷvolumeͱͯ͠dagsσΟϨΫτϦΛϚϯ τ͍ͯ͠ΔͷͰॻ͖͑ͨΒ͙͢ʹө - Web UIʹ͕ࣗ࡞ͨ͠DAGͷΈ͕දࣔ͞ΕΔ - ECR͔ΒimageΛऔͬͯ͘ΔΑ͏ʹͯ͠ຊ൪ͱಉ͡
ڥͰಈ࡞֬ೝͰ͖Δ
·ͱΊ - ETL͕ඞཁͳࣾ։ൃϓϩμΫτʹ͓͍ͯAirflowΛ ͍·ͨ͠ɻ - ຊ൪ڥͷAirflowECS FargateʹσϓϩΠ͠·ͨ͠ɻ - ϩʔΧϧ։ൃڥʹdockerΛ༻ͯ͠։ൃָ͕ʹͳΓ ·ͨ͠ɻ
We are hiring ! https://bit.ly/2UqWPGO
supplementary information
λεΫؒґଘؔͷఆٛ - Ϗοτγϑτԋࢉࢠ(>>, <<)Λ͍λεΫͷґଘؔΛද͢ - ޙଓλεΫͷ࣮ߦ݅શͯͷઌߦλεΫޭ͕σϑΥϧτઃఆ1 - શOperator͕࣋ͭtrigger_ruleҾͰ࣮ߦ݅ΛมߋՄೳ1 IUUQTBJSGMPXBQBDIFPSHDPODFQUTIUNMUSJHHFSSVMFT
- γϯϓϧͳґଘؔ task1 >> task2 - λεΫάϧʔϓ͕͋Δґଘؔ task1 >> [task2-1,task2-2] >> task3
Web UI: ϞχλϦϯά - Tree View - Gantt Chart
Web UI: Variable
ฒྻઃఆ Configuring parallelism in airflow.cfg - parallelism : ࢄॲཧΫϥελશମͰ࣮ߦՄೳͳϓϩηε -
dag_concurrency : ҰͭͷϫʔΧͰಉ࣮࣌ߦՄೳͳ࠷େϓϩηε - max_active_runs_per_dag : DAG෦Ͱಉ࣮࣌ߦՄೳͳ࠷େλε Ϋ - worker_concurrency : ҰͭͷCeleryϫʔΧͰಉ࣮࣌ߦՄೳͳ࠷େ ϓϩηε IUUQTBOBMZUJDTMJWFTFOTFDPKQFOUSZ