Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
End-to-end automated data science process using...
Search
Keerthi
October 31, 2018
Education
2
260
End-to-end automated data science process using Airflow.
End-to-end automated data science process using Airflow.
Keerthi
October 31, 2018
Tweet
Share
Other Decks in Education
See All in Education
Introduction - Lecture 1 - Human-Computer Interaction (1023841ANR)
signer
PRO
0
2.5k
2025年度春学期 統計学 第15回 分布についての仮説を検証する ー 仮説検定(2) (2025. 7. 17)
akiraasano
PRO
0
110
2025年度春学期 統計学 第11回 分布の「型」を考える ー 確率分布モデルと正規分布 (2025. 6. 19)
akiraasano
PRO
0
180
小学校女性教員向け プログラミング教育研修プログラム「SteP」の実践と課題
codeforeveryone
0
130
American Airlines® USA Contact Numbers: The Ultimate 2025 Guide
lievliev
0
250
理想の英語力に一直線!最高効率な英語学習のすゝめ
logica0419
6
410
ÉTICA, INCLUSIÓN, EDUCACIÓN INTEGRAL Y NEURODERECHOS EN EL CONTEXTO DEL NEUROMANAGEMENT
jvpcubias
0
110
Open Source Summit Japan 2025のボランティアをしませんか
kujiraitakahiro
0
880
授業レポート:共感と協調のリーダーシップ(2025年上期)
jibunal
0
110
Avoin jakaminen ja Creative Commons -lisenssit
matleenalaakso
0
2k
チーム開発における責任と感謝の話
ssk1991
0
310
2025年度春学期 統計学 第14回 分布についての仮説を検証する ー 仮説検定(1) (2025. 7. 10)
akiraasano
PRO
0
150
Featured
See All Featured
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.5k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3.1k
Navigating Team Friction
lara
189
15k
Site-Speed That Sticks
csswizardry
11
870
Code Reviewing Like a Champion
maltzj
525
40k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Done Done
chrislema
185
16k
Optimizing for Happiness
mojombo
379
70k
Why You Should Never Use an ORM
jnunemaker
PRO
59
9.5k
GraphQLとの向き合い方2022年版
quramy
49
14k
A designer walks into a library…
pauljervisheath
208
24k
[RailsConf 2023] Rails as a piece of cake
palkan
57
5.9k
Transcript
End-to-end automated data science process using Airflow. Evive
About Evive • Data Driven benefit navigator • Founded in
2006 • 400 + employees
Evive Data 15 2.5M 400 Data team Evive Employee Total
Active members
Data Usage 500+GB 50+ 30+ Total data per day Number
of data channels Number of models running daily
Why Airflow THE WORKFLOW Ingestion Merge data from multiple sources
Standardise Verify Publish
Airflow workers Data Sources Scheduler Database
Airflow Architecture
Functionalities • Scheduling • Dependency management • Error recovery •
Monitoring • Versioning • Mailing and alerting
Creating a dag and an operator
Scheduling tasks
File sensor • Operator that listens to a particular directory
and triggers the downstream task once the file lands on the corresponding directory. • Pynotify as operator.
Monitoring using airflow dashboard
Versioning • Versioning can be easily incorporated in airflow as
the entire dag execution happens as one instance. • You can version your data as well as model outputs.
Mailing and alerting system
Future work • Integrating with the existing database architecture and
ETL pipeline • Airflow Kubernetes executors