Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
End-to-end automated data science process using...
Search
Keerthi
October 31, 2018
Education
2
250
End-to-end automated data science process using Airflow.
End-to-end automated data science process using Airflow.
Keerthi
October 31, 2018
Tweet
Share
Other Decks in Education
See All in Education
万博マニアックマップを支えるオープンデータとその裏側
barsaka2
0
790
GitHubとAzureを使って開発者になろう
ymd65536
1
160
日本の情報系社会人院生のリアル -JAIST 修士編-
yurikomium
1
110
Tutorial: Foundations of Blind Source Separation and Its Advances in Spatial Self-Supervised Learning
yoshipon
1
150
『会社を知ってもらう』から『安心して活躍してもらう』までの プロセスとフロー
sasakendayo
0
260
生態系ウォーズ - ルールブック
yui_itoshima
1
250
Pythonパッケージ管理 [uv] 完全入門
mickey_kubo
23
23k
2025年度春学期 統計学 第10回 分布の推測とは ー 標本調査,度数分布と確率分布 (2025. 6. 12)
akiraasano
PRO
0
220
OJTに夢を見すぎていませんか? ロールプレイ研修の試行錯誤/tryanderror-in-roleplaying-training
takipone
1
220
データで見る赤ちゃんの成長
syuchimu
0
250
アントレプレナーシップ教育 ~ 自分で自分の幸せを決めるために ~
yoshizaki
0
160
みんなのコードD&I推進レポート2025 テクノロジー分野のジェンダーギャップとその取り組みについて
codeforeveryone
0
180
Featured
See All Featured
How GitHub (no longer) Works
holman
315
140k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
333
22k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.5k
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.4k
Art, The Web, and Tiny UX
lynnandtonic
302
21k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
139
34k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Docker and Python
trallard
45
3.5k
Building Applications with DynamoDB
mza
96
6.6k
Unsuck your backbone
ammeep
671
58k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
110
20k
4 Signs Your Business is Dying
shpigford
184
22k
Transcript
End-to-end automated data science process using Airflow. Evive
About Evive • Data Driven benefit navigator • Founded in
2006 • 400 + employees
Evive Data 15 2.5M 400 Data team Evive Employee Total
Active members
Data Usage 500+GB 50+ 30+ Total data per day Number
of data channels Number of models running daily
Why Airflow THE WORKFLOW Ingestion Merge data from multiple sources
Standardise Verify Publish
Airflow workers Data Sources Scheduler Database
Airflow Architecture
Functionalities • Scheduling • Dependency management • Error recovery •
Monitoring • Versioning • Mailing and alerting
Creating a dag and an operator
Scheduling tasks
File sensor • Operator that listens to a particular directory
and triggers the downstream task once the file lands on the corresponding directory. • Pynotify as operator.
Monitoring using airflow dashboard
Versioning • Versioning can be easily incorporated in airflow as
the entire dag execution happens as one instance. • You can version your data as well as model outputs.
Mailing and alerting system
Future work • Integrating with the existing database architecture and
ETL pipeline • Airflow Kubernetes executors