Keerthi
October 31, 2018
End-to-end automated data science process using Airflow.
Transcript
End-to-end automated data science process using Airflow
Evive
About Evive
• Data-driven benefit navigator
• Founded in 2006
• 400+ employees
Evive Data
• Data team: 15
• Evive employees: 400
• Total active members: 2.5M
Data Usage
• Total data per day: 500+ GB
• Number of data channels: 50+
• Number of models running daily: 30+
Why Airflow: the workflow
• Ingestion
• Merge data from multiple sources
• Standardise
• Verify
• Publish
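
The workflow above can be sketched as a dependency graph, which is the kind of structure a scheduler such as Airflow resolves into a run order. This is only an illustration using Python's standard library (3.9+), not Airflow itself, and it assumes the five stages run strictly in the order listed on the slide. The stage names are shortened labels chosen for this example.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages it depends on.
# Assumption: the slide's stages form a simple linear chain.
dependencies = {
    "ingestion": set(),
    "merge": {"ingestion"},
    "standardise": {"merge"},
    "verify": {"standardise"},
    "publish": {"verify"},
}

# Resolve the graph into an order in which every stage runs
# only after the stage it depends on has finished.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

In a real pipeline several sources could feed the merge step in parallel; the same mapping would then list more than one upstream stage for "merge".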
Airflow Architecture
• Components shown: Airflow workers, data sources, scheduler, database
Functionalities
• Scheduling
• Dependency management
• Error recovery
• Monitoring
• Versioning
• Mailing and alerting
Creating a DAG and an operator
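
The code shown on this slide is not part of the transcript, so the following is a minimal sketch of what a DAG with two operators might look like, not the speaker's actual code. It assumes Airflow 2.x import paths and the `schedule_interval` argument; Airflow 1.10, which was current at the time of the talk, exposes `PythonOperator` from `airflow.operators.python_operator` instead. The DAG id, task ids, and the `ingest` and `publish` functions are hypothetical.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x path (assumption)


def ingest():
    # Placeholder for the real ingestion logic.
    print("ingesting data")


def publish():
    # Placeholder for the real publish logic.
    print("publishing results")


with DAG(
    dag_id="example_data_pipeline",      # hypothetical name
    start_date=datetime(2018, 10, 1),
    schedule_interval="@daily",          # run once per day
    catchup=False,                       # do not backfill missed runs
    default_args={"retries": 1, "retry_delay": timedelta(minutes=5)},
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    publish_task = PythonOperator(task_id="publish", python_callable=publish)

    # publish runs only after ingest succeeds.
    ingest_task >> publish_task
```

The `retries` and `retry_delay` entries are one way to get the error recovery listed among the functionalities: a failed task is retried before the run is marked as failed.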
Scheduling tasks
File sensor
• An operator that watches a particular directory and triggers the downstream task once the file lands in that directory.
• Pynotify as operator.
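
The waiting behaviour described above can be sketched with a simple polling loop. This is a standard-library illustration of the idea, not the operator used in the talk. The function name `wait_for_file` is hypothetical, and its `poke_interval` and `timeout` parameters are named after the options Airflow sensors expose.

```python
import os
import time


def wait_for_file(directory, filename, poke_interval=1.0, timeout=10.0):
    """Poll `directory` until `filename` exists.

    Returns True once the file has landed, or False if `timeout`
    seconds pass first. A downstream task would run only on True.
    """
    path = os.path.join(directory, filename)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)
```

Polling is the simplest approach; a notification-based watcher avoids the repeated checks at the cost of a platform-specific dependency.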
Monitoring using the Airflow dashboard
Versioning
• Versioning is easy to incorporate in Airflow because an entire DAG execution happens as one instance.
• You can version your data as well as your model outputs.
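
One way to read the versioning point above is that every artifact written during a DAG run can be stored under a path keyed by that run. The sketch below assumes that interpretation; the slide does not describe the actual layout. The function name, directory structure, and example values are all hypothetical.

```python
from pathlib import PurePosixPath


def versioned_path(base_dir, dag_id, run_id, artifact):
    """Build an output path that is unique per DAG run.

    Data and model outputs written by the same run share the
    `run_id` directory, so a run can be inspected as one version.
    """
    return str(PurePosixPath(base_dir) / dag_id / run_id / artifact)


print(versioned_path("/data", "daily_models", "2018-10-31", "scores.csv"))
```

In an Airflow task the `run_id` or execution date would come from the task context rather than being passed in by hand.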
Mailing and alerting system
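
The slide's content is not in the transcript, so this is a hedged sketch of one common setup rather than the speaker's configuration. Airflow operators accept `email`, `email_on_failure`, and `email_on_retry`, usually supplied through a DAG's `default_args`, and sending mail also requires SMTP settings in the Airflow configuration. The recipient address and the `format_alert` helper are hypothetical.

```python
# Would be passed as DAG(default_args=default_args) in a DAG file.
default_args = {
    "email": ["data-team@example.com"],  # hypothetical recipient
    "email_on_failure": True,            # mail when a task fails
    "email_on_retry": False,             # stay quiet on retries
    "retries": 1,
}


def format_alert(dag_id, task_id, run_id, error):
    """Return a (subject, body) pair for a custom failure alert."""
    subject = f"[airflow] {dag_id}.{task_id} failed"
    body = f"Run: {run_id}\nError: {error}"
    return subject, body


subject, body = format_alert(
    "daily_models", "verify", "2018-10-31", "row count mismatch"
)
print(subject)
print(body)
```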
Future work
• Integrating with the existing database architecture and ETL pipeline
• Airflow Kubernetes executors