End-to-end automated data science process using Airflow
Keerthi
October 31, 2018
Education
Transcript
End-to-end automated data science process using Airflow (Evive)
About Evive
• Data-driven benefit navigator
• Founded in 2006
• 400+ employees
Evive Data
• Data team: 15
• Evive employees: 400
• Total active members: 2.5M
Data Usage
• Total data per day: 500+ GB
• Number of data channels: 50+
• Number of models running daily: 30+
Why Airflow
The workflow:
1. Ingestion: merge data from multiple sources
2. Standardise
3. Verify
4. Publish
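The four workflow stages form a dependency chain, which is the kind of structure Airflow models as a DAG. The sketch below is plain Python, not Airflow code: the stage functions (`ingest`, `standardise`, `verify`, `publish`), the `GRAPH` mapping and the sample email field are all hypothetical, and `graphlib.TopologicalSorter` stands in for the ordering Airflow's scheduler would work out from task dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical stage functions; the deck names the stages but not their code.
def ingest(sources):
    # Merge the record lists from every source into one list.
    return [record for records in sources.values() for record in records]

def standardise(records):
    # Normalise formats; here, trim and lower-case email addresses.
    return [{**r, "email": r["email"].strip().lower()} for r in records]

def verify(records):
    # Keep only records that pass a basic validity check.
    return [r for r in records if "@" in r["email"]]

def publish(records):
    # Stand-in for writing to a downstream store; reports what was published.
    return {"published": len(records), "rows": records}

# Each task maps to the set of upstream tasks it depends on.
GRAPH = {
    "ingest": set(),
    "standardise": {"ingest"},
    "verify": {"standardise"},
    "publish": {"verify"},
}
STAGES = {"ingest": ingest, "standardise": standardise,
          "verify": verify, "publish": publish}

def run_pipeline(sources):
    """Run the stages in dependency order, feeding each output to the next.

    Passing a single result along only works because this graph is a chain.
    """
    order = list(TopologicalSorter(GRAPH).static_order())
    data = sources
    for task in order:
        data = STAGES[task](data)
    return order, data
```

For example, `run_pipeline({"crm": [{"email": " Ana@Example.com "}], "claims": [{"email": "not-an-email"}]})` runs the stages in the order ingest, standardise, verify, publish and publishes one cleaned record.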
Airflow Architecture
Diagram components: data sources, scheduler, database, Airflow workers
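In Airflow, the scheduler records task-instance states in its metadata database and hands tasks whose upstreams have succeeded to an executor, whose workers run them. The loop below is a simplified, single-process sketch of that division of labour, not Airflow's implementation: a dict stands in for the database, a `deque` for the task queue, and the `DEPS` graph and `run_task` callback are hypothetical. State names follow Airflow's (`queued`, `success`, `failed`, `upstream_failed`).

```python
from collections import deque

# Hypothetical DAG: task -> list of upstream tasks.
DEPS = {"extract": [], "train": ["extract"], "score": ["train"], "report": ["extract"]}

def schedule_and_run(deps, run_task, num_workers=2):
    db = {t: "none" for t in deps}   # stand-in for the metadata database
    queue = deque()                  # stand-in for the executor's task queue
    while any(s in ("none", "queued") for s in db.values()):
        # Scheduler: queue every task whose upstreams have all succeeded.
        for task, ups in deps.items():
            if db[task] == "none" and all(db[u] == "success" for u in ups):
                db[task] = "queued"
                queue.append(task)
        if not queue:
            break  # nothing runnable: some upstream task failed
        # Workers: each worker takes at most one queued task per round.
        for _ in range(min(num_workers, len(queue))):
            task = queue.popleft()
            db[task] = "success" if run_task(task) else "failed"
    # Tasks that never became runnable are blocked by a failed upstream.
    for task, state in db.items():
        if state == "none":
            db[task] = "upstream_failed"
    return db
```

With `run_task=lambda t: t != "train"`, `extract` and `report` succeed, `train` fails and `score` ends up `upstream_failed`, which mirrors how Airflow blocks downstream tasks after a failure.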
Functionalities
• Scheduling
• Dependency management
• Error recovery
• Monitoring
• Versioning
• Mailing and alerting
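Error recovery in Airflow is largely configured per task through the operator arguments `retries` and `retry_delay`. The helper below is a plain-Python imitation of that behaviour, not Airflow code; the function name `run_with_retries` is hypothetical and `retry_delay` is given in seconds rather than as a `timedelta`.

```python
import time

def run_with_retries(fn, retries=3, retry_delay=0.0, sleep=time.sleep):
    """Call fn(); on an exception, retry up to `retries` more times.

    The last exception is re-raised once all retries are used up.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= retries:
                raise
            attempt += 1
            sleep(retry_delay)  # wait before the next attempt
```

A task that fails transiently twice succeeds on the third call with `retries=3`, but still raises with `retries=1`. Injecting `sleep` keeps the helper easy to test without real delays.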
Creating a DAG and an operator
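The slide's own code is not in the transcript. As a rough illustration of the concepts only, here is a toy model in plain Python: `ToyDAG` and `ToyOperator` are hypothetical classes, not Airflow's, loosely patterned on a DAG holding `PythonOperator` tasks (`task_id`, `python_callable`) and on Airflow's `a >> b` syntax for "b runs after a".

```python
from graphlib import TopologicalSorter

class ToyDAG:
    """Holds tasks by id; a toy stand-in for an Airflow DAG."""
    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}

    def run(self):
        # Build task -> upstream ids and execute in dependency order.
        graph = {tid: t.upstream for tid, t in self.tasks.items()}
        order = TopologicalSorter(graph).static_order()
        return {tid: self.tasks[tid].execute() for tid in order}

class ToyOperator:
    """A task wrapping a Python callable, loosely modelled on PythonOperator."""
    def __init__(self, task_id, python_callable, dag):
        self.task_id = task_id
        self.python_callable = python_callable
        self.upstream = set()
        self.downstream = set()
        dag.tasks[task_id] = self  # register the task with its DAG

    def __rshift__(self, other):
        # a >> b records that b runs after a; returning b allows chaining.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)
        return other

    def execute(self):
        return self.python_callable()
```

Usage: create `extract`, `train` and `score` operators on one `ToyDAG`, write `extract >> train >> score`, and `dag.run()` executes them in that order. In real Airflow the equivalent is a `DAG` object plus operator instances wired with the same `>>` syntax.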
Scheduling tasks
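A detail of Airflow scheduling that often surprises newcomers: a run covering a schedule interval is triggered after that interval ends, so the daily run for October 30 starts at about midnight on October 31. The function below is a plain-Python sketch of that rule for a fixed interval (the name `daily_runs` is hypothetical; real DAGs usually use cron expressions or presets such as `@daily`).

```python
from datetime import datetime, timedelta

def daily_runs(start, now, interval=timedelta(days=1)):
    """Return (interval_start, interval_end) pairs whose end has passed.

    Each pair is one run that a scheduler following Airflow's rule
    would have triggered by `now`.
    """
    runs = []
    begin = start
    while begin + interval <= now:
        runs.append((begin, begin + interval))
        begin += interval
    return runs
```

With `start=datetime(2018, 10, 29)` and `now=datetime(2018, 10, 31, 6)`, two runs are due (Oct 29 to 30 and Oct 30 to 31); the interval starting Oct 31 has not ended yet.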
File sensor
• An operator that watches a particular directory and triggers the downstream task once a file lands in that directory.
• pyinotify is used as the operator.
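The deck's sensor is event-driven (pyinotify). Airflow's built-in `FileSensor` instead polls, controlled by the sensor arguments `poke_interval` and `timeout`. The sketch below uses that polling approach with only the standard library, so it is a swapped-in technique rather than the deck's implementation; `wait_for_file` and `run_when_file_lands` are hypothetical names.

```python
import os
import time

def wait_for_file(path, poke_interval=1.0, timeout=60.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until `path` exists; return True, or False once `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if os.path.exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(poke_interval)  # wait before checking again

def run_when_file_lands(path, downstream, **kwargs):
    """Trigger `downstream(path)` once the file appears, else raise."""
    if wait_for_file(path, **kwargs):
        return downstream(path)
    raise TimeoutError(f"{path} did not appear in time")
```

Polling costs a check every `poke_interval` seconds but works on any filesystem; an inotify-based watcher reacts immediately but is Linux-specific.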
Monitoring using the Airflow dashboard
Versioning
• Versioning is easy to incorporate in Airflow because each DAG execution runs as a single instance (a DAG run).
• Both data and model outputs can be versioned.
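Because every execution is one DAG run with its own date, outputs can be keyed by that run. The function below is a minimal sketch of one possible layout, assuming outputs are stored under a root directory; the path scheme `root/dag_id/run_date/name-<hash>` and the function name are hypothetical, and the content hash additionally distinguishes different outputs written for the same run.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath

def versioned_path(root, dag_id, run_date: date, name, payload: bytes):
    """Build a run-scoped, content-addressed output path (hypothetical layout)."""
    digest = hashlib.sha256(payload).hexdigest()[:12]  # short content hash
    return PurePosixPath(root) / dag_id / run_date.isoformat() / f"{name}-{digest}"
```

For instance, model scores written by the `daily_models` DAG on 2018-10-31 would land under `/data/outputs/daily_models/2018-10-31/scores-<12 hex chars>`; rerunning with identical output reproduces the same path.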
Mailing and alerting system
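Airflow can email on task failure through operator arguments such as `email` and `email_on_failure`, or call custom code through `on_failure_callback`. The function below sketches what such a callback might compose, using only the standard library; the function name, subject format and default sender address are hypothetical, and it builds the message without sending it.

```python
from email.message import EmailMessage

def build_failure_alert(dag_id, task_id, run_date, error, to_addrs,
                        from_addr="airflow@example.com"):
    """Compose (but do not send) an alert email for a failed task."""
    msg = EmailMessage()
    msg["Subject"] = f"[Airflow alert] {dag_id}.{task_id} failed for {run_date}"
    msg["From"] = from_addr  # hypothetical sender address
    msg["To"] = ", ".join(to_addrs)
    msg.set_content(
        f"Task {task_id} in DAG {dag_id} failed.\n\nError: {error!r}\n"
    )
    return msg
```

Sending is then a separate step, for example `smtplib.SMTP(host).send_message(msg)` against whatever SMTP server the team uses; keeping composition separate makes the alert text easy to test.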
Future work
• Integrating with the existing database architecture and ETL pipeline
• Airflow's Kubernetes executor