End-to-end automated data science process using Airflow.
Keerthi
October 31, 2018
Transcript
End-to-end automated data science process using Airflow. Evive
About Evive
• Data-driven benefit navigator
• Founded in 2006
• 400+ employees
Evive Data
• Data team: 15
• Evive employees: 400
• Total active members: 2.5M
Data Usage
• Total data per day: 500+ GB
• Number of data channels: 50+
• Number of models running daily: 30+
Why Airflow
The workflow:
• Ingestion
• Merge data from multiple sources
• Standardise
• Verify
• Publish
Airflow Architecture
• Airflow workers
• Data sources
• Scheduler
• Database
Functionalities
• Scheduling
• Dependency management
• Error recovery
• Monitoring
• Versioning
• Mailing and alerting
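Of the functionalities listed, error recovery is the easiest to picture in code. The sketch below is plain Python, not Airflow's API: `run_with_retries`, `retries`, `delay_seconds`, and the flaky task are hypothetical names, used only to illustrate the retry-then-fail behaviour a workflow tool typically applies to a task.

```python
import time


def run_with_retries(task, retries=3, delay_seconds=0.0):
    """Call `task` until it succeeds or the retry budget is spent.

    Returns (result, attempts). Re-raises the last error if every attempt fails.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retries:
                raise  # budget exhausted: surface the failure to the caller
            time.sleep(delay_seconds)  # wait before the next attempt


# Hypothetical flaky task: fails twice, then succeeds.
calls = {"n": 0}


def flaky_ingest():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("source not ready")
    return "ingested"


result, attempts = run_with_retries(flaky_ingest, retries=3)
```

With `retries=3` the task may run up to four times in total: the first attempt plus three retries.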
Creating a DAG and an operator
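The code shown on this slide is not preserved in the transcript. As a stand-in, here is a toy model in plain Python (not Airflow itself) of the two ideas: an operator is a unit of work with an `execute` method, and a DAG is a set of operators plus the dependencies between them. Every class and task name is hypothetical; the stage names are borrowed from the workflow slide, and the `>>` chaining mirrors the syntax Airflow uses to declare dependencies.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Operator:
    """A unit of work; subclasses implement execute()."""

    def __init__(self, task_id, dag):
        self.task_id = task_id
        self.upstream = set()
        dag.tasks[task_id] = self

    def __rshift__(self, other):
        # `a >> b` declares that b runs after a.
        other.upstream.add(self.task_id)
        return other

    def execute(self, log):
        raise NotImplementedError


class PrintOperator(Operator):
    """Hypothetical operator that records its own task id."""

    def execute(self, log):
        log.append(self.task_id)


class Dag:
    """A named collection of tasks and their dependencies."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}

    def run(self):
        # Execute every task only after all of its upstream tasks.
        graph = {tid: task.upstream for tid, task in self.tasks.items()}
        log = []
        for tid in TopologicalSorter(graph).static_order():
            self.tasks[tid].execute(log)
        return log


dag = Dag("data_science_pipeline")
ingest = PrintOperator("ingest", dag)
merge = PrintOperator("merge", dag)
standardise = PrintOperator("standardise", dag)
verify = PrintOperator("verify", dag)
publish = PrintOperator("publish", dag)
ingest >> merge >> standardise >> verify >> publish

order = dag.run()
```

Because the dependencies form a single chain, the run order is fully determined: each stage executes only after the one before it.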
Scheduling tasks
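At its core, scheduling means turning a start date and an interval into a series of run times. The plain-Python sketch below shows that calculation for a daily interval; `due_runs` and its parameters are illustrative names, not Airflow functions, and the dates are made up for the example.

```python
from datetime import datetime, timedelta


def due_runs(start, interval, now):
    """Return every scheduled run time from `start` up to and including `now`."""
    runs = []
    current = start
    while current <= now:
        runs.append(current)
        current += interval
    return runs


runs = due_runs(
    start=datetime(2018, 10, 29),
    interval=timedelta(days=1),
    now=datetime(2018, 10, 31, 12, 0),
)
```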
File sensor
• An operator that watches a particular directory and triggers the downstream task once the file lands in that directory.
• Pynotify as operator.
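The file-sensor idea can be sketched in plain Python. This version polls the directory on a timer rather than using the notification-based approach the slide mentions, and it is not the author's operator: `wait_for_file`, `downstream_task`, the file names, and the timing parameters are all assumptions made for illustration.

```python
import tempfile
import time
from pathlib import Path


def wait_for_file(directory, filename, timeout=5.0, poke_interval=0.1):
    """Poll `directory` until `filename` exists; return its path or None on timeout."""
    target = Path(directory) / filename
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if target.exists():
            return target
        time.sleep(poke_interval)
    return None


def downstream_task(path):
    """Hypothetical downstream step: report the size of the landed file in bytes."""
    return path.stat().st_size


with tempfile.TemporaryDirectory() as landing_dir:
    # Simulate the file landing, then let the sensor find it.
    (Path(landing_dir) / "members.csv").write_text("id,name")
    found = wait_for_file(landing_dir, "members.csv")
    size = downstream_task(found) if found else None
    # A file that never arrives makes the sensor give up after the timeout.
    missing = wait_for_file(landing_dir, "absent.csv", timeout=0.3)
```

Polling is the simpler design and works on any filesystem; a notification-based sensor reacts faster but depends on operating-system support.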
Monitoring using the Airflow dashboard
Versioning
• Versioning is easy to incorporate in Airflow because the entire DAG execution happens as one instance.
• You can version your data as well as your model outputs.
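One way to picture run-level versioning is to derive every output path from the run's timestamp, so that data and model outputs from the same DAG run share a version. The sketch below assumes this layout; `versioned_path`, the directory structure, and the file names are hypothetical and not taken from the talk.

```python
from datetime import datetime
from pathlib import PurePosixPath


def versioned_path(base_dir, dag_id, run_time, artifact):
    """Build an output path that is unique to one DAG run."""
    run_version = run_time.strftime("%Y%m%dT%H%M%S")
    return PurePosixPath(base_dir) / dag_id / run_version / artifact


run_time = datetime(2018, 10, 31, 6, 0, 0)
data_path = versioned_path("/data/out", "data_science_pipeline", run_time, "merged.csv")
model_path = versioned_path("/data/out", "data_science_pipeline", run_time, "model_scores.csv")
```

Both artifacts land under the same run directory, so a given model output can always be traced back to the exact data it was computed from.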
Mailing and alerting system
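The slide does not preserve how alerts were configured. As a rough illustration only, the sketch below composes a failure email with Python's standard `email` package; it does not send anything. `build_failure_alert`, the addresses, and the error text are placeholders, not part of Airflow or of the author's setup.

```python
from email.message import EmailMessage


def build_failure_alert(dag_id, task_id, error, recipients):
    """Compose (but do not send) an alert email for a failed task."""
    msg = EmailMessage()
    msg["Subject"] = f"[ALERT] {dag_id}.{task_id} failed"
    msg["From"] = "airflow@example.com"  # placeholder sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"Task {task_id} in DAG {dag_id} failed with: {error}")
    return msg


alert = build_failure_alert(
    "data_science_pipeline", "verify", "row count mismatch", ["data-team@example.com"]
)
# Sending would go through smtplib.SMTP(...).send_message(alert) against a real server.
```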
Future work
• Integrating with the existing database architecture and ETL pipeline
• Airflow Kubernetes executors