Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
End-to-end automated data science process using...
Search
Keerthi
October 31, 2018
Education
2
210
End-to-end automated data science process using Airflow.
End-to-end automated data science process using Airflow.
Keerthi
October 31, 2018
Tweet
Share
Other Decks in Education
See All in Education
"数学" をプログラミングしてもらう際に気をつけていること / Key Considerations When Programming "Mathematics"
guvalif
0
570
Blogit opetuksessa
matleenalaakso
0
1.6k
cbt2324
cbtlibrary
0
110
HTML5 and the Open Web Platform - Lecture 3 - Web Technologies (1019888BNR)
signer
PRO
1
2.6k
学習指導要領から職場の学びを考えてみる / Thinking about workplace learning from learning guidelines
aki_moon
1
710
JavaScript - Lecture 6 - Web Technologies (1019888BNR)
signer
PRO
0
2.5k
オープンソース防災教育ARアプリの開発と地域防災での活用
nro2daisuke
0
170
Ch2_-_Partie_1.pdf
bernhardsvt
0
110
Tableau トレーニング【株式会社ニジボックス】
nbkouhou
0
19k
Semantic Web and Web 3.0 - Lecture 9 - Web Technologies (1019888BNR)
signer
PRO
1
2.5k
1113
cbtlibrary
0
260
不登校予防・再登校支援プログラムを提供するToCo (トーコ) の会社紹介資料 toco.mom
toco3week
0
410
Featured
See All Featured
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
169
50k
Statistics for Hackers
jakevdp
796
220k
Visualization
eitanlees
145
15k
Product Roadmaps are Hard
iamctodd
PRO
49
11k
Become a Pro
speakerdeck
PRO
25
5k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
665
120k
Happy Clients
brianwarren
98
6.7k
Navigating Team Friction
lara
183
14k
StorybookのUI Testing Handbookを読んだ
zakiyama
27
5.3k
Speed Design
sergeychernyshev
25
620
Testing 201, or: Great Expectations
jmmastey
38
7.1k
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
8
900
Transcript
End-to-end automated data science process using Airflow. Evive
About Evive • Data Driven benefit navigator • Founded in
2006 • 400 + employees
Evive Data 15 2.5M 400 Data team Evive Employee Total
Active members
Data Usage 500+GB 50+ 30+ Total data per day Number
of data channels Number of models running daily
Why Airflow THE WORKFLOW Ingestion Merge data from multiple sources
Standardise Verify Publish
Airflow workers Data Sources Scheduler Database
Airflow Architecture
Functionalities • Scheduling • Dependency management • Error recovery •
Monitoring • Versioning • Mailing and alerting
Creating a dag and an operator
Scheduling tasks
File sensor • Operator that listens to a particular directory
and triggers the downstream task once the file lands on the corresponding directory. • Pynotify as operator.
Monitoring using airflow dashboard
Versioning • Versioning can be easily incorporated in airflow as
the entire dag execution happens as one instance. • You can version your data as well as model outputs.
Mailing and alerting system
Future work • Integrating with the existing database architecture and
ETL pipeline • Airflow Kubernetes executors