End-to-end automated data science process using Airflow.
Keerthi
October 31, 2018
Transcript
End-to-end automated data science process using Airflow.
Evive
About Evive
• Data-driven benefit navigator
• Founded in 2006
• 400+ employees
Evive Data
• Data team: 15
• Evive employees: 400
• Total active members: 2.5M
Data Usage
• Total data per day: 500+ GB
• Number of data channels: 50+
• Number of models running daily: 30+
Why Airflow
The workflow:
• Ingestion
• Merge data from multiple sources
• Standardise
• Verify
• Publish
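The workflow stages above can be pictured as a chain of small functions, each feeding the next. The following is a minimal plain-Python sketch, not Evive's pipeline and not Airflow code: the record layout, the `member_id` field, and every function name are assumptions made up for illustration.

```python
from typing import Dict, List

Record = Dict[str, str]


def ingest(sources: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
    """Ingestion: in this sketch the sources are already in memory (assumption)."""
    return sources


def merge(sources: Dict[str, List[Record]]) -> List[Record]:
    """Merge data from multiple sources, tagging each record with its origin."""
    merged: List[Record] = []
    for name in sorted(sources):
        for record in sources[name]:
            merged.append({**record, "source": name})
    return merged


def standardise(records: List[Record]) -> List[Record]:
    """Standardise: lower-case the keys and strip whitespace from keys and values."""
    return [{k.strip().lower(): v.strip() for k, v in r.items()} for r in records]


def verify(records: List[Record]) -> List[Record]:
    """Verify: keep records with a non-empty member_id (hypothetical rule)."""
    return [r for r in records if r.get("member_id")]


def publish(records: List[Record]) -> List[Record]:
    """Publish: a real pipeline would write to a store; here we just return."""
    return records


def run_workflow(sources: Dict[str, List[Record]]) -> List[Record]:
    """Run the five stages in the order listed on the slide."""
    return publish(verify(standardise(merge(ingest(sources)))))
```

In a scheduler such as Airflow, each of these stages would typically become its own task so that a failure in one stage can be retried without rerunning the others.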
Airflow workers Data Sources Scheduler Database
Airflow Architecture
Functionalities
• Scheduling
• Dependency management
• Error recovery
• Monitoring
• Versioning
• Mailing and alerting
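Two of the items above, dependency management and error recovery, can be illustrated without Airflow itself. The sketch below uses Python's standard `graphlib` module to run tasks only after their upstream tasks, and retries a failing task a fixed number of times. The function name `run_in_order` and its parameters are hypothetical; Airflow's own scheduler and retry settings are more elaborate than this.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_in_order(
    tasks: Dict[str, Callable[[], None]],
    upstream: Dict[str, Set[str]],
    retries: int = 1,
) -> List[str]:
    """Run every task after all of its upstream tasks, retrying on failure.

    `upstream` maps a task name to the set of tasks it depends on.
    Returns the task names in the order they completed.
    """
    completed: List[str] = []
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(upstream).static_order():
        for attempt in range(retries + 1):
            try:
                tasks[name]()
                break
            except Exception:
                # Give up once the retry budget is exhausted.
                if attempt == retries:
                    raise
        completed.append(name)
    return completed
```

If a task still fails after its last retry, the exception propagates and downstream tasks never start, which mirrors the idea that a broken upstream step should block what depends on it.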
Creating a DAG and an operator
Scheduling tasks
File sensor
• An operator that watches a particular directory and triggers the downstream task once the file lands in that directory.
• Pynotify as operator.
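The file-sensor behaviour described above can be approximated with a simple polling loop. This is a plain-Python sketch under stated assumptions: it polls the directory rather than using the notification-based approach the slide mentions, and `wait_for_file`, `poke_interval`, and `timeout` are names chosen for this illustration, not the deck's actual operator.

```python
import time
from pathlib import Path
from typing import Callable


def wait_for_file(
    directory: Path,
    filename: str,
    downstream: Callable[[Path], None],
    poke_interval: float = 1.0,
    timeout: float = 60.0,
) -> bool:
    """Poll `directory` until `filename` appears, then call `downstream`.

    Returns True if the file landed and the downstream task was triggered,
    False if the timeout expired first.
    """
    target = directory / filename
    deadline = time.monotonic() + timeout
    while True:
        if target.exists():
            downstream(target)
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)
```

Polling is the simpler design but wastes cycles when files arrive rarely; an event-based watcher reacts sooner at the cost of an extra dependency.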
Monitoring using the Airflow dashboard
Versioning
• Versioning is easy to incorporate in Airflow because each DAG execution runs as a single instance.
• You can version your data as well as model outputs.
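One way to picture per-run versioning is to store each run's data and model outputs under a directory named after that run. The sketch below is an assumption about how this could be done, not the method shown in the talk: the JSON file layout, the `run_id` naming, and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_run_outputs(base_dir: Path, run_id: str, data: Any, model_output: Any) -> Path:
    """Write one run's data and model output under base_dir/run_id."""
    run_dir = base_dir / run_id
    # exist_ok=False: refuse to overwrite the outputs of an earlier run.
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "data.json").write_text(json.dumps(data, sort_keys=True))
    (run_dir / "model_output.json").write_text(json.dumps(model_output, sort_keys=True))
    return run_dir


def load_run_outputs(base_dir: Path, run_id: str) -> Dict[str, Any]:
    """Read back the data and model output stored for a given run."""
    run_dir = base_dir / run_id
    return {
        "data": json.loads((run_dir / "data.json").read_text()),
        "model_output": json.loads((run_dir / "model_output.json").read_text()),
    }
```

Keying the directory on the run identifier means any past run can be inspected or compared later without touching the outputs of other runs.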
Mailing and alerting system
Future work
• Integrating with the existing database architecture and ETL pipeline
• Airflow Kubernetes executors