more operating
- Less automation, more manual work
- More fuss
- Less fun
- (Other harsh comments you can imagine)

Search for the perfect tool ….

Conclusion
platform and/or pipeline framework to work with, I can create my own data pipeline!
- It turns out several companies have already built these for us (Pinterest, Spotify, Airbnb, etc.)

Why just settle for an ETL tool? Let’s code it up!
- Does not have a scheduler (relies on cron jobs)
- Development is still somewhat active
- Each task is a class. Do I have to build 100+ classes separately? lolz
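For context, the class-per-task style the complaint above refers to looks roughly like this. A minimal stand-in `Task` base is used so the sketch runs on its own; real Luigi tasks subclass `luigi.Task` and implement `requires()`/`run()` in the same shape, and the task names here are hypothetical.

```python
# Sketch of the "every task is a class" pattern (Luigi-style).
# Task is a stand-in base class, not the real luigi.Task.

class Task:
    """Stand-in for a Luigi-style task: declares dependencies, does work."""
    def requires(self):
        return []                    # upstream tasks this one depends on
    def run(self):
        raise NotImplementedError

class ExtractUsers(Task):
    def run(self):
        # pretend this pulls rows from a source system
        return [{"id": 1, "name": "ada"}, {"id": 2, "name": "alan"}]

class CountUsers(Task):
    def requires(self):
        return [ExtractUsers()]
    def run(self):
        users = self.requires()[0].run()
        return len(users)

# Even this toy pipeline needs one class per task -- hence the
# complaint that 100+ tasks means 100+ separate classes.
result = CountUsers().run()
```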
good UI
- Dynamic task and DAG creation! (Will show you this)
- Active community
- Modular architecture
- Dockerized and Helm-chartified!
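The dynamic-creation point can be sketched like this: instead of hand-writing one task per table, generate tasks in a loop. In real Airflow these would be operator instances (e.g. `PythonOperator`) built inside a `DAG` context; a tiny stand-in operator is used here so the sketch runs without Airflow installed, and the table names are hypothetical.

```python
# Dynamic task creation: one task per table, generated in a loop.
# FakeOperator is a stand-in for an Airflow operator.

from dataclasses import dataclass, field

@dataclass
class FakeOperator:
    """Stand-in for an Airflow operator (e.g. PythonOperator)."""
    task_id: str
    downstream: list = field(default_factory=list)

    def set_downstream(self, other):   # mirrors the shape of Airflow's API
        self.downstream.append(other)

TABLES = ["users", "orders", "payments"]   # hypothetical table list

extract_tasks = {}
for table in TABLES:                       # one extract task per table
    extract_tasks[table] = FakeOperator(task_id=f"extract_{table}")

load = FakeOperator(task_id="load_warehouse")
for task in extract_tasks.values():        # every extract feeds the load
    task.set_downstream(load)
```

Adding a 101st table is now a one-line change to the list rather than a new class or a new hand-written task.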
publish data into the data warehouse
- Growth analytics: compute metrics around guest and host engagement, as well as growth accounting
- Experimentation: compute A/B testing experimentation framework logic and aggregates

Airflow - How Airbnb uses it
using the prebuilt hooks, or create one if you need to
- Operator: simple, functional modules that operate on any data you can access using hooks

Airflow - Key Themes/Principles
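The hook/operator split can be sketched as follows: a hook wraps access to one external system, and an operator is a small functional unit that does its work through hooks. The class names below are stand-ins, not the real Airflow base classes (in Airflow you would subclass `BaseHook`/`BaseOperator` or use prebuilt ones).

```python
# Sketch of the hook/operator split. CsvHook and RowCountOperator are
# hypothetical stand-ins for Airflow's hook and operator classes.

class CsvHook:
    """Stand-in hook: encapsulates access to one data source."""
    def __init__(self, rows):
        self._rows = rows              # pretend this is a live connection
    def get_records(self):
        return list(self._rows)

class RowCountOperator:
    """Stand-in operator: operates on whatever its hook can reach."""
    def __init__(self, task_id, hook):
        self.task_id = task_id
        self.hook = hook
    def execute(self):
        return len(self.hook.get_records())

op = RowCountOperator("count_rows", CsvHook([("a",), ("b",)]))
count = op.execute()
```

Because the operator only talks to the hook's interface, swapping the CSV source for a database means swapping the hook, not rewriting the operator.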
built with Python. This allows for writing code that instantiates pipelines dynamically.
- Combine Python with YAML to author pipelines
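One common shape for the Python-plus-YAML combination (a hypothetical layout, not an Airflow standard): the YAML declares the pipeline, and a Python script loops over it to instantiate one operator per entry, as in the dynamic-creation sketch above.

```yaml
# hypothetical pipeline spec consumed by a DAG-generating Python script
dag_id: nightly_warehouse_load
schedule: "0 2 * * *"
tasks:
  - task_id: extract_users
    operator: PythonOperator
  - task_id: extract_orders
    operator: PythonOperator
    depends_on: [extract_users]
```

Analysts can then add or reorder tasks by editing the YAML, without touching the Python that builds the DAG.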
definition
- By default, a downstream task does not execute if an upstream task fails
- Complex logic can be applied (e.g., execute the downstream task once 60% of upstream tasks succeed)
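The idea behind both bullets is that whether a downstream task runs is a function of its upstream tasks' states. In real Airflow this is the operator's `trigger_rule` argument (`all_success` is the default); a percentage threshold like the 60% example is not a built-in rule and would be custom logic, sketched here as plain functions.

```python
# Sketch of trigger-rule evaluation over upstream task states.

def all_success(upstream_states):
    """Mirrors Airflow's default rule: run only if every upstream passed."""
    return all(s == "success" for s in upstream_states)

def success_ratio_at_least(threshold):
    """Hypothetical custom rule: run once enough upstreams succeeded."""
    def rule(upstream_states):
        total = len(upstream_states)
        ok = sum(1 for s in upstream_states if s == "success")
        return total > 0 and ok / total >= threshold
    return rule

states = ["success", "success", "failed", "success", "success"]
run_default = all_success(states)                  # one failure blocks it
run_custom = success_ratio_at_least(0.6)(states)   # 4/5 succeeded
```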