People don't want data - what they really want is insight. Or even better, actionable insight. Now the road from data to insight can be a bit of a beast. Take Airbnb as an example - it started as a scrappy social hack and grew into a large, data-driven company. When they were small, so was their data, but as the company and its technical architecture grew in scale and complexity, leveraging that data became a challenge. It became more and more necessary to combine multiple messy data sources in novel ways, in the right order and on a strict schedule... using distributed computing... with proper logging and error recovery... gosh. Batch jobs, cron, sticky tape and bits of string soon proved insufficient.
Enter Airflow.
Airflow is an Apache top-level project that was open-sourced by Airbnb. It's a seriously powerful tool that's all about defining, scheduling, running, monitoring and distributing complicated workflows.
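To give you a flavour of what "defining a workflow" looks like, here's a minimal sketch of an Airflow DAG, assuming a recent Airflow 2.x install. The dag_id, task names and bash commands are just illustrative placeholders, not anything from a real pipeline.

```python
# A minimal, hypothetical Airflow DAG: three tasks run in order, once a day.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_report",            # illustrative pipeline name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",       # run once a day
    catchup=False,                    # don't backfill missed runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting data")
    transform = BashOperator(task_id="transform", bash_command="echo transforming data")
    load = BashOperator(task_id="load", bash_command="echo loading data")

    # Declare the ordering: extract, then transform, then load.
    extract >> transform >> load
```

The nice bit is that the dependencies, schedule and retry behaviour all live in plain Python, and Airflow takes care of running, monitoring and recovering the whole thing.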
In this talk I'll give you a bit of a tour of Airflow's moving parts. I'll also talk a little bit about how we are leveraging Airflow at Umuzi.