End-to-end automated data science process using Airflow.

Keerthi
October 31, 2018


Transcript

  1. Data Usage • 500+ GB: total data per day • 50+: number of data channels • 30+: number of models running daily
  2. Functionalities • Scheduling • Dependency management • Error recovery • Monitoring • Versioning • Mailing and alerting
  3. File sensor • Operator that listens to a particular directory and triggers the downstream task once the file lands in that directory. • Pynotify as operator.
  4. Versioning • Versioning can be easily incorporated in Airflow, as the entire DAG execution happens as one instance. • You can version your data as well as model outputs.
  5. Future work • Integrating with the existing database architecture and ETL pipeline • Airflow Kubernetes executors
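
The file sensor on slide 3 could be sketched roughly as below. This is a minimal, standard-library illustration and not the deck's implementation: it polls the directory instead of using the event-based notification library the slide mentions, and the names `poke`, `wait_for_file`, `poke_interval`, and `timeout` are assumptions chosen to resemble the way Airflow sensors are usually described. In Airflow itself, logic like `poke` would live inside a sensor operator, and the scheduler would run the downstream task once it returns True.

```python
import os
import time


def poke(directory, filename):
    """Return True once the expected file exists in the watched directory."""
    return os.path.isfile(os.path.join(directory, filename))


def wait_for_file(directory, filename, poke_interval=0.1, timeout=5.0):
    """Poll the directory until the file lands or the timeout expires.

    Returns True if the file was found, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poke(directory, filename):
            return True
        time.sleep(poke_interval)
    return False
```

A caller would treat a True result as the signal to start the downstream task, and a False result as a failure to be retried or alerted on.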
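
The versioning idea on slide 4 can be illustrated with a small sketch, assuming that every task in a DAG run shares the same execution date and that this date is used as the version label. The helper `versioned_path`, the directory layout, and the example DAG id are hypothetical and do not come from the deck.

```python
import os
from datetime import datetime


def versioned_path(base_dir, dag_id, execution_date, filename):
    """Build an output path that is unique per DAG run.

    All tasks in one run receive the same execution date, so data files
    and model outputs written through this helper share one version folder.
    """
    run_version = execution_date.strftime("%Y%m%dT%H%M%S")
    return os.path.join(base_dir, dag_id, run_version, filename)


# Hypothetical usage: data and model outputs of one run land side by side.
run_date = datetime(2018, 10, 31)
data_file = versioned_path("/data", "daily_models", run_date, "features.csv")
model_file = versioned_path("/data", "daily_models", run_date, "predictions.csv")
```

Keying the folder on the run rather than on wall-clock time means a rerun of the same DAG instance overwrites its own version instead of creating a new one, which may or may not be the behaviour a given team wants.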