A Newcomer's Guide To Airflow's Architecture

A talk I gave at Airflow Summit 2021.

Andrew Godwin

July 12, 2021

Transcript

  1. Hi, I’m Andrew Godwin • Principal Engineer at • Also a Django core developer, ASGI author • Using Airflow since March 2021
  2. High-Level Concepts: What exactly is going on? • The Good and the Bad: Or, How I Learned To Stop Worrying And Love The Scheduler • Problems, Fixes & The Future: Where we go from here
  3. "Real-time" versus batch: the availability versus consistency tradeoff is different! • Simple concepts, hard to master: in Django, it's the ORM; in Airflow, scheduling. • It's all still distributed systems, which is fortunate after fifteen years of doing them.
  4. DAG ➡ DagRun: one per scheduled run, as the run starts • Operator ➡ Task: when you call an operator in a DAG • Task ➡ TaskInstance: when a Task needs to run as part of a DagRun
  5. Tasks are the core part of the model; DAGs are more of a grouping/trigger mechanism.
  6. I do like the database, though: there's a lot of benefit in proven technology.
  7. Software Engineering is not just coding: any large-scale project needs documentation, architecture, and coordination.
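Slides 4 and 5 describe how the pieces relate: a DAG yields one DagRun per scheduled run, each operator call in a DAG yields a Task, and each Task yields a TaskInstance when it has to run inside a DagRun, with the real scheduling work happening at the task level. The sketch below is a minimal toy model of that mapping, assuming plain Python dataclasses. The classes `Dag`, `Task`, `DagRun` and `TaskInstance`, the helpers `add_task` and `ready_tasks`, the `run_date` field and the `"none"`/`"success"` state strings are all hypothetical illustrations borrowed from the slide vocabulary; they are not Airflow's real classes or API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Iterable, List, Set

# Toy model of the object mapping on slide 4. These classes are hypothetical
# illustrations and are NOT Airflow's real DAG/DagRun/TaskInstance classes.


@dataclass
class Task:
    """Produced when an operator is called inside a DAG (Operator -> Task)."""
    task_id: str
    upstream: Set[str] = field(default_factory=set)


@dataclass
class Dag:
    """Mostly a grouping of tasks plus a trigger (slide 5)."""
    dag_id: str
    tasks: Dict[str, Task] = field(default_factory=dict)

    def add_task(self, task_id: str, upstream: Iterable[str] = ()) -> Task:
        task = Task(task_id, set(upstream))
        self.tasks[task_id] = task
        return task


@dataclass
class TaskInstance:
    """One Task's execution within one DagRun (Task -> TaskInstance)."""
    task_id: str
    state: str = "none"  # hypothetical state strings: "none", "success"


@dataclass
class DagRun:
    """One per scheduled run, created as the run starts (DAG -> DagRun)."""
    dag: Dag
    run_date: datetime
    task_instances: Dict[str, TaskInstance] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Create one TaskInstance for every Task the DAG groups together.
        for task_id in self.dag.tasks:
            self.task_instances[task_id] = TaskInstance(task_id)

    def ready_tasks(self) -> List[str]:
        # Scheduling looks at task instances: one that has not run yet is
        # ready once every upstream instance in the same run has succeeded.
        ready = []
        for ti in self.task_instances.values():
            if ti.state != "none":
                continue
            upstream = self.dag.tasks[ti.task_id].upstream
            if all(self.task_instances[u].state == "success" for u in upstream):
                ready.append(ti.task_id)
        return sorted(ready)


if __name__ == "__main__":
    dag = Dag("etl")
    dag.add_task("extract")
    dag.add_task("transform", upstream={"extract"})
    dag.add_task("load", upstream={"transform"})

    run = DagRun(dag, datetime(2021, 7, 12))
    print(run.ready_tasks())  # ['extract']
    run.task_instances["extract"].state = "success"
    print(run.ready_tasks())  # ['transform']
```

In this toy version the `Dag` object only groups tasks and records their edges; everything `ready_tasks` inspects lives on the task instances of one run, which is the sense in which slide 5 calls tasks the core of the model.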
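Slide 6 credits the database as proven technology. Airflow keeps DagRun and TaskInstance state in a metadata database, and a database row makes a natural coordination point between processes. The sketch below illustrates that general idea, assuming SQLite and a made-up single-table schema. The `task_instance` table layout, `claim_task`, `get_state` and the `"scheduled"`/`"queued"` states are hypothetical, and the conditional `UPDATE` used as a compare-and-set is a swapped-in technique chosen because it works in SQLite; it is not a description of how Airflow's scheduler actually locks rows.

```python
import sqlite3
from typing import Optional

# Hypothetical single-table schema for illustration only; Airflow's real
# metadata database schema is different and much larger.
SCHEMA = """
CREATE TABLE task_instance (
    dag_id  TEXT NOT NULL,
    run_id  TEXT NOT NULL,
    task_id TEXT NOT NULL,
    state   TEXT NOT NULL,
    PRIMARY KEY (dag_id, run_id, task_id)
)
"""


def claim_task(conn: sqlite3.Connection, dag_id: str, run_id: str, task_id: str) -> bool:
    """Move one task instance from 'scheduled' to 'queued', at most once.

    The state check lives in the WHERE clause, so the UPDATE acts as a
    compare-and-set: if the row was already claimed, nothing matches.
    """
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE task_instance SET state = 'queued' "
            "WHERE dag_id = ? AND run_id = ? AND task_id = ? AND state = 'scheduled'",
            (dag_id, run_id, task_id),
        )
    return cur.rowcount == 1


def get_state(conn: sqlite3.Connection, dag_id: str, run_id: str, task_id: str) -> Optional[str]:
    """Read the current state of one task instance, or None if there is no row."""
    row = conn.execute(
        "SELECT state FROM task_instance WHERE dag_id = ? AND run_id = ? AND task_id = ?",
        (dag_id, run_id, task_id),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:
        conn.execute(
            "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
            ("etl", "run1", "extract", "scheduled"),
        )
    print(claim_task(conn, "etl", "run1", "extract"))  # True
    print(claim_task(conn, "etl", "run1", "extract"))  # False: already queued
    print(get_state(conn, "etl", "run1", "extract"))   # queued
```

Because the state check and the write happen in a single statement, two callers racing for the same row in this sketch cannot both succeed; the loser simply sees a row count of zero and moves on.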