Slide 1

Slide 1 text

Why Airflow? & What's new in Airflow 2.3?
Kaxil Naik
ODSC 2022

Slide 2

Slide 2 text

Who am I?
● Committer & PMC Member of Apache Airflow
● Director of Airflow Engineering @ Astronomer
@kaxil

Slide 3

Slide 3 text

What is Apache Airflow?

Slide 4

Slide 4 text

A platform to programmatically author, schedule, and monitor workflows

Slide 8

Slide 8 text

Example DAG

Slide 9

Slide 9 text

Why Apache Airflow?

Slide 10

Slide 10 text

The Community
● 2k+ Contributors
● 26.3k GitHub Stars
● 6.8m+ Monthly Downloads
● 23k+ Slack Members

Slide 11

Slide 11 text

Under …

Slide 12

Slide 12 text

Governed by
● 48 Committers
● 27 PMC (Project Management Committee) Members

Slide 13

Slide 13 text

Integrations And ……

Slide 14

Slide 14 text

75+ Providers

Slide 15

Slide 15 text

Docker Image
docker pull apache/airflow

Slide 16

Slide 16 text

Helm Chart
helm repo add apache-airflow https://airflow.apache.org/
helm install my-airflow apache-airflow/airflow

Slide 17

Slide 17 text

Conference & Meetups
● 13 Local Groups
● 3 years of conferences, with 6k-10k registrants every year

Slide 18

Slide 18 text

Managed Airflow Vendors

Slide 19

Slide 19 text

What’s new in Airflow 2.3?

Slide 20

Slide 20 text

Biggest Airflow Release since 2.0
● 700+ commits!
● 50 new features

Slide 21

Slide 21 text

Dynamic Task Mapping
● Highlight feature of 2.3
● First-class support for the common ETL pattern of dynamic tasks
● Run the same set of tasks for N files in a bucket, DB records, or ML models, where N is unpredictable
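
To make the fan-out pattern concrete, here is a plain-Python sketch of the semantics only; it does not import Airflow. In Airflow 2.3 the fan-out is expressed by calling expand() on an operator or @task-decorated function, and one task instance is created per element at run time. The function names list_files, process_file and summarize are hypothetical, chosen for illustration.

```python
from typing import List


def list_files() -> List[str]:
    """Upstream task: stands in for listing a bucket; N is only known at run time."""
    return ["a.csv", "b.csv", "c.csv"]


def process_file(name: str) -> int:
    """Mapped task: one copy runs per file. Returns the stem length as dummy work."""
    return len(name.rsplit(".", 1)[0])


def summarize(results: List[int]) -> int:
    """Downstream task: receives the collected results of all mapped copies."""
    return sum(results)


files = list_files()
mapped_results = [process_file(f) for f in files]  # one "task instance" per file
total = summarize(mapped_results)
print(len(mapped_results), total)
```

The point of the feature is that the list comprehension above is evaluated at run time rather than when the DAG file is parsed, so the number of mapped copies can differ between runs.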

Slide 22

Slide 22 text

Dynamic Task Mapping Before After

Slide 23

Slide 23 text

Dynamic Task Mapping

Slide 24

Slide 24 text

Grid View replaces Tree View!!

Slide 25

Slide 25 text

Grid View replaces Tree View!!
● Better support for Task Groups & Task Mapping
● Grid lines and hover effects to see which task you are inspecting
● Shows durations of DAG runs to quickly see performance changes
● Paves the way for DAG Versioning

Slide 26

Slide 26 text

Create Connection in native JSON format
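
The slide itself is a screenshot, so the exact example shown is not recoverable. As a minimal sketch, assuming the documented JSON field names (conn_type, login, password, host, port, schema, extra) and the AIRFLOW_CONN_{CONN_ID} environment-variable convention, a connection could be serialized like this. The connection id MY_DB and all values are hypothetical.

```python
import json
import os

# Hypothetical connection details, for illustration only.
conn = {
    "conn_type": "postgres",
    "login": "my-login",
    "password": "my-password",
    "host": "my-host",
    "port": 5432,
    "schema": "my-schema",
    "extra": {"sslmode": "require"},  # extras stay a nested object, no URI-encoding
}

# Store the JSON document under the env var for connection id "my_db".
os.environ["AIRFLOW_CONN_MY_DB"] = json.dumps(conn)

# Round-trip check: the stored string decodes back to the same structure.
decoded = json.loads(os.environ["AIRFLOW_CONN_MY_DB"])
print(decoded["conn_type"], decoded["port"])
```

Compared with the URI format, the JSON form avoids percent-encoding special characters in passwords and extras.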

Slide 30

Slide 30 text

DB downgrades
● First-class support
● Downgrade to:
  - an Airflow version
  - or a specific Alembic revision ID

Slide 31

Slide 31 text

DB downgrades
First-class support

Slide 32

Slide 32 text

Generate SQL for DB upgrade & downgrade
● Allows a DBA to run the DB migrations ("--show-sql-only" flag)
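
As a rough sketch of how the downgrade options combine, the helper below only assembles the command line; it does not run Airflow. It assumes the "airflow db downgrade" flags --to-version, --to-revision and --show-sql-only. The helper name downgrade_command and the version "2.2.5" are hypothetical.

```python
import shlex
from typing import List, Optional


def downgrade_command(
    to_version: Optional[str] = None,
    to_revision: Optional[str] = None,
    show_sql_only: bool = False,
) -> List[str]:
    """Build the argv for a DB downgrade to exactly one kind of target."""
    if (to_version is None) == (to_revision is None):
        raise ValueError("pass exactly one of to_version or to_revision")
    cmd = ["airflow", "db", "downgrade"]
    if to_version is not None:
        cmd += ["--to-version", to_version]
    else:
        cmd += ["--to-revision", to_revision]
    if show_sql_only:
        # Print the migration SQL instead of executing it, e.g. to hand to a DBA.
        cmd.append("--show-sql-only")
    return cmd


sql_preview = downgrade_command(to_version="2.2.5", show_sql_only=True)
print(shlex.join(sql_preview))
```

Generating the SQL first and reviewing it before running the real downgrade is one cautious way to use the two features together.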

Slide 33

Slide 33 text

Purge DB history
● First-class support
● Helps reduce the time DB migrations take when upgrading the Airflow version
● Removes the need for Maintenance DAGs!
● '--dry-run' option prints the row counts in the tables to be cleaned
● Back up your DB before running this!

Slide 34

Slide 34 text

LocalKubernetesExecutor
Speed, Isolation & Simplicity packed in one!
● Allows users to run a LocalExecutor and a KubernetesExecutor simultaneously
● The executor that runs a task is chosen based on the task's queue
● Suits a mix of tasks that just call APIs and tasks that require isolation due to dependencies or heavy computation
Slide from Jed's Airflow Summit talk: https://www.crowdcast.io/e/airflowsummit2022/35
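
The queue-based choice can be sketched in a few lines of plain Python. This is an illustration of the routing rule, not Airflow's implementation; it assumes the Kubernetes queue name is "kubernetes", which is the default of the kubernetes_queue option in the [local_kubernetes_executor] config section. The function name pick_executor is hypothetical.

```python
# Assumed default of [local_kubernetes_executor] kubernetes_queue.
KUBERNETES_QUEUE = "kubernetes"


def pick_executor(task_queue: str, kubernetes_queue: str = KUBERNETES_QUEUE) -> str:
    """Route a task by its queue: the Kubernetes queue gets a pod, the rest run locally."""
    if task_queue == kubernetes_queue:
        return "KubernetesExecutor"
    return "LocalExecutor"


light = pick_executor("default")      # e.g. a task that only calls an API
heavy = pick_executor("kubernetes")   # e.g. a task needing its own dependencies
print(light, heavy)
```

In a DAG, the choice is made per task by setting the operator's queue argument.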

Slide 35

Slide 35 text

DAG Processor separation
● Standalone process for DAG parsing
● "airflow dag-processor" CLI command
● Handles code parsing and callbacks (SLA + DAG's on_{success,failure}_callbacks)
● Makes the scheduler not run any user code*
● First step towards multi-tenancy
● Disabled by default; can be enabled with AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True
Images from AIP-43
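
The environment variable above follows Airflow's general AIRFLOW__{SECTION}__{KEY} naming convention for config options. A small sketch of that convention, with a hypothetical helper name airflow_env_var:

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Return the env var name that overrides [section] key in the Airflow config."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var("scheduler", "standalone_dag_processor")

# Enable the standalone DAG processor for processes started from this environment.
os.environ[name] = "True"
print(name, os.environ[name])
```

With this set, the parsing process is started separately with the "airflow dag-processor" command.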

Slide 36

Slide 36 text

Events Timetable
● Built-in Timetable
● Run DAGs at arbitrary dates
● Useful for events which can't be expressed by Cron or Timedelta
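
The core idea, picking the next run from an explicit list of dates instead of a cron rule, can be sketched without Airflow. This is an illustration only and not the built-in timetable's code; the function name next_event and the dates are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def next_event(event_dates: List[datetime], after: Optional[datetime]) -> Optional[datetime]:
    """Return the first event strictly after `after`, or None when none remain."""
    ordered = sorted(event_dates)
    if after is None:
        # No previous run yet: start with the earliest event.
        return ordered[0] if ordered else None
    i = bisect_right(ordered, after)
    return ordered[i] if i < len(ordered) else None


# Irregular dates that no single cron expression describes.
events = [datetime(2022, 7, 4), datetime(2022, 2, 14), datetime(2022, 11, 25)]
first = next_event(events, None)
second = next_event(events, first)
print(first.date(), second.date())
```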

Slide 37

Slide 37 text

Smooth Operator

Slide 38

Slide 38 text

Other Minor features
Minor but very handy!
● A new REST API endpoint ('/dags') that lets you bulk-pause/resume DAGs
● "airflow dags reserialize" command to delete serialized DAGs & reparse them
● A new listener plugin API that tracks TaskInstance state changes (used by OpenLineage)
● New Trigger Rule: all_skipped
● Doc: Single page to check Changelog & Updating Guide -> 'Release Notes'
● (Experimental) Support for ARM Docker Images
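
As a hedged sketch of the bulk-pause endpoint, the snippet below builds (but does not send) a PATCH request to '/dags'. It assumes the stable REST API is served under /api/v1, that the DAGs to update are selected with a dag_id_pattern query parameter, and that the body carries is_paused. The base URL, the pattern "etl_" and the helper name bulk_pause_request are hypothetical, and authentication is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def bulk_pause_request(base_url: str, dag_id_pattern: str, is_paused: bool) -> Request:
    """Build a PATCH /dags request that pauses or resumes every matching DAG."""
    query = urlencode({"dag_id_pattern": dag_id_pattern})
    return Request(
        url=f"{base_url}/dags?{query}",
        data=json.dumps({"is_paused": is_paused}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


# Assumed local webserver address; nothing is sent over the network here.
req = bulk_pause_request("http://localhost:8080/api/v1", "etl_", True)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of adding credentials and passing the request to urllib.request.urlopen.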

Slide 39

Slide 39 text

Upgrade Now to Airflow 2.3!

Slide 40

Slide 40 text

Thank You @kaxil