
Why Airflow? & What's new in Airflow 2.3?

Talk: https://odsc.com/speakers/whats-new-in-apache-airflow-2-3/

This session covers why you should use Apache Airflow and the awesome new features the community has built, recently released in Apache Airflow 2.3.

Highlights:
- Dynamic Task Mapping
- First-class support for DB Downgrades
- Pruning old DB records (no need to use maintenance DAGs anymore)
- Building Connections using JSON
- UI Improvements

The talk will also cover the growth of the Airflow community over the years and why Airflow is still the de facto tool for workflow orchestration.

Kaxil Naik

June 16, 2022

Transcript

  1. Why Airflow? & What's new in Airflow 2.3? • Kaxil Naik • ODSC 2022
  2. Who am I? • Committer & PMC Member of Apache Airflow • Director of Airflow Engineering @ Astronomer • @kaxil
  3. What is Apache Airflow?

  4-7. A platform to programmatically author, schedule, and monitor workflows

  8. Example DAG
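The example DAG itself did not survive in this transcript. As a stand-in, here is a minimal, framework-free sketch of what a DAG encodes: tasks plus dependencies, executed in an order that respects them. The task names are hypothetical, and Python's standard `graphlib` module stands in for Airflow's scheduler; in a real DAG file the same edges would be declared between operators with `>>`.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL workflow: each task maps to the set of tasks it depends on.
# In an Airflow DAG file the same shape would read: extract >> transform >> load,
# plus transform >> report.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid execution order (graphlib raises CycleError on a cycle)."""
    return list(TopologicalSorter(deps).static_order())


print(run_order(dependencies))
```

Because the graph must be acyclic, a valid order always exists; `load` and `report` may come out in either order since neither depends on the other.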

  9. Why Apache Airflow?

  10. The Community • 2k+ contributors • 26.3k GitHub stars • 6.8m+ monthly downloads • 23k+ Slack members
  11. Under …

  12. Governed by the Project Management Committee • 48 committers • 27 PMC members
  13. Integrations And ……

  14. 75+ Providers

  15. Docker Image
    docker pull apache/airflow

  16. Helm Chart
    helm repo add apache-airflow https://airflow.apache.org/
    helm install my-airflow apache-airflow/airflow
  17. Conferences & Meetups • 13 local groups • 3 years of conferences with at least 6k-10k registrants every year
  18. Managed Airflow Vendors

  19. What’s new in Airflow 2.3?

  20. Biggest Airflow release since 2.0 • 700+ commits with 50 new features!
  21. Dynamic Task Mapping • Highlight feature of 2.3 • First-class support for a common ETL pattern around dynamic tasks • Run the same set of tasks for N files in a bucket, DB records, or ML models, where N is unpredictable
  22. Dynamic Task Mapping: Before vs. After

  23. Dynamic Task Mapping
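The transcript keeps only the slide titles here, so below is a toy model of the mapping semantics, not Airflow's implementation: fixed arguments are bound once, and one task instance runs per element of the mapped input, whose length N is only known at run time. In Airflow 2.3 itself the calls are `.partial()` and `.expand()`, e.g. `process.partial(bucket="raw").expand(path=list_files())`; the `expand` helper, `process`, and the file names below are hypothetical.

```python
from itertools import product
from typing import Any, Callable


def expand(func: Callable[..., Any], partial_kwargs: dict[str, Any],
           mapped_kwargs: dict[str, list[Any]]) -> list[Any]:
    """Toy model of task mapping: one call per combination of mapped values.

    With several mapped arguments the calls form a cross product, mirroring how
    Airflow 2.3 expands a task mapped over more than one argument.
    """
    names = list(mapped_kwargs)
    results = []
    for combo in product(*(mapped_kwargs[n] for n in names)):
        kwargs = {**partial_kwargs, **dict(zip(names, combo))}  # fixed + mapped args
        results.append(func(**kwargs))
    return results


def process(bucket: str, path: str) -> str:
    # Hypothetical per-file task body.
    return f"{bucket}/{path}"


# N is only known at run time, e.g. from listing a bucket.
files = ["a.csv", "b.csv", "c.csv"]
print(expand(process, {"bucket": "raw"}, {"path": files}))
```

An empty mapped list yields no calls at all, which is the case classic "loop in the DAG file" code handled badly and mapping handles at run time.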

  24. Grid View replaces Tree View!!

  25. Grid View replaces Tree View!! • Better support for Task Groups & Task Mapping • Grid lines and hover effects to see which task you are inspecting • Shows DAG run durations to quickly spot performance changes • Paves the way for DAG Versioning
  26-29. Create Connection in native JSON format
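A hedged sketch of building such a JSON connection in Python, assuming the standard Connection fields (`conn_type`, `host`, `login`, `password`, `port`, `schema`, `extra`) and the `AIRFLOW_CONN_<CONN_ID>` environment-variable convention; the connection id `my_postgres` and every value are made up. The exact keys accepted should be checked against the 2.3 "Managing Connections" docs.

```python
import json
from typing import Any, Optional


def conn_json(conn_type: str, host: Optional[str] = None, login: Optional[str] = None,
              password: Optional[str] = None, port: Optional[int] = None,
              schema: Optional[str] = None,
              extra: Optional[dict[str, Any]] = None) -> str:
    """Serialize connection fields to JSON, dropping fields that were not given."""
    fields = {
        "conn_type": conn_type, "host": host, "login": login,
        "password": password, "port": port, "schema": schema, "extra": extra,
    }
    return json.dumps({k: v for k, v in fields.items() if v is not None})


# Hypothetical connection id "my_postgres". Assumption: Airflow 2.3 reads
# AIRFLOW_CONN_<CONN_ID> variables and accepts JSON there as well as URIs.
env = {"AIRFLOW_CONN_MY_POSTGRES": conn_json(
    "postgres", host="db.example.com", login="airflow", password="secret",
    port=5432, schema="analytics", extra={"sslmode": "require"})}
print(env["AIRFLOW_CONN_MY_POSTGRES"])
```

Compared with the URI form, JSON keeps `extra` as a nested object instead of URL-encoded query parameters, which is the main readability win.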

  30. DB downgrades • First-class support • Downgrade to a specific Airflow version or to a specific Alembic revision ID
  31. DB downgrades • First-class support

  32. Generate SQL for DB upgrade & downgrade • Allows DBAs to run the DB migrations themselves ("--show-sql-only" flag)
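One way to script these migrations is to build the CLI call as an argument list and hand it to `subprocess.run`. This is a sketch: `--show-sql-only` comes from the slide, while `--to-version` and `--to-revision` are the 2.3 `airflow db` flag names as best understood here and should be confirmed with `airflow db downgrade --help`. The helper name and version number are illustrative.

```python
from typing import Optional


def db_migration_argv(direction: str, to_version: Optional[str] = None,
                      to_revision: Optional[str] = None,
                      show_sql_only: bool = False) -> list[str]:
    """Build argv for `airflow db upgrade|downgrade`; nothing is executed here."""
    if direction not in ("upgrade", "downgrade"):
        raise ValueError("direction must be 'upgrade' or 'downgrade'")
    if to_version and to_revision:
        raise ValueError("pass either to_version or to_revision, not both")
    if direction == "downgrade" and not (to_version or to_revision):
        raise ValueError("a downgrade needs a target version or revision")
    argv = ["airflow", "db", direction]
    if to_version:
        argv += ["--to-version", to_version]
    if to_revision:
        argv += ["--to-revision", to_revision]  # an Alembic revision id
    if show_sql_only:
        # Print the migration SQL for a DBA to review and run, instead of applying it.
        argv.append("--show-sql-only")
    return argv


print(db_migration_argv("downgrade", to_version="2.2.5", show_sql_only=True))
```

Keeping the command as a list (rather than a shell string) avoids quoting problems when it is later passed to `subprocess.run`.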
  33. Purge DB history • First-class support ("airflow db clean") • Helps reduce the time DB migrations take when upgrading Airflow • Removes the need for maintenance DAGs! • "--dry-run" option prints the row counts in the tables to be cleaned • Back up your DB before running this!
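The dry-run idea fits in a few lines. This toy works on in-memory lists rather than the metadata DB, and the table contents are invented; the real command (`airflow db clean --clean-before-timestamp ... --dry-run`) applies extra per-table rules that this sketch ignores.

```python
from datetime import datetime, timezone

UTC = timezone.utc

# Hypothetical metadata tables: table name -> timestamps of its rows.
tables = {
    "task_instance": [datetime(2022, 1, 5, tzinfo=UTC), datetime(2022, 6, 1, tzinfo=UTC)],
    "log": [datetime(2021, 12, 31, tzinfo=UTC)] * 3,
}


def purge(tables: dict[str, list[datetime]], before: datetime,
          dry_run: bool = True) -> dict[str, int]:
    """Count rows older than `before` per table; delete them only if dry_run is False."""
    counts = {}
    for name in list(tables):
        rows = tables[name]
        counts[name] = sum(1 for ts in rows if ts < before)
        if not dry_run:
            tables[name] = [ts for ts in rows if ts >= before]
    return counts


cutoff = datetime(2022, 3, 1, tzinfo=UTC)
print(purge(tables, cutoff))  # dry run: reports {'task_instance': 1, 'log': 3}, deletes nothing
```

Running the dry run first and comparing the counts with expectations is the cheap safety check; the backup advice on the slide still applies before the real run.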
  34. LocalKubernetesExecutor • Speed, isolation & simplicity packed in one! • Allows users to run a LocalExecutor and a KubernetesExecutor simultaneously • An executor is chosen for each task based on the task's queue • Suits a mix of tasks that just call APIs and tasks that need isolation because of their dependencies or heavy computation • Slide from Jed's Airflow Summit talk: https://www.crowdcast.io/e/airflowsummit2022/35
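A minimal sketch of the routing rule on the slide: a task goes to the KubernetesExecutor when its `queue` matches the configured Kubernetes queue, otherwise to the LocalExecutor. The value `kubernetes` is assumed here to be the default of the `[local_kubernetes_executor] kubernetes_queue` option; the function and task names are hypothetical.

```python
# Assumed default of [local_kubernetes_executor] kubernetes_queue.
KUBERNETES_QUEUE = "kubernetes"


def pick_executor(task_queue: str, kubernetes_queue: str = KUBERNETES_QUEUE) -> str:
    """Route a task to one of the two executors based only on its queue."""
    return "KubernetesExecutor" if task_queue == kubernetes_queue else "LocalExecutor"


# Hypothetical tasks: a lightweight API call and a heavy, isolated training job.
tasks = {"call_api": "default", "train_model": "kubernetes"}
for task_id, queue in tasks.items():
    print(task_id, "->", pick_executor(queue))
```

In a DAG this choice is made per task by setting the operator's `queue` argument, so most tasks stay on the fast local path and only the ones that need isolation pay the pod start-up cost.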
  35. DAG Processor separation • Standalone process for DAG parsing: “airflow dag-processor” CLI command • Handles code parsing and callbacks (SLA + DAG’s on_{success,failure}_callback) • The scheduler no longer runs any user code* • First step towards multi-tenancy • Disabled by default; can be enabled with AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True • Images from AIP-43
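The enabling variable follows Airflow's `AIRFLOW__{SECTION}__{KEY}` convention for overriding `airflow.cfg`, so `AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR=True` corresponds to `standalone_dag_processor = True` in the `[scheduler]` section, with `airflow dag-processor` then run as its own process. Below is a small, hypothetical helper (not part of Airflow) that decodes such names.

```python
def parse_airflow_env_var(name: str) -> tuple[str, str]:
    """Split AIRFLOW__{SECTION}__{KEY} into lower-case (section, key)."""
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config variable: {name}")
    # Section and key are separated by a double underscore; the key itself
    # may contain single underscores.
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError(f"expected AIRFLOW__SECTION__KEY, got {name}")
    return section.lower(), key.lower()


print(parse_airflow_env_var("AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR"))
```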
  36. Events Timetable • Run DAGs at arbitrary dates • Built-in timetable • Useful for events that can’t be expressed with cron or a timedelta
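Conceptually an events timetable is a sorted list of dates: the next run is the first event after the previous one, and once the list is exhausted there are no more runs. A stdlib-only sketch of that lookup follows; the built-in class in 2.3 is `EventsTimetable` (documented under `airflow.timetables.events`, taking `event_dates`), while `next_event` and the dates below are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def next_event(event_dates: list[datetime], last_run: Optional[datetime]) -> Optional[datetime]:
    """Return the first event strictly after last_run, or the first event if never run."""
    events = sorted(event_dates)
    if last_run is None:
        return events[0] if events else None
    i = bisect_right(events, last_run)  # index of the first event > last_run
    return events[i] if i < len(events) else None  # None means no more runs


# Irregular, hypothetical dates that no cron expression or timedelta captures.
launches = [datetime(2022, 3, 15), datetime(2022, 6, 16), datetime(2022, 1, 10)]
print(next_event(launches, None))
print(next_event(launches, datetime(2022, 3, 15)))
```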
  37. Smooth Operator

  38. Other minor features: minor but very handy! • A new REST API endpoint ('/dags') that lets you bulk-pause/resume DAGs • "airflow dags reserialize" command to delete serialized DAGs and reparse them • A new listener plugin API that tracks TaskInstance state changes (used by OpenLineage) • New trigger rule: all_skipped • Docs: a single 'Release Notes' page combining the Changelog and the Updating Guide • (Experimental) Support for ARM Docker images
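A toy illustration of the new `all_skipped` rule next to the default `all_success`, assuming every upstream task has already finished. In a DAG the rule is set on a task with `trigger_rule="all_skipped"` (or `TriggerRule.ALL_SKIPPED`); the dictionary and `should_run` helper below are hypothetical.

```python
# Toy evaluators over the final states of a task's upstream tasks
# (assumes all upstream tasks have already finished).
TRIGGER_RULES = {
    "all_success": lambda states: all(s == "success" for s in states),
    "all_skipped": lambda states: all(s == "skipped" for s in states),
}


def should_run(rule: str, upstream_states: list[str]) -> bool:
    """Decide whether a task fires, given its trigger rule and upstream states."""
    return TRIGGER_RULES[rule](upstream_states)


print(should_run("all_skipped", ["skipped", "skipped"]))
print(should_run("all_skipped", ["skipped", "success"]))
```

A typical use is a fallback or notification task that should fire only when every branch upstream was skipped.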
  39. Upgrade Now to Airflow 2.3!

  40. Thank You @kaxil