works for DAGs with no schedule_interval • Useful for companies working across several time zones and relying on external triggers • Enable this feature with the following env variable: AIRFLOW__SCHEDULER__ALLOW_TRIGGER_IN_FUTURE=True
your DAG files! • Use Airflow Connections to store any kind of sensitive data like passwords, private keys, etc. • Airflow stores the connection data in the Airflow metadata DB • If you install the “crypto” package (“pip install apache-airflow[crypto]”), the password field in Connections will be encrypted in the DB too.
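A minimal sketch of reading such a connection inside a task instead of hard-coding credentials; the connection id “my_postgres_conn” is a hypothetical example:

from airflow.hooks.base_hook import BaseHook

def query_warehouse(**context):
    # Host, login and (decrypted) password come from the metadata DB,
    # so no secrets live in the DAG file itself.
    conn = BaseHook.get_connection("my_postgres_conn")
    print(conn.host, conn.login)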
Any call to Variables means a connection to the metadata DB. • Your DAG files are parsed every X seconds. Using a large number of Variables in your DAGs may end up saturating the number of allowed connections to your database.
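One way to reduce these fetches, sketched under the assumption that a single JSON Variable named “etl_config” bundles several values and is read inside the task callable rather than at parse time:

from airflow.models import Variable

def run_etl(**context):
    # One Variable lookup at task run time instead of several at parse time
    config = Variable.get("etl_config", deserialize_json=True)
    source = config["source_table"]
    target = config["target_table"]
    print(source, target)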
• Airflow will parse your DAG files over and over (and more often than your schedule interval), and any code at the top level of the file will get run on every parse. • This can cause the Scheduler to be slow, and hence tasks might end up being delayed
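A small sketch of keeping expensive work out of module scope; the URL and the use of the requests library are illustrative assumptions:

import requests

# BAD: this call would run every time the Scheduler parses the file
# partners = requests.get("https://example.com/partners").json()

def load_partners(**context):
    # BETTER: the call runs only when the task actually executes
    partners = requests.get("https://example.com/partners").json()
    return len(partners)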
• Airflow ships with a set of roles by default: Admin, User, Op, Viewer, and Public • Creating custom roles is possible • DAG-Level Access Control: users can declare read or write permissions inside the DAG file, as shown below
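A sketch of such a declaration, assuming Airflow 1.10.x permission names and a hypothetical custom role called “data-team”:

from datetime import datetime
from airflow import DAG

dag = DAG(
    dag_id="example_access_control",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
    # Members of the "data-team" role may view and edit this DAG only
    access_control={"data-team": {"can_dag_read", "can_dag_edit"}},
)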
LocalExecutor - Runs tasks in parallel on the same machine using subprocesses • CeleryExecutor - Runs tasks in parallel on different worker machines • KubernetesExecutor - Runs tasks on separate Kubernetes pods
to your PYTHONPATH that defines this policy function. • It receives a TaskInstance object and can alter it where needed. • Example usages: ◦ Enforce a specific queue (say the spark queue) for tasks using the SparkOperator to make sure that these task instances get wired to the right workers ◦ Force all task instances running on an execution_date older than a week to run in a backfill pool.
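A sketch of such a policy in airflow_local_settings.py; the exact signature and attributes vary by Airflow version, and the queue/pool names and the one-week cutoff are illustrative assumptions:

from datetime import datetime, timedelta

def policy(task_instance):
    # Route Spark tasks to the workers listening on the "spark" queue
    if "Spark" in (task_instance.operator or ""):
        task_instance.queue = "spark"
    # Push runs with an execution_date older than a week into a "backfill" pool
    if task_instance.execution_date < datetime.utcnow() - timedelta(weeks=1):
        task_instance.pool = "backfill"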
Scheduler and stored in the DB, from where the Webserver reads • Phase 1 was implemented and released in Airflow >= 1.10.7 • For Airflow 2.0 we want the Scheduler to read from the DB as well, and pass on the responsibility of parsing DAGs and saving them to the DB to a “Serializer” or some other component.
solution is the “puckel-airflow” Docker image and the stable Airflow chart in the Helm repo. • However, we want to support all features by providing an official image and Helm chart and supporting them.