Slide 1

Slide 1 text

Contributing to Apache Airflow Airflow Summit 8 July 2021 Kaxil Naik Airflow Committer and PMC Member OSS Airflow Team @ Astronomer

Slide 2

Slide 2 text

Who am I? ● Airflow Committer & PMC Member ● Manager of Airflow Engineering team @ Astronomer ○ Work full-time on Airflow ● Previously worked at DataReply ● Masters in Data Science & Analytics from Royal Holloway, University of London ● Twitter: https://twitter.com/kaxil ● Github: https://github.com/kaxil/ ● LinkedIn: https://www.linkedin.com/in/kaxil/

Slide 3

Slide 3 text

Agenda ● My Journey ● How to start contributing ● Communication channels ● Guidelines to become a committer http://gph.is/1VBGIPv

Slide 4

Slide 4 text

My Journey

Slide 5

Slide 5 text

Motivation to contribute ! https://stackoverflow.com/q/47452879/5691525

Slide 6

Slide 6 text

Motivation to contribute ! https://stackoverflow.com/a/47452939/5691525

Slide 7

Slide 7 text

But it didn’t work …

Slide 8

Slide 8 text

Fixed it - My First PR

Slide 9

Slide 9 text

My First PR - Fixes Typo

Slide 10

Slide 10 text

My First PR - I didn’t follow Guidelines !! https://media.giphy.com/media/KSKvdT1YGCpUIonvSq/giphy.gif

Slide 11

Slide 11 text

My First “Merged” PR/commit http://gph.is/15RTH5O

Slide 12

Slide 12 text

Slowly & Steadily started adding more contributions

Slide 13

Slide 13 text

Became Airflow Committer & (P)PMC Member https://twitter.com/ApacheAirflow/status/993950478785490945

Slide 14

Slide 14 text

Steered Release for Airflow 1.10.2

Slide 15

Slide 15 text

Became Leading Airflow Committer in Feb 2021

Slide 16

Slide 16 text

What did I learn by working on Airflow? ● Writing unit-tests ● Improved Coding skills ● Got to know many companies & devs across the globe ● Improved communication skills ○ Commit messages & PR descriptions ○ Email threads on dev list ○ Presentations (Public Speaking was one of my fears !!)

Slide 17

Slide 17 text

You are next !!

Slide 18

Slide 18 text

How to start contributing?

Slide 19

Slide 19 text

How to start contributing? ● Contributing Guidelines: CONTRIBUTING.rst ● Contributing Quick Start Guide: CONTRIBUTORS_QUICK_START.rst ● Good First Issues: https://github.com/apache/airflow/contribute https://gph.is/g/ZWdK71X

Slide 20

Slide 20 text

Contribution Workflow 1. Find the issue you want to work on 2. Set up a local dev environment 3. Understand the codebase 4. Write code & add tests 5. Run tests locally 6. Create a PR and wait for reviews 7. Address any suggestions by reviewers 8. Nudge politely if your PR has been pending review for a while

Slide 21

Slide 21 text

Finding issues to work on

Slide 22

Slide 22 text

Finding issues to work on ● Start small: the aim should be to understand the process ● Bugs / features impacting you or your work ● Documentation Issues (including Contribution Guides) ○ Missing or outdated info, typos, formatting issues, broken links etc ● Good First Issues: https://github.com/apache/airflow/contribute ● Other open GitHub Issues: https://github.com/apache/airflow/issues

Slide 23

Slide 23 text

Finding issues to work on - Open Unassigned Issues If an issue is open and unassigned, comment that you want to work on it. A committer will then assign the issue to you, and it is all yours.

Slide 24

Slide 24 text

Finding issues to work on - Improving Documentation ● If you faced an issue with the docs, fix it for future readers ● Documentation PRs are great first contributions ● Missing or outdated info, typos, formatting issues, broken links etc ● No need to write unit tests ● Examples: ○ https://github.com/apache/airflow/pull/16275 ○ https://github.com/apache/airflow/pull/13462 ○ https://github.com/apache/airflow/pull/15265

Slide 25

Slide 25 text

Set up a local dev environment

Slide 26

Slide 26 text

Set Up Local Development Environment ● Fork the Apache Airflow repo & clone it locally ● Install pre-commit hooks (link) to detect minor issues before creating a PR ○ Some of them even fix issues automatically, e.g. ‘black’ formats Python code ○ Install the pre-commit framework: pip install pre-commit ○ Install the pre-commit hooks: pre-commit install ● Use breeze - a wrapper around docker-compose for Airflow development. ○ Mac Users: Increase resources available to Docker for Mac ○ Check Prerequisites: https://github.com/apache/airflow/blob/main/BREEZE.rst#prerequisites ○ Set up autocomplete: ./breeze setup-autocomplete

Slide 27

Slide 27 text

Set Up Local Development Environment - Breeze ● Airflow CI uses breeze too, so CI runs can be reproduced locally ● Allows running Airflow in different environments (different Python versions, different metadata databases, etc.): ○ ./breeze --python 3.6 --backend postgres --postgres-version 12 ● You can also run a local instance of Airflow using: ○ ./breeze start-airflow --python 3.6 --backend postgres ● You can then access the Webserver on http://localhost:28080

Slide 28

Slide 28 text

Set Up Local Development Environment - Breeze

Slide 29

Slide 29 text

Understand the Codebase

Slide 30

Slide 30 text

Understand the Codebase ● apache/airflow is a mono-repo containing code for: ○ Apache Airflow Python package ○ More than 60 Providers (Google, Amazon, Postgres, etc) ○ Container image ○ Helm Chart ● Each of these items is released and versioned separately ● The contribution process is the same for the entire repo

Slide 31

Slide 31 text

Understand the Codebase ● Do not try to understand the entire codebase at once ● Get familiar with the directory structure first ● Dive into the source code related to your issue ● Similar to: If you are moving to a new house, you would try to first get familiar with your immediate neighbours and then others. (unless you have memory like Sheldon Cooper !!!) http://gph.is/2F2nUVb

Slide 32

Slide 32 text

Understand the Codebase - Directory Structure
Area | Paths (relative to the repository root)
Core Airflow Docs | docs/apache-airflow
Stable API | airflow/api_connexion
CLI | airflow/cli
Webserver / UI | airflow/www
Scheduler | airflow/jobs/scheduler_job.py
Dag Parsing | airflow/dag_processing
Executors | airflow/executors
DAG Serialization | airflow/serialization
Helm Chart (& its tests) | chart
Container Image | Dockerfile
Tests | tests

Slide 33

Slide 33 text

Understand the Codebase - Directory Structure
Area | Paths (relative to the repository root)
Providers | airflow/providers
Core Operators | airflow/operators
Core Hooks | airflow/hooks
Core Sensors | airflow/sensors
DB Migrations | airflow/migrations
ORM Models (Python Class -> DB Tables) | airflow/models
Secrets Backend | airflow/secrets
Configuration | airflow/configuration.py
Permission Model | airflow/www/security.py
All Docs (incl. docs for Chart & Container image) | docs
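As a rough illustration of how the two directory tables can be used, the sketch below maps a file path to the area it belongs to by picking the longest matching path from the tables. The dictionary is copied from the slides; the function name area_for_path and the longest-prefix rule are assumptions made for this example, not part of Airflow's tooling.

```python
from typing import Optional

# Area -> path, copied from the two directory-structure tables in the slides.
AREA_PATHS = {
    "Core Airflow Docs": "docs/apache-airflow",
    "Stable API": "airflow/api_connexion",
    "CLI": "airflow/cli",
    "Webserver / UI": "airflow/www",
    "Scheduler": "airflow/jobs/scheduler_job.py",
    "Dag Parsing": "airflow/dag_processing",
    "Executors": "airflow/executors",
    "DAG Serialization": "airflow/serialization",
    "Helm Chart": "chart",
    "Container Image": "Dockerfile",
    "Tests": "tests",
    "Providers": "airflow/providers",
    "Core Operators": "airflow/operators",
    "Core Hooks": "airflow/hooks",
    "Core Sensors": "airflow/sensors",
    "DB Migrations": "airflow/migrations",
    "ORM Models": "airflow/models",
    "Secrets Backend": "airflow/secrets",
    "Configuration": "airflow/configuration.py",
    "Permission Model": "airflow/www/security.py",
    "All Docs": "docs",
}


def area_for_path(path: str) -> Optional[str]:
    """Return the area whose table path is the longest match for `path`.

    Hypothetical helper: a path matches when it equals the table entry or
    lives underneath it, so "docs/apache-airflow-providers-google/..." is
    NOT treated as part of "docs/apache-airflow".
    """
    best_area, best_len = None, -1
    for area, prefix in AREA_PATHS.items():
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_area, best_len = area, len(prefix)
    return best_area
```

Matching on whole path components rather than raw string prefixes is what keeps airflow/www/security.py in "Permission Model" while the rest of airflow/www stays in "Webserver / UI".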

Slide 34

Slide 34 text

Understand the Codebase - Areas ● Get expertise in a certain area before diving into a different one. Easy Medium Complex (core) Docs Webserver Scheduler CLI Helm Chart Executors Operators / Hooks / Sensors (Providers) Dockerfile Configuration Stable API Secrets Backend Permission Model DB Migrations Dag Parsing

Slide 35

Slide 35 text

Write Code, add docs & tests

Slide 36

Slide 36 text

Write code ● Take inspiration from existing code ● E.g. when writing a hook, look at: ○ Code for other similar hooks ○ PRs that added other hooks to see everything that changed including docs & tests ● Check out Coding style and best practices in CONTRIBUTING.rst

Slide 37

Slide 37 text

Add tests and docs ● The tests directory has the same structure as airflow. ● E.g. if the code file is airflow/providers/google/cloud/operators/bigquery.py, tests for it should be added at tests/providers/google/cloud/operators/test_bigquery.py ● Docs for it would be at docs/apache-airflow-providers-google/operators/cloud/bigquery.rst
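The mirroring rule for test files can be written down in a few lines. This is only a sketch of the convention described on the slide; the helper name expected_test_path is made up for illustration, and the docs location is left out because the slide gives a single example rather than a general rule.

```python
from pathlib import PurePosixPath


def expected_test_path(source_path: str) -> str:
    """Map a module under airflow/ to the test file that should mirror it.

    Hypothetical helper: swaps the leading "airflow" directory for "tests"
    and prefixes the file name with "test_".
    """
    parts = PurePosixPath(source_path).parts
    if len(parts) < 2 or parts[0] != "airflow":
        raise ValueError(f"expected a file path under airflow/, got {source_path!r}")
    *dirs, filename = parts[1:]
    return str(PurePosixPath("tests", *dirs, f"test_{filename}"))


print(expected_test_path("airflow/providers/google/cloud/operators/bigquery.py"))
```

For the example from the slide, this prints tests/providers/google/cloud/operators/test_bigquery.py.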

Slide 38

Slide 38 text

Run tests locally

Slide 39

Slide 39 text

Run tests locally - Single Test ● Start breeze: ./breeze --backend postgres --python 3.7 ● Run a single test from a file: pytest tests/secrets/test_secrets.py -k test_backends_kwargs

Slide 40

Slide 40 text

Run tests locally - Multiple Tests ● Start breeze: ./breeze --backend postgres --python 3.7 ● Run all tests in a file: pytest tests/secrets/test_secrets.py
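To make the difference between the two pytest invocations concrete, here is a small stand-alone test module. Everything in it is hypothetical (the parse_backend_kwargs helper and both test names are invented for this sketch and are not taken from the Airflow test suite). Saved as a file such as test_example.py, running pytest test_example.py would collect both tests, while pytest test_example.py -k empty would select only the test whose name contains "empty".

```python
import json


def parse_backend_kwargs(raw: str) -> dict:
    """Hypothetical helper: parse a JSON string of keyword arguments."""
    if not raw:
        return {}
    return json.loads(raw)


def test_parse_backend_kwargs_valid():
    # Selected by a plain run, or by `-k valid`.
    assert parse_backend_kwargs('{"sep": "-"}') == {"sep": "-"}


def test_parse_backend_kwargs_empty():
    # Selected by a plain run, or by `-k empty`.
    assert parse_backend_kwargs("") == {}
```

Because -k matches against test names, descriptive names make it easier to re-run just the test you are working on.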

Slide 41

Slide 41 text

Run tests locally ● Similarly, you can run various different tests locally: ○ Integration Tests (with Celery, Redis, etc) ○ Kubernetes Tests with the Helm Chart ○ System Tests (useful for testing providers) ● Check TESTING.rst for more details on how you can run them

Slide 42

Slide 42 text

Build docs locally ● If you have updated docs including docstrings, build docs locally ● Two types of tests for docs: 1. Docs are built successfully with Sphinx 2. Spelling Checks

Slide 43

Slide 43 text

Build docs locally Example: If you updated Helm Chart docs (docs/helm-chart), build docs using ./breeze build-docs -- --package-filter helm-chart

Slide 44

Slide 44 text

Ready to commit - Static Code Checks ● Once you are happy with your code, commit it ● Pre-commit hooks will run as soon as you run git commit ● ~90 pre-commit hooks (flake8, black, mypy, trim trailing whitespaces etc) ● All these hooks are documented in STATIC_CODE_CHECKS.rst ● Fix any failing hooks and run git add . && git commit again until all pass ● These checks will be run on CI too when you create a PR

Slide 45

Slide 45 text

Ready to commit - Static Code Checks

Slide 46

Slide 46 text

Write a good git commit message (Very Important) 1. Separate subject from body with a blank line 2. Limit the subject line to 50 characters 3. Capitalize the subject line 4. Do not end the subject line with a period 5. Use the imperative mood in the subject line 6. Wrap the body at 72 characters 7. Use the body to explain what and why vs. how Source: https://chris.beams.io/posts/git-commit/ Example: https://github.com/apache/airflow/commit/73b9163a8f55ce3d5bf6aec0a558952c27dd1b55
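Several of these rules are mechanical enough to check automatically. The sketch below covers rules 1, 2, 3, 4 and 6; rules 5 and 7 (imperative mood, explaining what and why) need a human reader. The function name check_commit_message is an invention for this example and is not one of Airflow's pre-commit hooks.

```python
from typing import List


def check_commit_message(message: str) -> List[str]:
    """Return a list of rule violations; an empty list means no issues found."""
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["Subject line is empty"]

    problems = []
    subject = lines[0]
    if len(subject) > 50:
        problems.append("Subject line is longer than 50 characters")
    # Note: a subject starting with a digit or symbol is also flagged here.
    if not subject[0].isupper():
        problems.append("Subject line is not capitalized")
    if subject.endswith("."):
        problems.append("Subject line ends with a period")
    # Rule 1: the second line, if present, must be blank.
    if len(lines) > 1 and lines[1].strip():
        problems.append("Subject is not separated from the body by a blank line")
    # Rule 6: body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > 72:
            problems.append(f"Body line {number} is longer than 72 characters")
    return problems


good = "Fix typo in scheduler docs\n\nExplain why the setting matters for large deployments."
bad = "fixed the scheduler bug.\nMore details here"
print(check_commit_message(good))
print(check_commit_message(bad))
```

The first call reports no problems; the second reports three (capitalization, trailing period, missing blank line).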

Slide 47

Slide 47 text

Create PR and wait for reviews

Slide 48

Slide 48 text

Create PR ● Finally, create a PR from your fork to the apache/airflow repo ● Give the PR an appropriate title and description (similar to commit messages) ● You can add commits to your branch after creating the PR too ● Wait for one of the Committers to review the PR ● Reviewers of the PR might leave suggestions or ask for clarifications ● If you have any questions, ask for help on the PR itself by tagging Committers

Slide 49

Slide 49 text

Wait for Reviews ● Be patient: it may sometimes take multiple days or weeks before you get a review ● If you don’t get any reviews after a couple of weeks, you can ping the #development channel in the Airflow Slack Workspace.

Slide 50

Slide 50 text

Tests on CI ● Tests will run via GitHub Actions as soon as you create a PR ● Fix any failing tests

Slide 51

Slide 51 text

Tests on CI ● Sometimes you might see CI failures unrelated to your PRs ● It can be due to one of the following reasons: ○ Flaky tests ○ Tests/Code on “main” branch might be broken ○ GitHub Runner failures -- these are transient errors ○ Timeouts due to no available slot to run on Workers ● Failure of “Quarantined Tests” can be ignored -- those are expected to fail randomly

Slide 52

Slide 52 text

When and who will merge the PR? ● One approving vote from a committer is needed before a PR can be merged ● One of the committers will merge the PR once the tests are completed ● Mention the committer who reviewed it if your PR is approved but not merged for a while

Slide 53

Slide 53 text

Communication Channels

Slide 54

Slide 54 text

Communication channels ● Mailing Lists ○ Dev List - dev@airflow.apache.org (Public Archive Link) ■ official source for any decisions, discussions & announcements ■ "If it didn't happen on the dev list, it didn't happen" ■ Subscribe by sending email to dev-subscribe@airflow.apache.org ○ User List - users@airflow.apache.org (Public Archive Link) ● Airflow Slack Workspace: https://s.apache.org/airflow-slack (Public Archive Link) ● GitHub Discussions: https://github.com/apache/airflow/discussions

Slide 55

Slide 55 text

Guidelines to become a committer

Slide 56

Slide 56 text

Roles ● Contributors: Anyone who contributes code, documentation etc by creating PRs ● Committers: Community members that have ‘write access’ to the project’s repositories ● PMC Members: Members who are responsible for governance of the project ○ Binding votes on releases ○ Responsible for voting in new committers and PMC members to the project ○ Making sure code licenses and all ASF’s legal policies & brand are complied with ○ Dealing with vulnerability reports

Slide 57

Slide 57 text

How to become a Committer - Prerequisites ● Guidelines are documented at https://github.com/apache/airflow/blob/main/COMMITTERS.rst ● You can become a committer either by (1) Code Contributions or (2) Community Contributions ● Prerequisites ○ Consistent contribution over the last few months ○ Visibility on discussions on the dev mailing list, Slack channels or GitHub issues/discussions ○ Contributions to community health and project's sustainability for the long-term ○ Understands contributor/committer guidelines: Contributors' Guide

Slide 58

Slide 58 text

How to become a Committer - Code Contributions 1. High-quality commits (especially commit messages), including upgrade paths or deprecation policies 2. Testing Release Candidates 3. Proposed and led to completion Airflow Improvement Proposal(s) - AIPs 4. Champions one of the areas in the codebase like Airflow Core, API, Docker Image, Helm Chart, etc 5. Made a significant improvement or added an integration that is important to the Airflow Ecosystem

Slide 59

Slide 59 text

How to become a Committer - Community contributions 1. Instrumental in triaging issues 2. Improved documentation of Airflow in a significant way 3. Led changes and improvements in the “community” processes and tools 4. Actively spreads the word about Airflow, for example by organising Airflow Summit or workshops for community members, giving and recording talks at meetups & conferences, or writing blogs 5. Reporting bugs with detailed reproduction steps

Slide 60

Slide 60 text

Airflow Improvement Proposal (AIP) ● The purpose of an AIP is to introduce any major change to Apache Airflow, mostly the ones that require architectural changes after planning and discussing with the community ● Details on https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals ● Proposal lifecycle: ○ Discuss - discussions on the dev mailing list ○ Draft - create a proposal on the WIKI ○ Vote - vote on dev mailing list (only Committers & PMC Members have a binding vote) ○ Accepted - work is started if vote passes ○ Completed - once all PRs related to the AIPs are merged
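The proposal lifecycle reads like a small state machine, which can be sketched as follows. The stage names come from the slide; the class and function names are invented for this example, and keeping a proposal at the Vote stage after a failed vote is an assumption, since the slide does not say what happens in that case.

```python
from enum import Enum


class AipStage(Enum):
    DISCUSS = "Discuss"      # discussions on the dev mailing list
    DRAFT = "Draft"          # proposal created on the wiki
    VOTE = "Vote"            # vote on the dev mailing list
    ACCEPTED = "Accepted"    # work starts if the vote passes
    COMPLETED = "Completed"  # all PRs related to the AIP are merged


_ORDER = list(AipStage)  # Enum members iterate in definition order


def next_stage(stage: AipStage, vote_passed: bool = True) -> AipStage:
    """Return the stage that follows `stage` in the lifecycle.

    Assumption: a proposal whose vote does not pass stays at VOTE, and
    COMPLETED is terminal.
    """
    if stage is AipStage.VOTE and not vote_passed:
        return AipStage.VOTE
    if stage is AipStage.COMPLETED:
        return AipStage.COMPLETED
    return _ORDER[_ORDER.index(stage) + 1]
```

Only the step from Vote to Accepted is conditional; every other transition simply moves to the next stage in the list.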

Slide 61

Slide 61 text

Links / References

Slide 62

Slide 62 text

Links ● Airflow ○ Repo: https://github.com/apache/airflow ○ Website: https://airflow.apache.org/ ○ Blog: https://airflow.apache.org/blog/ ○ Documentation: https://airflow.apache.org/docs/ ○ Slack: https://s.apache.org/airflow-slack ○ Twitter: https://twitter.com/apacheairflow ● Contact Me: ○ Twitter: https://twitter.com/kaxil ○ Github: https://github.com/kaxil/ ○ LinkedIn: https://www.linkedin.com/in/kaxil/

Slide 63

Slide 63 text

Thank You!