

Contributing to Apache Airflow | Journey to becoming Airflow's leading contributor

From not knowing Python (let alone Airflow) and submitting a first PR that fixed a typo, to becoming an Airflow Committer, PMC Member, Release Manager, and the #1 committer this year, this talk walks through Kaxil's journey in the Airflow world.

The second part of this talk explains:

- How you can also start your OSS journey by contributing to Apache Airflow
- How to expand your familiarity with different parts of the Airflow codebase
- How to keep committing regularly & steadily to become an Airflow Committer (including the current guidelines for becoming a committer)
- The different mediums of communication (dev list, users list, Slack channel, GitHub Discussions, etc.)

Airflow Summit 2021: https://airflowsummit.org/sessions/2021/contributing-journey-becoming-leading-contributor/

Kaxil Naik

July 09, 2021



Transcript

  1. Contributing to Apache Airflow
     Airflow Summit, 8 July 2021
     Kaxil Naik, Airflow Committer and PMC Member, OSS Airflow Team @ Astronomer
  2. Who am I?
     • Airflow Committer & PMC Member
     • Manager of the Airflow Engineering team @ Astronomer
       ◦ Work full-time on Airflow
     • Previously worked at DataReply
     • Masters in Data Science & Analytics from Royal Holloway, University of London
     • Twitter: https://twitter.com/kaxil
     • GitHub: https://github.com/kaxil/
     • LinkedIn: https://www.linkedin.com/in/kaxil/
  3. Agenda
     • My journey
     • How to start contributing
     • Communication channels
     • Guidelines to become a committer
     http://gph.is/1VBGIPv
  4. What did I learn by working on Airflow?
     • Writing unit tests
     • Improved coding skills
     • Got to know many companies & devs across the globe
     • Improved communication skills
       ◦ Commit messages & PR descriptions
       ◦ Email threads on the dev list
       ◦ Presentations (public speaking was one of my fears!)
  5. How to start contributing?
     • Contributing Guidelines: CONTRIBUTING.rst
     • Contributing Quick Start Guide: CONTRIBUTORS_QUICK_START.rst
     • Good First Issues: https://github.com/apache/airflow/contribute
     https://gph.is/g/ZWdK71X
  6. Contribution Workflow
     1. Find the issue you want to work on
     2. Set up a local dev environment
     3. Understand the codebase
     4. Write code & add tests
     5. Run tests locally
     6. Create a PR and wait for reviews
     7. Address any suggestions from reviewers
     8. Nudge politely if your PR is pending review for a while
  7. Finding issues to work on
     • Start small: the aim should be to understand the process
     • Bugs / features impacting you or your work
     • Documentation issues (including the contribution guides)
       ◦ Missing or outdated info, typos, formatting issues, broken links, etc.
     • Good First Issues: https://github.com/apache/airflow/contribute
     • Other open GitHub issues: https://github.com/apache/airflow/issues
  8. Finding issues to work on - Open Unassigned Issues
     If an issue is open and unassigned, comment that you want to work on it. A committer will assign the issue to you; then it is all yours.
  9. Finding issues to work on - Improving Documentation
     • If you faced an issue with the docs, fix it for future readers
     • Documentation PRs are great first contributions
     • Missing or outdated info, typos, formatting issues, broken links, etc.
     • No need to write unit tests
     • Examples:
       ◦ https://github.com/apache/airflow/pull/16275
       ◦ https://github.com/apache/airflow/pull/13462
       ◦ https://github.com/apache/airflow/pull/15265
  10. Set Up a Local Development Environment
     • Fork the Apache Airflow repo & clone it locally
     • Install pre-commit hooks (link) to detect minor issues before creating a PR
       ◦ Some of them even fix issues automatically, e.g. ‘black’ formats Python code
       ◦ Install the pre-commit framework: pip install pre-commit
       ◦ Install the pre-commit hooks: pre-commit install
     • Use breeze, a wrapper around docker-compose for Airflow development
       ◦ Mac users: increase the resources available to Docker for Mac
       ◦ Check the prerequisites: https://github.com/apache/airflow/blob/main/BREEZE.rst#prerequisites
       ◦ Set up autocomplete: ./breeze setup-autocomplete
  11. Set Up a Local Development Environment - Breeze
     • Airflow CI uses breeze too, so CI failures can be reproduced locally
     • Allows running Airflow with different environments (different Python versions, different metadata DBs, etc.):
       ◦ ./breeze --python 3.6 --backend postgres --postgres-version 12
     • You can also run a local instance of Airflow using:
       ◦ ./breeze start-airflow --python 3.6 --backend postgres
     • You can then access the webserver at http://localhost:28080
  12. Understand the Codebase
     • apache/airflow is a mono-repo containing code for:
       ◦ the Apache Airflow Python package
       ◦ more than 60 providers (Google, Amazon, Postgres, etc.)
       ◦ the container image
       ◦ the Helm chart
     • Each of these items is released and versioned separately
     • The contribution process for the entire repo is the same
  13. Understand the Codebase
     • Do not try to understand the entire codebase at once
     • Get familiar with the directory structure first
     • Dive into the source code related to your issue
     • It is like moving to a new house: you first get to know your immediate neighbours, then the others (unless you have a memory like Sheldon Cooper!)
     http://gph.is/2F2nUVb
  14. Understand the Codebase - Directory Structure (paths relative to the repository root)
     • Core Airflow docs: docs/apache-airflow
     • Stable API: airflow/api_connexion
     • CLI: airflow/cli
     • Webserver / UI: airflow/www
     • Scheduler: airflow/jobs/scheduler_job.py
     • DAG parsing: airflow/dag_processing
     • Executors: airflow/executors
     • DAG serialization: airflow/serialization
     • Helm chart (& its tests): chart
     • Container image: Dockerfile
     • Tests: tests
  15. Understand the Codebase - Directory Structure (paths relative to the repository root)
     • Providers: airflow/providers
     • Core operators: airflow/operators
     • Core hooks: airflow/hooks
     • Core sensors: airflow/sensors
     • DB migrations: airflow/migrations
     • ORM models (Python class -> DB table): airflow/models
     • Secrets backends: airflow/secrets
     • Configuration: airflow/configuration.py
     • Permission model: airflow/www/security.py
     • All docs (incl. docs for the chart & container image): docs
  16. Understand the Codebase - Areas
     • Get expertise in a certain area before diving into a different one.
     • Easy: Docs, CLI, Operators / Hooks / Sensors (Providers), Stable API
     • Medium: Webserver, Helm Chart, Dockerfile, Secrets Backend, DB Migrations
     • Complex (core): Scheduler, Executors, Configuration, Permission Model, Dag Parsing
  17. Write code
     • Take inspiration from existing code
     • E.g. when writing a hook, look at:
       ◦ code for other, similar hooks
       ◦ PRs that added other hooks, to see everything that changed, including docs & tests
     • Check out the coding style and best practices in CONTRIBUTING.rst
  18. Add tests and docs
     • The tests directory has the same structure as airflow/
     • E.g. if the code file is airflow/providers/google/cloud/operators/bigquery.py, its tests should be added at tests/providers/google/cloud/operators/test_bigquery.py
     • Its docs would live at docs/apache-airflow-providers-google/operators/cloud/bigquery.rst
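The mirroring convention above can be sketched in a few lines of Python (the helper name is hypothetical and not part of Airflow; it only illustrates the path mapping):

```python
from pathlib import PurePosixPath

def mirror_test_path(source_path: str) -> str:
    # Hypothetical helper illustrating the convention: swap the
    # leading "airflow/" for "tests/" and prefix the file name
    # with "test_".
    parts = list(PurePosixPath(source_path).parts)
    parts[0] = "tests"
    parts[-1] = "test_" + parts[-1]
    return str(PurePosixPath(*parts))

print(mirror_test_path("airflow/providers/google/cloud/operators/bigquery.py"))
# tests/providers/google/cloud/operators/test_bigquery.py
```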
  19. Run tests locally - Single Test
     • Start breeze: ./breeze --backend postgres --python 3.7
     • Run a single test from a file: pytest tests/secrets/test_secrets.py -k test_backends_kwargs
  20. Run tests locally - Multiple Tests
     • Start breeze: ./breeze --backend postgres --python 3.7
     • Run all tests in a file: pytest tests/secrets/test_secrets.py
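For reference, the tests that pytest picks up look like this (a generic sketch, not an actual Airflow test; pytest collects `test_*` functions from `test_*.py` files, and each plain `assert` becomes a check):

```python
from datetime import date

def days_between(start: date, end: date) -> int:
    # Toy helper standing in for the code under test.
    return (end - start).days

def test_days_between():
    # pytest reports this function as passed if no assert fails.
    assert days_between(date(2021, 7, 1), date(2021, 7, 9)) == 8
```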
  21. Run tests locally
     • Similarly, you can run various other kinds of tests locally:
       ◦ integration tests (with Celery, Redis, etc.)
       ◦ Kubernetes tests with the Helm chart
       ◦ system tests (useful for testing providers)
     • Check TESTING.rst for more details on how to run them
  22. Build docs locally
     • If you have updated docs, including docstrings, build the docs locally
     • Two types of checks run on the docs:
       1. the docs build successfully with Sphinx
       2. spelling checks
  23. Build docs locally
     Example: if you updated the Helm chart docs (docs/helm-chart), build them using:
     ./breeze build-docs -- --package-filter helm-chart
  24. Ready to commit - Static Code Checks
     • Once you are happy with your code, commit it
     • Pre-commit hooks run as you run git commit
     • There are ~90 pre-commit hooks (flake8, black, mypy, trimming trailing whitespace, etc.)
     • All of these hooks are documented in STATIC_CODE_CHECKS.rst
     • Fix any failing hooks and run git add . && git commit again until all pass
     • The same checks run on CI when you create a PR
  25. Write a good git commit message (Very Important)
     1. Separate the subject from the body with a blank line
     2. Limit the subject line to 50 characters
     3. Capitalize the subject line
     4. Do not end the subject line with a period
     5. Use the imperative mood in the subject line
     6. Wrap the body at 72 characters
     7. Use the body to explain what and why vs. how
     Source: https://chris.beams.io/posts/git-commit/
     Example: https://github.com/apache/airflow/commit/73b9163a8f55ce3d5bf6aec0a558952c27dd1b55
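The mechanically checkable rules above (1, 2, 3, 4 and 6) can be expressed as a toy linter; this is only an illustrative sketch, not a real tool (real hooks such as gitlint go much further):

```python
def lint_commit_message(message: str) -> list:
    # Toy checker for the mechanically checkable rules above.
    problems = []
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    if len(lines) > 1 and lines[1].strip():
        problems.append("rule 1: separate subject from body with a blank line")
    if len(subject) > 50:
        problems.append("rule 2: limit the subject line to 50 characters")
    if subject and not subject[0].isupper():
        problems.append("rule 3: capitalize the subject line")
    if subject.endswith("."):
        problems.append("rule 4: do not end the subject line with a period")
    if any(len(line) > 72 for line in lines[2:]):
        problems.append("rule 6: wrap the body at 72 characters")
    return problems

print(lint_commit_message("Add retry logic to the scheduler\n\nExplains why."))
# []
```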
  26. Create PR
     • Finally, create a PR from your fork to the apache/airflow repo
     • Make sure to write the PR title and description appropriately (similar to commit messages)
     • You can add commits to your branch after creating the PR, too
     • Wait for one of the committers to review the PR
     • Reviewers might leave suggestions or ask for clarification
     • If you have questions, ask for help on the PR itself by tagging committers
  27. Wait for Reviews
     • Be patient; it can sometimes take days or weeks to get a review
     • If you get no reviews after a couple of weeks, ping the #development channel in the Airflow Slack workspace
  28. Tests on CI
     • Tests run via GitHub Actions as soon as you create the PR
     • Fix any failing tests
  29. Tests on CI
     • Sometimes you might see CI failures unrelated to your PR
     • They can be due to one of the following:
       ◦ flaky tests
       ◦ tests/code on the “main” branch being broken
       ◦ GitHub runner failures; these are transient errors
       ◦ timeouts because no worker slot is available
     • Failures of “quarantined tests” can be ignored; those are expected to fail randomly
  30. When and who will merge the PR?
     • One approving vote from a committer is needed before a PR can be merged
     • One of the committers will merge the PR once the tests have completed
     • Mention the committer who reviewed it if your PR is approved but not merged for a while
  31. Communication channels
     • Mailing lists
       ◦ Dev list: [email protected] (Public Archive Link)
         ▪ The official source for any decisions, discussions & announcements
         ▪ "If it didn't happen on the dev list, it didn't happen"
         ▪ Subscribe by sending an email to [email protected]
       ◦ User list: [email protected] (Public Archive Link)
     • Airflow Slack workspace: https://s.apache.org/airflow-slack (Public Archive Link)
     • GitHub Discussions: https://github.com/apache/airflow/discussions
  32. Roles
     • Contributors: anyone who contributes code, documentation, etc. by creating PRs
     • Committers: community members who have write access to the project's repositories
     • PMC members: members responsible for the governance of the project
       ◦ Binding votes on releases
       ◦ Voting in new committers and PMC members
       ◦ Making sure code licenses and all of the ASF's legal policies & brand requirements are complied with
       ◦ Dealing with vulnerability reports
  33. How to become a Committer - Prerequisites
     • Guidelines are documented at https://github.com/apache/airflow/blob/main/COMMITTERS.rst
     • You can become a committer through either (1) code contributions or (2) community contributions
     • Prerequisites:
       ◦ Consistent contributions over the last few months
       ◦ Visibility in discussions on the dev mailing list, Slack channels or GitHub issues/discussions
       ◦ Contributions to community health and the project's long-term sustainability
       ◦ Understands the contributor/committer guidelines: Contributors' Guide
  34. How to become a Committer - Code Contributions
     1. High-quality commits (especially commit messages), including upgrade paths or deprecation policies
     2. Testing release candidates
     3. Proposed and led to completion one or more Airflow Improvement Proposals (AIPs)
     4. Champions one of the areas of the codebase, such as Airflow core, the API, the Docker image or the Helm chart
     5. Made a significant improvement or added an integration that is important to the Airflow ecosystem
  35. How to become a Committer - Community Contributions
     1. Instrumental in triaging issues
     2. Improved the Airflow documentation in a significant way
     3. Led changes and improvements in the "community" processes and tools
     4. Actively spreads the word about Airflow, for example by organising the Airflow Summit or workshops for community members, giving and recording talks at meetups & conferences, or writing blogs
     5. Reports bugs with detailed reproduction steps
  36. Airflow Improvement Proposal (AIP)
     • The purpose of an AIP is to introduce a major change to Apache Airflow, mostly ones that require architectural changes, after planning and discussing it with the community
     • Details at https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals
     • Proposal lifecycle:
       ◦ Discuss: discussion on the dev mailing list
       ◦ Draft: create a proposal on the wiki
       ◦ Vote: vote on the dev mailing list (only committers & PMC members have a binding vote)
       ◦ Accepted: work starts if the vote passes
       ◦ Completed: once all PRs related to the AIP are merged
  37. Links
     • Airflow
       ◦ Repo: https://github.com/apache/airflow
       ◦ Website: https://airflow.apache.org/
       ◦ Blog: https://airflow.apache.org/blog/
       ◦ Documentation: https://airflow.apache.org/docs/
       ◦ Slack: https://s.apache.org/airflow-slack
       ◦ Twitter: https://twitter.com/apacheairflow
     • Contact me:
       ◦ Twitter: https://twitter.com/kaxil
       ◦ GitHub: https://github.com/kaxil/
       ◦ LinkedIn: https://www.linkedin.com/in/kaxil/