
Quitting pip: How we use git submodules to manage internal dependencies that require fast iteration

At PyCon Berlin 2022

Everybody has to do it and nobody likes doing it: Dependency management is hard, and it does not help that Python offers you many different approaches.

There are several package managers, such as pip, conda, and poetry, which offer various ways of distributing packages, e.g. from a git source, a local path, or wheels served from a private PyPI server. To deal with different versions, there is a multitude of virtual environment systems such as virtualenv, pipenv, and venv. Finding the right workflow and picking the right tools can be a challenge.

After we review the current situation in Python dependency and package management and discuss the advantages and drawbacks of various approaches we tried in the past, we present our current solution to the problem: using git submodules allows us to iterate quickly on different versions of the same library, keep all the benefits of IDE tooling, and ensure that dependency versions are pinned reproducibly.

We share some insight into the tooling created around the workflow, including automation for adding dependencies easily, CI pipelines to ensure compliance, IDE integration, and synergies with our dockerized tech stack.

https://www.mediaire.de
https://www.philippstephan.de
https://github.com/mediaire/submodule-dependencies

Philipp Stephan

April 13, 2022

Transcript

  1. Philipp Stephan, mediaire, PyCon 2022

    Quitting pip: How we use git submodules to manage internal dependencies that require fast iteration
  2. 2

  3. The Zen of Python (>>> import this)

    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!

    Emphasized on the slide: There should be one-- and preferably only one --obvious way to do it.
  4. package managers: Poetry, Conda, easy_install, pip

    package formats: src, egg, wheel, git, zip
    environment separation: pyenv, pipenv, venv, virtualenv, condaenv
  5. private PyPI problems

    • authentication
    • fast iteration
    • single point of failure
    • version compatibility
    • dependency confusion attack
  6. demo/project $ git submodule init

    demo/project $ git submodule add ../library library_sub
    Cloning into 'demo/project/library'...
    done.
    demo/project $ git status
    On branch main
    Changes to be committed:
      (use "git restore --staged <file>..." to unstage)
        new file:   .gitmodules
        new file:   library_sub
    demo/project $
  7. demo/project $ git status

    On branch main
    Changes to be committed:
      (use "git restore --staged <file>..." to unstage)
        new file:   .gitmodules
        new file:   library_sub
    demo/project $ cat .gitmodules
    [submodule "library"]
        path = library_sub
        url = ../library
    demo/project $
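The `.gitmodules` file shown on the slide uses git's INI-like config syntax, so tooling can read it without calling git. The following is a minimal sketch, assuming the file contains only simple `[submodule "name"]` sections as in the demo; the helper name `read_submodules` and the embedded sample text are illustrative and not part of the talk's tooling.

```python
import configparser
import re

# Sample content mirroring the slide; real files are indented with tabs.
SAMPLE_GITMODULES = '[submodule "library"]\n\tpath = library_sub\n\turl = ../library\n'


def read_submodules(text):
    """Return {name: {"path": ..., "url": ...}} for each submodule section."""
    # Interpolation is disabled so that '%' characters in URLs are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    submodules = {}
    for section in parser.sections():
        match = re.fullmatch(r'submodule "(.+)"', section)
        if match:
            submodules[match.group(1)] = dict(parser[section])
    return submodules


if __name__ == "__main__":
    print(read_submodules(SAMPLE_GITMODULES))
```

For files with more unusual git config syntax, asking git itself (for example via `git config --file .gitmodules --list`) is likely the more robust choice.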
  8. demo/project $ cat .gitmodules

    [submodule "library"]
        path = library_sub
        url = ../library
    demo/project/library_sub $ touch change.txt
    demo/project/library_sub $ git status
    On branch main
    Changes to be committed:
        new file:   change.txt
    demo/project $ git status
    On branch main
        modified:   library_sub (modified content)
    demo/project/library_sub $ git commit -am "changes"
    [main e5a8ae5] changes
     1 file changed, 1 insertion(+)
     create mode 100644 change.txt
    demo/project $ git status
    On branch main
        modified:   library_sub (new commits)
    demo/project $
  9. demo/project/library_sub $ git commit -am "changes"

    [main e5a8ae5] changes
     1 file changed, 1 insertion(+)
     create mode 100644 change.txt
    demo/project $ git status
    On branch main
        modified:   library_sub (new commits)
    demo/project $ git commit -am "sub changes"
    [main 2f62b1e] sub changes
     1 file changed, 1 insertion(+), 1 deletion(-)
    demo/project $
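The "(new commits)" state in the demo corresponds to a `+` prefix in the output of `git submodule status`, which git documents alongside `-` (not initialized) and `U` (merge conflicts). The sketch below parses such status lines into a small record; the sample lines and their SHAs are made up for illustration, and paths containing spaces are not handled.

```python
from typing import NamedTuple, Optional

# Prefix characters as described in the `git submodule status` documentation.
STATE_BY_PREFIX = {
    " ": "in sync",
    "+": "checked-out commit differs from the recorded commit",
    "-": "not initialized",
    "U": "merge conflicts",
}


class SubmoduleStatus(NamedTuple):
    state: str
    sha: str
    path: str
    describe: Optional[str]


def parse_status_line(line):
    """Parse one line of `git submodule status` output."""
    prefix, rest = line[0], line[1:]
    parts = rest.split()
    sha, path = parts[0], parts[1]
    # The trailing "(describe output)" is absent for uninitialized submodules.
    describe = parts[2].strip("()") if len(parts) > 2 else None
    state = STATE_BY_PREFIX.get(prefix, "unknown")
    return SubmoduleStatus(state, sha, path, describe)


if __name__ == "__main__":
    # Hypothetical sample output; the SHAs are placeholders.
    sample = [
        "+e5a8ae5000000000000000000000000000000000 library_sub (heads/main)",
        "-2f62b1e000000000000000000000000000000000 other_sub",
    ]
    for line in sample:
        print(parse_status_line(line))
```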
  10. demo/project/library_sub $ git commit -am "changes"

    [main e5a8ae5] changes
     1 file changed, 1 insertion(+)
     create mode 100644 change.txt
    demo/project/library_sub $
  12. demo/project/library_sub $ git commit -am "changes"

    [main e5a8ae5] changes
     1 file changed, 1 insertion(+)
     create mode 100644 change.txt
    demo/project/library_sub $ ls
    demo/project/library_sub $ git checkout -b feat
    Switched to a new branch 'feat'
    demo/project/library_sub $ touch feat.txt
    demo/project/library_sub $ git commit -am "feature"
    [feat e2ac8afa] changes
     1 file changed, 1 insertion(+)
     create mode 100644 feat.txt

    demo/library $ git branch
    * main
    demo/library $
  13. demo/project/library_sub $ git commit -am "feature"

    [feat e2ac8afa] changes
     1 file changed, 1 insertion(+)
     create mode 100644 feat.txt
    demo/project/library_sub $ git push --set-upstream origin feat
    Writing objects: 100% (3/3)
    Total 3, reused 0, pack-reused 0
    To /demo/library
     * [new branch]      feat -> feat
    Branch 'feat' set up to track remote branch 'feat' from 'origin'.
    demo/project/library_sub $

    demo/library $ git branch
    * main
    demo/library $
  14. Writing objects: 100% (3/3)

    Total 3, reused 0, pack-reused 0
    To /demo/library
     * [new branch]      feat -> feat
    Branch 'feat' set up to track remote branch 'feat' from 'origin'.
    demo/project/library_sub $ cd ..
    demo/project $ git branch
    * main
    demo/project $

    demo/library $ git branch
    * main
    demo/library $
  15. project setup

    project/
      project/
        __init__.py
        …
      README.md
      Dockerfile
      .gitignore
      requirements.txt

    library/
      library/
        __init__.py
        …
      README.md
      Dockerfile
      .gitignore
      requirements.txt
  16. project setup

    Submodule directory on PYTHONPATH:

    project/
      project/
        __init__.py
        …
      …
      library/
        library/
          __init__.py
          …
        …

    $ git submodule add [email protected]/library library
    $ export PYTHONPATH=./library/:${PYTHONPATH}

    Submodule in _library/ with a symlink to the package:

    project/
      project/
        __init__.py
        …
      …
      _library/
        library/
          __init__.py
          …
        …
      library/

    $ git submodule add [email protected]/library _library
    $ ln -s _library/library library
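The symlink variant on this slide makes the inner package importable from the project root without touching `PYTHONPATH`. The sketch below reproduces that layout in a temporary directory and imports the package through the link. The function name, the package name `library`, and the `VERSION` attribute are illustrative assumptions, and it needs a platform where unprivileged symlinks work (for example Linux or macOS).

```python
import importlib
import os
import sys
import tempfile


def demo_symlink_import(package="library"):
    """Build project/_library/<package> plus a symlink and import through it."""
    with tempfile.TemporaryDirectory() as project_root:
        # Stand-in for the submodule checkout: _library/<package>/__init__.py
        package_dir = os.path.join(project_root, "_library", package)
        os.makedirs(package_dir)
        with open(os.path.join(package_dir, "__init__.py"), "w") as handle:
            handle.write("VERSION = '0.0.0'\n")

        # Equivalent of `ln -s _library/library library` in the project root.
        os.symlink(os.path.join("_library", package),
                   os.path.join(project_root, package))

        sys.path.insert(0, project_root)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(package)
            return module.VERSION
        finally:
            # Leave the interpreter state as it was before the demo.
            sys.path.remove(project_root)
            sys.modules.pop(package, None)


if __name__ == "__main__":
    print(demo_symlink_import())
```

Because the package appears at the top level of the project, IDEs and the interpreter resolve `import library` the same way they would for any local package.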
  17. point to branches/tags

    project @ DEV-2342
      common_lib_2 @ v0.23.42
      common_lib_1 @ v4.2.23
      common_lib_3 @ DEV-2342

    multiple states at the same time

    project_a @ main
      common_lib @ v2.3.42
    project_b @ main
      common_lib @ v4.2.23

    IDE support/import discovery

    project/
      project/
      _library/
        library/
      library/
  18. (external) dependency resolution

    • concat requirements.txt
    • ordering matters

    duplication
    • disk space
    • compile time
    • build time

    workflow
    • learning curve
    • git submodule init/update
    • commit submodules in untagged state
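The "concat requirements.txt" and "ordering matters" points can be illustrated with a small merge routine. This is a sketch under assumptions rather than the tooling from the talk: it takes the requirement files in precedence order, keeps the first line seen for each package, and reports later lines that disagree. The function name and the sample pins are hypothetical.

```python
import re


def merge_requirements(files):
    """Merge requirement files given as (source, text) in precedence order.

    Returns (merged_lines, conflicts), where each conflict is a tuple
    (package, kept_line, dropped_line).
    """
    merged = {}  # normalized package name (or raw option line) -> kept line
    conflicts = []
    for _source, text in files:
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("-"):
                # Options such as `-r other.txt` are treated as opaque lines.
                key = line
            else:
                name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
                key = name.lower().replace("_", "-")
            if key in merged and merged[key] != line:
                conflicts.append((key, merged[key], line))
                continue
            merged.setdefault(key, line)
    return list(merged.values()), conflicts


if __name__ == "__main__":
    # Hypothetical pins for the project and one submodule.
    files = [
        ("project", "requests==2.27.1\nnumpy>=1.21\n"),
        ("library", "numpy>=1.21\nrequests==2.25.0\npyyaml\n"),
    ]
    lines, conflicts = merge_requirements(files)
    print("\n".join(lines))
    print(conflicts)
```

With first-wins precedence, swapping the order of the two files changes which `requests` pin survives, which is one way the ordering can matter.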
  19. prevent untagged submodules on CI

    """
    Performs a check and only exits successfully (exit code 0) if all the
    submodules in the folder are pointing to a valid version tag.
    It also exits successfully if there are no submodules.
    If the env variable ALLOW_DIRTY_SUBMODULES is set, it always returns
    exit code 0.
    """
    # The imports are not visible on the slide; they are added so the script runs.
    import os
    import re
    import subprocess
    import sys

    # Pairs each submodule's "<sha> <path>" with its `git describe --tags` output.
    COMMAND = """
    /bin/bash -c "paste -d ' ' <(git submodule status | cut -d'(' -f1) <(git submodule foreach git describe --tags | grep -v 'Entering')"
    """

    if __name__ == '__main__':
        out = subprocess.check_output(COMMAND, shell=True)
        invalid_refs = 0
        if os.environ.get('ALLOW_DIRTY_SUBMODULES'):
            print('Allowing dirty submodules - no check to be performed')
            sys.exit(0)
        for line in out.decode('utf-8').splitlines():
            splitted = line.split()
            if len(splitted) == 3:
                subproject = splitted[1]
                at_version = splitted[2].strip()
                print(f'{subproject} is at version {at_version}')
                if re.match(r'^\d+\.\d+\.\d+$', at_version):
                    print('- good version tag')
                else:
                    print('- NOT a good version tag')
                    # The slide is cut off here. Counting the failure and
                    # exiting non-zero is assumed from the docstring and
                    # the `invalid_refs` counter.
                    invalid_refs += 1
        sys.exit(1 if invalid_refs else 0)
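The regular expression in the check works because `git describe --tags` prints the bare tag name only when the checked-out commit is exactly the tagged one; otherwise it appends the number of commits since the tag and an abbreviated hash. The sketch below applies the slide's pattern to a few sample strings. The helper name `is_release_tag` and the sample values are illustrative.

```python
import re

# Same pattern as in the CI check: exactly three dot-separated numbers.
VERSION_TAG = re.compile(r"^\d+\.\d+\.\d+$")


def is_release_tag(describe_output):
    """Return True if `git describe --tags` output is a plain X.Y.Z tag."""
    return VERSION_TAG.match(describe_output.strip()) is not None


if __name__ == "__main__":
    samples = [
        "4.2.23",             # commit is exactly at the tag
        "4.2.23-3-ge5a8ae5",  # three commits past the tag
        "v0.23.42",           # tag with a leading "v"
    ]
    for sample in samples:
        print(sample, is_release_tag(sample))
```

Note that the pattern as shown does not accept a leading `v`, so tags written like `v0.23.42` would need either a different naming scheme or an adjusted pattern.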
  20. automatically create library update MRs

    import os
    import logging
    import gitlab
    import re
    import yaml
    from gitlab.exceptions import GitlabUpdateError
    from gitlab_submodule.gitlab_submodule import iterate_subprojects

    """
    A script for automatically creating merge requests to update to the
    latest version of a common library. The projects and the
    assignees/reviewers for each MR are stored in a `.yml` configuration file.
    This script is idempotent: it will only create one open MR at a time
    for each project, even if the common library version changes.
    In that case it will update the existing one.
    """

    logging.basicConfig(
        format='%(asctime)s %(levelname)s %(module)s:%(lineno)s %(message)s',
        level=logging.INFO)
    logger = logging.getLogger(__name__)

    PRIVATE_GITLAB_TOKEN = os.getenv('PRIVATE_GITLAB_TOKEN')
    BASE_DIR = os.path.dirname(__file__)
    DESCRIPTION_TEMPLATE = open(os.path.join(BASE_DIR, 'mr_template.txt')).read()

    def connect():
        …
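The idempotency described in the docstring (one open MR per project, updated in place when the library version changes) boils down to a small decision step. The sketch below shows that step on plain dictionaries and does not call the GitLab API; the slide does not show how the real script does this, so the function name, the dictionary keys, the branch name, and the title format are all assumptions.

```python
def plan_update_mr(open_mrs, source_branch, target_version):
    """Decide whether to create, update, or skip a library update MR.

    `open_mrs` is a list of dicts with the keys "iid", "source_branch" and
    "title", standing in for whatever the GitLab client returns.
    Returns (action, iid, title).
    """
    title = f"Update common library to {target_version}"
    for mr in open_mrs:
        if mr["source_branch"] == source_branch:
            if mr["title"] == title:
                # An MR for this exact version is already open.
                return ("skip", mr["iid"], title)
            # An MR for an older version exists: reuse it.
            return ("update", mr["iid"], title)
    return ("create", None, title)


if __name__ == "__main__":
    # Hypothetical state: one open MR that targets an older library version.
    existing = [{"iid": 7, "source_branch": "update-common-lib",
                 "title": "Update common library to 4.2.22"}]
    print(plan_update_mr(existing, "update-common-lib", "4.2.23"))
    print(plan_update_mr([], "update-common-lib", "4.2.23"))
```

Running the same plan twice against unchanged state yields "skip" the second time, which is the property that makes repeated scheduled runs safe.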