Open source software: how to live long and go far

An opinionated guide to building open-source software tools
with a focus on Python and science

A talk that I gave when I was stepping down as a lead for the nilearn software, passing the baton to new maintainers. My goal is to summarize what I have learned over the years as a maintainer of open-source software.

Gael Varoquaux

February 06, 2025

Transcript

  1. Open source software: how to live long and go far

    Gaël Varoquaux
    An opinionated guide to building open-source software tools, with a focus on Python and science
  2. Open source software: how to live long and go far

    1 A project vision
    2 Building with a community
    3 Limiting complexity
    4 Technical debt
    5 Limiting drag
    6 Design
    7 Iterations
    8 Quality
    9 Users first
  3. A project vision is crucial

    Outline a clear vision & scope
      Guides marketing and project choices
      Must be focused and easy to explain
    Know the competition
      Sometimes not creating a project is best
      Choosing where to innovate
  6. Building with a community

    Community-driven ≠ open source
    A community goes further
      Sustainability in the long run: doesn’t depend on a handful of individuals
      Better software: gives a user-centric focus, avoids monoculture
    A community needs love
      Peopleware*
      Efforts on communication
      Volunteers want to learn and to help
      Relinquishing governance
      * Title of a great book on managing coders
    A tiny fraction of the users are going to become contributors
      Are niche projects viable?
  7. Limiting complexity: being inclusive means keeping it simple

    Every required skill (C++, machine learning, parallel computing, brain anatomy) makes it harder to contribute
    A project’s required skill set should lie at the intersection of the skills of enough people
    Do not rely on a few super individuals
  8. Limiting complexity: understanding is harder than building

    "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." (Brian Kernighan)
    The everyday roadblock is cognitive load
    Try reading your old work: your past self does not answer emails
  9. Limiting complexity: a super-linear cost

    Features are costly
      [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979]
      25% increase in problem complexity ⇒ 100% increase in code
    The 80/20 rule
      80% of the use cases can be solved with 20% of the lines of code
    Avoid feature creep
      Project vision & design
    Abstract code is costly
      Object-oriented programming, design patterns, multiple inheritance, pure functional programming
      Know these, know not to use them
    Avoid sophistication
  10. Technical debt: today’s asset may be tomorrow’s liability

    Choose wisely what to include
      Pet project or killer feature?
      In science: investigators have a bias toward their own publications
      How to know whether something is useful, revolutionary, or oversold?
    Prototype outside of the main project
      Experiment a lot
      Refer to the project vision
  11. Limiting drag

    Navigating the trade-offs
    Compiled code severely increases the burden
      Installation is where you lose most users
    Dependencies
      Help avoid reinventing the wheel
      Yet, each is a liability
      Avoid dependency creep
    Drop that feature
  12. Design: innovation and product design

    Use technical sophistication to find the simplest answer to the original problem.
  13. Design: Some API design principles for Python tools

    Consistency, consistency, consistency
    Functions are easier to understand than classes
    A library should hinge on a small number of concepts
    Common data containers make the ecosystem stronger
    Each function should have one and only one purpose
    Code for interfaces, but don’t overdo duck typing
    Properties are for impedance matching
    Shallow is better than deep
    Error messages matter
    Be Pythonic
  14. Design: Some API design principles for Python tools

    Consistency, consistency, consistency
      np.save(file, obj) vs pickle.dump(obj, file)
      fmin(..., maxiter=10) vs lsq_linear(..., max_iter=10)
      Inconsistency creates cognitive overload
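
One way to catch naming inconsistencies such as `maxiter` versus `max_iter` is to compare parameter names across a library's public functions. The sketch below is only an illustration: `fmin_demo`, `lsq_demo`, and `find_inconsistent_names` are hypothetical names, and the two demo functions merely stand in for real library functions.

```python
import inspect
from collections import defaultdict


# Hypothetical stand-ins for two public functions of the same library.
def fmin_demo(func, x0, maxiter=10):
    return x0


def lsq_demo(a, b, max_iter=10):
    return a


def find_inconsistent_names(functions):
    """Report parameter names that differ only by underscores or case."""
    spellings = defaultdict(set)
    for function in functions:
        for name in inspect.signature(function).parameters:
            # Normalize the name so that 'maxiter' and 'max_iter' collide
            key = name.replace("_", "").lower()
            spellings[key].add(name)
    # Keep only the concepts that are spelled in more than one way
    return {key: sorted(names)
            for key, names in spellings.items() if len(names) > 1}


report = find_inconsistent_names([fmin_demo, lsq_demo])
print(report)  # {'maxiter': ['max_iter', 'maxiter']}
```

A check like this could run in a project's test suite, so that a new spelling of an existing concept is flagged during review rather than discovered by users.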
  15. Design: Some API design principles for Python tools

    Functions are easier to understand than classes
      Objects have hidden states
      Objects have no universal interface, entry point, output
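
A minimal sketch of what hidden state can mean in practice. `RunningScaler` and `scale` are hypothetical names made up for this illustration. The object's output depends on calls made earlier, while the function's output depends only on its arguments.

```python
class RunningScaler:
    """Class version: the result of transform depends on earlier calls."""

    def __init__(self):
        self.max_seen = None  # hidden state, updated as a side effect

    def transform(self, values):
        biggest = max(values)
        if self.max_seen is None or biggest > self.max_seen:
            self.max_seen = biggest
        return [value / self.max_seen for value in values]


def scale(values, reference_max):
    """Function version: everything the result depends on is an argument."""
    return [value / reference_max for value in values]


scaler = RunningScaler()
scaler.transform([1.0, 10.0])             # silently sets max_seen to 10.0
stateful = scaler.transform([1.0, 2.0])   # scaled by the earlier maximum
fresh = RunningScaler().transform([1.0, 2.0])
explicit = scale([1.0, 2.0], reference_max=2.0)

print(stateful)  # [0.1, 0.2]
print(fresh)     # [0.5, 1.0]
print(explicit)  # [0.5, 1.0]
```

The same call, `transform([1.0, 2.0])`, gives two different answers depending on the object's history. A reader of the function version sees every input at the call site.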
  16. Design: Some API design principles for Python tools

    A library should hinge on a small number of concepts
      How much do usage patterns carry over across the library?
  17. Design: Some API design principles for Python tools

    Common data containers make the ecosystem stronger
      Facilitates working with multiple libraries together
      Easier to get up to speed with a given library
  18. Design: Some API design principles for Python tools

    Each function should have one and only one purpose
      Change of behavior depending on input type
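
One reading of "change of behavior depending on input type" is a function that does two different jobs depending on what it receives. The sketch below assumes that reading. All function names are hypothetical. It contrasts an overloaded function with two single-purpose ones.

```python
import tempfile
from pathlib import Path


def count_words_overloaded(source):
    """Two purposes in one: a Path is read from disk, a str is used as text."""
    if isinstance(source, Path):
        source = source.read_text()
    return len(source.split())


def count_words(text):
    """One purpose: count the words in a string."""
    return len(text.split())


def count_words_in_file(path):
    """One purpose: count the words in a file, given its path."""
    return count_words(Path(path).read_text())


with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "notes.txt"
    path.write_text("release early, release often")

    # A path passed as a plain str is silently treated as text:
    # this counts the "words" of the path string, not of the file.
    surprising = count_words_overloaded(str(path))

    from_file = count_words_in_file(str(path))
    from_text = count_words("release early, release often")

print(from_file, from_text)  # 4 4
```

With two separate functions, the caller states the intent in the function name, and a path given as a string cannot be mistaken for text.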
  19. Design: Some API design principles for Python tools

    Code for interfaces, but don’t overdo duck typing
      Interfaces define objects
      Incompatible behaviors lead to bugs (e.g. np.matrix)
  20. Design: Some API design principles for Python tools

    Properties are for impedance matching
      Properties obfuscate the data model of the object
      Properties can create hidden compute costs
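
A small sketch of a hidden compute cost behind a property. The `Signal` class and its call counter are hypothetical. The counter exists only to make the cost visible.

```python
class Signal:
    """Hypothetical container used to illustrate the cost of a property."""

    def __init__(self, samples):
        self.samples = list(samples)
        self.n_mean_calls = 0  # instrumentation, only here for the demo

    def compute_mean(self):
        # Explicit method: the call syntax signals that work is being done
        self.n_mean_calls += 1
        return sum(self.samples) / len(self.samples)

    @property
    def mean(self):
        # Reads like a stored attribute, but loops over all samples each time
        return self.compute_mean()


signal = Signal(range(100))

# The property is evaluated once per element: 100 full passes over the data
centered_slow = [sample - signal.mean for sample in signal.samples]
calls_with_property = signal.n_mean_calls

# The explicit call nudges the user to compute once and reuse the result
mean = signal.compute_mean()
centered_fast = [sample - mean for sample in signal.samples]
calls_with_method = signal.n_mean_calls - calls_with_property

print(calls_with_property, calls_with_method)  # 100 1
```

Both versions give the same result. The property version hides the fact that it does a hundred times more work.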
  21. Design: Some API design principles for Python tools

    Shallow is better than deep
      Objects are understood by their surface
      Composition creates cognitive overload
  22. Design: Some API design principles for Python tools

    Error messages matter
      Explain the problem
      Print the offending value
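
A minimal sketch of an error message that explains the problem and prints the offending value. The function `check_n_components` and its parameters are hypothetical, chosen only for illustration.

```python
def check_n_components(n_components, n_features):
    """Validate a hypothetical n_components parameter."""
    if isinstance(n_components, bool) or not isinstance(n_components, int):
        raise TypeError(
            "n_components must be an int, "
            f"got {n_components!r} of type {type(n_components).__name__}."
        )
    if not 1 <= n_components <= n_features:
        # State the valid range and show the value that broke the rule
        raise ValueError(
            f"n_components must be between 1 and n_features={n_features}, "
            f"got {n_components!r}."
        )
    return n_components


try:
    check_n_components(12, n_features=5)
except ValueError as error:
    message = str(error)

print(message)
# n_components must be between 1 and n_features=5, got 12.
```

The message names the parameter, states the constraint, and repeats the value that was passed, so the user can often fix the call without opening the source code.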
  23. Design: Some API design principles for Python tools

    Be Pythonic
      Avoid syntax hacks
  24. Iterations: Short cycles, limited ambitions

    Keep coming back to your users
    Release early, release often
  25. Iterations: Limited resources

    Limited resources are good
      Need success in the short term, not the long term
    The startup culture: fail fast
      Quickly identify non-viable projects
    The simplest solution that works is the best
  27. Quality: a priority

    Quality will give you users
      Bugs give you a bad rap
    Quality will give you developers
      Contribute to learn and improve
    Quality will make your developers happy
      People need to be proud of their work
    Do less, do better
      Goes against the grant-system incentive
  29. Quality: everywhere

    Great documentation
      Simplify, but don’t dumb down
      Focus on what the user is trying to solve
    Great APIs
      Example-based development
      If something is hard to explain, rethink the concepts
      Limit the number of different concepts and objects
      Consistency, consistency, consistency
    Good numerics
      Write tests based on mathematical properties
      When a user finds an instability, write a new test
    Quality enables reuse
      Beyond mere reproducibility
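
A sketch of tests based on mathematical properties, using a softmax function as an assumed example (softmax does not come from the talk). The first two tests check properties that hold for any input. The third is the kind of test one might add after a user reports an instability, here an overflow on large inputs.

```python
import math


def softmax(values):
    """Softmax, shifted by the maximum to avoid overflow in exp."""
    shift = max(values)
    exponentials = [math.exp(value - shift) for value in values]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def test_sums_to_one():
    # Property: the outputs form a probability distribution
    assert math.isclose(sum(softmax([0.1, 2.0, -3.0])), 1.0)


def test_shift_invariance():
    # Property: adding a constant to every input leaves the output unchanged
    base = softmax([0.1, 2.0, -3.0])
    shifted = softmax([100.1, 102.0, 97.0])
    assert all(math.isclose(a, b, rel_tol=1e-9)
               for a, b in zip(base, shifted))


def test_large_inputs_do_not_overflow():
    # Regression test: a naive math.exp(1000.0) raises OverflowError
    assert softmax([1000.0, 1000.0]) == [0.5, 0.5]


test_sums_to_one()
test_shift_invariance()
test_large_inputs_do_not_overflow()
```

Property-based tests of this kind need no hand-computed reference values, which makes them cheap to write and hard to get wrong.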
  30. Quality: Process

    Code review
      Everything is discussed ⇒ foster knowledge transfer
    Conventions
      Coding conventions, naming conventions, documentation conventions
    Testing
      Testing all functionality
      Tests run online
      Add a test each time there is a bug
  31. Users first: Enabling

    Usability is key
    Users are not stupid, they are busy
    Design requires knowing what users are targeted ⇒ Project vision
  32. Users first: reducing their cognitive load

    Design: Jonathan Ive, an industrial designer, was #4 at Apple
    Facilitate: Avoid jargon, facilitate adoption
    Know when to stop:
      Avoid magic
      Don’t solve hard problems
  33. Open source software: how to live long and go far

    @GaelVaroquaux
    1 A project vision
    2 Building with a community
    3 Limiting complexity
    4 Technical debt
    5 Limiting drag
    6 Design
    7 Iterations
    8 Quality
    9 Users first
  34. Open source software: how to live long and go far

    @GaelVaroquaux
    Quality first
    Complexity is a killer
    The 80/20 rule
    Everything is a matter of tradeoff