Slide 1

Slide 1 text

Open source software: how to live long and go far
Gaël Varoquaux
An opinionated guide to building open-source software tools with a focus on Python and science

Slide 2

Slide 2 text

Open source software: how to live long and go far
1 A project vision
2 Building with a community
3 Limiting complexity
4 Technical debt
5 Limiting drag
6 Design
7 Iterations
8 Quality
9 Users first
G Varoquaux 2

Slide 3

Slide 3 text

A project vision is crucial
- Outline a clear vision & scope
  - Guides marketing and project choices
  - Must be focused and easy to explain
- Know the competition
  - Sometimes not creating a project is best
  - Choosing where to innovate

Slide 7

Slide 7 text

Building with a community
Community-driven " Open source
- A community goes further
  - Sustainability in the long run
  - Doesn’t depend on a handful of individuals
  - Better software
  - Gives a user-centric focus
  - Avoids monoculture
- A community needs love
  - Peopleware*
  - Efforts on communication
  - Volunteers want to learn and to help
  - Relinquishing governance
- A tiny fraction of the users are going to become contributors
  - Are niche projects viable?
* Title of a great book on managing coders

Slide 8

Slide 8 text

Limiting complexity

Slide 9

Slide 9 text

Limiting complexity: being inclusive means keeping it simple
- Every required skill (C++, machine learning, parallel computing, brain anatomy) makes it harder to contribute
- A project’s required skill set should lie at the intersection of the skills of sufficient people
- Do not rely on a few super individuals

Slide 10

Slide 10 text

Limiting complexity: understanding is harder than building
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." (Brian Kernighan)
- The everyday roadblock is cognitive load
- Try reading your old work: your past self does not answer emails

Slide 11

Slide 11 text

Limiting complexity: a super-linear cost
- Features are costly
  - [An Experiment on Unit Increase in Problem Complexity, Woodfield 1979]: 25% increase in problem complexity ⇒ 100% increase in code
  - The 80/20 rule: 80% of the use cases can be solved with 20% of the lines of code
  - Avoid feature creep
  - Project vision & design
- Abstract code is costly
  - Object-oriented programming, design patterns, multiple inheritance, pure functional programming
  - Know these, know not to use them
  - Avoid sophistication

Slide 13

Slide 13 text

Technical debt: today’s asset may be tomorrow’s liability
Choose wisely what to include
- Pet project or killer feature?
  - In science: investigators have a bias toward their publications.
  - How to know whether something is useful, revolutionary, or oversold?
- Prototype outside of the main project
- Experiment a lot
- Refer to the project vision

Slide 15

Slide 15 text

Limiting drag
Navigating the trade-offs
- Compiled code severely increases the burden
  - Installation is where you lose most users
- Dependencies
  - Help avoid reinventing the wheel
  - Yet, each is a liability
  - Avoid dependency creep
  - Drop that feature

Slide 18

Slide 18 text

Design: innovation and product design
Use technical sophistication to find the simplest answer to the original problem.

Slide 19

Slide 19 text

Design: ergonomics
- Separating control of temperature and flow rate
- Rethinking the users’ needs

Slide 20

Slide 20 text

Design: Some API design principles for Python tools
- Consistency, consistency, consistency
- Functions are easier to understand than classes
- A library should hinge on a small number of concepts
- Common data containers make the ecosystem stronger
- Each function should have one and only one purpose
- Code for interfaces, but don’t overdo duck typing
- Properties are for impedance matching
- Shallow is better than deep
- Error messages matter
- Be Pythonic

Slide 21

Slide 21 text

Design: Some API design principles for Python tools
Consistency, consistency, consistency
- np.save(file, obj) versus pickle.dump(obj, file)
- fmin(..., maxiter=10) versus lsq_linear(..., max_iter=10)
- Creates cognitive overload
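The argument-order mismatch on this slide is easy to reproduce. The sketch below round-trips the same array through `np.save` and `pickle.dump`. The wrapper `save_array` is a hypothetical helper, not part of NumPy; it is only there to illustrate what aligning a new function with a convention users already know could look like.

```python
import io
import pickle

import numpy as np

arr = np.arange(3)

# NumPy: the file comes first, the object second.
np_buffer = io.BytesIO()
np.save(np_buffer, arr)

# pickle: the object comes first, the file second.
pickle_buffer = io.BytesIO()
pickle.dump(arr, pickle_buffer)


def save_array(obj, file):
    """Hypothetical wrapper that follows pickle's (obj, file) order."""
    np.save(file, obj)


wrapped_buffer = io.BytesIO()
save_array(arr, wrapped_buffer)

# Rewind each buffer and read the data back.
for buffer in (np_buffer, pickle_buffer, wrapped_buffer):
    buffer.seek(0)
from_numpy = np.load(np_buffer)
from_pickle = pickle.load(pickle_buffer)
from_wrapper = np.load(wrapped_buffer)
```

All three round-trips return the same data; the only thing the user has to remember is which library wants the file first.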

Slide 22

Slide 22 text

Design: Some API design principles for Python tools
Functions are easier to understand than classes
- Objects have hidden states
- Objects have no universal interface, entry point, output
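A minimal sketch of the hidden-state point, under the assumption of a toy rescaling task. Both `Scaler` and `rescale` are hypothetical names invented for this illustration; they do not come from any library.

```python
class Scaler:
    """Hypothetical stateful object: transform() only works after fit()."""

    def __init__(self):
        self.max_ = None  # hidden state, set as a side effect of fit()

    def fit(self, values):
        self.max_ = max(values)
        return self

    def transform(self, values):
        if self.max_ is None:
            raise RuntimeError("Scaler.transform called before Scaler.fit")
        return [v / self.max_ for v in values]


def rescale(values):
    """Hypothetical function: one entry point, one output, no state."""
    largest = max(values)
    return [v / largest for v in values]


data = [1.0, 2.0, 4.0]
from_object = Scaler().fit(data).transform(data)
from_function = rescale(data)
```

The two give the same numbers, but the object requires the reader to know which methods exist and in which order to call them, whereas the function is understood from its signature alone.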

Slide 23

Slide 23 text

Design: Some API design principles for Python tools
A library should hinge on a small number of concepts
- How much do usage patterns carry over across the library?

Slide 24

Slide 24 text

Design: Some API design principles for Python tools
Common data containers make the ecosystem stronger
- Facilitates working with multiple libraries together
- Easier to get up to speed with a given library

Slide 25

Slide 25 text

Design: Some API design principles for Python tools
Each function should have one and only one purpose
- Change of behavior depending on input type
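One way to read "change of behavior depending on input type" is as a warning sign that a function has more than one purpose. The sketch below contrasts a hypothetical `total` function that silently parses strings with two single-purpose functions; all three names are made up for this example.

```python
def total(values):
    """Hypothetical function whose behavior depends on the input type."""
    if isinstance(values, str):
        # A string is silently treated as comma-separated numbers.
        values = [float(v) for v in values.split(",")]
    return sum(values)


def parse_numbers(text):
    """Single purpose: turn a comma-separated string into floats."""
    return [float(v) for v in text.split(",")]


def total_of(values):
    """Single purpose: add up numbers."""
    return sum(values)


overloaded = total("1,2,3")
explicit = total_of(parse_numbers("1,2,3"))
```

Both calls give the same sum, but in the second form the parsing step is visible to the reader and can be tested, documented, and replaced on its own.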

Slide 26

Slide 26 text

Design: Some API design principles for Python tools
Code for interfaces, but don’t overdo duck typing
- Interfaces define objects
- Incompatible behaviors lead to bugs (e.g. np.matrix)
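The `np.matrix` case can be sketched in a few lines. `np.matrix` exposes the same attributes as an array but redefines `*` as the matrix product, so code written for arrays runs without error and returns different numbers. The helper `elementwise_square` is a hypothetical function written for this illustration.

```python
import numpy as np


def elementwise_square(x):
    """Hypothetical helper written with ndarray semantics in mind."""
    return x * x


a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)  # same data, same attributes, different meaning of `*`

squared_array = elementwise_square(a)   # elementwise product
squared_matrix = elementwise_square(m)  # matrix product
```

Here `squared_array` is `[[1, 4], [9, 16]]` while `squared_matrix` is `[[7, 10], [15, 22]]`: the object quacks like an array, yet the function no longer does what its name says.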

Slide 27

Slide 27 text

Design: Some API design principles for Python tools
Properties are for impedance matching
- Properties obfuscate the data model of the object
- Properties can create hidden compute costs
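A small sketch of a hidden compute cost, assuming a toy container. The `Dataset` class and its `n_sorts` counter are hypothetical and exist only to make the cost observable.

```python
class Dataset:
    """Hypothetical container used to illustrate hidden compute costs."""

    def __init__(self, values):
        self.values = list(values)
        self.n_sorts = 0  # counts how often the expensive step runs

    @property
    def middle(self):
        # Looks like plain attribute access, but sorts on every read.
        self.n_sorts += 1
        ordered = sorted(self.values)
        return ordered[len(ordered) // 2]

    def compute_middle(self):
        # An explicit method makes the cost visible at the call site.
        self.n_sorts += 1
        ordered = sorted(self.values)
        return ordered[len(ordered) // 2]


data = Dataset([3, 1, 2])
readings = [data.middle for _ in range(3)]  # three hidden sorts
```

After the loop, `data.n_sorts` is 3 even though the calling code never appears to call anything. The method form does the same work, but the parentheses tell the reader that work is happening.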

Slide 28

Slide 28 text

Design: Some API design principles for Python tools
Shallow is better than deep
- Objects are understood by their surface
- Composition creates cognitive overload

Slide 29

Slide 29 text

Design: Some API design principles for Python tools
Error messages matter
- Explain the problem
- Print the offending value
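The two recommendations on this slide can be sketched as a validation function. `set_learning_rate` and its accepted range are assumptions chosen for the example, not an API from any particular library.

```python
def set_learning_rate(value):
    """Hypothetical validator: explain the problem, show the offending value."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(
            f"learning_rate must be a number, got {value!r} "
            f"of type {type(value).__name__}"
        )
    if not 0 < value <= 1:
        raise ValueError(
            f"learning_rate must be in the interval (0, 1], got {value!r}"
        )
    return float(value)


try:
    set_learning_rate(1.5)
except ValueError as exc:
    message = str(exc)
```

The message names the parameter, states the constraint, and echoes the value that violated it, so a busy user can fix the call without opening the source.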

Slide 30

Slide 30 text

Design: Some API design principles for Python tools
Be Pythonic
- Avoid syntax hacks

Slide 32

Slide 32 text

Iterations: Short cycles, limited ambitions
- Keep coming back to your users
- Release early, release often

Slide 33

Slide 33 text

Iterations: Limited resources
- Limited resources are good
  - Need success in the short term, not the long term
- The startup culture: fail fast
  - Quickly identify non-viable projects
- The simplest solution that works is the best

Slide 36

Slide 36 text

Quality: a priority
- Quality will give you users
  - Bugs give you a bad rap
- Quality will give you developers
  - Contribute to learn and improve
- Quality will make your developers happy
  - People need to be proud of their work
- Do less, do better
  - Goes against the grant-system incentive

Slide 38

Slide 38 text

Quality: everywhere
- Great documentation
  - Simplify, but don’t dumb down
  - Focus on what the user is trying to solve
- Great APIs
  - Example-based development
  - If something is hard to explain, rethink the concepts
  - Limit the number of different concepts and objects
  - Consistency, consistency, consistency
- Good numerics
  - Write tests based on mathematical properties
  - When a user finds an instability, write a new test
- Quality enables reuse
  - Beyond mere reproducibility
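As an illustration of the "Good numerics" items, here is a minimal sketch using a log-sum-exp function. The choice of function is an assumption made for the example (the slides do not name one); the first test checks a mathematical identity, the second is the kind of regression test one might add after a user reports an overflow.

```python
import math


def logsumexp(values):
    """Stable log(sum(exp(v))): subtract the max before exponentiating."""
    highest = max(values)
    return highest + math.log(sum(math.exp(v - highest) for v in values))


def test_shift_property():
    # Mathematical property: logsumexp(x + c) == logsumexp(x) + c.
    values = [0.1, 2.0, -3.0]
    shifted = [v + 5.0 for v in values]
    assert math.isclose(logsumexp(shifted), logsumexp(values) + 5.0)


def test_large_inputs_do_not_overflow():
    # Regression test: a naive math.log(sum(math.exp(v) for v in values))
    # raises OverflowError for inputs this large.
    result = logsumexp([1000.0, 1000.0])
    assert math.isclose(result, 1000.0 + math.log(2.0))


test_shift_property()
test_large_inputs_do_not_overflow()
```

Property-based checks like the shift identity do not need a reference implementation, which makes them cheap to write and hard to get wrong.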

Slide 39

Slide 39 text

Quality: Process
- Code review
  - Everything is discussed ⇒ fosters knowledge transfer
- Conventions
  - Coding conventions, naming conventions, documentation conventions
- Testing
  - Testing all functionality
  - Tests run online
  - Add a test each time there is a bug

Slide 40

Slide 40 text

Users first: Enabling
- Usability is key
- Users are not stupid, they are busy
- Design requires knowing what users are targeted ⇒ Project vision

Slide 41

Slide 41 text

Users first: reducing their cognitive load
- Design: Jonathan Ive, an industrial designer, was #4 at Apple
- Facilitate: Avoid jargon, facilitate adoption
- Know when to stop
  - Avoid magic
  - Don’t solve hard problems

Slide 42

Slide 42 text

@GaelVaroquaux
Open source software: how to live long and go far
1 A project vision
2 Building with a community
3 Limiting complexity
4 Technical debt
5 Limiting drag
6 Design
7 Iterations
8 Quality
9 Users first

Slide 43

Slide 43 text

@GaelVaroquaux
Open source software: how to live long and go far
- Quality first
- Complexity is a killer
- The 80 / 20 rule
- Everything is a matter of tradeoff