Saving the world one line at a time?

Gael Varoquaux
September 05, 2023

This talk reflects on the successes and failures of open computational science in making a better society. Two decades ago, I went all in on open source and science, because I knew that these were vectors of progress. Looking back, I helped create the Python scientific ecosystem and a major machine-learning toolkit, scikit-learn. What drove their success as software projects? As societal projects? What aspects of open source software make for better science? And better societies?

Transcript

  1. Science & technology change society. Newtonian mechanics: yesterday, planetary movements; today, bridges. Quantum mechanics: nuclear energy. Cognitive psychology: better education.
  2. My craft: coding. Coding is what I did best, for better or for worse (also, I preferred talking to computers over talking to people). Coding is amazing because it enables a nobody to create.
  3. Open source. Open source opens knowledge. It will democratize science: facilitate teaching; reach developing countries. Software scarcity economics: sharing it is free, may even reduce costs.
  4. Mid-2000s: Matlab was cool. Massive amount of code and teaching materials in Matlab. 2020s: Python is everywhere. scipy cited by 20124; scikit-learn cited by 78496. Free and open. Made by a community. How did that happen?
  5. Stories of personal involvement. Online, conferences, sprints. I got involved in scipy: friendliness of developers, such as Fernando Perez (IPython), Prabhu Ramachandran (Mayavi)...; more positive feedback than in my work in physics. The impression of being useful, making a positive change. Growing in skills and confidence.
  6. Mayavi: 3D visualization in Python. Powerful visualization: the power of VTK, accessible; UI components; simplified scripting. Reflexivity: UI ↔ object API ⇒ from GUI to code. Numpy as a data structure:

    x, y, z = np.ogrid[-10:10:100j, -10:10:100j, -10:10:100j]
    from enthought.mayavi import mlab
    ctr = mlab.contour3d(.5*x**2 + y**2 + z**2)
  7. Mayavi: 3D visualization in Python. Limiting factors: VTK ⇒ complexity. Codebase complex and object-oriented (VTK): factories, adapters, composition, listeners, singletons. Users of GUIs do not turn into developers. Complex API below the surface (transmitted by reflexivity). What made us: simple API. What killed us: complex internals.
  8. Mayavi: only 2 core devs. scikit-learn ∼ 300 emails/month; nipy ∼ 45 emails/month; joblib ∼ 45 emails/month; mayavi ∼ 30 emails/month. “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.” Code maintenance is too expensive to do alone.
  9. Machine learning. Learning rules from data. Fitting/separating clouds of points. Computational statistics on steroids. Useful everywhere: science (data analysis), artificial intelligence, health, retail... Applied maths ⇒ often obfuscated.
  10. scikit-learn: the vision. Machine learning for everyone. Complex algorithms, statistical expertise. First roadblock to having an impact is adoption. Facilitate; lower costs; develop new applications. Leveraging the Python ecosystem.
  11. scikit-learn: encapsulating. The open box model:

    model = svm.SVC(C=c)
    model.fit(X_train, y_train)
    y_test = model.predict(X_test)

    No need to understand internals: democratizes; models are interchangeable; parameters expose internals. Separate operations: all configuration at init.
  12. scikit-learn: framework, not. Targeting all use cases is impossible ⇒ push user-implemented objects; leverage Python’s expressivity. API (contract) must stay simple. But... corner cases, advanced usage, e.g. handling sample metadata. Designing an opt-in rich API: model.set_fit_request(sample_weight=True)
  13. Making easy is hard. Technical choices: models with fewer knobs; stable algorithms; good defaults. Usability: helping users write readable code. Documentation: joint design of code & docs; a MOOC, targeting everyone.
  14. Community-driven development. A huge community: 2,000 contributors, 20 active core; academia, start-up; world-wide, with a Paris hub. Broadens perspective: different focus ⇒ better tool. Organic growth. Not all can be organic: bigger projects; complementary volunteer / full-time.
  15. Organizing people. Communication is crucial: no-one has the big picture; communication for alignment. Recognition: broadcasting people’s names; clear-cut teams; norms on promotion. Decision making: dialogue above hierarchy; accepting others > being “right”. Taught me democracy: discussion and convincing are what build the group.
  16. Ongoing efforts. Performance: faster implementations, e.g. 10x in nearest neighbors; GPU integration (challenge: user-level ecosystem). Usability: dataframe support; model visualization; model validation.
  17. The battle for the mind. The lust for complexity: people dream of big cars. Big data fantasy. Reality: [chart of dataset sizes, with bins ranging from less than 1 MB to over 100 PB]. Promises of deep learning: often, tree-based models perform better [Grinsztajn..., NeurIPS 2022]. A marketing shortcoming.
  18. Addressing the remaining gap: data preparation.

    tab_vec = skrub.TableVectorizer()
    X = tab_vec.fit_transform(df)

    Across tables. Database. Analyse. Skrub: coming soon.

    Sex  Date hired  Position Title
    M    17/04/1998  Police Officer
    F    05/08/2012  Social Worker IV
    M    28/12/2017  Police Officer III
    F    10/09/2020  Police Aide
  19. Example-driven development. Sphinx-gallery: builds docs from examples; used across the ecosystem; fosters simple APIs. Compiling to the browser: JupyterLite = WebAssembly.
  20. Some lessons learned. It’s about making accessible. Communities can move mountains (and require people skills). Unchecked complexity is a killer. Algorithmics & software engineering (tests, CI, version control).
  21. Open-source software. Open-source software makes a better world. Makes a better world?
  22. #1 impact of machine learning. “The best minds of my generation are thinking about how to make people click ads” –Jeff Hammerbacher, early employee at Facebook
  23. Are we enabling good? Selling influence to the highest bidder. Mind control of the rich over the poor. Cambridge Analytica used scikit-learn: it was on their job postings.
  24. Choose what we facilitate. I chose health. AI for brain images? Brain imaging = rich data; small episode in a life. Health > Medicine.
  25. Choose what we facilitate. I chose health. AI for brain images? Brain imaging = rich data; small episode in a life. Health > Medicine. Clinical records: study real-life health outcomes. Beyond prediction: prescription. Better policy > automated doctor. Focus guides innovation: reveals challenges.
  26. Damages of AI. Entrenches inequalities: captures historically underserved populations; Black ⇒ poor ⇒ crime. Shifts power balances: big tech everywhere.
  27. It’s about automation. A slow and deep transformation of our society. Shifts power balances: jobs are lost. Can free time if we adapt the social structure.
  28. Science at large. Scientific knowledge brings progress? Climate modeling is failing to inform climate change policy. Social efforts are needed everywhere.
  29. Pitfalls of solutionism. For every problem, seeking a technological solution: the shiny magic tool. In health: the automated doctor. In global warming: electric cars. Avoids addressing people: behaviors are part of the equation, but tools are easier to control. Losing humanism.
  30. Appropriate technology. Transparency, outreach: to foster trust; public understanding. Autonomy, appropriation: people should own their decisions; big tech; people understand what they can manipulate. Favor lightweight tech: more cellphones than cars in Africa.
  31. By creating the technology, we govern who has access to it. A complexity/utility tradeoff. Tech’s social norm: the value system of big players defines the cool. We must act on social norms of success.
  32. Saving the world one line at a time? The scientific Python ecosystem: success via democratizing and community. Appropriate technology: technology shapes society; as tech actors, we can & must choose how we innovate; lightweight enables the many, not the mighty. Social solutions: inspiring others goes further than doing alone; social action is needed to turn technological progress into societal progress. Industrial revolutions: laws were needed to turn productivity gains into school time. @GaelVaroquaux
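
Slide 11 describes the convention that all configuration happens at init, that fitting and predicting are separate operations, and that models are interchangeable. A minimal sketch of that contract in plain Python might look as follows. The classes `MeanPredictor` and `NearestNeighbor` and the helper `evaluate` are hypothetical toys written for illustration; they are not scikit-learn classes and say nothing about how scikit-learn is implemented.

```python
from statistics import mean


class MeanPredictor:
    """Hypothetical toy model: always predicts the mean training target."""

    def __init__(self):
        # Nothing to configure here; any settings would be taken at init.
        self.mean_ = None

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class NearestNeighbor:
    """Hypothetical toy 1-D nearest-neighbor regressor."""

    def __init__(self, k=1):
        self.k = k  # all configuration at init

    def fit(self, X, y):
        # Fitting only stores the training data.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            # Sort training points by distance to x and keep the k closest.
            pairs = sorted(zip(self.X_, self.y_), key=lambda p: abs(p[0] - x))
            preds.append(mean(target for _, target in pairs[: self.k]))
        return preds


def evaluate(model, X_train, y_train, X_test, y_test):
    """Mean absolute error for any object exposing fit and predict."""
    y_pred = model.fit(X_train, y_train).predict(X_test)
    return mean(abs(p - t) for p, t in zip(y_pred, y_test))


X_train, y_train = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]
X_test, y_test = [0.9, 2.1], [1.8, 4.2]

# The same evaluation code runs unchanged on both models.
for model in (MeanPredictor(), NearestNeighbor(k=1)):
    print(type(model).__name__, evaluate(model, X_train, y_train, X_test, y_test))
```

Because `evaluate` relies only on `fit` and `predict`, swapping one model for another needs no change to the calling code, which is the sense in which the slide calls models interchangeable.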