Saving the world one line at a time?

Gaël Varoquaux

Science & technology change society
Newtonian mechanics
  - Yesterday: planetary movements
  - Today: bridges
Quantum mechanics
  - Nuclear energy
Cognitive psychology
  - Better education

My craft: coding
Coding is what I did best, for better or for worse
(also, I preferred talking to computers rather than to people)
Coding is amazing because it enables a nobody to create

Open source
Open source opens knowledge
It will democratize science
  - Facilitate teaching
  - Reach developing countries
Software and scarcity economics: sharing it is free and may even reduce costs

Outline
1 Growing a Python stack
2 Societal impact?

1 Growing a Python stack

Mid-2000s: Matlab was cool
  - Massive amount of code and teaching materials in Matlab
2020s: Python is everywhere
  - scipy cited by 20124
  - scikit-learn cited by 78496
Free and open
Made by a community
How did that happen?

Stories of personal involvement
Online, conferences, sprints
I got involved in scipy:
  - Friendliness of developers: Fernando Perez (IPython), Prabhu Ramachandran (Mayavi)...
  - More positive feedback than in my work in physics
  - The impression of being useful, making a positive change
  - Growing in skills and confidence

Software adventures & lessons learned

Mayavi: 3D visualization in Python
My first major package

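Mayavi: 3D visualization in Python
Powerful visualization
  - The power of VTK, accessible
UI components
Simplified scripting
  - Reflexivity UI ↔ object API ⇒ from GUI to code
  - Numpy as a data structure

    import numpy as np
    # The slide used "from enthought.mayavi import mlab" (Mayavi 3 and earlier).
    from mayavi import mlab
    # Three open grids, one per axis, that broadcast to a 100 x 100 x 100 volume.
    x, y, z = np.ogrid[-10:10:100j, -10:10:100j, -10:10:100j]
    ctr = mlab.contour3d(.5*x**2 + y**2 + z**2)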

Mayavi: 3D visualization in Python
Limiting factors
  - VTK ⇒ complexity
  - Codebase complex and object-oriented: VTK, factories, adapters, composition, listeners, singletons
  - Users of GUIs do not turn into developers
  - Complex API below the surface (transmitted by reflexivity)
What made us: simple API
What killed us: complex internals

Mayavi: only 2 core devs
  scikit-learn  ∼ 300 emails/month
  nipy          ∼ 45 emails/month
  joblib        ∼ 45 emails/month
  mayavi        ∼ 30 emails/month
“Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.”
Code maintenance is too expensive to do alone

scikit-learn: machine learning in Python
#1 machine-learning package; millions of users

Machine learning
Learning rules from data
  - Fitting/separating clouds of points
  - Computational statistics on steroids
Useful everywhere
  - Science (data analysis)
  - Artificial intelligence
  - Health, retail...
Applied maths ⇒ often obfuscated

scikit-learn: the vision
Machine learning for everyone
  - Complex algorithms
  - Statistical expertise
First roadblock to having an impact is adoption
  - Facilitate
  - Lower costs
  - Develop new applications
Leveraging the Python ecosystem

scikit-learn: encapsulating
The open box model

    from sklearn import svm
    # c is the regularization strength chosen by the user.
    model = svm.SVC(C=c)
    model.fit(X_train, y_train)
    y_test = model.predict(X_test)

No need to understand internals
  - Democratizes
  - Models are interchangeable
  - Parameters expose internals
Separate operations
  - All configuration at init
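
The fit/predict contract on this slide can be illustrated without scikit-learn itself. What follows is a minimal pure-Python sketch, not scikit-learn code: the class names `MajorityClassifier` and `NearestMeanClassifier`, the `shrink` parameter, and the helper `accuracy` are hypothetical, invented for illustration, and the data is one-dimensional to keep the code short. It shows two models configured at init that can be swapped freely because they share the same `fit` and `predict` methods.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model: always predicts the most frequent training label."""

    def __init__(self):
        # No hyperparameters; fitted state is created in fit().
        self.majority_ = None

    def fit(self, X, y):
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class NearestMeanClassifier:
    """Toy model: predicts the label whose class mean is closest."""

    def __init__(self, shrink=0.0):
        # All configuration happens at init (shrink is a hypothetical knob).
        self.shrink = shrink
        self.means_ = None

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for xi, yi in zip(X, y):
            sums[yi] = sums.get(yi, 0.0) + xi
        # Per-class mean, optionally shrunk toward zero.
        self.means_ = {
            label: (1.0 - self.shrink) * sums[label] / counts[label]
            for label in sums
        }
        return self

    def predict(self, X):
        return [
            min(self.means_, key=lambda label: abs(xi - self.means_[label]))
            for xi in X
        ]


def accuracy(model, X_train, y_train, X_test, y_test):
    """Works with any object that honours the fit/predict contract."""
    y_pred = model.fit(X_train, y_train).predict(X_test)
    return sum(p == t for p, t in zip(y_pred, y_test)) / len(y_test)


X_train = [0.0, 1.0, 2.0, 9.0, 10.0]
y_train = ["low", "low", "low", "high", "high"]
X_test = [0.5, 9.5]
y_test = ["low", "high"]

# The same evaluation code runs unchanged on both models.
scores = {
    type(model).__name__: accuracy(model, X_train, y_train, X_test, y_test)
    for model in (MajorityClassifier(), NearestMeanClassifier())
}
```

Because `accuracy` only relies on `fit` and `predict`, a user-written class with the same two methods could be passed in without touching the evaluation code.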

scikit-learn: framework, not
Targeting all use cases is impossible
  ⇒ Push user-implemented objects
  - Leverage Python’s expressivity
API (contract) must stay simple
  - But... corner cases, advanced usage, e.g. handling sample metadata
  - Designing an opt-in rich API

    model.set_fit_request(sample_weight=True)
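
The opt-in idea can be sketched in plain Python. This is an illustrative toy, not scikit-learn's metadata routing implementation: `WeightedMean`, `fit_with_metadata`, and the `_fit_requests` attribute are hypothetical names. Only the method name `set_fit_request` and the `sample_weight` keyword come from the slide. The sketch assumes that unrequested metadata is silently dropped, which is a simplification; a real library may prefer to raise an error instead.

```python
class WeightedMean:
    """Toy regressor: predicts the (optionally weighted) mean of y."""

    def __init__(self):
        self._fit_requests = {}  # metadata names the user opted into
        self.mean_ = None

    def set_fit_request(self, **requests):
        # Record which metadata this model wants to receive in fit().
        self._fit_requests.update(requests)
        return self

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        total = sum(sample_weight)
        self.mean_ = sum(w * yi for w, yi in zip(sample_weight, y)) / total
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def fit_with_metadata(model, X, y, **metadata):
    """Hypothetical meta-routine: forward only the requested metadata."""
    requests = getattr(model, "_fit_requests", {})
    wanted = {k: v for k, v in metadata.items() if requests.get(k)}
    return model.fit(X, y, **wanted)


X = [[0], [1], [2]]
y = [0.0, 0.0, 3.0]
weights = [1.0, 1.0, 2.0]

# Default: the weights are passed in but not requested, so they are dropped.
plain = fit_with_metadata(WeightedMean(), X, y, sample_weight=weights)

# Opt-in: the model asks for sample_weight, so the weights are forwarded.
opted_in = fit_with_metadata(
    WeightedMean().set_fit_request(sample_weight=True),
    X, y, sample_weight=weights,
)
```

The basic `fit(X, y)` call keeps working for users who never touch metadata, while advanced users pay the extra complexity only when they ask for it.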

Making easy is hard
Technical choices
  - Models with fewer knobs
  - Stable algorithms
  - Good defaults
Usability
  - Helping users write readable code
Documentation
  - Joint design of code & docs
  - A MOOC, targeting everyone

Community-driven development
A huge community
  2,000 contributors, 20 active core
  - academia, start-up
  - world-wide, with a Paris hub
Broadens perspective
  Different focus ⇒ better tool
Organic growth
Not all can be organic
  - Bigger projects
  - Complementary volunteer / full-time

Organizing people
Communication is crucial
  - No-one has the big picture
  - Communication for alignment
Recognition
  - Broadcasting people’s names
  - Clear-cut teams
  - Norms on promotion
Decision making
  - Dialogue above hierarchy
  - Accepting others > being “right”
Taught me democracy: discussing and convincing is what builds the group

Ongoing efforts
Performance
  - Faster implementations, e.g. 10x in nearest neighbors
  - GPU integration (challenge: user-level ecosystem)
Usability
  - Dataframe support
  - Model visualization
  - Model validation

The battle for the mind
The lust for complexity
  - People dream of big cars
Big data fantasy
  - Reality: [figure: axis of size bins, from “less than 1 MB” to “over 100 PB”]
Promises of deep learning
  - Often, tree-based models perform better [Grinsztajn..., NeurIPS 2022]
A marketing shortcoming

Addressing the remaining gap: data preparation

    import skrub
    tab_vec = skrub.TableVectorizer()
    X = tab_vec.fit_transform(df)

  Sex  Date hired  Position Title
  M    17/04/1998  Police Officer
  F    05/08/2012  Social Worker IV
  M    28/12/2017  Police Officer III
  F    10/09/2020  Police Aide

Across tables: database, analyse
Skrub: coming soon

Example-driven development
Sphinx-gallery
  - Builds docs from examples
  - Used across the ecosystem
  - Fosters simple APIs
Compiling to the browser
  - JupyterLite = WebAssembly

Some lessons learned
It’s about making accessible
Communities can move mountains (and require people skills)
Unchecked complexity is a killer
Algorithmics & software engineering (tests, CI, version control)

2 Societal impact

Open-source software makes a better world?

#1 impact of machine learning
“The best minds of my generation are thinking about how to make people click ads”
(Jeff Hammerbacher, early employee at Facebook)

Are we enabling good?
Selling influence to the highest bidder
Mind control by the rich over the poor
Cambridge Analytica used scikit-learn
  - It was on their job postings

Choose what we facilitate
I chose health
AI for brain images?
  - Brain imaging = rich data
  - Small episode in a life
Health > Medicine
Clinical records
  - Study real-life health outcomes
Beyond prediction: prescription
  - Better policy > automated doctor
Focus guides innovation: reveals challenges

Damages of AI
Entrenches inequalities
  - Captures historically underserved populations
  - Black ⇒ poor ⇒ crime
Shifts power balances
  - Big tech everywhere

It’s about automation
A slow and deep transformation of our society
Shifts power balances: jobs are lost
Can free time if we adapt the social structure

Science at large
Scientific knowledge brings progress?
  - Climate modeling failing to inform climate change policy
Social efforts are needed everywhere

Pitfalls of solutionism
For every problem, seeking a technological solution
  - Shiny magic tool
  - In health: the automated doctor
  - In global warming: electric cars
Avoids addressing people
  - Behaviors are part of the equation
  - But tools are easier to control
Losing humanism

Appropriate technology
Transparency, outreach
  - To foster trust
  - Public understanding
Autonomy, appropriation
  - People should own their decisions (vs. big tech)
  - People understand what they can manipulate
Favor lightweight tech
  - More cellphones than cars in Africa

By creating the technology, we govern who has access to it
A complexity-utility tradeoff
Tech’s social norm
  - Value system of big players defines the cool
We must act on social norms of success

Saving the world one line at a time?
The scientific Python ecosystem
  - Success via democratizing and community
Appropriate technology
  - Technology shapes society
  - As tech actors, we can & must choose how we innovate
  - Lightweight enables the many, not the mighty
Social solutions
  - Inspiring others goes further than doing alone
  - Social action is needed to turn technological progress into societal progress
  - Industrial revolutions: laws were needed to turn productivity gains into school time
@GaelVaroquaux
