Slide 1

Slide 1 text

Saving the world one line at a time? Gaël Varoquaux

Slide 2

Slide 2 text

Science & technology change society. Newtonian Mechanics: yesterday, planetary movements; today, bridges. Quantum Mechanics: nuclear energy. Cognitive Psychology: better education. G Varoquaux 1

Slide 3

Slide 3 text

My craft: coding. Coding is what I did best, for better or for worse (also, I preferred talking to computers rather than to people). Coding is amazing because it enables a nobody to create.

Slide 4

Slide 4 text

Open source. Open source opens knowledge: it will democratize science, facilitate teaching, reach developing countries. Software scarcity economics: sharing it is free, may even reduce costs.

Slide 5

Slide 5 text


Slide 6

Slide 6 text

Outline: 1. Growing a Python stack 2. Societal impact?

Slide 7

Slide 7 text

1 Growing a Python stack

Slide 8

Slide 8 text

Mid-2000s: MATLAB was cool. Massive amount of code and teaching materials in MATLAB. 2020s: Python is everywhere. scipy cited by 20124, scikit-learn cited by 78496. Free and open. Made by a community. How did that happen?

Slide 9

Slide 9 text

Stories of personal involvement: online, conferences, sprints. I got involved in scipy. Friendliness of developers: Fernando Perez (IPython), Prabhu Ramachandran (Mayavi)... More positive feedback than in my work in physics. The impression of being useful, making a positive change. Growing in skills and confidence.

Slide 10

Slide 10 text

Software adventures & lessons learned

Slide 11

Slide 11 text

Mayavi: 3D visualization in Python. My first major package.

Slide 12

Slide 12 text

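Mayavi: 3D visualization in Python. Powerful visualization: the power of VTK, accessible UI components. Simplified scripting. Reflexivity: UI ↔ object API ⇒ from GUI to code. Numpy as a data structure:
import numpy as np
# Import path as shown on the slide; in Mayavi 4 and later: from mayavi import mlab
from enthought.mayavi import mlab
# Three open grids, one per axis, that broadcast to a 100 x 100 x 100 volume
x, y, z = np.ogrid[-10:10:100j, -10:10:100j, -10:10:100j]
ctr = mlab.contour3d(.5*x**2 + y**2 + z**2)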

Slide 13

Slide 13 text

Mayavi: 3D visualization in Python. Limiting factors: VTK ⇒ complexity. Codebase complex and object-oriented: VTK, factories, adapters, composition, listeners, singletons. Users of GUIs do not turn into developers. Complex API below the surface (transmitted by reflexivity). What made us: simple API. What killed us: complex internals.

Slide 14

Slide 14 text

Mayavi: only 2 core devs. scikit-learn ∼ 300 emails/month; nipy ∼ 45 emails/month; joblib ∼ 45 emails/month; mayavi ∼ 30 emails/month. “Hey Gael, I take it you’re too busy. That’s okay, I spent a day trying to install XXX and I think I’ll succeed myself. Next time though please don’t ignore my emails, I really don’t like it. You can say, ‘sorry, I have no time to help you.’ Just don’t ignore.” Code maintenance too expensive to be alone.

Slide 15

Slide 15 text

scikit-learn: machine learning in Python. #1 machine-learning package; millions of users.

Slide 16

Slide 16 text

Machine learning: learning rules from data. Fitting/separating clouds of points. Computational statistics on steroids. Useful everywhere: science (data analysis), artificial intelligence, health, retail... Applied maths ⇒ often obfuscated.

Slide 17

Slide 17 text

scikit-learn: the vision. Machine learning for everyone. Complex algorithms. Statistical expertise. First roadblock to having an impact is adoption. Facilitate. Lower costs. Develop new applications. Leveraging the Python ecosystem.

Slide 18

Slide 18 text

scikit-learn: encapsulating. The open box model:
from sklearn import svm
model = svm.SVC(C=c)  # all configuration happens at init
model.fit(X_train, y_train)
y_test = model.predict(X_test)
No need to understand internals - Democratizes - Models are interchangeable - Parameters expose internals. Separate operations - All configuration at init.
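
The points on this slide can be illustrated with a minimal sketch, assuming scikit-learn is installed. The synthetic dataset, the choice of `SVC` and `RandomForestClassifier`, and the hyperparameter values are illustrative assumptions, not taken from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data standing in for a real problem (illustrative assumption).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# All configuration happens at init; fit and predict are separate operations.
models = [SVC(C=1.0), RandomForestClassifier(n_estimators=50, random_state=0)]

scores = {}
for model in models:
    # The same two calls work for every estimator: models are interchangeable.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores[type(model).__name__] = float((y_pred == y_test).mean())

# Parameters expose internals: they can be read back from any estimator.
svc_params = models[0].get_params()
```

Swapping one model for another only changes the constructor line, which is what makes the estimators interchangeable in practice.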

Slide 19

Slide 19 text

scikit-learn: framework, not Targeting all use cases impossible ⇒ push user-implemented objects. Leverage Python’s expressivity. API (contract) must stay simple. But... corner cases, advanced usage, e.g. handling sample meta-data. Designing opt-in rich API: model.set_fit_request(sample_weight=True)
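
As a rough sketch of what a user-implemented object can look like, the example below defines a toy classifier that follows the fit/predict contract and is then used by a stock scikit-learn tool. `MajorityClassifier` and the toy data are hypothetical names and values introduced for illustration; they are not part of scikit-learn or of the talk.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.model_selection import cross_val_score


class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical user-implemented model: always predicts the most frequent class."""

    def fit(self, X, y):
        # Learned attributes end with an underscore, by scikit-learn convention.
        classes, counts = np.unique(y, return_counts=True)
        self.classes_ = classes
        self.majority_ = classes[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)


# Toy data (illustrative assumption): 7 samples of class 0, 3 of class 1.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])

# Because the object honors the contract, generic tools accept it unchanged.
predictions = MajorityClassifier().fit(X, y).predict(X)
cv_scores = cross_val_score(MajorityClassifier(), X, y, cv=2)
```

Keeping the contract this small is what lets user code plug into cross-validation or pipelines without the library knowing anything about it.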

Slide 20

Slide 20 text

Making easy is hard. Technical choices: models with fewer knobs, stable algorithms, good defaults. Usability: helping users write readable code. Documentation: joint design of code & docs; a MOOC, targeting everyone.

Slide 21

Slide 21 text

Community-driven development. A huge community: 2,000 contributors, 20 active core - academia, start-up - world-wide, with a Paris hub. Broadens perspective: different focus ⇒ better tool. Organic growth. Not all can be organic: bigger projects; complementary volunteer / full-time.

Slide 22

Slide 22 text

Organizing people. Communication is crucial: no-one has the big picture; communication for alignment. Recognition: broadcasting people’s names; clear-cut teams; norms on promotion. Decision making: dialogue above hierarchy; accepting others > being “right”. Taught me democracy: discussion and convincing are what build the group.

Slide 23

Slide 23 text

Ongoing efforts. Performance: faster implementations, e.g. 10x in nearest neighbors; GPU integration (challenge: user-level ecosystem). Usability: dataframe support, model visualization, model validation.

Slide 24

Slide 24 text

The battle for the mind. The lust for complexity: people dream of big cars. Big data fantasy. Reality: [figure: dataset sizes, in bins from less than 1 MB to over 100 PB]. Promises of deep learning: often, tree-based models perform better [Grinsztajn..., NeurIPS 2022]. A marketing shortcoming.

Slide 25

Slide 25 text

Addressing the remaining gap: data preparation.
import skrub
tab_vec = skrub.TableVectorizer()
X = tab_vec.fit_transform(df)
Across tables. Database. Analyse. Skrub: coming soon.
Sex   Date hired   Position Title
M     17/04/1998   Police Officer
F     05/08/2012   Social Worker IV
M     28/12/2017   Police Officer III
F     10/09/2020   Police Aide
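
To make the gap concrete, here is a hand-written sketch of the kind of preparation a table vectorizer aims to automate, applied to the rows shown on the slide. It uses only pandas and scikit-learn and is not skrub's implementation; the derived column names (`hired_year`, `hired_month`) and the choice of one-hot encoding are assumptions made for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# The rows shown on the slide.
df = pd.DataFrame({
    "Sex": ["M", "F", "M", "F"],
    "Date hired": ["17/04/1998", "05/08/2012", "28/12/2017", "10/09/2020"],
    "Position Title": ["Police Officer", "Social Worker IV",
                       "Police Officer III", "Police Aide"],
})

# Parse the day/month/year strings and extract numeric features by hand.
hired = pd.to_datetime(df["Date hired"], format="%d/%m/%Y")
prepared = pd.DataFrame({
    "Sex": df["Sex"],
    "Position Title": df["Position Title"],
    "hired_year": hired.dt.year,
    "hired_month": hired.dt.month,
})

# One-hot encode the string columns and pass the numeric ones through.
encoder = ColumnTransformer(
    [("categories", OneHotEncoder(handle_unknown="ignore"),
      ["Sex", "Position Title"])],
    remainder="passthrough",
)
X = encoder.fit_transform(prepared)
```

Every step here is a decision the user has to make per column, which is the manual work the slide's two-line snippet is meant to replace.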

Slide 26

Slide 26 text

Example-driven development. Sphinx-gallery: builds docs from examples; used across the ecosystem; fosters simple APIs. Compiling to the browser: JupyterLite = WebAssembly.
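
For readers unfamiliar with Sphinx-gallery, a minimal configuration sketch is shown below. It goes in a Sphinx project's `conf.py`; the two directory paths are hypothetical and would need to match the project's own layout.

```python
# conf.py (Sphinx configuration file)

# Enable the gallery generator as a Sphinx extension.
extensions = [
    "sphinx_gallery.gen_gallery",
]

sphinx_gallery_conf = {
    # Hypothetical path to the folder holding the example scripts.
    "examples_dirs": "../examples",
    # Hypothetical folder where the generated gallery pages are written.
    "gallery_dirs": "auto_examples",
}
```

With this in place, each example script in the examples folder is executed at build time and rendered as a documentation page.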

Slide 27

Slide 27 text

Some lessons learned. It’s about making accessible. Communities can move mountains (and require people skills). Unchecked complexity is a killer. Algorithmics & software engineering (tests, CI, version control).

Slide 28

Slide 28 text

2 Societal impact

Slide 29

Slide 29 text

Open-source software makes a better world?

Slide 30

Slide 30 text

#1 impact of machine learning: “The best minds of my generation are thinking about how to make people click ads” –Jeff Hammerbacher, early employee at Facebook

Slide 31

Slide 31 text

Are we enabling good? Selling influence to the highest bidder. Mind control of the rich over the poor. Cambridge Analytica used scikit-learn: it was on their job postings.

Slide 33

Slide 33 text

Choose what we facilitate. I chose health. AI for brain images? Brain imaging = rich data, small episode in a life. Health > Medicine. Clinical records: study real-life health outcomes. Beyond prediction: prescription. Better policy > automated doctor. Focus guides innovation: reveals challenges.

Slide 34

Slide 34 text

Harms of AI. Entrenches inequalities: captures historically underserved populations; Black ⇒ poor ⇒ crime. Shifts power balances: big tech everywhere.

Slide 35

Slide 35 text

It’s about automation. A slow and deep transformation of our society. Shifts power balances: jobs are lost. Can free time if we adapt the social structure.

Slide 36

Slide 36 text

Science at large. Scientific knowledge brings progress? Climate modeling failing to inform climate change policy. Social efforts are needed everywhere.

Slide 37

Slide 37 text

Pitfalls of solutionism. For every problem, seeking a technological solution: shiny magic tool. In health: the automated doctor. In global warming: the electric cars. Avoids addressing people: behaviors are part of the equation, but tools are easier to control. Losing humanism.

Slide 38

Slide 38 text

Appropriate technology. Transparency, outreach: to foster trust; public understanding. Autonomy, appropriation: people should own their decisions, not big tech; people understand what they can manipulate. Favor lightweight tech: more cellphones than cars in Africa.

Slide 39

Slide 39 text

By creating the technology, we govern who has access to it. A complexity-utility tradeoff. Tech’s social norm: value system of big players defines the cool. We must act on social norms of success.

Slide 40

Slide 40 text

Saving the world one line at a time? The scientific Python ecosystem: success via democratizing and community. Appropriate technology: technology shapes society; as tech actors, we can & must choose how we innovate; lightweight enables the many, not the mighty. Social solutions: inspiring others goes further than doing alone; social action is needed to turn technological progress into societal progress. Industrial revolutions: laws needed to turn productivity gains into school time. @GaelVaroquaux