Can Software Save Science?

Keynote presentation at AMS 2020 Symposium on Advances in Modeling and Analysis Using Python.

https://ams.confex.com/ams/2020Annual/meetingapp.cgi/Paper/370521

Ryan Abernathey

January 15, 2020

Transcript

  1. Can Software Save Science?

    Ryan Abernathey, AMS 2020
  2. About Me (v1)

    • Physical oceanographer, PhD from MIT • Associate Professor at Columbia / LDEO • Output: papers • 45 peer-reviewed publications, h-index of 16 • NASA New Investigator Award, Sloan Fellowship in Ocean Sciences, NSF CAREER Award …
  3. !3

  4. About Me (v2)

    • Output: software • Core developer of Xarray • Core developer of Zarr • Co-founder of Pangeo • Creator of lots of other software tools (pyqg, xgcm, xrft, xhistogram, …)
  5. About Me (v1)

    • Anxious • Competitive • Severe imposter syndrome • Highly stressed → Not fun to be around • (despite all the privilege in the world) (emoji: confounded face)
  6. About Me (v2)

    • Relaxed • Having fun • Great relationships with collaborators • Highly psyched → Fun to be around (emoji: grinning face with smiling eyes)
  7. Is it just me?
  8. Anecdote 1

    Setting: Columbia classroom, Fall 2018, Research Computing for Earth Science, Matplotlib lecture
  13. Anecdote 2

  14. Science has Big Problems

    http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html
  15. Science has Big Problems

    https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  16. Science has Big Problems

  18. Are we in a rat race?

    A rat race is an endless, self-defeating, or pointless pursuit. The phrase equates humans to rats attempting to earn a reward such as cheese, in vain… The term is commonly associated with an exhausting, repetitive lifestyle that leaves no time for relaxation or enjoyment.
  19. Are we in a rat race?

    [Figure labels: Papers, Scientists]
  20. Paper ≠ Knowledge

    What is the best way to communicate scientific knowledge in 2020?
  22. What is a Scientific Discovery?

    • New Data: measurements (satellite, weather balloon, etc.) • Old Data • Inferences: simulations, data analyses, equations, visualizations, interpretations. 99.9% of the time, this part is encoded in computer code. • Paleoclimate discovery: new proxy record • Climate dynamics: CMIP6 / ERA5 data analysis • Theoretical GFD: equations and simulations

    There are discoveries with no data. Are there discoveries with no code?
  23. Proposal

    Scientists should devote more effort to software and less to papers. Typical Ph.D. output: • Current: Paper 1, Paper 2, Paper 3 • Proposed: Data Product, Software, Paper 1
  24. Specifics

    • Encourage all scientists to become responsible participants in the open source ecosystem. ‣ Contribute to “upstream” tools on which their research relies (not just code; bug reports, docs, examples, tutorials, etc.) • Empower all scientists as software / data engineers ‣ Learn best practices for sharing, testing, documenting, and collaborating on research software ‣ Use these practices to transform how we share knowledge
  25. Ecosystem

    [Figure labels: aosp, SciPy, Project (×5)] Credit: Stephan Hoyer, Jake Vanderplas (SciPy 2015)
  26. Rethinking Scientific Output

    [Figure labels: Status Quo: Paper, data, data, code • Future: existing data, new data, Code, Docs]
  27. Continuous Integration of Science?

    Cloud computing environment • Example project: “Assessing Global Sea Level Rise” (Analysis Code, Data, Data, Docs) • Execute code; generate outputs • “Sea level has risen 10 cm since 1992…”
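The idea on this slide can be sketched in a few lines of Python: the analysis code runs against the data, regenerates the headline sentence, and an automated check fails if the result drifts out of the expected range. This is only an illustration and not the speaker's implementation. The function names `sea_level_rise_cm` and `render_claim`, the synthetic record, and the 0.3 cm/yr rate used to generate it are all assumptions made for the example; no real sea level data is used.

```python
import numpy as np


def sea_level_rise_cm(years, heights_cm):
    """Total rise implied by a least-squares linear trend over the record."""
    slope, _intercept = np.polyfit(years, heights_cm, 1)
    return slope * (years[-1] - years[0])


def render_claim(rise_cm, start_year):
    """Regenerate the headline sentence from the computed result."""
    return f"Sea level has risen {rise_cm:.0f} cm since {start_year}"


# Synthetic stand-in for a real data product (assumption: arbitrary
# 0.3 cm/yr trend plus noise; this is NOT observational data).
rng = np.random.default_rng(0)
years = np.arange(1992, 2020)
heights_cm = 0.3 * (years - 1992) + rng.normal(0.0, 0.3, size=years.size)

rise = sea_level_rise_cm(years, heights_cm)
claim = render_claim(rise, int(years[0]))

# The "continuous integration" step: rerun on every change to code or
# data, and fail loudly if the result leaves a plausible range.
assert 5 < rise < 15, "computed rise is outside the expected range"
print(claim)
```

In a real project the synthetic arrays would be replaced by a versioned data product, and the assertion would run automatically whenever the code, data, or docs change.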
  29. Open Source Culture

    vs. “SciPy is the best conference ever!” “I look forward to SciPy all year long!”
  33. Open Source Culture

    • Open source culture is organized around community projects (e.g. scikit-learn, MetPy) • Individual achievement is not emphasized; collaboration is essential • Progress is atomic, rapid, and public, with instant feedback and gratification (“agile”) • End result is a product, something others can really use and benefit from [Photo: Pangeo Community Meeting (2019)]

    • Mainstream scientific culture is organized around the individual scientist • Success is attributed to individual brilliance / hard work • Submit / revise / publish cycle takes years! (“waterfall”) • End result is a paper, impact measured in citations
  34. Career Paths

    Software engineering skills are in high demand! Contributing to open source builds a portfolio that can open many doors.
  35. Summary

    Proposed: Data Product, Software, Paper. By placing software at the center of our scientific practice, we can… • Create more robust, reproducible, extensible scientific knowledge • Have more fun and feel more connected to a community • Train a better workforce and create new job opportunities