
Can Software Save Science?


Keynote presentation at AMS 2020 Symposium on Advances in Modeling and Analysis Using Python.

https://ams.confex.com/ams/2020Annual/meetingapp.cgi/Paper/370521


Ryan Abernathey

January 15, 2020

Transcript

  1. Can Software Save Science?

    Ryan Abernathey, AMS 2020
  2. About Me (v1)

    • Physical Oceanographer, PhD from MIT • Associate Professor at Columbia / LDEO • Output: papers • 45 peer-reviewed publications, h-index of 16 • NASA New Investigator Award, Sloan Fellowship in Ocean Sciences, NSF CAREER Award …
  3.

  4. About Me (v2)

    • Output: software • Core developer of Xarray • Core developer of Zarr • Co-founder of Pangeo • Creator of lots of other software tools (pyqg, xgcm, xrft, xhistogram, …)
  5. About Me (v1)

    • Anxious • Competitive • Severe imposter syndrome • Highly stressed → Not fun to be around • (despite all the privilege in the world) [emoji: “confounded face”]
  6. About Me (v2)

    • Relaxed • Having fun • Great relationships with collaborators • Highly psyched → Fun to be around [emoji: “grinning face with smiling eyes”]
  7. Anecdote 1

    Setting: Columbia classroom, Fall 2018, Research Computing for Earth Science, Matplotlib lecture
  12. Science Has Big Problems

    http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html
  13. Science Has Big Problems

    https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970
  14. Science Has Big Problems
  15. Are We in a Rat Race?
  16. Are We in a Rat Race?

    A rat race is an endless, self-defeating, or pointless pursuit. The phrase equates humans to rats attempting to earn a reward such as cheese, in vain… The term is commonly associated with an exhausting, repetitive lifestyle that leaves no time for relaxation or enjoyment.
  17. Are We in a Rat Race?

    [diagram labels: Papers, Scientists]
  18. Paper ≠ Knowledge

    What is the best way to communicate scientific knowledge in 2020?
  19. What Is a Scientific Discovery?

    Diagram: Data (new or old; measurements from satellites, weather balloons, etc.) + Inferences (simulations, data analyses, equations, visualizations, interpretations; 99.9% of the time, this part is encoded in computer code).
    Examples: Paleoclimate discovery: new proxy record. Climate dynamics: CMIP6 / ERA5 data analysis. Theoretical GFD: equations and simulations.
  20. What Is a Scientific Discovery?

    Same diagram as slide 19, with the added question: There are discoveries with no data. Are there discoveries with no code?
  21. Proposal

    Scientists should devote more effort to software and less to papers. Typical Ph.D. output: Current: Paper 1, Paper 2, Paper 3. Proposed: Data Product, Software, Paper 1.
  22. Specifics

    • Encourage all scientists to become responsible participants in the open source ecosystem.
      ‣ Contribute to “upstream” tools on which their research relies (not just code: bug reports, docs, examples, tutorials, etc.)
    • Empower all scientists as software / data engineers
      ‣ Learn best practices for sharing, testing, documenting, and collaborating on research software
      ‣ Use these practices to transform how we share knowledge
  23. Ecosystem

    Ecosystem diagram (labels: aosp, SciPy, Project, Project, Project, Project, Project). Credit: Stephan Hoyer, Jake VanderPlas (SciPy 2015)
  24. Rethinking Scientific Output

    Diagram comparing the Status Quo with the Future (labels: Paper, data, code, existing data, new data, Code, Docs).
  25. Continuous Integration of Science?

    Example project: “Assessing Global Sea Level Rise”. Diagram (cloud computing environment): Analysis Code, Data, Docs → Execute code; generate outputs → “Sea level has risen 10 cm since 1992…”
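The “continuous integration of science” idea on this slide can be sketched in Python: instead of typing a headline number into a paper by hand, a script recomputes it from the data each time the analysis runs and fails loudly if the data no longer support the claim. This is a minimal sketch under stated assumptions, not the project shown on the slide; the function names (`linear_trend`, `total_rise_cm`, `check_claim`), the least-squares trend method, the 0.5 cm tolerance, and the synthetic data are all hypothetical placeholders.

```python
"""Hypothetical sketch: re-derive a headline claim from data on every run."""


def linear_trend(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def total_rise_cm(years, levels_cm):
    """Rise implied by the fitted trend over the full record, in cm."""
    return linear_trend(years, levels_cm) * (years[-1] - years[0])


def check_claim(years, levels_cm, claimed_cm, tol_cm=0.5):
    """True if the recomputed rise matches the claimed rise within tol_cm."""
    return abs(total_rise_cm(years, levels_cm) - claimed_cm) <= tol_cm


if __name__ == "__main__":
    # Synthetic placeholder data (NOT real observations): a steady 0.4 cm/yr rise.
    years = list(range(1992, 2018))
    levels = [0.4 * (y - 1992) for y in years]
    rise = total_rise_cm(years, levels)
    print(f"Sea level has risen {rise:.1f} cm since {years[0]}")
    # In a CI service this assertion would fail the build if data or code changed the result.
    assert check_claim(years, levels, claimed_cm=10.0)
```

Run on a schedule or on every commit, a check like this keeps the sentence in the docs tied to the code and data that produced it, which is the point the slide is making.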
  26. Open Source Culture

    vs.
  27. Open Source Culture

    vs. “SciPy is the best conference ever!” “I look forward to SciPy all year long!”
  28. Open Source Culture

    • Open source culture is organized around community projects (e.g. scikit-learn, MetPy, etc.) • Individual achievement is not emphasized; collaboration is essential • Progress is atomic, rapid, and public, with instant feedback and gratification (“agile”) • End result is a product, something others can really use and benefit from
    Pangeo Community Meeting (2019)
  31. Open Source Culture

    Same open source bullets as slide 28, now contrasted with: • Mainstream scientific culture is organized around the individual scientist • Success is attributed to individual brilliance / hard work • Submit / revise / publish cycle takes years! (“waterfall”) • End result is a paper, impact measured in citations
    Pangeo Community Meeting (2019)
  32. Career Paths

    Software engineering skills are in high demand! Contributing to open source builds a portfolio that can open many doors.
  33. Summary

    Diagram (Proposed): Data Product, Software, Paper. By placing software at the center of our scientific practice, we can… • Create more robust, reproducible, extensible scientific knowledge • Have more fun and feel more connected to a community • Train a better workforce and create new job opportunities