Slide 1

Slide 1 text

Can Software Save Science?
Ryan Abernathey
AMS 2020

Slide 2

Slide 2 text

About Me (v1)
• Physical Oceanographer, PhD from MIT
• Associate Professor at Columbia / LDEO
• Output: papers
• 45 peer-reviewed publications, h-index of 16
• NASA New Investigator Award, Sloan Fellowship in Ocean Sciences, NSF Career Award …

Slide 3

Slide 3 text


Slide 4

Slide 4 text

About Me (v2)
• Output: software
• Core developer of Xarray
• Core developer of Zarr
• Co-founder of Pangeo
• Creator of lots of other software tools (pyqg, xgcm, xrft, xhistogram, …)

Slide 5

Slide 5 text

About Me (v1)
• Anxious
• Competitive
• Severe imposter syndrome
• Highly stressed → Not fun to be around
• (despite all the privilege in the world)
[emoji: confounded face]

Slide 6

Slide 6 text

About Me (v2)
• Relaxed
• Having fun
• Great relationships with collaborators
• Highly psyched → Fun to be around
[emoji: grinning face with smiling eyes]

Slide 7

Slide 7 text

Is it just me?

Slide 8

Slide 8 text

Anecdote 1
Setting: Columbia classroom, Fall 2018, Research Computing for Earth Science, Matplotlib lecture

Slide 13

Slide 13 text

Anecdote 2

Slide 14

Slide 14 text

Science has Big Problems
http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html

Slide 15

Slide 15 text

Science has Big Problems
https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970

Slide 16

Slide 16 text

Science has Big Problems

Slide 17

Slide 17 text

Are we in a rat race?

Slide 18

Slide 18 text

Are we in a rat race?
A rat race is an endless, self-defeating, or pointless pursuit. The phrase equates humans to rats attempting to earn a reward such as cheese, in vain… The term is commonly associated with an exhausting, repetitive lifestyle that leaves no time for relaxation or enjoyment.

Slide 19

Slide 19 text

Are we in a rat race?
[Figure labels: Papers, Scientists]

Slide 20

Slide 20 text

Paper ≠ Knowledge
What is the best way to communicate scientific knowledge in 2020?

Slide 21

Slide 21 text

What is a Scientific Discovery?
[Diagram: New Data + Inferences, with Old Data]
• New Data: measurements (satellite, weather balloon, etc.)
• Inferences: simulations, data analyses, equations, visualizations, interpretations. 99.9% of the time, this part is encoded in computer code.
• Old Data
Examples:
• Paleoclimate discovery: new proxy record
• Climate dynamics: CMIP6 / ERA5 data analysis
• Theoretical GFD: equations and simulations

Slide 22

Slide 22 text

What is a Scientific Discovery?
[Diagram: New Data + Inferences, with Old Data]
• New Data: measurements (satellite, weather balloon, etc.)
• Inferences: simulations, data analyses, equations, visualizations, interpretations. 99.9% of the time, this part is encoded in computer code.
• Old Data
Examples:
• Paleoclimate discovery: new proxy record
• Climate dynamics: CMIP6 / ERA5 data analysis
• Theoretical GFD: equations and simulations
There are discoveries with no data. Are there discoveries with no code?

Slide 23

Slide 23 text

Proposal
Scientists should devote more effort to software and less to papers.
Typical Ph.D. Output
• Current: Paper 1, Paper 2, Paper 3
• Proposed: Data Product, Software, Paper 1

Slide 24

Slide 24 text

Specifics
• Encourage all scientists to become responsible participants in the open source ecosystem.
  ‣ Contribute to “upstream” tools on which their research relies (not just code; bug reports, docs, examples, tutorials, etc.)
• Empower all scientists as software / data engineers
  ‣ Learn best practices for sharing, testing, documenting, and collaborating on research software
  ‣ Use these practices to transform how we share knowledge

Slide 25

Slide 25 text

Ecosystem
[Figure labels: aosp, SciPy, Project (×5). Credit: Stephan Hoyer, Jake Vanderplas (SciPy 2015)]

Slide 26

Slide 26 text

Rethinking Scientific Output
• Status Quo: Paper, data, data, code
• Future: existing data, new data, Code, Docs

Slide 27

Slide 27 text

Continuous Integration of Science?
Example project: “Assessing Global Sea Level Rise”
• Cloud computing environment
• Inputs: Analysis Code, Data, Data, Docs
• Execute code; generate outputs
• Output: “Sea level has risen 10 cm since 1992…”
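One way to picture the idea on this slide is a small script that an automated job could re-run whenever the code or data changes, regenerating the headline statement each time. The sketch below is only an illustration under stated assumptions: the function names (`linear_trend`, `summarize`, `make_synthetic_record`) are hypothetical, and the record is synthetic placeholder data, not real sea level measurements and not the analysis behind the figure quoted on the slide.

```python
# Hypothetical sketch of "continuous integration of science".
# All names and the data below are illustrative assumptions, not from the talk.


def linear_trend(years, values):
    """Return the least-squares slope of values against years (units per year)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var


def summarize(years, sea_level_cm):
    """Regenerate the headline statement from whatever data is currently available."""
    rise = linear_trend(years, sea_level_cm) * (years[-1] - years[0])
    return f"Sea level has risen {rise:.1f} cm since {years[0]}"


def make_synthetic_record(start=1992, end=2020, rate_cm_per_year=0.3):
    """Build SYNTHETIC placeholder data: a perfectly linear record, not a measurement."""
    years = list(range(start, end + 1))
    return years, [rate_cm_per_year * (y - start) for y in years]


if __name__ == "__main__":
    # In a real project this step would load the published dataset instead.
    years, levels = make_synthetic_record()
    print(summarize(years, levels))
```

In a setup like this, a continuous integration service would execute the script and its tests on every change, so the published statement always reflects the current code and data.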

Slide 28

Slide 28 text

Open Source Culture
vs.

Slide 29

Slide 29 text

Open Source Culture
vs.
“SciPy is the best conference ever!”
“I look forward to SciPy all year long!”

Slide 30

Slide 30 text

Open Source Culture
• Open source culture is organized around community projects (e.g. scikit-learn, metpy, etc.)
• Individual achievement is not emphasized; collaboration is essential
• Progress is atomic, rapid, and public, with instant feedback and gratification (“agile”)
• End result is a product, something others can really use and benefit from
[Photo: Pangeo Community Meeting (2019)]

Slide 33

Slide 33 text

Open Source Culture
• Open source culture is organized around community projects (e.g. scikit-learn, metpy, etc.)
• Individual achievement is not emphasized; collaboration is essential
• Progress is atomic, rapid, and public, with instant feedback and gratification (“agile”)
• End result is a product, something others can really use and benefit from
[Photo: Pangeo Community Meeting (2019)]

• Mainstream scientific culture is organized around the individual scientist
• Success is attributed to individual brilliance / hard work
• Submit / revise / publish cycle takes years! (“waterfall”)
• End result is a paper, impact measured in citations

Slide 34

Slide 34 text

Career Paths
Software engineering skills are in high demand! Contributing to open source builds a portfolio that can open many doors.

Slide 35

Slide 35 text

Summary
Proposed: Data Product, Software, Paper
By placing software at the center of our scientific practice, we can…
• Create more robust, reproducible, extensible scientific knowledge
• Have more fun and feel more connected to a community
• Train a better workforce and create new job opportunities