Slide 1

Slide 1 text

The Unexpected Effectiveness of Python in Science Jake VanderPlas @jakevdp PyCon 2017

Slide 2

Slide 2 text

PyCon’s Mosaic

Slide 3

Slide 3 text

$ whoami jakevdp

Slide 4

Slide 4 text

$ whoami jakevdp

Slide 5

Slide 5 text

Code: Books: $ whoami jakevdp Blog: http://jakevdp.github.io

Slide 6

Slide 6 text

$ whoami jakevdp

Slide 7

Slide 7 text

$ whoami jakevdp

Slide 8

Slide 8 text

$ whoami jakevdp (cartoon: Charles Barsotti, New Yorker)

Slide 9

Slide 9 text

Astronomy Then . . . Edwin Hubble at the 48" Schmidt Telescope, Palomar Observatory, 1949. (credit: PNAS)

Slide 10

Slide 10 text

Astronomy Now . . . Hubble Space Telescope (Source: http://spacetelescope.org/) Sloan Digital Sky Survey (Source: http://sdss.org/)

Slide 11

Slide 11 text

Hubble’s “Ultra Deep Field” Source: http://spacetelescope.org

Slide 12

Slide 12 text

Hubble’s “Ultra Deep Field” Source: http://spacetelescope.org

Slide 13

Slide 13 text

SDSS Galaxy Catalog Source: http://sdss.org

Slide 14

Slide 14 text

SDSS Galaxy Catalog Source: http://sdss.org

Slide 15

Slide 15 text

Astronomy in the 21st Century . . . Kepler (2009) JWST (2018) LSST (2020)

Slide 16

Slide 16 text

TRAPPIST-1 Exoplanetary System (Artist’s Impression: Wikipedia)

Slide 17

Slide 17 text

TRAPPIST-1 Exoplanetary System: Kepler (K2) Observations (K2 Data: Ethan Kruse; Kepler Telescope: NASA)

Slide 18

Slide 18 text

No content

Slide 19

Slide 19 text

James Webb Space Telescope (JWST) Source: NASA

Slide 20

Slide 20 text

James Webb Space Telescope (JWST) Source: NASA/JWST

Slide 21

Slide 21 text

James Webb Space Telescope (JWST) Source: NASA/JWST

Slide 22

Slide 22 text

No content

Slide 23

Slide 23 text

Large Synoptic Survey Telescope (credit: LSST Corp)

Slide 24

Slide 24 text

8.4-meter Primary Mirror

Slide 25

Slide 25 text

3 Gigapixel Camera

Slide 26

Slide 26 text

3 Gigapixel Camera

Slide 27

Slide 27 text

3 Gigapixel Camera

Slide 28

Slide 28 text

3 Gigapixel Camera = ~1500 HD TVs
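The HD-TV comparison can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "HD" means a 1920 x 1080 frame and taking the slide's 3-gigapixel figure at face value; the variable names and the HD resolution are assumptions for illustration, not part of the talk.

```python
# Rough check of the "3 gigapixels = ~1500 HD TVs" comparison.
CAMERA_PIXELS = 3e9               # figure quoted on the slide
HD_WIDTH, HD_HEIGHT = 1920, 1080  # assumed definition of "HD"

hd_pixels = HD_WIDTH * HD_HEIGHT
hd_tv_equivalents = CAMERA_PIXELS / hd_pixels

print(f"One HD frame: {hd_pixels:,} pixels")
print(f"Camera is roughly {hd_tv_equivalents:.0f} HD frames")
```

Under these assumptions the ratio lands a little under 1500, consistent with the rounded number on the slide.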

Slide 29

Slide 29 text

- Survey mode: 2 exposures every ~30 seconds
- Images the full southern sky every three nights for a decade
- 15-30 TB/night!
- Final 10-year catalog: 100s of Petabytes
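To get a feel for these numbers, the raw nightly volume can be scaled up to the full survey. This is a back-of-the-envelope sketch, assuming observations on every night of the year (an upper bound) and decimal units (1 PB = 1000 TB); the function and constant names are hypothetical. It covers raw nightly data only and does not attempt to derive the larger final-catalog figure quoted above.

```python
# Back-of-the-envelope survey data volume from the quoted nightly rates.
NIGHTS_PER_YEAR = 365   # assumption: observing every night (upper bound)
SURVEY_YEARS = 10       # "for a decade"
TB_PER_PB = 1000        # assumption: decimal units


def raw_volume_pb(tb_per_night):
    """Total raw data volume in petabytes over the whole survey."""
    return tb_per_night * NIGHTS_PER_YEAR * SURVEY_YEARS / TB_PER_PB


low_pb = raw_volume_pb(15)
high_pb = raw_volume_pb(30)

# 2 exposures every ~30 seconds, expressed per hour of observing
exposures_per_hour = 2 * 3600 / 30

print(f"Raw data over the survey: {low_pb:.1f} to {high_pb:.1f} PB")
print(f"Exposures per observing hour: {exposures_per_hour:.0f}")
```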

Slide 30

Slide 30 text

What will we do with all this data? (Left as a 616-page exercise for the reader) https://www.lsst.org/scientists/scibook

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Mentions of Software in Astronomy Publications: Compiled from NASA ADS (code). Thanks to Juan Nunez-Iglesias, Thomas P. Robitaille, and Chris Beaumont.

Slide 33

Slide 33 text

The Unexpected Effectiveness of Python in Science

Slide 34

Slide 34 text

But Why Python?
Python is a “teaching language” . . . created to “bridge the gap between the shell and C” “never intended. . . to be the primary language for programmers.”
- Guido van Rossum, The Making of Python

Slide 35

Slide 35 text

“I thought we'd write small Python programs, maybe 10 lines, maybe 50, maybe 500 lines - that would be a big one”
- Guido van Rossum, The Making of Python

Slide 36

Slide 36 text

Why is Python such an effective tool in science?

Slide 37

Slide 37 text

Why is Python such an effective tool in science? 1. Interoperability with Other Languages

Slide 38

Slide 38 text

“If I have seen further, it is by standing on the shoulders of giants.” - Isaac Newton

Slide 39

Slide 39 text

“If I have seen further, it is by importing from the code of giants.” - Definitely Not Isaac Newton

Slide 40

Slide 40 text

Science Before Python . . .
“Scientists... work with a wide variety of systems ranging from simulation codes, data analysis packages, databases, visualization tools, and home-grown software - each of which presents the user with a different set of interfaces and file formats. As a result, a scientist may spend a considerable amount of time simply trying to get all of these components to work together in some manner...”
- David Beazley, Pythonista Extraordinaire, Scientific Computing with Python (ACM vol. 216, 2000)

Slide 41

Slide 41 text

Science Before Python . . .
“I had a hodge-podge of work processes. I would have Perl scripts that called C++ numerical routines that would dump data files, and I would load them up into MatLab to plot them. After a while I got tired of the MatLab dependency. . . so I started loading them up in GnuPlot.”
- John Hunter, creator of Matplotlib, SciPy 2012 Keynote

Slide 42

Slide 42 text

Science Before Python . . .
“My advisor had a heavily customized awk/sed/bash workflow to manage job submissions and postprocessing of C codes for supercomputing runs… So I used her scripts to run my jobs, and on top of that had added my own layer of Perl, plus a hefty amount of Gnuplot, IDL and Mathematica.”
- Fernando Perez, creator of IPython (via email)

Slide 43

Slide 43 text

Python is Glue.

Slide 44

Slide 44 text

Python is Glue. Python glues together this hodge-podge of scientific tools. High-level syntax wraps low-level C/Fortran libraries, which is (mostly) where the computation happens.
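One way to see the "glue" idea in miniature is to call a compiled C routine directly from Python. The sketch below uses the standard-library `ctypes` module to call `cos` from the system C math library. It assumes a POSIX-like system where that library can be located; it is only an illustration of the wrapping pattern, not how NumPy or SciPy are actually built.

```python
import ctypes
import ctypes.util

# Locate the C math library. On POSIX systems, if the lookup returns None,
# CDLL(None) falls back to symbols already loaded into the running process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

# The call is written in Python, but the computation runs in compiled C.
result = libm.cos(0.0)
print(result)
```

Declaring `argtypes` and `restype` matters here: without them `ctypes` would treat the argument and return value as C integers and the result would be meaningless.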

Slide 45

Slide 45 text

Why is Python such an effective tool in science?
1. Interoperability with Other Languages
2. “Batteries Included” + Third-Party Modules

Slide 46

Slide 46 text

Python has built-in libraries for nearly everything . . . . . . and there are third-party libraries for everything else.
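As a small illustration of "batteries included", the sketch below parses tabular data, computes summary statistics, and serializes the result using only standard-library modules. The measurements are hypothetical values inlined for the example, and the column names are assumptions.

```python
import csv
import io
import json
import statistics

# Hypothetical measurements, inlined so the example is self-contained.
raw_csv = """star,magnitude
A,12.1
B,11.7
C,12.5
"""

# csv parses the table, statistics summarizes it, json serializes the result.
rows = list(csv.DictReader(io.StringIO(raw_csv)))
magnitudes = [float(row["magnitude"]) for row in rows]

summary = {
    "count": len(magnitudes),
    "mean": round(statistics.mean(magnitudes), 2),
    "median": statistics.median(magnitudes),
}
print(json.dumps(summary))
```

No installation step is needed for any of this, which is the point of the slide; third-party packages take over once the analysis outgrows these modules.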

Slide 47

Slide 47 text

The Genesis of Scientific Python
“Prior to Python, I used Perl (for a year) and then Matlab and shell scripts & Fortran & C/C++ libraries. When I discovered Python, I really liked the language... But, it was very nascent and lacked a lot of libraries. I felt like I could add value to the world by connecting low-level libraries to high-level usage in Python.”
- Travis Oliphant, creator of NumPy & SciPy (via email)

Slide 48

Slide 48 text

Python’s Scientific Stack

Slide 49

Slide 49 text

Python’s Scientific Stack

Slide 50

Slide 50 text

Bokeh Python’s Scientific Stack

Slide 51

Slide 51 text

Bokeh Python’s Scientific Stack

Slide 52

Slide 52 text

Python’s Scientific Ecosystem (and many, many more) Bokeh

Slide 53

Slide 53 text

Why is Python such an effective tool in science?
1. Interoperability with Other Languages
2. “Batteries Included” + Third-Party Modules
3. Simplicity & Dynamic Nature

Slide 54

Slide 54 text

https://xkcd.com/353/

Slide 55

Slide 55 text

Python Enters Science: Python in Astronomy 2015
“Python is a language that is very powerful for developers, but is also accessible to Astronomers. Getting those two classes of people using the same tools, I think, provides a huge benefit that’s not always noticed or mentioned.”
- Perry Greenfield, Space Telescope Science Institute, PyAstro 2015

Slide 56

Slide 56 text

Often-overlooked fact . . . For day-to-day scientific data exploration, speed of development is primary, and speed of execution is often secondary.

Slide 57

Slide 57 text

Why don’t you use C instead of Python? It’s so much faster!

Slide 58

Slide 58 text

Why don’t you use C instead of Python? It’s so much faster!
Why don’t you commute by airplane instead of by car? It’s so much faster!

Slide 59

Slide 59 text

Scientific Coding is Nonlinear and Exploratory
Ada Marie did what scientists do:
She asked a small question, and then she asked two.
And each of those led her to three questions more,
And some of those questions resulted in four.
From Ada Twist, Scientist by Andrea Beaty & David Roberts

Slide 60

Slide 60 text

Jupyter notebooks embody this kind of quick, nonlinear exploration:

Slide 61

Slide 61 text

Why is Python such an effective tool in science?
1. Interoperability with Other Languages
2. “Batteries Included” + Third-Party Modules
3. Simplicity & Dynamic Nature
4. Open ethos well-fit to science

Slide 62

Slide 62 text

No content

Slide 63

Slide 63 text

No content

Slide 64

Slide 64 text

No content

Slide 65

Slide 65 text

No content

Slide 66

Slide 66 text

No content

Slide 67

Slide 67 text

No content

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

“An article about a computational result is advertising, not scholarship. The actual scholarship is the full software environment, code and data, that produced the result.”
- Buckheit and Donoho (1995)

Slide 70

Slide 70 text

LIGO Gravitational Wave Event (GW150914) Source: LIGO

Slide 71

Slide 71 text

LIGO Gravitational Wave Event (GW150914) Source: LIGO

Slide 72

Slide 72 text

LIGO Gravitational Wave Event (GW150914) Source: LIGO

Slide 73

Slide 73 text

My Projects: Same Open Philosophy

Slide 74

Slide 74 text

My Projects: Same Open Philosophy Entire content available on GitHub as Jupyter Notebooks

Slide 75

Slide 75 text

Python World Influencing Science . . . Scientists are increasingly hosting research code on GitHub and similar services to aid reproducibility.

Slide 76

Slide 76 text

Python World Influencing Science . . . Python’s software practices increasingly adopted by academia

Traditional Astronomy Software    | Python & Open Source
Possessive/non-sharing            | Cooperative/sharing
Fragmented & Overlapping efforts  | Build on common projects
Top-down planning                 | Bottom-up/Loose organization
Committee-oriented design         | Design by “doers”
Endless analysis & argument       | Action-oriented & experimentation
Unwilling to discard old tech     | Good at replacing old tech
No leader to resolve conflicts    | BDFL resolves conflicts

Adapted From Perry Greenfield’s PyData Keynote

Slide 77

Slide 77 text

Why is Python such an effective tool in science?
1. Interoperability with Other Languages
2. “Batteries Included” + Third-Party Modules
3. Simplicity & Dynamic Nature
4. Open ethos well-fit to science

Slide 78

Slide 78 text

PyCon’s Mosaic

Slide 79

Slide 79 text

Thank You!
Email: [email protected]
Twitter: @jakevdp
GitHub: jakevdp
Web: http://vanderplas.com/
Blog: http://jakevdp.github.io/