Slide 1

Slide 1 text

Losing Your Loops: Fast Numerical Computing with NumPy
Jake VanderPlas, PyCon 2015

Slide 2

Slide 2 text

Python is Fast . . . for Writing, Testing, and Developing Code

Slide 6

Slide 6 text

Python is Fast . . . because it is interpreted, dynamically typed, and high-level

Slide 7

Slide 7 text

Python is Slow . . . for Repeated Execution of Low-level Tasks

Slide 8

Slide 8 text

Python is Slow
A simple function implemented in Python . . .
(%timeit is a useful magic command available in IPython)
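
A minimal sketch of the kind of function such a slide might time. The name `func_python` and its body are illustrative assumptions, not the slide's original code, and the standard-library `timeit` module stands in for IPython's `%timeit` so the example runs anywhere.

```python
import timeit


def func_python(n):
    """Sum (i % 3 - 1) * i over range(n) using an explicit Python loop."""
    d = 0.0
    for i in range(n):
        d += (i % 3 - 1) * i
    return d


# Time 100 calls and report the average cost of one call.
elapsed = timeit.timeit(lambda: func_python(10000), number=100)
print(f"{elapsed / 100 * 1e6:.1f} microseconds per call")
```

The measured time depends on the machine and the Python version, so no particular figure is implied here.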

Slide 9

Slide 9 text

Python is Slow
The same function implemented in Fortran . . .

Slide 10

Slide 10 text

Python is Slow
Python is ~100x slower than Fortran for this simple task!

Slide 11

Slide 11 text

Why is Python Slow . . . for Repeated Execution of Low-level Tasks?
Python is a high-level, interpreted, and dynamically-typed language.
Each Python operation comes with a small type-checking overhead.
With many repeated small operations (e.g. in a loop), this overhead becomes significant!

Slide 12

Slide 12 text

The paradox . . .
What makes Python fast (for development) is what makes Python slow (for code execution)*
* Though JIT compilers like PyPy, Numba, etc. may change this soon . . .

Slide 13

Slide 13 text

NumPy is designed to help us get the best of both worlds . . .
- Fast development time of Python
- Fast execution time of C/Fortran
. . . by pushing repeated operations into a statically-typed compiled layer.
import numpy

Slide 14

Slide 14 text

Four Strategies For Speeding-up Code with NumPy
1. Use NumPy’s ufuncs
2. Use NumPy’s aggregations
3. Use NumPy’s broadcasting
4. Use NumPy’s slicing, masking, and fancy indexing
Overall goal: push repeated operations into compiled code and Get Rid of Slow Loops!

Slide 15

Slide 15 text

Strategy #1: Use NumPy’s ufuncs

Slide 16

Slide 16 text

Strategy #1: Use NumPy’s ufuncs
ufuncs are NumPy’s Universal Functions . . .
They operate element-wise on arrays.

Slide 17

Slide 17 text

Strategy #1: Use NumPy’s ufuncs
Element-wise operations . . .
. . . with Python lists:

Slide 18

Slide 18 text

Strategy #1: Use NumPy’s ufuncs
Element-wise operations . . .
. . . with Python lists:
. . . with NumPy arrays:
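
A small sketch of the contrast between the two approaches. The sample values and variable names are assumptions chosen for illustration.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# With a Python list, element-wise work needs an explicit loop
# (written here as a list comprehension).
list_result = [v + 5 for v in values]

# With a NumPy array, the same operation is a single ufunc call,
# spelled with the ordinary "+" operator.
array_result = np.array(values) + 5

print(list_result)   # [6, 7, 8, 9, 10]
print(array_result)  # [ 6  7  8  9 10]
```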

Slide 19

Slide 19 text

Strategy #1: Use NumPy’s ufuncs
Ufuncs are fast . . .

Slide 21

Slide 21 text

Strategy #1: Use NumPy’s ufuncs
Ufuncs are fast . . .
. . . 100x speedup with NumPy!
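
One way to check a speedup like this on your own machine, as a hedged sketch: the array size, the helper names `add_with_loop` and `add_with_ufunc`, and the repeat count are assumptions, and the ratio you observe will vary with hardware and array size.

```python
import timeit

import numpy as np

n = 100_000
values_list = list(range(n))
values_array = np.arange(n)


def add_with_loop():
    # Python-level loop: one interpreted addition per element.
    return [v + 5 for v in values_list]


def add_with_ufunc():
    # Single ufunc call: the loop runs inside NumPy's compiled code.
    return values_array + 5


loop_time = timeit.timeit(add_with_loop, number=20)
ufunc_time = timeit.timeit(add_with_ufunc, number=20)
print(f"loop: {loop_time:.4f} s, ufunc: {ufunc_time:.4f} s")
```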

Slide 22

Slide 22 text

Strategy #1: Use NumPy’s ufuncs
There are many ufuncs available:
- Arithmetic Operators: + - * / // % **
- Bitwise Operators: & | ~ ^ >> <<
- Comparison Operators: < > <= >= == !=
- Trig Family: np.sin, np.cos, np.tan ...
- Exponential Family: np.exp, np.log, np.log10 ...
- Special Functions: scipy.special.*
. . . and many, many more.
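
A brief sketch touching one ufunc from several of these families. The sample arrays are assumptions; `scipy.special` is left out so the example needs only NumPy.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

powers = x ** 2          # arithmetic: [1, 4, 9, 16]
low_bit = x & 1          # bitwise: [1, 0, 1, 0]
evens = (x % 2 == 0)     # comparison: [False, True, False, True]

angles = np.array([0.0, np.pi / 2])
sines = np.sin(angles)   # trig family: approximately [0, 1]

logs = np.log(np.exp(x))  # exponential family: recovers x up to rounding
```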

Slide 23

Slide 23 text

Strategy #2: Use NumPy’s aggregations

Slide 24

Slide 24 text

Strategy #2: Use NumPy’s aggregations
Aggregations are functions which summarize the values in an array (e.g. min, max, sum, mean, etc.)

Slide 25

Slide 25 text

Strategy #2: Use NumPy’s aggregations
NumPy aggregations are much faster than Python built-ins . . .

Slide 27

Slide 27 text

Strategy #2: Use NumPy’s aggregations
NumPy aggregations are much faster than Python built-ins . . .
~70x speedup with NumPy!
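
A hedged sketch comparing the built-in `min` with the array's own `min` method. The random data, its size, and the repeat count are assumptions, and the measured ratio will differ between machines.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
values = rng.random(100_000)

builtin_min = min(values)   # Python built-in: visits elements one at a time
numpy_min = values.min()    # NumPy aggregation: loop runs in compiled code

builtin_time = timeit.timeit(lambda: min(values), number=10)
numpy_time = timeit.timeit(lambda: values.min(), number=10)
print(f"built-in: {builtin_time:.4f} s, NumPy: {numpy_time:.4f} s")
```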

Slide 28

Slide 28 text

Strategy #2: Use NumPy’s aggregations
NumPy aggregations also work on multi-dimensional arrays . . .
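
A short sketch of aggregating over a whole array versus along one axis. The 2x3 sample array is an assumption.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

total = grid.sum()                  # all elements: 15
column_sums = grid.sum(axis=0)      # collapse rows: [3, 5, 7]
row_sums = grid.sum(axis=1)         # collapse columns: [3, 12]
```

The `axis` argument names the dimension that is collapsed, so `axis=0` leaves one value per column.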

Slide 30

Slide 30 text

Strategy #2: Use NumPy’s aggregations
Lots of aggregations available . . .
np.min() np.max() np.sum() np.prod() np.mean() np.std() np.var() np.any() np.all() np.median() np.percentile() np.argmin() np.argmax() . . . np.nanmin() np.nanmax() np.nansum() . . .
. . . and nearly all have the same call signature (np.percentile additionally takes the percentile q). Use them often!
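
A small sketch of the NaN-aware variants next to their plain counterparts, plus `np.argmax`. The sample data are assumptions.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

plain_sum = np.sum(data)      # nan: a single NaN poisons the result
safe_sum = np.nansum(data)    # 4.0: NaNs are skipped
safe_max = np.nanmax(data)    # 3.0

# argmax returns the position of the largest value, not the value itself.
largest_index = np.argmax(np.array([2, 9, 4]))   # 1
```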

Slide 31

Slide 31 text

Strategy #3: Use NumPy’s broadcasting

Slide 32

Slide 32 text

Strategy #3: Use NumPy’s broadcasting
Broadcasting is a set of rules by which ufuncs operate on arrays of different sizes and/or dimensions.

Slide 33

Slide 33 text

Strategy #3: Use NumPy’s broadcasting
Visualizing Broadcasting . . .
Image source: http://astroML.org/

Slide 34

Slide 34 text

Strategy #3: Use NumPy’s broadcasting
Broadcasting rules . . .
1. If array shapes differ, left-pad the smaller shape with 1s
2. If any dimension does not match, broadcast the dimension with size=1
3. If neither non-matching dimension is 1, raise an error.
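
The three rules can be sketched as a small pure-Python function over shape tuples. The helper `broadcast_shape` is hypothetical and written only for illustration; NumPy applies these rules internally, and `np.broadcast` is used at the end as a cross-check.

```python
import numpy as np


def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shapes (tuples of ints)."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: left-pad the shorter shape with 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            # Sizes match, or b is stretched along this dimension (rule 2).
            result.append(dim_a)
        elif dim_a == 1:
            # a is stretched along this dimension (rule 2).
            result.append(dim_b)
        else:
            # Rule 3: sizes differ and neither is 1.
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)


print(broadcast_shape((3, 1), (3,)))                      # (3, 3)
print(np.broadcast(np.empty((3, 1)), np.empty(3)).shape)  # (3, 3)
```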

Slide 35

Slide 35 text

Strategy #3: Use NumPy’s broadcasting
Example: shape=[3] combined with shape=[]
1. Left-pad: shape=[3], shape=[1]
2. Broadcast: shape=[3], shape=[3]
final shape = [3]

Slide 36

Slide 36 text

Strategy #3: Use NumPy’s broadcasting
Example: shape=[3,3] combined with shape=[3]
1. Left-pad: shape=[3,3], shape=[1,3]
2. Broadcast: shape=[3,3], shape=[3,3]
final shape = [3,3]

Slide 37

Slide 37 text

Strategy #3: Use NumPy’s broadcasting
Example: shape=[3,1] combined with shape=[3]
1. Left-pad: shape=[3,1], shape=[1,3]
2. Broadcast: shape=[3,3], shape=[3,3]
final shape = [3,3]
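
The three shape combinations worked through on these slides can be tried directly. This is a sketch with assumed sample values; only the shapes come from the slides.

```python
import numpy as np

# shape [3] with a scalar (shape []): result shape (3,)
a = np.arange(3) + 5
# a is [5, 6, 7]

# shape [3, 3] with shape [3]: result shape (3, 3)
b = np.ones((3, 3)) + np.arange(3)
# every row of b is [1., 2., 3.]

# shape [3, 1] with shape [3]: both operands are stretched, result shape (3, 3)
c = np.arange(3).reshape(3, 1) + np.arange(3)
# c is [[0, 1, 2], [1, 2, 3], [2, 3, 4]]

print(a.shape, b.shape, c.shape)
```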

Slide 38

Slide 38 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing

Slide 39

Slide 39 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
With Python lists, indexing accepts integers or slices . . .

Slide 40

Slide 40 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
NumPy arrays are similar . . .
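
A side-by-side sketch of integer and slice indexing on a list and on an array. The sample data are assumptions.

```python
import numpy as np

letters = list("abcdef")
first_letter = letters[0]       # 'a'
middle_letters = letters[1:4]   # ['b', 'c', 'd']

arr = np.arange(10, 16)         # [10, 11, 12, 13, 14, 15]
first_value = arr[0]            # 10
middle_values = arr[1:4]        # [11, 12, 13]
every_other = arr[::2]          # [10, 12, 14]
```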

Slide 41

Slide 41 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
. . . but NumPy offers other fast and convenient indexing options as well.

Slide 42

Slide 42 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
“Masking”: indexing with boolean masks
A mask is a boolean array:

Slide 43

Slide 43 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
“Masking”: indexing with boolean masks
Masks are often constructed using comparison operators and boolean logic, e.g.
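
A minimal masking sketch, first with an explicit boolean array and then with a mask built from comparisons. The sample values are assumptions.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])

# An explicit mask: one boolean per element of x.
explicit_mask = np.array([True, False, False, True, False, True])
picked = x[explicit_mask]            # [1, 4, 6]

# A mask built from comparison ufuncs combined with "&" (element-wise AND).
# The parentheses are required because "&" binds tighter than ">" and "==".
mask = (x > 2) & (x % 2 == 0)        # [False, False, False, True, False, True]
selected = x[mask]                   # [4, 6]
```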

Slide 44

Slide 44 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
“Fancy Indexing”: passing a list/array of indices . . .
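
A short fancy-indexing sketch. The sample values and index list are assumptions.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])

# Indices may repeat and may appear in any order.
indices = [0, 3, 3]
picked = x[indices]     # [10, 40, 40]
```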

Slide 45

Slide 45 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
Multiple dimensions: use commas to separate indices!
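
A sketch of comma-separated indices on a two-dimensional array. The 2x3 sample array is an assumption.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

single = grid[0, 1]       # row 0, column 1: 1
column = grid[:, 1]       # every row, column 1: [1, 4]
last_row = grid[1, :]     # row 1, every column: [3, 4, 5]
```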

Slide 47

Slide 47 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
Masking in multiple dimensions . . .

Slide 48

Slide 48 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
Mixing fancy indexing and slicing . . .

Slide 49

Slide 49 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
Mixing masking and slicing . . .
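
A sketch covering multi-dimensional masking and the two mixed forms, fancy indexing with slicing and masking with slicing. The 3x4 sample array and the variable names are assumptions.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

# Masking in multiple dimensions: a full-shape mask returns a 1-D result.
big = grid[grid > 8]                       # [9, 10, 11]

# Fancy indexing on rows mixed with a slice on columns.
rows_and_slice = grid[[0, 2], 1:3]         # [[1, 2], [9, 10]]

# A boolean mask on rows mixed with a slice on columns.
row_mask = np.array([True, False, True])
masked_and_slice = grid[row_mask, :2]      # [[0, 1], [8, 9]]
```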

Slide 50

Slide 50 text

Strategy #4: Use NumPy’s slicing, masking, and fancy indexing
All of these operations can be composed and combined in nearly limitless ways!

Slide 51

Slide 51 text

Example: Computing Nearest Neighbors
Let’s combine all these ideas to compute nearest neighbors of points without a single loop!

Slide 52

Slide 52 text

Example: Computing Nearest Neighbors

Slide 53

Slide 53 text

Example: Computing Nearest Neighbors
$D_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2$
Naive approach requires three nested loops . . .
. . . but we can do better.
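
One loop-free way to compute the squared distances and nearest neighbors, as a sketch combining broadcasting, a ufunc, an aggregation, and fancy indexing. The random sample points, their count, and the variable names are assumptions; this is not necessarily the exact code from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)
points = rng.random((10, 2))     # 10 points, each with (x, y) coordinates

# Broadcasting: shapes (10, 1, 2) and (1, 10, 2) combine to (10, 10, 2),
# giving the coordinate difference for every pair (i, j).
diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]

# Ufunc (**) plus aggregation (sum over the coordinate axis) gives D_ij^2.
sq_dist = (diff ** 2).sum(axis=-1)

# Fancy indexing: put infinity on the diagonal so that a point
# is never reported as its own nearest neighbor.
idx = np.arange(len(points))
sq_dist[idx, idx] = np.inf

# Aggregation along each row: index of the closest other point.
nearest = sq_dist.argmin(axis=1)
print(nearest)
```

The intermediate `diff` array holds N x N x 2 values, so memory use grows quadratically with the number of points; for large N a tree-based neighbor search is usually a better fit.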

Slide 60

Slide 60 text

Summary . . .
- Writing Python is fast; loops can be slow
- NumPy pushes loops into its compiled layer:
  - fast development time of Python
  - fast execution time of compiled code
Strategies:
1. ufuncs for element-wise operations
2. aggregations for array summarization
3. broadcasting for combining arrays
4. slicing, masking, and fancy indexing for selecting and operating on subsets of arrays

Slide 61

Slide 61 text

~ Thank You! ~
Email: [email protected]
Twitter: @jakevdp
Github: jakevdp
Web: http://vanderplas.com/
Blog: http://jakevdp.github.io/