Using NumPy efficiently

Using NumPy efficiently David Cournapeau @cournape github.com/cournape

Hello
• I am David Cournapeau: @cournape (Twitter/GitHub)
• NumPy/SciPy user since 2005
• Former core contributor to NumPy, SciPy
• Started the learn project, which would later become scikit-learn
• Currently leading ML Engineering team at Cogent Labs

PyData stack today

Why would you care about NumPy?
• Used as a fundamental piece in many higher-level machine learning libraries (scikit-learn, scikit-image, pandas, TensorFlow/Chainer/PyTorch)
• Required to understand the source code of those libraries
• Historically: key enabler of Python for ML and data science
• NumPy is a library for array computing
  • Long history in computing (APL, J, K, Matlab, etc.): see e.g. http://jsoftware.com/
  • Both about efficiency and expressivity

A bit of history
• Early work for array computing in Python (matrix-sig mailing list):
  • 1995: Jim Fulton, Jim Hugunin. Became Numeric
  • 1995-2000s: Paul Dubois, Konrad Hinsen, David Ascher, Travis Oliphant and others contributed later
• 2001: Numarray: Perry Greenfield, Rick White and Todd Miller
• 2005: “grand unification” into NumPy, led by Travis Oliphant

Array Computing for speed
• You want to compute some math operations:
• In NumPy:
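A minimal sketch of such a comparison, not the code from the original slide: the operation (sin squared), the array size and the names `python_version` and `numpy_version` are all assumptions chosen for illustration. Timings depend on the machine, so none are quoted here.

```python
import math
import timeit

import numpy as np

n = 100_000
values = [0.001 * i for i in range(n)]   # plain Python list of floats
array = np.array(values)                 # same data as a NumPy array


def python_version():
    # One interpreted loop iteration and one boxed float per element.
    return [math.sin(v) ** 2 for v in values]


def numpy_version():
    # A single expression; the loops run in compiled code over a contiguous buffer.
    return np.sin(array) ** 2


# Both versions compute the same numbers.
assert np.allclose(python_version(), numpy_version())

t_python = timeit.timeit(python_version, number=10)
t_numpy = timeit.timeit(numpy_version, number=10)
print(f"pure Python: {t_python:.4f} s, NumPy: {t_numpy:.4f} s")
```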

Why the difference?
• Why (C)Python is slow for computation: boxing
From Python Data Science Handbook by Jake VanderPlas

Why the difference?
• Why (C)Python is slow for computation: genericity
  • E.g. lists can contain arbitrary Python values
  • You need to follow pointers to access values
  • Note: accessing an arbitrary value in RAM costs ~100 cycles (as much as computing the exponential of a double in C!)
From Python Data Science Handbook by Jake VanderPlas
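The cost of boxing can be observed directly. The sketch below is an illustration under stated assumptions: `sys.getsizeof` reports CPython-specific object sizes, and the 1000-element size and the variable names are arbitrary.

```python
import sys

import numpy as np

values = [float(i) for i in range(1000)]   # list of boxed Python floats
array = np.array(values)                   # contiguous buffer of float64

# Each Python float is a full object (type pointer, refcount, payload).
print(sys.getsizeof(values[0]))

# The list itself only stores pointers to those objects.
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_total)

# The array stores the raw 8-byte values back to back.
print(array.itemsize)   # 8
print(array.nbytes)     # 8000
```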

Array computing for expressivity
• One simple softmax layer in a neural network for a 1d vector x:
    logits = W @ x + b
    output = softmax(logits)
    print(logits.shape)
• Maps more directly to many scientific domains
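The layer above leaves `W`, `b`, `x` and `softmax` undefined. A runnable sketch might look like the following; the sizes, the random initialisation and the `softmax` helper are assumptions made for illustration, not part of the talk.

```python
import numpy as np


def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
n_in, n_out = 4, 3                    # hypothetical layer sizes
W = rng.normal(size=(n_out, n_in))    # weight matrix
b = np.zeros(n_out)                   # bias vector
x = rng.normal(size=n_in)             # 1d input vector

logits = W @ x + b
output = softmax(logits)
print(logits.shape)   # (3,)
print(output.sum())
```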

Structure of NumPy arrays
• A NumPy array is essentially:
  • A single block of memory
  • A dtype describing how to interpret each value in that memory block
  • Metadata such as shape, strides, etc.
• A NumPy array costs the same memory as the equivalent C array, plus a constant overhead
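To make the "block of memory plus dtype" idea concrete, the sketch below reinterprets the same bytes with different dtypes. The explicit little-endian dtype strings are a choice made here so that the byte layout is the same on every machine.

```python
import numpy as np

x = np.array([1, 2, 3], dtype="<i4")   # little-endian 32-bit integers
print(x.nbytes)                        # 12 = 3 values * 4 bytes

# Same memory block, interpreted byte by byte.
raw = x.view(np.uint8)
print(raw.tolist())                    # [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]

# Same memory block, interpreted as 16-bit integers.
halves = x.view("<i2")
print(halves.tolist())                 # [1, 0, 2, 0, 3, 0]

# No data was copied: all three arrays share one buffer.
print(np.shares_memory(x, raw), np.shares_memory(x, halves))   # True True
```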

Structure of NumPy arrays
• Data is like a C array
• The dtype is a Python object with information about the values in the array (size, endianness, etc.)
• Dimensions, strides and dtype are used for multidimensional indexing
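How strides drive multidimensional indexing can be sketched as follows. The helper `byte_offset` is a hypothetical name introduced here; it only mimics, for illustration, the address computation NumPy performs internally.

```python
import numpy as np

x = np.arange(12, dtype=np.int64).reshape(3, 4)
print(x.shape)     # (3, 4)
print(x.strides)   # (32, 8): bytes to skip per step along each axis


def byte_offset(index, strides):
    # Offset of an element from the start of the memory block.
    return sum(i * s for i, s in zip(index, strides))


i, j = 2, 1
offset = byte_offset((i, j), x.strides)   # 2 * 32 + 1 * 8 = 72

# Read the value straight from the raw bytes at that offset.
value = np.frombuffer(x.tobytes(), dtype=np.int64, count=1, offset=offset)[0]
print(value, x[i, j])   # 9 9

# Transposing copies nothing: it only swaps shape and strides.
print(x.T.strides)      # (8, 32)
```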

Example
• Notebook example for array creation, metadata and simple slices

Broadcasting 1/4
• Linear algebra defines most basic NumPy operations
• We do not always want to be as strict as mathematics:
  • We want to add a scalar to an array without having to create an array filled with the duplicated scalar
  • We sometimes do not care about row vs column vectors
  • We sometimes want to save memory and avoid temporaries

Broadcasting 2/4
• Broadcasting: rules to work with arrays (and scalars) with non-conforming shapes
• NumPy provides powerful broadcasting capabilities
    import numpy as np
    # np.newaxis adds a new axis of length 1; the array still holds the same 5 elements
    x = np.arange(5)[:, np.newaxis]   # shape (5, 1)
    y = np.arange(5)                  # shape (5,)
    print(x + y)                      # result has shape (5, 5)

Broadcasting 3/4
• Broadcasting rules:
  • If arrays have different numbers of dimensions, insert new axes on the left until all arrays have the same number of dimensions
  • For each axis i, if the arrays' dimension[i] do not match, “stretch” the arrays where dimension[i] = 1 to match the other array(s)
  • (if no match and dimension[i] != 1 -> error)
From Python Data Science Handbook by Jake VanderPlas
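The rules above can be written down in a few lines of plain Python. `broadcast_shape` is a hypothetical helper written for this illustration, not a NumPy function; it only computes the resulting shape, and is checked against what NumPy itself produces.

```python
import numpy as np


def broadcast_shape(*shapes):
    ndim = max(len(s) for s in shapes)
    # Rule 1: pad shorter shapes with 1s on the left.
    padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
    result = []
    for dims in zip(*padded):
        # Rule 2: axes of length 1 stretch to match the others.
        non_one = {d for d in dims if d != 1}
        if len(non_one) > 1:
            # Rule 3: two different lengths, neither equal to 1.
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(non_one.pop() if non_one else 1)
    return tuple(result)


print(broadcast_shape((5, 1), (5,)))        # (5, 5)
print(broadcast_shape((2, 3), (3,)))        # (2, 3)
print(broadcast_shape((8, 1, 6), (7, 1)))   # (8, 7, 6)

# Same answer as NumPy for the example from the previous slide.
x = np.arange(5)[:, np.newaxis]
y = np.arange(5)
print((x + y).shape)                        # (5, 5)
```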

Broadcasting 4/4
• A few notes:
  • Broadcasting is done “logically”: the temporary arrays are not created in memory
  • Integrated in the ufunc and multi-dimensional indexing infrastructure in the NumPy code (see later)
  • Indices are broadcast as well in fancy indexing (see later)
  • You can use np.broadcast_arrays to explicitly build arrays as if they were broadcast
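One way to see that broadcasting is only "logical" is to look at the strides of the arrays returned by `np.broadcast_arrays`. This is a small sketch; the dtype is fixed to int64 here so that the stride values are the same on every platform.

```python
import numpy as np

x = np.arange(5, dtype=np.int64)[:, np.newaxis]   # shape (5, 1)
y = np.arange(5, dtype=np.int64)                  # shape (5,)

bx, by = np.broadcast_arrays(x, y)
print(bx.shape, by.shape)   # (5, 5) (5, 5)

# A stride of 0 means "stay on the same bytes": the stretched axis
# re-reads the same 5 values instead of duplicating them.
print(bx.strides)           # (8, 0)
print(by.strides)           # (0, 8)

# The broadcast arrays are views on the original buffers.
print(np.shares_memory(bx, x), np.shares_memory(by, y))   # True True
```

Because several elements of a broadcast array alias the same memory, it is safer to treat such arrays as read-only.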

Indexing: views
• One can use slices any time one needs to extract “regular” subarrays
• If an array is indexed solely through slices, the returned array is a view (no data copied)
    import numpy as np
    x = np.arange(6).reshape(2, 3)
    print(x)
    print(x[:, ::2])
    print(x[::2, ::2])
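That a slice is a view can be checked directly, as in the sketch below. The dtype is set to int64 only to make the printed strides platform-independent.

```python
import numpy as np

x = np.arange(6, dtype=np.int64).reshape(2, 3)
v = x[:, ::2]                   # every other column

print(np.shares_memory(x, v))   # True: no data copied
print(x.strides, v.strides)     # (24, 8) (24, 16): same buffer, bigger column step

# Writing through the view changes the original array.
v[0, 0] = 100
print(x[0, 0])                  # 100
```

If an independent array is needed, calling `.copy()` on the slice gives one.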

Examples

Indexing: fancy indexing
• As soon as you index an array with an array, you are using fancy indexing
• Fancy indexing always returns a copy (why?)
• 2 main cases of fancy indexing:
  • Use an array of booleans (aka mask)
  • Use an array of integers
• Fancy indexing can get too fancy…
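A possible answer to the "why?": an arbitrary selection of elements generally cannot be described by one offset plus regular strides over the original buffer, so the selected values have to be gathered into new memory. The sketch below contrasts a slice with the equivalent fancy index; the variable names are illustrative.

```python
import numpy as np

x = np.arange(10)

sliced = x[2:5]          # slice: a view
fancy = x[[2, 3, 4]]     # list of integers: fancy indexing, a copy

print(np.shares_memory(x, sliced))   # True
print(np.shares_memory(x, fancy))    # False

# Modifying the copy leaves the original untouched.
fancy[0] = -1
print(x[2])              # 2

# Assignment with a fancy index on the left-hand side does write into x.
x[[0, 1]] = 99
print(x[:3])             # [99 99  2]
```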

Fancy indexing with masks
• Indexing with an array of booleans
• Appears naturally with comparisons
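A short sketch of mask indexing, with made-up data: a comparison produces a boolean array, which is then used to select or overwrite elements.

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5, -9])

mask = x > 0             # comparison yields a boolean array
print(mask.tolist())     # [True, False, True, False, True, False]

positives = x[mask]      # select the elements where the mask is True
print(positives)         # [3 4 5]
print(mask.sum())        # 3: number of True entries

# A mask can also be used on the left-hand side of an assignment.
x[~mask] = 0
print(x)                 # [3 0 4 0 5 0]
```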

Fancy indexing with integer arrays
• Indexing with an array of integers
• Appears naturally to select specific values from their indices
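A sketch of integer-array indexing, with made-up data. The second half shows index arrays being broadcast against each other, as mentioned in the broadcasting notes.

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])
idx = np.array([4, 0, 0, 2])

# The result has the shape of the index array; indices may repeat.
picked = x[idx]
print(picked)            # [50 10 10 30]

# Index arrays are broadcast together: (2, 1) with (2,) gives (2, 2).
m = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])[:, np.newaxis]   # shape (2, 1)
cols = np.array([1, 3])                  # shape (2,)
block = m[rows, cols]
print(block)             # [[ 1  3]
                         #  [ 9 11]]
```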

Does not sound that fancy?

See Jaime Fernández - The Future of NumPy Indexing presentation
Sebastian Berg: new fancy indexing NEP

How to go further
• From Python to NumPy by Nicolas Rougier: http://www.labri.fr/perso/nrougier/from-python-to-numpy
• 100 NumPy exercises by Nicolas Rougier: https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md
• Guide to NumPy: http://web.mit.edu/dvp/Public/numpybook.pdf
• “New” ND index by Mark Wiebe, with notes about speeding up indexing, etc.: https://github.com/numpy/numpy/blob/master/doc/neps/nep-0010-new-iterator-ufunc.rst

Thank you
