Why would you care about NumPy?
• Used as a fundamental piece in many higher-level machine learning libraries (scikit-learn, scikit-image, pandas, TensorFlow/Chainer/PyTorch)
• Required to understand the source code of those libraries
• Historically: key enabler of Python for ML and data science
• NumPy is a library for array computing
• Array computing has a long history (APL, J, K, MATLAB, etc.): see e.g. http://jsoftware.com/
Why the difference?
• Why (C)Python is slow for computation: genericity
• E.g. lists can contain arbitrary Python values
• You need to follow pointers to access values
• Note: accessing an arbitrary value in RAM costs ~100 cycles (as much as computing the exponential of a double in C!)
From Python Data Science Handbook by Jake VanderPlas
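The difference in memory layout can be inspected directly. The following is a minimal sketch, assuming CPython on a 64-bit platform; the variable names (`values`, `arr`, `mixed`) are illustrative and not from the slides.

```python
import sys
import numpy as np

n = 1000
values = [float(i) for i in range(n)]  # list: n references to separate float objects
arr = np.array(values)                 # array: one contiguous buffer of raw float64

# A list is generic: each slot may reference any Python object.
mixed = [1, "two", 3.0, None]

# An array has a single dtype, so every element has the same fixed size.
print(arr.dtype, arr.itemsize, arr.nbytes)
print(arr.flags["C_CONTIGUOUS"])

# Each Python float is a full object, larger than its 8-byte payload
# (the exact size is a CPython implementation detail).
print(sys.getsizeof(values[0]))

# Same arithmetic result, but the array version loops in compiled code
# over adjacent memory instead of chasing one pointer per element.
total_list = sum(values)
total_arr = arr.sum()
```

Timing both sums (for example with `timeit`) is a simple way to observe the gap on a given machine; the ratio depends on hardware and array size.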
Array computing for expressivity
• One simple softmax layer in a neural network for a 1-D vector x:
    logits = W @ x + b
    output = softmax(logits)
    print(logits.shape)
• Maps more directly to many scientific domains
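The layer snippet above leaves `W`, `b`, `x`, and `softmax` undefined. A runnable sketch follows, assuming 4 inputs and 3 outputs with random weights; the `softmax` helper is a hypothetical definition written for this example, not a NumPy function.

```python
import numpy as np

def softmax(z):
    # Hypothetical helper: subtracting the max keeps exp() from overflowing
    # without changing the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # assumed sizes: 3 outputs, 4 inputs
b = np.zeros(3)
x = rng.normal(size=4)

logits = W @ x + b           # matrix-vector product plus bias
output = softmax(logits)     # non-negative values that sum to 1
print(logits.shape)
print(output)
```

The two lines computing `logits` and `output` read almost exactly like the mathematical definition, which is the expressivity argument made on the slide.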
Broadcasting 2/4
• Broadcasting: rules to work with arrays (and scalars) with non-conforming shapes
• NumPy provides powerful broadcasting capabilities
    import numpy as np
    # np.newaxis adds a new axis of length 1; the number of elements is unchanged
    x = np.arange(5)[:, np.newaxis]  # shape (5, 1)
    y = np.arange(5)                 # shape (5,)
    print(x + y)                     # result has shape (5, 5)
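To see what the broadcast addition computes, it can be compared against explicit loops. This is a small sketch; the `expected` array and the loops are added here for illustration only.

```python
import numpy as np

x = np.arange(5)[:, np.newaxis]  # shape (5, 1)
y = np.arange(5)                 # shape (5,)
z = x + y

# Shapes are aligned from the trailing axis; axes of length 1 are stretched.
print(x.shape, y.shape, z.shape)
print(np.broadcast(x, y).shape)

# Equivalent explicit loops: z[i, j] = x[i, 0] + y[j]
expected = np.empty((5, 5), dtype=z.dtype)
for i in range(5):
    for j in range(5):
        expected[i, j] = x[i, 0] + y[j]

# Scalars broadcast against any shape.
doubled = z * 2
```

No data is physically repeated to perform the addition; the stretching is handled by the iteration logic inside NumPy.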
Indexing: views
• One can use slices any time one needs to extract “regular” subarrays
• If arrays are solely indexed through slices, the returned array is a view (no data copied)
    import numpy as np
    x = np.arange(6).reshape(2, 3)
    print(x)
    print(x[:, ::2])    # every other column
    print(x[::2, ::2])  # every other row and every other column
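Whether an indexing result is a view can be checked with `np.shares_memory`. The sketch below contrasts a slice with indexing by a list of integers, which returns a copy; that contrast is added here for illustration and goes slightly beyond the slide.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

view = x[:, ::2]     # slices only: a view on the same buffer
copy = x[:, [0, 2]]  # integer-list ("fancy") indexing: a new array

print(np.shares_memory(x, view))
print(np.shares_memory(x, copy))

view[0, 0] = 100     # writes through to x
copy[0, 1] = -1      # leaves x untouched
print(x)
```

Because a view aliases the original data, an explicit `.copy()` is the usual way to get an independent array from a slice.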
How to go further
• From Python to NumPy by Nicolas Rougier: http://www.labri.fr/perso/nrougier/from-python-to-numpy
• 100 NumPy exercises by Nicolas Rougier: https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md
• Guide to NumPy: http://web.mit.edu/dvp/Public/numpybook.pdf
• “New” ND index by Mark Wiebe, with notes about speeding up indexing, etc.: https://github.com/numpy/numpy/blob/master/doc/neps/nep-0010-new-iterator-ufunc.rst