
Using NumPy efficiently


An intermediate presentation about some of NumPy's main features: broadcasting, indexing, and basic internals.

cournape

May 12, 2018

Transcript

  1. Using NumPy
    efficiently
    David Cournapeau

    @cournape github.com/cournape


  2. Hello
    • I am David Cournapeau: @cournape (twitter/github)

    • NumPy/SciPy user since 2005

    • Former core contributor to
    NumPy, SciPy

    • Started the learn project, which
    would later become scikit-learn

    • Currently leading ML
    Engineering team at Cogent
    Labs


  3. PyData stack today


  4. Why would you care about
    NumPy ?
    • Used as a fundamental piece in many higher-level Machine Learning
    libraries (scikit-learn/scikit-image, pandas, TensorFlow/Chainer/PyTorch)

    • Required to understand the source code of those libraries

    • Historically: key enabler of Python for ML and Data Science

    • NumPy is a library for array computing

    • Long history in computing (APL, J, K, Matlab, etc…): see e.g.
    http://jsoftware.com/

    • Both about efficiency and expressivity


  5. A bit of history
    • Early work for array computing in Python (matrix-sig mailing
    list):

    • 1995: Jim Fulton, Jim Hugunin. Became Numeric

    • 1995-2000s: Paul Dubois, Konrad Hinsen, David Ascher,
    Travis Oliphant and others contributed later

    • 2001: Numarray: Perry Greenfield, Rick White and Todd
    Miller

    • 2005: “grand unification” into NumPy, led by Travis
    Oliphant


  6. Array Computing for speed
    • You want to compute some math operations:
    • In NumPy:
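
The code shown on this slide did not survive in the transcript. As a minimal sketch of the kind of comparison it describes, assume the "math operation" is sin(v)² applied to 1000 values (an illustrative choice, not the original example):

```python
import math
import numpy as np

# Illustrative input; the original slide's data is not known.
values = [0.1 * i for i in range(1000)]

# Pure Python: one interpreted loop iteration per element.
py_result = [math.sin(v) ** 2 for v in values]

# NumPy: the same computation written on the whole array at once;
# the per-element loop runs in compiled code.
arr = np.array(values)
np_result = np.sin(arr) ** 2
```

Both versions produce the same numbers; the difference is where the loop executes.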


  7. Why the difference ?
    • Why (C)Python is slow for computation: boxing
    From Python Data Science Handbook by Jake Vanderplas


  8. Why the difference ?
    • Why (C)Python is slow for
    computation: genericity

    • E.g. lists can contain arbitrary
    Python values

    • You need to follow pointers to
    access values

    • Note: accessing an arbitrary
    value in RAM costs ~ 100
    cycles (as much as computing
    the exponential of a double in
    C !)
    From Python Data Science Handbook by Jake Vanderplas
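
The cost of boxing and pointer chasing can be made visible by comparing object sizes. The sketch below assumes CPython; the exact size of a boxed float varies by platform, so it is printed rather than stated:

```python
import sys
import numpy as np

# A Python float is a full object (refcount, type pointer, value).
boxed_size = sys.getsizeof(1.0)

# A list stores one pointer per element; each pointer leads to a
# separately allocated float object.
values = [float(i) for i in range(1000)]
list_bytes = sys.getsizeof(values)

# A float64 array stores the raw 8-byte values in one contiguous block.
arr = np.array(values, dtype=np.float64)

print(boxed_size, list_bytes, arr.itemsize, arr.nbytes)
```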


  9. Array computing for
    expressivity
    • One simple softmax layer in a neural network for a 1-D vector x:
    logits = W @ x + b
    output = softmax(logits)
    print(logits.shape)
    • Maps more directly to many scientific domains
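
The snippet on this slide leaves `W`, `b`, `x` and `softmax` undefined. A runnable sketch, assuming a hypothetical layer with 4 inputs and 3 outputs and a hand-written `softmax` helper (neither is from the slides):

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability and does not
    # change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))  # hypothetical weights: 4 inputs, 3 outputs
b = np.zeros(3)                  # hypothetical bias
x = rng.standard_normal(4)       # 1-D input vector

logits = W @ x + b
output = softmax(logits)
print(logits.shape)  # (3,)
```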


  10. Structure of NumPy arrays
    • A NumPy array is essentially:

    • A single block of memory

    • A dtype to describe how to interpret single values in the
    memory block

    • Metadata such as shape, strides, etc.

    • A NumPy array costs the same memory as a C array, plus a
    constant overhead
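
The pieces listed above are all exposed as attributes. A small sketch using a 3x4 `float64` array (an arbitrary example):

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

print(a.dtype)    # float64
print(a.shape)    # (3, 4)
print(a.strides)  # (32, 8): bytes to step along each axis
print(a.nbytes)   # 96 = 3 * 4 * 8, the same as a C double[3][4]
```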


  11. Structure of NumPy arrays
    • Data is like a C array

    • Dtype is a Python object with
    information about values in
    the array (size, endianness,
    etc.)

    • dimensions, strides and dtype
    are used for multidimensional
    indexing
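
To illustrate how strides and dtype drive multidimensional indexing, the sketch below computes the byte offset of one element by hand and reads it back from the raw bytes. It assumes a C-contiguous array, so that `tobytes()` matches the memory layout:

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
i, j = 2, 1

# Byte offset of element (i, j) from the start of the data block.
offset = i * a.strides[0] + j * a.strides[1]

# Read the bytes at that offset and interpret them using the dtype.
raw = a.tobytes()
value = np.frombuffer(raw, dtype=a.dtype, count=1, offset=offset)[0]

print(offset, value, a[i, j])
```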


  12. Example
    • Notebook example for array creation, metadata and
    simple slices


  13. Broadcasting 1/4
    • Linear Algebra defines most basic NumPy operations

    • We do not always want to be as strict as mathematics:

    • We want to add a scalar to an array without having to
    create an array filled with the duplicated scalar

    • We sometimes do not care about row vs column vector

    • We sometimes want to save memory and avoid
    temporaries
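
A small sketch of these motivations, using made-up values: the "strict" version builds a temporary array holding the scalar, while the broadcasting version does not need one.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# Strict version: materialize an array filled with the scalar, then add.
explicit = a + np.full(a.shape, 10.0)

# Broadcasting version: the scalar is combined with every element directly.
implicit = a + 10.0

# A 1-D array is treated as a row when combined with a 2-D array.
row = np.array([100.0, 200.0, 300.0])
shifted = a + row
```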


  14. Broadcasting 2/4
    • Broadcasting: rules to work with arrays (and scalars) with
    non-conforming shapes

    • NumPy provides powerful broadcasting capabilities
    import numpy as np
    # np.newaxis creates a new dimension, but array has the same size
    x = np.arange(5)[:, np.newaxis]
    y = np.arange(5)
    print(x + y)


  15. Broadcasting 3/4
    • Broadcasting rules:

    • If arrays have different number
    of dimensions, insert new axes
    on the left until arrays have
    same number of dimensions

    • For each axis i, if arrays
    dimension[i] do not match,
    “stretch” the arrays where
    dimension[i] = 1 to match
    other array(s)

    • (if no match and dimension[i] != 1 -> error)
    From Python Data Science Handbook by Jake Vanderplas
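
The rules above can be traced on the `x + y` example from the previous slide. The failing shapes at the end are an illustrative choice:

```python
import numpy as np

x = np.arange(5)[:, np.newaxis]  # shape (5, 1)
y = np.arange(5)                 # shape (5,)

# Rule 1: y gets a new axis on the left and is treated as shape (1, 5).
# Rule 2: (5, 1) and (1, 5) are both stretched to (5, 5).
result = x + y
print(result.shape)

# Axes that differ where neither dimension is 1 raise an error.
try:
    np.ones((3, 4)) + np.ones((3, 5))
    failed = False
except ValueError:
    failed = True
print(failed)
```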


  16. Broadcasting 4/4
    • A few notes:

    • Broadcasting is done “logically”, and the temporary arrays
    are not created in memory

    • Integrated in the ufunc and multi-dimensional indexing
    infrastructure in NumPy code (see later)

    • Indices are broadcast as well in fancy indexing (see
    later)

    • You can use np.broadcast_arrays to explicitly build arrays
    as if they were broadcast
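
A short sketch of `np.broadcast_arrays`, which also shows the "logical" nature of broadcasting: the stretched axis has a stride of 0, so no data is duplicated. The dtype is fixed to `int64` here so that the strides are the same on every platform.

```python
import numpy as np

x = np.arange(3, dtype=np.int64)      # shape (3,)
y = np.zeros((4, 3), dtype=np.int64)  # shape (4, 3)

bx, by = np.broadcast_arrays(x, y)
print(bx.shape)    # (4, 3)
print(bx.strides)  # (0, 8): stepping along axis 0 stays on the same row

# The broadcast result still points at x's original memory.
same_memory = np.shares_memory(bx, x)
```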


  17. Indexing: views
    • One can use slices any time
    one needs to extract “regular”
    subarrays

    • If arrays are solely indexed
    through slices, the returned
    array is a view (no data
    copied)
    import numpy as np
    x = np.arange(6).reshape(2, 3)
    print(x)
    print(x[:, ::2])
    print(x[::2, ::2])
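
One way to check that a slice is a view rather than a copy is to write through it and look at the original array:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
view = x[:, ::2]  # columns 0 and 2, no data copied

print(np.shares_memory(x, view))  # True

# Writing through the view changes the original array.
view[0, 0] = 99
print(x[0, 0])  # 99
```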


  18. Examples


  19. Indexing: fancy indexing
    • As soon as you index an array with an array, you are using
    fancy indexing

    • Fancy indexing always returns a copy (why ?)

    • 2 main cases of fancy indexing:

    • Use an array of booleans (aka mask)

    • Use an array of integers

    • Fancy indexing can get too fancy…
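
One plausible answer to the "why ?" above: elements picked by an index array are generally not regularly spaced in memory, so they cannot be described by a shape and strides over the original block. A sketch with arbitrary indices:

```python
import numpy as np

x = np.arange(10)

# Indices 7, 1, 1, 4 follow no constant stride, and one is repeated.
picked = x[np.array([7, 1, 1, 4])]

print(np.shares_memory(x, picked))  # False

# Modifying the result leaves the original array untouched.
picked[0] = -1
print(x[7])  # 7
```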


  20. Fancy indexing with masks
    • Indexing with array of booleans

    • Appears naturally with comparison
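
The slide's example is missing from the transcript. A minimal sketch of mask indexing, with made-up data:

```python
import numpy as np

x = np.array([3, -1, 4, -1, 5, -9])

# A comparison produces a boolean array with the same shape as x.
mask = x > 0
positives = x[mask]

# A mask can also select the elements to assign to.
clipped = x.copy()
clipped[clipped < 0] = 0
```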


  21. Fancy indexing with integer
    arrays
    • Indexing with array of integers

    • Appears naturally to select specific values from their
    indices
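
The slide's example is missing from the transcript. A minimal sketch of integer-array indexing, with made-up data, including the index broadcasting mentioned earlier:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

# One index array selects along the first axis: rows 2 and 0.
rows_picked = x[np.array([2, 0])]

# Two index arrays are paired element by element: (0, 1) and (2, 3).
rows = np.array([0, 2])
cols = np.array([1, 3])
paired = x[rows, cols]

# Index arrays are broadcast against each other: (2, 1) with (2,) -> (2, 2).
block = x[rows[:, np.newaxis], cols]
```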


  22. Does not sound that
    fancy ?


  23. See Jaime Fernández - The Future of NumPy Indexing presentation
    Sebastian Berg: new fancy indexing NEP


  24. How to go further
    • From Python to NumPy by Nicolas Rougier:
    http://www.labri.fr/perso/nrougier/from-python-to-numpy

    • 100 NumPy exercises by Nicolas Rougier:
    https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md

    • Guide to NumPy:
    http://web.mit.edu/dvp/Public/numpybook.pdf

    • “New” ND iterator by Mark Wiebe, with notes about speeding
    up indexing, etc.:
    https://github.com/numpy/numpy/blob/master/doc/neps/nep-0010-new-iterator-ufunc.rst


  25. Thank you
