Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Number Crunching in Python (with Numpy) @ Budapest BI Meetup 2015

Number Crunching in Python (with Numpy) @ Budapest BI Meetup 2015

The numpy package takes a central role in Python data science code for numerical processing.This is mainly because numpy code has been designed with high performance in mind. This short talk will highlight the most essential features of numpy by discussing some concrete examples where numpy takes a central role.

Valerio Maggio

October 13, 2015
Tweet

More Decks by Valerio Maggio

Other Decks in Programming

Transcript

  1. Number Crunching in Python (with Numpy) Postdoc @ University of

    Salerno, Italy Valerio Maggio @leriomaggio
  2. Outline • Scientific and Engineering Computing • Common FP pitfalls

    • Intro to Numpy • Why it is useful • ndarray 
 (Memory and Indexing) • Case Studies
  3. number-crunching: n. [common] Computations of a numerical nature, esp. those

    that make extensive use of floating-point numbers. This term is in widespread informal use outside hackerdom and even in mainstream slang, but has additional hackish connotations: namely, that the computations are mindless and involve massive use of brute force. This is not always evil, esp. if it involves ray tracing or fractals or some other use that makes pretty pictures, esp. if such pictures can be used as screen backgrounds. See also crunch. We are not evil. Just chaotic neutral.
  4. Alternatives • Matlab (IDE, numeric computations oriented, high quality algorithms,

    lots of packages, poor GP programming support, commercial) • Octave (Matlab clone) • R (stats oriented, poor general purpose programming support) • Fortran/C++ (very low level, very fast, more complex to use) • In general, these tools either are low level GP or high level DSLs
  5. Python • Numpy (low-level numerical computations) + 
 Scipy (lots

    of additional packages) • IPython (wonderful command line interpreter) + 
 IPython Notebook (“Mathematica-like” interactive documents) • HDF5 (PyTables, H5Py), Databases • Specific libraries for machine learning, etc. • General Purpose Object Oriented Programming
  6. What is Numpy? • a powerful Python extension for N-dimensional

    arrays • a tool for integrating C/C++ and Fortran Code • designed for scientific computation 
 (e.g., linear algebra ops) NumPy is a package that provide high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran so when calculations are vectorized, performance is very good. So, in a nutshell:
  7. The core NumPy's main object is the homogeneous multidimensional array.

    It is a table of elements (usually numbers), all of the same type. In Numpy: • Dimensions are called axes • The number of axes is called rank
  8. Numpy (nd)Arrays The most important attributes of an ndarray object

    are: • ndarray.ndim
 the number of axes (dimensions) of the array. • ndarray.shape
 the dimensions of the array. • For a matrix with n rows and m columns, shape will be (n,m). • ndarray.size
 the total number of elements of the array. • ndarray.dtype
 numpy.int32, numpy.int16, and numpy.float64 are some examples. • ndarray.itemsize
 the size in bytes of elements of the array. • For example, elements of type float64 has itemsize 8 (=64/8)
  9. So what’s the point? • So far, numpy arrays look

    very similar to Python lists • (nested) lists in case of ndim > 1 • Why not simply use Python lists for computations instead of creating a new array type?
  10. vs

  11. vs

  12. Our Code Numpy Atlas/MKL Improvements Improvements Algorithms are fast because

    of highly optimized C/Fortran code 4 30 LOAD_GLOBAL 1 (dot) 33 LOAD_FAST 0 (a) 36 LOAD_FAST 1 (b) 39 CALL_FUNCTION 2 42 STORE_FAST 2 (c) Numpy Stack c = a · b
  13. ndarray Memory behavior shape, stride, flags (i0, . . .

    , in 1) ! I Shape: (d 0 , …, d n-1 ) 4x3 An n-dimensional array references some (usually contiguous memory area) An n-dimensional array has property such as its shape or the data-type of the elements containes Is an object, so there is some behavior, e.g., the def. of __add__ and similar stuff N-dimensional arrays are homogeneous ndarray
  14. Element Layout in Memory (i0, . . . , in

    1) ! I C-contiguous F-contiguous Shape: (d 0 , …, d n ) IC = n 1 X k=0 ik n 1 Y j=k+1 dj IF = n 1 X k=0 ik k 1 Y j=0 dj Shape: (d 0 , …, d k ,…, d n-1 ) Shape: (d 0 , …, d k ,…, d n-1 ) IC = i0 · d0 + i1 4x3 IF = i0 + i1 · d1
  15. C-contiguous F-contiguous sF (k) = k 1 Y j=0 dj

    IF = n X k=0 ik · sF (k) sC(k) = n 1 Y j=k+1 dj IC = n 1 X k=0 ik · sC(k) Stride C-contiguous F-contiguous C-contiguous (s0 = d0, s1 = 1) (s0 = 1, s1 = d1) IC = n 1 X k=0 ik n 1 Y j=k+1 dj IF = n 1 X k=0 ik k 1 Y j=0 dj Stride
  16. ndarray Memory behavior shape, stride, flags ndarray behavior shape, stride,

    flags View View View View Views ndarray Memory behavior shape, stride, flags ndarray behavior shape, stride, flags View View View View