
Sarah_Masud_PyD19.pdf


What do Pandas, SciPy, Sklearn, Matplotlib and Keras have in common, apart from being famous Python libraries? All of them, along with 1.56K other packages [1], have NumPy as a dependency. This is a huge feat! It would not be wrong to say that NumPy is the biggest reason for the success of Machine Learning in Python. But how did NumPy achieve this position? How is NumPy able to handle both the scale and the dimensionality of data with ease? While many factors have gone into the design of this library, this talk focuses on 3 design decisions that make NumPy the magical, powerful library we know.

_themessier

August 03, 2019

Transcript

  1. Overview
    • Refresher
      ◦ SIMD
      ◦ Row Major/Column Major
    • Why do we need NumPy? What is NumPy?
    • What makes NumPy effective
      ◦ Strict types
      ◦ Memory Models and Views
      ◦ Vectorization and Universal Functions
  2. SIMD: Single Instruction Multiple Data
    • A class of parallel computing.
    • Data-controlled parallelization.
    • A control unit issues the same instruction to multiple Processing Units (PUs).
    • A mask bit in each PU lets it take part in or skip an instruction, which provides logical (conditional) operations.
    • Every core has its own independent SIMD execution units.
    Fig.: http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/architecture/slide6.html
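  3. Row/Column Major
    It is just a matter of perspective ;)
    • In Row Major, the elements of a row are adjacent in memory, so stepping across the columns of one row is the fast access pattern.
    • In Column Major, the elements of a column are adjacent in memory, so stepping down the rows of one column is the fast access pattern.
    • Row Major offset of element (r, c) = r*NumCols + c
    • Column Major offset of element (r, c) = r + c*NumRows
    Fig.: https://en.wikipedia.org/wiki/Row-_and_column-major_order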
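The two offset formulas can be checked against NumPy's own flattening. This is a minimal sketch: the helper names `row_major_offset` and `col_major_offset` are made up for illustration and are not part of NumPy, while `ravel(order="C")` and `ravel(order="F")` are NumPy's row-major and column-major flattening.

```python
import numpy as np


def row_major_offset(r, c, num_rows, num_cols):
    """Flat index of element (r, c) when rows are stored one after another."""
    return r * num_cols + c


def col_major_offset(r, c, num_rows, num_cols):
    """Flat index of element (r, c) when columns are stored one after another."""
    return r + c * num_rows


a = np.array([[1, 2, 3],
              [4, 5, 6]])
num_rows, num_cols = a.shape

flat_c = a.ravel(order="C")  # row-major flattening
flat_f = a.ravel(order="F")  # column-major flattening

# Every element is found at the offset predicted by the matching formula.
for r in range(num_rows):
    for c in range(num_cols):
        assert flat_c[row_major_offset(r, c, num_rows, num_cols)] == a[r, c]
        assert flat_f[col_major_offset(r, c, num_rows, num_cols)] == a[r, c]
```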
  4. Row/Column Major
    Array = { [ 1 2 3 ] → 1st Row
              [ 4 5 6 ] → 2nd Row }
    Row Major = {1,2,3,4,5,6}
    Column Major = {1,4,2,5,3,6}
    Array.Transpose = { [ 1 4 ] → 1st Row
                        [ 2 5 ] → 2nd Row
                        [ 3 6 ] → 3rd Row }
    Row Major = {1,4,2,5,3,6}
    Column Major = {1,2,3,4,5,6}
    Transpose does not involve data movement! The column-major layout of the transpose is identical to the row-major layout of the original, so only the interpretation of the same buffer changes.
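A small NumPy check of the "no data movement" claim, as a sketch: `int64` is chosen here only so that the stride values are predictable (8 bytes per element). The transpose comes back as a view on the same buffer with the strides swapped.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)
t = a.T  # transpose: a new view, not a copy

shares_buffer = np.shares_memory(a, t)  # both arrays use the same data buffer
a_strides = a.strides  # bytes to step along (rows, columns) of a
t_strides = t.strides  # same numbers, swapped

# The original is row-major (C order); its transpose reads the same bytes
# as a column-major (Fortran order) array.
a_is_row_major = a.flags["C_CONTIGUOUS"]
t_is_col_major = t.flags["F_CONTIGUOUS"]
```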
  5. NUMerical PYthon
    NumPy is the fundamental package for scientific computing with Python.
    • Numeric → NumArray → NumPy.
    • Pandas, SciPy, OpenCV, Scikit-Learn.
    • Fast execution on vectorized data.
    • ND-array and high-level mathematical functions.
    • Limitations in terms of modifications.
  6. Memory Models and Views
    Fig.: https://www.python-course.eu/numpy.php
    Two Essential Components:
    1. Metadata: stores information about dtype, shape, strides and Row/Column Major order.
    2. A contiguous, fixed-size data block, referred to as the data buffer.
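The metadata fields listed above can be read directly from any array. A short sketch, assuming a 3x4 `int32` array so that the byte counts are easy to verify by hand; the `meta` dictionary is only an illustration, not a NumPy structure.

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

# Metadata: describes how to interpret the buffer, but holds no elements.
meta = {
    "dtype": a.dtype,                         # element type (4 bytes each)
    "shape": a.shape,                         # (3, 4)
    "strides": a.strides,                     # bytes to step per axis
    "row_major": a.flags["C_CONTIGUOUS"],     # stored in Row Major order
    "col_major": a.flags["F_CONTIGUOUS"],     # not stored in Column Major order
}

# Data buffer: one fixed-size block of 12 elements * 4 bytes.
buffer_size_bytes = a.nbytes
```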
  7. Single Data Array, Multiple Views
    [Figure: View 1, View 2 and View 3 each carry their own metadata (Meta Data 1, 2, 3) and all point to one shared Data Buffer; Reference_Counter = 3.]
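The picture of several views over one buffer can be reproduced as follows. This is a sketch with arbitrarily chosen slices; it relies on NumPy's `.base` attribute, through which each view keeps a reference to the array that owns the data, and it does not inspect the reference counter itself.

```python
import numpy as np

data = np.arange(6)          # owns the data buffer: [0 1 2 3 4 5]

view1 = data[::2]            # every second element
view2 = data.reshape(2, 3)   # same elements seen as a 2x3 array
view3 = data[3:]             # the last three elements

# Each view has its own shape and strides but no data of its own.
all_views_share_buffer = all(v.base is data for v in (view1, view2, view3))

# Writing through one view is visible through the owner and the other views.
view1[0] = 100
```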
  8. Analogous Concept in Lists
    [Figure: View 1, View 2 and View 3 all refer to a single List Object; Reference_Counter = 3.]
  9. Vectorization
    • Derives from the vector notation of the SIMD architecture.
    • Instead of operating on a single element per loop iteration, operate on multiple elements (sometimes even all elements) at once.
    • Makes use of the underlying CPU optimizations for loops.
    • NumPy uses a C-based array implementation.
  10. Vectorization
    Scalar code:
        for i in range(arr_len):
            C[i] = A[i] * B[i]
    Vector code (up to n times faster):
        for i in range(0, arr_len, n):
            C[i:i+n] = A[i:i+n] * B[i:i+n]
            # i.e.
            # C[i]     = A[i]     * B[i]
            # C[i+1]   = A[i+1]   * B[i+1]
            # C[i+2]   = A[i+2]   * B[i+2]
            # ...
            # C[i+n-1] = A[i+n-1] * B[i+n-1]
    In the ideal case, n vector multiplications complete in the same time as one scalar multiplication, since n PUs perform the same task simultaneously.
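In NumPy terms, the scalar and vector versions look roughly like the sketch below. The function names are illustrative only. The vectorized form hands the whole loop to NumPy's compiled code; whether that code actually uses SIMD instructions depends on the NumPy build and the CPU, so no speed-up factor is asserted here.

```python
import numpy as np


def multiply_scalar(a, b):
    """Element-by-element product using an explicit Python loop."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i]
    return out


def multiply_vectorized(a, b):
    """Same product as a single whole-array expression."""
    return a * b


a = np.arange(1000, dtype=np.float64)
b = np.arange(1000, dtype=np.float64) + 1.0

# Both versions compute the same result; only the execution strategy differs.
same_result = np.array_equal(multiply_scalar(a, b), multiply_vectorized(a, b))
```

Timing the two functions, for example with `timeit`, is one way to see the gap on a given machine.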
  11. Universal Functions (ufunc)
    • Perform element-wise operations on all elements of the ND-array.
    • Inherent support for broadcasting, typecasting and error handling.
    • NumPy ships pre-compiled C implementations of these functions (they can be user-defined as well).
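Each of these ufunc properties can be seen in a few lines. This is a sketch with made-up sample values; `clip_add` is a hypothetical user-defined ufunc built with `np.frompyfunc`, which wraps a Python function and therefore returns object arrays and does not run at compiled speed.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (3,) row combine into a (3, 3) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = np.add(col, row)

# Typecasting: int32 + float64 is promoted to float64.
mixed = np.add(np.array([1, 2], dtype=np.int32), np.array([0.5, 0.5]))

# Error handling: np.errstate controls how division by zero is reported.
with np.errstate(divide="ignore"):
    ratio = np.divide(np.array([1.0, 2.0]), np.array([0.0, 2.0]))

# User-defined ufunc (hypothetical example): add, but cap the result at 10.
clip_add = np.frompyfunc(lambda x, y: min(x + y, 10), 2, 1)
clipped = clip_add(np.array([1, 8, 9]), np.array([2, 2, 2]))
```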
  12. Additional Reading
    • https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.05-Computation-on-arrays-broadcasting.ipynb
    • https://scipy-lectures.org/
    • https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
    • https://www.python-course.eu/numerical_programming_with_python.php
    • https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html
    • https://jalammar.github.io/visual-numpy/
    • https://docs.scipy.org/doc/numpy-1.17.0/reference/
    • https://jakevdp.github.io/PythonDataScienceHandbook/