Sarah_Masud_PyD19.pdf

What do Pandas, SciPy, scikit-learn, Matplotlib, and Keras have in common, apart from being famous Python libraries? All of them, along with 1.56K other packages [1], have NumPy as a dependency. This is a huge feat! It would not be wrong to say that NumPy is the biggest reason for the success of machine learning in Python. But how did NumPy achieve this position? How is NumPy able to handle both the scale and the dimensionality of data with ease? While many factors have gone into the design of this library, this talk focuses on three design decisions that make NumPy the magical, powerful library we know.

_themessier

August 03, 2019

Transcript

  1. The Magic of NumPy. Sarah Masud (GitHub: sara-02)

  2. Overview

    • Refresher
      ◦ SIMD
      ◦ Row Major/Column Major
    • Why do we need NumPy? What is NumPy?
    • What makes NumPy effective
      ◦ Strict types
      ◦ Memory Models and Views
      ◦ Vectorization and Universal Functions
  3. SIMD: Single Instruction, Multiple Data

    • A class of parallel computing.
    • Data-controlled parallelization.
    • A control unit issues the same instruction to multiple Processing Units (PUs).
    • A mask bit in each PU lets that unit skip an instruction, which is how conditional (logical) operations are supported.
    • Every core has its own independent SIMD execution units.
    Fig.: http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/architecture/slide6.html
  4. SIMD Architecture. Fig.: http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/architecture/slide6.html

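  5. Row/Column Major: it is just a matter of perspective ;)

    • In row-major order, the elements of a row are contiguous in memory, so stepping across the columns of one row is the fast access pattern.
    • In column-major order, the elements of a column are contiguous in memory, so stepping down the rows of one column is the fast access pattern.
    • Row-major offset of element (r, c) = r*NumCols + c
    • Column-major offset of element (r, c) = r + c*NumRows
    • Here r and c are zero-based row and column indices.
    Fig.: https://en.wikipedia.org/wiki/Row-_and_column-major_order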
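The two offset formulas on this slide can be checked against NumPy directly. The sketch below is illustrative only: the helper names `row_major_offset` and `col_major_offset` are hypothetical, not NumPy API, and `ravel(order="C")` / `ravel(order="F")` are used to obtain the row-major and column-major flattenings.

```python
import numpy as np


def row_major_offset(r, c, num_rows, num_cols):
    """Flat offset of element (r, c) in row-major order (hypothetical helper)."""
    return r * num_cols + c


def col_major_offset(r, c, num_rows, num_cols):
    """Flat offset of element (r, c) in column-major order (hypothetical helper)."""
    return r + c * num_rows


a = np.array([[1, 2, 3],
              [4, 5, 6]])
num_rows, num_cols = a.shape

row_major = a.ravel(order="C")  # flatten row by row
col_major = a.ravel(order="F")  # flatten column by column

# Every element is found at the offset predicted by the formulas.
for r in range(num_rows):
    for c in range(num_cols):
        assert row_major[row_major_offset(r, c, num_rows, num_cols)] == a[r, c]
        assert col_major[col_major_offset(r, c, num_rows, num_cols)] == a[r, c]

print(row_major.tolist())  # [1, 2, 3, 4, 5, 6]
print(col_major.tolist())  # [1, 4, 2, 5, 3, 6]
```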
  6. Row/Column Major

    Array = { [ 1 2 3 ] → 1st Row
              [ 4 5 6 ] → 2nd Row }
    Row Major = {1,2,3,4,5,6}
    Column Major = {1,4,2,5,3,6}

    Array.Transpose = { [ 1 4 ] → 1st Row
                        [ 2 5 ] → 2nd Row
                        [ 3 6 ] → 3rd Row }
    Row Major = {1,4,2,5,3,6}
    Column Major = {1,2,3,4,5,6}

    The row-major layout of the transpose is the column-major layout of the original, so the same buffer only needs to be read in the other order.
    Transpose does not involve data movement!
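A minimal sketch of the "no data movement" claim, assuming a 2x3 `int64` array like the one on the slide: `a.T` returns a view whose strides are swapped, and a write through the transpose shows up in the original.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)
t = a.T  # transpose: new metadata, same data buffer

shares = np.shares_memory(a, t)
print(shares)                # True
print(a.strides, t.strides)  # (24, 8) (8, 24)

# The original is row-major (C order); its transpose reads the same
# bytes as a column-major (Fortran order) array.
print(a.flags["C_CONTIGUOUS"], t.flags["F_CONTIGUOUS"])  # True True

# Writing through the transpose changes the original array.
t[0, 1] = 40
print(a[1, 0])  # 40
```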
  7. NUMerical PYthon

    NumPy is the fundamental package for scientific computing with Python.
    • Lineage: Numeric → NumArray → NumPy.
    • A dependency of Pandas, SciPy, OpenCV, and scikit-learn.
    • Fast execution on vectorized data.
    • ND-array and high-level mathematical functions.
    • Limitations in terms of modifications (for example, an array has a fixed size once created).
  8. Code Slide Part-1 Strict types
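The code shown on this slide is not part of the transcript. As a stand-in, here is a small sketch of what strict typing means for an ND-array, under the assumption that the demo covered homogeneous dtypes: every element shares one dtype, values assigned into the array are cast to it, and mixed inputs are promoted to a common type at creation.

```python
import numpy as np

# A Python list can hold objects of any type.
mixed_list = [1, 2.5, "three"]

# An ND-array has exactly one dtype for all of its elements.
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype, ints.itemsize)  # int32 4

# Assigning a float into an integer array casts it to the array's dtype.
ints[0] = 7.9
print(ints.tolist())  # [7, 2, 3]

# Mixed numeric input is promoted to a single common dtype.
promoted = np.array([1, 2.5, 3])
print(promoted.dtype)  # float64

# Changing the type is explicit and produces a new array.
as_float = ints.astype(np.float64)
print(as_float.dtype)  # float64
```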

  9. Memory Models and Views

    Two essential components:
    1. Metadata: stores information about the dtype, shape, strides, and row/column-major order.
    2. Data buffer: a contiguous, fixed-size block of data.
    Fig.: https://www.python-course.eu/numpy.php
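The metadata listed above is exposed as attributes on every array. A short sketch, assuming a 3x4 `int32` array chosen for illustration: reshaping changes only the metadata (shape and strides), while the data buffer stays the same.

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

# Metadata: how to interpret the buffer.
print(a.dtype, a.shape, a.strides)  # int32 (3, 4) (16, 4)
print(a.flags["C_CONTIGUOUS"], a.flags["F_CONTIGUOUS"])  # True False

# Data buffer: 12 elements of 4 bytes each.
print(a.nbytes)  # 48

# A reshape of a contiguous array is new metadata over the same buffer.
b = a.reshape(4, 3)
same_buffer = np.shares_memory(a, b)
print(b.shape, b.strides, same_buffer)  # (4, 3) (12, 4) True
```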
  10. Single Data Array, Multiple Views

    [Diagram: three metadata blocks (Meta Data 1, Meta Data 2, Meta Data 3), shown as View 1, View 2, and View 3, all point to one Data Buffer. Reference_Counter = 3.]
  11. Analogous Concept in Lists

    [Diagram: View 1, View 2, and View 3 all refer to one List Object. Reference_Counter = 3.]
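A minimal sketch of the list analogy, with variable names borrowed from the diagram labels: several names can refer to one list object, so a change made through any of them is visible through all. One difference worth noting is that slicing a list produces an independent copy rather than another reference.

```python
data = [1, 2, 3]

# Three names, one list object.
view_1 = data
view_2 = data
view_3 = data
all_same = (view_1 is data) and (view_2 is data) and (view_3 is data)

# A change made through one name is visible through every other name.
view_2.append(4)
print(view_1)    # [1, 2, 3, 4]
print(all_same)  # True

# A list slice is a new list, so changing it leaves the original alone.
list_slice = data[:2]
list_slice[0] = 99
print(data)  # [1, 2, 3, 4]
```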
  12. Code Slide Part-2 Models and Views
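The code from this slide is not in the transcript. The sketch below is an assumption about the kind of demo it contained: basic slicing, `reshape`, and `view()` give views onto one buffer, while indexing with a list of positions gives an independent copy.

```python
import numpy as np

buffer_owner = np.arange(6)

# Three views: different metadata, one shared data buffer.
view_1 = buffer_owner[::2]            # every second element
view_2 = buffer_owner.reshape(2, 3)   # same data seen as 2x3
view_3 = buffer_owner.view()          # same data, new array object

# Indexing with a list of positions copies the selected data.
copy_1 = buffer_owner[[0, 2, 4]]

print(view_1.base is buffer_owner)              # True
print(np.shares_memory(buffer_owner, view_2))   # True
print(np.shares_memory(buffer_owner, copy_1))   # False

# A write through one view is visible through the others, not the copy.
view_1[0] = 100
print(buffer_owner[0], view_2[0, 0], view_3[0], copy_1[0])  # 100 100 100 0
```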

  13. Vectorization

    • Derives from the vector notation of the SIMD architecture.
    • Instead of operating on a single element per loop iteration, operate on multiple elements at once (sometimes even all elements).
    • Makes use of the underlying CPU optimizations for loops.
    • NumPy uses a C-based array implementation.
  14. Vectorization

    Scalar code:
        for (i = 0; i < arr_len; i++)
            C[i] = A[i] * B[i]

    Vector code (ideally n times faster):
        for (i = 0; i < arr_len; i += n)
            C[i:i+n] = A[i:i+n] * B[i:i+n]
            # i.e.
            # C[i]     = A[i]     * B[i]
            # C[i+1]   = A[i+1]   * B[i+1]
            # C[i+2]   = A[i+2]   * B[i+2]
            # ...
            # C[i+n-1] = A[i+n-1] * B[i+n-1]

    n multiplications complete in the time of one scalar multiplication, since n PUs perform the same task simultaneously.
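In NumPy terms, the scalar and vector versions look roughly like the sketch below. The array size and the function names `scalar_multiply` and `vector_multiply` are illustrative choices. Whether the compiled loop actually uses SIMD instructions depends on the NumPy build and the CPU, so the timings are printed without any expected ratio.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
A = rng.random(100_000)
B = rng.random(100_000)


def scalar_multiply(a, b):
    """One element per iteration, driven by the Python interpreter."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i]
    return out


def vector_multiply(a, b):
    """One call; the loop over elements runs inside NumPy's compiled code."""
    return a * b


start = time.perf_counter()
C_scalar = scalar_multiply(A, B)
scalar_seconds = time.perf_counter() - start

start = time.perf_counter()
C_vector = vector_multiply(A, B)
vector_seconds = time.perf_counter() - start

# Both versions compute the same element-wise product.
same_result = bool(np.allclose(C_scalar, C_vector))
print(same_result)
print(f"scalar loop: {scalar_seconds:.4f}s, vectorized: {vector_seconds:.4f}s")
```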
  15. Universal Functions (ufuncs)

    • Perform element-wise operations on all elements of the ND-array.
    • Inherent support for broadcasting, typecasting, and error handling.
    • NumPy ships precompiled C implementations of these functions (they can be user-defined as well).
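Each property named on this slide can be seen with a built-in ufunc. The sketch below uses `np.add` and `np.divide` for broadcasting, typecasting, and error handling, and `np.frompyfunc` as one way to wrap a Python function as a ufunc. The function `clip_to_ten` is a hypothetical example, and a ufunc built this way still calls Python per element, so it does not get the speed of the precompiled ones.

```python
import numpy as np

# Broadcasting: a (3, 1) column and a (3,) row combine into a (3, 3) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = np.add(col, row)
print(grid.tolist())  # [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# Typecasting: int32 + float64 is computed and returned as float64.
mixed = np.add(np.array([1, 2], dtype=np.int32), np.array([0.5, 0.5]))
print(mixed.dtype)  # float64

# Error handling: np.errstate selects what happens on division by zero.
with np.errstate(divide="ignore"):
    ratio = np.divide(np.array([1.0, 2.0]), np.array([0.0, 4.0]))
print(ratio.tolist())  # [inf, 0.5]

try:
    with np.errstate(divide="raise"):
        np.divide(np.array([1.0]), np.array([0.0]))
    raised = False
except FloatingPointError:
    raised = True
print(raised)  # True


# User-defined ufunc (hypothetical example function).
def clip_to_ten(x):
    return min(x, 10)


clip_ufunc = np.frompyfunc(clip_to_ten, 1, 1)  # 1 input, 1 output
clipped = clip_ufunc(np.array([5, 15, 25])).astype(np.int64)
print(clipped.tolist())  # [5, 10, 10]
```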
  16. Code Slide Part-3 Universal Func. Samples Lists vs Nd-Array
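The samples from this slide are not in the transcript. As an illustration of the lists versus ND-array comparison, and not a reconstruction of the original demo, the sketch below shows that the same operators mean different things: on lists `*` and `+` repeat and concatenate, while on arrays they are element-wise ufunc calls.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# '*' repeats a list but multiplies each element of an array.
list_times_two = values_list * 2
array_times_two = values_array * 2
print(list_times_two)            # [1, 2, 3, 1, 2, 3]
print(array_times_two.tolist())  # [2, 4, 6]

# '+' concatenates lists but adds arrays element by element.
list_plus = values_list + [10, 20, 30]
array_plus = values_array + np.array([10, 20, 30])
print(list_plus)            # [1, 2, 3, 10, 20, 30]
print(array_plus.tolist())  # [11, 22, 33]

# Element-wise work on a list needs an explicit Python loop.
list_doubled = [v * 2 for v in values_list]
print(list_doubled)  # [2, 4, 6]
```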

  17. Additional Reading

    • https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.05-Computation-on-arrays-broadcasting.ipynb
    • https://scipy-lectures.org/
    • https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
    • https://www.python-course.eu/numerical_programming_with_python.php
    • https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html
    • https://jalammar.github.io/visual-numpy/
    • https://docs.scipy.org/doc/numpy-1.17.0/reference/
    • https://jakevdp.github.io/PythonDataScienceHandbook/
  18. Thank You

    Twitter: @_themessier
    Email: sarahmasud02@gmail.com