Sarah_Masud_PyD19.pdf

The Magic of NumPy
Sarah Masud
Github: sara-02

Overview
• Refresher
  ◦ SIMD
  ◦ Row Major/Column Major
• Why do we need NumPy? What is NumPy?
• What makes NumPy effective
  ◦ Strict types
  ◦ Memory Models and Views
  ◦ Vectorization and Universal Functions

SIMD: Single Instruction Multiple Data
• A class of parallel computing.
• Data-controlled parallelization.
• A control unit issues the same instruction to multiple Processing Units (PUs).
• A mask bit in each PU enables conditional (logical) operation.
• Every core has its own independent SIMD execution units.
Fig.: http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/architecture/slide6.html

Fig.: SIMD Architecture (http://www.new-npac.org/projects/cdroms/cewes-1999-06-vol1/nhse/hpccsurvey/architecture/slide6.html)
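The mask-bit idea can be loosely illustrated in NumPy with the `where=` argument of a ufunc. This is only an analogy at the Python level, not a statement about which hardware instructions NumPy emits; the array names and values are made up for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4], dtype=np.int64)
b = np.array([10, 20, 30, 40], dtype=np.int64)

# Hypothetical mask: True means "this lane executes the instruction".
mask = np.array([True, False, True, False])

# Start from a copy of `a`; only positions where mask is True are overwritten.
out = a.copy()
np.add(a, b, out=out, where=mask)

print(out)  # masked-off positions keep their original value from `a`
```

One instruction (`add`) is applied to every element, and the mask decides which elements actually take the result.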

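Row/Column Major
It is just a matter of perspective ;)
• In row-major order, the elements of a row are contiguous, so stepping across the columns of a row is faster.
• In column-major order, the elements of a column are contiguous, so stepping down the rows of a column is faster.
• Row-major offset of element (r, c) = r*NumCols + c
• Column-major offset of element (r, c) = r + c*NumRows
Fig.: https://en.wikipedia.org/wiki/Row-_and_column-major_order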
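The two offset formulas can be checked against NumPy's own flattening. This is a minimal sketch; the helper names `row_major_offset` and `col_major_offset` are hypothetical and exist only for this illustration.

```python
import numpy as np

def row_major_offset(r, c, num_cols):
    """Position of element (r, c) in a row-major (C order) flat buffer."""
    return r * num_cols + c

def col_major_offset(r, c, num_rows):
    """Position of element (r, c) in a column-major (Fortran order) flat buffer."""
    return r + c * num_rows

arr = np.array([[1, 2, 3],
                [4, 5, 6]])
num_rows, num_cols = arr.shape

flat_c = arr.ravel(order="C")  # row-major flattening
flat_f = arr.ravel(order="F")  # column-major flattening

r, c = 0, 2  # element with value 3
print(flat_c[row_major_offset(r, c, num_cols)])
print(flat_f[col_major_offset(r, c, num_rows)])
```

Both lookups land on the same element, even though the offsets differ (2 in row-major, 4 in column-major).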

Row/Column Major
Array = {
  [ 1 2 3 ] → 1st Row
  [ 4 5 6 ] → 2nd Row
}
Row Major = {1,2,3,4,5,6}
Column Major = {1,4,2,5,3,6}
Array.Transpose = {
  [ 1 4 ] → 1st Row
  [ 2 5 ] → 2nd Row
  [ 3 6 ] → 3rd Row
}
Row Major = {1,4,2,5,3,6}
Column Major = {1,2,3,4,5,6}
Transpose does not involve data movement!
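A small sketch of the "no data movement" claim, assuming a 64-bit integer dtype so the stride values are predictable: the transpose only swaps the strides in the metadata and keeps pointing at the same buffer.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.int64)
t = a.T  # transpose: new metadata, same data buffer

print(a.strides, t.strides)       # strides are swapped, in bytes
print(np.shares_memory(a, t))     # both arrays use one buffer
print(t.flags["F_CONTIGUOUS"])    # the transpose reads as column-major

# Writing through the original is visible through the transpose.
a[0, 1] = 20
print(t[1, 0])
```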

NUMerical PYthon
NumPy is the fundamental package for scientific computing with Python.
• Numeric -- NumArray -- NumPy.
• Used by Pandas, SciPy, OpenCV, scikit-learn.
• Fast execution on vectorized data.
• ND-array and high-level mathematical functions.
• Limitations in terms of modifications (arrays have a fixed size and a single dtype).

Code Slide Part-1 Strict types
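The original code slide is not part of this transcript. As a stand-in, here is a minimal sketch of what strict typing means in practice, with made-up values: a list holds arbitrary objects, while an array has one dtype that every element is converted to.

```python
import numpy as np

# A Python list can mix types freely.
mixed_list = [1, 2.5, "three"]

# A NumPy array has exactly one dtype for all elements.
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.dtype, ints.itemsize)  # every element takes 4 bytes

# Assigning a float into an int32 array converts it to the array's dtype:
# the fractional part is dropped, the dtype does not change.
ints[0] = 3.7
print(ints)

# Without an explicit dtype, NumPy picks one type that can hold all inputs.
promoted = np.array([1, 2.5])
print(promoted.dtype)
```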

Memory Models and Views
Fig.: https://www.python-course.eu/numpy.php
Two Essential Components:
1. Metadata: stores information about dtype, shape, strides, and row/column-major order.
2. A contiguous, fixed-size data block, referred to as the data buffer.
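The metadata fields listed above can be inspected directly on any array. A minimal sketch, assuming an `int32` array so the byte counts are predictable:

```python
import numpy as np

a = np.arange(12, dtype=np.int32).reshape(3, 4)

# Metadata: describes how to interpret the buffer.
print(a.dtype)                    # element type
print(a.shape)                    # (rows, cols)
print(a.strides)                  # bytes to step along each axis
print(a.flags["C_CONTIGUOUS"])    # row-major layout?

# Data buffer: one fixed-size block of 12 elements * 4 bytes.
print(a.nbytes)

# The same values stored column-major get different strides.
f = np.asfortranarray(a)
print(f.strides, f.flags["F_CONTIGUOUS"])
```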

Single Data Array, Multiple Views
[Figure: View 1, View 2 and View 3 each have their own metadata (Meta Data 1, Meta Data 2, Meta Data 3) and point to one shared Data Buffer; Reference_Counter = 3.]

Analogous Concept in Lists
[Figure: View 1, View 2 and View 3 all refer to one List Object; Reference_Counter = 3.]
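A rough sketch of the list analogy, with made-up variable names: several names can refer to one list object, so a change made through one name is visible through all of them. Note that the analogy stops at slicing, since a list slice is a new list rather than a view.

```python
# Three names bound to a single list object.
view_1 = [1, 2, 3]
view_2 = view_1
view_3 = view_1

view_2.append(4)
print(view_1, view_3)        # the change shows up under every name
print(view_1 is view_3)      # same object, not equal copies

# Unlike a NumPy slice, a list slice copies the selected elements.
part = view_1[:2]
part[0] = 99
print(view_1[0])             # original list is unchanged
```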

Code Slide Part-2 Models and Views
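The original code slide is not part of this transcript. As a stand-in, a minimal sketch of views versus copies, using illustrative values:

```python
import numpy as np

base = np.arange(6)          # owns its data buffer

# Basic slicing returns a view: new metadata, same buffer.
view = base[1:4]
view[0] = 100
print(base)                  # the write is visible in the base array
print(view.base is base)     # the view remembers whose buffer it uses

# reshape on a contiguous array also returns a view.
reshaped = base.reshape(2, 3)
print(np.shares_memory(base, reshaped))

# .copy() allocates a new buffer, so writes stay local.
copied = base[1:4].copy()
copied[0] = -1
print(base[1], copied.base)
```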

Vectorization
• Derives from the vector notation of SIMD architecture.
• Instead of operating on a single element at a time in one loop, operate on multiple elements (sometimes even all elements) at once.
• Makes use of the underlying CPU optimizations for loops.
• NumPy uses a C-based array implementation.

Vectorization
Scalar code:
    for (i = 0; i < arr_len; i++):
        C[i] = A[i]*B[i]
Vector code (ideally n times faster):
    for (i = 0; i < arr_len; i += n):
        C[i:i+n] = A[i:i+n] * B[i:i+n]
        # i.e.
        # C[i]     = A[i]*B[i]
        # C[i+1]   = A[i+1]*B[i+1]
        # C[i+2]   = A[i+2]*B[i+2]
        # ...
        # C[i+n-1] = A[i+n-1]*B[i+n-1]
The n element-wise multiplications of one vector step complete in the same time as one scalar multiplication, since n PUs perform the same task simultaneously.
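The pseudocode above can be mirrored in NumPy. This is a sketch of the three shapes of the loop, not a benchmark: the chunk size `n = 4` is an arbitrary stand-in for the vector width, and the Python-level chunking only imitates what the hardware does.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(1000)
B = rng.random(1000)

# Scalar code: one multiplication per loop iteration.
C_scalar = np.empty_like(A)
for i in range(len(A)):
    C_scalar[i] = A[i] * B[i]

# Chunked code: n multiplications per iteration (n is an assumed width).
n = 4
C_chunked = np.empty_like(A)
for i in range(0, len(A), n):
    C_chunked[i:i + n] = A[i:i + n] * B[i:i + n]

# Vectorized code: the loop runs inside NumPy's compiled code.
C_vector = A * B

print(np.allclose(C_scalar, C_vector), np.allclose(C_chunked, C_vector))
```

All three produce the same result; the difference is where the loop runs and how many elements each step handles.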

Universal Functions (ufunc)
• Perform element-wise operations on all elements of the ND-array.
• Inherent support for broadcasting, typecasting, and error handling.
• NumPy has pre-compiled implementations of these functions in C (they can be user-defined as well).
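Each of the ufunc properties listed above can be seen in a few lines. A minimal sketch with made-up values; `clip_add` is a hypothetical user-defined function wrapped with `np.frompyfunc`, which is one of several ways to define a ufunc and returns object arrays.

```python
import numpy as np

print(isinstance(np.add, np.ufunc))  # np.add is a built-in ufunc

# Broadcasting: the 1-D row is applied to every row of the 2-D array.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
row = np.array([10, 20, 30])
broadcast = np.add(a, row)

# Typecasting: int32 + float64 is computed and returned as float64.
cast = np.add(np.array([1, 2], dtype=np.int32), np.array([0.5, 0.5]))

# Error handling: np.errstate controls how division by zero is reported.
with np.errstate(divide="ignore"):
    div = np.divide(np.array([1.0, 2.0]), np.array([0.0, 2.0]))

# User-defined ufunc (hypothetical example): add, then cap the result at 10.
clip_add = np.frompyfunc(lambda x, y: min(x + y, 10), 2, 1)
user = clip_add(np.array([1, 8]), np.array([2, 5]))

print(broadcast, cast, div, user, sep="\n")
```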

Code Slide Part-3 Universal Func. Samples Lists vs Nd-Array
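The original code slide is not part of this transcript. As a stand-in, a short sketch of how the same operators behave on a list and on an ND-array, using illustrative values:

```python
import numpy as np

py_list = [1, 2, 3]
arr = np.array([1, 2, 3])

# On lists, + concatenates and * repeats.
list_sum = py_list + py_list
list_mul = py_list * 2

# On arrays, + and * are ufuncs applied element by element.
arr_sum = arr + arr
arr_mul = arr * 2

# Element-wise work on a list needs an explicit loop or comprehension.
list_doubled = [x * 2 for x in py_list]

print(list_sum, list_mul)
print(arr_sum, arr_mul, list_doubled)
```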

Additional Reading
• https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.05-Computation-on-arrays-broadcasting.ipynb
• https://scipy-lectures.org/
• https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
• https://www.python-course.eu/numerical_programming_with_python.php
• https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html
• https://jalammar.github.io/visual-numpy/
• https://docs.scipy.org/doc/numpy-1.17.0/reference/
• https://jakevdp.github.io/PythonDataScienceHandbook/

Thank You
Twitter: @_themessier
Email: sarahmasud02@gmail.com
