Slide 1

Slide 1 text

/semut-io /Semutio /semut_io /semut.io

Slide 2

Slide 2 text

Summary
Python is rapidly becoming the language of choice for many people and is being widely adopted in the scientific computing community. The AI/machine learning community also loves the simplicity of the language, but that simplicity comes at a cost. The question is whether we can bear that cost or somehow minimise it. We will discuss 4-5 ways in which we can minimise it.

Slide 3

Slide 3 text

Outline
➔ Is Python really slow?
➔ “Why” is Python so slow?
➔ Why do I care about Python?
➔ What do I do when my code is slow?

Slide 4

Slide 4 text

Who am I?
➔ SDE 1 @ Semut.io
➔ Electrical Engineer from IIT Mandi
➔ Managing Member @ PSF
➔ Lead Developer @ EinsteinPy
➔ Passionate about Black Holes and Space (Scientific Computing)
➔ Love Mountains, and Himachal Pradesh (India)

Slide 5

Slide 5 text

Python is fast
Python vs. Java

Slide 6

Slide 6 text

Python is slow
CPython has a constant overhead per operation

Slide 7

Slide 7 text

Python is slow

Slide 8

Slide 8 text

Python is slow
Fortran is 100x faster for this simple task, which computes Fibonacci numbers.

Slide 9

Slide 9 text

Can we get the best of both worlds?
➔ The simplicity and speed of writing Python code
➔ Faster execution

Slide 10

Slide 10 text

Elephant in the room: Why is Python slow?
➔ Python is slower than C or Fortran for a variety of reasons.
➔ Some reasons are well known.
➔ Some reasons are often ignored or little known.
➔ Let’s dive right in...

Slide 11

Slide 11 text

1. Python is dynamically typed
At the time of execution, the interpreter doesn’t know the types of the variables that are defined. Let’s look at the steps for the same addition in C and in Python and compare:

Slide 12

Slide 12 text

C Addition
1. Assign 1 to a
2. Assign 2 to b
3. call binary_add(a, b)
4. Assign the result to c

Python Addition
1. Assign 1 to a
   ○ 1a. Set a->PyObject_HEAD->typecode to integer
   ○ 1b. Set a->val = 1
2. Assign 2 to b
   ○ 2a. Set b->PyObject_HEAD->typecode to integer
   ○ 2b. Set b->val = 2
3. call binary_add(a, b)
   ○ 3a. find typecode in a->PyObject_HEAD
   ○ 3b. a is an integer; value is a->val
   ○ 3c. find typecode in b->PyObject_HEAD
   ○ 3d. b is an integer; value is b->val
   ○ 3e. call binary_add(a->val, b->val)
   ○ 3f. result of this is result, and is an integer.
4. Create a Python object c
   ○ 4a. set c->PyObject_HEAD->typecode to integer
   ○ 4b. set c->val to result
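The run-time type lookups listed above (the steps that touch PyObject_HEAD) can be observed from Python itself. This is a minimal sketch, not taken from the slides; the function name `add` is only illustrative.

```python
def add(a, b):
    # The same source line serves every type: the interpreter inspects
    # the operand types at run time and only then picks an implementation.
    return a + b

print(add(1, 2))          # integer addition
print(add(1.5, 2.5))      # float addition
print(add("py", "thon"))  # string concatenation

# Roughly what gets resolved for two ints: look up the type, then call
# that type's addition routine.
print(type(1).__add__(1, 2))
```

A C compiler settles which addition to use once, at compile time; here the decision is repeated on every call.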

Slide 13

Slide 13 text

2. Python is interpreted, rather than compiled
➔ A smart compiler can look ahead and optimize for repeated and unneeded operations.
➔ These can result in massive speed-ups.
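One example of a "repeated operation" is a loop-invariant computation. The sketch below is an illustration with hypothetical function names, not code from the talk. The first version recomputes the same value on every iteration; the second does by hand what an optimizing compiler for a static language could do automatically.

```python
import math

def scaled_naive(values, angle):
    out = []
    for v in values:
        # math.sin(angle) never changes inside the loop, but the
        # interpreter evaluates it again on every iteration.
        out.append(v * math.sin(angle))
    return out

def scaled_hoisted(values, angle):
    # Manual hoisting: compute the invariant factor once.
    factor = math.sin(angle)
    return [v * factor for v in values]

print(scaled_naive([1.0, 2.0, 3.0], 0.5) == scaled_hoisted([1.0, 2.0, 3.0], 0.5))
```

CPython cannot safely hoist the call itself, because `math.sin` could be rebound to another function while the loop is running.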

Slide 14

Slide 14 text

A lot of you may say, “But Python is not that slow, I’ve used it.”

Slide 15

Slide 15 text

3. Python’s object model can lead to inefficient memory access
➔ While in C you might use a buffer-based array (pretty fast), Python’s list is more complicated and clearly slower.
➔ A NumPy array in its simplest form is a Python object built around a C array, i.e. it has a pointer to a contiguous data buffer of values.
➔ A Python list, on the other hand, has a pointer to a contiguous buffer of pointers, each of which points to a Python object which in turn has references to its data (in this case, integers).
➔ It’s easy to see how NumPy substantially boosts the performance.
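The difference in layout shows up in memory usage. The sketch below is an illustration only: it uses the standard library's `array` type as a stand-in for a NumPy array, since both store raw values in one contiguous buffer. The variable names are hypothetical.

```python
import sys
from array import array

n = 1000
as_list = list(range(n))
as_array = array("q", range(n))  # contiguous buffer of 8-byte signed integers

# The list holds n pointers; every int is a separate object on the heap.
list_total = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
# The array holds the raw values themselves in a single block.
array_total = sys.getsizeof(as_array)

print(list_total, array_total)
```

Walking the list means following one pointer per element, while walking the array is a linear scan over adjacent values.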

Slide 16

Slide 16 text

3. Python’s object model can lead to inefficient memory access

Slide 17

Slide 17 text

Why am I even using Python?
➔ Dynamic typing makes Python easier to use than C.
➔ It's extremely flexible and forgiving.
➔ This flexibility leads to efficient use of development time.
➔ Python offers easy hooks into compiled libraries, for when you need that Fortran-ish performance. (See fortran-magic on PyPI)
➔ Python ends up being an extremely efficient language for the overall task of doing science with code.
➔ Meta-Programming

Slide 18

Slide 18 text

Example: K-Means Clustering
Algorithm:
1. Choose some Cluster Centers
2. Repeat:
   a. Assign points to nearest center
   b. Update center to mean of points
   c. Check if Converged
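The steps above can be sketched in pure Python as follows. This is a minimal illustration written for this text, not the implementation linked under Resources. The names `dist2` and `kmeans`, the parameters `max_iter` and `seed`, and the sample points are all assumptions.

```python
import random

def dist2(p, q):
    # Squared Euclidean distance between two points.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # 1. choose some cluster centers
    labels = []
    for _ in range(max_iter):                # 2. repeat
        # a. assign each point to its nearest center
        labels = [min(range(k), key=lambda j: dist2(p, centers[j]))
                  for p in points]
        # b. update each center to the mean of its points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep old center
        # c. check if converged
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Every distance and every mean goes through the interpreter one element at a time, which is the overhead the later slides try to remove.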

Slide 23

Slide 23 text

Pure Python Snippet
7.44 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each), 5000 points

Slide 24

Slide 24 text

What can you do when Python is too slow?

Slide 25

Slide 25 text

1. Line Profiling
“Premature optimization is the root of all evil” ~ Donald Knuth
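Measuring first is what keeps optimization from being premature. The slide names line profiling; the sketch below swaps in the standard library's `cProfile`, which reports time per function rather than per line, so that it runs without extra packages. The functions `slow_part`, `fast_part` and `work` are hypothetical stand-ins for real code.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Deliberately element-by-element.
    return sum(i * i for i in range(n))

def fast_part(n):
    # Closed form, effectively free.
    return n * (n - 1) // 2

def work():
    slow_part(200_000)
    fast_part(200_000)

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Collect the five most expensive entries by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report should show almost all of the time under `slow_part`, which tells you where an optimization would pay off.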

Slide 26

Slide 26 text

Repeated operations on arrays cause a lot of delay in computations.

Slide 27

Slide 27 text

2. Numpy Vectorization (DSL)

Slide 28

Slide 28 text

2. Numpy Vectorization (DSL)

Slide 29

Slide 29 text

2. Numpy Vectorization (DSL)

Slide 30

Slide 30 text

2. Numpy Vectorization (DSL)
131 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Down from 7.44 seconds to 0.13 seconds (57X faster)
Key: Repeated operations are pushed into a compiled layer, so the Python overhead is paid per array rather than per array element.
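To make "per array rather than per element" concrete, here is a sketch of the k-means assignment step written both ways. It is an illustration under assumed names (`assign_loop`, `assign_vectorized`) and random test data, not the code timed above.

```python
import numpy as np

def assign_loop(points, centers):
    # Element-by-element: interpreter overhead on every arithmetic step.
    labels = []
    for p in points:
        best, best_d = 0, float("inf")
        for j, c in enumerate(centers):
            d = sum((a - b) ** 2 for a, b in zip(p, c))
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels

def assign_vectorized(points, centers):
    # Broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d); the loops run in C.
    diff = points[:, None, :] - centers[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)   # (n, k) squared distances
    return dist2.argmin(axis=1)       # index of the nearest center per point

rng = np.random.default_rng(0)
points = rng.random((500, 2))
centers = rng.random((3, 2))
print(list(assign_vectorized(points, centers)) == assign_loop(points, centers))
```

The vectorized version allocates an intermediate array of shape (n, k, d), which is the kind of batch-operation memory cost that comes with this style.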

Slide 31

Slide 31 text

2. Numpy Vectorization (DSL)
Advantages:
➔ Python overhead per array rather than per array element
➔ Compact domain specific language for array operations
➔ NumPy is widely available
Disadvantages:
➔ Batch operations can lead to excessive memory usage
➔ Different way of thinking about writing code
Recommendation: Use NumPy everywhere

Slide 32

Slide 32 text

3. Use specialised Data Structures
Some popular examples:
- Pandas
- Scipy
Same operations on a Pandas DataFrame (groupby) and Scipy (KDTree):
102 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
As compared to:
- 7.44 seconds in pure Python
- 131 ms with NumPy
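One way the two libraries could fit together for a single k-means iteration is sketched below. It assumes SciPy and pandas are installed; the function name `kmeans_step` and the sample data are hypothetical, and this is not the code that produced the timing above.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

def kmeans_step(points, centers):
    # Nearest center for every point, using a k-d tree built on the centers.
    _, labels = KDTree(centers).query(points)
    # New centers: mean of the coordinates, grouped by assigned label.
    # A center that received no points is simply absent from the result.
    new_centers = pd.DataFrame(points).groupby(labels).mean().to_numpy()
    return labels, new_centers

points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, new_centers = kmeans_step(points, centers)
print(labels, new_centers)
```

The tree avoids computing every point-to-center distance, and `groupby` replaces the hand-written loop over clusters.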

Slide 33

Slide 33 text

Some interesting Data Structures
➔ scipy.spatial for spatial queries like distances, nearest neighbors, etc.
➔ pandas for SQL-like grouping & aggregation
➔ xarray for grouping across multiple dimensions
➔ scipy.sparse.csgraph for graph-like problems (e.g. finding shortest paths)
Recommendation: Use whenever applicable and possible

Slide 34

Slide 34 text

4. Cython
Needs only a few changes to the code, but the result doesn’t quite look like Python. Steep learning curve.

Slide 35

Slide 35 text

4. Cython
Pure Python baseline: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Slide 36

Slide 36 text

4. Cython
1.72 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
56.8X faster than pure Python

Slide 37

Slide 37 text

5. Numba JIT
➔ Numba is an open source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the LLVM compiler project to generate machine code from Python syntax.
➔ GPU support
➔ All you have to do is add a @jit(nopython=True) decorator to the function you want to accelerate.
➔ A little bit buggy sometimes, but still works 99% of the time.
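A sketch of what the decorator looks like in use, on the k-means assignment step. The function name `nearest_center` and the sample data are hypothetical. The `try`/`except` fallback is an addition for this illustration, so the code still runs, uncompiled, where Numba is not installed.

```python
import numpy as np

try:
    from numba import njit   # njit is shorthand for jit(nopython=True)
except ImportError:
    def njit(func):          # fallback: plain Python, no compilation
        return func

@njit
def nearest_center(points, centers):
    n = points.shape[0]
    k = centers.shape[0]
    dim = points.shape[1]
    labels = np.empty(n, dtype=np.int64)
    # Explicit loops are fine here: Numba compiles them to machine code.
    for i in range(n):
        best = 0
        best_d = np.inf
        for j in range(k):
            d = 0.0
            for t in range(dim):
                diff = points[i, t] - centers[j, t]
                d += diff * diff
            if d < best_d:
                best = j
                best_d = d
        labels[i] = best
    return labels

points = np.array([[0.0, 0.0], [9.0, 9.0]])
centers = np.array([[1.0, 1.0], [10.0, 10.0]])
print(nearest_center(points, centers))
```

The first call includes compilation time, so timings are only meaningful from the second call onward.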

Slide 38

Slide 38 text

5. Numba JIT

Slide 39

Slide 39 text

5. Numba JIT
Pure Python: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba JIT: 1.47 ms ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
66X performance boost with Numba JIT
Again, this comes with a cost.

Slide 40

Slide 40 text

Problems with using Numba JIT
➔ Two modes: “nopython” and “object”; only the former is truly optimised.
➔ Functions JITted in nopython mode can only call functions JITted in nopython mode.
➔ Avoid “object” mode. It is in the process of being deprecated.
➔ Passing functions as arguments is even slower than JITing them, which is a huge blocker.

Slide 41

Slide 41 text

5. Numba JIT
High level API: Simple structure, but complex data structures. Can be anything.
Dangerous Algorithms: Broken into small chunks which can easily be accelerated using JIT. Avoid object mode.
[Diagram labels: Nice, High level API; Dangerous Algorithms]

Slide 42

Slide 42 text

5. Numba JIT
➔ If things stop working, run export NUMBA_DISABLE_JIT=1 to turn JIT compilation off.
➔ You might have to rewrite some stuff, which makes the code a little less dynamic.
➔ Numba evolves very fast; keep an eye on their webpage.

Slide 43

Slide 43 text

Other ways of making Python faster
- CuPy
- Dask
There are no limits to what Python can do, thanks to the meta-programming it makes possible. You can even write such a tool yourself.
Recommendation: Use NumPy with Numba (best for scientific computing)

Slide 44

Slide 44 text

Resources
Pure Python Implementation: https://gist.github.com/shreyasbapat/bbc8f468dacbeb5b9baf7ead2c0d5077
Complete Notebook: https://gist.github.com/shreyasbapat/b7135d38273f72c6ce827256f4daedeb

Slide 45

Slide 45 text

Thanks for joining!