
Accelerating the "slow" Python

Shreyas Bapat
September 04, 2020


Tech talk on Semut.io

Python is a very handy tool when it comes to working with data, but at the same time it can be pretty slow to work with. With its rise in the Scientific Computing world, and languages like Julia also coming in, it's important to understand how one can make Python fast (sometimes even as fast as compiled languages like C). Learn more about Python performance in this tech talk with Shreyas Bapat, B.Tech. IIT Mandi, who is spearheading efficiency initiatives at Semut.io.
Watch this talk here: https://youtu.be/yYWcWQGL-Po

Follow us on our social media channels to stay updated.
Website: https://semut.io
Twitter: https://bit.ly/semut_twitter
LinkedIn: https://bit.ly/semut_linkedin
Youtube: https://bit.ly/semut_youtube
Twitch: https://bit.ly/semut_twitch

Transcript

  1. Summary
     Python is rapidly becoming the language of choice for many people, being adopted in the Scientific Computing community. The AI/Machine Learning community also loves the simplicity of the language, but it all comes at a cost. The question remains whether we can bear the cost, or somehow minimise it. We will discuss 4-5 ways in which we can possibly minimise it.
  2. Outline
     ➔ Is Python really slow?
     ➔ “Why” is Python so slow?
     ➔ Why do I care about Python?
     ➔ What do I do when my code is slow?
  3. Who am I?
     ➔ SDE 1 @ Semut.io
     ➔ Electrical Engineer from IIT Mandi
     ➔ Managing Member @ PSF
     ➔ Lead Developer @ EinsteinPy
     ➔ Passionate about Black Holes and Space (Scientific Computing)
     ➔ Love Mountains, and Himachal Pradesh (India)
  4. Python is slow
     Fortran is 100x faster for this simple task, which computes Fibonacci numbers.
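     The slide's own snippet isn't in the transcript, but a minimal pure-Python version of the kind of Fibonacci loop being benchmarked might look like this (names and sizes are illustrative):

        import time

        def fib(n):
            # Naive iterative Fibonacci: every iteration goes through the
            # interpreter loop and boxed-integer arithmetic.
            a, b = 0, 1
            for _ in range(n):
                a, b = b, a + b
            return a

        start = time.perf_counter()
        for _ in range(100_000):
            fib(30)
        print(f"{time.perf_counter() - start:.3f} s for 100,000 calls")

     A Fortran (or C) version of the same loop works on machine integers directly, which is where the 100x gap on the slide comes from.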
  5. Can we get the best of both worlds?
     ➔ The simplicity and speed of writing Python code
     ➔ Faster execution
  6. Elephant in the room: why is Python slow?
     ➔ Python is slower than C or Fortran for a variety of reasons.
     ➔ Some reasons are well known.
     ➔ Some reasons are often ignored or poorly understood.
     ➔ Let’s dive right in...
  7. 1. Python is dynamically typed
     Until the code actually executes, the interpreter doesn’t know the types of the variables that are defined; every operation has to discover them at runtime. Let’s look at the Python and C versions of a simple addition for comparison:
  8. C addition:
     1. Assign <int> 1 to a
     2. Assign <int> 2 to b
     3. Call binary_add<int, int>(a, b)
     4. Assign the result to c
     Python addition:
     1. Assign 1 to a
        ◦ 1a. Set a->PyObject_HEAD->typecode to integer
        ◦ 1b. Set a->val = 1
     2. Assign 2 to b
        ◦ 2a. Set b->PyObject_HEAD->typecode to integer
        ◦ 2b. Set b->val = 2
     3. Call binary_add(a, b)
        ◦ 3a. Find typecode in a->PyObject_HEAD
        ◦ 3b. a is an integer; value is a->val
        ◦ 3c. Find typecode in b->PyObject_HEAD
        ◦ 3d. b is an integer; value is b->val
        ◦ 3e. Call binary_add<int, int>(a->val, b->val)
        ◦ 3f. The result of this is result, and is an integer
     4. Create a Python object c
        ◦ 4a. Set c->PyObject_HEAD->typecode to integer
        ◦ 4b. Set c->val to result
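     A tiny illustration of the dynamic dispatch those steps are paying for (reconstructed, not the slide’s own snippet):

        # The same "+" is resolved from the runtime types of a and b,
        # which is exactly what the step list above has to pay for.
        a, b = 1, 2
        c = a + b            # integer addition
        a, b = "1", "2"
        c = a + b            # string concatenation, decided at runtime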
  9. 2. Python is interpreted, rather than compiled
     ➔ A smart compiler can look ahead and optimize away repeated and unneeded operations.
     ➔ These optimizations can result in massive speed-ups.
  10. A lot of you may say, “But Python is not that slow, I’ve used it.”
  11. 3. Python’s object model can lead to inefficient memory access
     ➔ While in C you might use a buffer-based array (pretty fast), Python’s list is complicated and, clearly, slow.
     ➔ A NumPy array in its simplest form is a Python object built around a C array, i.e. it has a pointer to a contiguous data buffer of values.
     ➔ A Python list, on the other hand, has a pointer to a contiguous buffer of pointers, each of which points to a Python object which in turn has references to its data (in this case, integers).
     ➔ It’s easy to see how NumPy substantially boosts performance.
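     A rough way to see the memory-layout difference for yourself (timings are machine-dependent and only illustrative):

        import timeit
        import numpy as np

        data_list = list(range(1_000_000))     # buffer of pointers to boxed int objects
        data_array = np.arange(1_000_000)      # one contiguous buffer of machine integers

        print(timeit.timeit(lambda: sum(data_list), number=10))
        print(timeit.timeit(lambda: data_array.sum(), number=10))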
  12. Why am I even using Python?
     ➔ Dynamic typing makes Python easier to use than C.
     ➔ It's extremely flexible and forgiving.
     ➔ This flexibility leads to efficient use of development time.
     ➔ Python offers easy hooks into compiled libraries, for when you need that Fortran-ish performance. (See fortran-magic on PyPI.)
     ➔ Python ends up being an extremely efficient language for the overall task of doing science with code.
     ➔ Meta-programming
  13. Example: K-Means Clustering
     Algorithm:
     1. Choose some cluster centers
     2. Repeat:
        a. Assign points to the nearest center
        b. Update each center to the mean of its points
        c. Check if converged
  18. Pure Python snippet: 7.44 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each), for 5000 points.
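     The snippet itself isn’t in the transcript; a sketch of what the pure-Python version of this K-Means loop can look like (all names and sizes here are illustrative):

        import random

        def assign_labels(points, centers):
            # For each point, find the index of the nearest center (plain loops).
            labels = []
            for p in points:
                best, best_dist = 0, float("inf")
                for j, c in enumerate(centers):
                    d = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                    if d < best_dist:
                        best, best_dist = j, d
                labels.append(best)
            return labels

        def update_centers(points, labels, old_centers):
            # Move each center to the mean of its assigned points.
            new_centers = []
            for j, old in enumerate(old_centers):
                members = [p for p, l in zip(points, labels) if l == j]
                if members:
                    new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
                else:
                    new_centers.append(old)    # keep an empty cluster where it was
            return new_centers

        points = [[random.random(), random.random()] for _ in range(5000)]
        centers = random.sample(points, 4)
        for _ in range(10):                    # fixed iteration count for simplicity
            labels = assign_labels(points, centers)
            centers = update_centers(points, labels, centers)

     Every distance computation above runs through the interpreter, which is where the 7.44 seconds go.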
  19. 2. NumPy Vectorization (DSL): 131 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
     Down from 7.44 seconds to 0.13 seconds (57x faster).
     Key: repeated operations are pushed into a compiled layer, so the Python overhead is paid per array rather than per array element.
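     One common way to vectorise the assignment and update steps with broadcasting (a sketch, not necessarily the exact code from the talk):

        import numpy as np

        points = np.random.rand(5000, 2)
        centers = points[np.random.choice(len(points), 4, replace=False)]

        for _ in range(10):
            # (5000, 4) matrix of squared distances, computed in compiled NumPy code
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # New centers: mean of the points assigned to each label
            # (assumes no cluster ends up empty).
            centers = np.array([points[labels == j].mean(axis=0) for j in range(4)])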
  20. 2. NumPy Vectorization (DSL)
     Advantages:
     ➔ Python overhead per array rather than per array element
     ➔ Compact domain-specific language for array operations
     ➔ NumPy is widely available
     Disadvantages:
     ➔ Batch operations can lead to excessive memory usage
     ➔ Different way of thinking about writing code
     Recommendation: Use NumPy everywhere.
  21. 3. Use specialised data structures
     Some popular examples:
     - Pandas
     - SciPy
     The same operations on a Pandas DataFrame (groupby) and SciPy (KDTree): 102 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
     As compared to:
     - 7.44 seconds in pure Python
     - 131 ms with NumPy
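     A sketch of that combination (illustrative, not the slide’s exact code): scipy.spatial.cKDTree handles the nearest-center query and a pandas groupby computes the new means.

        import numpy as np
        import pandas as pd
        from scipy.spatial import cKDTree

        points = np.random.rand(5000, 2)
        centers = points[np.random.choice(len(points), 4, replace=False)]

        for _ in range(10):
            labels = cKDTree(centers).query(points)[1]      # index of the nearest center
            df = pd.DataFrame(points, columns=["x", "y"])
            df["label"] = labels
            # groupby-mean gives the updated centers (assumes every cluster is non-empty)
            centers = df.groupby("label")[["x", "y"]].mean().to_numpy()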
  22. Some interesting data structures
     ➔ scipy.spatial for spatial queries like distances, nearest neighbors, etc.
     ➔ pandas for SQL-like grouping & aggregation
     ➔ xarray for grouping across multiple dimensions
     ➔ scipy.sparse.csgraph for graph-like problems (e.g. finding shortest paths)
     Recommendation: Use these whenever applicable and possible.
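     As a small taste of the last item, scipy.sparse.csgraph can replace a hand-written shortest-path loop (an illustrative example, not from the talk):

        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import shortest_path

        # Weighted adjacency matrix of a tiny directed graph (0 means "no edge")
        graph = csr_matrix(np.array([
            [0, 1, 4, 0],
            [0, 0, 2, 6],
            [0, 0, 0, 3],
            [0, 0, 0, 0],
        ]))
        dist = shortest_path(graph, method="D")   # Dijkstra from every node
        print(dist[0, 3])                         # shortest distance from node 0 to node 3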
  23. 4. Cython
     A few changes to the code, and it doesn’t quite look like Python any more. Steep learning curve.
  24. 4. Cython: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  25. 4. Cython: 1.72 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) - 56.8x faster than pure Python.
  26. 5. Numba JIT
     ➔ Numba is an open-source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the LLVM compiler project to generate machine code from Python syntax.
     ➔ GPU support.
     ➔ All you have to do is add a @jit(nopython=True) decorator to the function you want to accelerate.
     ➔ A little bit buggy sometimes, but it still works 99% of the time.
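     A minimal sketch of that decorator in use (function and variable names are illustrative, not from the slides):

        import numpy as np
        from numba import jit

        @jit(nopython=True)            # compiled to machine code on the first call
        def nearest_center(points, centers):
            labels = np.empty(points.shape[0], dtype=np.int64)
            for i in range(points.shape[0]):
                best, best_dist = 0, np.inf
                for j in range(centers.shape[0]):
                    d = 0.0
                    for k in range(points.shape[1]):
                        diff = points[i, k] - centers[j, k]
                        d += diff * diff
                    if d < best_dist:
                        best, best_dist = j, d
                labels[i] = best
            return labels

        points = np.random.rand(5000, 2)
        centers = points[:4].copy()
        labels = nearest_center(points, centers)   # first call triggers compilation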
  27. 5. Numba JIT
     Pure Python: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
     Numba JIT: 1.47 ms ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
     66x performance boost with Numba JIT. Again, this comes with a cost.
  28. Problems with using Numba JIT
     ➔ Two modes: “nopython” and “object”; only the former is truly optimised.
     ➔ Functions JITted in nopython mode can only call functions JITted in nopython mode.
     ➔ Avoid “object” mode. It is in the process of being deprecated.
     ➔ Passing functions as arguments is even slower than JITting them, which is a huge blocker.
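     A small illustration of the nopython-calls-nopython rule (names are made up):

        import numpy as np
        from numba import njit

        @njit
        def squared_distance(x0, y0, x1, y1):
            dx, dy = x1 - x0, y1 - y0
            return dx * dx + dy * dy

        @njit
        def total_distance(xs, ys):
            # Fine: a nopython-JITted function calling another nopython-JITted one.
            total = 0.0
            for i in range(len(xs) - 1):
                total += squared_distance(xs[i], ys[i], xs[i + 1], ys[i + 1])
            return total

        print(total_distance(np.arange(5.0), np.arange(5.0)))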
  29. 5. Numba JIT: split the code into two layers.
     ➔ Nice, high-level API: simple structure, but the data structures can be complex; they can be anything.
     ➔ Dangerous algorithms: broken into small chunks which can easily be accelerated using JIT. Avoid object mode.
  30. 5. Numba JIT
     ➔ If things stop working, export NUMBA_DISABLE_JIT=1.
     ➔ You might have to rewrite some stuff, which makes the code a little less dynamic.
     ➔ Numba evolves very fast; keep an eye on their webpage.
  31. Other ways of making Python faster
     - CuPy
     - Dask
     There are no limits to what Python can do, thanks to the meta-programming it makes possible; you can even write such a tool yourself.
     Recommendation: Use NumPy with Numba (best for Scientific Computing).
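     A taste of the two libraries mentioned above (illustrative only; both need separate installation, and CuPy needs a compatible NVIDIA GPU):

        import cupy as cp                      # NumPy-like API that runs on the GPU
        x = cp.random.rand(1_000_000)
        print(float(x.sum()))                  # computed on the device

        import dask.array as da                # chunked, parallel NumPy-like arrays
        y = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
        print(y.mean().compute())              # lazy task graph evaluated in parallel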