
Accelerating the "slow" Python

Shreyas Bapat
September 04, 2020


Tech talk on Semut.io

Python is a very handy tool when it comes to working with data, but at the same time it can be pretty slow to work with. With its rise in the Scientific Computing world, and languages like Julia also coming in, it's important to understand how one can make Python fast (sometimes even as fast as compiled languages like C). Learn more about Python performance in this tech talk with Shreyas Bapat, B.Tech. IIT Mandi, who is spearheading efficiency initiatives at Semut.io.
Watch this talk here: https://youtu.be/yYWcWQGL-Po

Follow us on our social media channels to stay updated.
Website: https://semut.io
Twitter: https://bit.ly/semut_twitter
LinkedIn: https://bit.ly/semut_linkedin
Youtube: https://bit.ly/semut_youtube
Twitch: https://bit.ly/semut_twitch

Transcript

  1. Summary
     Python is rapidly becoming the language of choice for many people, being adopted in the Scientific Computing community. The AI/Machine Learning community also loves the simplicity of the language, but it all comes at a cost. The question remains whether we can bear the cost, or somehow minimise it. We will discuss 4-5 ways in which we can possibly minimise it.
  2. Outline
     ➔ Is Python really slow?
     ➔ “Why” is Python so slow?
     ➔ Why do I care about Python?
     ➔ What do I do when my code is slow?
  3. Who am I?
     ➔ SDE 1 @ Semut.io
     ➔ Electrical Engineer from IIT Mandi
     ➔ Managing Member @ PSF
     ➔ Lead Developer @ EinsteinPy
     ➔ Passionate about Black Holes and Space (Scientific Computing)
     ➔ Love Mountains, and Himachal Pradesh (India)
  4. Python is slow
     Fortran is 100x faster for this simple task, which computes Fibonacci numbers.
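     The slide's own snippet isn't in the transcript, but a minimal pure-Python version of the kind of Fibonacci loop being benchmarked might look like this (names and sizes are illustrative):

        import time

        def fib(n):
            # Naive iterative Fibonacci: every iteration goes through the
            # interpreter loop and boxed-integer arithmetic.
            a, b = 0, 1
            for _ in range(n):
                a, b = b, a + b
            return a

        start = time.perf_counter()
        for _ in range(100_000):
            fib(30)
        print(f"{time.perf_counter() - start:.3f} s for 100,000 calls")

     A Fortran (or C) version of the same loop works on machine integers directly, which is where the 100x gap on the slide comes from.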
  5. Can we get the best of both worlds?
     ➔ The simplicity and speed of writing Python code
     ➔ Faster execution
  6. Elephant in the room: why is Python slow?
     ➔ Python is slower than C or Fortran for a variety of reasons.
     ➔ Some reasons are well known.
     ➔ Some reasons are often ignored or poorly understood.
     ➔ Let’s dive right in...
  7. 1. Python is dynamically typed
     Until the code actually executes, the interpreter doesn’t know the types of the variables that are defined; every operation has to discover them at runtime. Let’s look at the Python and C versions of a simple addition for comparison:
  8. C addition:
     1. Assign <int> 1 to a
     2. Assign <int> 2 to b
     3. Call binary_add<int, int>(a, b)
     4. Assign the result to c
     Python addition:
     1. Assign 1 to a
        ◦ 1a. Set a->PyObject_HEAD->typecode to integer
        ◦ 1b. Set a->val = 1
     2. Assign 2 to b
        ◦ 2a. Set b->PyObject_HEAD->typecode to integer
        ◦ 2b. Set b->val = 2
     3. Call binary_add(a, b)
        ◦ 3a. Find typecode in a->PyObject_HEAD
        ◦ 3b. a is an integer; value is a->val
        ◦ 3c. Find typecode in b->PyObject_HEAD
        ◦ 3d. b is an integer; value is b->val
        ◦ 3e. Call binary_add<int, int>(a->val, b->val)
        ◦ 3f. The result of this is result, and is an integer
     4. Create a Python object c
        ◦ 4a. Set c->PyObject_HEAD->typecode to integer
        ◦ 4b. Set c->val to result
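     A tiny illustration of the dynamic dispatch those steps are paying for (reconstructed, not the slide’s own snippet):

        # The same "+" is resolved from the runtime types of a and b,
        # which is exactly what the step list above has to pay for.
        a, b = 1, 2
        c = a + b            # integer addition
        a, b = "1", "2"
        c = a + b            # string concatenation, decided at runtime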
  9. 2. Python is interpreted, rather than compiled
     ➔ A smart compiler can look ahead and optimize away repeated and unneeded operations.
     ➔ These optimizations can result in massive speed-ups.
  10. A lot of you may say, “But Python is not that slow, I’ve used it.”
  11. 3. Python’s object model can lead to inefficient memory access
     ➔ While in C you might use a buffer-based array (pretty fast), Python’s list is complicated and, clearly, slow.
     ➔ A NumPy array in its simplest form is a Python object built around a C array, i.e. it has a pointer to a contiguous data buffer of values.
     ➔ A Python list, on the other hand, has a pointer to a contiguous buffer of pointers, each of which points to a Python object which in turn has references to its data (in this case, integers).
     ➔ It’s easy to see how NumPy substantially boosts performance.
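     A rough way to see the memory-layout difference for yourself (timings are machine-dependent and only illustrative):

        import timeit
        import numpy as np

        data_list = list(range(1_000_000))     # buffer of pointers to boxed int objects
        data_array = np.arange(1_000_000)      # one contiguous buffer of machine integers

        print(timeit.timeit(lambda: sum(data_list), number=10))
        print(timeit.timeit(lambda: data_array.sum(), number=10))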
  12. Why am I even using Python?
     ➔ Dynamic typing makes Python easier to use than C.
     ➔ It's extremely flexible and forgiving.
     ➔ This flexibility leads to efficient use of development time.
     ➔ Python offers easy hooks into compiled libraries, for when you need that Fortran-ish performance. (See fortran-magic on PyPI.)
     ➔ Python ends up being an extremely efficient language for the overall task of doing science with code.
     ➔ Meta-programming
  13. Example: K-Means Clustering
     Algorithm:
     1. Choose some cluster centers
     2. Repeat:
        a. Assign points to the nearest center
        b. Update each center to the mean of its points
        c. Check if converged
  18. Pure Python snippet: 7.44 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each), for 5000 points.
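     The snippet itself isn’t in the transcript; a sketch of what the pure-Python version of this K-Means loop can look like (all names and sizes here are illustrative):

        import random

        def assign_labels(points, centers):
            # For each point, find the index of the nearest center (plain loops).
            labels = []
            for p in points:
                best, best_dist = 0, float("inf")
                for j, c in enumerate(centers):
                    d = sum((pi - ci) ** 2 for pi, ci in zip(p, c))
                    if d < best_dist:
                        best, best_dist = j, d
                labels.append(best)
            return labels

        def update_centers(points, labels, old_centers):
            # Move each center to the mean of its assigned points.
            new_centers = []
            for j, old in enumerate(old_centers):
                members = [p for p, l in zip(points, labels) if l == j]
                if members:
                    new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
                else:
                    new_centers.append(old)    # keep an empty cluster where it was
            return new_centers

        points = [[random.random(), random.random()] for _ in range(5000)]
        centers = random.sample(points, 4)
        for _ in range(10):                    # fixed iteration count for simplicity
            labels = assign_labels(points, centers)
            centers = update_centers(points, labels, centers)

     Every distance computation above runs through the interpreter, which is where the 7.44 seconds go.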
  19. 2. NumPy Vectorization (DSL): 131 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
     Down from 7.44 seconds to 0.13 seconds (57x faster).
     Key: repeated operations are pushed into a compiled layer, so the Python overhead is paid per array rather than per array element.
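     One common way to vectorise the assignment and update steps with broadcasting (a sketch, not necessarily the exact code from the talk):

        import numpy as np

        points = np.random.rand(5000, 2)
        centers = points[np.random.choice(len(points), 4, replace=False)]

        for _ in range(10):
            # (5000, 4) matrix of squared distances, computed in compiled NumPy code
            dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # New centers: mean of the points assigned to each label
            # (assumes no cluster ends up empty).
            centers = np.array([points[labels == j].mean(axis=0) for j in range(4)])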
  20. 2. NumPy Vectorization (DSL)
     Advantages:
     ➔ Python overhead per array rather than per array element
     ➔ Compact domain-specific language for array operations
     ➔ NumPy is widely available
     Disadvantages:
     ➔ Batch operations can lead to excessive memory usage
     ➔ Different way of thinking about writing code
     Recommendation: Use NumPy everywhere.
  21. 3. Use specialised data structures
     Some popular examples:
     - Pandas
     - SciPy
     The same operations on a Pandas DataFrame (groupby) and SciPy (KDTree): 102 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
     As compared to:
     - 7.44 seconds in pure Python
     - 131 ms with NumPy
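     A sketch of that combination (illustrative, not the slide’s exact code): scipy.spatial.cKDTree handles the nearest-center query and a pandas groupby computes the new means.

        import numpy as np
        import pandas as pd
        from scipy.spatial import cKDTree

        points = np.random.rand(5000, 2)
        centers = points[np.random.choice(len(points), 4, replace=False)]

        for _ in range(10):
            labels = cKDTree(centers).query(points)[1]      # index of the nearest center
            df = pd.DataFrame(points, columns=["x", "y"])
            df["label"] = labels
            # groupby-mean gives the updated centers (assumes every cluster is non-empty)
            centers = df.groupby("label")[["x", "y"]].mean().to_numpy()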
  22. Some interesting data structures
     ➔ scipy.spatial for spatial queries like distances, nearest neighbors, etc.
     ➔ pandas for SQL-like grouping & aggregation
     ➔ xarray for grouping across multiple dimensions
     ➔ scipy.sparse.csgraph for graph-like problems (e.g. finding shortest paths)
     Recommendation: Use these whenever applicable and possible.
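     As a small taste of the last item, scipy.sparse.csgraph can replace a hand-written shortest-path loop (an illustrative example, not from the talk):

        import numpy as np
        from scipy.sparse import csr_matrix
        from scipy.sparse.csgraph import shortest_path

        # Weighted adjacency matrix of a tiny directed graph (0 means "no edge")
        graph = csr_matrix(np.array([
            [0, 1, 4, 0],
            [0, 0, 2, 6],
            [0, 0, 0, 3],
            [0, 0, 0, 0],
        ]))
        dist = shortest_path(graph, method="D")   # Dijkstra from every node
        print(dist[0, 3])                         # shortest distance from node 0 to node 3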
  23. 4. Cython
     A few changes to the code, and it doesn’t quite look like Python any more. Steep learning curve.
  24. 4. Cython: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
  25. 4. Cython: 1.72 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) - 56.8x faster than pure Python.
  26. 5. Numba JIT
     ➔ Numba is an open-source, NumPy-aware optimizing compiler for Python sponsored by Anaconda, Inc. It uses the LLVM compiler project to generate machine code from Python syntax.
     ➔ GPU support.
     ➔ All you have to do is add a @jit(nopython=True) decorator to the function you want to accelerate.
     ➔ A little bit buggy sometimes, but it still works 99% of the time.
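     A minimal sketch of that decorator in use (function and variable names are illustrative, not from the slides):

        import numpy as np
        from numba import jit

        @jit(nopython=True)            # compiled to machine code on the first call
        def nearest_center(points, centers):
            labels = np.empty(points.shape[0], dtype=np.int64)
            for i in range(points.shape[0]):
                best, best_dist = 0, np.inf
                for j in range(centers.shape[0]):
                    d = 0.0
                    for k in range(points.shape[1]):
                        diff = points[i, k] - centers[j, k]
                        d += diff * diff
                    if d < best_dist:
                        best, best_dist = j, d
                labels[i] = best
            return labels

        points = np.random.rand(5000, 2)
        centers = points[:4].copy()
        labels = nearest_center(points, centers)   # first call triggers compilation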
  27. 5. Numba JIT
     Pure Python: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
     Numba JIT: 1.47 ms ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
     66x performance boost with Numba JIT. Again, this comes with a cost.
  28. Problems with using Numba JIT
     ➔ Two modes: “nopython” and “object”; only the former is truly optimised.
     ➔ Functions JITted in nopython mode can only call functions JITted in nopython mode.
     ➔ Avoid “object” mode. It is in the process of being deprecated.
     ➔ Passing functions as arguments is even slower than JITting them, which is a huge blocker.
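     A small illustration of the nopython-calls-nopython rule (names are made up):

        import numpy as np
        from numba import njit

        @njit
        def squared_distance(x0, y0, x1, y1):
            dx, dy = x1 - x0, y1 - y0
            return dx * dx + dy * dy

        @njit
        def total_distance(xs, ys):
            # Fine: a nopython-JITted function calling another nopython-JITted one.
            total = 0.0
            for i in range(len(xs) - 1):
                total += squared_distance(xs[i], ys[i], xs[i + 1], ys[i + 1])
            return total

        print(total_distance(np.arange(5.0), np.arange(5.0)))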
  29. 5. Numba JIT: split the code into two layers.
     ➔ Nice, high-level API: simple structure, but the data structures can be complex; they can be anything.
     ➔ Dangerous algorithms: broken into small chunks which can easily be accelerated using JIT. Avoid object mode.
  30. 5. Numba JIT
     ➔ If things stop working, export NUMBA_DISABLE_JIT=1.
     ➔ You might have to rewrite some stuff, which makes the code a little less dynamic.
     ➔ Numba evolves very fast; keep an eye on their webpage.
  31. Other ways of making Python faster
     - CuPy
     - Dask
     There are no limits to what Python can do, thanks to the meta-programming it makes possible; you can even write such a tool yourself.
     Recommendation: Use NumPy with Numba (best for Scientific Computing).
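     A taste of the two libraries mentioned above (illustrative only; both need separate installation, and CuPy needs a compatible NVIDIA GPU):

        import cupy as cp                      # NumPy-like API that runs on the GPU
        x = cp.random.rand(1_000_000)
        print(float(x.sum()))                  # computed on the device

        import dask.array as da                # chunked, parallel NumPy-like arrays
        y = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
        print(y.mean().compute())              # lazy task graph evaluated in parallel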