Accelerating the "slow" Python

Shreyas Bapat
September 04, 2020

Tech talk on Semut.io

Python is a very handy tool for working with data, but it can also be pretty slow. With its rise in the scientific computing world, and with languages like Julia arriving, it's important to understand how to make Python fast (sometimes even as fast as compiled languages like C). Learn more about Python performance in this tech talk with Shreyas Bapat, B.Tech., IIT Mandi, who is spearheading efficiency initiatives at Semut.io.
Watch this talk here: https://youtu.be/yYWcWQGL-Po

Follow us on our social media channels to stay updated.
Website: https://semut.io
Twitter: https://bit.ly/semut_twitter
LinkedIn: https://bit.ly/semut_linkedin
Youtube: https://bit.ly/semut_youtube
Twitch: https://bit.ly/semut_twitch

Transcript

  1. /semut-io /Semutio /semut_io /semut.io


  2. Summary
    Python is rapidly becoming the language of choice for many people and is
    being widely adopted in the scientific computing community.
    The AI/machine learning community also loves the simplicity of the language,
    but that simplicity comes at a cost.
    The question is whether we can bear that cost or somehow minimise it.
    We will discuss 4-5 ways in which we might minimise it.

  3. Outline
    ➔ Is Python really slow?
    ➔ “Why” is Python so slow?
    ➔ Why do I care about Python?
    ➔ What do I do when my code is slow?

  4. Who am I?
    ➔ SDE 1 @ Semut.io
    ➔ Electrical Engineer from IIT Mandi
    ➔ Managing Member @ PSF
    ➔ Lead Developer @ EinsteinPy
    ➔ Passionate about Black Holes and
    Space (Scientific Computing)
    ➔ Love Mountains, and Himachal
    Pradesh (India)

  5. Python is fast
    Python vs. Java

  6. Python is slow
    CPython has constant overhead per operation

  7. Python is slow

  8. Python is slow
    For this simple task of computing Fibonacci numbers, Fortran is 100x
    faster.

  9. Can we get the best of both worlds?
    ➔ The simplicity and speed of writing Python Code
    ➔ Faster Execution

  10. Elephant in the room: Why is Python slow?
    ➔ Python is slower than C or Fortran for a variety of reasons.
    ➔ Some reasons are well known
    ➔ Some reasons are often ignored or little known
    ➔ Let’s dive right in...

  11. 1. Python is dynamically typed
    At execution time, the interpreter doesn’t know the types of the variables
    that are defined.
    Let’s look at both Python and C code for comparison:

  12. C Addition
    1. Assign 1 to a
    2. Assign 2 to b
    3. call binary_add(a, b)
    4. Assign the result to c
    Python Addition
    1. Assign 1 to a
    ○ 1a. Set a->PyObject_HEAD->typecode to integer
    ○ 1b. Set a->val = 1
    2. Assign 2 to b
    ○ 2a. Set b->PyObject_HEAD->typecode to integer
    ○ 2b. Set b->val = 2
    3. call binary_add(a, b)
    ○ 3a. find typecode in a->PyObject_HEAD
    ○ 3b. a is an integer; value is a->val
    ○ 3c. find typecode in b->PyObject_HEAD
    ○ 3d. b is an integer; value is b->val
    ○ 3e. call binary_add(a->val, b->val)
    ○ 3f. result of this is result, and is an integer.
    4. Create a Python object c
    ○ 4a. set c->PyObject_HEAD->typecode to integer
    ○ 4b. set c->val to result
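To make the per-operation type lookup concrete, here is a small sketch. The function name `add` and the sample values are illustrative and not taken from the talk. It shows that the same `+` is resolved from the runtime types of its operands, and it prints the bytecode CPython executes for the function; the exact instruction names differ between Python versions.

```python
import dis


def add(x, y):
    # The interpreter only learns the operand types when this line runs.
    return x + y


# One function, four different meanings of "+", chosen at run time.
results = [add(1, 2), add(1.5, 2), add("1", "2"), add([1], [2])]
print(results)

# Types belong to the objects, not to the variable names.
a, b = 1, 2
print(type(a).__name__, type(b).__name__)

# Disassemble to see the generic "add" instruction the interpreter dispatches on.
dis.dis(add)
```

A C compiler would pick the integer addition once, at compile time; here the choice is repeated on every call.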

  13. 2. Python is interpreted, rather than compiled
    ➔ A smart compiler can look ahead and optimize away repeated and unneeded
    operations.
    ➔ This can result in massive speed-ups.

  14. A lot of you may say,
    “But Python is not that slow, I’ve used it.”

  15. 3. Python’s object model can lead to inefficient memory access
    ➔ In C, you might use a buffer-based array (pretty fast); Python’s list is
    more complicated and, as a result, slower.
    ➔ A NumPy array in its simplest form is a Python object built around a C
    array, i.e. it has a pointer to a contiguous data buffer of values.
    ➔ A Python list, on the other hand, has a pointer to a contiguous buffer of
    pointers, each of which points to a Python object which in turn has
    references to its data (in this case, integers).
    ➔ It’s easy to see how NumPy substantially boosts performance.
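The difference in layout can be inspected directly. The sketch below is illustrative only: the variable names and the choice of 1000 integers are assumptions, and `sys.getsizeof` reports CPython-specific sizes that vary by version and platform, so no exact numbers are claimed for the list.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# The list object itself holds n pointers ...
list_pointer_bytes = sys.getsizeof(py_list)
# ... and every element is a separate, full Python int object elsewhere in memory.
list_element_bytes = sum(sys.getsizeof(x) for x in py_list)

# The array keeps the raw 8-byte values back to back in one buffer.
array_data_bytes = np_array.nbytes

print("list pointers:", list_pointer_bytes, "bytes")
print("list elements:", list_element_bytes, "bytes")
print("array buffer: ", array_data_bytes, "bytes")
print("contiguous:", np_array.flags["C_CONTIGUOUS"], "itemsize:", np_array.itemsize)
```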

  16. 3. Python’s object model can lead to inefficient memory access

  17. Why am I even using Python?
    ➔ Dynamic typing makes Python easier to use than C
    ➔ It's extremely flexible and forgiving.
    ➔ This flexibility leads to efficient use of development time.
    ➔ Python offers easy hooks into compiled libraries, for when you need
    that Fortranish performance. (See fortran-magic on PyPI)
    ➔ Python ends up being an extremely efficient language for the overall task
    of doing science with code.
    ➔ Meta-Programming

  18. Example: K-Means Clustering
    Algorithm:
    1. Choose some Cluster Centers
    2. Repeat:
    a. Assign points to nearest center
    b. Update center to mean of points
    c. Check if Converged
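The steps above could be written in pure Python roughly as follows. This is a minimal sketch, not the talk's implementation (that one is linked on the Resources slide). The function names, the use of randomly sampled input points as initial centers, and the exact-equality convergence test are all assumptions made for illustration.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # 1. Choose some cluster centers (assumption: k distinct input points).
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # 2a. Assign each point to its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # 2b. Update each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty cluster: keep the old center
        # 2c. Check if converged: stop once no center has moved.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two small, well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Every distance is computed by interpreted Python code, one element at a time, which is where the per-operation overhead described earlier adds up.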

  23. Pure Python Snippet
    7.44 s ± 122 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - 5000 points

  24. What can you do when Python is too slow?

  25. 1. Line Profiling
    “Premature optimization is the root of all evil”
    ~ Donald Knuth
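Measuring before optimizing can be sketched with the standard library alone. Note the substitution: this uses `cProfile`, which reports time per function rather than per line, so it is a coarser stand-in for the line profiler the slide refers to. The functions `slow_sum_of_squares` and `workload` are hypothetical examples, not code from the talk.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately element-by-element, to give the profiler something to find.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum_of_squares(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report points at the function that dominates the run time, which is the one worth rewriting first.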

  26. Repeated operations on arrays cause a lot of delay in the computation.

  27. 2. Numpy Vectorization (DSL)

  28. 2. Numpy Vectorization (DSL)

  29. 2. Numpy Vectorization (DSL)

  30. 2. Numpy Vectorization (DSL)
    131 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    Down from 7.44 seconds to 0.13 seconds (57X faster)
    Key: Repeated operations are pushed into a compiled layer, so the Python
    overhead is paid per array rather than per array element.
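As an illustration of pushing the loops into NumPy, the two k-means steps might be vectorized as below. This is a sketch under assumptions, not the code shown in the talk: the function names are hypothetical, and `update_centers` assumes every cluster keeps at least one point (an empty cluster would produce NaN values).

```python
import numpy as np


def assign_labels(points, centers):
    # points: (n, d), centers: (k, d). Broadcasting builds an (n, k, d) array
    # of differences in one compiled operation instead of n * k Python loops.
    diff = points[:, None, :] - centers[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)   # (n, k) squared distances
    return dist2.argmin(axis=1)       # index of the nearest center per point


def update_centers(points, labels, k):
    # Boolean masks select each cluster's points; the mean runs in compiled code.
    return np.array([points[labels == j].mean(axis=0) for j in range(k)])


points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers = points[[0, 3]]              # one starting center from each group
labels = assign_labels(points, centers)
new_centers = update_centers(points, labels, k=2)
print(labels, new_centers)
```

The intermediate `diff` array has shape (n, k, d), so the speed is bought with extra memory for large inputs.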

  31. 2. Numpy Vectorization (DSL)
    Advantages:
    ➔ Python overhead per array rather than per array element
    ➔ Compact domain specific language for array operations
    ➔ NumPy is widely available
    Disadvantages:
    ➔ Batch operations can lead to excessive memory usage
    ➔ Different way of thinking about writing code
    Recommendation: Use Numpy Everywhere

  32. 3. Use specialised Data Structures
    Some popular examples:
    - Pandas
    - Scipy
    The same operations using a Pandas DataFrame (groupby) and SciPy (KDTree):
    102 ms ± 2.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    As compared to:
    - 7.44 seconds in pure Python
    - 131 ms with NumPy

  33. Some interesting Data Structures
    ➔ scipy.spatial for spatial queries like distances, nearest neighbors, etc.
    ➔ pandas for SQL-like grouping & aggregation
    ➔ xarray for grouping across multiple dimensions
    ➔ scipy.sparse.csgraph for graph-like problems (e.g. finding shortest
    paths)
    Recommendation: Use whenever applicable and possible
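One way the KDTree and groupby combination might look for a single k-means iteration is sketched below. The function name `kmeans_step` and the way the two libraries are wired together are assumptions for illustration; the talk's actual code is not reproduced in this transcript. A cluster that receives no points simply drops out of the grouped result.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def kmeans_step(points, centers):
    # Nearest-center lookup: build a KD-tree on the centers, query all points.
    _, labels = KDTree(centers).query(points)
    # SQL-like aggregation: group the coordinates by label and average them.
    new_centers = pd.DataFrame(points).groupby(labels).mean().to_numpy()
    return labels, new_centers


points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers = points[[0, 3]]
labels, new_centers = kmeans_step(points, centers)
print(labels, new_centers)
```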

  34. 4. Cython
    Requires a few changes to the code, after which it no longer quite looks
    like Python. Steep learning curve.

  35. 4. Cython
    97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  36. 4. Cython
    1.72 ms ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    56.8X Faster than Pure Python

  37. 5. Numba JIT
    ➔ Numba is an open source, NumPy-aware optimizing compiler for Python
    sponsored by Anaconda, Inc. It uses the LLVM compiler project to
    generate machine code from Python syntax.
    ➔ GPU Support
    ➔ All you have to do is add a @jit(nopython=True) decorator to the
    function you want to accelerate.
    ➔ A little bit buggy sometimes, but still works 99% of the time.
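A decorated function might look like the sketch below. The function `assign_labels_loops` is a hypothetical example, not code from the talk. The `try`/`except` fallback is an addition made only so the snippet still runs where Numba is not installed; in that case the decorator does nothing and the loops run as ordinary Python.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption for portability: a stand-in decorator with the same call
    # shape that returns the function unchanged when Numba is missing.
    def jit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@jit(nopython=True)
def assign_labels_loops(points, centers):
    # Plain nested loops: slow in CPython, but compiled to machine code by Numba.
    n, d = points.shape
    k = centers.shape[0]
    labels = np.empty(n, dtype=np.int64)
    for i in range(n):
        best = 0
        best_dist = np.inf
        for j in range(k):
            s = 0.0
            for c in range(d):
                diff = points[i, c] - centers[j, c]
                s += diff * diff
            if s < best_dist:
                best_dist = s
                best = j
        labels[i] = best
    return labels


points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers = points[[0, 3]]
labels = assign_labels_loops(points, centers)
print(labels)
```

With Numba present, the first call includes compilation time, so timings are best taken from the second call onwards.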

  38. 5. Numba JIT

  39. 5. Numba JIT
    Pure Python: 97.7 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    Numba JIT: 1.47 ms ± 14.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    66X performance boost with Numba JIT
    Again, this comes at a cost.

  40. Problems with using Numba JIT
    ➔ Two modes: “nopython” and “object”; only the former is truly optimised.
    ➔ Functions JITted in nopython mode can only call functions JITted in
    nopython mode.
    ➔ Avoid “object” mode. It is in the process of being deprecated.
    ➔ Passing functions as arguments is even slower than JITing them, which is
    a huge blocker.

  41. 5. Numba JIT
    Nice, high level API: Simple structure, but complex data structures. Can be
    anything.
    Dangerous algorithms: Broken into small chunks which can easily be
    accelerated using JIT. Avoid object mode.

  42. 5. Numba JIT
    ➔ If things stop working, export NUMBA_DISABLE_JIT=1
    ➔ You might have to rewrite some code, which makes it a little less dynamic
    ➔ Numba evolves very fast, so keep an eye on its webpage.

  43. Other ways of making Python faster
    - CuPy
    - Dask
    There are no limits to what Python can do, thanks to its meta-programming
    capabilities. You could even write such a tool yourself.
    Recommendation: Use Numpy with Numba (Best for Scientific Computing)

  44. Resources
    Pure Python Implementation:
    https://gist.github.com/shreyasbapat/bbc8f468dacbeb5b9baf7ead2c0d5077
    Complete Notebook:
    https://gist.github.com/shreyasbapat/b7135d38273f72c6ce827256f4daedeb

  45. Thanks for joining!