Your Escape Plan From NumPy + Cython

clyang
September 09, 2020

Transcript

  1. CyCraft Proprietary and Confidential Information
    Your Escape Plan From NumPy + Cython
    Cheng-Lin Yang, PhD, CyCraft Japan
  2. $ whoami
    • Cheng-Lin Yang: @clyang
    • Taiwanese, living in Taipei
    • Working for a cybersecurity company: CyCraft Japan
    • Member of the Machine Learning team
  3. Before we start, one quick question for you. Which code runs faster?
    A. np.power(x, 8)
    B. x ** 8
    C. x * x * x * x * x * x * x * x
  4. Answer: C
    x * x * x * x * x * x * x * x
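
The quiz above can be checked with a small timing sketch. The array size, dtype, and repeat count below are assumptions (the slides do not state them), and the ranking may differ across NumPy versions and hardware, so no timings are claimed here.

```python
import timeit

import numpy as np

# Assumed input: a 1-D float64 array; the slides do not give size or dtype.
x = np.random.default_rng(0).random(100_000)

candidates = {
    "np.power(x, 8)": lambda: np.power(x, 8),
    "x ** 8": lambda: x ** 8,
    "x * x * ... * x": lambda: x * x * x * x * x * x * x * x,
}

# All three expressions compute the same values up to rounding.
reference = np.power(x, 8)
for expr in candidates.values():
    assert np.allclose(expr(), reference)

# Time each variant; only the relative order is of interest.
for label, expr in candidates.items():
    seconds = timeit.timeit(expr, number=20)
    print(f"{label:>16}: {seconds:.4f} s for 20 runs")
```
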
  5. Why not Cython?
  6. Cython
    • Advantages
      • Can call third-party C libraries for faster execution
      • Can release the GIL
      • Still keeps the run-time checks for common problems that Python provides
      • Cython syntax is very similar to Python
    • Disadvantages
      • You have to manage memory yourself (if malloc is used)
      • To get ultimate performance, writing C code with low-level intrinsics CANNOT be avoided (this can be painful)
  7. You have to write something like this, and it’s painful
  8. Today’s example
  9. logsumexp (LSE) - I
    • The softmax function is defined as softmax(z)_j = exp(z_j) / Σ_{k=1..K} exp(z_k) for j = 1, …, K, where z is a K-dimensional vector
    • logsumexp is the log-sum-exp trick, which prevents overflow/underflow during the softmax calculation
  10. logsumexp (LSE) - II
    • However, floating-point underflow will occur during summation. For example: 134217728 1 134217728
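
The three numbers on this slide look like the operands and result of a sum whose operators were lost in the transcript. Assuming the example was 134217728 + 1 evaluated in single precision (an assumption, not confirmed by the slides), the effect can be reproduced as follows: the small addend is absorbed because float32 carries only 24 significant bits.

```python
import numpy as np

# Assumption: the slide's example is 2**27 + 1 in float32.
big = np.float32(134217728.0)  # 2**27
one = np.float32(1.0)

total = big + one              # stays float32
print(total == big)            # the added 1 is lost

# The same sum in float64 keeps the 1.
total64 = np.float64(134217728.0) + np.float64(1.0)
print(total64 == 134217729.0)
```
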
  11. logsumexp (LSE) - III
    • The problem can be solved by this simple trick
    • Applying it to the previous example:
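
The formula on this slide did not survive in the transcript. The standard form of the log-sum-exp shift is given below as a reference, on the assumption that this is the trick the slide shows. Subtracting the maximum m makes the largest exponent exactly 0, so the sum cannot overflow and contains at least one term equal to 1.

```latex
\mathrm{LSE}(x_1, \dots, x_n)
  = \log \sum_{i=1}^{n} e^{x_i}
  = m + \log \sum_{i=1}^{n} e^{x_i - m},
  \qquad m = \max_i x_i

\log \operatorname{softmax}(z)_j = z_j - \mathrm{LSE}(z_1, \dots, z_K)
```
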
  12. SciPy has it. Why reinvent the wheel?
    • Too many checks drag down performance
      • They exist because SciPy targets general-purpose usage
    • Caveats when improving performance:
      • Assume the input data satisfies the required conditions, so the unnecessary checks can be removed.
      • Verify what you actually need and simplify the code to match your requirements.
      • For example: only 1-D arrays are used in the following scenario
  13. logsumexp in NumPy
  14. logsumexp in NumPy
    • Based on my scenario, logsumexp can be implemented as follows:
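
The code from this slide is not part of the transcript. A minimal sketch of a 1-D logsumexp in NumPy, assuming the usual max-shift trick and finite float inputs, might look like this; the function name `logsumexp_1d` is hypothetical and this is not necessarily the speaker's implementation.

```python
import numpy as np


def logsumexp_1d(x):
    """Log-sum-exp of a 1-D float array using the max-shift trick.

    Assumes a non-empty array of finite values; no input checks are done.
    """
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))


x = np.array([1000.0, 1000.0])

# The naive formula overflows: exp(1000) is not representable in float64.
with np.errstate(over="ignore"):
    naive = np.log(np.sum(np.exp(x)))

stable = logsumexp_1d(x)  # equals 1000 + log(2)
print(naive, stable)
```

Dropping the shape, dtype, and infinity handling is what makes this shorter than a general-purpose routine, and it is only safe when the inputs really meet those assumptions.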
  15. NumPy vs. SciPy
    • Results:
  16. Solution 1: CuPy
  17. CuPy
    • https://github.com/cupy/cupy
    • Provides a NumPy-compatible ND-array on CUDA
    • Utilises GPU power
    • Compatible with existing CUDA kernels
    • Provides many NumPy-equivalent functions, so you can minimize code refactoring effort
      • Check the differences!
      • https://docs.cupy.dev/en/stable/reference/difference.html
    • Moving data between CPU and GPU is expensive!
  18. logsumexp in CuPy
    • Result:
  19. Solution 2: Numba
  20. Numba
    • http://numba.pydata.org
    • Just-In-Time (JIT) approach
      • Translates a subset of Python and NumPy code to machine code
    • Utilises both CPU and GPU power
    • Supports OpenMP
    • Near-zero code modification
      • Simply put the “@jit” decorator before the function you want to speed up
    • Works best with functions; class support is at an early stage
    • Active development and a large user community
  21. Numba
    • Two modes you need to know
      • nopython mode (equivalent to @njit)
        • Allows you to get rid of Python’s GIL
      • object mode
    • @njit + OpenMP makes it easy to parallelize computation without the GIL limitation
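
As an illustration of nopython mode, here is a sketch of a loop-based logsumexp compiled with `@njit`. It is not the speaker's code: the function name is hypothetical, the input is assumed to be a non-empty 1-D float64 array, and a pass-through fallback decorator is defined so the snippet still runs as plain Python when Numba is not installed.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback for environments without Numba: run as plain Python.
    def njit(func):
        return func


@njit
def logsumexp_njit(x):
    # First pass: find the maximum for the max-shift trick.
    m = x[0]
    for i in range(1, x.shape[0]):
        if x[i] > m:
            m = x[i]
    # Second pass: accumulate the shifted exponentials.
    s = 0.0
    for i in range(x.shape[0]):
        s += math.exp(x[i] - m)
    return m + math.log(s)


result = logsumexp_njit(np.array([1000.0, 1000.0]))
print(result)
```

The first call includes compilation time, so benchmarks should time a later call.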
  22. logsumexp by Numba
    • Results:
  23. Solution 3: Pythran
  24. Pythran
    • https://pythran.readthedocs.io/en/latest/
    • Active development and a fast-growing community
    • Uses an ahead-of-time compilation approach
      • LLVM + compiler does all the magic!
    • Supports a subset of Python and NumPy functions
    • Works on Python 2.7 and 3.6/7/8
    • Similar to Numba, you annotate the function you want to speed up, here with a “#pythran export” comment rather than a decorator
    • OpenMP can also be used with Pythran
  25. logsumexp in Pythran
    • First, write the Python code as usual. (pythran_logsumexp.py)
    • Compile it by using:
      • CXX=clang++ pythran -DUSE_XSIMD -march=native -O3 pythran_logsumexp.py
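
The contents of pythran_logsumexp.py are not in the transcript. A plausible sketch is shown below, assuming a 1-D float64 input and the max-shift trick; the function name `logsumexp` and the exported signature are assumptions. The `#pythran export` comment tells Pythran which function and argument types to compile, and the file remains valid Python without it.

```python
# pythran_logsumexp.py (file name from the slide; contents are a sketch)
import numpy as np


#pythran export logsumexp(float64[])
def logsumexp(x):
    # Max-shift trick; assumes a non-empty array of finite values.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))
```

After compiling with the command above, `from pythran_logsumexp import logsumexp` should pick up the compiled module instead of the .py file.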
  26. logsumexp in Pythran
    • Import the newly compiled module and run it!
    • Result:
  27. So, which is better? (in numbers)
  28. Benchmark
    • All benchmarks were run on a bare-metal machine with the following specifications:
      • CPU: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
      • RAM: 256GB DDR4 with ECC
      • GPU: GeForce GTX 1080 Ti
    • Python and library information:
      • Python 3.6.9
      • Cuda 10.2
      • NumPy 1.17.5
      • CuPy 7.8.0
      • Numba 0.51.0
      • Pythran 0.9.6
  29. Benchmark results (bar chart), in seconds (lower is better)
    • SciPy: 6.7171
    • NumPy: 6.2521
    • Numba JIT: 6.7264
    • Numba NJIT: 6.0252
    • Pythran: 5.4564
    • CuPy: 1.6152
  30. My decision tree: CuPy, Numba and Pythran
  31. Decision tree (diagram; the yes/no branch layout is not recoverable from the transcript)
    • Questions: Has GPU? Need CUDA kernel? Has CPU computation? Like compiler?
    • Outcomes: CuPy, Numba, Pythran
  32. Three Takeaways
    • If you have GPU(s), try CuPy first!
    • If you only have a CPU, use Numba first
      • Numba supports more NumPy functions
      • If it works, try Pythran to get more performance
    • Each solution supports a different set of NumPy functions.
      • You can easily find out which function doesn’t work (program stops :P )
      • Check each project’s documentation to see which functions are provided
      • If A doesn’t work, B might work!
  33. Thank You. Stay Safe.