Numba: A dynamic Python compiler for Science by Travis E. Oliphant, Jon Riehl, Mark Florisson, and Siu Kwan Lam

PyCon 2013

March 17, 2013

Transcript

  1. Numba: A dynamic Python compiler for Science (i.e. for NumPy and other typed containers)
    March 16, 2013
    Travis E. Oliphant, Jon Riehl, Mark Florisson, Siu Kwan Lam
    Saturday, March 16, 13
  2. Where I’m coming from
    [Image labels: After, Before]
    $-\rho_0 (2\pi f)^2 U_i(a, f) = [C_{ijkl}(a, f)\, U_{k,l}(a, f)]_{,j}$
  3. 1,000,000 to 2,000,000 users of NumPy!
  4. NumFOCUS --- blatant ad!
    www.numfocus.org
    501(c)3 Public Charity
    Join Us! http://numfocus.org/membership/
  5. Code that users might write
    $x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
    $O = I \star F$
    Slow!!!!
  6. Why is Python slow?
    1. Dynamic typing
    2. Attribute lookups
    3. NumPy get-item (a[...])
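
The three costs on this slide can be seen in CPython bytecode. The sketch below is an illustration and not part of the talk: `update` and `Weights` are hypothetical names, and nested lists stand in for a NumPy array so that it runs with the standard library alone. Exact opcode names differ between CPython versions.

```python
import dis

class Weights:
    """Hypothetical container, used only to force an attribute lookup."""
    def __init__(self, scale):
        self.scale = scale

def update(u, i, j, w):
    # Two get-items, one attribute lookup, one multiply.
    return u[i][j] * w.scale

# Collect the opcode names CPython generated for the function body.
opnames = [ins.opname for ins in dis.get_instructions(update)]

# The bytecode names no types: the same generic instructions run for
# ints, floats or any other object, and each one dispatches on the
# operand types at run time.
generic_ops = [name for name in opnames if name.startswith("BINARY")]

print(opnames)
print(update([[1.0, 2.0], [3.0, 4.0]], 1, 0, Weights(0.5)))
```

A compiler that knows `u` holds doubles can replace each generic instruction with a direct load or a floating-point multiply, which is the opportunity the following slides describe.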
  7. What are Scientists doing Now?
    • Writing critical parts in C/C++/Fortran and “wrapping” with
        • SWIG
        • ctypes
        • Cython
        • f2py (or fwrap)
        • hand-coded wrappers
    • Writing new code in Cython directly
        • Cython is “modified Python” with type information everywhere.
        • It produces a C-extension module which is then compiled
  8. Cython is the most popular these days. But, speeding up NumPy-based codes should be even easier!
  9. NumPy Array is “typed container”
    [Diagram label: shape]

  10. Let’s use this!
    NumPy Users are already using “typed containers” with regular storage and access patterns. There is plenty of information to optimize the code if we either:
    • Provide type information for function inputs (jit)
    • Create a “call-site” for each function that compiles and caches the result the first time it gets called with new types.
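
The second option on this slide, a call-site that compiles and caches per argument types, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Numba's implementation: `specialize_on_types` and `compile_for` are hypothetical names, and "compiling" here only records the signature and reuses the original function.

```python
import functools

def specialize_on_types(func):
    """Hypothetical decorator: one cached version of func per tuple of
    argument types, built the first time that tuple is seen."""
    cache = {}       # argument-type tuple -> specialized callable
    compiled = []    # signatures in the order they were "compiled"

    def compile_for(signature):
        # Stand-in for real code generation: record the signature and
        # hand back the unmodified Python function.
        compiled.append(signature)
        return func

    @functools.wraps(func)
    def call_site(*args):
        signature = tuple(type(arg) for arg in args)
        if signature not in cache:          # first call with these types
            cache[signature] = compile_for(signature)
        return cache[signature](*args)

    call_site.compiled = compiled
    return call_site

@specialize_on_types
def add(a, b):
    return a + b

add(1, 2)        # "compiles" for (int, int)
add(3, 4)        # cache hit, nothing new is compiled
add(1.0, 2.0)    # "compiles" for (float, float)
print(add.compiled)
```

The trade-off is a dictionary lookup on every call in exchange for not having to declare types up front.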
  11. Requirements Part I
    • Work with CPython (we need the full scientific Python stack!)
    • Minimal modifications to code (use type inference)
    • Programmer control over what and when to “jit”
    • Ability to build static extensions (for libraries)
    • Fall back to Python C-API for “object” types.
  12. Requirements Part II
    • Produce code as fast as C (maybe even Fortran)
    • Support NumPy array-expressions and be able to produce universal functions (e.g. y = sin(x))
    • Provide a tool that could adapt to provide parallelism and produce code for modern vector hardware (GPUs, accelerators, and many-core machines)
  13. Do we have to write the full compiler?? No! LLVM has done much heavy lifting
    LLVM = Compilers for everybody
  14. Face of a modern compiler
    [Diagram] Front-End (Parsing): C++, C, Fortran, ObjC → Intermediate Representation (IR) → Back-End (Code Generation): x86, ARM, PTX
  15. Face of a modern compiler
    [Diagram] Front-End (Parsing): Python, via Numba → Intermediate Representation (IR) → Back-End (Code Generation): x86, ARM, PTX, via LLVM
  16. Example Numba

  17. NumPy + Mamba = Numba
    [Diagram labels: Python Function, LLVMPY, LLVM Library, Machine Code; Intel, Nvidia, Apple, AMD, ARM; OpenCL, ISPC, CUDA, CLANG, OpenMP]
  18. Simple API
    • jit --- provide type information (fastest to call at run-time)
    • autojit --- detects input types, infers output, generates code if needed, and dispatches (a little more run-time call overhead)

        #@jit('void(double[:,:], double, double)')
        @autojit
        def numba_update(u, dx2, dy2):
            nx, ny = u.shape
            for i in xrange(1,nx-1):
                for j in xrange(1, ny-1):
                    u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                              (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

    Comment out one of jit or autojit (don’t use together)
  19. Example

        @numba.jit('f8(f8)')
        def sinc(x):
            if x==0.0:
                return 1.0
            else:
                return sin(x*pi)/(pi*x)

    [Diagram label: Numba]
  20. ~150x speed-up
    Real-time image processing (50 fps Mandelbrot)
  21. Speeding up Math Expressions
    $x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
  22. Image Processing

        @jit('void(f8[:,:],f8[:,:],f8[:,:])')
        def filter(image, filt, output):
            M, N = image.shape
            m, n = filt.shape
            for i in range(m//2, M-m//2):
                for j in range(n//2, N-n//2):
                    result = 0.0
                    for k in range(m):
                        for l in range(n):
                            result += image[i+k-m//2,j+l-n//2]*filt[k, l]
                    output[i,j] = result

    ~1500x speed-up
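
To see what the loop nest on this slide computes, here is the same index arithmetic on nested lists, so that it runs without NumPy or Numba. The name `filter2d` and the test data are assumptions made for this sketch; nothing here measures speed.

```python
def filter2d(image, filt, output):
    """Same loop nest as the slide, on nested lists instead of arrays."""
    M, N = len(image), len(image[0])
    m, n = len(filt), len(filt[0])
    for i in range(m // 2, M - m // 2):
        for j in range(n // 2, N - n // 2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    # The kernel is applied without flipping (a correlation).
                    result += image[i + k - m // 2][j + l - n // 2] * filt[k][l]
            output[i][j] = result

# 4x4 ramp image: the value at (r, c) is 4*r + c.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
output = [[0.0] * 4 for _ in range(4)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]   # 3x3 averaging kernel

filter2d(image, box, output)
for row in output:
    print(row)
```

Averaging a linear ramp returns the centre value, so the interior of `output` matches the interior of `image` up to rounding. The border rows and columns are never written and keep their initial zeros.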
  23. Compile NumPy array expressions

        from numba import autojit

        @autojit
        def formula(a, b, c):
            a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]

        @autojit
        def express(m1, m2):
            m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                                    m1[-2:1:-2,...,::2])
            return m2
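
Compiling an array expression amounts to producing a loop nest like the one below. This sketch spells out the loops that the slice expression in `formula` denotes, using nested lists so it runs with the standard library alone. `formula_loops` is a hypothetical name, and this is not a description of the code Numba emits.

```python
def formula_loops(a, b, c):
    """Explicit loops for a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]."""
    rows, cols = len(a), len(a[0])
    for i in range(1, rows):
        for j in range(1, cols):
            # b and c are sliced with :-1, so they are read one column
            # to the left of the element being written.
            a[i][j] = a[i][j] + b[i][j - 1] + c[i][j - 1]
    return a

a = [[1, 1, 1], [1, 1, 1]]
b = [[0, 1, 2], [3, 4, 5]]
c = [[10, 20, 30], [40, 50, 60]]
print(formula_loops(a, b, c))
```

Because `a` is only read at the index that is being written, a single fused loop needs no temporary array here.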
  24. Fast vectorize
    NumPy’s ufuncs take “kernels” and apply the kernel element-by-element over entire arrays
    Write kernels in Python!

        from numba.vectorize import vectorize
        from math import sin, pi

        @vectorize(['f8(f8)', 'f4(f4)'])
        def sinc(x):
            if x==0.0:
                return 1.0
            else:
                return sin(x*pi)/(pi*x)
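
The idea of a scalar kernel applied element by element can be sketched without Numba. `elementwise` below is a hypothetical helper that lifts a scalar function to nested lists; a real ufunc additionally handles dtypes, broadcasting and output arrays, none of which is modelled here.

```python
from math import sin, pi

def elementwise(kernel):
    """Hypothetical helper: apply a scalar kernel to every element of a
    (possibly nested) list and return a result of the same shape."""
    def apply(x):
        if isinstance(x, list):
            return [apply(item) for item in x]
        return kernel(x)
    return apply

@elementwise
def sinc(x):
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)

values = sinc([[0.0, 0.5], [1.0, 2.0]])
print(values)
```

The kernel itself never mentions arrays. That separation is what lets a compiler generate the surrounding loop once and reuse it for any kernel.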
  25. Case-study -- j0 from scipy.special
    • scipy.special was one of the first libraries I wrote
    • extended “umath” module by adding new “universal functions” to compute many scientific functions by wrapping C and Fortran libs.
    • Bessel functions are solutions to a differential equation:
      $x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - \alpha^2)\, y = 0$, with $y = J_\alpha(x)$
      $J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(n\tau - x \sin\tau)\, d\tau$
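
The integral representation on this slide can be checked numerically. The sketch below applies the trapezoidal rule to it; this is not the cephes algorithm that scipy.special wraps, and `bessel_j` and its `steps` parameter are names chosen for illustration.

```python
from math import cos, sin, pi

def bessel_j(n, x, steps=200):
    """Approximate J_n(x) for integer n with the trapezoidal rule applied
    to (1/pi) * integral from 0 to pi of cos(n*tau - x*sin(tau)) dtau."""
    h = pi / steps

    def integrand(tau):
        return cos(n * tau - x * sin(tau))

    # Endpoints carry half weight, interior points full weight.
    total = 0.5 * (integrand(0.0) + integrand(pi))
    for k in range(1, steps):
        total += integrand(k * h)
    return total * h / pi

print(bessel_j(0, 0.0))   # J_0(0) is exactly 1
print(bessel_j(0, 1.0))   # tabulated value is about 0.7651976866
print(bessel_j(1, 1.0))   # tabulated value is about 0.4400505857
```

For integer n the integrand is smooth and periodic, so the trapezoidal rule converges quickly and a few hundred points are ample for moderate x.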
  26. scipy.special.j0 wraps cephes algorithm

  27. Result --- equivalent to compiled code

        In [6]: %timeit vj0(x)
        10000 loops, best of 3: 75 us per loop
        In [7]: from scipy.special import j0
        In [8]: %timeit j0(x)
        10000 loops, best of 3: 75.3 us per loop

    But! Now code is in Python and can be experimented with more easily (and moved to the GPU / accelerator more easily)!
  28. Laplace Example

        @jit('void(double[:,:], double, double)')
        def numba_update(u, dx2, dy2):
            nx, ny = u.shape
            for i in xrange(1,nx-1):
                for j in xrange(1, ny-1):
                    u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                              (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))

        @jit('void(double[:,:], double, double)')
        def numbavec_update(u, dx2, dy2):
            u[1:-1,1:-1] = ((u[2:,1:-1]+u[:-2,1:-1])*dy2 +
                            (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))

    Adapted from http://www.scipy.org/PerformancePython originally by Prabhu Ramachandran
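
One detail worth checking by hand: the loop version on this slide updates `u` in place, so later points see already-updated neighbours, whereas a sliced NumPy assignment evaluates its right-hand side from the old values before storing. The sketch below reproduces both behaviours on nested lists, so it needs neither NumPy nor Numba. `loop_update` and `whole_array_update` are names chosen for this illustration.

```python
import copy

def loop_update(u, dx2, dy2):
    """In-place sweep in the slide's loop order (nested lists)."""
    nx, ny = len(u), len(u[0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = ((u[i + 1][j] + u[i - 1][j]) * dy2 +
                       (u[i][j + 1] + u[i][j - 1]) * dx2) / (2 * (dx2 + dy2))

def whole_array_update(u, dx2, dy2):
    """Read every neighbour from the old values, then assign, which is
    how a sliced NumPy assignment behaves."""
    old = copy.deepcopy(u)
    nx, ny = len(u), len(u[0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = ((old[i + 1][j] + old[i - 1][j]) * dy2 +
                       (old[i][j + 1] + old[i][j - 1]) * dx2) / (2 * (dx2 + dy2))

def grid():
    # 4x4 grid: top boundary held at 1, everything else 0.
    return [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]

looped, sliced = grid(), grid()
loop_update(looped, 1.0, 1.0)
whole_array_update(sliced, 1.0, 1.0)
print(looped[1], looped[2])
print(sliced[1], sliced[2])
```

After one sweep the two grids differ, so the two variants are best compared by iterating each to convergence rather than step by step.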
  29. Results of Laplace example

        Version          Time    Speed Up
        NumPy            3.19    1.0
        Numba            2.32    1.38
        Vect. Numba      2.33    1.37
        Cython           2.38    1.34
        Weave            2.47    1.29
        Numexpr          2.62    1.22
        Fortran Loops    2.30    1.39
        Vect. Fortran    1.50    2.13

    https://github.com/teoliphant/speed.git
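
The "Speed Up" column is the NumPy time divided by each version's time. A short check of that arithmetic, using only the figures from the slide (the dictionary names are chosen for this sketch):

```python
# Times and speed-ups as listed on the slide.
times = {
    "NumPy": 3.19, "Numba": 2.32, "Vect. Numba": 2.33, "Cython": 2.38,
    "Weave": 2.47, "Numexpr": 2.62, "Fortran Loops": 2.30,
    "Vect. Fortran": 1.50,
}
listed = {
    "NumPy": 1.0, "Numba": 1.38, "Vect. Numba": 1.37, "Cython": 1.34,
    "Weave": 1.29, "Numexpr": 1.22, "Fortran Loops": 1.39,
    "Vect. Fortran": 2.13,
}

baseline = times["NumPy"]
ratios = {name: baseline / t for name, t in times.items()}

for name, ratio in ratios.items():
    print(f"{name:14s} computed {ratio:.3f}   listed {listed[name]}")
```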
  30. Numba can change the game!
    [Diagram labels: C++, C, Fortran, Python; LLVM IR; x86, ARM, PTX]
    Numba turns Python into a “compiled language” (but much more flexible). You don’t have to reach for C/C++
  31. Many More Advanced Features
    • Extension classes (jit a class --- autojit coming soon!)
    • Struct support (NumPy arrays can be structs)
    • SSA --- can refer to local variables as different types
    • Typed lists and typed dictionaries and sets coming soon!
    • pointer support
    • calling ctypes and CFFI functions natively
    • pycc (create stand-alone dynamic library and executable)
    • pycc --python (create static extension module for Python)
  32. Uses of Numba
    [Diagram] Python Function → Numba → function pointer → Framework accepting dynamic function pointers
    [Framework labels: Ufuncs, Generalized UFuncs, Function-based Indexing, Memory Filters, Window Kernel Funcs, I/O Filters, Reduction Filters, Computed Columns]
  33. Accelerate/NumbaPro -- blatant ad!
    Python and NumPy compiled to Parallel Architectures (GPUs and multi-core machines)
    • Create parallel-for loops
    • Parallel execution of ufuncs
    • Run ufuncs on the GPU
    • Write CUDA directly in Python!
    • Free for Academics
    fast development and fast execution!
    Currently premium features will be contributed to open-source over time!
  34. Numba Development

        1260 Mark Florisson
         203 Jon Riehl
         181 Siu Kwan Lam
         110 Travis E. Oliphant
          30 Dag Sverre Seljebotn
          28 Hernan Grecco
          19 Ilan Schnell
          11 Mark Wiebe
           8 James Bergstra
           4 Alberto Valverde
           3 Thomas Kluyver
           2 Maggie Mari
           2 Dan Yamins
           2 Dan Christensen
           1 timo
           1 Yaroslav Halchenko
           1 Phillip Cloud
           1 Ondřej Čertík
           1 Martin Spacek
           1 Lars Buitinck
           1 Juan Luis Cano Rodríguez

        git log --format=format:%an | sort | uniq -c | sort -r

    [Photos: Siu, Mark, Jon]
  35. Milestone Roadmap
    • Rapid progress this year
    • Still some bugs -- needs users!
    • Version 0.7 end of Feb.
    • Version 0.8 in April
    • Version 0.9 June
    • Version 1.0 by end of August
    • Stable API (jit, autojit) easy to use
    • Should be able to write equivalent of NumPy and SciPy with Numba and memory-views.
    http://numba.pydata.org
    http://llvmpy.org
    http://compilers.pydata.org
    We need you:
    • your use-cases
    • your tests
    • developer help
  36. Architectural Overview
    [Diagram] Python Source → Python Parser → Python AST → Numba Stage 1 … Numba Stage n → Numba Code Generator → LLVM
    [Other diagram labels: Numba Environment, Numba AST]
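
The shape of this pipeline (parse to an AST, run a sequence of stages over it, then generate code) can be sketched with the standard library's `ast` module. Everything below is illustrative: `FoldAddition`, `run_stages` and `generate_code` are hypothetical names, Python's built-in `compile` stands in for the Numba code generator and LLVM, and none of it reflects Numba's actual stages.

```python
import ast

class FoldAddition(ast.NodeTransformer):
    """Hypothetical stage: replace `<number> + <number>` by its value."""
    def visit_BinOp(self, node):
        self.generic_visit(node)            # transform children first
        operands = (node.left, node.right)
        if isinstance(node.op, ast.Add) and all(
                isinstance(o, ast.Constant) and type(o.value) in (int, float)
                for o in operands):
            folded = ast.Constant(value=node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node

def run_stages(source, stages):
    tree = ast.parse(source)                # Python source -> Python AST
    for stage in stages:                    # stage 1 ... stage n
        tree = stage.visit(tree)
    return ast.fix_missing_locations(tree)

def generate_code(tree):
    # Stand-in for the code generator: compile to CPython bytecode.
    namespace = {}
    exec(compile(tree, "<pipeline>", "exec"), namespace)
    return namespace

source = "def f(x):\n    return x * (1 + 2)\n"
tree = run_stages(source, [FoldAddition()])
f = generate_code(tree)["f"]

print(ast.dump(tree.body[0].body[0].value))
print(f(5))
```

Keeping each stage as a separate tree transformer is what makes it possible to add or reorder stages without touching the parser or the code generator.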
  37. Numba Architecture
    • Entry points
        • …/numba/decorators.py
    • Environment
        • …/numba/environment.py
    • Pipeline
        • …/numba/pipeline.py
    • Code generation
        • …/numba/codegen/...
  38. Development Roadmap
    • Better stage separation, better modularity
        • Untyped Intermediate Representation (IR)
        • Typed IR
        • Specialized IR
    • Module level entry points
    • Better Array Specialization
  39. Community Involvement
    • ~/git/numba$ wc AUTHORS
      25 88 1470 AUTHORS
        • (4 lines are blank or instructions)
    • Github https://github.com/numba/numba
    • Mailing list --- numba-users@continuum.io
    • Sprints --- contact Jon Riehl
    • Examples:
        • Hernan Grecco just contributed Python 3 support (Yeah!)
        • Dag collaborating on autojit classes with Mark F.
    • We need you to show off your amazing demo!