Numba: A dynamic Python compiler for Science by Travis E. Oliphant, Jon Riehl, Mark Florisson, and Siu Kwan Lam

PyCon 2013

March 17, 2013

Transcript

  1. Numba: A dynamic Python
    compiler for Science (i.e. for
    NumPy and other typed containers)
    March 16, 2013
    Travis E. Oliphant, Jon Riehl
    Mark Florisson, Siu Kwan Lam
    Saturday, March 16, 13


  2. Where I’m coming from
    After
    Before
    $\rho_0 (2\pi f)^2 U_i(a, f) = [C_{ijkl}(a, f)\, U_{k,l}(a, f)]_{,j}$

  3. 1,000,000 to 2,000,000 users of NumPy!

  4. NumFOCUS --- blatant ad!
    www.numfocus.org
    501(c)3 Public Charity
    Join Us! http://numfocus.org/membership/

  5. Code that users might write
    $x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
    $O = I \star F$
    Slow!!!!

  6. Why is Python slow?
    1. Dynamic typing
    2. Attribute lookups
    3. NumPy get-item (a[...])

  7. What are Scientists doing Now?
    • Writing critical parts in C/C++/Fortran and “wrapping” with
      • SWIG
      • ctypes
      • Cython
      • f2py (or fwrap)
      • hand-coded wrappers
    • Writing new code in Cython directly
      • Cython is “modified Python” with type information everywhere.
      • It produces a C-extension module which is then compiled.

  8. Cython is the most popular
    these days. But, speeding up
    NumPy-based codes should be
    even easier!

  9. NumPy Array is “typed container”
    shape

  10. Let’s use this!
    NumPy Users are already using “typed
    containers” with regular storage and access
    patterns. There is plenty of information to
    optimize the code if we either:
    • Provide type information for function
    inputs (jit)
    • Create a “call-site” for each function that
    compiles and caches the result the first
    time it gets called with new types.
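
The second option, a "call-site" that compiles and caches per argument types, can be sketched in plain Python. This is only an illustration of the dispatch idea, not Numba's implementation: `call_site` and `pretend_compile` are hypothetical names, and the "compilation" step simply returns the original function.

```python
def call_site(func):
    """Wrap func in a dispatcher that specializes once per argument-type tuple."""
    cache = {}  # maps a tuple of argument types to a "compiled" version

    def pretend_compile(signature):
        # Hypothetical stand-in for code generation: a real JIT would emit
        # machine code specialized for `signature` at this point.
        return func

    def dispatcher(*args):
        signature = tuple(type(arg) for arg in args)
        if signature not in cache:
            # First call with these types: compile and remember the result.
            cache[signature] = pretend_compile(signature)
        return cache[signature](*args)

    dispatcher.cache = cache
    return dispatcher


@call_site
def add(a, b):
    return a + b


add(1, 2)      # first call with (int, int): "compiles" and caches
add(3, 4)      # same types: reuses the cached version
add(1.0, 2.0)  # new types (float, float): a second specialization
```

After these three calls the cache holds two entries, one per distinct type tuple, which is the extra run-time bookkeeping that explicit type information avoids.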

  11. Requirements Part I
    • Work with CPython (we need the full scientific
    Python stack!)
    • Minimal modifications to code (use type inference)
    • Programmer control over what and when to “jit”
    • Ability to build static extensions (for libraries)
    • Fall back to Python C-API for “object” types.

  12. Requirements Part II
    • Produce code as fast as C (maybe even Fortran)
    • Support NumPy array-expressions and be able to
    produce universal functions (e.g. y = sin(x))
    • Provide a tool that could adapt to provide
    parallelism and produce code for modern vector
    hardware (GPUs, accelerators, and many-core
    machines)

  13. Do we have to write the full compiler??
    No! LLVM has done much heavy lifting.
    LLVM = Compilers for everybody

  14. Face of a modern compiler
    Diagram: front-ends (parsing) for C++, C, Fortran, and ObjC feed a common
    Intermediate Representation (IR); back-ends (code generation) emit
    x86, ARM, and PTX.

  15. Face of a modern compiler
    Diagram: Numba is the front-end (parsing) that turns Python into the
    Intermediate Representation (IR); LLVM is the back-end (code generation)
    that emits x86, ARM, and PTX.

  16. Example
    Numba

  17. NumPy + Mamba = Numba
    Diagram: Python Function → LLVMPY → LLVM Library → Machine Code
    Around the LLVM Library: Intel, Nvidia, Apple, AMD, ARM; OpenCL, ISPC, CUDA, CLANG, OpenMP

  18. Simple API

    • jit --- provide type information (fastest to call at run-time)
    • autojit --- detects input types, infers output, generates code
      if needed, and dispatches (a little more run-time call overhead)
    from numba import jit, autojit
    #@jit('void(double[:,:], double, double)')
    @autojit
    def numba_update(u, dx2, dy2):
        nx, ny = u.shape
        for i in xrange(1, nx-1):
            for j in xrange(1, ny-1):
                u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                          (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
    Comment out one of jit or autojit (don’t use together)

  19. Example
    import numba
    from math import sin, pi
    @numba.jit('f8(f8)')
    def sinc(x):
        if x == 0.0:
            return 1.0
        else:
            return sin(x*pi)/(pi*x)
    Numba

  20. ~150x speed-up Real-time image
    processing (50 fps
    Mandelbrot)

  21. Speeding up Math Expressions
    $x_i = \sum_{j=0}^{i-1} k_{i-j,j}\, a_{i-j}\, a_j$
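
Written as explicit loops, the sum on this slide is the kind of code a user might write directly and then hand to a compiler. The sketch below is a plain-Python reference, assuming the expression reads x_i = sum over j from 0 to i-1 of k[i-j, j] * a[i-j] * a[j]; the function name `weighted_sum` and the nested-list inputs are illustrative choices, not from the talk.

```python
def weighted_sum(k, a):
    """Compute x[i] = sum_{j=0}^{i-1} k[i-j][j] * a[i-j] * a[j] for every i."""
    n = len(a)
    x = [0.0] * n
    for i in range(n):
        total = 0.0
        for j in range(i):  # j runs from 0 to i-1, so x[0] stays 0
            total += k[i - j][j] * a[i - j] * a[j]
        x[i] = total
    return x


# Tiny illustrative input: all weights equal to one.
k = [[1.0] * 3 for _ in range(3)]
a = [1.0, 2.0, 3.0]
result = weighted_sum(k, a)
```

The doubly nested loop with per-element indexing is exactly the pattern that is slow in the interpreter and cheap once compiled with known types.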

  22. Image Processing
    from numba import jit
    @jit('void(f8[:,:],f8[:,:],f8[:,:])')
    def filter(image, filt, output):
        M, N = image.shape
        m, n = filt.shape
        for i in range(m//2, M-m//2):
            for j in range(n//2, N-n//2):
                result = 0.0
                for k in range(m):
                    for l in range(n):
                        result += image[i+k-m//2,j+l-n//2]*filt[k, l]
                output[i,j] = result
    ~1500x speed-up
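
For comparison, here is the same loop nest as an uncompiled baseline on nested lists, which is handy for checking results on a small input. The name `filter2d`, the 4x4 ramp image, and the 3x3 box filter are assumptions made for this sketch; no timing claim is attached to it.

```python
def filter2d(image, filt, output):
    """Plain-Python version of the slide's filter, using lists of lists."""
    M, N = len(image), len(image[0])
    m, n = len(filt), len(filt[0])
    for i in range(m // 2, M - m // 2):
        for j in range(n // 2, N - n // 2):
            result = 0.0
            for k in range(m):
                for l in range(n):
                    result += image[i + k - m // 2][j + l - n // 2] * filt[k][l]
            output[i][j] = result  # border pixels are left untouched


# Illustrative data: a linear ramp smoothed by a 3x3 averaging filter.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
box = [[1.0 / 9.0] * 3 for _ in range(3)]
smoothed = [[0.0] * 4 for _ in range(4)]
filter2d(image, box, smoothed)
```

Because the ramp is linear, averaging a 3x3 neighbourhood returns the centre value, so the interior of `smoothed` matches the interior of `image` up to rounding.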

  23. Compile NumPy array expressions
    from numba import autojit
    @autojit
    def formula(a, b, c):
        a[1:,1:] = a[1:,1:] + b[1:,:-1] + c[1:,:-1]
    @autojit
    def express(m1, m2):
        m2[1:-1:2,0,...,::2] = (m1[1:-1:2,...,::2] *
                                m1[-2:1:-2,...,::2])
        return m2
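
To make the slicing in `formula` concrete, the sketch below spells out the element-wise loops that the array expression stands for, using nested lists so it runs without NumPy. It illustrates the meaning of the slices only; `formula_loops` is a hypothetical name and this is not a description of the code Numba generates.

```python
def formula_loops(a, b, c):
    """Loop form of a[1:, 1:] = a[1:, 1:] + b[1:, :-1] + c[1:, :-1]."""
    rows, cols = len(a), len(a[0])
    for i in range(1, rows):        # a[1:, ...] skips the first row
        for j in range(1, cols):    # a[..., 1:] skips the first column
            # b[1:, :-1] and c[1:, :-1] line up one column to the left.
            a[i][j] = a[i][j] + b[i][j - 1] + c[i][j - 1]
    return a


a = [[0, 0], [0, 1]]
b = [[0, 0], [5, 0]]
c = [[0, 0], [7, 0]]
formula_loops(a, b, c)
```

Each output element reads only its own old value from `a`, so updating in place gives the same result as evaluating the whole right-hand side first.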

  24. Fast vectorize
    NumPy’s ufuncs take “kernels” and
    apply the kernel element-by-element
    over entire arrays
    Write kernels in
    Python!
    from numba.vectorize import vectorize
    from math import sin, pi
    @vectorize(['f8(f8)', 'f4(f4)'])
    def sinc(x):
        if x == 0.0:
            return 1.0
        else:
            return sin(x*pi)/(pi*x)
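
What "apply the kernel element-by-element" means can be shown without any compiler. In this sketch `apply_elementwise` is a hypothetical helper standing in for the looping that a ufunc performs; a real ufunc also handles broadcasting, dtypes, and output allocation, none of which is modelled here.

```python
from math import sin, pi


def sinc(x):
    """Scalar kernel: works on one value at a time."""
    if x == 0.0:
        return 1.0
    return sin(x * pi) / (pi * x)


def apply_elementwise(kernel, values):
    """Stand-in for the ufunc machinery: call the kernel once per element."""
    return [kernel(v) for v in values]


samples = apply_elementwise(sinc, [0.0, 0.5, 1.0])
```

The kernel author only writes the scalar logic, including the branch at zero; the loop over the array belongs to the surrounding machinery.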

  25. Case-study -- j0 from scipy.special
    • scipy.special was one of the first libraries I wrote
    • extended “umath” module by adding new
    “universal functions” to compute many scientific
    functions by wrapping C and Fortran libs.
    • Bessel functions are solutions to a differential
    equation:
    $x^2 \frac{d^2 y}{dx^2} + x \frac{dy}{dx} + (x^2 - \alpha^2)\, y = 0$
    $y = J_\alpha(x)$
    $J_n(x) = \frac{1}{\pi} \int_0^\pi \cos(n\tau - x \sin\tau)\, d\tau$
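
The integral representation for integer order can be evaluated numerically in a few lines. The sketch below uses the composite trapezoid rule, which is a choice made for this illustration and not the cephes algorithm that scipy.special.j0 wraps; the function name `bessel_j` and the default of 200 steps are assumptions.

```python
import math


def bessel_j(n, x, steps=200):
    """Approximate J_n(x) = (1/pi) * integral_0^pi cos(n*tau - x*sin(tau)) dtau."""
    def integrand(tau):
        return math.cos(n * tau - x * math.sin(tau))

    h = math.pi / steps
    # Trapezoid rule: half weight at both endpoints, full weight inside.
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    for m in range(1, steps):
        total += integrand(m * h)
    return total * h / math.pi


j0_at_zero = bessel_j(0, 0.0)
j0_at_first_root = bessel_j(0, 2.404825557695773)
```

J_0 equals one at the origin and vanishes near x ≈ 2.4048, which gives two easy sanity checks on the formula.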

  26. scipy.special.j0 wraps cephes algorithm

  27. Result --- equivalent to compiled code
    In [6]: %timeit vj0(x)
    10000 loops, best of 3: 75 us per loop
    In [7]: from scipy.special import j0
    In [8]: %timeit j0(x)
    10000 loops, best of 3: 75.3 us per loop
    But! Now code is in Python and can be
    experimented with more easily (and moved to
    the GPU / accelerator more easily)!

  28. Laplace Example
    from numba import jit
    @jit('void(double[:,:], double, double)')
    def numba_update(u, dx2, dy2):
        nx, ny = u.shape
        for i in xrange(1, nx-1):
            for j in xrange(1, ny-1):
                u[i,j] = ((u[i+1,j] + u[i-1,j]) * dy2 +
                          (u[i,j+1] + u[i,j-1]) * dx2) / (2*(dx2+dy2))
    Adapted from http://www.scipy.org/PerformancePython
    originally by Prabhu Ramachandran
    @jit('void(double[:,:], double, double)')
    def numbavec_update(u, dx2, dy2):
        u[1:-1,1:-1] = ((u[2:,1:-1]+u[:-2,1:-1])*dy2 +
                        (u[1:-1,2:] + u[1:-1,:-2])*dx2) / (2*(dx2+dy2))
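
One detail worth seeing on a small grid: the loop version reads neighbours that were already updated earlier in the same sweep, whereas under plain NumPy semantics the sliced expression evaluates its whole right-hand side from the old values before assigning. The sketch below mimics both behaviours with nested lists; the function names and the 4x4 grid are illustrative, and whether a compiled array expression keeps the NumPy behaviour is an assumption here.

```python
def update_in_place(u, dx2, dy2):
    """Loop sweep: later points see neighbours updated earlier in the sweep."""
    nx, ny = len(u), len(u[0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = ((u[i + 1][j] + u[i - 1][j]) * dy2 +
                       (u[i][j + 1] + u[i][j - 1]) * dx2) / (2 * (dx2 + dy2))


def update_from_copy(u, dx2, dy2):
    """Sliced-expression semantics: every point is computed from old values."""
    old = [row[:] for row in u]
    nx, ny = len(u), len(u[0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            u[i][j] = ((old[i + 1][j] + old[i - 1][j]) * dy2 +
                       (old[i][j + 1] + old[i][j - 1]) * dx2) / (2 * (dx2 + dy2))


def make_grid():
    # Top boundary held at 1.0, everything else starts at 0.0.
    return [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]


swept = make_grid()
update_in_place(swept, 1.0, 1.0)
copied = make_grid()
update_from_copy(copied, 1.0, 1.0)
```

After one sweep the two grids already differ in the interior, so the two versions should be compared on converged solutions or on timing per sweep, not on intermediate values.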

  29. Results of Laplace example
    Version          Time    Speed Up
    NumPy            3.19    1.0
    Numba            2.32    1.38
    Vect. Numba      2.33    1.37
    Cython           2.38    1.34
    Weave            2.47    1.29
    Numexpr          2.62    1.22
    Fortran Loops    2.30    1.39
    Vect. Fortran    1.50    2.13
    https://github.com/teoliphant/speed.git
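
The "Speed Up" column appears to be the NumPy time divided by each version's time. The short check below recomputes it from the "Time" column under that assumption; the slide does not state the time unit, so none is assumed here.

```python
# Times copied from the slide's table (unit not stated on the slide).
times = {
    "NumPy": 3.19,
    "Numba": 2.32,
    "Vect. Numba": 2.33,
    "Cython": 2.38,
    "Weave": 2.47,
    "Numexpr": 2.62,
    "Fortran Loops": 2.30,
    "Vect. Fortran": 1.50,
}

baseline = times["NumPy"]
# Speed-up relative to the NumPy baseline: larger means faster.
speed_up = {name: baseline / t for name, t in times.items()}

for name, ratio in speed_up.items():
    print(f"{name:<14} {ratio:.2f}")
```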

  30. Numba can change the game!
    Diagram: Python joins C++, C, and Fortran as a front-end to LLVM IR,
    which targets x86, ARM, and PTX.
    Numba turns Python into a “compiled language” (but much more flexible).
    You don’t have to reach for C/C++.

  31. Many More Advanced Features
    • Extension classes (jit a class --- autojit coming soon!)
    • Struct support (NumPy arrays can be structs)
    • SSA --- can refer to local variables as different types
    • Typed lists and typed dictionaries and sets coming soon!
    • pointer support
    • calling ctypes and CFFI functions natively
    • pycc (create stand-alone dynamic library and executable)
    • pycc --python (create static extension module for Python)

  32. Uses of Numba
    Diagram: Python Function → Numba → function pointer →
    Framework accepting dynamic function pointers
    Uses inside such a framework:
    • Ufuncs
    • Generalized UFuncs
    • Function-based Indexing
    • Memory Filters
    • Window Kernel Funcs
    • I/O Filters
    • Reduction Filters
    • Computed Columns
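
The idea of a framework that accepts function pointers can be illustrated with the standard-library ctypes module. This sketch does not use Numba: it wraps an ordinary Python function in a C-callable pointer type, and `apply_kernel` is a hypothetical stand-in for a framework that only knows it was given something callable through a C signature.

```python
import ctypes

# C signature double (*)(double): the "contract" the framework expects.
UNARY_DOUBLE = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


def apply_kernel(func_ptr, values):
    """Hypothetical framework entry point: call the pointer on each value."""
    return [func_ptr(v) for v in values]


def square(x):
    return x * x


# Wrapping produces an object with a real C-callable address.
square_ptr = UNARY_DOUBLE(square)
address = ctypes.cast(square_ptr, ctypes.c_void_p).value
results = apply_kernel(square_ptr, [1.0, 2.0, 3.0])
```

Here every call still runs the Python interpreter; the appeal of a compiler in this position is that the pointer would refer to machine code, while the framework side stays unchanged.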

  33. Accelerate/NumbaPro -- blatant ad!
    Python and NumPy compiled to Parallel Architectures
    (GPUs and multi-core machines)
    • Create parallel-for loops
    • Parallel execution of ufuncs
    • Run ufuncs on the GPU
    • Write CUDA directly in Python!
    • Free for Academics
    Fast development and fast execution!
    Currently premium features will be contributed to open-source over time!

  34. Numba Development
    git log --format=format:%an | sort | uniq -c | sort -r
    1260 Mark Florisson
     203 Jon Riehl
     181 Siu Kwan Lam
     110 Travis E. Oliphant
      30 Dag Sverre Seljebotn
      28 Hernan Grecco
      19 Ilan Schnell
      11 Mark Wiebe
       8 James Bergstra
       4 Alberto Valverde
       3 Thomas Kluyver
       2 Maggie Mari
       2 Dan Yamins
       2 Dan Christensen
       1 timo
       1 Yaroslav Halchenko
       1 Phillip Cloud
       1 Ondřej Čertík
       1 Martin Spacek
       1 Lars Buitinck
       1 Juan Luis Cano Rodríguez
    Photos: Siu, Mark, Jon

  35. Milestone Roadmap
    • Rapid progress this year
    • Still some bugs -- needs users!
    • Version 0.7 end of Feb.
    • Version 0.8 in April
    • Version 0.9 June
    • Version 1.0 by end of August
    • Stable API (jit, autojit) easy to use
    • Should be able to write equivalent of
    NumPy and SciPy with Numba and
    memory-views.
    http://numba.pydata.org
    http://llvmpy.org
    http://compilers.pydata.org
    We need you:
    • your use-cases
    • your tests
    • developer help

  36. Architectural Overview
    Diagram: Python Source → Python Parser → Python AST →
    Numba Stage 1 … Numba Stage n → Numba AST →
    Numba Code Generator → LLVM
    (the Numba Environment is shared by the stages)
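
The front of this pipeline, source to Python AST followed by a series of stages, can be sketched with the standard-library ast module. The stage functions and the dictionary used as an "environment" below are hypothetical and much simpler than anything in Numba; they only show the shape of a staged design in which each stage receives the tree plus shared state.

```python
import ast

SOURCE = "def add(a, b):\n    return a + b\n"


def collect_functions(tree, env):
    """Stage 1 (illustrative): record the names of all function definitions."""
    env["functions"] = [node.name for node in ast.walk(tree)
                        if isinstance(node, ast.FunctionDef)]
    return tree


def count_binops(tree, env):
    """Stage 2 (illustrative): count binary operations such as a + b."""
    env["binops"] = sum(isinstance(node, ast.BinOp) for node in ast.walk(tree))
    return tree


def run_pipeline(source, stages):
    """Parse the source once, then thread the tree through each stage."""
    env = {}                  # shared state, standing in for an environment
    tree = ast.parse(source)  # Python Source -> Python AST
    for stage in stages:
        tree = stage(tree, env)
    return tree, env


tree, env = run_pipeline(SOURCE, [collect_functions, count_binops])
```

Keeping each stage as a function of (tree, environment) makes it possible to insert, remove, or reorder stages without touching the others.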

  37. Numba Architecture
    l Entry points
    l …/numba/decorators.py
    l Environment
    l …/numba/environment.py
    l Pipeline
    l …/numba/pipeline.py
    l Code generation
    l …/numba/codegen/...

  38. Development Roadmap
    l Better stage separation, better modularity
    l Untyped Intermediate Representation (IR)
    l Typed IR
    l Specialized IR
    l Module level entry points
    l Better Array Specialization

  39. Community Involvement
    l ~/git/numba$ wc AUTHORS
    25 88 1470 AUTHORS
    l (4 lines are blank or instructions)
    l Github https://github.com/numba/numba
    l Mailing list --- [email protected]
    l Sprints --- contact Jon Riehl
    l Examples:
    l Hernan Grecco just contributed Python 3 support (Yeah!)
    l Dag collaborating on autojit classes with Mark F.
    l We need you to show off your amazing demo!