DyND - Enabling Complex Analytics Across the Language Barrier, PyData London 2016

Irwin Zaid
May 08, 2016

Transcript

  1. DYND: ENABLING COMPLEX ANALYTICS
    ACROSS THE LANGUAGE BARRIER
    IRWIN ZAID, CONTINUUM ANALYTICS

  2. Part 1: The What and Why of DyND
    Part 2: DyND Snippets
    DYND: ENABLING COMPLEX ANALYTICS
    ACROSS THE LANGUAGE BARRIER

  3. PYDATA LONDON 2016
    What is DyND?
    DyND aims to be a modern library for array-oriented
    computing

  4. What is DyND?
    ‣ A library, not a language
    ‣ Helps do computation with arrays
    ‣ “Modern” means “uses software engineering practices
    from today, solving problems people have had for a
    while”

  5. What is DyND really?
    A type system and a callable object that operate
    together with an array container, engineered in C++
    and bound to dynamic languages like Python

  6. What is DyND not?
    ‣ DyND is not NumPy 2.0
    - NumPy is really good at what it was designed to do: broadcasting
    computation and stride-based manipulation
    - NumPy may not be “replaceable” (Fortran is still pretty widely used)
    ‣ DyND is not a JIT for Python
    - See Numba for that
    ‣ DyND is not Boost.MultiArray
    - DyND is dynamic C++, templates are a hidden detail

  7. It’s 2016. Why write another array library?
    Because there are problems that are not being
    otherwise solved

  8. It’s 2016. Why write another array library?
    ‣ Example: Missing data
    - Values that may or may not be present
    - The masked arrays of numpy.ma are not sufficient: there is overhead related
    to how the masked array is stored, and NumPy is not always consistent
    in how it treats masked arrays
    - Discussed at length in 2011, remains unsolved in 2016
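
To make the storage overhead concrete, here is a minimal sketch using only standard NumPy calls. The array values are illustrative and not taken from the talk. It shows that a masked array carries a separate boolean mask alongside the data.

```python
import numpy as np

# A float64 array with one missing entry, expressed as a masked array.
data = np.array([1.0, 2.0, 3.0, 4.0])
masked = np.ma.masked_array(data, mask=[False, True, False, False])

# The mask is a separate boolean array with one entry per data element,
# so it is extra storage on top of the data itself.
mask = np.ma.getmaskarray(masked)
print("data bytes:", data.nbytes, "mask bytes:", mask.nbytes)

# Reductions skip the masked entry.
total = masked.sum()
valid = masked.count()
print("sum of valid entries:", total, "count of valid entries:", valid)
```

An option-style type, as listed later in the deck (option[T]), would instead make "may be missing" part of the element type itself.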

  9. It’s 2016. Why write another array library?
    ‣ Example: Variable-length strings
    - NumPy can only efficiently represent strings of a predefined length
    - Variable-length strings have to be stored as Python objects
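
The trade-off can be seen with a short NumPy sketch. The word list is an illustrative assumption; the calls are standard NumPy. A fixed-width string array reserves the width of the longest string for every element, while the object array stores references to ordinary Python strings.

```python
import numpy as np

words = ["a", "to", "internationalization"]
longest = max(len(w) for w in words)

# Fixed-width Unicode dtype: every slot is as wide as the longest string.
fixed = np.array(words)
# Object dtype: every slot is a reference to a Python str of any length.
boxed = np.array(words, dtype=object)

print(fixed.dtype, "itemsize:", fixed.itemsize)
print(boxed.dtype)

# Assigning a longer string to the fixed-width array silently truncates it.
fixed[0] = "x" * (longest + 10)
print(len(fixed[0]))
```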

  10. It’s 2016. Why write another array library?
    ‣ Example: Custom types
    - NumPy dtypes are too primitive
    - How does one represent categorical data? Ragged dimensions? GPU data?
    - Cannot define user overloads on ufuncs, e.g. string concatenation

  11. It’s 2016. Why write another array library?
    ‣ Example: NumPy without the “Py”
    - Sometimes we don’t want to use Python
    - Why not have a representation of data that can go between several
    languages? (R, Julia, JavaScript, …)

  12. Four Traits of DyND
    ‣ Expressive
    ‣ Generic
    ‣ Extendable
    ‣ Pluggable

  13. Expressive
    ‣ DyND implements Datashape as its type system
    - A structured data description language, http://datashape.pydata.org
    Datashape: type = dimension * dtype
    - var * int32
    - 3 * string
    - 4 * float64
    Struct type:
    {x: int32, y: string, z: float64}
    Tabular data:
    var * {x: int32, y: string, z: float64}
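
The "dimension * dtype" reading of these strings can be sketched in a few lines of Python. The helper `split_datashape` is hypothetical and written only for illustration; it is not part of the datashape library or of DyND, and it handles only the simple forms shown on this slide.

```python
def split_datashape(ds):
    """Split 'dim * ... * dtype' into (dims, dtype).

    Hypothetical helper: a '*' inside braces, brackets, or parentheses
    belongs to a nested type and is not treated as a separator.
    """
    parts, current, depth = [], [], 0
    for ch in ds:
        if ch in "{[(":
            depth += 1
        elif ch in "}])":
            depth -= 1
        if ch == "*" and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Everything before the last top-level '*' is a dimension.
    return parts[:-1], parts[-1]


for text in ["var * int32", "3 * string",
             "var * {x: int32, y: string, z: float64}"]:
    dims, dtype = split_datashape(text)
    print(dims, "|", dtype)
```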

  14. Generic
    ‣ DyND’s type, callable, and array objects are reference-
    counted smart pointers that dynamically interpret data
    ‣ Types can be parameterized on other types
    - N * T, var[T], option[T]
    ‣ Callables can be transformed (in a functional sense) from
    inner operations to higher-order patterns
    - Define the innermost operation, then build out the behavior you want with
    predefined generic patterns
    - nd::functional::elwise([](int x, int y) { return x + y; });
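
The idea of lifting an innermost operation into an elementwise one can be illustrated in plain Python. This `elwise` is a hypothetical stand-in that works on nested lists of identical shape. It is not DyND's implementation and does no broadcasting or type checking.

```python
def elwise(f):
    """Lift a scalar function so it applies elementwise to nested lists.

    Illustrative only: assumes all arguments have the same nested shape.
    """
    def lifted(*args):
        if all(isinstance(a, list) for a in args):
            # Recurse one dimension deeper, pairing up corresponding items.
            return [lifted(*items) for items in zip(*args)]
        return f(*args)
    return lifted


# The innermost operation is a scalar addition, as in the C++ lambda above.
add = elwise(lambda x, y: x + y)

print(add([1, 2, 3], [10, 20, 30]))
print(add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))
```

The user writes only the scalar kernel; the pattern that walks the dimensions is supplied once and reused.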

  15. Extendable
    ‣ Types and callables are first-class objects that users should
    create directly

  17. Pluggable
    ‣ DyND supports plug-in libraries
    - Define custom types and callables (or namespaces thereof) directly
    ‣ Use nd::set("my_amazing_callable", f) for a
    custom callable or nd::set("my_amazing_namespace",
    {{"my_amazing_callable", f},
    {"my_other_amazing_callable", g}}) for a custom
    namespace
    ‣ Callables are dynamically propagated to Python, entirely
    removing the need for any user wrapper code

  18. Part 1: The What and Why of DyND
    Part 2: DyND Snippets
    DYND: ENABLING COMPLEX ANALYTICS
    ACROSS THE LANGUAGE BARRIER

  19. Types
    ‣ Types are instances of simple classes
    - Write a class, get a type
    ‣ Types expose dynamic features to arrays
    - Either properties, like .real or .imag, or behavior, like .conj()
    ‣ Types can be kinds or patterns
    - Int, Scalar, Fixed, or Any; Fixed * T or (N * T, T) -> T

  20. Metadata and Data
    ‣ Array metadata can describe data layouts other than strided
    - Offset (tuple or struct), indirect (pointer), ragged (variable-sized dimensions)
    ‣ Array data is poolable or allocatable in custom memory
    spaces
    - Variable-sized strings or dimensions; CUDA

  21. Fundamental Types

  22. Dimension Types

  23. Aggregate Types

  24. Option Type

  25. Symbolic Types

  26. Callable and Functionals
    ‣ Share functions alongside data
    - Callables are first-class objects that can be dynamically published
    ‣ Enable user-defined functions with generic patterns
    - Functionals like apply, elwise, reduction, multidispatch, outer, neighborhood,
    and rolling transform one callable into another
    ‣ Built-in callables are overloadable
    - Users can define +, -, *, /, … for custom types
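
As an illustration of a functional that builds one callable out of others, here is a small Python sketch of multiple dispatch on argument types. The `multidispatch` helper and its (signature, function) pairs are hypothetical and show only the general technique, not DyND's functional of the same name.

```python
def multidispatch(*overloads):
    """Combine (signature, function) pairs into one callable.

    Illustrative only: the overload is chosen by the exact runtime types
    of the arguments.
    """
    table = dict(overloads)

    def dispatcher(*args):
        key = tuple(type(a) for a in args)
        if key not in table:
            names = ", ".join(t.__name__ for t in key)
            raise TypeError("no overload for (" + names + ")")
        return table[key](*args)

    return dispatcher


# One "add" callable with separate behavior for ints and for strings.
add = multidispatch(
    ((int, int), lambda x, y: x + y),
    ((str, str), lambda x, y: x + " " + y),
)

print(add(1, 2))
print(add("array", "library"))
```

Adding support for a custom type then amounts to registering one more overload rather than editing the existing ones.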

  27. Elementwise Functional

  28. Reduction Functional

  29. Multidispatch Functional

  30. Option Operations

  31. JSON Processing

  32. Thanks to…
    Mark Wiebe, Ian Henriksen, Stefan Krah, Irwin Zaid

  33. Thanks to…

  34. Get DyND!
    conda install dynd-python -c dynd/channel/dev
