DyND - Enabling Complex Analytics Across the Language Barrier, PyData London 2016

Irwin Zaid
May 08, 2016

Transcript

  1. DYND: ENABLING COMPLEX ANALYTICS ACROSS THE LANGUAGE BARRIER

    Part 1: The What and Why of DyND
    Part 2: DyND Snippets
  2. PYDATA LONDON 2016 What is DyND?

    DyND aims to be a modern library for array-oriented computing
  3. What is DyND?

    ‣ A library, not a language
    ‣ Helps do computation with arrays
    ‣ “Modern” means “uses software engineering practices from today, solving problems people have had for a while”
  4. What is DyND really?

    A type system and a callable object that operate together with an array container, engineered in C++ and bound to dynamic languages like Python
  5. What is DyND not?

    ‣ DyND is not NumPy 2.0
      - NumPy is really good at what it was designed to do: broadcasting computation and stride-based manipulation
      - NumPy may not be “replaceable”; Fortran is still pretty widely used
    ‣ DyND is not a JIT for Python
      - See Numba for that
    ‣ DyND is not Boost.MultiArray
      - DyND is dynamic C++; templates are a hidden detail
  6. It’s 2016. Why write another array library?

    Because there are problems that are not being otherwise solved
  7. It’s 2016. Why write another array library?

    ‣ Example: Missing data
      - Values that may or may not be present
      - The masked arrays of numpy.ma are not sufficient: there is overhead related to how the masked array is stored, and NumPy is not always consistent in how it treats masked arrays
      - Discussed at length in 2011, remains unsolved in 2016
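To make the missing-data problem concrete, here is a minimal pure-Python sketch of the "values plus validity flags" idea that an option type expresses. The `OptionArray` class and its method names are hypothetical, chosen for illustration only; this is not how DyND or numpy.ma is implemented.

```python
from typing import List, Optional


class OptionArray:
    """Toy container: values plus a parallel validity mask (hypothetical, not DyND's API)."""

    def __init__(self, items: List[Optional[float]]) -> None:
        # Store a placeholder (0.0) where a value is missing and record validity separately.
        self.values = [0.0 if x is None else float(x) for x in items]
        self.valid = [x is not None for x in items]

    def sum(self) -> float:
        # Only entries flagged as valid contribute to the sum.
        return sum(v for v, ok in zip(self.values, self.valid) if ok)

    def count_missing(self) -> int:
        return self.valid.count(False)


temps = OptionArray([21.5, None, 19.0, None, 22.5])
print(temps.sum(), temps.count_missing())  # prints: 63.0 2
```

Keeping the mask next to the values means every operation has to agree on what "missing" means, which is the consistency issue the slide raises.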
  8. It’s 2016. Why write another array library?

    ‣ Example: Variable-length strings
      - NumPy can only efficiently represent strings of a predefined length
      - Variable-length strings have to be stored as Python objects
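The trade-off can be sketched in pure Python by comparing a fixed-width layout, where every slot is padded to the longest string, with a packed buffer plus offsets. The function names are hypothetical and the layouts are illustrative assumptions; they do not reproduce NumPy's or DyND's actual memory layout.

```python
from typing import List, Tuple


def fixed_width_size(words: List[str]) -> int:
    """Bytes needed when every slot is padded to the longest UTF-8 encoded string."""
    encoded = [w.encode("utf-8") for w in words]
    width = max((len(b) for b in encoded), default=0)
    return width * len(encoded)


def pack_variable(words: List[str]) -> Tuple[bytes, List[int]]:
    """Concatenate the encoded strings and record where each one starts and ends."""
    data = bytearray()
    offsets = [0]
    for w in words:
        data.extend(w.encode("utf-8"))
        offsets.append(len(data))
    return bytes(data), offsets


def get_string(data: bytes, offsets: List[int], i: int) -> str:
    """Recover string i from the packed buffer."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


words = ["a", "bb", "a much longer string"]
data, offsets = pack_variable(words)
print(fixed_width_size(words), len(data), get_string(data, offsets, 2))
```

With one long string among short ones, the padded layout needs 60 bytes here while the packed buffer needs 23 (plus the offsets).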
  9. It’s 2016. Why write another array library?

    ‣ Example: Custom types
      - NumPy dtypes are too primitive
      - How does one represent categorical data? Ragged dimensions? GPU data?
      - Cannot define user overloads on ufuncs, e.g. string concatenation
  10. It’s 2016. Why write another array library?

    ‣ Example: NumPy without the “Py”
      - Sometimes we don’t want to use Python
      - Why not have a representation of data that can go between several languages? (R, Julia, JavaScript, …)
  11. Four Traits of DyND

    ‣ Expressive
    ‣ Generic
    ‣ Extendable
    ‣ Pluggable
  12. Expressive

    ‣ DyND implements Datashape as its type system
      - A structured data description language, http://datashape.pydata.org
    ‣ Datashape type: dimension * dtype
      - var * int32
      - 3 * string
      - 4 * float64
    ‣ Struct type: {x: int32, y: string, z: float64}
    ‣ Tabular data: var * {x: int32, y: string, z: float64}
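The "dimension * dtype" reading of a datashape string can be illustrated with a toy splitter that separates leading dimensions from the final dtype. This is a hypothetical sketch for the examples on the slide; `split_datashape` is not part of the datashape library or DyND, and it does not validate the grammar.

```python
from typing import List, Tuple


def split_datashape(ds: str) -> Tuple[List[str], str]:
    """Split 'dim * dim * dtype' on top-level '*' only, ignoring any inside {...}, [...] or (...)."""
    parts: List[str] = []
    depth = 0
    current = ""
    for ch in ds:
        if ch in "{[(":
            depth += 1
        elif ch in "}])":
            depth -= 1
        if ch == "*" and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    # Everything before the last top-level '*' is a dimension; the final part is the dtype.
    return parts[:-1], parts[-1]


print(split_datashape("4 * float64"))
print(split_datashape("var * {x: int32, y: string, z: float64}"))
```

For the tabular example, the single dimension is `var` and the dtype is the whole struct, which is why a table reads as "a variable number of records".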
  13. Generic

    ‣ DyND’s type, callable, and array objects are reference-counted smart pointers that dynamically interpret data
    ‣ Types can be parameterized on other types
      - N * T, var[T], option[T]
    ‣ Callables can be transformed (in a functional sense) from inner operations to higher-order patterns
      - Define the innermost operation, then build out the behavior you want with predefined generic patterns
      - nd::functional::elwise([](int x, int y) { return x + y; });
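The idea of lifting an innermost scalar operation to an element-wise one can be sketched in pure Python as a rough analogue of the C++ one-liner on the slide. The `elwise` function below is a hypothetical stand-in that works on equally shaped nested lists and does no broadcasting; it says nothing about how DyND implements its functional.

```python
from typing import Any, Callable


def elwise(f: Callable[..., Any]) -> Callable[..., Any]:
    """Lift a scalar function so it maps over equally shaped nested lists (no broadcasting)."""

    def lifted(*args: Any) -> Any:
        if all(isinstance(a, list) for a in args):
            # Recurse one dimension down, pairing up corresponding elements.
            return [lifted(*items) for items in zip(*args)]
        # Innermost level: apply the scalar operation.
        return f(*args)

    return lifted


add = elwise(lambda x, y: x + y)
print(add([[1, 2], [3, 4]], [[10, 20], [30, 40]]))  # prints: [[11, 22], [33, 44]]
```

The scalar lambda never mentions dimensions; the pattern supplies them, which is the separation the slide describes.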
  14. Pluggable

    ‣ DyND supports plug-in libraries
      - Define custom types and callables (or namespaces thereof) directly
    ‣ Use nd::set("my_amazing_callable", f) for a custom callable or nd::set("my_amazing_namespace", {{"my_amazing_callable", f}, {"my_other_amazing_callable", g}}) for a custom namespace
    ‣ Callables are dynamically propagated to Python, entirely removing the need for any user wrapper code
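A name-based registry of callables and namespaces can be sketched in a few lines of Python. The `register` and `lookup` names and the dotted-path lookup are assumptions made for this illustration; they mimic the shape of the nd::set calls on the slide but are not DyND's API or its Python binding mechanism.

```python
from typing import Any, Callable, Dict, Union

# Module-level registry mapping names to callables or to namespaces (dicts of callables).
_registry: Dict[str, Any] = {}


def register(name: str, entry: Union[Callable[..., Any], Dict[str, Callable[..., Any]]]) -> None:
    """Publish a callable, or a dict of callables as a namespace (hypothetical analogue of nd::set)."""
    _registry[name] = entry


def lookup(path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as 'my_namespace.my_callable'."""
    node: Any = _registry
    for key in path.split("."):
        node = node[key]
    return node


register("double", lambda x: 2 * x)
register("text", {"shout": str.upper, "whisper": str.lower})
print(lookup("double")(21), lookup("text.shout")("dynd"))  # prints: 42 DYND
```

Because callers resolve functions by name at run time, a plug-in only has to register its callables; no per-function wrapper is written on the caller's side.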
  15. DYND: ENABLING COMPLEX ANALYTICS ACROSS THE LANGUAGE BARRIER

    Part 1: The What and Why of DyND
    Part 2: DyND Snippets
  16. Types

    ‣ Types are instances of simple classes
      - Write a class, get a type
    ‣ Types expose dynamic features to arrays
      - Either properties, like .real or .imag, or behavior, like .conj()
    ‣ Types can be kinds or patterns
      - Int, Scalar, Fixed, or Any; Fixed * T or (N * T, T) -> T
  17. Metadata and Data

    ‣ Array metadata can describe data other than strided
      - Offset (tuple or struct), indirect (pointer), ragged (variable-sized dimensions)
    ‣ Array data is poolable or allocatable in custom memory spaces
      - Variable-sized strings or dimensions; CUDA
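The difference between strided and ragged metadata can be shown on one flat buffer. In this hypothetical sketch, strides are counted in elements rather than bytes and the ragged rows are described by an offsets list; both helper functions are illustrative assumptions and do not reflect DyND's internal metadata structures.

```python
from typing import List, Sequence


def strided_get(buf: Sequence[float], strides: Sequence[int], index: Sequence[int]) -> float:
    """Strided lookup: position = sum(index[k] * strides[k]), in elements rather than bytes."""
    return buf[sum(i * s for i, s in zip(index, strides))]


def ragged_row(buf: Sequence[float], offsets: Sequence[int], row: int) -> List[float]:
    """Ragged lookup: row r occupies buf[offsets[r]:offsets[r + 1]]."""
    return list(buf[offsets[row]:offsets[row + 1]])


flat = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

# A 2 x 3 row-major array: moving one row skips 3 elements, one column skips 1.
print(strided_get(flat, strides=(3, 1), index=(1, 2)))  # prints: 5.0

# The same buffer viewed as three rows of lengths 1, 3 and 2.
print(ragged_row(flat, offsets=[0, 1, 4, 6], row=1))  # prints: [1.0, 2.0, 3.0]
```

The data is identical in both views; only the metadata that interprets it changes.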
  18. Callable and Functionals

    ‣ Share functions alongside data
      - Callables are first-class objects that can be dynamically published
    ‣ Enable user-defined functions with generic patterns
      - Functionals like apply, elwise, reduction, multidispatch, outer, neighborhood, and rolling transform one callable into another
    ‣ Built-in callables are overloadable
      - Users can define +, -, *, /, … for custom types
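Overloading a built-in operation for new types can be sketched as a small multiple-dispatch table keyed on exact argument types. The `multidispatch` function and the `Money` class below are hypothetical illustrations in pure Python; DyND's own multidispatch functional matches on its type system and is not reproduced here.

```python
from typing import Any, Callable, Dict, Tuple


def multidispatch(*overloads: Tuple[Tuple[type, ...], Callable[..., Any]]) -> Callable[..., Any]:
    """Combine (signature, function) pairs into one callable chosen by exact argument types."""
    table: Dict[Tuple[type, ...], Callable[..., Any]] = dict(overloads)

    def dispatcher(*args: Any) -> Any:
        signature = tuple(type(a) for a in args)
        impl = table.get(signature)
        if impl is None:
            raise TypeError(f"no overload for {signature}")
        return impl(*args)

    return dispatcher


class Money:
    """Custom type used to show a user-defined overload."""

    def __init__(self, cents: int) -> None:
        self.cents = cents


add = multidispatch(
    ((int, int), lambda x, y: x + y),
    ((str, str), lambda x, y: x + y),  # string concatenation as an overload
    ((Money, Money), lambda a, b: Money(a.cents + b.cents)),
)
print(add(1, 2), add("dy", "nd"), add(Money(150), Money(250)).cents)  # prints: 3 dynd 400
```

Adding support for another type means adding one more pair to the table; the existing overloads stay untouched.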