DyND - Enabling Complex Analytics Across the Language Barrier, PyData London 2016

Irwin Zaid
May 08, 2016

Transcript

  1. DYND: ENABLING COMPLEX ANALYTICS ACROSS THE LANGUAGE BARRIER
    IRWIN ZAID, CONTINUUM ANALYTICS
  2. Part 1: The What and Why of DyND
    Part 2: DyND Snippets
  3. PYDATA LONDON 2016 What is DyND?
    DyND aims to be a modern library for array-oriented computing
  4. What is DyND?
    ‣ A library, not a language
    ‣ Helps do computation with arrays
    ‣ “Modern” means “uses software engineering practices from today, solving problems people have had for a while”
  5. What is DyND really?
    A type system and a callable object that operate together with an array container, engineered in C++ and bound to dynamic languages like Python
  6. What is DyND not?
    ‣ DyND is not NumPy 2.0
      - NumPy is really good at what it was designed to do: broadcasting computation and stride-based manipulation
      - NumPy may not be “replaceable”; Fortran is still pretty widely used
    ‣ DyND is not a JIT for Python
      - See Numba for that
    ‣ DyND is not Boost.MultiArray
      - DyND is dynamic C++, templates are a hidden detail
  7. It’s 2016. Why write another array library?
    Because there are problems that are not being otherwise solved
  8. It’s 2016. Why write another array library?
    ‣ Example: Missing data
      - Values that may or may not be present
      - The masked arrays of numpy.ma are not sufficient: there is overhead related to how the masked array is stored, and NumPy is not always consistent in how it treats masked arrays
      - Discussed at length in 2011, remains unsolved in 2016
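The storage overhead mentioned on this slide can be illustrated with a minimal plain-Python sketch. It does not use NumPy or DyND; it only contrasts a separate per-element mask with an in-band sentinel. Using NaN as the sentinel is an assumption made for illustration, and the helper names `total_masked` and `total_sentinel` are hypothetical.

```python
from array import array
import math

values = [1.5, None, 3.0, None]

# Layout 1: data plus a separate one-byte-per-element mask (the masked-array idea).
masked_data = array("d", [0.0 if v is None else v for v in values])
mask = bytearray(v is None for v in values)

# Layout 2: missing values encoded in-band as NaN (an assumed sentinel encoding).
sentinel_data = array("d", [math.nan if v is None else v for v in values])

def total_masked(data, mask):
    """Sum only the elements whose mask byte is 0 (present)."""
    return sum(x for x, m in zip(data, mask) if not m)

def total_sentinel(data):
    """Sum only the elements that are not the NaN sentinel."""
    return sum(x for x in data if not math.isnan(x))

# The mask costs extra bytes on top of the data buffer itself.
masked_bytes = masked_data.itemsize * len(masked_data) + len(mask)
sentinel_bytes = sentinel_data.itemsize * len(sentinel_data)
```

Both layouts give the same sum over present values; the mask-based layout simply carries an additional buffer alongside the data.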
  9. It’s 2016. Why write another array library?
    ‣ Example: Variable-length strings
      - NumPy can only efficiently represent strings of a predefined length
      - Variable-length strings have to be stored as Python objects
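To make the fixed-length versus variable-length trade-off concrete, here is a small plain-Python sketch of two storage layouts: padding every string to a common width, and packing the bytes back to back with an offsets table. This is one common approach and is not claimed to be the layout NumPy or DyND uses internally; the function names are hypothetical.

```python
words = ["a", "array", "language barrier"]

def pack_fixed(strings):
    """Pad every string to the longest length (fixed-width layout)."""
    encoded = [s.encode("utf-8") for s in strings]
    width = max(len(b) for b in encoded)
    buffer = b"".join(b.ljust(width, b"\0") for b in encoded)
    return buffer, width

def pack_variable(strings):
    """Store bytes back to back plus an offsets table (variable-length layout)."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for b in encoded:
        offsets.append(offsets[-1] + len(b))
    return b"".join(encoded), offsets

def get_variable(buffer, offsets, i):
    """Recover the i-th string from the packed buffer."""
    return buffer[offsets[i]:offsets[i + 1]].decode("utf-8")

fixed_buffer, width = pack_fixed(words)
var_buffer, offsets = pack_variable(words)
```

With one long string among short ones, the padded buffer is dominated by padding, while the packed buffer grows only with the actual text (plus the offsets table).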
  10. It’s 2016. Why write another array library?
    ‣ Example: Custom types
      - NumPy dtypes are too primitive
      - How does one represent categorical data? Ragged dimensions? GPU data?
      - Cannot define user overloads on ufuncs, e.g. string concatenation
  11. It’s 2016. Why write another array library?
    ‣ Example: NumPy without the “Py”
      - Sometimes we don’t want to use Python
      - Why not have a representation of data that can go between several languages? (R, Julia, JavaScript, …)
  12. Four Traits of DyND
    ‣ Expressive
    ‣ Generic
    ‣ Extendable
    ‣ Pluggable
  13. Expressive
    ‣ DyND implements Datashape as its type system
      - A structured data description language, http://datashape.pydata.org
      - Type: dimension * dtype
      - Examples: var * int32, 3 * string, 4 * float64
      - Struct type: {x: int32, y: string, z: float64}
      - Tabular data: var * {x: int32, y: string, z: float64}
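The "dimension * dtype" structure of the examples above can be sketched with a toy splitter in plain Python. This is a hand-rolled illustration, not the parser from the Datashape library or DyND; `split_datashape` and `parse_struct` are hypothetical helpers, and `parse_struct` handles only flat (non-nested) structs.

```python
def split_datashape(text):
    """Split 'dim * dim * dtype' into (dims, dtype), ignoring '*' inside brackets."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch in "{[(":
            depth += 1
        elif ch in "}])":
            depth -= 1
        if ch == "*" and depth == 0:
            # A top-level '*' separates one dimension from the rest of the type.
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return parts[:-1], parts[-1]

def parse_struct(dtype):
    """Turn '{x: int32, y: string}' into a dict of field name -> type name."""
    fields = dtype.strip()[1:-1].split(",")
    return {name.strip(): kind.strip() for name, kind in (f.split(":") for f in fields)}

dims, dtype = split_datashape("var * {x: int32, y: string, z: float64}")
fields = parse_struct(dtype)
```

Here `dims` holds the leading dimensions and `dtype` the element type, mirroring how the tabular example is a variable-length dimension of struct rows.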
  14. Generic
    ‣ DyND’s type, callable, and array objects are reference-counted smart pointers that dynamically interpret data
    ‣ Types can be parameterized on other types
      - N * T, var[T], option[T]
    ‣ Callables can be transformed (in a functional sense) from inner operations to higher-order patterns
      - Define the innermost operation, then build out the behavior you want with predefined generic patterns
      - nd::functional::elwise([](int x, int y) { return x + y; });
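The idea of lifting an innermost scalar operation into an elementwise one can be sketched in plain Python. This is only an analogy to the `nd::functional::elwise` call on the slide, not DyND's implementation: the `elwise` function below is a hypothetical stand-in that works on nested lists of matching shape and does no broadcasting.

```python
def elwise(func):
    """Lift a scalar function so it applies elementwise over nested lists."""
    def lifted(*args):
        if all(isinstance(a, list) for a in args):
            # Pair up corresponding elements and recurse for inner dimensions.
            return [lifted(*items) for items in zip(*args)]
        # Innermost level: apply the scalar operation itself.
        return func(*args)
    return lifted

# Define the innermost operation once, then reuse it at any nesting depth.
add = elwise(lambda x, y: x + y)

flat = add([1, 2, 3], [10, 20, 30])
ragged = add([[1, 2], [3]], [[10, 20], [30]])
```

The scalar lambda never sees a list; the wrapper supplies the looping pattern, which is the separation of concerns the slide describes.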
  15. Extendable
    ‣ Types and callables are first-class objects that users should create directly
  16. Extendable
    ‣ Types and callables are first-class objects that users should create directly
  17. Pluggable
    ‣ DyND supports plug-in libraries
      - Define custom types and callables (or namespaces thereof) directly
    ‣ Use nd::set("my_amazing_callable", f) for a custom callable or nd::set("my_amazing_namespace", {{"my_amazing_callable", f}, {"my_other_amazing_callable", g}}) for a custom namespace
    ‣ Callables are dynamically propagated to Python, entirely removing the need for any user wrapper code
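The registration pattern on this slide, a name mapped to either a callable or a namespace of callables, can be sketched as a plain-Python registry. This is an illustration of the pattern only, not DyND's `nd::set` or its Python binding; `set_callable`, `lookup`, and the dotted-path convention are assumptions introduced here.

```python
registry = {}

def set_callable(name, value):
    """Register a callable, or a dict of callables acting as a namespace."""
    registry[name] = value

def lookup(path):
    """Resolve a dotted path such as 'my_amazing_namespace.my_amazing_callable'."""
    node = registry
    for part in path.split("."):
        node = node[part]
    return node

f = lambda x: x + 1
g = lambda x: x * 2

# A single callable published under its own name.
set_callable("my_amazing_callable", f)
# Several callables grouped under one namespace.
set_callable("my_amazing_namespace", {
    "my_amazing_callable": f,
    "my_other_amazing_callable": g,
})
```

Because lookups go through the registry by name, a consumer needs no per-function wrapper code to discover what has been published.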
  18. Part 1: The What and Why of DyND
    Part 2: DyND Snippets
  19. Types
    ‣ Types are instances of simple classes
      - Write a class, get a type
    ‣ Types expose dynamic features to arrays
      - Either properties, like .real or .imag, or behavior, like .conj()
    ‣ Types can be kinds or patterns
      - Int, Scalar, Fixed, or Any; Fixed * T or (N * T, T) -> T
  20. Metadata and Data
    ‣ Array metadata can describe data other than strided
      - Offset (tuple or struct), indirect (pointer), ragged (variable-sized dimensions)
    ‣ Array data is poolable or allocatable in custom memory spaces
      - Variable-sized strings or dimensions; CUDA
  21. Fundamental Types
  22. Dimension Types
  23. Aggregate Types
  24. Option Type
  25. Symbolic Types
  26. Callable and Functionals
    ‣ Share functions alongside data
      - Callables are first-class objects that can be dynamically published
    ‣ Enable user-defined functions with generic patterns
      - Functionals like apply, elwise, reduction, multidispatch, outer, neighborhood, and rolling transform one callable into another
    ‣ Built-in callables are overloadable
      - Users can define +, -, *, /, … for custom types
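How functionals "transform one callable into another" can be sketched in plain Python with two toy functionals: one that picks an overload by argument types and one that folds a binary callable over a sequence. These borrow the names `multidispatch` and `reduction` from the slide for readability, but they are hypothetical stand-ins, not DyND's functionals or their signatures.

```python
from functools import reduce

def multidispatch(*overloads):
    """Combine (signature, function) pairs into one callable chosen by argument types."""
    table = dict(overloads)
    def dispatcher(*args):
        signature = tuple(type(a) for a in args)
        if signature not in table:
            raise TypeError(f"no overload for {signature}")
        return table[signature](*args)
    return dispatcher

def reduction(func):
    """Turn a binary callable into one that folds over a sequence of values."""
    def reducer(values):
        return reduce(func, values)
    return reducer

# One 'add' callable with an overload for ints and another for strings.
add = multidispatch(
    ((int, int), lambda x, y: x + y),
    ((str, str), lambda x, y: x + " " + y),
)
# Composing functionals: a reduction built from the dispatched callable.
total = reduction(add)
```

The string overload shows the kind of user-defined behavior for an operator that the earlier slide on custom types says ufuncs do not allow.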
  27. Elementwise Functional
  28. Reduction Functional
  29. Multidispatch Functional
  30. Option Operations
  31. JSON Processing
  32. Thanks to…
    Mark Wiebe, Ian Henriksen, Stefan Krah, Irwin Zaid
  33. Thanks to…
  34. Get DyND!
    conda install dynd-python -c dynd/channel/dev