Brief History
Person                                       Package                   Year
Jim Fulton                                   Matrix Object in Python   1994
Jim Hugunin                                  Numeric                   1995
Perry Greenfield, Rick White, Todd Miller    Numarray                  2001
Travis Oliphant                              NumPy                     2005
1999: Early SciPy emerges • Discussions on the matrix-sig from 1997 to 1999 called for a complete data analysis environment: Paul Barrett, Joe Harrington, Perry Greenfield, Paul Dubois, Konrad Hinsen, and others. • Activity in 1998 led to increased interest in 1999.
• In response, on 15 Jan 1999 I posted to matrix-sig a list of routines I felt needed to be present and began wrapping and writing in earnest. On 6 April 1999, I announced I would be creating this uber-package, which eventually became SciPy.
Gaussian quadrature                            5 Jan 1999
cephes 1.0                                     30 Jan 1999
sigtools 0.40                                  23 Feb 1999
Numeric docs                                   March 1999
cephes 1.1                                     9 Mar 1999
multipack 0.3                                  13 Apr 1999
Helper routines                                14 Apr 1999
multipack 0.6 (leastsq, ode, fsolve, quad)     29 Apr 1999
sparse plan described                          30 May 1999
multipack 0.7                                  14 Jun 1999
SparsePy 0.1                                   5 Nov 1999
cephes 1.2 (vectorize)                         29 Dec 1999
Plotting??
Community effort • Chuck Harris • Pauli Virtanen • David Cournapeau • Stefan van der Walt • Dag Sverre Seljebotn • Robert Kern • Warren Weckesser • Ralf Gommers • Mark Wiebe • Nathaniel Smith
Zen of NumPy • strided is better than scattered • contiguous is better than strided • descriptive is better than imperative • array-oriented is better than object-oriented • broadcasting is a great idea • vectorized is better than an explicit loop • unless it’s too complicated --- then use Cython/Numba • think in higher dimensions
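A few of these maxims can be illustrated with a short sketch. The array names and sizes below are illustrative assumptions, not taken from the talk; the snippet only shows a loop next to its vectorized form, a broadcast of a column against a row, and how a strided view differs from a contiguous array.

```python
import numpy as np

# "vectorized is better than an explicit loop"
x = np.arange(5, dtype=float)
squares_loop = np.empty_like(x)
for i in range(x.size):          # explicit element-by-element loop
    squares_loop[i] = x[i] ** 2
squares_vec = x ** 2             # same result, one array expression

# "broadcasting is a great idea": (3, 1) combined with (1, 4) gives (3, 4)
col = np.arange(3).reshape(3, 1)
row = np.arange(4).reshape(1, 4)
table = col * 10 + row

# "contiguous is better than strided": slicing with a step yields a
# strided view onto the same memory rather than a contiguous copy
a = np.arange(12, dtype=np.int64).reshape(3, 4)
view = a[:, ::2]
print(a.flags.c_contiguous, view.flags.c_contiguous, view.strides)
```

If the strided view is going to be traversed many times, `np.ascontiguousarray(view)` makes a contiguous copy at the cost of extra memory.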
Conway’s Game of Life • A dead cell with exactly 3 live neighbors comes to life • A live cell with 2 or 3 live neighbors survives • With too few or too many neighbors, a live cell dies
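The three rules map naturally onto array operations. The following is a minimal sketch, not the code from the talk: the function name `life_step` is hypothetical, and it assumes a grid of 0s and 1s whose edges wrap around (a torus), which keeps the neighbor count to a few `np.roll` calls.

```python
import numpy as np

def life_step(grid):
    """Advance a 0/1 grid by one generation (edges wrap around)."""
    # Count live neighbors by summing the eight shifted copies of the grid.
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbors == 3)
    survive = (grid == 1) & ((neighbors == 2) | (neighbors == 3))
    # Every other cell is dead in the next generation.
    return (born | survive).astype(grid.dtype)

# A "blinker": three cells in a row oscillate with period 2.
blinker = np.zeros((5, 5), dtype=np.uint8)
blinker[2, 1:4] = 1
next_gen = life_step(blinker)
print(next_gen)
```

No cell is visited in an explicit Python loop; the only iteration is over the eight neighbor offsets.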
APL: the first array-oriented language • Appeared in 1964 • Originated by Ken Iverson • Direct descendants (J, K, Matlab) are still used heavily and people pay a lot of money for them • NumPy is a descendant
[Figure: lineage diagram with nodes APL, J, K, Matlab, Numeric, NumPy]
Example of Object-defined Dtype

    @np.dtype
    class Stock(np.DType):
        symbol = np.Str(4)
        open = np.Int(2)
        close = np.Int(2)
        high = np.Int(2)
        low = np.Int(2)

        @np.Int(2)
        def mid(self):
            return (self.high + self.low) / 2.0
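The class-based syntax on this slide is a proposal. For comparison, here is a rough sketch of how a similar record can be expressed with NumPy's existing structured dtypes. It assumes that `np.Str(4)` means a 4-byte string and `np.Int(2)` a 2-byte integer; the sample values and the free function `mid` are illustrative stand-ins, since a structured dtype has no computed fields.

```python
import numpy as np

# Structured dtype with the same fields as the proposed Stock class.
# Assumption: Str(4) -> "S4" (4 bytes), Int(2) -> "i2" (16-bit integer).
stock_dtype = np.dtype([
    ("symbol", "S4"),
    ("open", "i2"),
    ("close", "i2"),
    ("high", "i2"),
    ("low", "i2"),
])

# Made-up sample records, for illustration only.
stocks = np.array(
    [(b"AAPL", 100, 104, 106, 98), (b"MSFT", 50, 49, 53, 47)],
    dtype=stock_dtype,
)

def mid(records):
    """Midpoint of high and low, computed for all records at once."""
    return (records["high"] + records["low"]) / 2.0

mid_prices = mid(stocks)
print(stock_dtype.itemsize, mid_prices)
```

The derived value lives outside the dtype here, which is the gap the object-defined dtype on the slide is meant to close.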
Improvements needed • Ufunc improvements • Generalized ufuncs support more than just contiguous arrays • Specification of ufuncs in Python • Move most dtype “array functions” to ufuncs • Unify error-handling for all computations • Allow lazy-evaluation and remote computation --- streaming and generator data • Structured and string dtype ufuncs • Multi-core and GPU optimized ufuncs • Group-by reduction
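As a point of reference for the last bullet, a group-by reduction can be approximated today by combining `np.unique` with the unbuffered `ufunc.at` method. This is a sketch of one workaround with made-up keys and values, not the built-in reduction the slide asks for.

```python
import numpy as np

keys = np.array(["a", "b", "a", "c", "b", "a"])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Map each key to the index of its group among the sorted unique keys.
labels, inverse = np.unique(keys, return_inverse=True)

# Accumulate values into their group slot; np.add.at handles repeated
# indices correctly, unlike sums[inverse] += values.
sums = np.zeros(len(labels))
np.add.at(sums, inverse, values)

print(dict(zip(labels.tolist(), sums.tolist())))
```

The same pattern works with other ufuncs, for example `np.maximum.at` for a group-wise maximum, given a suitable initial array.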
Data URLs • Variables in a script are global addresses (data URLs). All the world’s data that you can see via the web can be used as part of an algorithm by referencing it as part of an array.
• Dynamically interpret bytes as data-type
• Scheduler will push code based on data-type to the data instead of pulling data to the code.
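The idea of interpreting bytes dynamically as a data type already has a small-scale counterpart in NumPy: `np.frombuffer` views the same raw bytes under whatever dtype is supplied. The byte string and field names below are illustrative assumptions, and the sketch covers only local memory, not the remote data URLs described on the slide.

```python
import numpy as np

# Eight raw bytes, for example as they might arrive from a file or socket.
raw = bytes([1, 0, 2, 0, 3, 0, 4, 0])

# The same buffer viewed under three different dtypes; no data is copied.
as_u8 = np.frombuffer(raw, dtype=np.uint8)      # eight 1-byte integers
as_u16 = np.frombuffer(raw, dtype="<u2")        # four little-endian 16-bit integers

pair = np.dtype([("lo", "<u2"), ("hi", "<u2")])  # hypothetical record layout
as_pairs = np.frombuffer(raw, dtype=pair)        # two records of two fields

print(as_u8, as_u16, as_pairs["hi"])
```

Because the dtype is an ordinary runtime object, it can be chosen or built after the bytes are received.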
NumFOCUS • Mission • To initiate and support educational programs furthering the use of open source software in science. • To promote the use of high-level languages and open source in science, engineering, and math research • To encourage reproducible scientific research • To provide infrastructure and support for open source projects for technical computing
NumFOCUS • Activities • Sponsor sprints and conferences • Provide scholarships and grants for people using these tools • Pay for documentation development and basic course development • Fund continuous integration and build systems • Work with domain-specific organizations • Raise funds from industries using Python and NumPy
NumFOCUS • Directors • Perry Greenfield • John Hunter • Jarrod Millman • Travis Oliphant • Fernando Perez • Members • Basically people who donate for now. In time, a body that elects directors.