Slide 1

Slide 1 text

IPython Open Source Academia Wrapup IPython A modern vision of interactive computing Fernando Pérez http://fperez.org, @fperez_org Henry H. Wheeler Jr. Brain Imaging Center, UC Berkeley PyData 2013, Silicon Valley March 20, 2013

Slide 2

Slide 2 text

Outline 1 IPython: Interactive Python 2 The Life of an Open Source Project 3 Academia vs Open Source 4 Wrapup FP (UC Berkeley) IPython 3/20/13 2 / 34

Slide 3

Slide 3 text

In the beginning, IBM said... Let there be FORTRAN

Slide 5

Slide 5 text

Beyond (Floating Point) Number Crunching: Hardware floating point; Arbitrary precision integers; Rationals; Interval arithmetic; Symbolic manipulation; FORTRAN; Extended precision floating point; Text processing; Databases; Graphical user interfaces; Web interfaces; Hardware control; Multi-language integration; Data formats: HDF5, XML, ...

Slide 6

Slide 6 text

The purpose of computing is insight, not numbers. Richard Hamming, 1962

Slide 7

Slide 7 text

The computer as microscope Exploratory: Problem’s definition evolves as we understand it. No ‘requirements’ to build an application against. Mathematica, Maple, Matlab, IDL, etc. All have an interactive environment. Applications Languages

Slide 8

Slide 8 text

IPython: part of a Rich Ecosystem IPython NetworkX

Slide 9

Slide 9 text

The Lifecycle of a Scientific Idea (schematically)
1. Individual exploratory work
2. Collaborative development
3. Parallel production runs (HPC, cloud, ...)
4. Publication (with reproducible results!)
5. Education
6. Goto 1.
The Problem with most tools: barriers and discontinuities in workflow in between all the steps

Slide 11

Slide 11 text

IPython’s goal: Fluid transitions in all these steps

Slide 12

Slide 12 text

Demo

Slide 13

Slide 13 text

Pillar #1: An architecture for interactive computing

Slide 14

Slide 14 text

Pillar #2: the Notebook Format
  JSON but version control-friendly
  Easy for machine processing, fixable by hand if need be.
  Lots of hooks for metadata
  Not Python-specific (Ruby, JS notebooks exist, R, Julia planned)
  Produce Markdown, reST, LaTeX, HTML, etc...
An open format for sharing, publishing and archiving executable computational work
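
The "JSON but version control-friendly" point above can be illustrated with a small sketch. The schema here (`metadata`, `nbformat`, `cells`, `cell_type`, `source`, `outputs`) and the helpers `make_notebook` and `dumps` are simplified assumptions for illustration, not the actual notebook format. The idea is only that a stable, indented JSON layout makes a one-cell change show up as a short line-based diff.

```python
import difflib
import json


def make_notebook(cells):
    """Build a simplified, notebook-like document (hypothetical schema)."""
    return {
        "metadata": {"name": "demo"},
        "nbformat": 0,  # placeholder value, not a real format version
        "cells": cells,
    }


def dumps(nb):
    # Indentation plus sorted keys gives a stable, line-oriented layout,
    # which is what makes the file pleasant to diff and merge.
    return json.dumps(nb, indent=1, sort_keys=True)


before = make_notebook([
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "x = 1", "outputs": []},
])
after = make_notebook([
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "code", "source": "x = 2", "outputs": []},
])

# Keep only the added/removed lines, dropping the unified-diff file headers.
diff = [
    line
    for line in difflib.unified_diff(
        dumps(before).splitlines(), dumps(after).splitlines(), lineterm=""
    )
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Because the document is plain JSON, the same text can be loaded back with `json.loads` for machine processing, or edited by hand if a file needs fixing.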

Slide 15

Slide 15 text

Outline 1 IPython: Interactive Python 2 The Life of an Open Source Project 3 Academia vs Open Source 4 Wrapup

Slide 16

Slide 16 text

Documented protocols and formats: a growing ecosystem around IPython

Slide 17

Slide 17 text

An Emacs Notebook Client! Takafumi Arakaki http://tkf.github.com/emacs-ipython-notebook

Slide 18

Slide 18 text

Microsoft Visual Studio 2010 integrated console Dino Viehland and Shahrokh Mortazavi (Microsoft) http://pytools.codeplex.com

Slide 19

Slide 19 text

A vim client to control an IPython kernel/console Paul Ivanov (Berkeley) https://github.com/ivanov/vim-ipython

Slide 20

Slide 20 text

Notebooks on Windows Azure Cloud Shahrokh Mortazavi (Microsoft), B.G., F.P. http://bit.ly/JQeojD

Slide 21

Slide 21 text

Star Cluster: IPython parallel+Notebook on Amazon EC2 Justin Riley (MIT) http://web.mit.edu/star/cluster

Slide 22

Slide 22 text

NBViewer: easy notebook sharing Matthias Bussonnier http://nbviewer.ipython.org

Slide 23

Slide 23 text

Other projects using IPython
Scientific
  EPD: Enthought Python Distribution.
  Anaconda: Continuum Python Distribution.
  Sage: open source mathematics.
  PyRAF: Space Telescope Science Institute
  CASA: Nat. Radio Astronomy Observatory
  Ganga: CERN
  PyMAD: neutron spectrom., Laue Langevin
  Sardana: European Synchrotron Radiation
  ASCEND: eng. modeling (Carnegie Mellon).
  JModelica: dynamical systems.
  DASH: Denver Aerosol Sources and Health.
  Trilinos: Sandia National Lab.
  DoD: baseline configuration.
  NiPype: computational pipelines, MIT.
  PyIMSL Studio, by Visual Numerics.
  ...
Web/Other
  Visual Studio 2010: MS.
  Django.
  Turbo Gears.
  Pylons web framework
  Zope and Plone CMS.
  Axon Shell, BBC Kamaelia.
  Schevo database.
  Pitz: distributed task/bug tracking.
  iVR (interactive Virtual Reality).
  Movable Python (portable Python environment).
  ...

Slide 24

Slide 24 text

How did we get here? A brief history of IPython
October 2001: “just a little afternoon hack”
  My own $PYTHONSTARTUP: ipython-0.0.1.py: 259 lines. In [N]: prompts and _N results cache.
  IPP (Interactive Python Prompt) by Janko Hauser (Oceanography)
  LazyPython by Nathan Gray (CS Caltech)
2002: Ignore John Hunter’s Gnuplot support patches
  ... let there be matplotlib (actually finish my PhD!)
2005: Brian Granger, Min Ragan-Kelley
  First parallel tools, Twisted-based
2005-2008: Ville Vainio, Gaël Varoquaux, Laurent Dufréchou
  Core maintenance, Wx integration.

Slide 25

Slide 25 text

Summer 2009: NIH-funded cleanup by Brian.
March 2010: prototype networked shell using ØMQ
  2-day sprint with Brian
  Enthought funds Qt console.
  Min ports parallel code to ØMQ
  Core architecture ready, foundation for Notebook
Fall 2010: James Gao at Berkeley builds (5th!) Notebook prototype.
Summer 2011: Brian rebuilds James’ prototype into today’s Notebook.

Slide 26

Slide 26 text

An important plot http://www.ohloh.net/p/ipython

Slide 27

Slide 27 text

(Incomplete) Cast of Characters
  Brian Granger - Physics, Cal State San Luis Obispo
  Min Ragan-Kelley - Nuclear Engineering, UC Berkeley
  Matthias Bussonnier - Physics, Institut Curie, Paris
  Brad Froehle - Mathematics, UC Berkeley
  Paul Ivanov - Neuroscience, UC Berkeley
  Robert Kern - Enthought
  Thomas Kluyver - Biology, U. Sheffield
  Jonathan March - Enthought
  Evan Patterson - Physics, Caltech/Enthought
  Jörgen Stenarson - Elect. Engineering, Sweden
  Stefan van der Walt - UC Berkeley
  John Hunter - TradeLink Securities, Chicago
  Prabhu Ramachandran - Aerospace Engineering, IIT Bombay
  Satra Ghosh - MIT Neuroscience
  Gaël Varoquaux - Neurospin (Orsay, France)
  Ville Vainio - CS, Tampere University of Technology, Finland
  Barry Wark - Neuroscience, U. Washington
  Ondrej Certik - Physics, U Nevada Reno
  Darren Dale - Cornell
  Justin Riley - MIT
  Mark Voorhies - UC San Francisco
  Nicholas Rougier - INRIA Nancy Grand Est
  Thomas Spura - Fedora project
Many more! (~220 commit authors)

Slide 28

Slide 28 text

Outline 1 IPython: Interactive Python 2 The Life of an Open Source Project 3 Academia vs Open Source 4 Wrapup

Slide 29

Slide 29 text

Support at the edges of academic funding
  Enthought, Austin, TX: Lots!
  Microsoft: WinHPC support, Visual Studio integration, Azure (thanks to Shahrokh Mortazavi).
  DoD/DRC Inc: funding through Sept. 2012 (thanks to Jose Unpingco and Chris Kees).
  NIH: via NiPy grant
  NSF: via Sage compmath grant
  Google: summer of code 2005, 2010.
  Tech-X Corp., Boulder, CO: Parallel/notebook (previous versions)
Recent stable funding (2 years, 7 people, J. Taylor):

Slide 30

Slide 30 text

Open Source: skills, tools and practices we need!
  A culture where things get done.
  Wildly collaborative
  Reproducible by necessity
  Version control, testing, documentation, public peer review, etc.

Slide 31

Slide 31 text

Reward Structure in academia: we punish all of the above
  Departmental boundaries: interdisciplinary work is a great buzzword, not such a great career path.
  Computational heritage is built on code, not on citations
  Continuous evolution vs publication milestones
  Authorship in collaborative works vs the first-author paper.
  Scholarship and intellectual effort embedded in the code.

Slide 32

Slide 32 text

NumFOCUS: Open Code, Better Science
  Promote the health of our open source scientific computing ecosystem
  Support the development of multiple projects.
  Community-created and driven.
  A neutral ground for industry, academia and government to support scientific open source.
  501(c)3 - donations are tax-exempt in the USA
http://numfocus.org

Slide 33

Slide 33 text

Outline 1 IPython: Interactive Python 2 The Life of an Open Source Project 3 Academia vs Open Source 4 Wrapup

Slide 34

Slide 34 text

The future of IPython: a 2-year roadmap
Spring/summer 2013: IPython 1.0
  Notebook document management (nbconvert)
  JavaScript internals cleanup
Fall 2013
  Interactive JavaScript API
  With callbacks to remote kernels.
2014
  Multiuser server
  Simple to deploy
  Trusted (shell OK) Unix users in a lab, group, class, etc.
https://github.com/ipython/ipython/wiki/Roadmap:-IPython

Slide 35

Slide 35 text

In closing: our vision of scientific computing
Build on the right abstractions
  The kernel: unify interactive and parallel computing → you only have one brain!
  A single protocol: many kernels, many clients.
  Communications and logging: the protocol is the notebook file format.
Insight and communication (Hamming)
  “Literate computing” vs “literate programming”.
Build a community and an ecosystem
  “How to Scale a Code in the Human Dimension”, M. Turk, http://arxiv.org/abs/1301.7064.
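
As a rough illustration of "a single protocol: many kernels, many clients", here is a toy sketch. It runs in one process with no networking, and the names `make_message`, `ToyKernel` and `send` are hypothetical. The message layout (`header`, `parent_header`, `content`) is only loosely modeled on IPython's messaging and is not the real wire format. Two clients send JSON messages to the same kernel and share its namespace.

```python
import json
import uuid


def make_message(msg_type, content, session, parent_header=None):
    """Build a simplified message; the field layout is an illustrative assumption."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "msg_type": msg_type,
        },
        "parent_header": parent_header or {},
        "content": content,
    }


class ToyKernel:
    """Hypothetical in-process kernel: one shared namespace for every client."""

    def __init__(self):
        self.namespace = {}

    def handle(self, wire_text):
        request = json.loads(wire_text)
        try:
            # Run the submitted statements in the kernel's shared namespace.
            exec(request["content"]["code"], self.namespace)
            content = {"status": "ok"}
        except Exception as exc:
            content = {"status": "error", "ename": type(exc).__name__}
        reply = make_message(
            "execute_reply",
            content,
            request["header"]["session"],
            parent_header=request["header"],  # lets a client match reply to request
        )
        return json.dumps(reply)


def send(kernel, session, code):
    """Serialize a request, hand it to the kernel, and parse the reply."""
    request = make_message("execute_request", {"code": code}, session)
    reply = json.loads(kernel.handle(json.dumps(request)))
    return request, reply


kernel = ToyKernel()
req_a, rep_a = send(kernel, "client-a", "x = 6 * 7")
req_b, rep_b = send(kernel, "client-b", "y = x + 1")  # sees client A's x
_, rep_err = send(kernel, "client-b", "1 / 0")
print(rep_a["content"], rep_b["content"], rep_err["content"])
print(kernel.namespace["y"])
```

Since every exchange is plain serialized data, any client that can produce and parse these messages (a console, an editor plugin, a web page) could talk to the same kernel, which is the design point the slide makes.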

Slide 38

Slide 38 text

John D. Hunter, 1968-2012: http://matplotlib.org Memorial fund: http://numfocus.org/johnhunter