JupyterHub for Interactive Data Science Collaboration

CineGrid 2015 Conference

Carol Willing

December 10, 2015

  1. CAROL WILLING
    ➤ Python Software Foundation, Director
    ➤ Project Jupyter, Contributor
    ➤ Fab Lab San Diego, Geek in Residence
  2. The Notebook: “Literate Computing” Computational Narratives
    ❖ Computers deal with code and data.
    ❖ Humans deal with narratives that communicate.
    Literate Computing (not Literate Programming): narratives anchored in a live computation that communicate a story based on data and results. Cf. Mathematica, Maple, MuPAD, Sage…
  3. “Project Jupyter serves not only the academic and scientific communities but also a much broader constituency of data scientists in research, education, industry and journalism…” - Fernando Pérez, UC Berkeley
  4. “…we see uses of our tools that range from high school education in programming to the nation’s supercomputing facilities and the leaders of the tech industry.” - Fernando Pérez, UC Berkeley
  5. “More than a million people are currently using Jupyter for everything from…” - Prof. Brian Granger, Cal Poly
  6. “…analyzing massive gene sequencing datasets to processing images from the Hubble Space Telescope and developing models of financial markets.” - Prof. Brian Granger, Cal Poly
  7. “We are excited by the potential of Project Jupyter to reach even wider audiences and to contribute to increased cross-disciplinary collaboration in the sciences.” - Betsy Fader, Helmsley Charitable Trust
  8. “Jupyter Notebook… will enable data exploration, visualization, and analysis in a way that encourages sound science and speeds progress.” - Chris Mentzel, The Gordon and Betty Moore Foundation
  9. The Lifecycle of a Scientific Idea (schematically)
    1. Individual exploratory work
    2. Collaborative development
    3. Parallel production runs (HPC, cloud, ...)
    4. Publication & communication (reproducibly!)
    5. Education
    6. Goto 1.
  10. Executable books
    ❖ Springer hardcover book
    ❖ Chapters: IPython Notebooks
    ❖ Posted as a blog entry
    ❖ All available as a GitHub repo
    Python for Signal Processing, by José Unpingco
  11. A collaborative MOOC on OpenEdX http://lorenabarba.com/news/announcing-practical-numerical-methods-with-python-mooc
    ❖ Lorena Barba at George Washington University, USA.
    ❖ Ian Hawke at Southampton, UK.
    ❖ Carlos Jerez at Pontifical Catholic University of Chile.
    ❖ All materials on GitHub.
  12. Shreyas Cholia & Oliver Ruebel, NERSC Data & Analytics Services Group. JupyterHub Day, July 17, 2015. JupyterHub at NERSC and OpenMSI
  13. NERSC is the Production HPC & Data Facility for DOE Office of Science Research
    Largest funder of physical science research in U.S.
    Bio Energy, Environment; Computing; Materials, Chemistry, Geophysics; Particle Physics, Astrophysics; Nuclear Physics; Fusion Energy, Plasma Physics
  14. ART

  15. JupyterHub: multiuser support
    ❖ Out of the box
    ❖ Unix accounts
    ❖ Local single-user notebooks
    ❖ Customizable
    ❖ Authentication: OAuth, LDAP, etc.
    ❖ Subprocess control: Docker, VMs, etc.
  16. JupyterHub in Education @ Berkeley https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
    ❖ Computationally intensive course, ~220 students
    ❖ Fully hosted environment, zero-install
    ❖ Homework management and grading (with B. Granger)
    Jess Hamrick @ Cal; K. Kelley, Rackspace; M. Ragan-Kelley, Cal; B. Granger, Cal Poly
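
Slide 15 names two customization points for JupyterHub: authentication and subprocess control. As a rough illustration only, the fragment below sketches how those two choices might look in a `jupyterhub_config.py` file. The slides do not show any configuration, so everything here is an assumption: the choice of GitHub as the OAuth provider, the use of the separately installed `oauthenticator` and `dockerspawner` packages, and the omission of the provider credentials a real deployment would also need.

```python
# jupyterhub_config.py: an illustrative sketch, not a tested deployment.
# JupyterHub provides get_config() when it loads this file.
c = get_config()  # noqa: F821

# Authentication (assumption: GitHub OAuth via the `oauthenticator` package).
# Without this line, JupyterHub authenticates against local Unix accounts.
c.JupyterHub.authenticator_class = "oauthenticator.GitHubOAuthenticator"

# Subprocess control (assumption: the `dockerspawner` package is installed).
# Without this line, each user's notebook server runs as a local process.
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
```

A file like this would typically be loaded with `jupyterhub -f jupyterhub_config.py`. Leaving both settings out corresponds to the "out of the box" case on the slide: Unix accounts and local single-user notebooks.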
