
2016 - Jamie Whitacre - Project Jupyter.

PyBay
August 20, 2016

Description
An overview of Project Jupyter.

Abstract
Jupyter is an open-source, language-agnostic, interactive computing platform used in scientific computing and data science. It provides multiple tools tailored for different workflows, from traditional terminal-style control to the popular web-based Notebook. The Jupyter Notebook is a web application that allows users to create and share documents that contain live code, equations, visualizations and explanatory text. Jupyter is the evolution of the original ideas in the IPython interactive shell, as we generalized them into a language-agnostic protocol that has now been implemented in over 50 separate languages.

One project within the Jupyter ecosystem, JupyterHub, is a multi-user environment for Jupyter Notebooks that runs off a central server and that can be used to serve Notebooks to classes of students, corporate workgroups, or scientific research groups. JupyterHub is the backbone for UC Berkeley’s new Undergraduate Data Science Education Program, an ambitious program that aims to provide every freshman with core knowledge and skills in data science.

In this talk we will discuss and demonstrate the many development activities underway at Project Jupyter, including IPython 5.0, JupyterHub, and JupyterLab, and how these tools are used in data science, industry, scientific research, and education.
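
The abstract describes a Notebook as a shareable document mixing live code, equations and explanatory text. On disk such a document is a JSON file. The sketch below builds a minimal one with the Python standard library only, assuming the top-level keys of the version 4 notebook format. The helper names (`make_notebook`, `markdown_cell`, `code_cell`) are hypothetical and exist only for illustration; real projects would more likely rely on the `nbformat` package.

```python
import json


def markdown_cell(text):
    """Return a markdown cell (text and math) as a plain dict."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return a code cell that has not been executed yet (no outputs)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap a list of cells in a minimal version 4 notebook document."""
    return {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": cells}


# Hypothetical example content: one explanatory cell, one code cell.
notebook = make_notebook([
    markdown_cell("# Area of a circle\nThe area is $A = \\pi r^2$."),
    code_cell("import math\nmath.pi * 2 ** 2"),
])

# A notebook file is this structure serialized as JSON.
notebook_json = json.dumps(notebook, indent=1)
restored = json.loads(notebook_json)
cell_types = [cell["cell_type"] for cell in restored["cells"]]
print(cell_types)
```

Because the document is plain JSON, it can be shared, versioned and rendered by tools that never run the code themselves.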

Bio
Jamie Whitacre is the technical project manager for Project Jupyter, an open-source scientific computing and data science ecosystem used extensively in academia and industry. Project Jupyter operates out of the Berkeley Institute for Data Science (BIDS) at UC Berkeley. Matthias Bussonnier is a postdoctoral researcher at BIDS and a core developer for Jupyter and IPython.

https://youtu.be/kgSf62XNNxk


Transcript

  1. Jupyter’s heart: interactive computing
    • A dialogue between the human and the computer.
    • Assemble ideas using the computer as playground, as “data microscope”.
    • Communicate with the computer and with humans.
  2. IPython: CU Boulder, 2001 “an afternoon hack”, or how to best procrastinate on a Physics dissertation
  3. 2011: The IPython Notebook
    • Rich web client
    • Text & math
    • Code
    • Results
    • Share, reproduce.
  4. The Lifecycle of a Scientific Idea (schematically)
    1. Individual exploratory work
    2. Collaborative development
    3. Parallel production runs (HPC, cloud, ...)
    4. Publication & communication (reproducibly!)
    5. Education
    6. Go to 1.
  5. The Lifecycle of a Scientific Idea (schematically)
    1. Individual exploratory work
    2. Collaborative development
    3. Parallel production runs (HPC, cloud, ...)
    4. Publication & communication (reproducibly!)
    5. Education
    6. Goto 1
    We treat this as a single, coherent problem
  6. Data Analysis Workflow
    1. Pose a question or problem
    2. Acquire data
    3. Explore the data by writing & running code
    4. Prepare and clean the data
    5. Complete final analysis & visualize results
    6. Write up analysis for publication (blogs, journals, etc.)
    7. Share what you’ve done
    8. Reproduce what other people have done
  7. JupyterHub: multiuser support
    • Out of the box
    • Unix accounts
    • Local single-user notebooks
    • Customizable
    • Authentication: OAuth, LDAP, etc.
    • Subprocess control: Docker, VMs, etc.
  8. Berkeley’s Foundations of Data Science
    http://data8.org
    • New curriculum aimed at all freshmen at UC Berkeley
    • Interactive textbook is Jupyter Notebooks
    • Course deployment is JupyterHub
  9. JupyterHub in Education @ Berkeley
    https://developer.rackspace.com/blog/deploying-jupyterhub-for-education
    • Computationally intensive course, ~220 students
    • Fully hosted environment, zero-install, spring 2015.
    • Homework management and grading (with B. Granger)
    • Now powers data8.org, Cal’s new Foundations of Data Science (fall 2015).
    Jess Hamrick @ Cal; K. Kelley, Rackspace; M. Ragan-Kelley, Cal; B. Granger, Cal Poly
  10. Diffing and Merging Notebooks with nbdime
    Martin Sandve Alnaes, Vidar Tonaas Fauske, Min Ragan-Kelley; Simula Research Lab
  11. nbdime notebook diff & merge
    • tools for diffing and merging notebooks
    • command-line rendering of diffs (outputs elided)
    • HTML rendering of rich diffs (JupyterLab)
    • git integration
    https://github.com/jupyter/nbdime
    Commands: nbdiff, nbdiff-web, git-nbdiffdriver
  12. nbdime notebook diff & merge
    • nbdime repo: https://github.com/jupyter/nbdime
    • Watch Min’s nbdime SciPy 2016 talk: scipy2016.scipy.org or youtube.com/watch?v=tKAmwC8ay8E
  13. nbflow (Jess Hamrick)
    • Reproducible, One-Button Workflows with the Jupyter Notebook and SCons
    • Ideal analysis pipelines
    • Notebooks can be SCons commands (Python 2 only for now)
    • Jess’ nbflow repo: http://tinyurl.com/nbflow-example
    • For Jess’ SciPy 2016 talk: youtube.com/watch?v=Fc2W930NJs8
  14. Jupyter Dashboards & Declarative Widgets
    Will help users to:
    • Pull interactive web component widgets into a notebook
    • Arrange widgets in a grid- or report-like layout
    • Bundle notebooks and widgets for deployment as dashboards
    • Serve notebook-defined dashboards as standalone web apps
  15. JupyterLab: Building Blocks for Interactive Computing
    Brian E. Granger, Cal Poly
    Jason Grout, Bloomberg LP
    Chris Colbert, Continuum
    Sylvain Corlay, Bloomberg
    Afshin Darian, Continuum
    Cameron Oelsen, Cal Poly
    Fernando Perez, LBNL/Berkeley
    Steven Silvester, Continuum
    David Willmer
  16. JupyterLab
    • JupyterLab is the natural evolution of the Jupyter Notebook user interface
    • JupyterLab is an IDE: Interactive Development Environment
    • Flexible user interface for assembling the fundamental building blocks of interactive computing
    • Modernized JavaScript architecture based on npm/webpack, plugin system, model/view separation
    • Built using PhosphorJS (http://phosphorjs.github.io/)
    • Design-driven development process
    https://github.com/jupyter/jupyterlab
  17. *JupyterLab is a very early developer preview, and is not suitable for general usage yet. Features and implementation are subject to change. jupyter/jupyterlab*
  18. *JupyterLab is a very early developer preview, and is not suitable for general usage yet. Features and implementation are subject to change.
  19. Getting started
    Prerequisite: Jupyter notebook version 4.2 or later. To check your notebook version:
        jupyter notebook --version
    User installation, from the command line:
        pip install jupyterlab
        jupyter serverextension enable --py jupyterlab
    (or conda install -c conda-forge jupyterlab, which needs no server enable step)
    Start up JupyterLab:
        jupyter lab
    JupyterLab will open automatically in your browser.
  20. Roadmap
    • Today (August 2016) JupyterLab is an early preview only
    • Not suggested for general usage:
      • Visual design, UI, UX, interactions, code all still changing rapidly!
    • Phases:
      • 1) Series of alpha/beta releases of JupyterLab available as an alternative UI alongside the classic notebook
      • 2) JupyterLab 1.0 = Lab notebook component has feature parity with classic notebook
      • 3) JupyterLab becomes the default UI, but classic notebook is still available
      • 4) Classic notebook only available as a separate download
  21. How to Contribute
    • Regular JupyterLab progress meetings on Fridays
    • Follow repo on GitHub https://github.com/jupyter/jupyterlab
    • Step-by-step instructions on how to contribute in our documentation! http://jupyter.readthedocs.org
  22. Types of Contributor Guides
    • Jupyter Developer Guide
    • IPython Development Guide (source: old IPython wiki)
    • Documentation Guide
    • Communications Guide
  23. Learn More About Jupyter
    • jupyter.readthedocs.org
    • GitHub repos: Jupyter, JupyterHub, and IPython
    • O’Reilly Tutorials
      • Advanced Jupyter Notebook Deployment by Jonathan Frederic
      • Jupyter Notebook for Data Science Teams by Jonathan Whitmore
    • Mailing lists
    • Conferences: PyData, SciPy, JupyterDays, PyCon, Strata
    • YouTube
    • Google it =)
  24.             What kind of information?  How many people?  Complex permissions?  Mutable?
      Notebook    interactive                1-3               no                    yes
      nbviewer    static                     1-many            no, readonly          no
      tmpnb       interactive                1-many            no, containerized     yes, containerized
      JupyterHub  interactive                1-many            yes                   yes
    Adapted from J. Frederic’s “Advanced Jupyter Notebook Deployment”, published by Infinite Skills (O’Reilly), 2016
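
Slide 11 mentions command-line diffs of notebooks with outputs elided. The sketch below illustrates that idea with the Python standard library only; it is not nbdime's algorithm or API. It simply compares the source of code cells and ignores outputs, so re-running a notebook does not show up as a change. The function names and the example notebooks are hypothetical.

```python
import difflib


def code_sources(notebook):
    """Return the source text of each code cell, ignoring outputs."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


def diff_code_cells(old, new):
    """Return unified-diff lines between the code cells of two notebooks."""
    old_lines = "\n".join(code_sources(old)).splitlines()
    new_lines = "\n".join(code_sources(new)).splitlines()
    return list(difflib.unified_diff(
        old_lines, new_lines, fromfile="old", tofile="new", lineterm=""))


def one_cell_notebook(source, printed):
    """Build a hypothetical notebook with a single executed code cell."""
    output = {"output_type": "stream", "name": "stdout", "text": printed}
    cell = {"cell_type": "code", "execution_count": 1, "metadata": {},
            "outputs": [output], "source": source}
    return {"nbformat": 4, "nbformat_minor": 0, "metadata": {}, "cells": [cell]}


old_nb = one_cell_notebook("x = 1\nprint(x)", "1\n")
new_nb = one_cell_notebook("x = 2\nprint(x)", "2\n")
# Same code as old_nb, different recorded output.
rerun_nb = one_cell_notebook("x = 1\nprint(x)", "1 (rerun)\n")

changes = diff_code_cells(old_nb, new_nb)
print("\n".join(changes))
```

A line-based diff of the raw JSON would also report changed outputs and execution counts, which is the noise a notebook-aware tool is meant to avoid.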