Project Jupyter: Architecture and Evolution of an Open Platform for Modern Data Science

Lecture presented at BIDS, the Berkeley Institute for Data Science, on April 17, 2018. Description and video of the talk:

https://bids.berkeley.edu/resources/videos/project-jupyter-architecture-and-evolution-open-platform-modern-data-science

Code for Polyglot DS demo: http://nbviewer.jupyter.org/gist/fperez/5b49246af4e340c37549265a90894ce6/polyglot-ds.ipynb

Fernando Perez

April 17, 2018

Transcript

  1. A few bits about me Medellín, Colombia University of Colorado,

    Boulder Physics Applied Math Computation
  2. Statistics & me: then and now If your result needs

    a statistician then you should design a better experiment (prob. mis-attributed) E. Rutherford PhD: Lattice QCD Simulations
  3. Why? ❖ Ethical: openness as fairness ❖ Human/social: openness fosters

    collaboration. ❖ Epistemological: proprietary science is an oxymoron. ❖ Technical: Python was cool :)
  4. Python - The Beginning the most important lesson I learned

    was about sharing – Guido van Rossum http://neopythonic.blogspot.com/2016/04/kings-day-speech.html Slide credit: C. Willing
  5. Designed for Learning In reality, programming languages are how programmers

    express and communicate ideas — and the audience for those ideas is other programmers, not computers. http://neopythonic.blogspot.com/2016/04/kings-day-speech.html – Guido van Rossum Slide credit: C. Willing
  6. IPython: Interactive Python, 2001 A humble start: IPython 0.0.1, 259

    LOC “Just an afternoon hack” https://gist.github.com/fperez/1579699
  7. Team today: where all the credit goes Plus ~ 1500

    more Open source contributors!
  8. The IPython/Jupyter Notebook ❖ Rich web client ❖ Text &

    math ❖ Code ❖ Results ❖ Share, reproduce.
  9. Core ideas of the web: HTTP & HTML HTML: format

    to represent content (HyperText Markup Language). HTTP: protocol to connect clients and servers (HyperText Transfer Protocol). Image credit: eviltester.com
  10. Jupyter Protocol web-age capture of the process of interactive computing

    any mime-type output ❖ text ❖ svg, png, jpeg ❖ latex, pdf ❖ html, javascript ❖ interactive widgets
  11. Jupyter Protocol is language agnostic

    ~100 different kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
  12. JupyterLab: a grand unified theory of Jupyter Huge Team Effort!

    C. Colbert, S. Corlay, A. Darian, B. Granger, J. Grout, P. Ivanov, I. Rose, S. Silvester, C. Willing, J. Zosa-Forde …
  13. Reproducible Research An article about computational science in a scientific

    publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. Buckheit and Donoho, WaveLab and Reproducible Research, 1995
  14. What does this mean for science + education? ❖ Can

    utilize… ❖ ...shared hardware/compute for running code ❖ ...shared data storage for big datasets ❖ ...shared environments for doing work ❖ ...shared workflows, ideas, and results
  15. A long time ago in a galaxy far, far away…

    $R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}$ Einstein’s Field Equations of General Relativity Annalen der Physik, 1916
  16. Two identical detectors: Hanford, WA and Livingston, LA LIGO: a

    feat of science & engineering Detection problem: • ~ 1/1000 of a proton's diameter over 4 km. • Sensitivity ~ 1e-21 • Milky Way: 1e+21 m across!
  17. Binder: reproducible, executable scholarship from averaging ~150 people per week

    to averaging ~2,900 people per week Berkeley: Yuvi Panda, Chris Holdgraf Cal Poly: Carol Willing Simula: Min Ragan-Kelley Jessica Zosa-Forde, Tim Head
  18. Berkeley’s Data Science Courses http://data8.org ❖ Freshmen & upper division

    ❖ Interactive textbooks: Jupyter Notebooks ❖ Course deployment: JupyterHub http://ds100.org
  19. DataHub datahub.berkeley.edu Supporting 2,500+ users Being used for Data 8,

    as well as several other courses Requires @berkeley.edu to access Running on Azure with almost zero maintenance Slide: C. Holdgraf
  20. Fastest growing courses in Berkeley history Thanks to Yuvi Panda

    (DSEP), Ryan Lovett (Statistics), DSEP team
  21. Berkeley in a few years… “We are witnessing a monumental

    phase shift in data science knowledge on campus - undergrads are extremely well trained…” Ciera Martinez, BIDS Fellow
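
Slide 10 describes the Jupyter Protocol as carrying "any mime-type output". As a minimal sketch of that idea (not the talk's own code), the Python below shows one object offering several representations keyed by MIME type, wrapped in a dictionary shaped like the content of a `display_data` message. The `Circle` class and the `display_data_content` helper are hypothetical names introduced here for illustration; `_repr_mimebundle_` is the hook IPython looks for on objects, but this sketch calls it directly and does not need IPython installed.

```python
import json


class Circle:
    """Hypothetical example object that can describe itself in several formats."""

    def __init__(self, radius):
        self.radius = radius

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Return one representation per MIME type; a frontend picks the
        # richest one it knows how to render.
        r = self.radius
        size = 2 * r + 2
        svg = (
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
            f'<circle cx="{r + 1}" cy="{r + 1}" r="{r}" fill="none" stroke="black"/>'
            "</svg>"
        )
        return {
            "text/plain": f"Circle(radius={r})",
            "image/svg+xml": svg,
            "text/latex": rf"$A = \pi \cdot {r}^2$",
        }


def display_data_content(obj):
    # Hypothetical helper: mimics the shape of a display_data message's
    # content (data keyed by MIME type, plus metadata and transient dicts).
    return {"data": obj._repr_mimebundle_(), "metadata": {}, "transient": {}}


content = display_data_content(Circle(10))
print(json.dumps(content, indent=2))
```

Because the payload is plain JSON keyed by MIME type, the same message can be produced by a kernel in any language and rendered by any client, which is the language-agnostic point made on slide 11.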