Start Small and Scale: Big Data and Jupyter's Ecosystem

Carol Willing
December 05, 2019

Keynote Presentation at PyData LA conference held at Cal State Los Angeles.

Jupyter notebooks have become the de facto standard scientific and data science tool for producing computational narratives. Over five million Jupyter notebooks exist on GitHub today. Beyond the classic Jupyter notebook, Project Jupyter's tools have evolved to provide end-to-end research workflows that enable scientists to prototype, collaborate, and scale with ease. JupyterLab, a web-based, extensible, next-generation interactive development environment, enables researchers to combine Jupyter notebooks, code, and data to form computational narratives. JupyterHub brings the power of notebooks to groups of users, giving them access to computational environments and resources without burdening them with installation and maintenance tasks. Binder builds upon JupyterHub and provides free, shareable, interactive computing environments to people all around the world.

Transcript

  1. @WillingCarol Start Small and Scale: Big Data and Jupyter's Ecosystem. Carol Willing, PyData LA, December 5, 2019. https://speakerdeck.com/willingc
  2. Hi! I'm Carol. • Python Steering Council • Jupyter Steering Council • Core Developer, Python, Jupyter, nteract • PSF Fellow and Former Director • Frank Willison Award 2019 • Open Source Directions Podcast Co-host
  3. Core maintainer: Papermill, Scrapbook, Bookstore, Commuter. Steering Council, Core Developer: JupyterHub, BinderHub, mybinder.org. I love creating tools which educate and empower people.
  4. Start Here: What is Data Science
  5. Using data responsibly to solve real world issues and improve human lives
  6. Predictions at Scale: A real world tale
  7. San Diego, CA
  8. Tokyo
  9. Sunday Oct 6. Source: ECMWF
  10. Super Typhoon Hagibis. View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC. Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT
  11. Title: Typhoon Hagibis. Released: 10/10/2019 4:45 pm. Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO
  12. Source: Twitter
  13. (no text)
  14. (no text)
  15. A sign is partially submerged as the Tama River floods during Typhoon Hagibis. Source: Getty Images. Source: Japan Times
  16. Preparation, Evacuation, Safety
  17. Reality Check: With scale comes responsibility
  18. "Diabetes awareness: character design" by Carlos Hernandez is licensed under CC BY-NC-ND 4.0
  19. Kevin Sayer, DexCom CEO: "This whole integration of health care data is really going to be the next frontier." https://www.cnbc.com/2019/11/13/big-data-is-the-next-frontier-for-medicine-says-dexcom-ceo.html https://www.businesswire.com/news/home/20191106005764/en/Dexcom-Reports-Quarter-2019-Financial-Results
  20. Outage. Midnight Friday: mysterious outage. Dexcom did not announce there was an outage until about 8 a.m. Pacific time Saturday, which is 11 a.m. on the East Coast, when it posted a brief notice on its Facebook page. Monday morning: Dexcom Follow partly restored. https://www.nytimes.com/2019/12/02/well/live/Dexcom-G6-diabetes-monitor-outage.html Source: https://www.dexcom.com/
  21. The Challenge: Getting from Start to Scale
  22. jupyter.org
  23. 2014. Now, 5 years later...
  24. Millions of Notebooks. Over 5 million on GitHub. https://github.com/trending/jupyter-notebook
  25. ‣ Growth ‣ ACM Award ‣ Industry adoption ‣ Creative uses ‣ Open Source Book https://www.youtube.com/watch?v=qbtDVdEr8SY
  26. jupyter.org
  27. The Roadmap: Start Small, Explore Paths to Scale, Deploy and Sustain
  28. Step 1: Start Small
  29. Source: xkcd
  30. small
  31. Binder 2.0 blog post. elifesciences: Share your interactive research environment. Nature article about Binder. mybinder.org: Try it. No install needed.
  32. Scale in Production
  33. Choose your Tools
  34. JupyterLab
  35. jupyter.org demo
  36. jupyter.org demo
  37. nteract: ReactJS front end. nteract.io
  38. VS Code, PyCharm
  39. Avoid reinventing the wheel
  40. ecosystem
  41. Install Promising Libraries: Use Anaconda. Use pip. Use miniconda, conda, and conda-forge.
  42. Step 1: Start. Try it in the browser. Install Libraries. Choose your tools. Avoid reinventing the wheel.
  43. Step 2: Explore Paths to Scale
  44. Turn "no way" into "it's possible"
  45. Community: Conferences, Meetups, PyLadies, Carpentries. Photo: Python Sul
  46. Ten Simple Rules for Reproducible Research in Jupyter Notebooks, Adam Rule et al. https://github.com/jupyter-guide/ten-rules-jupyter https://github.com/jupyter-guide/jupyter-guide
  47. build, try, change, repeat
  48. ipyvolume https://towardsdatascience.com/multivolume-rendering-in-jupyter-with-ipyvolume-cross-language-3d-visualization-64389047634a
  49. (no text)
  50. napari github.com/napari/napari https://ilovesymposia.com/2019/10/24/introducing-napari-a-fast-n-dimensional-image-viewer-in-python/
  51. A pictorial representation of the different tools constituting BinderHub. This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. Zenodo record. https://blog.jupyter.org/diving-into-leadership-to-build-push-button-code-df2a075c9914
  52. What's new: Talk Python to Me. Tracking Jupyter Newsletter https://tinyletter.com/TrackingJupyter/archive. Open Source Directions. GitHub Trending. Follow projects on Social Media.
  53. Step 2: Explore. Use the ecosystem to learn. Best practices. Infrastructure/Analysis. What's new.
  54. Step 3: Deploy and Sustain
  55. "Digital World" by NBroekzitter86 is licensed under CC BY 2.0
  56. Notebooks to web https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93
  57. Production data at scale: nteract, Papermill, Scrapbook, Bookstore, Commuter. https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
  58. Production data at scale: Papermill - parameterize / run; Scrapbook - recording / reading; Bookstore - store notebooks; Commuter - share notebooks
  59. Enterprise data workflows
  60. zero-to-jupyterhub.readthedocs.io
  61. Deploy your own BinderHub
  62. Juliette Taka
  63. From a phone in the park!
  64. Pangeo https://pangeo.io
  65. (no text)
  66. Step 3: Deploy and Sustain. Workflows. Document. Monitor. Involvement.
  67. Keys for Success: From Small to Scale
  68. Choose to Start
  69. Why > how
  70. Automate the Boring Stuff
  71. Plan, Execute, Change https://jupyterhub-team-compass.readthedocs.io https://github.com/jupyterhub/team-compass
  72. Consider complexity and observability
  73. People = Responsibility
  74. Call to Action
  75. ecosystem
  76. Using data responsibly to solve real world issues and improve human lives
  77. Justine Dupont surfs the greatest wave of her life in Nazaré, Portugal © Rafael G. Riancho / Red Bull Content Pool
  78. Thank you https://speakerdeck.com/willingc @WillingCarol
  79. Questions https://speakerdeck.com/willingc
  80. Thank you: PyData LA, Project Jupyter Team, Core Python Team, PSF, NumFOCUS
  81. Attributions. Attributions on slides. Photos: [7-8] Carol Willing and Linnea Willing; [14] The Carpentries, Tracy Teal, Bérénice Batut; [14] Godzilla By Toho Company Ltd. (東宝株式会社, Tōhō Kabushiki-kaisha) © 1954 - movie poster made by Toho Company Ltd. (東宝株式会社, Tōhō Kabushiki-kaisha), Public Domain, https://commons.wikimedia.org/w/index.php?curid=3648684