Start Small and Scale: Big Data and Jupyter's Ecosystem

Keynote presentation at the PyData LA conference, held at Cal State Los Angeles.

Jupyter notebooks have become the de facto standard tool in science and data science for producing computational narratives. Over five million Jupyter notebooks exist on GitHub today. Beyond the classic Jupyter notebook, Project Jupyter's tools have evolved to provide end-to-end research workflows that let scientists prototype, collaborate, and scale with ease. JupyterLab, a web-based, extensible, next-generation interactive development environment, enables researchers to combine Jupyter notebooks, code, and data to form computational narratives. JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening them with installation and maintenance tasks. Binder builds upon JupyterHub and provides free, shareable, interactive computing environments to people all around the world.

Carol Willing

December 05, 2019

Transcript

  1. @WillingCarol
    Start Small and Scale
    Carol Willing
    PyData LA
    December 5, 2019
    https://speakerdeck.com/willingc
    Big Data and Jupyter's Ecosystem

  2. @WillingCarol
    Hi! I'm Carol.
    • Python Steering Council
    • Jupyter Steering Council
    • Core Developer, Python, Jupyter, nteract
    • PSF Fellow and Former Director
    • Frank Willison Award 2019
    • Open Source Directions Podcast Co-host
    2

  3. @WillingCarol 3
    Core maintainer
    Papermill, Scrapbook, Bookstore, Commuter
    Steering Council, Core Developer
    JupyterHub, BinderHub, mybinder.org
    I love creating tools which
    educate and empower
    people.

  4. @WillingCarol
    What is
    Data Science
    4
    Start Here

  5. @WillingCarol 5
    Using data responsibly to
    solve real world issues and
    improve human lives

  6. @WillingCarol
    Predictions
    at Scale
    6
    A real world tale

  7. @WillingCarol 7
    San Diego, CA

  8. @WillingCarol 8
    Tokyo

  9. @WillingCarol 9
    Sunday Oct 6
    Source: ECMWF

  10. Copyright: 2019 European Union, contains modified Copernicus Sentinel data 2019, processed by EUMETSAT
    Super Typhoon Hagibis
    View of Super Typhoon Hagibis south-west of Japan, as captured by the Copernicus Sentinel-3 satellite on 08 October at 00:16 UTC.

  11. Title: Typhoon Hagibis
    Released: 10/10/2019 4:45 pm
    Copyright: contains modified Copernicus Sentinel data (2019), processed by ESA, CC BY-SA 3.0 IGO

  12. Source: Twitter

  13. @WillingCarol 13

  14. @WillingCarol 14

  15. A sign is partially submerged as the Tama River floods during Typhoon Hagibis.
    Source: Getty Images
    Source: Japan Times

  16. @WillingCarol
    Preparation
    Evacuation
    Safety
    16

  17. @WillingCarol
    With scale comes
    responsibility
    17
    Reality Check

  18. @WillingCarol 18
    "Diabetes awareness: character design" by Carlos Hernandez is licensed under CC BY-NC-ND 4.0

  19. @WillingCarol
    "This whole integration of health care data is really going to be the next frontier."
    –Kevin Sayer, DexCom CEO
    19
    https://www.cnbc.com/2019/11/13/big-data-is-the-next-frontier-for-medicine-says-dexcom-ceo.html
    https://www.businesswire.com/news/home/20191106005764/en/Dexcom-Reports-Quarter-2019-Financial-Results

  20. @WillingCarol
    Outage
    Midnight Friday: mysterious outage
    Dexcom did not announce there was an outage until about 8 a.m. Pacific time Saturday, which is 11 a.m. on the East Coast, when it posted a brief notice on its Facebook page.
    Monday morning: Dexcom Follow partly restored
    20
    https://www.nytimes.com/2019/12/02/well/live/Dexcom-G6-diabetes-monitor-outage.html
    Source: https://www.dexcom.com/

  21. @WillingCarol
    Getting from
    Start to Scale
    21
    The Challenge

  22. @WillingCarol 22
    jupyter.org

  23. @WillingCarol 23
    2014
    Now, 5 years later...

  24. Millions of
    Notebooks
    https://github.com/trending/jupyter-notebook
    Over 5 million
    on GitHub

  25. @WillingCarol 25
    ‣ Growth
    ‣ ACM Award
    ‣ Industry adoption
    ‣ Creative uses
    ‣ Open Source Book
    https://www.youtube.com/watch?v=qbtDVdEr8SY

  26. jupyter.org

  27. @WillingCarol 27
    Start Small
    Deploy and Sustain
    Explore Paths to Scale
    The Roadmap

  28. @WillingCarol
    Start Small
    28
    Step 1

  29. @WillingCarol 29
    Source: xkcd

  30. @WillingCarol 30
    small

  31. @WillingCarol 31
    Binder 2.0 blog post
    elifesciences: Share your interactive research environment
    Nature article about Binder
    mybinder.org
    Try it. No install needed.

  32. @WillingCarol 32
    Scale in Production

  33. @WillingCarol
    Choose your Tools
    33

  34. @WillingCarol
    JupyterLab
    34

  35. 35
    jupyter.org demo

  36. 36
    jupyter.org demo

  37. @WillingCarol 37
    ReactJS front end
    nteract
    nteract.io

  38. @WillingCarol
    VS Code
    38
    PyCharm

  39. @WillingCarol
    Avoid reinventing the wheel
    39

  40. @WillingCarol
    ecosystem
    40

  41. @WillingCarol
    Install Promising Libraries
    41
    Use Anaconda
    Use pip
    Use miniconda, conda, and conda-forge

  42. @WillingCarol
    Start
    42
    Try it in the browser
    Install Libraries
    Choose your tools
    Avoid reinventing the wheel
    Step 1

  43. @WillingCarol
    Explore Paths to Scale
    43
    Step 2

  44. @WillingCarol
    Turn no way into
    it's possible
    44

  45. @WillingCarol
    Community
    45
    Conferences
    Meetups
    PyLadies
    Carpentries
    Photo: Python Sul

  46. @WillingCarol 46
    Ten Simple Rules for Reproducible Research in Jupyter Notebooks
    Adam Rule et al.
    https://github.com/jupyter-guide/ten-rules-jupyter
    https://github.com/jupyter-guide/jupyter-guide

  47. @WillingCarol 47
    build, try, change, repeat

  48. @WillingCarol 48
    ipyvolume
    https://towardsdatascience.com/multivolume-rendering-in-jupyter-with-ipyvolume-cross-language-3d-visualization-64389047634a

  49. @WillingCarol 49

  50. @WillingCarol 50
    github.com/napari/napari
    napari
    https://ilovesymposia.com/2019/10/24/introducing-napari-a-fast-n-dimensional-image-viewer-in-python/

  51. A pictorial representation of the different tools constituting BinderHub. This image was created by Scriberia for The Turing Way community and is used under a CC-BY licence. Zenodo record.
    https://blog.jupyter.org/diving-into-leadership-to-build-push-button-code-df2a075c9914

  52. @WillingCarol
    What's new
    52
    Talk Python to Me
    Tracking Jupyter Newsletter
    https://tinyletter.com/TrackingJupyter/archive
    Open Source Directions
    GitHub Trending
    Follow projects on Social Media

  53. @WillingCarol
    Explore
    53
    Use the ecosystem to learn
    Best practices
    Infrastructure/Analysis
    What's new
    Step 2

  54. @WillingCarol
    Deploy and Sustain
    54
    Step 3

  55. @WillingCarol 55
    "Digital World" by NBroekzitter86 is licensed under CC BY 2.0

  56. @WillingCarol
    Notebooks to web
    56
    https://blog.jupyter.org/and-voil%C3%A0-f6a2c08a4a93

  57. @WillingCarol 57
    nteract
    Papermill
    Scrapbook
    Bookstore
    Commuter
    Production data at scale
    57
    https://medium.com/netflix-techblog/notebook-innovation-591ee3221233
    Bookstore

  58. @WillingCarol 58
    Papermill - parameterize / run
    Scrapbook - recording / reading
    Bookstore - store notebooks
    Commuter - share notebooks
    Production data at scale
    58
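
The "parameterize" step named on this slide can be sketched with the standard library alone. This is a minimal illustration of the idea, not Papermill's implementation: the helper `inject_parameters`, the sample `template` notebook, and the parameter names `region` and `days` are all hypothetical. It assumes the convention that a notebook marks its defaults in a cell tagged `parameters`, and that overrides go into a new cell placed right after it (or at the top when no such cell exists).

```python
import copy


def inject_parameters(notebook, parameters):
    """Return a copy of a notebook dict with an overriding parameters cell.

    Hypothetical helper for illustration only; it mimics the idea of
    parameterizing a notebook and is not Papermill's own code.
    """
    nb = copy.deepcopy(notebook)  # leave the template notebook untouched
    source = "\n".join(
        f"{name} = {value!r}" for name, value in parameters.items()
    )
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }
    # Place the overrides right after the first cell tagged "parameters",
    # so they win over the defaults; fall back to the top of the notebook.
    index = 0
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            index = i + 1
            break
    nb["cells"].insert(index, injected)
    return nb


def code_cell(source, tags=()):
    """Build a minimal code cell dict (illustrative, not schema-validated)."""
    return {
        "cell_type": "code",
        "metadata": {"tags": list(tags)},
        "execution_count": None,
        "outputs": [],
        "source": source,
    }


# Hypothetical template notebook with default parameter values.
template = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        code_cell("region = 'default'\ndays = 1", tags=["parameters"]),
        code_cell("print(region, days)"),
    ],
}

run = inject_parameters(template, {"region": "tokyo", "days": 7})
print(run["cells"][1]["source"])
```

With Papermill itself installed, the parameterize-and-run step is typically a single call such as `papermill.execute_notebook(input_path, output_path, parameters={...})`, which also executes the notebook; the sketch above only shows where the overriding values end up.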

  59. @WillingCarol 59
    Enterprise data workflows
    59

  60. zero-to-jupyterhub.readthedocs.io

  61. @WillingCarol 61 61
    Deploy
    your own
    BinderHub

  62. 62
    Juliette Taka

  63. @WillingCarol 63
    From a
    phone in
    the park!
    63

  64. @WillingCarol
    Pangeo
    64
    https://pangeo.io

  65. @WillingCarol 65

  66. @WillingCarol
    Deploy and Sustain
    66
    Workflows
    Document
    Monitor
    Involvement
    Step 3

  67. @WillingCarol 67
    From Small to Scale
    Keys for Success

  68. @WillingCarol
    Choose to Start
    68

  69. @WillingCarol
    Why > how
    69

  70. @WillingCarol
    Automate the
    Boring Stuff
    70

  71. @WillingCarol
    Plan
    Execute
    Change
    71
    https://jupyterhub-team-compass.readthedocs.io
    https://github.com/jupyterhub/team-compass

  72. @WillingCarol
    Consider complexity
    and observability
    72

  73. @WillingCarol
    People =
    Responsibility
    73

  74. @WillingCarol 74
    Call to Action

  75. @WillingCarol
    ecosystem
    75

  76. @WillingCarol 76
    Using data responsibly to
    solve real world issues and
    improve human lives

  77. @WillingCarol 77
    Justine Dupont surfs the greatest wave of her life in Nazaré, Portugal
    © Rafael G. Riancho / Red Bull Content Pool

  78. @WillingCarol 78
    Thank you
    https://speakerdeck.com/willingc
    @WillingCarol

  79. @WillingCarol 79
    Questions
    https://speakerdeck.com/willingc

  80. @WillingCarol 80
    Thank you PyData LA
    Project Jupyter Team
    Core Python Team
    PSF
    NumFOCUS

  81. @WillingCarol
    Attributions
    81
    Attributions on slides.
    Photos
    [7-8] Carol Willing and Linnea Willing
    [14] The Carpentries, Tracy Teal, Bérénice Batut
    [14] Godzilla
    By Toho Company Ltd. (東宝株式会社, Tōhō Kabushiki-kaisha) © 1954 - movie poster made by Toho Company Ltd. (東宝株式会社, Tōhō Kabushiki-kaisha), Public Domain, https://commons.wikimedia.org/w/index.php?curid=3648684
