Interactive Exploration of 
Large-Scale Datasets with 
Jupyter-Scatter

Slides related to my talk from the SciPy '23 conference: https://cfp.scipy.org/2023/talk/AXSVZ3/

Fritz Lekschas

July 13, 2023

Transcript

  1. July 13, 2023
    Interactive Exploration of
    Large-Scale Datasets with
    Jupyter-Scatter
    Fritz Lekschas
    @flekschas
    lekschas.de
    SciPy '23

  2. WORK
    Head of Visualization Research at Ozette
    EDUCATION
    PhD '21 in CS from Harvard University
    RESEARCH
    Visualization Human-Centered ML Design

  3. PASSION
    Embeddings &
    Scatter plots!

  9. USEFUL FOR
    Overview of Data
    Explore & Compare Clusters
    Data from Mair et al., 2022. Nature.

  13. Jupyter Scatter
    A widget for interactive exploration of large-scale
    scatter plots.
    github.com/flekschas/jupyter-scatter
    pip install jupyter-scatter

  14. GOALS
    1. Scale to millions of points
    2. Support interactive pan+zoom and selections
    3. Offer perceptually-effective defaults
    4. Allow linking multiple scatter plots
    5. Expose via an easy-to-use API
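
A minimal sketch of how these goals might look in a notebook, assuming `pandas` and `jupyter-scatter` are installed. The helper names (`make_clusters`, `build_linked_scatters`) and the synthetic data are made up for illustration and are not from the talk; only `jscatter.Scatter`, `jscatter.link`, and `Scatter.selection` are library calls.

```python
import random


def make_clusters(n_per_cluster=1000, centers=((0, 0), (4, 4), (0, 4)), seed=42):
    """Generate synthetic 2D points in Gaussian clusters (toy stand-in data)."""
    rng = random.Random(seed)
    rows = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_cluster):
            rows.append({
                "x": rng.gauss(cx, 1.0),
                "y": rng.gauss(cy, 1.0),
                "cluster": f"c{label}",
            })
    return rows


def build_linked_scatters(rows):
    """Create two linked scatter plots; requires pandas and jupyter-scatter."""
    # Imported lazily so the data helper above works without the widget stack.
    import pandas as pd
    import jscatter

    df = pd.DataFrame(rows)
    # Use a categorical dtype so the column is color-encoded categorically.
    df["cluster"] = df["cluster"].astype("category")

    by_cluster = jscatter.Scatter(data=df, x="x", y="y", color_by="cluster")
    plain = jscatter.Scatter(data=df, x="x", y="y")

    # link() lays the plots out side by side and synchronizes
    # view, hover, and selection between them.
    return jscatter.link([by_cluster, plain]), by_cluster


rows = make_clusters()
# In a notebook cell:
#   grid, scatter = build_linked_scatters(rows)
#   grid                 # displays the linked plots
#   scatter.selection()  # indices of the currently selected points
```

The two plots share one DataFrame, so a lasso selection in either plot refers to the same row indices in both.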

  20. Live Demos!
    https://github.com/flekschas/jupyter-scatter-tutorial

  21. ARCHITECTURE
    1. WebGL Rendering via regl-scatterplot [1]
    for fast plotting
    2. Python API layer for integrating with Pandas
    and configuring regl-scatterplot [1]
    3. Ipywidgets for communication with Jupyter
    via anywidget [2]
    [1] https://github.com/flekschas/regl-scatterplot/
    [2] https://github.com/manzt/anywidget/
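
To illustrate the third layer, here is a rough sketch of the general anywidget pattern: a Python class declares synced traits, and a small JavaScript module renders them in the browser. This is not jupyter-scatter's actual source; `PointCountWidget`, `make_widget_class`, and the `points` trait are hypothetical names, and the sketch assumes a recent anywidget release that accepts a default-exported `render` (older releases used a named `render` export).

```python
# Front-end module: runs in the browser and reads state synced from Python.
ESM = """
function render({ model, el }) {
  const label = document.createElement("div");
  const update = () => {
    label.textContent = `${model.get("points").length} points`;
  };
  // Re-render whenever Python changes the synced `points` trait.
  model.on("change:points", update);
  update();
  el.appendChild(label);
}
export default { render };
"""


def make_widget_class():
    """Build a toy widget class; requires anywidget and traitlets."""
    # Imported lazily so this module loads without the widget stack.
    import anywidget
    import traitlets

    class PointCountWidget(anywidget.AnyWidget):
        _esm = ESM
        # sync=True keeps this trait mirrored between kernel and browser.
        points = traitlets.List([]).tag(sync=True)

    return PointCountWidget


# In a notebook cell:
#   Widget = make_widget_class()
#   w = Widget(points=[[0, 0], [1, 1]])
#   w                        # shows "2 points"
#   w.points = [[0, 0]]      # the label updates to "1 points"
```

In a real scatter plot widget, the JavaScript side would hand the synced data to a renderer such as regl-scatterplot rather than a text label.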

  22. MASSIVE SHOUT OUTS!
    Trevor Manz for the codec design,
    anywidget integration, & tutorial setup
    Nezar Abdennur for feedback
    on the API design
    Ricky Reusser for his inspirational work
    on selecting the right point opacity
    Rye Terrell for his beautiful multi-instance
    WebGL rendering approach

  23. Thanks!
    pip install jupyter-scatter
    github.com/flekschas/jupyter-scatter
    github.com/flekschas/jupyter-scatter-tutorial
    @flekschas
    lekschas.de
    July 13, 2023 SciPy '23