Slide 1

Slide 1 text

Interactive Exploration of Large-Scale Datasets with Jupyter-Scatter
Fritz Lekschas (@flekschas, lekschas.de)
SciPy '23, July 13, 2023

Slide 2

Slide 2 text

WORK: Head of Visualization Research at Ozette
EDUCATION: PhD '21 in CS from Harvard University
RESEARCH: Visualization, Human-Centered ML, Design

Slide 3

Slide 3 text

PASSION: Embeddings & Scatter plots!

Slide 10

Slide 10 text

USEFUL FOR
- Overview of Data
- Explore & Compare Clusters
Data from Mair et al., 2022. Nature.

Slide 13

Slide 13 text

Jupyter Scatter
A widget for interactive exploration of large-scale scatter plots.
pip install jupyter-scatter
github.com/flekschas/jupyter-scatter
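A minimal sketch of what using the widget in a notebook might look like, assuming `jupyter-scatter` and `pandas` are installed. The helpers `make_points` and `build_scatter` and the synthetic data are hypothetical and only serve the illustration; the slides themselves do not show this code.

```python
import random


def make_points(n, seed=0):
    """Generate n synthetic 2D points in two labeled groups (demo data only)."""
    rng = random.Random(seed)
    points = {"x": [], "y": [], "group": []}
    for i in range(n):
        group = "a" if i % 2 == 0 else "b"
        center = -1.0 if group == "a" else 1.0
        points["x"].append(rng.gauss(center, 0.5))
        points["y"].append(rng.gauss(center, 0.5))
        points["group"].append(group)
    return points


def build_scatter(points):
    """Create a Jupyter Scatter widget from a dict of columns."""
    # Imported lazily so the data preparation above runs without the
    # optional packages installed.
    import pandas as pd
    import jscatter

    df = pd.DataFrame(points)
    df["group"] = df["group"].astype("category")
    scatter = jscatter.Scatter(data=df, x="x", y="y")
    scatter.color(by="group")  # color-encode the categorical column
    return scatter
```

In a notebook cell, `build_scatter(make_points(100_000)).show()` should render the interactive plot.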

Slide 14

Slide 14 text

GOALS
1. Scale to millions of points
2. Support interactive pan+zoom and selections
3. Offer perceptually-effective defaults
4. Allow linking multiple scatter plots
5. Expose via an easy-to-use API
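To make the idea of a perceptually-effective default concrete, here is a simplified sketch of a density-aware opacity heuristic: the more points compete for the same pixels, the lower the opacity. The function `suggest_opacity`, its parameters, and the formula are assumptions made for illustration, not Jupyter Scatter's actual implementation.

```python
def suggest_opacity(num_points, point_size_px, view_width_px, view_height_px,
                    target_coverage=1.0, min_opacity=0.01):
    """Hypothetical heuristic: lower the opacity as points crowd the view."""
    if num_points <= 0:
        return 1.0
    view_area = view_width_px * view_height_px
    # Area the points would cover if none overlapped (square markers assumed).
    drawn_area = num_points * point_size_px ** 2
    opacity = target_coverage * view_area / drawn_area
    # Clamp so sparse plots stay fully opaque and dense plots stay visible.
    return max(min_opacity, min(1.0, opacity))
```

With a 500x500 pixel view and 4-pixel points, a hundred points stay fully opaque, while a million points drop to an opacity of roughly 0.016.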

Slide 20

Slide 20 text

Live Demos!
https://github.com/flekschas/jupyter-scatter-tutorial

Slide 21

Slide 21 text

ARCHITECTURE
1. WebGL rendering via regl-scatterplot [1] for fast plotting
2. Python API layer for integrating with Pandas and configuring regl-scatterplot [1]
3. ipywidgets for communication with Jupyter via anywidget [2]

[1] https://github.com/flekschas/regl-scatterplot/
[2] https://github.com/manzt/anywidget/

Slide 22

Slide 22 text

MASSIVE SHOUT OUTS!
- Trevor Manz for the codec design, anywidget integration, & tutorial setup
- Nezar Abdennur for feedback on the API design
- Ricky Reusser for his inspirational work on selecting the right point opacity
- Rye Terrell for his beautiful multi-instance WebGL rendering approach

Slide 23

Slide 23 text

Thanks!
pip install jupyter-scatter
github.com/flekschas/jupyter-scatter
github.com/flekschas/jupyter-scatter-tutorial
@flekschas, lekschas.de
SciPy '23, July 13, 2023