Slide 1

Slide 1 text

July 14, 2024 The Insight's in the Details. Fritz Lekschas @flekschas lekschas.de 1 ISMB - BioVis

Slide 2

Slide 2 text

Challenges and Opportunities for BioVis Software Tools 2 Fritz Lekschas @flekschas lekschas.de July 14, 2024 ISMB - BioVis

Slide 3

Slide 3 text

MASSIVE SHOUT OUTS! Trevor Manz for feedback and thoughts on the talk and developing anywidget; Ashley Wilson for providing examples and talk feedback; Nezar Abdennur for talk feedback and ipylangchat; Arpan Neupane for helping me better understand immunology. 3

Slide 4

Slide 4 text

4 EDUCATION PhD '21 in CS from Harvard University MSc '16 in Bioinformatics from Freie Universität Berlin RESEARCH Visualization Human-Centered ML Design WORK Head of Visualization Research at Ozette

Slide 5

Slide 5 text

5 Data-Driven Discovery of High-Resolution and Interpretable Cell Phenotypes in Single-Cell Cytometry Data

Slide 6

Slide 6 text

6

Slide 7

Slide 7 text

6 Data from Mair et al., 2022. Nature. From To General Cell Types High-Resolution Cell Phenotypes Well-Resolved High-Resolution Cell Phenotypes Cytotoxic T Cells T Helper Cells B Cells Naïve T Cells

Slide 8

Slide 8 text

7 Data from Mair et al., 2022. Nature. Cytotoxic T Cells T Helper Cells B Cells Naïve T Cells From To How are cell types characterized? How are cells clustered into phenotypes? How are cell phenotypes visually resolved?

Slide 9

Slide 9 text

8

Slide 10

Slide 10 text

9

Slide 11

Slide 11 text

10

Slide 12

Slide 12 text

11

Slide 13

Slide 13 text

12

Slide 14

Slide 14 text

13 PhD Bachelor Master Ontology-Guided Visual Exploration of BioMedical Data Repositories Visualization Tools for Epigenomic Data CellFinder Semantic Body Browser Satori HiGlass Peax ABC Enhancer-Gene

Slide 15

Slide 15 text

14 PhD Ozette Visualization Tools for Epigenomic Data HiGlass Peax ABC Enhancer-Gene Abundance Embeddings Visual Exploration of Single-Cell Data Jupyter Scatter High-Dim Data Exploration

Slide 16

Slide 16 text

14 PhD Bachelor Master Ozette Ontology-Guided Visual Exploration of BioMedical Data Repositories Visualization Tools for Epigenomic Data CellFinder Semantic Body Browser Satori HiGlass Peax ABC Enhancer-Gene Abundance Embeddings Visual Exploration of Single-Cell Data Jupyter Scatter High-Dim Data Exploration Find a matching dataset in large repository Correlate motif patterns in genome wide context Identify & understand rare cell phenotypes

Slide 17

Slide 17 text

The insights are often in the details. Those details can come in the form of analytical complexity or subsets of a large dataset. 15

Slide 18

Slide 18 text

Understanding the meaning of visual patterns can be challenging with growing analytical complexity. 16

Slide 19

Slide 19 text

Perceiving visual patterns of small subsets in the larger context can be challenging. 17

Slide 20

Slide 20 text

GOALS 1. Challenges for Biological Data Visualization 2. Role of Visualization in Biology 3. What does that mean for BioVis Software Tools 4. Exciting tools to address the challenges 18

Slide 21

Slide 21 text

The Role of Data Visualization 19

Slide 22

Slide 22 text

Ben Shneiderman, 2019 “The purpose of visualization is insight, not pictures” 20 “The goal of visualization in computing is to gain insight by using our visual machinery.” McCormick et al., 1987 “The primary objective in data visualization is to gain insight into an information space by mapping data onto graphical primitives.” Senay and Ignatius, 1990 “[Visualization facilitates] the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” Card et al., 1999

Slide 23

Slide 23 text

Ben Shneiderman, 2019 “The purpose of visualization is insight, not pictures” 21 “The goal of visualization in computing is to gain insight by using our visual machinery.” McCormick et al., 1987 “The primary objective in data visualization is to gain insight into an information space by mapping data onto graphical primitives.” Senay and Ignatius, 1990 “[Visualization facilitates] the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” Card et al., 1999

Slide 24

Slide 24 text

Definitions of Insights • A deep understanding of something • Understanding the true nature of a thing • An underlying truth • Information that is obvious but unknown • Unknown but useful information (that enables decision making) 22

Slide 25

Slide 25 text

Min Chen, Luciano Floridi, and Rita Borgo (2013) “Saving time in accomplishing a user’s task is the most fundamental objective [of visualization]” 23

Slide 26

Slide 26 text

Min Chen, Luciano Floridi, and Rita Borgo (2013) “Saving time in accomplishing a user’s task is the most fundamental objective [of visualization]” 24

Slide 27

Slide 27 text

Purpose of Visualization Help user surface and understand useful information fast. 25

Slide 28

Slide 28 text

Purpose of Visualization Help user surface and understand useful information fast. I.e., gain insights. 26

Slide 29

Slide 29 text

The Role of Data Visualization In BioMedical Research 27

Slide 30

Slide 30 text

28

Slide 31

Slide 31 text

29

Slide 32

Slide 32 text

30

Slide 33

Slide 33 text

31

Slide 34

Slide 34 text

32

Slide 35

Slide 35 text

33

Slide 36

Slide 36 text

34 Apply Visualization Broadly

Slide 37

Slide 37 text

Purpose of Visualization for Biology 1. Help surface and understand unknown but useful information 2. Ensure efficient human-in-the-loop pipelines 3. Apply broadly and frequently 35

Slide 38

Slide 38 text

Purpose of Visualization for Biology 1. Help surface and understand unknown but useful information 2. Ensure efficient human-in-the-loop analysis pipelines 3. Apply broadly and frequently Challenges in BioVis 1. Analytical complexity complicates understanding of visual patterns 2. Visual patterns of subsets can be hard to perceive in broader context 36

Slide 39

Slide 39 text

What does this mean for BioVis Software Tools? 37

Slide 40

Slide 40 text

BioVis tools need to... 1. Be where the computation & data are 2. Be composable 3. Be scalable 4. Offer bidirectional interaction 38

Slide 41

Slide 41 text

BioVis tools need to... 1. Be where the computation & data are 2. Be composable 3. Be scalable 4. Offer bidirectional interaction 39

Slide 42

Slide 42 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40

Slide 43

Slide 43 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40

Slide 44

Slide 44 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40

Slide 45

Slide 45 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40 Slows analysis down Removes you from the analysis context

Slide 46

Slide 46 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40 Slows analysis down Removes you from the analysis context

Slide 47

Slide 47 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40 Slows analysis down Removes you from the analysis context

Slide 48

Slide 48 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 40 Apply Visualization Directly Stay in the context

Slide 49

Slide 49 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 41 Lex et al., 2014. IEEE Transactions on Visualization and Computer Graphics. | Conway et al., 2017. Bioinformatics. https://upset.app/implementations/ Original UpSet UpSetR

Slide 50

Slide 50 text

BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help user surface & understand useful information fast. 42 https://github.com/higlass/higlass-python Original HiGlass HiGlass Python Kerpedjiev et al., 2018. Genome Biology. https://higlass.io

Slide 51

Slide 51 text

BioVis tools need to... 1. Be where the computation & data are 2. Be composable 3. Be scalable 4. Offer bidirectional interaction 43

Slide 52

Slide 52 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 44

Slide 53

Slide 53 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 44 Hard to correlate visual patterns Need to move back and forth

Slide 54

Slide 54 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 44 Hard to correlate visual patterns Need to move back and forth

Slide 55

Slide 55 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 44 Compose and Interlink Visualizations

Slide 56

Slide 56 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 45 Keller et al., 2021 OSF Preprint. | Kerpedjiev et al., 2018. Genome Biology | Lekschas et al., 2024. arXiv. http://vitessce.io Vitessce HiGlass + Jupyter Scatter https://github.com/flekschas/jupyter-scatter-tutorial

Slide 57

Slide 57 text

BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns. 45 Keller et al., 2021 OSF Preprint. | Kerpedjiev et al., 2018. Genome Biology | Lekschas et al., 2024. arXiv. http://vitessce.io Vitessce HiGlass + Jupyter Scatter https://github.com/flekschas/jupyter-scatter-tutorial Also a great example of being where the computation and data are.

Slide 58

Slide 58 text

BioVis tools need to... 1. Be where the computation & data are 2. Be composable 3. Be scalable 4. Offer bidirectional interaction 46

Slide 59

Slide 59 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 47 Presumed Context Presumed Context

Slide 60

Slide 60 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 47 Might prevent future tool usage Presumed Context Presumed Context

Slide 61

Slide 61 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 47 Render on the GPU and Embrace Tiled/Aggregated/Streamed Data
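The "aggregated" part of this recommendation can be illustrated with a small sketch. It is not taken from the talk: the helper `bin_points` and the grid size are hypothetical, and it assumes all points lie inside the given ranges. The idea is that binning points into a fixed grid makes the amount of data handed to the renderer depend on the grid resolution rather than on the number of points.

```python
from collections import Counter
import random


def bin_points(points, x_range, y_range, bins=4):
    """Count points per cell of a bins x bins grid (hypothetical helper).

    Assumes every point lies within x_range and y_range.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        # Map each coordinate to a bin index; clamp the upper edge into the last bin.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        counts[(row, col)] += 1
    return counts


random.seed(0)
points = [(random.random(), random.random()) for _ in range(100_000)]
grid = bin_points(points, (0.0, 1.0), (0.0, 1.0), bins=4)

# At most bins * bins values need to be drawn, however many points there are.
print(len(points), "points ->", len(grid), "non-empty grid cells")
```

Tiling and streaming extend the same idea: compute such summaries per zoom level and region, and only load the pieces that are currently visible.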

Slide 62

Slide 62 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 48 Presumed Use Case: Hundreds of Samples Eventual Use Case: Thousands of Samples

Slide 63

Slide 63 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 48 Presumed Use Case: Hundreds of Samples Eventual Use Case: Thousands of Samples SVG Rendering Slows Everything Down WebGL Rendering Scales to Millions!

Slide 64

Slide 64 text

BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. 49 Lavikka et al., 2023. bioRxiv. | Manz et al., 2022. Nature Methods. http://viv.gehlenborglab.org GenomeSpy VIV https://genomespy.app

Slide 65

Slide 65 text

BioVis tools need to... 1. Be where the computation & data are 2. Be composable 3. Be scalable 4. Offer bidirectional interaction 50

Slide 66

Slide 66 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 51

Slide 67

Slide 67 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 51 Surfacing all insights manually takes time

Slide 68

Slide 68 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 51 Surfacing all insights manually takes time Methodological context might not be accessible

Slide 69

Slide 69 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 52 Lekschas et al., 2020. Computer Graphics Forum. | Manz et al., 2024. OSF Preprint. https://peax.lekschas.de HiGlass + ML = Peax Gosling (plus AI as in GenoRec) https://gosling-lang.org

Slide 70

Slide 70 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 53 Lekschas et al., 2020. Computer Graphics Forum. | Manz et al., 2024. OSF Preprint. Comparative Embedding Vis + LLM github.com/OzetteTech/comparative-embedding-visualization

Slide 71

Slide 71 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 53 Lekschas et al., 2020. Computer Graphics Forum. | Manz et al., 2024. OSF Preprint. Comparative Embedding Vis + LLM github.com/OzetteTech/comparative-embedding-visualization Uses ipylangchat from Nezar Abdennur's lab. https://github.com/abdenlab/ipylangchat

Slide 72

Slide 72 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint. Prototype v0.0.1-alpha!

Slide 73

Slide 73 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 74

Slide 74 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 75

Slide 75 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 76

Slide 76 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 77

Slide 77 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 78

Slide 78 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 79

Slide 79 text

55 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 80

Slide 80 text

54 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 81

Slide 81 text

56 Data: Mair et al., 2022. Nature. | Tool: https://github.com/OzetteTech/comparative-embedding-visualization | Paper: Manz et al., 2024. OSF Preprint.

Slide 82

Slide 82 text

BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns. 57 Synced (View + Question) Synced (Natural + Structured) Grounded

Slide 83

Slide 83 text

How can we accomplish all this? 58

Slide 84

Slide 84 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 59

Slide 85

Slide 85 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 60

Slide 86

Slide 86 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 60 anywidget Create interactive widgets in minutes with joy! https://anywidget.dev

Slide 87

Slide 87 text

61

Slide 88

Slide 88 text

61 How do I connect it to my data?

Slide 89

Slide 89 text

62

Slide 90

Slide 90 text

62

Slide 91

Slide 91 text

62 This is all the code we need to integrate Observable Plot into Jupyter-Like Notebooks!

Slide 92

Slide 92 text

62

Slide 93

Slide 93 text

63

Slide 94

Slide 94 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 64 Traitlets Observable state in Python https://github.com/ipython/traitlets Observable & Declarative View State Signals An approach for observable state https://github.com/tc39/proposal-signals Modular Visualization Components anywidget Create interactive widgets in minutes with joy! https://anywidget.dev

Slide 95

Slide 95 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 64 Traitlets Observable state in Python https://github.com/ipython/traitlets Observable & Declarative View State Signals An approach for observable state https://github.com/tc39/proposal-signals Modular Visualization Components anywidget Create interactive widgets in minutes with joy! https://anywidget.dev

Slide 96

Slide 96 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 64 Traitlets Observable state in Python https://github.com/ipython/traitlets RELATED TALK "Breaking the silo: composable bioinformatics through cross-disciplinary open standards" By Nezar Abdennur at BOSC, Tuesday, 9am Observable & Declarative View State Signals An approach for observable state https://github.com/tc39/proposal-signals Modular Visualization Components anywidget Create interactive widgets in minutes with joy! https://anywidget.dev
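What "observable state" buys for composability can be sketched in a few lines of plain Python. This is an illustration of the general pattern only, not the Traitlets API or the Signals proposal; the class `ObservableState` and the function `link` are hypothetical names. Two views share a selection by observing each other's state instead of calling each other directly.

```python
class ObservableState:
    """Tiny observable key-value store (illustrative; not the traitlets API)."""

    def __init__(self, **initial):
        self._values = dict(initial)
        self._observers = {}

    def observe(self, name, callback):
        """Register callback(new_value) to run whenever `name` changes."""
        self._observers.setdefault(name, []).append(callback)

    def get(self, name):
        return self._values[name]

    def set(self, name, value):
        # Notify only on real changes so two linked views cannot ping-pong forever.
        if self._values.get(name) == value:
            return
        self._values[name] = value
        for callback in self._observers.get(name, []):
            callback(value)


def link(source, target, name):
    """Propagate changes of `name` in both directions between two states."""
    source.observe(name, lambda value: target.set(name, value))
    target.observe(name, lambda value: source.set(name, value))


# Two hypothetical views that know nothing about each other.
scatter = ObservableState(selection=[])
heatmap = ObservableState(selection=[])
link(scatter, heatmap, "selection")

scatter.set("selection", [3, 7, 42])  # e.g. a lasso selection in one view
heatmap.set("selection", [7])         # refining the selection in the other view
print(scatter.get("selection"), heatmap.get("selection"))
```

Because each view only exposes state and change notifications, a third component (another plot, a Python callback, or an ML model) can be linked in the same way without modifying the existing views.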

Slide 97

Slide 97 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 65 CandyGraph Like D3 but WebGL instead of SVG rendering https://github.com/wwwtyro/candygraph deck.gl WebGL-based spatial data visualization https://deck.gl Datashader Extremely scalable rendering in Python https://datashader.org

Slide 98

Slide 98 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 66 Mosaic DuckDB-backed scalable data visualization framework https://idl.uw.edu/mosaic/ DuckDB In-process analytics database https://duckdb.org/

Slide 99

Slide 99 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 67 Declarative State-Based Rendering Imperative Rendering

Slide 100

Slide 100 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 67 Declarative State-Based Rendering Imperative Rendering

Slide 101

Slide 101 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 67 Declarative State-Based Rendering Imperative Rendering

Slide 102

Slide 102 text

Useful technology / tools 1. Integration 2. Composable 3. Scalable 4. Bidirectional interaction 67 Declarative State-Based Rendering Imperative Rendering
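The contrast between the two rendering styles can be sketched as follows. The code shown on the slides is not part of the extracted text, so this is an assumed, minimal illustration with hypothetical names (`render`, `ImperativeScene`): in the declarative style the picture is a pure function of the view state, while in the imperative style callers mutate the drawn marks step by step.

```python
def render(state):
    """Declarative: the scene is a pure function of the view state."""
    return [
        {"id": point_id, "x": x, "y": y,
         "color": "red" if point_id in state["selected"] else "gray"}
        for point_id, (x, y) in sorted(state["points"].items())
    ]


class ImperativeScene:
    """Imperative: callers mutate the drawn marks through commands."""

    def __init__(self, points):
        self.marks = {pid: {"id": pid, "x": x, "y": y, "color": "gray"}
                      for pid, (x, y) in points.items()}

    def select(self, point_id):
        self.marks[point_id]["color"] = "red"

    def deselect(self, point_id):
        self.marks[point_id]["color"] = "gray"


points = {1: (0.1, 0.2), 2: (0.5, 0.5), 3: (0.9, 0.4)}

# Declarative: describe the desired state and re-render.
scene_a = render({"points": points, "selected": {2}})

# Imperative: reach the same picture through a sequence of commands.
scene = ImperativeScene(points)
scene.select(1)
scene.deselect(1)
scene.select(2)
scene_b = [scene.marks[pid] for pid in sorted(scene.marks)]

print(scene_a == scene_b)
```

One reason the declarative form suits bidirectional interaction is that the whole view is captured by a single state value, which another component can read, store, or replace; with the imperative form the current picture depends on the history of commands.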

Slide 103

Slide 103 text

Thanks! @flekschas lekschas.de July 14, 2024 ISMB - BioVis