The Insight's in the Details: Challenges and Opportunities for BioVis Software Tools

Slides from my invited keynote presentation at ISMB BioVis 2024 (http://biovis.net/2024/program_ismb/).

Biological data visualizations often deal with datasets resulting from complex analytical workflows or large-scale experiments. These aspects complicate visual exploration, as insights are frequently found in the details. To surface these insights effectively, BioVis software tools must address several key challenges: integrating closely with computation and data, being composable, scaling to the task, and offering bi-directional interactions for AI/ML guidance.

Fortunately, software best practices and new frameworks make it easier than ever to overcome these challenges. In this talk, Fritz Lekschas will discuss these practices and frameworks using examples from his research in genomics and single-cell biology and from the broader BioVis community.

Fritz Lekschas

July 14, 2024

Transcript

  1. The Insight's in the Details. Fritz Lekschas (@flekschas, lekschas.de). ISMB - BioVis, July 14, 2024
  2. Challenges and Opportunities for BioVis Software Tools
  3. MASSIVE SHOUT OUTS!
      • Trevor Manz for feedback and thoughts on the talk and developing anywidget
      • Ashley Wilson for providing examples and talk feedback
      • Nezar Abdennur for talk feedback and ipylangchat
      • Arpan Neupane for helping me better understand immunology
  4. EDUCATION: PhD '21 in CS from Harvard University; MSc '16 in Bioinformatics from Freie Universität Berlin. RESEARCH: Visualization Human-Centered ML Design. WORK: Head of Visualization Research at Ozette.
  6. [Figure: From/To panels titled General Cell Types, High-Resolution Cell Phenotypes, and Well-Resolved High-Resolution Cell Phenotypes; labeled populations: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells.] Data from Mair et al., 2022. Nature.
  7. [Figure: From/To panels; labeled populations: Cytotoxic T Cells, T Helper Cells, B Cells, Naïve T Cells.] How are cell types characterized? How are cells clustered into phenotypes? How are cell phenotypes visually resolved? Data from Mair et al., 2022. Nature.
  13. Bachelor, Master, PhD
      • Ontology-Guided Visual Exploration of BioMedical Data Repositories: CellFinder, Semantic Body Browser, Satori
      • Visualization Tools for Epigenomic Data: HiGlass, Peax, ABC Enhancer-Gene
  14. PhD, Ozette
      • Visualization Tools for Epigenomic Data: HiGlass, Peax, ABC Enhancer-Gene
      • Visual Exploration of Single-Cell Data: Abundance Embeddings, Jupyter Scatter, High-Dim Data Exploration
  15. Bachelor, Master, PhD, Ozette
      • Ontology-Guided Visual Exploration of BioMedical Data Repositories: CellFinder, Semantic Body Browser, Satori
      • Visualization Tools for Epigenomic Data: HiGlass, Peax, ABC Enhancer-Gene
      • Visual Exploration of Single-Cell Data: Abundance Embeddings, Jupyter Scatter, High-Dim Data Exploration
      • Find a matching dataset in large repository. Correlate motif patterns in genome wide context. Identify & understand rare cell phenotypes.
  16. The insights are often in the details. Those details can come in the form of analytical complexity or subsets of a large dataset.
  17. GOALS
      1. Challenges for Biological Data Visualization
      2. Role of Visualization in Biology
      3. What does that mean for BioVis Software Tools?
      4. Exciting tools to address the challenges
  18. “The purpose of visualization is insight, not pictures” (Ben Shneiderman, 2019)
      “The goal of visualization in computing is to gain insight by using our visual machinery.” (McCormick et al., 1987)
      “The primary objective in data visualization is to gain insight into an information space by mapping data onto graphical primitives.” (Senay and Ignatius, 1990)
      “[Visualization facilitates] the use of computer-supported, interactive, visual representations of abstract data to amplify cognition.” (Card et al., 1999)
  20. Definitions of Insights
      • A deep understanding of something
      • Understanding the true nature of a thing
      • An underlying truth
      • Information that is obvious but unknown
      • Unknown but useful information (that enables decision making)
  21. “Saving time in accomplishing a user’s task is the most fundamental objective [of visualization]” (Min Chen, Luciano Floridi, and Rita Borgo, 2013)
  29. Purpose of Visualization for Biology
      1. Help surface and understand unknown but useful information
      2. Ensure efficient human-in-the-loop analysis pipelines
      3. Apply broadly and frequently
  30. Challenges in BioVis
      1. Analytical complexity complicates understanding of visual patterns
      2. Visual patterns of subsets can be hard to perceive in broader context
  31. BioVis tools need to...
      1. Be where the computation & data are
      2. Be composable
      3. Be scalable
      4. Offer bidirectional interaction
  33. BioVis tools need to be where the computation & data are! Why? Because the purpose of visualization is to help users surface & understand useful information fast.
  36. Slows analysis down. Removes you from the analysis context.
  39. Apply visualization directly. Stay in the context.
  40. Original UpSet and UpSetR. Lex et al., 2014. IEEE Transactions on Visualization and Computer Graphics. | Conway et al., 2017. Bioinformatics. https://upset.app/implementations/
  41. Original HiGlass (https://higlass.io) and HiGlass Python (https://github.com/higlass/higlass-python). Kerpedjiev et al., 2018. Genome Biology.
  43. BioVis tools need to be composable! Why? Because we often need multiple visualization tools to explain complex patterns.
  44. Hard to correlate visual patterns. Need to move back and forth.
  46. Compose and interlink visualizations.
  47. Vitessce (http://vitessce.io) and HiGlass + Jupyter Scatter (https://github.com/flekschas/jupyter-scatter-tutorial). Keller et al., 2021. OSF Preprint. | Kerpedjiev et al., 2018. Genome Biology. | Lekschas et al., 2024. arXiv.
  48. Also a great example of being where the computation and data are.
  50. BioVis tools need to be scalable! Why? Data and datasets are only ever going to increase in size. [Figure label: Presumed Context]
  51. Might prevent future tool usage.
  52. Render on the GPU and embrace tiled/aggregated/streamed data.
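
The "tiled" part of that advice can be illustrated with a minimal sketch. It assumes a one-dimensional domain (for example, genome coordinates) that is split into 2**zoom equal tiles per zoom level, so a viewer only ever requests the handful of tiles that intersect the current viewport. The function names, the `tiles_per_view` target, and the tile addressing scheme are hypothetical and are not taken from HiGlass or any other tool shown in the talk.

```python
import math


def choose_zoom(view_width, domain_size, max_zoom, tiles_per_view=4):
    """Pick the zoom level at which roughly `tiles_per_view` tiles span the view."""
    # At zoom z, one tile covers domain_size / 2**z data units.
    ideal = math.log2(tiles_per_view * domain_size / view_width)
    return max(0, min(max_zoom, math.ceil(ideal)))


def visible_tiles(view_start, view_end, domain_size, max_zoom=10):
    """Return the (zoom, index) pairs of all tiles intersecting [view_start, view_end)."""
    if view_end <= view_start:
        return []
    zoom = choose_zoom(view_end - view_start, domain_size, max_zoom)
    tile_width = domain_size / 2 ** zoom
    first = max(0, math.floor(view_start / tile_width))
    last = min(2 ** zoom - 1, math.ceil(view_end / tile_width) - 1)
    return [(zoom, index) for index in range(first, last + 1)]


# Whole domain in view: four coarse tiles.
overview = visible_tiles(0, 1024, domain_size=1024)
# Zoomed in on a quarter of the domain: still four tiles, but finer ones.
detail = visible_tiles(256, 512, domain_size=1024)
```

The number of requested tiles stays roughly constant while zooming, which is why the amount of data sent to the renderer depends on the viewport rather than on the size of the dataset.
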
  53. Presumed use case: hundreds of samples. Eventual use case: thousands of samples.
  54. SVG rendering slows everything down. WebGL rendering scales to millions!
  55. GenomeSpy (https://genomespy.app) and VIV (http://viv.gehlenborglab.org). Lavikka et al., 2023. bioRxiv. | Manz et al., 2022. Nature Methods.
  57. BioVis tools need to offer bidirectional interaction! Why? We need ML to guide the visual exploration and surface interesting patterns.
  58. Surfacing all insights manually takes time.
  59. Methodological context might not be accessible.
  60. HiGlass + ML = Peax (https://peax.lekschas.de). Gosling (plus AI as in GenoRec) (https://gosling-lang.org). Lekschas et al., 2020. Computer Graphics Forum. | Manz et al., 2024. OSF Preprint.
  61. Comparative Embedding Vis + LLM. github.com/OzetteTech/comparative-embedding-visualization
  62. Uses ipylangchat from Nezar Abdennur's lab. https://github.com/abdenlab/ipylangchat
  63. Synced (View + Question). Synced (Natural + Structured). Grounded.
  64. Useful technology / tools
      1. Integration
      2. Composable
      3. Scalable
      4. Bidirectional interaction
      • anywidget: Create interactive widgets in minutes with joy! https://anywidget.dev
  68. This is all the code we need to integrate Observable Plot into Jupyter-like notebooks!
  71. Useful technology / tools
      • Modular Visualization Components. anywidget: Create interactive widgets in minutes with joy! https://anywidget.dev
      • Observable & Declarative View State. Traitlets: Observable state in Python. https://github.com/ipython/traitlets
      • Observable & Declarative View State. Signals: An approach for observable state. https://github.com/tc39/proposal-signals
  73. RELATED TALK: "Breaking the silo: composable bioinformatics through cross-disciplinary open standards" by Nezar Abdennur at BOSC, Tuesday, 9am
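
To make "observable state" concrete, here is a minimal standard-library sketch of the idea: a value that notifies observers when it changes, plus a helper that keeps two such values in sync in both directions, as two linked views would need. The `Observable` class, the `link` helper, and the two selection variables are hypothetical illustrations. They are not the Traitlets or Signals APIs, which offer far more (validation, typed traits, batching).

```python
class Observable:
    """A tiny observable value: observers run whenever the value changes."""

    def __init__(self, value):
        self._value = value
        self._observers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        if new == self._value:
            return  # unchanged: skip notification (this also stops link() from looping)
        old, self._value = self._value, new
        for callback in list(self._observers):
            callback(old, new)

    def observe(self, callback):
        self._observers.append(callback)


def link(a, b):
    """Keep two observables in sync in both directions."""
    b.value = a.value
    a.observe(lambda old, new: setattr(b, "value", new))
    b.observe(lambda old, new: setattr(a, "value", new))


# Two hypothetical views sharing one selection.
scatter_selection = Observable(frozenset())
heatmap_selection = Observable(frozenset())
link(scatter_selection, heatmap_selection)

log = []
heatmap_selection.observe(lambda old, new: log.append(sorted(new)))

scatter_selection.value = frozenset({3, 7})  # propagates to the heatmap
heatmap_selection.value = frozenset({1})     # propagates back to the scatter
```

Because each view only reads and writes shared state, neither needs to know that the other exists, which is what makes such components composable.
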
  74. Useful technology / tools
      • CandyGraph: Like D3 but WebGL instead of SVG rendering. https://github.com/wwwtyro/candygraph
      • deck.gl: WebGL-based spatial data visualization. https://deck.gl
      • Datashader: Extremely scalable rendering in Python. https://datashader.org
  75. Useful technology / tools
      • Mosaic: DuckDB-backed scalable data visualization framework. https://idl.uw.edu/mosaic/
      • DuckDB: In-process analytics database. https://duckdb.org/
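
The idea behind a database-backed visualization framework can be sketched as follows: instead of shipping every row to the renderer, the view asks an in-process database for a small, pre-aggregated grid. This sketch uses Python's built-in `sqlite3` purely as a stand-in so that it runs without extra dependencies. It does not show the DuckDB or Mosaic APIs, and the table name, column names, and bin count are assumptions made for illustration.

```python
import random
import sqlite3

random.seed(0)

# An in-memory table of 2D points standing in for, e.g., an embedding of cells.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cells (x REAL, y REAL)")
points = [(random.random(), random.random()) for _ in range(50_000)]
con.executemany("INSERT INTO cells VALUES (?, ?)", points)

BINS = 32  # the renderer receives at most BINS * BINS rows, however many points exist

density = con.execute(
    """
    SELECT CAST(x * :bins AS INTEGER) AS bin_x,
           CAST(y * :bins AS INTEGER) AS bin_y,
           COUNT(*) AS n
    FROM cells
    GROUP BY bin_x, bin_y
    """,
    {"bins": BINS},
).fetchall()
```

Each returned row is a grid cell with a count, which is enough to draw a density heatmap. Panning or zooming would translate into a new query with a different filter or bin size rather than a new transfer of the raw data.
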
  76. Useful technology / tools: Declarative State-Based Rendering. Imperative Rendering.
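
The contrast between the two rendering styles can be sketched in a few lines. In the declarative, state-based variant, the scene is a pure function of a view state, so any tool (or model) that can write the state can drive the view. In the imperative variant, callers mutate what is already drawn and have to remember to undo earlier changes. All names here (`ViewState`, `render`, `ImperativeScatter`) are hypothetical and do not correspond to any library mentioned in the talk.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ViewState:
    points: tuple
    selected: frozenset = frozenset()
    point_size: int = 2


def render(state):
    """Declarative: the whole scene is derived from the state, nothing else."""
    return [
        {"xy": xy, "size": state.point_size,
         "color": "red" if i in state.selected else "gray"}
        for i, xy in enumerate(state.points)
    ]


class ImperativeScatter:
    """Imperative: callers change already drawn marks step by step."""

    def __init__(self, points, point_size=2):
        self.marks = [{"xy": xy, "size": point_size, "color": "gray"} for xy in points]

    def select(self, indices):
        for i in indices:
            self.marks[i]["color"] = "red"


state = ViewState(points=((0, 0), (1, 1), (2, 0)))

# Declarative: selecting point 0, then point 2, leaves only point 2 highlighted.
state = replace(state, selected=frozenset({0}))
state = replace(state, selected=frozenset({2}))
scene = render(state)

# Imperative: the same two calls leave point 0 highlighted unless it is reset explicitly.
imperative = ImperativeScatter(((0, 0), (1, 1), (2, 0)))
imperative.select([0])
imperative.select([2])
```

In the declarative version the result depends only on the final state, not on the history of calls, which makes views easier to synchronize, serialize, and restore.
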