A Fresh Look at Genomics Data: Grammar-Based Visualization

Visualization of genomics data for exploration and communication has a long history in molecular biology. Over the years, dozens of techniques and hundreds of tools to view and explore genomics data have been developed. This rich set of tools and techniques demonstrates the importance of data visualization in genomics. However, it also poses significant challenges for data analysts, who often need to convert between different data formats and use multiple tools for their analysis tasks. To address these challenges, we designed the Gosling visualization grammar (http://gosling-lang.org) that can be used to generate virtually any previously described interactive visualization technique for genome-mapped data. I will explain how we developed Gosling and introduce the tool ecosystem that we built to support Gosling-based visualizations. Finally, I will propose opportunities for future research in genomics data visualization.

Keynote from IBMI Tuebingen Kick Off Event June 2022: https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/interfakultaere-einrichtungen/ibmi/veranstaltungen/kick-off-meeting/

Nils Gehlenborg

July 01, 2022

Transcript

  1. A Fresh Look at Genomics Data: Grammar-Based Visualization

    Nils Gehlenborg, Biomedical Informatics, Harvard Medical School ∙ @ngehlenborg ∙ http://gehlenborglab.org ∙ nils@hms.harvard.edu
  2. http://gehlenborglab.org

  3. http://gehlenborglab.org

  4. [Slide graphic: rows of binary digits]

  5. [Slide graphic: rows of binary digits]

  6. None
  7. None
  8. None
  9. None
  10. HiGlass, HiPiler, Cistrome Explorer, Gosling, Gos

  11. Vitessce

  12. Vitessce, Avivator, Vizarr

  13. Halyos, Discovery, Discovery Mobile, Visual Consent, Periphery Plots

  14. StratomeX, OncoThreads, ThreadStates, UpSet Plots

  15. StratomeX, OncoThreads, ThreadStates, UpSet Plots, 4CE Explorer

  16. None
  17. None
  18. None
  19. Tool Catalogs: GenoCAT (http://genocat.tools) and Awesome Genome Visualization (https://cmdcolin.github.io/awesome-genome-visualization). Tool counts shown on the slide: 475 Tools, 100 Tools.
  20. Why are there so many genomics visualization tools?

  21. Challenges in Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Everything is connected
    - Need to integrate different data types: sequence, expression levels, metabolites, phenotype information, etc.
    - Need to load many different types of data into a single software tool
    - Large space with sparse distribution of patterns across multiple scales
    - Many types of patterns along the genome: SNPs, epigenomic peaks, genomic rearrangements, etc.
  22. Can we design a tool to build effective genomics visualizations efficiently?
  23. Yes we can!

  24. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data
  25. Step 1: Map the design space of genomics visualizations

  26. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Only considered data that is visualized in the sequence context, i.e., genomic location is represented in the visualization
    - Treat genome/sequence as a coordinate system
    - Address concerns separately:
      - Data Taxonomy: What data types can be mapped to the genome?
      - Visualization Taxonomy: How can the coordinate system be laid out? How can the mapped data be encoded?
      - Task Taxonomy: What kind of tasks are users trying to address with genomic data?
  27. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    [Figure: overview of the three taxonomies and the mapping between them]
    - Data Taxonomy: Feature Sets; Type (Sparse, Contiguous); Interconnection (None, Within, Between)
    - Visualization Taxonomy: Coordinate System (Layout, Partition, Abstraction, Arrangement); Tracks (Encoding, Alignment); View Configurations (Views, Scales, Foci; One, Many)
    - Task Taxonomy: Search and Query tasks (Locate, Lookup, Browse, Explore, Identify, Compare, Summarize) on Positions and Feature Sets
  28. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  29. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Coordinate System: mapping of genomic coordinates into the visualization space
    - Track: dataset and mapping of features to visual properties
    - View: genomic region and one or more tracks
    - View Configurations: relationship of two or more views to each other
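
The four terms on this slide nest inside each other, which a small data-structure sketch can make concrete. The class names, field names, and the `is_multi_focus` helper below are illustrative assumptions; they are not taken from the taxonomy paper or from any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CoordinateSystem:
    """Maps genomic coordinates into the visualization space."""
    layout: str = "linear"  # e.g. linear, circular, space-filling, spatial


@dataclass
class Track:
    """A dataset plus a mapping of its features to visual properties."""
    dataset: str
    encoding: Dict[str, str]  # visual channel -> data field


@dataclass
class View:
    """A genomic region shown with one or more tracks."""
    region: Tuple[str, int, int]  # (chromosome, start, end)
    tracks: List[Track]
    coordinates: CoordinateSystem = field(default_factory=CoordinateSystem)


@dataclass
class ViewConfiguration:
    """Describes how two or more views relate to each other."""
    views: List[View]

    def is_multi_focus(self) -> bool:
        # More than one distinct genomic region means multiple foci.
        return len({view.region for view in self.views}) > 1


peaks = Track(dataset="peaks", encoding={"x": "position", "y": "height"})
left = View(region=("chr1", 1, 1_000_000), tracks=[peaks])
right = View(region=("chr2", 1, 1_000_000), tracks=[peaks])
config = ViewConfiguration(views=[left, right])
```

Here `config.is_multi_focus()` is true because the two views show different regions; two views of the same region would count as a single focus.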
  30. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  31. Layout EpiViz

  32. Layout EpiViz: Linear

  33. Layout MizBee

  34. Layout MizBee: Circular

  35. Layout Hilbert Curve

  36. Layout Hilbert Curve: Space-filling

  37. Layout HiC3D Viewer

  38. Layout HiC3D Viewer: Spatial

  39. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]

  40. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  41. Arrangement Cinteny

  42. Arrangement Cinteny: Parallel

  43. Arrangement Synteny Explorer

  44. Arrangement Synteny Explorer: Serial

  45. Arrangement HiGlass

  46. Arrangement HiGlass: Orthogonal

  47. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]

  48. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  49. Alignment EpiViz

  50. Alignment EpiViz: Parallel

  51. Alignment EpiViz

  52. Alignment EpiViz: Overlaid

  53. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]

  54. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  55. View Configuration Synteny Explorer: Multi-View + Single-Scale + Single-Focus

  56. View Configuration MizBee: Multi-View + Multi-Scale + Single-Focus

  57. View Configuration IGV Multi-View + Single-Scale + Multi-Focus

  58. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]

  59. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019) [Figure: taxonomy overview, as on slide 27]
  60. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data
  61. Step 2: Define a language to describe interactive genomics visualizations

  62. Gosling: Grammar Of Scalable, Linked, Interactive Nucleotide Graphics (L’Yi, Wang, Lekschas & Gehlenborg, 2021)
  63. Gosling Grammar Encoding

  64. Gosling Grammar Encoding

  65. Gosling Grammar Encoding

  66. Gosling Grammar Encoding

  67. Gosling Grammar Encoding

  68. Gosling Grammar Encoding Overlay

  69. Gosling Grammar Overlay

  70. Gosling Tracks and Views (L’Yi, Wang, Lekschas & Gehlenborg, 2021)

  71. Gosling Track Alignment: Overlay (L’Yi, Wang, Lekschas & Gehlenborg, 2021)
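
As a rough illustration of overlay alignment, the sketch below builds a Gosling-style specification as a plain Python dictionary in which a line and a point layer share one dataset and one set of axes. The property names (`alignment`, `tracks`, `mark`, and the `multivec` data options) are recalled from the Gosling documentation and should be checked against it; the data URL is a placeholder, and `resolve_overlay` is a hypothetical helper that only shows how shared properties apply to every layer.

```python
import json

# Placeholder data source (assumption); replace with a real multivec URL.
DATA = {
    "url": "https://example.org/sample.multivec",
    "type": "multivec",
    "row": "sample",
    "column": "position",
    "value": "peak",
    "categories": ["sample 1"],
}

overlay_track = {
    "alignment": "overlay",  # draw the child tracks on top of each other
    "data": DATA,            # shared by every child track
    "x": {"field": "position", "type": "genomic"},
    "y": {"field": "peak", "type": "quantitative"},
    "tracks": [
        {"mark": "line"},
        {"mark": "point", "size": {"value": 2}},
    ],
    "width": 600,
    "height": 120,
}


def resolve_overlay(track):
    """Merge the shared properties into each child track (illustration only)."""
    shared = {k: v for k, v in track.items() if k not in ("alignment", "tracks")}
    return [{**shared, **child} for child in track["tracks"]]


spec = {"tracks": [overlay_track]}
spec_json = json.dumps(spec, indent=2)
layers = resolve_overlay(overlay_track)
```

Each entry of `layers` is a complete single-mark track, which is one way to think about what an overlay expands to.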

  72. Gosling Grammar Overlay

  73. Gosling Layout and Arrangement (L’Yi, Wang, Lekschas & Gehlenborg, 2021)
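
A minimal sketch of how layout and arrangement might be combined in one specification: two views of the same data, one linear and one circular, placed side by side. The `layout` and `arrangement` properties and their values are recalled from the Gosling documentation and should be verified there; `make_view` is a hypothetical helper and the data URL is a placeholder.

```python
import json


def make_view(layout):
    """Return a single-track view; the data URL is a placeholder (assumption)."""
    if layout not in ("linear", "circular"):
        raise ValueError(f"unsupported layout in this sketch: {layout}")
    return {
        "layout": layout,
        "tracks": [
            {
                "data": {
                    "url": "https://example.org/sample.bw",
                    "type": "bigwig",
                    "column": "position",
                    "value": "peak",
                },
                "mark": "bar",
                "x": {"field": "position", "type": "genomic"},
                "y": {"field": "peak", "type": "quantitative"},
                "width": 400,
                "height": 100,
            }
        ],
    }


spec = {
    "arrangement": "horizontal",  # place the two views next to each other
    "views": [make_view("linear"), make_view("circular")],
}
spec_json = json.dumps(spec, indent=2)
```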

  74. Gosling Expressiveness (L’Yi, Wang, Lekschas & Gehlenborg, 2021)

  75. None
  76. None
  77. None
  78. None
  79. None
  80. None
  81. Gosling Semantic Zooming (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
    - Visualizations often best represent information at a particular scale
    - Semantic zoom allows switching of visual encodings at different scales
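
The idea of switching encodings by scale can be sketched independently of Gosling. The toy function below picks an encoding from the number of base pairs shown per pixel; the thresholds, encoding names, and the `pick_encoding` helper are all made up for illustration and do not reflect how Gosling implements semantic zoom.

```python
# Hypothetical rules: (largest base pairs per pixel, encoding to use).
RULES = [
    (0.1, "text"),               # single bases are wide enough to label
    (1_000, "bar"),              # per-bin values are still distinguishable
    (float("inf"), "heatmap"),   # chromosome scale: summarize as color
]


def pick_encoding(region_bp, view_px, rules=RULES):
    """Return the first encoding whose scale limit covers the current zoom."""
    if region_bp <= 0 or view_px <= 0:
        raise ValueError("region and view width must be positive")
    bp_per_px = region_bp / view_px
    for max_bp_per_px, encoding in rules:
        if bp_per_px <= max_bp_per_px:
            return encoding
    raise ValueError("rules must end with an unbounded entry")


zoomed_in = pick_encoding(region_bp=60, view_px=800)
mid_range = pick_encoding(region_bp=500_000, view_px=800)
whole_chromosome = pick_encoding(region_bp=248_000_000, view_px=800)
```

In a declarative grammar, comparable conditions would live in the specification rather than in imperative code, so the same rule set can travel with the visualization.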
  82. Gosling Semantic Zooming (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)

  83. Gosling Semantic Zooming (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
  84. Interaction Techniques: Multiple Linked Views (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
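
One common form of linked views is an overview with a brush that controls a detail view. The sketch below expresses that as a Gosling-style dictionary, assuming that a `brush` mark and the detail view's x channel are tied together through a shared `linkingId`; these property names are recalled from the Gosling documentation and may differ between versions. The data URL, the identifier, and the `iter_layers` helper are placeholders for illustration.

```python
import json

LINK_ID = "detail"  # arbitrary identifier shared by the brush and the detail view

# Placeholder data source (assumption); replace with a real bigwig URL.
DATA = {
    "url": "https://example.org/sample.bw",
    "type": "bigwig",
    "column": "position",
    "value": "peak",
}
X = {"field": "position", "type": "genomic"}
Y = {"field": "peak", "type": "quantitative"}

overview = {
    "tracks": [
        {
            "alignment": "overlay",
            "data": DATA,
            "x": X,
            "y": Y,
            "tracks": [
                {"mark": "bar"},
                {"mark": "brush", "x": {"linkingId": LINK_ID}},
            ],
            "width": 600,
            "height": 80,
        }
    ]
}

detail = {
    "tracks": [
        {
            "data": DATA,
            "mark": "bar",
            "x": {**X, "linkingId": LINK_ID, "domain": {"chromosome": "chr1"}},
            "y": Y,
            "width": 600,
            "height": 80,
        }
    ]
}

spec = {"arrangement": "vertical", "views": [overview, detail]}


def iter_layers(view):
    """Yield overlay children where present, otherwise the track itself."""
    for track in view["tracks"]:
        yield from track.get("tracks", [track])


brush_ids = {
    layer["x"]["linkingId"]
    for layer in iter_layers(overview)
    if layer.get("mark") == "brush"
}
target_ids = {
    layer["x"]["linkingId"]
    for layer in iter_layers(detail)
    if "linkingId" in layer.get("x", {})
}
unlinked = brush_ids - target_ids  # brushes that point at no view
spec_json = json.dumps(spec, indent=2)
```

Checking that `unlinked` is empty is a cheap sanity test before handing a specification to a renderer.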
  85. Gosling Responsive Visualization Design (L’Yi and Gehlenborg, 2022 | https://osf.io/pd7vq/)

  86. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data
  87. Step 3: Implement a framework for visualization of real-world data

  88. Gosling.js Implementation: Gosling JSON Specification + HiGlass Rendering and Data Access (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
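
To give a sense of what a Gosling JSON specification looks like, here is a minimal sketch that assembles one in Python and serializes it. The property names (`tracks`, `data`, `mark`, `x`, `xe`, and the CSV options `chromosomeField` and `genomicFields`) are recalled from the Gosling documentation and should be checked against it. The data URL is a placeholder, and `missing_track_keys` with its list of required keys is an assumption made for this sketch, not the official schema.

```python
import json

spec = {
    "title": "Minimal Gosling sketch",
    "tracks": [
        {
            "data": {
                "url": "https://example.org/peaks.csv",  # placeholder URL
                "type": "csv",
                "chromosomeField": "chrom",
                "genomicFields": ["start", "end"],
            },
            "mark": "rect",
            "x": {"field": "start", "type": "genomic"},
            "xe": {"field": "end", "type": "genomic"},
            "width": 600,
            "height": 40,
        }
    ],
}


def missing_track_keys(spec, required=("data", "mark", "width", "height")):
    """List (track index, key) pairs for keys this sketch treats as required."""
    return [
        (index, key)
        for index, track in enumerate(spec["tracks"])
        for key in required
        if key not in track
    ]


problems = missing_track_keys(spec)
spec_json = json.dumps(spec, indent=2)
```

With a real data URL in place, the resulting JSON text is the kind of input the online editor at https://gosling.js.org accepts.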
  89. Gosling.js Implementation: Integration into Exploratory Tools (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)

  90. Gosling.js Implementation: Integration into Exploratory Tools: Structural Variant Exploration (L’Yi and Gehlenborg, Work in Progress)
  91. Can we apply this to genomic data visualization in Python?

  92. Gos: A declarative genomics visualization library for Python (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
    • Author Gosling visualizations with Python scripts: github.com/gosling-lang/gos
    • Concise syntax with shorthand for common Gosling patterns
    • Integrates in Jupyter Notebooks for analysis and visualization of large datasets
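
The value of a shorthand syntax can be shown without the library itself. The sketch below expands a compact `field:Type` string into the longer channel definition a Gosling specification uses. This does not reproduce the Gos API: `expand_shorthand`, `bar_track`, and the type codes are hypothetical and only modeled on the kind of shorthand found in declarative Python plotting libraries.

```python
# Assumed one-letter codes for channel types (illustration only).
TYPE_CODES = {"G": "genomic", "Q": "quantitative", "N": "nominal"}


def expand_shorthand(shorthand):
    """Turn 'peak:Q' into {'field': 'peak', 'type': 'quantitative'}."""
    field_name, separator, code = shorthand.rpartition(":")
    if not separator or not field_name:
        raise ValueError(f"expected 'field:Type', got {shorthand!r}")
    if code not in TYPE_CODES:
        raise ValueError(f"unknown type code {code!r}")
    return {"field": field_name, "type": TYPE_CODES[code]}


def bar_track(data, **channels):
    """Build a bar track from shorthand channel strings."""
    track = {"data": data, "mark": "bar"}
    for channel, shorthand in channels.items():
        track[channel] = expand_shorthand(shorthand)
    return track


track = bar_track(
    {"url": "https://example.org/sample.bw", "type": "bigwig"},  # placeholder
    x="position:G",
    y="peak:Q",
)
```

One line of shorthand per channel replaces a nested dictionary, which is where most of the brevity of a Python wrapper comes from.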
  93. Gos: A declarative genomics visualization library for Python (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
  94. Gos Example Composition

  95. Gos: Local and in-memory datasets (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
    • Data sources for Gosling must be accessible via HTTP
    • This requirement allows Gosling visualizations to be easily shared, but it can be challenging to load local datasets
    • Gos transparently handles local and remote datasets
    • Gos supports DataFrames and other in-memory objects as data sources
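
To see why the HTTP requirement matters for local files, the sketch below exposes a local directory over HTTP on the loopback interface using only the Python standard library, with a permissive CORS header so that a browser-based viewer could fetch the file. This illustrates the general idea only; it is not how Gos is implemented, and `serve_directory`, `CORSHandler`, and `demo` are hypothetical names.

```python
import functools
import http.server
import tempfile
import threading
import urllib.request
from pathlib import Path


class CORSHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that lets browser pages on other origins read files."""

    def end_headers(self):
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet


def serve_directory(directory, port=0):
    """Serve `directory` in a background thread; port 0 picks a free port."""
    handler = functools.partial(CORSHandler, directory=str(directory))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


def demo():
    """Write a small CSV, serve it, and fetch it back over HTTP."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "peaks.csv").write_bytes(b"chrom,start,end\nchr1,100,200\n")
        server, base_url = serve_directory(tmp)
        # Bypass any configured proxy so the loopback request stays local.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        try:
            with opener.open(f"{base_url}/peaks.csv") as response:
                body = response.read()
                origin = response.headers["Access-Control-Allow-Origin"]
                return body, origin
        finally:
            server.shutdown()
            server.server_close()
```

The URL returned by `serve_directory` could then be used wherever a specification expects a data URL, as long as the server keeps running.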
  96. Where do we go from here?

  97. Future Directions: Recommendation System: GenoREC (Pandey, L’Yi, Wang, Borkin & Gehlenborg, 2022 | https://osf.io/rscb4/)
    [Figure: GenoREC system overview. Panel labels: Input (Data, Task), GenoREC UI (Input Compiler, Output Compiler), GenoREC Backend (GenoREC Model, Algorithm), Knowledge-based Recommendation, GenoREC’s Recommendation, Alternate Visualizations. Design dimensions shown: Encoding, Alignment, Layout, Partition, Arrangement, Interactivity.]
  98. Future Directions
    - Recommendation systems
    - Moving away from custom tools towards a platform for visualization
    - GUI-based construction of visualizations
    - Collaborative visual exploration of genome-mapped data
  99. Team: Trevor Manz, Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Aditeya Pandey, Qianwen Wang, Sehi L’Yi, Sabrina Nusrat, Theresa Harbig
  100. Team: Trevor Manz, Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Aditeya Pandey, Qianwen Wang, Sehi L’Yi, Sabrina Nusrat, Theresa Harbig, You?

    Join us as a Postdoctoral Fellow, (Senior) UI/UX Developer, or (Senior) Software Developer. Email nils@hms.harvard.edu if interested!
  101. Project Website: http://gosling-lang.org. Online Editor: https://gosling.js.org

    Join us as a Postdoctoral Fellow, (Senior) UI/UX Developer, or (Senior) Software Developer. Email nils@hms.harvard.edu if interested!
  102. A Fresh Look at Genomics Data: Grammar-Based Visualization

    Nils Gehlenborg, Biomedical Informatics, Harvard Medical School ∙ @ngehlenborg ∙ http://gehlenborglab.org ∙ nils@hms.harvard.edu