A Fresh Look at Genomics Data: Grammar-Based Visualization

Visualization of genomics data for exploration and communication has a long history in molecular biology. Over the years, dozens of techniques and hundreds of tools to view and explore genomics data have been developed. This rich set of tools and techniques demonstrates the importance of data visualization in genomics. However, it also poses significant challenges for data analysts, who often need to convert between different data formats and use multiple tools for their analysis tasks. To address these challenges, we designed the Gosling visualization grammar (http://gosling-lang.org) that can be used to generate virtually any previously described interactive visualization technique for genome-mapped data. I will explain how we developed Gosling and introduce the tool ecosystem that we built to support Gosling-based visualizations. Finally, I will propose opportunities for future research in genomics data visualization.

Keynote from IBMI Tuebingen Kick Off Event June 2022: https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/interfakultaere-einrichtungen/ibmi/veranstaltungen/kick-off-meeting/

Nils Gehlenborg

July 01, 2022

Transcript

  1. A Fresh Look at Genomics Data: Grammar-Based Visualization
    Nils Gehlenborg, Biomedical Informatics, Harvard Medical School ∙ @ngehlenborg ∙ http://gehlenborglab.org ∙ [email protected]
  2. Challenges in Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Everything is connected
    - Need to integrate different data types: sequence, expression levels, metabolites, phenotype information, etc.
    - Need to load many different types of data into a single software
    - Large space with sparse distribution of patterns across multiple scales
    - Many types of patterns along the genome: SNPs, epigenomic peaks, genomic rearrangements, etc.
  3. Yes we can!
    (1) Map the design space of genomics visualizations
    (2) Define a language to describe interactive genomics visualizations
    (3) Implement a framework for visualization of real world data
  4. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Only considered data that is visualized in the sequence context, i.e., genomic location is represented in the visualization
    - Treat genome/sequence as a coordinate system
    - Address concerns separately:
      - Data Taxonomy: What data types can be mapped to the genome?
      - Visualization Taxonomy: How can the coordinate system be laid out? How can the mapped data be encoded?
      - Task Taxonomy: What kind of tasks are users trying to address with genomic data?
  5. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    [Figure: overview of the Data Taxonomy, Visualization Taxonomy, and Task Taxonomy. Figure labels: Layout, Partition, Abstraction, Arrangement, Tracks, Encoding, View Configurations, Views, Scales, Foci; Feature Sets, Type (Sparse, Contiguous), Interconnection (None, Within, Between); Coordinate System (One, Many), Alignment; Tasks (Search, Query), Positions, Feature Sets, Mapping; task codes: Lt Locate, Lp Lookup, B Browse, E Explore, I Identify, C Compare, S Summarize.]
  7. Taxonomies for Genomics Data Visualization (Nusrat, Harbig & Gehlenborg, 2019)
    - Coordinate System: Mapping of genomic coordinates into the visualization space
    - Track: Dataset and mapping of features to visual properties
    - View: Genomic region and one or more tracks
    - View Configurations: Relationship of two or more views to each other
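
    The four terms defined on slide 7 can be read as a small data model. The sketch below is one way to write that model down in Python; the class names, fields, and the linear pixel mapping are illustrative assumptions, not part of the taxonomy paper or of any Gosling API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CoordinateSystem:
    """Maps genomic coordinates into visualization space (linear layout assumed)."""

    start: int     # first genomic position shown
    end: int       # genomic position at the right edge (exclusive)
    width_px: int  # horizontal extent of the view in pixels

    def to_pixels(self, position: int) -> float:
        # Linear interpolation from genomic position to horizontal pixel offset.
        return (position - self.start) / (self.end - self.start) * self.width_px


@dataclass
class Track:
    """A dataset plus a mapping of its features to visual properties."""

    dataset: str              # hypothetical dataset identifier
    encoding: dict[str, str]  # e.g. {"x": "start", "color": "score"}


@dataclass
class View:
    """A genomic region together with one or more tracks."""

    region: CoordinateSystem
    tracks: list[Track]


@dataclass
class ViewConfiguration:
    """The relationship of two or more views to each other."""

    views: list[View]
    relationship: str  # hypothetical label, e.g. "linked-zoom"


region = CoordinateSystem(start=1_000, end=2_000, width_px=500)
view = View(region=region, tracks=[Track("peaks", {"x": "start", "color": "score"})])
config = ViewConfiguration(views=[view, view], relationship="linked-zoom")
```

    Keeping the coordinate system separate from the track mirrors the taxonomy's choice to treat the genome as a coordinate system and to address data and encoding concerns separately.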
  17. Yes we can!
    (1) Map the design space of genomics visualizations
    (2) Define a language to describe interactive genomics visualizations
    (3) Implement a framework for visualization of real world data
  18. Gosling: Semantic Zooming (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
    - Visualizations often best represent information at a particular scale
    - Semantic zoom allows switching of visual encodings at different scales
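
    The idea of semantic zoom on slide 18 can be sketched as a lookup from the width of the visible genomic window to a visual encoding. This is a minimal illustration only: the thresholds, the mark names, and the function `choose_encoding` are hypothetical and do not reproduce how Gosling itself expresses zoom-dependent visibility.

```python
# Hypothetical zoom levels: (largest visible window in base pairs, mark to use).
ZOOM_LEVELS = [
    (1_000, "text"),     # close up: individual bases can be drawn as letters
    (1_000_000, "bar"),  # mid range: one bar per bin
]
FALLBACK_MARK = "heatmap"  # zoomed far out: summarize as color intensity


def choose_encoding(visible_bp: int) -> str:
    """Return the mark that suits the current visible window width."""
    if visible_bp <= 0:
        raise ValueError("visible_bp must be positive")
    for max_bp, mark in ZOOM_LEVELS:
        if visible_bp <= max_bp:
            return mark
    return FALLBACK_MARK
```

    A renderer would call `choose_encoding` whenever the user zooms and redraw the track with the returned mark, so the same data switches representation as the scale changes.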
  19. Yes we can!
    (1) Map the design space of genomics visualizations
    (2) Define a language to describe interactive genomics visualizations
    (3) Implement a framework for visualization of real world data
  20. Gosling.js Implementation (L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org)
    Gosling JSON Specification + HiGlass Rendering and Data Access
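
    To give a feel for what a Gosling JSON specification looks like, the sketch below assembles one as a Python dictionary and serializes it. The property names (`tracks`, `data`, `mark`, `x`, `xe`, `color`, `chromosomeField`, `genomicFields`) follow my recollection of the Gosling schema and should be checked against the documentation at https://gosling.js.org; the URL, column names, and the helper `build_spec` are hypothetical.

```python
import json


def build_spec(csv_url: str) -> dict:
    """Build a single-track specification (field names are assumptions)."""
    return {
        "title": "Example interval track",
        "tracks": [
            {
                # Where the data lives and which columns hold genomic positions.
                "data": {
                    "url": csv_url,
                    "type": "csv",
                    "chromosomeField": "chrom",
                    "genomicFields": ["start", "end"],
                },
                # One rectangle per interval, colored by a categorical column.
                "mark": "rect",
                "x": {"field": "start", "type": "genomic"},
                "xe": {"field": "end", "type": "genomic"},
                "color": {"field": "category", "type": "nominal"},
                "width": 600,
                "height": 40,
            }
        ],
    }


# Hypothetical data URL; any HTTP-accessible CSV with these columns would do.
spec = build_spec("https://example.org/intervals.csv")
spec_json = json.dumps(spec, indent=2)
```

    The specification only declares what to show; as the slide states, rendering and data access are handled by HiGlass.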
  21. Gos: A declarative genomics visualization library for Python (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
    • Author Gosling visualizations with Python scripts: github.com/gosling-lang/gos
    • Concise syntax with shorthand for common Gosling patterns
    • Integrates in Jupyter Notebooks for analysis and visualization of large datasets
  22. Gos: A declarative genomics visualization library for Python (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
  23. Gos: Local and in-memory datasets (Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/)
    • Data sources for Gosling must be accessible via HTTP
    • This requirement allows Gosling visualizations to be easily shared, but it can be challenging to load local datasets
    • Gos transparently handles local and remote datasets
    • Gos supports DataFrames and other in-memory objects as data sources
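
    The HTTP requirement on slide 23 can be illustrated with the Python standard library alone: a local file becomes reachable by URL once a small server exposes its directory. This is a sketch of the general idea, not a description of how Gos implements it; the helper names and the file `peaks.csv` are hypothetical.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def serve_directory(directory: str):
    """Serve `directory` over HTTP on a free local port; return (server, base URL)."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    return server, f"http://{host}:{port}"


def fetch_local_file_over_http() -> str:
    """Write a small CSV to disk, serve it, and read it back through its URL."""
    with tempfile.TemporaryDirectory() as tmp:
        (pathlib.Path(tmp) / "peaks.csv").write_bytes(
            b"chrom,start,end\nchr1,100,200\n"
        )
        server, base_url = serve_directory(tmp)
        try:
            # Bypass any configured proxy so the request stays on this machine.
            opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
            with opener.open(f"{base_url}/peaks.csv") as response:
                return response.read().decode()
        finally:
            server.shutdown()
            server.server_close()
```

    A visualization specification could then point its data URL at the served file, which is the kind of step the slide says Gos handles transparently.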
  24. Future Directions: Recommendation System: GenoREC (Pandey, L’Yi, Wang, Borkin & Gehlenborg, 2022 | https://osf.io/rscb4/)
    [Figure: GenoREC architecture. Figure labels: Input (Data, Task), GenoREC UI, Input Compiler, GenoREC Backend, GenoREC Model, Algorithm, Knowledge-based Recommendation, Output Compiler, GenoREC’s Recommendation, Alternate Visualizations. Design dimensions shown: Encoding, Alignment, Layout, Partition, Arrangement, Interactivity.]
  25. Future Directions
    - Recommendation systems
    - Moving away from custom tools towards a platform for visualization
    - GUI-based construction of visualizations
    - Collaborative visual exploration of genome-mapped data
  26. Team: Trevor Manz, Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Aditeya Pandey, Qianwen Wang, Sehi L’Yi, Sabrina Nusrat, Theresa Harbig
  27. Team: Trevor Manz, Peter Kerpedjiev, Nezar Abdennur, Fritz Lekschas, Aditeya Pandey, Qianwen Wang, Sehi L’Yi, Sabrina Nusrat, Theresa Harbig, You?
    Join us as a Postdoctoral Fellow, (Senior) UI/UX Developer, or (Senior) Software Developer. Email [email protected] if interested!