A Fresh Look at Genomics Data: Grammar-Based Visualization

Visualization of genomics data for exploration and communication has a long history in molecular biology. Over the years, dozens of techniques and hundreds of tools to view and explore genomics data have been developed. This rich set of tools and techniques demonstrates the importance of data visualization in genomics. However, it also poses significant challenges for data analysts, who often need to convert between different data formats and use multiple tools for their analysis tasks. To address these challenges, we designed the Gosling visualization grammar (http://gosling-lang.org) that can be used to generate virtually any previously described interactive visualization technique for genome-mapped data. I will explain how we developed Gosling and introduce the tool ecosystem that we built to support Gosling-based visualizations. Finally, I will propose opportunities for future research in genomics data visualization.

Keynote from IBMI Tuebingen Kick Off Event June 2022: https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/interfakultaere-einrichtungen/ibmi/veranstaltungen/kick-off-meeting/

Nils Gehlenborg

July 01, 2022

Transcript

  1. A Fresh Look at Genomics Data:
    Grammar-Based Visualization
    @ngehlenborg ∙ http://gehlenborglab.org ∙ [email protected]
    Nils Gehlenborg
    Biomedical Informatics
    Harvard Medical School

  2. http://gehlenborglab.org

  3. http://gehlenborglab.org

  4. http://gehlenborglab.org
    0100101010100110
    1010101010101011
    1111001000001110
    0101110100101011
    1110100011111010
    0001101010100101
    1010101000101110
    1000010101010101
    0101010111010100

  5. 0100101010100110
    1010101010101011
    1111001000001110
    0101110100101011
    1110100011111010
    0001101010100101
    1010101000101110
    1000010101010101
    0101010111010100

  6-9. [No transcript text]

  10. HiGlass, HiPiler, Cistrome Explorer, Gosling, Gos

  11. Vitessce

  12. Vitessce, Avivator, Vizarr

  13. Halyos, Discovery, Discovery Mobile, Visual Consent, Periphery Plots

  14. StratomeX, OncoThreads, ThreadStates, UpSet Plots

  15. StratomeX, OncoThreads, ThreadStates, UpSet Plots, 4CE Explorer

  16-18. [No transcript text]

  19. Tool Catalogs
    Awesome Genome Visualization: 475 tools
    https://cmdcolin.github.io/awesome-genome-visualization
    GenoCAT: 100 tools
    http://genocat.tools

  20. Why are there so many genomics visualization tools?

  21. Challenges in Genomics Data Visualization
    - Everything is connected
    - Need to integrate different data types: sequence, expression levels, metabolites, phenotype information, etc.
    - Need to load many different types of data into a single software tool
    - Large space with sparse distribution of patterns across multiple scales
    - Many types of patterns along the genome: SNPs, epigenomic peaks, genomic rearrangements, etc.
    Nusrat, Harbig & Gehlenborg, 2019

  22. Can we design a tool to build effective genomics visualizations efficiently?

  23. Yes we can!

  24. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data

  25. Step 1: Map the design space of genomics visualizations

  26. Taxonomies for Genomics Data Visualization
    - Only considered data that is visualized in the sequence context, i.e., genomic location is represented in the visualization
    - Treat genome/sequence as a coordinate system
    - Address concerns separately:
      - Data Taxonomy: What data types can be mapped to the genome?
      - Visualization Taxonomy: How can the coordinate system be laid out? How can the mapped data be encoded?
      - Task Taxonomy: What kind of tasks are users trying to address with genomic data?
    Nusrat, Harbig & Gehlenborg, 2019

  27. Taxonomies for Genomics Data Visualization
    [Overview diagram of the three taxonomies; recoverable labels listed below]
    Data Taxonomy
    - Feature Sets: Type (Sparse, Contiguous); Interconnection (None, Within, Between)
    - Coordinate System: One, Many
    Visualization Taxonomy
    - Coordinate System: Layout, Partition, Abstraction, Arrangement
    - Tracks: Encoding, Alignment
    - View Configurations: Views, Scales, Foci
    Task Taxonomy
    - Labels: Positions, Feature Sets, Mapping
    - Tasks (Search, Query): Lt Locate, Lp Lookup, B Browse, E Explore, I Identify, C Compare, S Summarize
    Nusrat, Harbig & Gehlenborg, 2019

  28. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  29. Taxonomies for Genomics Data Visualization
    - Coordinate System: Mapping of genomic coordinates into the visualization space
    - Track: Dataset and mapping of features to visual properties
    - View: Genomic region and one or more tracks
    - View Configurations: Relationship of two or more views to each other
    Nusrat, Harbig & Gehlenborg, 2019

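As a rough illustration of how the four terms on slide 29 relate, here is a minimal Python sketch. The class names mirror the slide, but every field and the `describe` rule (scale taken as region width, focus as region midpoint) are assumptions made for this example, not definitions from Nusrat, Harbig & Gehlenborg.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinateSystem:
    """Mapping of genomic coordinates into the visualization space."""
    layout: str = "linear"  # layouts named in the talk: linear, circular, space-filling, spatial


@dataclass
class Track:
    """A dataset plus the mapping of its features to visual properties."""
    dataset: str
    encoding: dict = field(default_factory=dict)  # hypothetical: channel name -> data field


@dataclass
class View:
    """A genomic region shown with one or more tracks."""
    chromosome: str
    start: int
    end: int
    tracks: list = field(default_factory=list)


@dataclass
class ViewConfiguration:
    """Relationship of two or more views to each other."""
    views: list

    def describe(self) -> str:
        # Assumption: "scale" is approximated by region width,
        # "focus" by the midpoint of the region on its chromosome.
        n_scales = len({v.end - v.start for v in self.views})
        n_foci = len({(v.chromosome, (v.start + v.end) // 2) for v in self.views})
        parts = [
            "Multi-View" if len(self.views) > 1 else "Single-View",
            "Multi-Scale" if n_scales > 1 else "Single-Scale",
            "Multi-Focus" if n_foci > 1 else "Single-Focus",
        ]
        return " + ".join(parts)


if __name__ == "__main__":
    a = View("chr1", 1_000_000, 2_000_000, [Track("peaks", {"x": "position", "y": "peak"})])
    b = View("chr8", 5_000_000, 6_000_000, [Track("peaks")])
    print(ViewConfiguration([a, b]).describe())
```

Under these assumptions, two equally wide views on different chromosomes are labeled in the same style the talk uses for its view configuration examples.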

  30. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  31. Layout
    EpiViz

  32. Layout
    EpiViz: Linear

  33. Layout
    MizBee

  34. Layout
    MizBee: Circular

  35. Layout
    Hilbert Curve

  36. Layout
    Hilbert Curve: Space-filling

  37. Layout
    HiC3D Viewer

  38. Layout
    HiC3D Viewer: Spatial

  39. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  40. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  41. Arrangement
    Cinteny

  42. Arrangement
    Cinteny: Parallel

  43. Arrangement
    Synteny Explorer

  44. Arrangement
    Synteny Explorer: Serial

  45. Arrangement
    HiGlass

  46. Arrangement
    HiGlass: Orthogonal

  47. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  48. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  49. Alignment
    EpiViz

  50. Alignment
    EpiViz: Parallel

  51. Alignment
    EpiViz

  52. Alignment
    EpiViz: Overlaid

  53. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  54. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  55. View Configuration
    Synteny Explorer: Multi-View + Single-Scale + Single-Focus

  56. View Configuration
    MizBee: Multi-View + Multi-Scale + Single-Focus

  57. View Configuration
    IGV: Multi-View + Single-Scale + Multi-Focus

  58. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  59. Taxonomies for Genomics Data Visualization
    [Taxonomy overview diagram; same slide text as slide 27]
    Nusrat, Harbig & Gehlenborg, 2019

  60. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data

  61. Step 2: Define a language to describe interactive genomics visualizations

  62. Gosling
    Grammar Of Scalable, Linked, Interactive Nucleotide Graphics
    L’Yi, Wang, Lekschas & Gehlenborg, 2021

  63. Gosling Grammar
    Encoding

  64. Gosling Grammar
    Encoding

  65. Gosling Grammar
    Encoding

  66. Gosling Grammar
    Encoding

  67. Gosling Grammar
    Encoding

  68. Gosling Grammar
    Encoding
    Overlay

  69. Gosling Grammar
    Overlay

  70. Gosling
    Tracks and Views
    L’Yi, Wang, Lekschas & Gehlenborg, 2021

  71. Gosling
    Track Alignment: Overlay
    L’Yi, Wang, Lekschas & Gehlenborg, 2021

  72. Gosling Grammar
    Overlay

  73. Gosling
    Layout and Arrangement
    L’Yi, Wang, Lekschas & Gehlenborg, 2021

  74. Gosling
    Expressiveness
    L’Yi, Wang, Lekschas & Gehlenborg, 2021

  75-80. [No transcript text]

  81. Gosling
    Semantic Zooming
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org
    - Visualizations often best represent information at a particular scale
    - Semantic zoom allows switching of visual encodings at different scales

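The idea of switching encodings by scale can be sketched in a few lines of Python. This is an illustration of the general technique only: the function name, the thresholds, and the encoding names are hypothetical and do not come from Gosling.

```python
def choose_encoding(region_bp: int, width_px: int) -> str:
    """Pick a visual encoding from the current zoom level.

    The zoom level is expressed as base pairs per pixel. Thresholds and
    encoding names are illustrative assumptions, not Gosling defaults.
    """
    if region_bp <= 0 or width_px <= 0:
        raise ValueError("region_bp and width_px must be positive")

    bp_per_px = region_bp / width_px
    if bp_per_px <= 0.1:
        return "text"      # zoomed in far enough to label individual bases
    if bp_per_px <= 1_000:
        return "bar"       # mid-range: one bar per bin
    return "heatmap"       # zoomed out: aggregate into color


if __name__ == "__main__":
    for region in (50, 100_000, 100_000_000):
        print(region, choose_encoding(region, width_px=800))
```

A renderer would call such a function whenever the visible region changes and redraw the track with the returned encoding.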

  82. Gosling
    Semantic Zooming
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org

  83. Gosling
    Semantic Zooming
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org

  84. Interaction Techniques
    Multiple Linked Views
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org

  85. Gosling
    Responsive Visualization Design
    L’Yi and Gehlenborg, 2022 | https://osf.io/pd7vq/

  86. Yes we can!
    1. Map the design space of genomics visualizations
    2. Define a language to describe interactive genomics visualizations
    3. Implement a framework for visualization of real-world data

  87. Step 3: Implement a framework for visualization of real-world data

  88. Gosling.js Implementation
    Gosling JSON Specification + HiGlass Rendering and Data Access
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org

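To make the "JSON specification" part concrete, the sketch below builds a small single-track specification as a Python dictionary and serializes it to JSON. The property names follow the Gosling documentation as best recalled here and should be verified against the current schema; the data URL and the field names (`sample`, `position`, `peak`) are placeholders, not a real dataset.

```python
import json

# Placeholder URL: an assumption for illustration, not a real data source.
DATA_URL = "https://example.org/data/peaks.multivec"

spec = {
    "title": "Minimal single-track sketch",
    "layout": "linear",
    "tracks": [
        {
            # Where the data lives and how its columns are named (hypothetical fields).
            "data": {
                "url": DATA_URL,
                "type": "multivec",
                "row": "sample",
                "column": "position",
                "value": "peak",
                "categories": ["sample 1"],
            },
            # How data fields map to visual properties.
            "mark": "bar",
            "x": {"field": "position", "type": "genomic"},
            "y": {"field": "peak", "type": "quantitative"},
            "width": 600,
            "height": 130,
        }
    ],
}

# The JSON text is what would be handed to a Gosling renderer or the online editor.
spec_json = json.dumps(spec, indent=2)

if __name__ == "__main__":
    print(spec_json)
```

Keeping the specification as plain data means it can be generated, stored, and shared independently of the renderer that draws it.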

  89. Gosling.js Implementation
    Integration into Exploratory Tools
    L’Yi, Wang, Lekschas & Gehlenborg, 2021 | https://gosling.js.org

  90. Gosling.js Implementation
    Integration into Exploratory Tools: Structural Variant Exploration
    L’Yi and Gehlenborg, Work in Progress

  91. Can we apply this to genomic data visualization in Python?

  92. Gos
    A declarative genomics visualization library for Python
    github.com/gosling-lang/gos
    • Author Gosling visualizations with Python scripts
    • Concise syntax with shorthand for common Gosling patterns
    • Integrates into Jupyter Notebooks for analysis and visualization of large datasets
    Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/

  93. Gos
    A declarative genomics visualization library for Python
    Gos
    Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/

  94. Gos
    Example Composition

  95. Gos
    Local and in-memory datasets
    • Data sources for Gosling must be accessible via HTTP
    • This requirement allows Gosling visualizations to be easily shared, but it can make loading local datasets challenging
    • Gos transparently handles local and remote datasets
    • Gos supports DataFrames and other in-memory objects as data sources
    Manz, L’Yi, and Gehlenborg, https://osf.io/yn3ce/

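One generic way to give a local file an HTTP URL is to serve its directory from a small background server on localhost. The sketch below uses only the Python standard library and is not a description of how Gos does it; `serve_directory` and `CORSRequestHandler` are hypothetical names introduced for this example.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows pages from other origins to fetch files."""

    def end_headers(self) -> None:
        # A browser-based renderer on another origin needs this header to read the data.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep notebook or console output quiet


def serve_directory(directory: str) -> tuple[ThreadingHTTPServer, str]:
    """Serve `directory` on a free localhost port and return (server, base_url)."""
    handler = functools.partial(
        CORSRequestHandler, directory=str(Path(directory).resolve())
    )
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/"


if __name__ == "__main__":
    server, base_url = serve_directory(".")
    print("Serving current directory at", base_url)
    server.shutdown()
    server.server_close()
```

A file such as `peaks.csv` in the served directory would then be reachable at `base_url + "peaks.csv"` for as long as the server runs.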

  96. Where do we go from here?

  97. Future Directions
    Recommendation System: GenoREC
    Pandey, L’Yi, Wang, Borkin & Gehlenborg, 2022 | https://osf.io/rscb4/
    [Figure: GenoREC overview. Labeled elements: Input (Data, Task), Knowledge-based Recommendation, GenoREC’s Recommendation, Alternate Visualizations. Components: GenoREC UI, Input Compiler, GenoREC Backend, GenoREC Model, Algorithm, Output Compiler. Design dimensions: Encoding, Alignment, Layout, Partition, Arrangement, Interactivity]

  98. Future Directions
    - Recommendation systems
    - Moving away from custom tools towards a platform for visualization
    - GUI-based construction of visualizations
    - Collaborative visual exploration of genome-mapped data

  99. Team
    Trevor Manz
    Peter Kerpedjiev
    Nezar Abdennur
    Fritz Lekschas
    Aditeya Pandey
    Qianwen Wang
    Sehi L’Yi
    Sabrina Nusrat
    Theresa Harbig

  100. Team
    Trevor Manz
    Peter Kerpedjiev
    Nezar Abdennur
    Fritz Lekschas
    Aditeya Pandey
    Qianwen Wang
    Sehi L’Yi
    Sabrina Nusrat
    Theresa Harbig
    You?
    Join us as a
    - Postdoctoral Fellow
    - (Senior) UI/UX Developer
    - (Senior) Software Developer
    Email [email protected] if interested!

  101. Project Website
    http://gosling-lang.org
    Online Editor
    https://gosling.js.org
    Join us as a
    - Postdoctoral Fellow
    - (Senior) UI/UX Developer
    - (Senior) Software Developer
    Email [email protected] if interested!

  102. A Fresh Look at Genomics Data:
    Grammar-Based Visualization
    @ngehlenborg ∙ http://gehlenborglab.org ∙ [email protected]
    Nils Gehlenborg
    Biomedical Informatics
    Harvard Medical School
