Gene search discoverability with WikiPathways and IdeogramJS

Eric Weitz
June 09, 2021

A presentation to the WikiPathways group, discussing technical details and future ideas for the gene search recommendation UI at https://eweitz.github.io/ideogram/related-genes.

Transcript

  1. Overview
    1. Search UI design and genomics
    2. Why and how Ideogram.js uses WikiPathways
    3. How WikiPathways could improve gene search
    4. Plans for Ideogram and gene search
  2. What is an ideogram?
    An ideogram is a drawing of a karyotype. A karyotype is a photograph of an organism’s genome. You can draw genomes with Ideogram.js.
    https://github.com/eweitz/ideogram
  3. Gene search
    Biomedical applications often support searching by gene symbol, like “BRCA1” or “ACE2”. Gene search usually returns only exact matches. Displaying related results is common in search UIs outside genomics. Mapping data to the genome enriches user experience. Showing related genes in ideograms improves scientific exploration.
  4. Design goal
    Help users explore and discover genomic data by widening search results.
    Challenges:
    • Users must know a priori which genes are related to the gene they search
    • Scarce screen real estate for new UI components
    • Visualizing search results in genomic space is useful but hard
  5. Related genes
    For a given gene, it is easy to find two kinds of related genes:
    • Interacting genes: adjacent nodes in the same biochemical pathway.
    • Paralogs: evolutionarily similar genes in the same species. Often have comparable roles in different pathways.
  6. Why chromosomes?
    • Gene expression often correlates within a genomic neighborhood.
    • Knowing where something is not is useful and standard in search result maps.
    • Cytogenetic features (e.g. centromeres, stalks) can explain null results.
  7. Where the data comes from
    • Genomic coordinates for genes: Ensembl via MyGene.info
    • Interacting genes: WikiPathways
    • Paralogs: Ensembl
    All are free and open source REST APIs.
  8. Design considerations
    Benefits of showing related genes in an ideogram:
    • Space efficient. Short, wide row of chromosomes uses empty page real estate.
    • Easy interaction. Plotting genes as large features makes them easy to see and click.
    • Domain specific. Users engage and recall more with tailored graphics than generic lists.
  9. Design at a glance
    • Big, easily clickable features. Triggers new genomic search.
    • Brief, noticeable legend and call to action.
    • Short and wide. Show many genomic search results in the space of a few rows.
  10. Built with WikiPathways
    Ideogram.js uses the WikiPathways REST API to get interactions for a given gene.
    • E.g. https://webservice.wikipathways.org/findInteractions?query=RAD51&format=json
    • Algorithm:
      ◦ Find interactions
      ◦ Filter by organism
      ◦ Omit non-genes
      ◦ Group pathways for each interacting gene
    • Explore the code: related-genes.js
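
The algorithm on slide 10 could be sketched roughly as follows. This is a minimal illustration, not the code in related-genes.js. Only the URL pattern comes from the slide; the simplified record shape (`pathwayId`, `species`, `left`, `right`), the `looksLikeGeneSymbol` heuristic, the function names, and the sample data with placeholder pathway IDs are all assumptions made for the example.

```javascript
// Build the findInteractions URL in the form shown on the slide.
function interactionsUrl(gene) {
  const base = 'https://webservice.wikipathways.org/findInteractions';
  return `${base}?query=${encodeURIComponent(gene)}&format=json`;
}

// Hypothetical, crude heuristic: treat short uppercase alphanumeric labels
// as human-style gene symbols. A real filter would need to be stricter.
function looksLikeGeneSymbol(label) {
  return /^[A-Z][A-Z0-9-]{1,9}$/.test(label);
}

// records use an assumed, simplified shape:
//   {pathwayId, species, left: [labels], right: [labels]}
function groupPathwaysByInteractingGene(records, searchedGene, organism) {
  const pathwaysByGene = {};
  for (const record of records) {
    if (record.species !== organism) continue;    // filter by organism
    for (const label of [...record.left, ...record.right]) {
      if (label === searchedGene) continue;       // skip the searched gene itself
      if (!looksLikeGeneSymbol(label)) continue;  // omit non-genes
      if (!pathwaysByGene[label]) pathwaysByGene[label] = [];
      if (!pathwaysByGene[label].includes(record.pathwayId)) {
        pathwaysByGene[label].push(record.pathwayId);  // group pathways per gene
      }
    }
  }
  return pathwaysByGene;
}

// Made-up interaction records with placeholder pathway IDs.
const sampleRecords = [
  {pathwayId: 'WP-A', species: 'Homo sapiens', left: ['RAD51'], right: ['BRCA2']},
  {pathwayId: 'WP-B', species: 'Homo sapiens', left: ['ATM'], right: ['RAD51']},
  {pathwayId: 'WP-B', species: 'Homo sapiens', left: ['RAD51'], right: ['BRCA2']},
  {pathwayId: 'WP-C', species: 'Homo sapiens', left: ['double-strand break'], right: ['RAD51']},
  {pathwayId: 'WP-D', species: 'Mus musculus', left: ['Rad51'], right: ['Brca2']}
];

const related = groupPathwaysByInteractingGene(sampleRecords, 'RAD51', 'Homo sapiens');
console.log(interactionsUrl('RAD51'));
console.log(related);
```

Keeping the grouping step as a pure function over already-fetched records makes it easy to test without calling the web service.
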
  11. WikiPathways API is awesome
    • Familiar: Docs are in a conventional format (Swagger) and easily findable
    • Fast: returns interactions for a given gene in < 2 seconds
    • Feature-rich: easily filter by organism, immediately adjacent nodes
  12. Non-human pathways are few and rough
    • Number of pathways in mouse is 21% that of human ... in Arabidopsis it is 7% ...
    • Only 35 organisms
    • Often mapped from human, then not updated
      ◦ Yields less useful pathways due to disconnected interactions, etc.
  13. Interacting genes are uncommon in non-human searches
                              Non-human   Human
    % gene searches           68%         32%
    With 0 pathway genes      82%         61%
    But non-human searches are 2x more frequent. Better orthology support would fix this.
  14. Improve orthologous pathways with links
    Link orthologous pathways both ways:
    • Target -> source (mouse -> human) is common but not 100%.
    • Source -> targets (human -> mouse, chicken, etc.) would be useful.
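
One way to picture the two link directions: if target -> source links already exist, a source -> targets index can be derived by inverting them. The sketch below assumes a plain object of links and uses placeholder pathway IDs. It is not WikiPathways' actual data model.

```javascript
// Existing links point from a target pathway to the source it was mapped from.
// IDs are placeholders, not real WikiPathways identifiers.
const targetToSource = {
  'mouse-pathway-1': 'human-pathway-1',
  'chicken-pathway-1': 'human-pathway-1',
  'mouse-pathway-2': 'human-pathway-2'
};

// Invert target -> source links into a source -> [targets] index.
function invertLinks(links) {
  const sourceToTargets = {};
  for (const [target, source] of Object.entries(links)) {
    if (!sourceToTargets[source]) sourceToTargets[source] = [];
    sourceToTargets[source].push(target);
  }
  return sourceToTargets;
}

const sourceToTargets = invertLinks(targetToSource);
console.log(sourceToTargets);
```
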
  15. Propagate changes from source to targets
    1. Bot watches all recent pathway changes
    2. User edits a source pathway
    3. Bot reads list of target pathways from links in source pathway
    4. Bot updates each target pathway, using fresh map of orthologous genes
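
Steps 3 and 4 of this proposal could be sketched as a pure function. Everything here is hypothetical: the pathway object shape, the `targets` link list, the placeholder pathway IDs, and the toy ortholog map stand in for whatever a real bot would read from WikiPathways and an orthology resource.

```javascript
// Given an edited source pathway, rebuild the gene list of each linked target.
// Genes with no known ortholog in the target species are left out.
function propagateEdit(sourcePathway, targetPathways, orthologsBySpecies) {
  // Step 3: read target pathway IDs from links in the source pathway.
  return sourcePathway.targets.map(targetId => {
    const target = targetPathways[targetId];
    const orthologs = orthologsBySpecies[target.species];
    // Step 4: map source genes through a fresh ortholog map.
    const genes = sourcePathway.genes
      .filter(gene => orthologs.has(gene))
      .map(gene => orthologs.get(gene));
    return {...target, genes};
  });
}

// Toy data for illustration only.
const editedSource = {
  id: 'human-pathway-1',
  species: 'Homo sapiens',
  genes: ['RAD51', 'BRCA2', 'ATM'],
  targets: ['mouse-pathway-1']
};
const targetPathways = {
  'mouse-pathway-1': {id: 'mouse-pathway-1', species: 'Mus musculus', genes: ['Rad51']}
};
const orthologsBySpecies = {
  'Mus musculus': new Map([['RAD51', 'Rad51'], ['BRCA2', 'Brca2']])
};

const updated = propagateEdit(editedSource, targetPathways, orthologsBySpecies);
console.log(updated[0].genes);
```

Returning new target objects instead of mutating them would let a bot diff old and new versions before writing anything back.
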
  16. Plans for Ideogram.js and gene search
    • Fix edge cases for gene labels
    • Overlay heatmap of gene density on chromosomes
    • Responsive ideogram dimensions
    • Search all genes in pathway
  17. Thank you!
    Eric Weitz (https://github.com/eweitz)
    More about Ideogram.js:
    • Source code: https://github.com/eweitz/ideogram
    • Gene search recommendation UI: https://eweitz.github.io/ideogram/related-genes
    • 20+ examples: https://eweitz.github.io/ideogram